Compare commits

...

4919 Commits

Author SHA1 Message Date
Konstantin Osipov
fd293768e7 storage_proxy: do not touch all_replicas.front() if it's empty.
The list of all endpoints for a query can be empty if we have
replication_factor 0 or there are no live endpoints for this token.
Do not access all_replicas.front() in this case.
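
A minimal sketch of the guard described above, not the actual storage_proxy
code. The helper name first_replica and the use of plain strings for
endpoints are assumptions made for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the first replica, or nothing when the
// list is empty (e.g. replication_factor 0 or no live endpoints).
std::optional<std::string> first_replica(const std::vector<std::string>& all_replicas) {
    if (all_replicas.empty()) {
        // Calling front() on an empty vector is undefined behavior.
        return std::nullopt;
    }
    return all_replicas.front();
}
```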

Fixes #5935.
Message-Id: <20200306192521.73486-2-kostja@scylladb.com>

(cherry picked from commit 9827efe554)
2020-06-22 18:29:15 +03:00
Gleb Natapov
22dfa48585 cql transport: do not log broken pipe error when a client closes its side of a connection abruptly
Fixes #5661

Message-Id: <20200615075958.GL335449@scylladb.com>
(cherry picked from commit 7ca937778d)
2020-06-21 13:09:22 +03:00
Benny Halevy
2f3d7f1408 cql3::util::maybe_quote: avoid stack overflow and fix quote doubling
The function was reimplemented to solve the following issues.
The custom implementation also improved its performance by
close to 19%.

Using regex_match("[a-z][a-z0-9_]*") may cause stack overflow on long input strings
as found with the limits_test.py:TestLimits.max_key_length_test dtest.

std::regex_replace does not replace in-place so no doubling of
quotes was actually done.

Add unit test that reproduces the crash without this fix
and tests various string patterns for correctness.

Note that defining the regex with std::regex::optimize
still ended up with stack overflow.
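
A sketch of how a regex-free version can look, assuming the only rules are
the [a-z][a-z0-9_]* pattern and quote doubling mentioned above. The names
is_plain_identifier and maybe_quote_sketch are hypothetical; this is not
the code from the patch.

```cpp
#include <cassert>
#include <string>

// Returns true if s matches [a-z][a-z0-9_]*. A plain loop uses constant
// stack space regardless of the input length.
bool is_plain_identifier(const std::string& s) {
    if (s.empty() || s[0] < 'a' || s[0] > 'z') {
        return false;
    }
    for (char c : s) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        if (!ok) {
            return false;
        }
    }
    return true;
}

// Leaves plain identifiers alone; otherwise wraps the input in double
// quotes and doubles every embedded double quote.
std::string maybe_quote_sketch(const std::string& s) {
    if (is_plain_identifier(s)) {
        return s;
    }
    std::string out;
    out.reserve(s.size() + 2);
    out.push_back('"');
    for (char c : s) {
        if (c == '"') {
            out.push_back('"');
        }
        out.push_back(c);
    }
    out.push_back('"');
    return out;
}
```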

Fixes #5671

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0329fe1fd1)
2020-06-21 13:07:21 +03:00
Gleb Natapov
76a08df939 commitlog: fix size of a write used to zero a segment
Due to a bug the entire segment is written in one huge write of 32Mb.
The idea was to split it to writes of 128K, so fix it.
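
The intended splitting can be sketched as follows. The function name
split_writes is hypothetical and the sketch only computes write sizes; it
does not perform any I/O and is not the commitlog code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Splits a region of `total` bytes into write sizes of at most `chunk` bytes.
std::vector<std::size_t> split_writes(std::size_t total, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<std::size_t> sizes;
    for (std::size_t off = 0; off < total; off += chunk) {
        sizes.push_back(std::min(chunk, total - off));
    }
    return sizes;
}
```

With a 32 MB segment and 128K chunks this yields 256 writes instead of one.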

Fixes #5857

Message-Id: <20200220102939.30769-1-gleb@scylladb.com>
(cherry picked from commit df2f67626b)
2020-06-21 13:03:05 +03:00
Amnon Heiman
6aa129d3b0 api/storage_service.cc: stream result of token_range
The result of the get token range API can become big, which can cause
large allocations and stalls.

This patch replaces the implementation so that it streams the results
using the HTTP stream capabilities instead of serializing and sending
one big buffer.

Fixes #6297

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 7c4562d532)
2020-06-21 12:57:48 +03:00
Takuya ASADA
b4f781e4eb scylla_post_install.sh: fix operator precedence issue with multiple statements
In bash, 'A || B && C' is a problem: when A is true, C is still
evaluated, since && and || have the same precedence.
To avoid the issue we need to make 'B && C' a single statement.

Fixes #5764

(cherry picked from commit b6988112b4)
2020-06-21 12:47:05 +03:00
Takuya ASADA
27594ca50e scylla_raid_setup: create missing directories
We need to create hints, view_hints, saved_caches directories
on RAID volume.

Fixes #5811

(cherry picked from commit 086f0ffd5a)
2020-06-21 12:45:27 +03:00
Rafael Ávila de Espíndola
0f2f0d65d7 configure: Reduce the dynamic linker path size
gdb has a SO_NAME_MAX_PATH_SIZE of 512, so we use that as the path
size.

Fixes: #6494

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200528202741.398695-2-espindola@scylladb.com>
(cherry picked from commit aa778ec152)
2020-06-21 12:29:16 +03:00
Tomasz Grabiec
31c2f8a3ae row_cache: Fix undefined behavior on key linearization
This is relevant only when using partition or clustering keys which
have a representation in memory which is larger than 12.8 KB (10% of
LSA segment size).

There are several places in code (cache, background garbage
collection) which may need to linearize keys because of performing key
comparison, but it's not done safely:

 1) the code does not run with the LSA region locked, so pointers may
get invalidated on linearization if it needs to reclaim memory. This
is fixed by running the code inside an allocating section.

 2) LSA region is locked, but the scope of
with_linearized_managed_bytes() encloses the allocating section. If
allocating section needs to reclaim, linearization context will
contain invalidated pointers. The fix is to reorder the scopes so
that linearization context lives within an allocating section.

Example of 1 can be found in
range_populating_reader::handle_end_of_stream() where it performs a
lookup:

  auto prev = std::prev(it);
  if (prev->key().equal(*_cache._schema, *_last_key->_key)) {
     it->set_continuous(true);

but handle_end_of_stream() is not invoked under allocating section.

Example of 2 can be found in mutation_cleaner_impl::merge_some() where
it does:

  return with_linearized_managed_bytes([&] {
  ...
    return _worker_state->alloc_section(region, [&] {

Fixes #6637.
Refs #6108.

Tests:

  - unit (all)

Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e81fc1f095)
2020-06-21 11:58:59 +03:00
Yaron Kaikov
ec12331f11 release: prepare for 3.3.4
2020-06-15 21:19:02 +03:00
Avi Kivity
ccc463b5e5 tools: toolchain: regenerate for gnutls 3.6.14
CVE-2020-13777.

Fixes #6627.

Toolchain source image registry disambiguated due to tighter podman defaults.
2020-06-15 08:05:58 +03:00
Calle Wilund
4a9676f6b7 gms::inet_address: Fix sign extension error in custom address formatting
Fixes #5808

Seems some gcc:s will generate the code as sign extending. Mine does not,
but this should be more correct anyhow.

Added small stringify test to serialization_test for inet_address

(cherry picked from commit a14a28cdf4)
2020-06-09 20:16:50 +03:00
Takuya ASADA
aaf4989c31 aws: update enhanced networking supported instance list
Sync enhanced networking supported instance list to latest one.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Fixes #6540

(cherry picked from commit 969c4258cf)
2020-06-09 16:03:00 +03:00
Asias He
b29f954f20 gossip: Make is_safe_for_bootstrap more strict
Consider

1. Start n1, n2 in the cluster
2. Stop n2 and delete all data for n2
3. Start n2 to replace itself with replace_address_first_boot: n2
4. Kill n2 before n2 finishes the replace operation
5. Remove replace_address_first_boot: n2 from scylla.yaml of n2
6. Delete all data for n2
7. Start n2

At step 7, n2 will be allowed to bootstrap as a new node, because the
application state of n2 in the cluster is HIBERNATE which is not
rejected in the check of is_safe_for_bootstrap. As a result, n2 will
replace n2 with different tokens and a different host_id, as if the
old n2 node was removed from the cluster silently.

Fixes #5172

(cherry picked from commit cdcedf5eb9)
2020-05-25 14:30:53 +03:00
Eliran Sinvani
5546d5df7b Auth: return correct error code when role is not found
Scylla returns the wrong error code (0000 - server internal error)
in response to trying to do authentication/authorization operations
that involve a non-existing role.
This commit changes those cases to return error code 2200 (invalid
query) which is the correct one and also the one that Cassandra
returns.
Tests:
    Unit tests (Dev)
    All auth and auth_role dtests

(cherry picked from commit ce8cebe34801f0ef0e327a32f37442b513ffc214)

Fixes #6363.
2020-05-25 12:58:38 +03:00
Amnon Heiman
541c29677f storage_service: get_range_to_address_map prevent use after free
The implementation of get_range_to_address_map has a default behaviour:
when given an empty keyspace, it uses the first non-system keyspace
(first here basically means just some keyspace).

The current implementation has two issues. First, it uses a reference to
a string that is held on the stack of another function. In other words,
there's a use after free, and it is not clear why we never hit it.

Second, it calls get_non_system_keyspaces twice. Though this is not
a bug, it's redundant (get_non_system_keyspaces uses a loop, so calling
that function does have a cost).

This patch solves both issues by changing the implementation to hold a
string instead of a reference to a string.

It also stores the results from get_non_system_keyspaces and reuses
them, which is more efficient and keeps the returned values on the local
stack.

Fixes #6465

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 69a46d4179)
2020-05-25 12:48:48 +03:00
Hagit Segev
06f18108c0 release: prepare for 3.3.3
2020-05-24 23:28:07 +03:00
Tomasz Grabiec
90002ca3d2 sstables: index_reader: Fix overflow when calculating promoted index end
When index file is larger than 4GB, offset calculation will overflow
uint32_t and _promoted_index_end will be too small.

As a result, promoted_index_size calculation will underflow and the
rest of the page will be interpreted as a promoted index.

The partitions which are in the remainder of the index page will not
be found by single-partition queries.
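
The overflow can be illustrated with a small sketch. The function names
are hypothetical and the arithmetic is simplified to a single addition;
the real offset calculation in index_reader is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Buggy variant: the sum is truncated to 32 bits, so it wraps once the
// position is past 4 GiB and the resulting end offset is too small.
std::uint64_t end_offset_32(std::uint64_t position, std::uint32_t size) {
    std::uint32_t end = static_cast<std::uint32_t>(position + size);
    return end;
}

// Fixed variant: keep the whole computation in 64 bits.
std::uint64_t end_offset_64(std::uint64_t position, std::uint32_t size) {
    return position + size;
}
```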

Data is not lost.

Introduced in 6c5f8e0eda.

Fixes #6040
Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com>

(cherry picked from commit a6c87a7b9e)
2020-05-24 09:46:11 +03:00
Rafael Ávila de Espíndola
da23902311 repair: Make sure sinks are always closed
In a recent next failure I got the following backtrace

    function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101
    at ./seastar/include/seastar/core/shared_ptr.hh:463
    at repair/row_level.cc:2059

This patch changes a few functions to use finally to make sure the sink
is always closed.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200515202803.60020-1-espindola@scylladb.com>
(cherry picked from commit 311fbe2f0a)

Ref #6414
2020-05-20 09:00:57 +03:00
Asias He
2b0dc21f97 repair: Fix race between write_end_of_stream and apply_rows
Consider: n1, n2, n1 is the repair master, n2 is the repair follower.

=== Case 1 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
   is written to sstable, r2 is not written yet, r1 belongs to
   partition 1, r2 belongs to partition 2. It yields after row r1 is
   written.
   data: partition_start, r1
3) n1 sends repair_row_level_stop to n2 because error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream()
   data: partition_start, r1, partition_end
5) Step 2 resumes to apply the rows.
   data: partition_start, r1, partition_end, partition_end, partition_start, r2

=== Case 2 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
   is written to sstable, r2 is not written yet, r1 belongs to partition
   1, r2 belongs to partition 2. It yields after partition_start for r2
   is written but before _partition_opened is set to true.
   data: partition_start, r1, partition_end, partition_start
3) n1 sends repair_row_level_stop to n2 because error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream().
   Since _partition_opened[node_idx] is false, partition_end is skipped,
   end_of_stream is written.
   data: partition_start, r1, partition_end, partition_start, end_of_stream

This causes unbalanced partition_start and partition_end in the stream
written to sstables.

To fix, serialize the write_end_of_stream and apply_rows with a semaphore.

Fixes: #6394
Fixes: #6296
Fixes: #6414
(cherry picked from commit b2c4d9fdbc)
2020-05-20 08:22:05 +03:00
Piotr Dulikowski
b544691493 hinted handoff: don't keep positions of old hints in rps_set
When sending hints from one file, rps_set field in send_one_file_ctx
keeps track of commitlog positions of hints that are being currently
sent, or have failed to be sent. At the end of the operation, if sending
of some hints failed, we will choose position of the earliest hint that
failed to be sent, and will retry sending that file later, starting from
that position. This position is stored in _last_not_complete_rp.

Usually, this set has a bounded size, because we impose a limit of at
most 128 hints being sent concurrently. Because we do not attempt to
send any more hints after a failure is detected, rps_set should not have
more than 128 elements at a time.

Due to a bug, commitlog positions of old hints (older than
gc_grace_seconds of the destination table) were inserted into rps_set
but not removed after checking their age. This could cause rps_set to
grow very large when replaying a file with old hints.

Moreover, if the file mixed expired and non-expired hints (which could
happen if it had hints to two tables with different gc_grace_seconds),
and sending of some non-expired hints failed, then positions of expired
hints could influence calculation _last_not_complete_rp, and more hints
than necessary would be resent on the next retry.

This simple patch removes commitlog position of a hint from rps_set when
it is detected to be too old.

Fixes #6422

(cherry picked from commit 85d5c3d5ee)
2020-05-20 08:06:17 +03:00
Piotr Dulikowski
d420b06844 hinted handoff: remove discarded hint positions from rps_set
Related commit: 85d5c3d

When attempting to send a hint, an exception might occur that results in
that hint being discarded (e.g. keyspace or table of the hint was
removed).

When such an exception is thrown, position of the hint will already be
stored in rps_set. We are only allowed to retain positions of hints that
failed to be sent and needed to be retried later. Dropping a hint is not
an error, therefore its position should be removed from rps_set - but
current logic does not do that.

Because of that bug, hint files with many discardable hints might cause
rps_set to grow large when the file is replayed. Furthermore, leaving
positions of such hints in rps_set might cause more hints than necessary
to be re-sent if some non-discarded hints fail to be sent.

This commit fixes the problem by removing positions of discarded hints
from rps_set.

Fixes #6433

(cherry picked from commit 0c5ac0da98)
2020-05-20 08:04:10 +03:00
Avi Kivity
b3a2cb2f68 Update seastar submodule
* seastar 0ebd89a858...30f03aeba9 (1):
  > timer: add scheduling_group awareness

Fixes #6170.
2020-05-10 18:39:20 +03:00
Hagit Segev
c8c057f5f8 release: prepare for 3.3.2
2020-05-10 18:16:28 +03:00
Gleb Natapov
038bfc925c storage_proxy: limit read repair only to replicas that answered during speculative reads
Speculative reader has more targets than needed for CL. In case there is
a digest mismatch the repair runs between all of them, but that violates
the provided CL. The patch makes it so that repair runs only between
replicas that answered (there will be CL of them).

Fixes #6123

Reviewed-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402132245.GA21956@scylladb.com>
(cherry picked from commit 36a24bbb70)
2020-05-07 19:48:37 +03:00
Mike Goltsov
13a4e7db83 fix error in fstrim service (scylla_util.py)
On a CentOS 7 machine:

fstrim.timer is not enabled, only unmasked, due to scylla_fstrim_setup on installation.
When trying to run the scylla-fstrim service manually you get an error:

Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module>
    main()
  File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main
    cfg = parse_scylla_dirs_with_default(conf=args.config)
  File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default
    if key not in y or not y[k]:
NameError: name 'k' is not defined

It is caused by an error in scylla_util.py.

Fixes #6294.

(cherry picked from commit 068bb3a5bf)
2020-05-07 19:45:50 +03:00
Juliusz Stasiewicz
727d6cf8f3 atomic_cell: special rule for printing counter cells
Until now, attempts to print counter update cell would end up
calling abort() because `atomic_cell_view::value()` has no
specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`,
i.e. counter update IMR type. Such visitor is not easy to write
if we want to intercept counters only (and not all int64_t values).

Anyway, linearized byte representation of counter cell would not
be helpful without knowing if it consists of counter shards or
counter update (delta) - and this must be known upon `deserialize`.

This commit introduces simple approach: it determines cell type on
high level (from `atomic_cell_view`) and prints counter contents by
`counter_cell_view` or `atomic_cell_view::counter_update_value()`.

Fixes #5616

(cherry picked from commit 0ea17216fe)
2020-05-07 19:40:47 +03:00
Tomasz Grabiec
6d6d7b4abe sstables: Release reserved space for sharding metadata
The intention of the code was to clear sharding metadata
chunked_vector so that it doesn't bloat memory.

The type of c is `chunked_vector*`. Assigning `{}`
clears the pointer while the intended behavior was to reset the
`chunked_vector` instance. The original instance is left unmodified
with all its reserved space.

Because of this, the previous fix had no effect because token ranges
are stored entirely inline and popping them doesn't release memory.
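
The difference between the two assignments can be sketched with a
stand-in type. chunk_store is a hypothetical placeholder for
chunked_vector, and the capacity behavior shown relies on how common
std::vector implementations handle move assignment.

```cpp
#include <cassert>
#include <vector>

struct chunk_store {            // hypothetical stand-in for chunked_vector
    std::vector<int> data;
};

// Buggy: `c` is a local copy of the pointer; assigning {} only nulls that
// copy and leaves the pointed-to object and its reserved space untouched.
void clear_pointer(chunk_store* c) {
    c = {};
    (void)c;
}

// Intended: replace the pointed-to object with a fresh empty instance,
// which drops the storage it had reserved.
void clear_instance(chunk_store* c) {
    *c = {};
}
```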

Fixes #4951

Tests:
  - sstable_mutation_test (dev)
  - manual using scylla binary on customer data on top of 2019.1.5

Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1584559892-27653-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5fe626a887)
2020-05-07 19:06:22 +03:00
Tomasz Grabiec
28f974b810 Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael
Row cache needs to be invalidated whenever data in sstables
changes. Cleanup removes data from sstables which doesn't belong to
the node anymore, which means cache must be invalidated on cleanup.
Currently, stale data can be returned when a node re-owns ranges whose
data is still stored in the node's row cache, because cleanup didn't
invalidate the cache.

Fixes #4446.

tests:
- unit tests (dev mode)
- dtests:
    update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test
    cleanup_test.py

(cherry picked from commit d0b6be0820)
2020-05-07 16:24:51 +03:00
Piotr Sarna
5fdadcaf3b network_topology_strategy: validate integers
In order to prevent users from creating a network topology
strategy instance with invalid inputs, it's not enough to use
std::stol() on the input: a string "3abc" still returns the number '3',
but will later confuse cqlsh and other drivers, when they ask for
topology strategy details.
The error message is now more human readable, since for incorrect
numeric inputs it used to return a rather cryptic message:
    ServerError: stol()
This commit fixes the issue and comes with a simple test.
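
One way to reject inputs such as "3abc" is to check how many characters
std::stol consumed. This is a sketch, not the code from the patch; the
function name and the error text are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Parses a replication factor and rejects trailing garbage such as "3abc".
long parse_replication_factor(const std::string& s) {
    std::size_t pos = 0;
    long value = 0;
    try {
        value = std::stol(s, &pos);   // pos receives the number of characters consumed
    } catch (const std::exception&) {
        throw std::invalid_argument("Replication factor must be numeric; found " + s);
    }
    if (pos != s.size()) {
        throw std::invalid_argument("Replication factor must be numeric; found " + s);
    }
    return value;
}
```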

Fixes #3801
Tests: unit(dev)
Message-Id: <7aaae83d003738f047d28727430ca0a5cec6b9c6.1583478000.git.sarna@scylladb.com>

(cherry picked from commit 5b7a35e02b)
2020-05-07 16:24:49 +03:00
Pekka Enberg
a960394f27 scripts/jobs: Keep memory reserve when calculating parallelism
The "jobs" script is used to determine the amount of compilation
parallelism on a machine. It attempts to ensure each GCC process has at
least 4 GB of memory per core. However, in the worst case scenario, we
could end up having the GCC processes take up all the system memory,
forcing swapping or the OOM killer to kick in. For example, on a 4 core
machine with 16 GB of memory, this worst case scenario seems easy to
trigger in practice.

Fix up the problem by keeping a 1 GB of memory reserve for other
processes and calculating parallelism based on that.
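
The arithmetic can be sketched as below. The function name and the floor
of one job are assumptions; the 4 GB per job and 1 GB reserve come from
the description above. The real "jobs" script is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>

// Number of parallel compile jobs: limited by the core count and by the
// memory left after setting aside reserve_gb for other processes.
// Assumption: always allow at least one job.
int compile_jobs(int cores, int mem_gb, int per_job_gb, int reserve_gb) {
    assert(per_job_gb > 0);
    int by_memory = (mem_gb - reserve_gb) / per_job_gb;
    return std::max(1, std::min(cores, by_memory));
}
```

On the 4 core, 16 GB machine from the example, no reserve gives 4 jobs,
while a 1 GB reserve gives 3.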

Message-Id: <20200423082753.31162-1-penberg@scylladb.com>
(cherry picked from commit 7304a795e5)
2020-05-04 19:01:54 +03:00
Piotr Sarna
3216a1a70a alternator: fix signature timestamps
Generating timestamps for auth signatures used a non-thread-safe
::gmtime function instead of thread-safe ::gmtime_r.
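
A sketch of the thread-safe variant using POSIX gmtime_r, which fills a
caller-provided struct instead of shared static storage. The function
name and the timestamp format are assumptions, not taken from the patch.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <time.h>

// Formats a UTC timestamp as YYYYMMDDTHHMMSSZ. gmtime_r writes into
// tm_buf, so concurrent callers do not share any state.
std::string format_utc(std::time_t t) {
    std::tm tm_buf{};
    if (::gmtime_r(&t, &tm_buf) == nullptr) {
        return {};
    }
    char out[17];   // 16 characters plus the terminating NUL
    std::size_t n = std::strftime(out, sizeof(out), "%Y%m%dT%H%M%SZ", &tm_buf);
    return std::string(out, n);
}
```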

Tests: unit(dev)
Fixes #6345

(cherry picked from commit fb7fa7f442)
2020-05-04 17:08:13 +03:00
Avi Kivity
5a7fd41618 Merge 'Fix hang in multishard_writer' from Asias
"
This series fixes a hang in multishard_writer when an error happens. It contains
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead

Fixes #6241
Refs: #6248
"

* asias-stream_fix_multishard_writer_hang:
  repair: Fix hang when the writer is dead
  mutation_writer_test: Add test_multishard_writer_producer_aborts
  multishard_writer: Abort the queue attached to consumers when producer fails

(cherry picked from commit 8925e00e96)
2020-05-01 20:13:00 +03:00
Raphael S. Carvalho
dd24ba7a62 api/service: fix segfault when taking a snapshot without keyspace specified
If no keyspace is specified when taking snapshot, there will be a segfault
because keynames is unconditionally dereferenced. Let's return an error
because a keyspace must be specified when column families are specified.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
(cherry picked from commit 02e046608f)

Fixes #6336.
2020-04-30 12:57:14 +03:00
Avi Kivity
204f6dd393 Update seastar submodule
* seastar a0bdc6cd85...0ebd89a858 (1):
  > http server: fix "Date" header format

Fixes #6253.
2020-04-26 19:31:44 +03:00
Nadav Har'El
b1278adc15 alternator: unzero "scylla_alternator_total_operations" metric
In commit 388b492040, which was only supposed
to move around code, we accidentally lost the line which does

    _executor.local()._stats.total_operations++;

So after this commit this counter was always zero...
This patch returns the line incrementing this counter.

Arguably, this counter is not very important - a user can also calculate
this number by summing up all the counters in the scylla_alternator_operation
array (these are counters for individual types of operations). Nevertheless,
as long as we do export a "scylla_alternator_total_operations" metric,
we need to correctly calculate it and can't leave it zero :-)

Fixes #5836

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219162820.14205-1-nyh@scylladb.com>
(cherry picked from commit b8aed18a24)
2020-04-19 19:07:31 +03:00
Botond Dénes
ee9677ef71 schema: schema(): use std::stable_sort() to sort key columns
When multiple key columns (clustering or partition) are passed to
the schema constructor, all having the same column id, the expectation
is that these columns will retain the order in which they were passed to
`schema_builder::with_column()`. Currently, however, this is not
guaranteed, as the schema constructor sorts key columns by column id with
`std::sort()`, which doesn't guarantee that equally comparing elements
retain their order. This can be an issue for indexes, the schemas of
which are built independently on each node. If there is any room for
variance in the key column order, this can result in different
nodes having incompatible schemas for the same index.
The fix is to use `std::stable_sort()` which guarantees that the order
of equally comparing elements won't change.
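
A small sketch of the ordering guarantee. The column struct and
sort_key_columns are hypothetical stand-ins, not the schema constructor
itself.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct column {          // hypothetical stand-in for a column definition
    int id;
    std::string name;
};

// Sorts by id only. std::stable_sort keeps columns with equal ids in the
// order in which they were added.
void sort_key_columns(std::vector<column>& cols) {
    std::stable_sort(cols.begin(), cols.end(),
                     [](const column& a, const column& b) { return a.id < b.id; });
}
```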

This is a suspected cause of #5856, although we don't have hard proof.

Fixes: #5856
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
[avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes
      unstable at 17 elements, and the failing schema had a
      clustering key with 23 elements]
Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com>
(cherry picked from commit a4aa753f0f)
2020-04-19 18:19:05 +03:00
Nadav Har'El
2060e361cf materialized views: fix corner case of view updates used by Alternator
While CQL does not allow creation of a materialized view with more than one
base regular column in the view's key, in Alternator we do allow this - both
partition and clustering key may be a base regular column. We had a bug in
the logic handling this case:

If the new base row is missing a value for *one* of the view key columns,
we shouldn't create a view row. Similarly, if the existing base row was
missing a value for *one* of the view key columns, a view row does not
exist and doesn't need to be deleted.  This was done incorrectly, and made
decisions based on just one of the key columns, and the logic is now
fixed (and I think, simplified) in this patch.
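
The corrected rule can be sketched as "every view key column must have a
value". The function name and the use of std::optional strings for cell
values are assumptions for illustration, not the view update code.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// A view row exists only if every view key column taken from the base row
// has a value; a single missing value means there is no view row.
bool has_view_row(const std::vector<std::optional<std::string>>& view_key_values) {
    return std::all_of(view_key_values.begin(), view_key_values.end(),
                       [](const std::optional<std::string>& v) { return v.has_value(); });
}
```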

With this patch, the Alternator test which previously failed because of
this problem now passes. The patch also includes new tests in the existing
C++ unit test test_view_with_two_regular_base_columns_in_key. This test
was already supposed to be testing various cases of two-new-key-columns
updates, but missed the cases explained above. These new tests failed
badly before this patch - some of them had clean write errors, others
caused crashes. With this patch, they pass.

Fixes #6008.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200312162503.8944-1-nyh@scylladb.com>
(cherry picked from commit 635e6d887c)
2020-04-19 15:24:19 +03:00
Hagit Segev
6f939ffe19 release: prepare for 3.3.1
2020-04-18 00:23:31 +03:00
Kamil Braun
69105bde8a sstables: freeze types nested in collection types in legacy sstables
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).

SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).

This commit does two things:
1. for all SSTables created after this commit, include a new feature
   flag, CorrectUDTsInCollections, presence of which implies that frozen
   UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
   nested inside collections are always frozen, even if they don't have
   the tag. This assumption is safe to be made, because at the time of
   this commit, Scylla does not allow non-frozen (multi-cell) types inside
   collections or UDTs, and because of point 1 above.

There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.

Fixes #6130.

(cherry picked from commit 3d811e2f95)
2020-04-17 09:12:28 +03:00
Kamil Braun
e09e9a5929 sstables: move definition of column_translation::state::build to a .cc file
Ref #6130
2020-04-17 09:12:28 +03:00
Piotr Sarna
2308bdbccb alternator: use partition tombstone if there's no clustering key
As @tgrabiec helpfully pointed out, creating a row tombstone
for a table which does not have a clustering key in its schema
creates something that looks like an open-ended range tombstone.
That's problematic for KA/LA sstable formats, which are incapable
of writing such tombstones, so a workaround is provided
in order to allow using KA/LA in alternator.

Fixes #6035
Cherry-picked from 0a2d7addc0
2020-04-16 12:14:10 +02:00
Asias He
a2d39c9a2e gossip: Add an option to force gossip generation
Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation
number g1, g2, g3.

n1, n2, n3 running scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure)

One year later, the user wants to upgrade n1, n2, n3 to a new version.

When n3 does a rolling restart with a new version, n3 will use a
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.

Such unnecessary marking of node down can cause availability issues.
For example:

DC1: n1, n2
DC2: n3, n4

When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.

To fix, we can start the node with a gossip generation within
MAX_GENERATION_DIFFERENCE difference for the new node.

Once all the nodes run the version with commit
0a52ecb6df, the option is no longer
needed.

Fixes #5164

(cherry picked from commit 743b529c2b)
2020-03-27 12:49:23 +01:00
Asias He
5fe2ce3bbe gossiper: Always use the new generation number
User reported an issue that after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster while the node is up
and running normally.

Consider the following:

- n1, n2, n3 in the cluster
- n3 shutdown itself
- n3 send shutdown verb to n1 and n2
- n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to
  INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
  storage_service::prepare_to_join,
- n3 receives response from n1, in gossiper::handle_ack_msg, since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 1; fiber 1 finishes faster than fiber 2, it
  sets _in_shadow_round = false
- n3 receives response from n2, in gossiper::handle_ack_msg, since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 2; fiber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
  gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2, apply application state about n3 into
  endpoint_state_map, at this point endpoint_state_map contains
  information including n3 itself from n2.
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
  with new generation number generated correctly in
  storage_service::prepare_to_join, but in
  maybe_initialize_local_state(generation_nbr), it will not set new
  generation and heartbeat if the endpoint_state_map contains itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop, in gossiper::run,
  hbs.update_heart_beat() the heartbeat is set to the number starting
  from 0.
- n1 and n2 will not get updates from n3 because they use the same
  generation number but n1 and n2 have a larger heartbeat version
- n1 and n2 will mark n3 as down even if n3 is alive.

To fix, always use the new generation number.

Fixes: #5800
Backports: 3.0 3.1 3.2
(cherry picked from commit 62774ff882)
2020-03-27 12:49:20 +01:00
Piotr Sarna
aafa34bbad cql: fix qualifying indexed columns for filtering
When qualifying columns to be fetched for filtering, we also check
if the target column is not used as an index - in which case there's
no need of fetching it. However, the check was incorrectly assuming
that any restriction is eligible for indexing, while it's currently
only true for EQ. The fix makes a more specific check and contains
many dynamic casts, but these will hopefully be gone once our
long-planned "restrictions rewrite" is done.
This commit comes with a test.

Fixes #5708
Tests: unit(dev)

(cherry picked from commit 767ff59418)
2020-03-22 09:00:51 +01:00
Hagit Segev
7ae2cdf46c release: prepare for 3.3.0
2020-03-19 21:46:44 +02:00
Hagit Segev
863f88c067 release: prepare for 3.3.rc3
2020-03-15 22:45:30 +02:00
Avi Kivity
90b4e9e595 Update seastar submodule
* seastar f54084c08f...a0bdc6cd85 (1):
  > tls: Fix race and stale memory use in delayed shutdown

Fixes #5759 (maybe)
2020-03-12 19:41:50 +02:00
Konstantin Osipov
434ad4548f locator: correctly select endpoints if RF=0
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug when we first add at least one endpoint
to the list, and then compare list size and replication factor.
If RF=0 this never yields true.
Fix by moving the RF check before at least one endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.
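
One possible shape of the bug and of the fix, as a sketch. The function
names and the use of strings for endpoints are hypothetical; the actual
SimpleStrategy loop is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Buggy shape: the size check runs only after an endpoint was added, so
// with rf == 0 the size never equals rf and every endpoint is selected.
std::vector<std::string> pick_buggy(const std::vector<std::string>& ring, std::size_t rf) {
    std::vector<std::string> out;
    for (const auto& ep : ring) {
        out.push_back(ep);
        if (out.size() == rf) {
            break;
        }
    }
    return out;
}

// Fixed shape: check the replication factor before adding anything.
std::vector<std::string> pick_fixed(const std::vector<std::string>& ring, std::size_t rf) {
    std::vector<std::string> out;
    for (const auto& ep : ring) {
        if (out.size() >= rf) {
            break;
        }
        out.push_back(ep);
    }
    return out;
}
```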

Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>

(cherry picked from commit ac6f64a885)
2020-03-12 12:09:46 +02:00
Avi Kivity
cbbb15af5c logalloc: increase capacity of _regions vector outside reclaim lock
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:

1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
   to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.

To fix, resize the _regions vector outside the lock.

Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>

(cherry picked from commit c020b4e5e2)
2020-03-12 11:25:20 +02:00
Benny Halevy
3231580c05 dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids
By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with
the binary's build-id when stripping its debug info as it is passed
the `--build-id-seed <version>.<release>` option.

To prevent that we need to set the following macros as follows:
  unset `_unique_build_ids`
  set `_no_recompute_build_ids` to 1

Fixes #5881

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
2020-03-09 15:21:50 +02:00
Piotr Sarna
62364d9dcd Merge 'cql3: do_execute_base_query: fix null deref ...
... when clustering key is unavailable' from Benny

This series fixes null pointer dereference seen in #5794

efd7efe cql3: generate_base_key_from_index_pk; support optional index_ck
7af1f9e cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
7fe1a9e cql3: do_execute_base_query: fixup indentation

Fixes #5794

Branches: 3.3

Test: unit(dev) secondary_indexes_test:TestSecondaryIndexes.test_truncate_base(debug)

* bhalevy/fix-5794-generate_base_key_from_index_pk:
  cql3: do_execute_base_query: fixup indentation
  cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
  cql3: generate_base_key_from_index_pk; support optional index_ck

(cherry picked from commit 4e95b67501)
2020-03-09 15:20:01 +02:00
Takuya ASADA
3bed8063f6 dist/debian: fix "unable to open node-exporter.service.dpkg-new" error
It seems like *.service is conflicting at install time because the file is
installed twice, via both debian/*.service and debian/scylla-server.install.

We don't need to use *.install, so we can just drop the line.

Fixes #5640

(cherry picked from commit 29285b28e2)
2020-03-03 12:40:39 +02:00
Yaron Kaikov
413fcab833 release: prepare for 3.3.rc2 2020-02-27 14:45:18 +02:00
Juliusz Stasiewicz
9f3c3036bf cdc: set TTLs on CDC log cells
Cells in CDC logs used to be created while completely neglecting
TTLs (the TTLs from `cdc = {...'ttl':600}`). This patch adds TTLs
to all cells; there are no row markers, so we need not set TTL
there.

Fixes #5688

(cherry picked from commit 67b92c584f)
2020-02-26 18:12:55 +02:00
Benny Halevy
ff2e108a6d gossiper: do_stop_gossiping: copy live endpoints vector
It can be resized asynchronously by mark_dead.

Fixes #5701

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>
(cherry picked from commit f45fabab73)
2020-02-26 13:00:11 +02:00
Gleb Natapov
ade788ffe8 commitlog: use commitlog IO scheduling class for segment zeroing
There may be other commitlog writes waiting for zeroing to complete, so
not using proper scheduling class causes priority inversion.

Fixes #5858.

Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
(cherry picked from commit 6a78cc9e31)
2020-02-26 12:51:10 +02:00
Benny Halevy
1f8bb754d9 storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service
Match subscription done in main() and avoid cross shard access
to _lifecycle_subscribers vector.

Fixes #5385

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>
(cherry picked from commit 5b0ea4c114)
2020-02-25 16:39:49 +02:00
Tomasz Grabiec
7b2eb09225 Merge fixes for use-after-frees related to shutdown of services
Backport of 884d5e2bcb and
4839ca8491.

Fixes crashes when scylla is stopped early during boot.

Merged from https://github.com/xemul/scylla/tree/br-mm-combined-fixes-for-3.3

Fixes #5765.
2020-02-25 13:34:01 +01:00
Pavel Emelyanov
d2293f9fd5 migration_manager: Abort and wait cluster upgrade waiters
The maybe_schedule_schema_pull waits for schema_tables_v3 to
become available. This is unsafe in case migration manager
goes away before the feature is enabled.

Fix this by subscribing on feature with feature::listener and
waiting for condition variable in maybe_schedule_schema_pull.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Pavel Emelyanov
25b31f6c23 migration_manager: Abort and wait delayed schema pulls
The sleep is interrupted with the abort source, the "wait" part
is done with the existing _background_tasks gate. Also we need
to make sure the gate stays alive till the end of the function,
so make use of the async_sharded_service (migration manager is
already such).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Pavel Emelyanov
742a1ce7d6 storage_service: Unregister from gossiper notifications ... at all
This unregistration doesn't happen currently, but doesn't seem to
cause any problems in general, as on stop gossiper is stopped and
nothing from it hits the storage_service.

However (!) if an exception pops up between the point where the
storage_service subscribes to gossiper and the point where the
drain_on_shutdown defer action is set up, then we _may_ get into the
following situation:

- main's stuff gets unrolled
- gossiper is not stopped (drain_on_shutdown defer is not set up)
- migration manager is stopped (with deferred action in main)
- a notification comes from gossiper
    -> storage_service::on_change might want to pull schema with
       the help of local migration manager
    -> assert(local_is_initialized) strikes

Fix this by registering storage_service to gossiper a bit earlier
(both are already initialized by that time) and setting up unregister
defer right afterwards.

Test: unit(dev), manual start-stop
Bug: #5628

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200130190343.25656-1-xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Avi Kivity
4ca9d23b83 Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"
This reverts commit bdc542143e. Exposes a data resurrection
bug (#5838).
2020-02-24 10:02:58 +02:00
Avi Kivity
9e97f3a9b3 Update seastar submodule
* seastar dd686552ff...f54084c08f (2):
  > reactor: fallback to epoll backend when fs.aio-max-nr is too small
  > util: move read_sys_file_as() from iotune to seastar header, rename read_first_line_as()

Fixes #5638.
2020-02-20 10:25:00 +02:00
Piotr Dulikowski
183418f228 hh: handle counter update hints correctly
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.

When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.

When a hint's destination node is no longer a replica for it, we attempt
to send the hint to all its current replicas. Previously,
storage_proxy::mutate was used for that purpose. It was incorrect
because that function treats mutations for counter tables as mutations
containing only a delta (by how much to increase/decrease the counter).
These two types of mutations have different serialization format, so in
this case a "shards" mutation is reinterpreted as "delta" mutation,
which can cause data corruption to occur.

This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.

Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
2020-02-19 16:49:12 +02:00
Piotr Sarna
756574d094 db,view: fix generating view updates for partition tombstones
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can be simply generated from the new row. However, lack of the
existing row can also be the result of a partition/range tombstone.
If that's the case, it needs to be applied, because it's entirely
possible that this partition row also hides the new row.
Without taking the partition tombstone into account, creating
a future tombstone and inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
This patch comes with a test which was proven to fail before the
changes.

Branches 3.1,3.2,3.3
Fixes #5793

Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
(cherry picked from commit e93c54e837)
2020-02-16 20:26:28 +02:00
Rafael Ávila de Espíndola
a348418918 service: Add a lock around migration_notifier::_listeners
Before this patch the iterations over migration_notifier::_listeners
could race with listeners being added and removed.

The addition side is not modified, since it is common to add a
listener during construction and it would require a fairly big
refactoring. Instead, the iteration is modified to use indexes instead
of iterators so that it is still valid if another listener is added
concurrently.

For removal we use a rw lock, since removing an element invalidates
indexes too. There are only a few places that needed refactoring to
handle unregister_listener returning a future<>, so this is probably
OK.

Fixes #5541.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200120192819.136305-1-espindola@scylladb.com>
(cherry picked from commit 27bd3fe203)
2020-02-16 20:13:42 +02:00
Avi Kivity
06c0bd0681 Update seastar submodule
* seastar 3f3e117de3...dd686552ff (1):
  > perftune.py: Use safe_load() for fix arbitrary code execution

Fixes #5630.
2020-02-16 15:53:16 +02:00
Avi Kivity
223c300435 Point seastar submodule at scylla-seastar.git branch-3.3
This allows us to backport seastar patches to Scylla 3.3.
2020-02-16 15:51:46 +02:00
Gleb Natapov
ac8bef6781 commitlog: fix flushing an entry marked as "sync" in periodic mode
After 546556b71b we can have mixed writes into commitlog:
some flush immediately, some do not. If a non-flushing write races with
a flushing one and becomes responsible for writing back its buffer into a
file, the flush will be skipped, which will cause the assert in batch_cycle()
to trigger since the flush position will not be advanced. Fix that by checking
whether the flush was skipped and, in this case, explicitly flushing up to our
file position.

Fixes #5670

Message-Id: <20200128145103.GI26048@scylladb.com>
(cherry picked from commit c654ffe34b)
2020-02-16 15:48:40 +02:00
Pavel Solodovnikov
68691907af lwt: fix handling of nulls in parameter markers for LWT queries
This patch affects the LWT queries with IF conditions of the
following form: `IF col in :value`, i.e. if the parameter
marker is used.

When executing a prepared query with a bound value
of `(None,)` (tuple with null, example for Python driver), it is
serialized not as NULL but as "empty" value (serialization
format differs in each case).

Therefore, Scylla deserializes the parameters in the request as
empty `data_value` instances, which are, in turn, translated
to non-empty `bytes_opt` with empty byte-string value later.

Account for this case too in the CAS condition evaluation code.

Example of a problem this patch aims to fix:

Suppose we have a table `tbl` with a boolean field `test` and
INSERT a row with NULL value for the `test` column.

Then the following update query fails to apply due to the
error in IF condition evaluation code (assume `v=(null)`):
`UPDATE tbl SET test=false WHERE key=0 IF test IN :v`
returns false in `[applied]` column, but is expected to succeed.

Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286)

Fixes: #5710

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit bcc4647552)
2020-02-16 15:29:28 +02:00
Avi Kivity
f59d2fcbf1 Merge "stop passing tracing state pointer in client_state" from Gleb
"
client_state is used simultaneously by many requests running in parallel
while tracing state pointer is per request. Both those facts do not sit
well together and as a result sometimes tracing state is being overwritten
while still being used by an active request, which may cause an incorrect trace
or even a crash.
"

Fixes #5700.

Backported from 9f1f60fc38

* 'gleb/trace_fix_3.3_backport' of ssh://github.com/scylladb/seastar-dev:
  client_state: drop the pointer to a tracing state from client_state
  transport: pass tracing state explicitly instead of relying on it been in the client_state
  alternator: pass tracing state explicitly instead of relying on it been in the client_state
2020-02-16 15:23:41 +02:00
Asias He
bdc542143e streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations
The table::flush_streaming_mutations was used in the days when streaming
data went to memtable. After switching to the new streaming, data goes
to sstables directly in streaming, so the sstables generated in
table::flush_streaming_mutations will be empty.

It is unnecessary to invalidate the cache if no sstables are added. To
avoid unnecessary cache invalidation, which pokes holes in the cache, skip
calling _cache.invalidate() if the list of sstables is empty.

The steps are:

- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
  new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges

In summary, this patch will avoid a lot of cache invalidation for
streaming.

Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
2020-02-16 15:16:24 +02:00
Botond Dénes
061a02237c row: append(): downgrade assert to on_internal_error()
This assert, added by 060e3f8 is supposed to make sure the invariant of
the append() is respected, in order to prevent building an invalid row.
The assert however proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.

Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
2020-02-16 15:12:46 +02:00
Gleb Natapov
35b6505517 client_state: drop the pointer to a tracing state from client_state
client_state is shared between requests and tracing state is per
request. It is not safe to use the former as a container for the latter
since a state can be overwritten prematurely by subsequent requests.

(cherry picked from commit 31cf2434d6)
2020-02-13 13:45:56 +02:00
Gleb Natapov
866c04dd64 transport: pass tracing state explicitly instead of relying on it been in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per request.
Currently the next request may overwrite the tracing state of the previous
one, causing, in the best case, a wrong trace to be taken, or a crash if the
overwritten pointer is freed prematurely.

Fixes #5700

(cherry picked from commit 9f1f60fc38)
2020-02-13 13:45:56 +02:00
Gleb Natapov
dc588e6e7b alternator: pass tracing state explicitly instead of relying on it been in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per
request. This is not yet an issue for the alternator since it creates
new client_state object for each request, but first of all it should not
and second trace state will be dropped from the client_state, by later
patch.

(cherry picked from commit 38fcab3db4)
2020-02-13 13:45:56 +02:00
Takuya ASADA
f842154453 dist/debian: keep /etc/systemd .conf files on 'remove'
Since dpkg does not re-install conffiles when they are removed by the user,
currently we are missing dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.

Fixes #5734

(cherry picked from commit 43097854a5)
2020-02-12 14:26:40 +02:00
Yaron Kaikov
b38193f71d dist/docker: Switch to 3.3 release repository (#5756)
Change the SCYLLA_REPO_URL variable to point to branch-3.3 instead of
master. This ensures that Docker image builds that don't specify the
variable build from the right repository by default.
2020-02-10 11:11:38 +02:00
Rafael Ávila de Espíndola
f47ba6dc06 lua: Handle nil returns correctly
This is a minimum backport to 3.3.

With this patch lua nil values are mapped to CQL null values instead
of producing an error.

Fixes #5667

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200203164918.70450-1-espindola@scylladb.com>
2020-02-09 18:55:42 +02:00
Hagit Segev
0d0c1d4318 release: prepare for 3.3.rc1 2020-02-09 15:55:24 +02:00
Takuya ASADA
9225b17b99 scylla_post_install.sh: fix 'integer expression expected' error
awk returns a float value on Debian, which causes a postinst script failure
since we compare it as an integer value.
Replaced with sed + bash.

Fixes #5569

(cherry picked from commit 5627888b7c)
2020-02-04 14:30:04 +02:00
Gleb Natapov
00b3f28199 db/system_keyspace: use user memory limits for local.paxos table
Treat writes to local.paxos as user memory, as the number of writes is
dependent on the amount of user data written with LWT.

Fixes #5682

Message-Id: <20200130150048.GW26048@scylladb.com>
(cherry picked from commit b08679e1d3)
2020-02-02 17:36:52 +02:00
Rafael Ávila de Espíndola
1bbe619689 types: Fix encoding of negative varint
We would sometimes produce an unnecessary extra 0xff prefix byte.

The new encoding matches what cassandra does.

This was both an efficiency and a correctness issue, as using varint in a
key could produce different tokens.
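
For illustration, a minimal big-endian two's-complement encoding can be sketched in Python as below. This is not the Scylla implementation; `encode_varint` is a hypothetical helper, and the sketch assumes the target format is the shortest two's-complement byte string, which is the form without a redundant leading 0xff for negative values.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as the shortest big-endian two's-complement bytes."""
    length = 1
    # Grow until the value fits in `length` signed bytes.
    while not (-(1 << (8 * length - 1)) <= value < (1 << (8 * length - 1))):
        length += 1
    return value.to_bytes(length, byteorder="big", signed=True)


# A redundant encoding decodes to the same number but has different bytes,
# so hashing the serialized key would give a different token.
redundant_minus_one = b"\xff\xff"
minimal_minus_one = encode_varint(-1)
```

Here `minimal_minus_one` is the single byte 0xff, while `redundant_minus_one` carries the extra prefix byte yet still decodes to -1.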

Fixes #5656

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
2020-02-02 16:00:58 +02:00
Avi Kivity
c36f71c783 test: make eventually() more patient
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.

Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible compared to false-positive failures, since they throw away large
test efforts and sometimes require a person to investigate the problem,
only to conclude it is a false positive.

This patch therefore makes eventually() more patient, by a factor of
32.
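
As a rough sketch of the idea, a retrying helper with doubling delays might look like the Python below. This is an assumption-laden illustration, not the test helper itself: the parameter names, the default attempt count, and the use of AssertionError are all made up for the example. With doubling delays, five extra attempts lengthen the final wait by 2**5 = 32, which is one way to get a factor of 32.

```python
import time


def eventually(check, max_attempts=5, base_delay=0.001, sleep=time.sleep):
    """Call check() until it stops raising, doubling the delay after each failure."""
    attempts = 0
    while True:
        try:
            return check()
        except AssertionError:
            attempts += 1
            if attempts >= max_attempts:
                # Out of patience: report the last failure to the caller.
                raise
            sleep(base_delay * (1 << attempts))
```

The `sleep` parameter exists only so that the delays can be observed or stubbed out in a test.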

Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>

(cherry picked from commit ec5b721db7)
2020-02-01 13:20:22 +02:00
Pekka Enberg
f5471d268b release: prepare for 3.3.rc0 2020-01-30 14:00:51 +02:00
Takuya ASADA
fd5c65d9dc dist/debian: Use tilde for release candidate builds
We need to add '~' to handle rcX version correctly on Debian variants
(merged at ae33e9f), but when we moved to relocated package we mistakenly
dropped the code, so add the code again.
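
Why the tilde matters can be sketched with a simplified version comparison in Python. This follows the ordering rules for upstream version strings as described in the Debian policy manual (a tilde sorts before everything, including the end of the string) and ignores epochs and revisions. The function names and the example version strings are assumptions for illustration.

```python
def _order(ch):
    """Sort weight of a non-digit character: '~' first, then letters, then the rest."""
    if ch == "~":
        return -1
    if ch.isdigit():
        return 0
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def compare_versions(a, b):
    """Return -1, 0 or 1 when a sorts before, equal to, or after b."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit runs character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i]) if i < len(a) else 0
            bc = _order(b[j]) if j < len(b) else 0
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Compare the digit runs numerically.
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0
```

Under these rules a string such as "3.3~rc1" sorts before "3.3", whereas "3.3.rc1" sorts after "3.3.0", so a release candidate without the tilde would look newer than the final release.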

Fixes #5641

(cherry picked from commit dd81fd3454)
2020-01-28 18:34:48 +02:00
Avi Kivity
3aa406bf00 tools: toolchain: dbuild: relax process limit in container
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.

Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).

I checked that --pids-limit is supported by old versions of
docker and by podman.

Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>

(cherry picked from commit 897320f6ab)
2020-01-28 18:14:01 +02:00
Piotr Sarna
c0253d9221 db,view: fix checking for secondary index special columns
A mistake in handling legacy checks for special 'idx_token' column
resulted in not recognizing materialized views backing secondary
indexes properly. The mistake is really a typo, but with bad
consequences - instead of checking the view schema for being an index,
we asked for the base schema, which is definitely not an index of
itself.

Branches 3.1,3.2 (asap)
Fixes #5621
Fixes #4744

(cherry picked from commit 9b379e3d63)
2020-01-21 23:32:11 +02:00
Avi Kivity
12bc965f71 atomic_cell: consistently use comma as separator in pretty-printers
The atomic_cell pretty printers use a mix of commas and semicolons.
This change makes them use commas everywhere, for consistency.
Message-Id: <20200116133327.2610280-1-avi@scylladb.com>
2020-01-16 17:26:33 +01:00
Nadav Har'El
1ed21d70dc merge: CDC: do mutation augmentation from storage proxy
Merged pull request https://github.com/scylladb/scylla/pull/5567
from Calle Wilund:

Fixes #5314

Instead of tying CDC handling into cql statement objects, this patch set
moves it to storage proxy, i.e. shared code for mutating stuff. This means
we automatically handle cdc for code paths outside cql (i.e. alternator).

It also adds api handling (though initially inefficient) for batch statements.

CDC is tied into storage proxy by giving the former a ref to the latter (per
shard). Initially this is not a constructor parameter, because right now we
have chicken and egg issues here. Hopefully, Pavel's refactoring of migration
manager and notifications will untie these and this relationship can become
nicer.

The actual augmentation can (as stated above) be made much more efficient.
Hopefully, the stream management refactoring will deal with expensive stream
lookup, and eventually, we can maybe coalesce pre-image selects for batches.
However, that is left as an exercise for when deemed needed.

The augmentation API has an optional return value for a "post-image handler"
to be used iff returned after mutation call is finished (and successful).
It is not yet actually invoked from storage_proxy, but it is at least in the
call chain.
2020-01-16 17:12:56 +02:00
Avi Kivity
e677f56094 Merge "Enable general centos RPM (not only centos7)" from Hagit 2020-01-16 14:13:24 +02:00
Tomasz Grabiec
36d90e637e Merge "Relax migration manager dependencies" from Pavel Emalyanov
The set makes dependencies between mm and other services cleaner,
in particular, after the set:

- the query processor no longer needs migration manager
  (which doesn't need query processor either)

- the database no longer needs migration manager, thus the mutual
  dependency between these two is dropped, only migration manager
  -> database is left

- the migration manager -> storage_service dependency is relaxed,
  one more patchset will be needed to remove it, thus dropping one
  more mutual dependency between them, only the storage_service
  -> migration manager will be left

- the migration manager is stopped on drain, but several more
  services need it on stop, thus causing use after free problems,
  in particular there's a caught bug when view builder crashes
  when unregistering from notifier list on stop. Fixed.

Tests: unit(dev)
Fixes: #5404
2020-01-16 12:12:25 +01:00
Hagit Segev
d0405003bd building-packages doc: Update no specific el7 on path 2020-01-16 12:49:08 +02:00
Rafael Ávila de Espíndola
c42a2c6f28 configure: Add -O1 when compiling generated parsers
Enabling asan enables a few cleanup optimizations in gcc. The net
result is that using

  -fsanitize=address -fno-sanitize-address-use-after-scope

Produces code that uses a lot less stack than if the file is compiled
with just -O0.

This patch adds -O1 in addition to
-fno-sanitize-address-use-after-scope to protect the unfortunate
developer that decides to build in dev mode with --cflags='-O0 -g'.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-2-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola
317e0228a8 configure: Put user flags after the mode flags
It is sometimes convenient to build with flags that don't match any
existing mode.

Recently I was tracking a bug that would not reproduce with debug, but
reproduced with dev, so I tried debugging the result of

./configure.py --cflags="-O0 -g"

While the binary had debug info, it still had optimizations because
configure.py put the mode flags after the user flags (-O0 -O1). This
patch flips the order (-O1 -O0) so that the flags passed in the
command line win.
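
The effect of the ordering can be sketched in Python, relying on the documented GCC rule that when several -O options are given, the last one is effective. The helper name and the flag lists are illustrative assumptions, not code from configure.py.

```python
def effective_opt_level(flags):
    """Return the last -O<level> flag in the list, or None if there is none."""
    level = None
    for flag in flags:
        if flag.startswith("-O"):
            level = flag
    return level


mode_flags = ["-O1"]        # what the build mode asks for
user_flags = ["-O0", "-g"]  # what the user passed via --cflags

before_patch = user_flags + mode_flags  # mode flags come last and win
after_patch = mode_flags + user_flags   # user flags come last and win
```

With these lists, `effective_opt_level(before_patch)` is "-O1" and `effective_opt_level(after_patch)` is "-O0".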

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-1-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Gleb Natapov
51281bc8ad lwt: fix write timeout exception reporting
CQL transport code relies on an exception's C++ type to create correct
reply, but in lwt we converted some mutation_timeout exceptions to more
generic request_timeout while forwarding them which broke the protocol.
Do not drop type information.

Fixes #5598.

Message-Id: <20200115180313.GQ9084@scylladb.com>
2020-01-16 12:05:50 +02:00
Piotr Jastrzębski
0c8c1ec014 config: fix description of enable_deprecated_partitioners
Murmur3 is the default partitioner.
ByteOrder and Random are the deprecated ones
and should be mentioned in the description.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-16 12:05:50 +02:00
Nadav Har'El
9953a33354 merge "Adding a schema file when creating a snapshot"
Merged pull request https://github.com/scylladb/scylla/pull/5294 from
Amnon Heiman:

To use a snapshot we need a schema file that is similar to the result of
running cql DESCRIBE command.

The DESCRIBE is implemented in the cql driver so the functionality needs
to be re-implemented inside scylla.

This series adds a describe method to the schema file and uses it when doing
a snapshot.

There are different approaches to handling materialized views and
secondary indexes.

This implementation creates each schema.cql file in its own relevant
directory, so the schema for a materialized view, for example, will be
placed in the snapshot directory of the table of that view.

Fixes #4192
2020-01-16 12:05:50 +02:00
Piotr Dulikowski
c383652061 gossip: allow for aborting on sleep
This commit makes most sleeps in gossip.cc abortable. It is now possible
to quickly shut down a node during startup, most notably during the
phase while it waits for gossip to settle.
2020-01-16 12:05:50 +02:00
Avi Kivity
e5e0642f2a tools: toolchain: add dependencies for building debian and rpm packages
This reduces network traffic and eliminates time for installation when
building packages from the frozen toolchain, as well as isolating the
build from updates to those dependencies which may cause breakage.
2020-01-16 12:05:50 +02:00
Pekka Enberg
da9dae3dbe Merge 'test.py: add support for CQL tests' from Kostja
This patch set adds support for CQL tests to test.py,
as well as many other improvements:

* --name is now a positional argument
* test output is preserved in testlog/${mode}
* concise output format
* better color support
* arbitrary number of test suites
* per-suite yaml-based configuration
* options --jenkins and --xunit are removed and xml
  files are generated for all runs

A simple driver is written in C++ to read CQL from
standard input, execute in embedded mode and produce output.

The patch is checked with BYO.

Reviewed-by: Dejan Mircevski <dejan@scylladb.com>
* 'test.py' of github.com:/scylladb/scylla-dev: (39 commits)
  test.py: introduce BoostTest and virtualize custom boost arguments
  test.py: sort tests within a suite, and sort suites
  test.py: add a basic CQL test
  test.py: add CQL .reject files to gitignore
  test.py: print a colored unidiff in case of test failure
  test.py: add CqlTestSuite to run CQL tests
  test.py: initial import of CQL test driver, cql_repl
  test.py: remove custom colors and define a color palette
  test.py: split test output per test mode
  test.py: remove tests_to_run
  test.py: virtualize Test.run(), to introduce CqlTest.Run next
  test.py: virtualize test search pattern per TestSuite
  test.py: virtualize write_xunit_report()
  test.py: ensure print_summary() is agnostic of test type
  test.py: tidy up print_summary()
  test.py: introduce base class Test for CQL and Unit tests
  test.py: move the default arguments handling to UnitTestSuite
  test.py: move custom unit test command line arguments to suite.yaml
  test.py: move command line argument processing to UnitTestSuite
  test.py: introduce add_test(), which is suite-specific
  ...
2020-01-16 12:05:50 +02:00
Pekka Enberg
e8b659ec5d dist/docker: Remove Ubuntu-based Docker image
The Ubuntu-based Docker image uses Scylla 1.0 and has not been updated
since 2017. Let's remove it as unmaintained.

Message-Id: <20200115102405.23567-1-penberg@scylladb.com>
2020-01-16 12:05:50 +02:00
Avi Kivity
546556b71b Merge "allow commitlog to wait for specific entires to be flushed on disk" from Gleb
"
Currently commitlog supports two modes of operation. First is 'periodic'
mode where all commitlog writes are ready the moment they are stored in
a memory buffer and the memory buffer is flushed to a storage periodically.
Second is a 'batch' mode where each write is flushed as soon as possible
(after previous flush completed) and writes are only ready after they
are flushed.

The first option is not very durable, the second is not very efficient.
This series adds an option to mark some writes as "more durable" in
periodic mode meaning that they will be flushed immediately and reported
complete only after the flush is complete (flushing a durable write also
flushes all writes that came before it). It also changes paxos to use
those durable writes to store paxos state.

Note that strictly speaking the last patch is not needed since after
writing to an actual table the code updates paxos table and the latter
uses durable writes that make sure all previous writes are flushed. Given
that both writes are supposed to run on the same shard this should be enough.
But it feels right to make base table writes durable as well.
"

* 'gleb/commilog_sync_v4' of github.com:scylladb/seastar-dev:
  paxos: immediately sync commitlog entries for writes made by paxos learn stage
  paxos: mark paxos table schema as "always sync"
  schema: allow schema to be marked as 'always sync to commitlog'
  commitlog: add test for per entry sync mode
  database: pass sync flag from db::apply function to the commitlog
  commitlog: add sync method to entry_writer
2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola
2ebd1463b2 tests: Handle null and not present values differently
Before this patch result_set_assertions was handling both null values
and missing values in the same way.

This patch changes the handling of missing values so that now checking
for a null value is not the same as checking for a value not being
present.
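
The distinction can be sketched in Python with a sentinel object. This is only an illustration of the idea; `MISSING` and `column_matches` are hypothetical names and do not mirror the result_set_assertions API.

```python
# Sentinel meaning "the column is not present in the row at all".
MISSING = object()


def column_matches(row, name, expected):
    """Check a column, treating a null value and an absent column as different."""
    actual = row.get(name, MISSING)
    if expected is None or expected is MISSING:
        # Null and missing are matched by identity, never by equality.
        return actual is expected
    return actual == expected
```

A row `{"a": None}` matches an expectation of `None` for column "a" but not `MISSING`, and an empty row does the opposite.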

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114184116.75546-1-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Botond Dénes
0c52c2ba50 data: make cell::make_collection(): more consistent and safer
3ec889816 changed cell::make_collection() to take different code paths
depending whether its `data` argument is nothrow copyable/movable or
not. In case it is not, it is wrapped in a view to make it so (see the
above mentioned commit for a full explanation), relying on the methods
pre-existing requirement for callers to keep `data` alive while the
created writer is in use.
On closer look however it turns out that this requirement is neither
respected, nor enforced, at least not on the code level. The real
requirement is that the underlying data represented by `data` is kept
alive. If `data` is a view, it is not expected to be kept alive and
callers don't, it is instead copied into `make_collection()`.
Non-views however *are* expected to be kept alive. This makes the API
error prone.
To avoid any future errors due to this ambiguity, require all `data`
arguments to be nothrow copyable and movable. Callers are now required
to pass views of nonconforming objects.

This patch is a usability improvement and is not fixing a bug. The
current code works as-is because it happens to conform to the underlying
requirements.

Refs: #5575
Refs: #5341

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200115084520.206947-1-bdenes@scylladb.com>
2020-01-16 12:05:50 +02:00
Amnon Heiman
ac8aac2b53 tests/cql_query_test: Add schema describe tests
This patch adds tests for the describe method.

test_describe_simple_schema tests regular tables.

test_describe_view_schema tests view and index.

Each test creates a table, finds the schema, calls the describe method and
compares the results to the string that was used to create the table.

The view tests also verify that adding an index or view does not change
the base table.

When comparing results, leading and trailing whitespace is ignored
and all combinations of whitespace and newlines are treated equally.
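
A minimal Python sketch of such a whitespace-insensitive comparison,
for illustration only. The helper names are hypothetical and the real
test is written in C++.

```python
import re


def normalize_ws(text):
    """Collapse every run of spaces, tabs and newlines into one space."""
    return re.sub(r"\s+", " ", text).strip()


def same_modulo_whitespace(expected, actual):
    # Two statements are considered equal if they differ only in how
    # whitespace is laid out between tokens.
    return normalize_ws(expected) == normalize_ws(actual)
```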

Additional tests may be added at a future phase if required.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:07:57 +02:00
Amnon Heiman
028525daeb database: add schema.cql file when creating a snapshot
When creating a snapshot we need to add a schema.cql file in the
snapshot directory that describes the table in that snapshot.

This patch adds the file using the schema describe method.

get_snapshot_details and manifest_json_filter were modified to ignore
the schema.cql file.

Fixes #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Amnon Heiman
82367b325a schema: Add a describe method
This patch adds a describe method to a table schema.

It acts similar to a DESCRIBE cql command that is implemented in a CQL
driver.

The method supports tables, secondary indexes, local indexes and
materialized views.

relates to: #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Amnon Heiman
6f58d51c83 secondary_index_manager: add the index_name_from_table_name function
index_name_from_table_name is the reverse of index_table_name:
it gets a table name that was generated for an index and returns the name
of the index that generated that table.
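
The round trip can be sketched in Python as follows. The "_index" suffix
is an assumption made for this illustration; the commit message does not
state how index_table_name builds the table name.

```python
INDEX_TABLE_SUFFIX = "_index"  # assumed naming scheme, not taken from the patch


def index_table_name(index_name):
    """Name of the backing table generated for an index (assumed scheme)."""
    return index_name + INDEX_TABLE_SUFFIX


def index_name_from_table_name(table_name):
    """Reverse of index_table_name: recover the index name."""
    if (not table_name.endswith(INDEX_TABLE_SUFFIX)
            or len(table_name) == len(INDEX_TABLE_SUFFIX)):
        raise ValueError(f"{table_name!r} is not an index table name")
    return table_name[:-len(INDEX_TABLE_SUFFIX)]
```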

Relates to #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Pavel Emelyanov
555856b1cd migration_manager: Use in-place value factory
The factory is purely a state-less thing, there is no difference what
instance of it to use, so we may omit referencing the storage_service
in passive_announce

This is 2nd simple migration_manager -> storage_service link to cut
(more to come later).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
f129d8380f migration_manager: Get database through storage_proxy
There are several places where migration_manager needs storage_service
reference to get the database from, thus forming the mutual dependency
between them. This is the simplest case where the migration_manager
link to the storage_service can be cut -- the database reference can be
obtained from storage_proxy instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
5cf365d7e7 database: Explicitly pass migration_manager through init_non_system_keyspace
This is the last place where database code needs the migration_manager
instance to be alive, so now the mutual dependency between these two
is gone, only the migration_manager needs the database, but not the
vice-versa.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
ebebf9f8a8 database: Do not request migration_manager instance for passive_announce
The helper in question is static, so no need to play with the
migration_manager instances.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
3f84256853 migration_manager: Remove register/unregister helpers
In the 2nd patch the migration_manager kept those for
simpler patching, but now we can drop them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
9e4b41c32a tests: Switch on migration notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
9d31bc166b cdc: Use migration_notifier to (un)register for events
If no one provided -- get it from storage_service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:19 +03:00
Pavel Emelyanov
ecab51f8cc storage_service: Use migration_notifier (and stop worrying)
The storage_service needs migration_manager for notifications and
carefully handles the manager's stop process not to demolish the
listeners list from under itself. From now on this dependency is
no longer valid (however the storage_service still seems to need the
migration_manager, but this is a different story).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
7814ed3c12 cql_server: Use migration_notifier in events_notifier
This patch removes an implicit cql_server -> migration_manager
dependency, as the former's event notifier uses the latter
for notifications.

Removing this dependency also breaks a loop:
storage_service -> cql_server -> migration_manager -> storage_service

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
d9edcb3f15 query_processor: Use migration_notifier
This patch breaks one (probably harmless but still) dependency
loop. The query_processor -> migration_manager -> storage_proxy
 -> tracing -> query_processor.

The first link is not needed, as the query_processor needs the
migration_manager purely to (un)subscribe on notifications.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
2735024a53 auth: Use migration_notifier
The same as with view builder. The constructor still needs both,
but the life-time reference is now for notifier only.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
28f1250b8b view_builder: Use migration notifier
The migration manager itself is still needed on start to wait
for schema agreement, but there's no longer the need for the
life-time reference on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
7cfab1de77 database: Switch on mnotifier from migration_manager
Do not call for local migration manager instance to send notifications,
call for the local migration notifier, it will always be alive.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
f45b23f088 storage_service: Keep migration_notifier
The storage service will need this guy to initialize sub-services
with. Also it registers itself with notifiers.

That said, it's convenient to have the migration notifier on board.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
e327feb77f database: Prepare to use on-database migration_notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
f240d5760c migration_manager: Split notifier from main class
The _listeners list on migration_manager class and the corresponding
notify_xxx helpers have nothing to do with its instances, they
are just transport for notification delivery.

At the same time some services need the migration manager to be alive
at their stop time to unregister from it, while the manager itself
may need them for its needs.

The proposal is to move the migration notifier into a completely separate
sharded "service". This service doesn't need anything, so it's started
first and stopped last.

While it's not effectively a "migration" notifier, we inherited the name
from Cassandra and renaming it will "scramble neurons in the old-timers'
brains but will make it easier for newcomers" as Avi says.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:19 +03:00
Pavel Emelyanov
074cc0c8ac migration_manager: Helpers for on_before_ notifications
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:27:27 +03:00
Pavel Emelyanov
1992755c72 storage_service: Kill initialization helper from init.cc
The helper just makes further patching more complex, so drop it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:27:27 +03:00
Konstantin Osipov
a665fab306 test.py: introduce BoostTest and virtualize custom boost arguments 2020-01-15 13:37:25 +03:00
Gleb Natapov
51672e5990 paxos: immediately sync commitlog entries for writes made by paxos learn stage 2020-01-15 12:15:42 +02:00
Gleb Natapov
0fc48515d8 paxos: mark paxos table schema as "always sync"
We want all writes to paxos table to be persisted on a storage before
declared completed.
2020-01-15 12:15:42 +02:00
Gleb Natapov
16e0fc4742 schema: allow schema to be marked as 'always sync to commitlog'
All writes that use this schema will be immediately persisted on a
storage.
2020-01-15 12:15:42 +02:00
Gleb Natapov
0ce70c7a04 commitlog: add test for per entry sync mode 2020-01-15 12:15:42 +02:00
Gleb Natapov
29574c1271 database: pass sync flag from db::apply function to the commitlog
Allow upper layers to request a mutation to be persisted on a disk before
making future ready independent of which mode commitlog is running in.
2020-01-15 12:15:42 +02:00
Gleb Natapov
e0bc4aa098 commitlog: add sync method to entry_writer
If the method returns true commitlog should sync to file immediately
after writing the entry and wait for flush to complete before returning.
2020-01-15 12:15:42 +02:00
Piotr Sarna
9aab75db60 alternator: clean up single value rjson comparator
The comparator is refreshed to ensure the following:
 - null compares less to all other types;
 - null, true and false are comparable against each other,
   while other types are only comparable against themselves and null.

Comparing mixed types is not currently reachable from the alternator
API, because it's only used for sets, which can only use
strings, binary blobs and numbers - thus, no new pytest cases are added.
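
The ordering rules listed above can be sketched in Python like this.
It is an illustration only: the function name is hypothetical, and both
"false orders before true" and "incomparable types raise an error" are
assumptions, since the message does not say how those cases are handled.

```python
def single_value_less(a, b):
    """Return True if a orders strictly before b under the rules above."""
    if a is None:
        # null compares less than every non-null value.
        return b is not None
    if b is None:
        return False
    if type(a) is not type(b):
        # Non-null values of different types are not comparable.
        # (bool and int are distinct here, although bool subclasses int.)
        raise TypeError(f"cannot compare {type(a).__name__} "
                        f"with {type(b).__name__}")
    return a < b
```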

Fixes #5454
2020-01-15 10:57:49 +02:00
Juliusz Stasiewicz
d87d01b501 storage_proxy: intercept rpc::closed_error if counter leader is down (#5579)
When counter mutation is about to be sent, a leader is elected, but
if the leader fails after election, we get `rpc::closed_error`. The
exception propagates high up, causing all connections to be dropped.

This patch intercepts `rpc::closed_error` in `storage_proxy::mutate_counters`
and translates it to `mutation_write_failure_exception`.
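
The pattern is plain exception translation at the boundary. A Python
sketch with hypothetical stand-in classes (the real code is C++ and uses
the exception types named above):

```python
class ClosedError(Exception):
    """Stand-in for rpc::closed_error: the peer connection went away."""


class MutationWriteFailure(Exception):
    """Stand-in for mutation_write_failure_exception."""


def mutate_counters(send_to_leader):
    """Run send_to_leader(), translating a closed connection into a write failure."""
    try:
        return send_to_leader()
    except ClosedError as e:
        # Report an ordinary write failure to the caller instead of letting
        # the transport-level error propagate further up.
        raise MutationWriteFailure("counter leader is down") from e
```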

References #2859
2020-01-15 09:56:45 +01:00
Konstantin Osipov
a351ea57d5 test.py: sort tests within a suite, and sort suites
This makes it easier to navigate the test artefacts.

No need to sort suites since they are already
stored in a dict.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
ba87e73f8e test.py: add a basic CQL test 2020-01-15 11:41:19 +03:00
Konstantin Osipov
44d31db1fc test.py: add CQL .reject files to gitignore
To avoid accidental commit, add .reject files to .gitignore
2020-01-15 11:41:19 +03:00
Konstantin Osipov
4f64f0c652 test.py: print a colored unidiff in case of test failure
Print a colored unidiff between result and reject files in case of test
failure.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
d3f9e64028 test.py: add CqlTestSuite to run CQL tests
Run the test and compare results. Manage temporary
and .reject files.

Now that there are CQL tests, improve logging.

run_test success no longer means test success.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
b114bfe0bd test.py: initial import of CQL test driver, cql_repl
cql_repl is a simple program which reads CQL from stdin,
executes it, and writes results to stdout.

It supports --input, --output and --log options.
--log is directed to cql_test.log by default.
--input is stdin by default
--output is stdout by default.

The result set output is printed with a basic
JSON visitor.
2020-01-15 11:41:16 +03:00
Konstantin Osipov
0ec27267ab test.py: remove custom colors and define a color palette
Using a standard Python module improves readability,
and allows using colors easily in other output.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
0165413405 test.py: split test output per test mode
Store test temporary files and logs in ${testdir}/${mode}.
Remove --jenkins and --xunit, and always write XML
files at a predefined location: ${testdir}/${mode}/xml/.

Use .xunit.xml extension for tests whose XML output is
in xunit format, and junit.xml for an accumulated output
of all non-boost tests in junit format.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
4095ab08c8 test.py: remove tests_to_run
Avoid storing each test twice, use per-tests
list to construct a global iterable.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
169128f80b test.py: virtualize Test.run(), to introduce CqlTest.Run next 2020-01-15 10:53:24 +03:00
Konstantin Osipov
d05f6c3cc7 test.py: virtualize test search pattern per TestSuite
CQL tests have .cql extension, while unit tests
have .cc.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
abcc182ab3 test.py: virtualize write_xunit_report()
Make sure any non-boost test can participate in the report.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
18aafacfad test.py: ensure print_summary() is agnostic of test type
Introduce a virtual Test.print_summary() to print
a failed test summary.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
21fbe5fa81 test.py: tidy up print_summary()
Now that we have tabular output, make print_summary()
more concise.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
c171882b51 test.py: introduce base class Test for CQL and Unit tests 2020-01-15 10:53:24 +03:00
Konstantin Osipov
fd6897d53e test.py: move the default arguments handling to UnitTestSuite
Move UnitTest default seastar argument handling to UnitTestSuite
(cleanup).
2020-01-15 10:53:24 +03:00
Konstantin Osipov
d3126f08ed test.py: move custom unit test command line arguments to suite.yaml
Load the command line arguments, if any, from suite.yaml, rather
than keep them hard-coded in test.py.

This allows the operations team to have easier access to these.

Note I had to sacrifice dynamic smp count for mutation_reader_test
(the new smp count is fixed at 3) since this is part
of test configuration now.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
ef6cebcbd2 test.py: move command line argument processing to UnitTestSuite 2020-01-15 10:53:24 +03:00
Konstantin Osipov
4a20617be3 test.py: introduce add_test(), which is suite-specific 2020-01-15 10:53:24 +03:00
Konstantin Osipov
7e10bebcda test.py: move long test list to suite.yaml
Use suite.yaml for long tests
2020-01-15 10:53:24 +03:00
Konstantin Osipov
32ffde91ba test.py: move test id assignment to TestSuite
Going forward finding and creating tests will be
a responsibility of TestSuite, so the id generator
needs to be shared.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
b5b4944111 test.py: move repeat handling to TestSuite
This way we can avoid iterating over all tests
to handle --repeat.
Besides, going forward the tests will be stored
in two places: in the global list of all tests,
for the runner, and per suite, for suite-based
reporting, so it's easier if TestSuite
is fully responsible for finding and adding tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
34a1b49fc3 test.py: move add_test_list() to TestSuite 2020-01-15 10:53:24 +03:00
Konstantin Osipov
44e1c4267c test.py: introduce test suites
- UnitTestSuite - for test/unit tests
- BoostTestSuite - a tweak on UnitTestSuite, with options
  to log xml test output to a dedicated file
2020-01-15 10:53:24 +03:00
Konstantin Osipov
eed3201ca6 test.py: use path, rather than test kind, for search pattern
Going forward there may be multiple suites of the same kind.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
f95c97667f test.py: support arbitrary number of test suites
Scan entire test/ for folders that contain suite.yaml,
and load tests from these folders. Skip the rest.

Each folder with a suite.yaml is expected to have a valid
suite configuration in the yaml file.

A suite is a folder with tests of the same type. E.g.
it can be a folder with unit tests, boost tests, or CQL
tests.

The harness will use suite.yaml to create an appropriate
suite test driver, to execute tests in different formats.
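
A minimal sketch of the discovery step in Python, for illustration.
The function name is hypothetical, and it assumes that only direct
sub-folders of test/ are considered; the message does not say whether
the scan is recursive.

```python
from pathlib import Path


def find_suites(test_root):
    """Return the sub-folders of test_root that contain a suite.yaml."""
    root = Path(test_root)
    # Folders without a suite.yaml are skipped, as described above.
    return sorted(d for d in root.iterdir()
                  if d.is_dir() and (d / "suite.yaml").is_file())
```

Each returned folder would then be handed to the suite driver selected
by its suite.yaml.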
2020-01-15 10:53:24 +03:00
Konstantin Osipov
c1f8169cd4 test.py: add suite.yaml to boost and unit tests
The plan is to move suite-specific settings to the
configuration file.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
ec9ad04c8a test.py: move 'success' to TestUnit class
There will be other success attributes: program return
status 0 doesn't mean the test is successful for all tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
b4aa4d35c3 test.py: save test output in tmpdir
It is handy to have it so that a reference of a failed
test is available without re-running it.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
f4efe03ade test.py: always produce xml output, derive output paths from tmpdir
It reduces the number of configurations to re-test when test.py is
modified, and simplifies usage of test.py in build tools, since you no
longer need to bother with extra arguments.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
d2b546d464 test.py: output job count in the log 2020-01-15 10:53:24 +03:00
Konstantin Osipov
233f921f9d test.py: make test output brief&tabular
New format:

% ./test.py --verbose --mode=release
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[1/111]   boost/UUID_test                                    release  [ PASS ]
[2/111]   boost/enum_set_test                                release  [ PASS ]
[3/111]   boost/like_matcher_test                            release  [ PASS ]
[4/111]   boost/observable_test                              release  [ PASS ]
[5/111]   boost/allocation_strategy_test                     release  [ PASS ]
^C
% ./test.py foo
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[3/3]     unit/memory_footprint_test                          debug   [ PASS ]
------------------------------------------------------------------------------
2020-01-15 10:53:24 +03:00
Konstantin Osipov
879bea20ab test.py: add a log file
Going forward I'd like to make terminal output brief&tabular,
but some test details are necessary to preserve so that a failure
is easy to debug. This information now goes to the log file.

- open and truncate the log file on each harness start
- log options of each invoked test in the log, so that
  a failure is easy to reproduce
- log test result in the log

Since tests are run concurrently, having an exact
trace of concurrent execution also helps
debugging flaky tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
cbee76fb95 test.py: gitignore the default ./test.py tmpdir, ./testlog 2020-01-15 10:53:24 +03:00
Konstantin Osipov
1de69228f1 test.py: add --tmpdir
It will be used for test log files.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
caf742f956 test.py: flake8 style fix 2020-01-15 10:53:24 +03:00
Konstantin Osipov
dab364c87d test.py: sort imports 2020-01-15 10:53:24 +03:00
Konstantin Osipov
7ec4b98200 test.py: make name a positional argument.
Accept multiple test names, treat test name
as a substring, and if the same name is given
multiple times, run the test multiple times.
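
The selection rule can be sketched with argparse as follows. This is
an illustration, not the test.py code: build_parser and select_tests
are hypothetical names, and running everything when no name is given
is an assumption.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="toy test selector")
    # Zero or more positional test names.
    parser.add_argument("name", nargs="*", help="substring of a test name")
    return parser


def select_tests(all_tests, names):
    """Pick tests whose name contains one of the given substrings."""
    if not names:
        return list(all_tests)  # assumed default: run everything
    # A name given twice selects its matching tests twice.
    return [t for name in names for t in all_tests if name in t]
```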
2020-01-15 10:53:24 +03:00
Dejan Mircevski
bb2e04cc8b alternator: Improve comments on comparators
Some comparator methods in conditions.cc use unexpected operators;
explain why.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-01-14 22:25:55 +02:00
Tomasz Grabiec
c8a5a27bd9 Merge "storage_service: Move load_broadcaster away" from Pavel E.
The storage_service struct is a collection of diverse things,
most of them required only on start and on stop and/or running
on shard 0 (but it is nonetheless sharded).

As a part of cleaning up this structure and the inter-component
dependencies it generates, here's the sanitation of load_broadcaster.
2020-01-14 19:26:06 +01:00
Calle Wilund
313ed91ab0 cdc: Listen for migration callbacks on all shards
Fixes #5582

... but only populate log on shard 0.

Migration manager callbacks are slightly asymmetric. Notifications
for pre-create/update mutations are sent only on initiating shard
(necessary, because we consider the mutations mutable).
But "created" callbacks are sent on all shards (immutable).

We must subscribe on all shards, but still do population of cdc table
only once, otherwise we can either miss table creation or populate
more than once.

v2:
- Add test case
Message-Id: <20200113140524.14890-1-calle@scylladb.com>
2020-01-14 16:35:41 +01:00
Avi Kivity
2138657d3a Update seastar submodule
* seastar 36cf5c5ff0...3f3e117de3 (16):
  > memcached: don't use C++17-only std::optional
  > reactor: Comment why _backend is assigned in constructor body
  > log: restore --log-to-stdout for backward compatibility
  > used_size.hh: Include missing headers
  > core: Move some code from reactor.cc to future.cc
  > future-util: move parallel_for_each to future-util.cc
  > task: stop wrapping tasks with unique_ptr
  > Merge "Setup timer signal handler in backend constructor" from Pavel
Fixes #5524
  > future: avoid a branch in future's move constructor if type is trivial
  > utils: Expose used_size
  > stream: Call get_future early
  > future-util: Move parallel_for_each_state code to a .cc
  > memcached: log exceptions
  > stream: Delete dead code
  > core: Turn pollable_fd into a simple proxy over pollable_fd_state.
  > Merge "log to std::cerr" from Benny
2020-01-14 16:56:25 +02:00
Pavel Emelyanov
e1ed8f3f7e storage_service: Remove _shadow_token_metadata
This is the part of de-bloating storage_service.

The field in question is used to temporarily keep the _token_metadata
value during shard-wide replication. There's no need to have it as
class member, any "local" copy is enough.

Also, as the size of token_metadata is huge, and invoke_on_all()
copies the function for each shard, keep one local copy of metadata
using do_with() and pass it into the invoke_on_all() by reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Reviewed-by:  Asias He <asias@scylladb.com>
Message-Id: <20200113171657.10246-1-xemul@scylladb.com>
2020-01-14 16:29:10 +02:00
Rafael Ávila de Espíndola
054f5761a7 types: Refactor code into a serialize_varint helper
This is a bit cleaner and avoids a boost::multiprecision::cpp_int copy
while serializing a decimal.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200110221422.35807-1-espindola@scylladb.com>
2020-01-14 16:28:27 +02:00
Avi Kivity
6c84dd0045 cql3: update_statement: do not set query option always_return_static_content for list read-before-write
The query option always_return_static_content was added for lightweight
transactions in commits e0b31dd273 (infrastructure) and 65b86d155e
(actual use). However, the flag was added unconditionally to
update_parameters::options. This caused it to be set for list
read-modify-write operations, not just for lightweight transactions.
This is a little wasteful, and worse, it breaks compatibility as old
nodes do not understand the always_return_static_content flag and
complain when they see it.

To fix, remove the always_return_static_content from
update_parameters::options and only set it from compare-and-swap
operations that are used to implement lightweight transactions.

Fixes #5593.

Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200114135133.2338238-1-avi@scylladb.com>
2020-01-14 16:15:20 +02:00
Hagit Segev
ef88e1e822 CentOS RPMs: Remove target to enable general centos. 2020-01-14 14:31:03 +02:00
Alejo Sanchez
6909d4db42 cql3: BYPASS CACHE query counter
This patch is the first part of requested full scan metrics.
It implements a counter of SELECT queries with BYPASS CACHE option.

In scope of #5209

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20200113222740.506610-2-alejo.sanchez@scylladb.com>
2020-01-14 12:19:00 +02:00
Rafael Ávila de Espíndola
dca1bc480f everywhere: Use serialized(foo) instead of data_value(foo).serialize()
This is just a simple cleanup that reduces the size of another patch I
am working on and is an independent improvement.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114051739.370127-1-espindola@scylladb.com>
2020-01-14 12:17:12 +02:00
Pavel Emelyanov
b9f28e9335 storage_service: Remove dead drain branch
The drain_in_progress variable here is the future that's set by the
drain() operation itself. Its promise is set when the drain() finishes.

The check for this future in the beginning of drain() is pointless.
No two drain()-s can run in parallel because of run_with_api_lock()
protection. Doing the 2nd drain after a successful 1st one is also
impossible due to the _operation_mode check. The 2nd drain after an
_exceptioned_ (and thus incomplete) 1st one would deadlock; after
this patch it will try to drain for the 2nd time, but that should be ok.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200114094724.23876-1-xemul@scylladb.com>
2020-01-14 12:07:29 +02:00
Piotr Sarna
36ec43a262 Merge "add table with connected cql clients" from Juliusz
This change introduces system.clients table, which provides
information about CQL clients connected.

PK is the client's IP address, CK consists of outgoing port number
and client_type (which will be extended in future to thrift/alternator/redis).
Table supplies also shard_id and username. Other columns,
like connection_stage, driver_name, driver_version...,
are currently empty but exist for C* compatibility and future use.

This is an ordinary table (i.e. non-virtual) and it's updated upon
accepting connections. This is also why C*'s column request_count
was not introduced. In case of abrupt DB stop, the table should not persist,
so it's being truncated on startup.

Resolves #4820
2020-01-14 10:01:07 +02:00
Avi Kivity
1f46133273 Merge "data: make cell::make_collection() exception safe" from Botond
"
Most of the code in `cell` and the `imr` infrastructure it is built on
is `noexcept`. This means that extra care must be taken to avoid rogue
exceptions as they will bring down the node. The changes introduced by
0a453e5d3a did just that - introduced rogue `std::bad_alloc` into this
code path by violating an undocumented and unvalidated assumption --
that fragment ranges passed to `cell::make_collection()` are nothrow
copyable and movable.

This series refactors `cell::make_collection()` such that it does not
have this assumption anymore and is safe to use with any range.

Note that the unit test included in this series, that was used to find
all the possible exception sources will not be currently run in any of
our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not
being set. I plan to address this in a followup because setting this
flag fails other tests using the failure injection mechanism. This is
because these tests are normally run with the failure injection disabled
so failures managed to lurk in without anyone noticing.

Fixes: #5575
Refs: #5341

Tests: unit(dev, debug)
"

* 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla:
  test: mutation_test: add exception safety test for large collection serialization
  data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
  utils/fragment_range.hh: introduce fragment_range_view
2020-01-14 10:01:06 +02:00
Nadav Har'El
5b08ec3d2c alternator: error on unsupported ScanIndexForward=false
We do not yet support the ScanIndexForward=false option for reversing
the sort order of a Query operation, as reported in issue #5153.
But even before implementing this feature, it is important that we
produce an error if a user attempts to use it - instead of outright
ignoring this parameter and giving the user wrong results. This is
what this patch does.
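
In Python terms the check amounts to the following sketch. The function
and exception names are hypothetical and only illustrate the "reject
instead of ignore" behaviour; a missing ScanIndexForward is treated as
true, which is the DynamoDB default.

```python
class ValidationError(Exception):
    """Hypothetical error type returned to the client."""


def check_scan_index_forward(request):
    """Reject Query requests that ask for reverse order."""
    if request.get("ScanIndexForward", True) is False:
        # Failing loudly is better than silently returning results
        # in the wrong order.
        raise ValidationError(
            "ScanIndexForward=false is not yet supported")
```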

Before this patch, the reverse-order query in the xfailing test
test_query.py::test_query_reverse seems to succeed - yet gives
results in the wrong order. With this patch, the query itself fails -
stating that the ScanIndexForward=false argument is not supported.

Refs #5153

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200105113719.26326-1-nyh@scylladb.com>
2020-01-14 10:01:06 +02:00
Pavel Emelyanov
c4bf532d37 storage_service: Fix race in removenode/force_removenode/other
Here's another theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and some other operation.
Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

At this point some other operation may start. Say, drain:

  run_with_api_lock
    _operation_in_progress = "drain"
    storage_service::drain
      ...
      go to sleep somewhere

Now let's go back to the 1st op that wakes up from its sleep.
The code it executes is

    while (!ss._operation_in_progress.empty()) {
        sleep_abortable()
    }

and while the drain is running it will never exit.

However (! and this is the core of the race) should the drain
operation happen _before_ the force_removenode, another check
for _operation_in_progress would have made the latter exit with
the "Operation drain is in progress, try again" message.

Fix this inconsistency by making the check for current operation
on every wake-up from the sleep_abortable.
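
The shape of the fixed wait loop, sketched in Python under stated
assumptions: observe_operation is a hypothetical callable standing in
for reading _operation_in_progress after each wake-up, and the sleep
itself is omitted.

```python
def wait_for_removenode_to_finish(observe_operation):
    """Wait until no operation is in progress, re-checking on every wake-up."""
    while True:
        op = observe_operation()
        if op == "":
            return  # the removenode has finished, we may proceed
        if op != "removenode":
            # Some other operation slipped in while we were sleeping;
            # give the same answer as if it had been started first.
            raise RuntimeError(
                f"Operation {op} is in progress, try again")
        # The real code would sleep_abortable() here before polling again.
```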

Fixes #5591

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-14 10:01:06 +02:00
Pavel Emelyanov
cc92683894 storage_service: Fix race and deadlock in removenode/force_removenode
Here's a theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and removenode (again)
operations. Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      _force_remove_completion = false
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

! at this point we have _force_remove_completion = false and
_operation_in_progress = <empty>, which opens the following
opportunity for the 3rd removenode:

  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now here's what we have in 2nd and 3rd ops:

1. _operation_in_progress = "removenode" (set by 3rd) prevents the
   force_removenode from exiting its loop
2. _force_remove_completion = false (set by 1st on exit) prevents
   the removenode from waiting on replicating_nodes list

One can start the 4th call with force_removenode, it will proceed and
wake up the 3rd op, but after it we'll have two force_removenode-s
running in parallel and killing each other.

I propose not to set _force_remove_completion to false in removenode,
but just exit and let the owner of this flag unset it once it gets
the control back.

Fixes #5590

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-14 10:01:06 +02:00
Benny Halevy
ff55b5dca3 cql3: functions: limit sum overflow detection to integral types
Other types do not have a wider accumulator at the moment.
And static_cast<accumulator_type>(ret) != _sum evaluates as
false for NaN/Inf floating point values.
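
A minimal sketch of restricting the check to integral types, assuming
C++17. The class sum_aggregate_sketch, the accumulator_for mapping (only
int32_t gets a wider accumulator here) and the helper sum_throws() are
hypothetical and are not the cql3 code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Illustrative mapping: only int32_t gets a wider accumulator here;
// every other type accumulates in its own type.
template <typename T> struct accumulator_for { using type = T; };
template <> struct accumulator_for<int32_t> { using type = int64_t; };

template <typename T>
class sum_aggregate_sketch {
    using accumulator_type = typename accumulator_for<T>::type;
    accumulator_type _sum{};
public:
    void add(T v) { _sum += v; }
    T result() const {
        T ret = static_cast<T>(_sum);
        // The round-trip comparison is compiled only for integral types.
        if constexpr (std::is_integral_v<T>) {
            if (static_cast<accumulator_type>(ret) != _sum) {
                throw std::overflow_error("sum overflow");
            }
        }
        return ret;
    }
};

// Usage helper: does summing a and b raise the overflow error?
template <typename T>
bool sum_throws(T a, T b) {
    sum_aggregate_sketch<T> s;
    s.add(a);
    s.add(b);
    try {
        (void)s.result();
        return false;
    } catch (const std::overflow_error&) {
        return true;
    }
}
```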

Fixes #5586

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>
2020-01-14 10:01:06 +02:00
Avi Kivity
e3310201dd atomic_cell_or_collection: type-aware print atomic_cell or collection components
Now that atomic_cell_view and collection_mutation_view have
type-aware printers, we can use them in the type-aware atomic_cell_or_collection
printer.
Message-Id: <20191231142832.594960-1-avi@scylladb.com>
2020-01-14 10:01:06 +02:00
Avi Kivity
931b196d20 mutation_partition: row: resolve column name when in schema-aware printer
Instead of printing the column id, print the full column name.
Message-Id: <20191231142944.595272-1-avi@scylladb.com>
2020-01-14 10:01:06 +02:00
Nadav Har'El
4aa323154e merge: Pretty print canonical_mutation objects
Merged pull request https://github.com/scylladb/scylla/pull/5533
from Avi Kivity:

canonical_mutation objects are used for schema reconciliation, which is a
fragile area and thus deserves some debugging help.

This series makes canonical_mutation objects printable.
2020-01-14 10:01:06 +02:00
Takuya ASADA
5241deda2d dist: nonroot: fix CLI tool path for nonroot (#5584)
The CLI tool path is hardcoded; the correct path must be specified for nonroot.
2020-01-14 10:01:06 +02:00
Nadav Har'El
1511b945f8 merge: Handle multiple regular base columns in view pk
Merged patch series from Piotr Sarna:

"Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.

This series is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.

Fixes #5006
Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)"

Piotr Sarna (3):
  db,view: fix checking if partition key is empty
  view: handle multiple regular base columns in view pk
  test: add a case for multiple base regular columns in view key

 alternator-test/test_gsi.py              |  1 -
 view_info.hh                             |  5 +-
 cql3/statements/alter_table_statement.cc |  2 +-
 db/view/view.cc                          | 77 ++++++++++++++----------
 mutation_partition.cc                    |  2 +-
 test/boost/cql_query_test.cc             | 58 ++++++++++++++++++
 6 files changed, 109 insertions(+), 36 deletions(-)
2020-01-14 10:01:00 +02:00
Nadav Har'El
f16e3b0491 merge: bouncing lwt request to an owning shard
Merged patch series from Gleb Natapov:

"LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by
returning an error from the lwt code if the shard is incorrect, specifying
the shard the request should be moved to. The error is processed by the
transport code, which jumps to the correct shard and re-processes the
incoming message there.

The nicer way to achieve the same would be to jump to a right shard
inside of the storage_proxy::cas(), but unfortunately with current
implementation of the modification statements they are unusable by
a shard different from where it was created, so the jump should happen
before a modification statement for a cas() is created. When we fix our
cql code to be more cross-shard friendly this can be reworked to do the
jump in the storage_proxy."

Gleb Natapov (4):
  transport: change make_result to takes a reference to cql result
    instead of shared_ptr
  storage_service: move start_native_transport into a thread
  lwt: Process lwt request on a owning shard
  lwt: drop invoke_on in paxos_state prepare and accept

 auth/service.hh                           |   5 +-
 message/messaging_service.hh              |   2 +-
 service/client_state.hh                   |  30 +++-
 service/paxos/paxos_state.hh              |  10 +-
 service/query_state.hh                    |   6 +
 service/storage_proxy.hh                  |   2 +
 transport/messages/result_message.hh      |  20 +++
 transport/messages/result_message_base.hh |   4 +
 transport/request.hh                      |   4 +
 transport/server.hh                       |  25 ++-
 cql3/statements/batch_statement.cc        |   6 +
 cql3/statements/modification_statement.cc |   6 +
 cql3/statements/select_statement.cc       |   8 +
 message/messaging_service.cc              |   2 +-
 service/paxos/paxos_state.cc              |  48 ++---
 service/storage_proxy.cc                  |  47 ++++-
 service/storage_service.cc                | 120 +++++++------
 test/boost/cql_query_test.cc              |   1 +
 thrift/handler.cc                         |   3 +
 transport/messages/result_message.cc      |   5 +
 transport/server.cc                       | 203 ++++++++++++++++------
 21 files changed, 377 insertions(+), 180 deletions(-)
2020-01-14 09:59:59 +02:00
Botond Dénes
300728120f test: mutation_test: add exception safety test for large collection serialization
Use `seastar::memory::local_failure_injector()` to inject all possible
`std::bad_alloc`:s into the collection serialization code path. The test
just checks that there are no `std::abort()`:s caused by any of the
exceptions.

The test will not be run if `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` is
not defined.
2020-01-13 16:53:35 +02:00
Botond Dénes
3ec889816a data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
`cell::make_collection()` assumes that all ranges passed to it are
nothrow copyable and movable views. This is not guaranteed, is not
expressed in the interface and is not mentioned in the comments either.
The changes introduced by 0a453e5d3a to collection serialization, making
it use fragmented buffers, fell into this trap, as it passes
`bytes_ostream` to `cell::make_collection()`. `bytes_ostream`'s copy
constructor allocates and hence can throw, triggering an
`std::terminate()` inside `cell::make_collection()` as the latter is
noexcept.

To solve this issue, non-nothrow copyable and movable ranges are now
wrapped in a `fragment_range_view` to make them so.
`cell::make_collection()` already requires callers to keep alive the
range for the duration of the call, so this does not introduce any new
requirements to the callers. Additionally, to avoid any future
accidents, do not accept temporaries for the `data` parameter. We don't
ever want to move this param anyway, we will either have a trivially
copyable view, or a potentially heavy-weight range that we will create a
trivially copyable view of.
2020-01-13 16:53:35 +02:00
Botond Dénes
b52b4d36a2 utils/fragment_range.hh: introduce fragment_range_view
A lightweight, trivially copyable and movable view for fragment ranges.
Allows for uniform treatment of all kinds of ranges, i.e. treating all
of them as a view. Currently `fragment_range.hh` provides lightweight,
view-like adaptors for empty and single-fragment ranges (`bytes_view`). To
allow code to treat owning multi-fragment ranges the same way as the
former two, we need a view for the latter as well -- this is
`fragment_range_view`.
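
The idea can be sketched as below. This is a hedged illustration only:
range_view_sketch, owning_fragments and bytes_via_copied_view() are
hypothetical names, and std::vector<std::string> merely stands in for an
owning multi-fragment buffer whose copy allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Stand-in for an owning multi-fragment range; copying it allocates.
using owning_fragments = std::vector<std::string>;

// Non-owning view: it stores only a pointer, so copying it is trivial
// and cannot throw. The caller must keep the range alive.
template <typename Range>
class range_view_sketch {
    const Range* _range;
public:
    explicit range_view_sketch(const Range& r) noexcept : _range(&r) {}
    // Refuse temporaries, which would leave the view dangling.
    explicit range_view_sketch(const Range&&) = delete;
    auto begin() const noexcept { return _range->begin(); }
    auto end() const noexcept { return _range->end(); }
    std::size_t size_bytes() const noexcept {
        std::size_t n = 0;
        for (const auto& fragment : *_range) {
            n += fragment.size();
        }
        return n;
    }
};

static_assert(std::is_trivially_copyable_v<range_view_sketch<owning_fragments>>);
static_assert(std::is_nothrow_copy_constructible_v<range_view_sketch<owning_fragments>>);
static_assert(!std::is_nothrow_copy_constructible_v<owning_fragments>);

// Usage helper: copy the view (cheap) and read through the copy.
inline std::size_t bytes_via_copied_view(const owning_fragments& f) {
    range_view_sketch<owning_fragments> v(f);
    auto copy = v;
    return copy.size_bytes();
}
```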
2020-01-13 16:52:59 +02:00
Calle Wilund
75f2b2876b cdc: Remove free function for mutation augmentation
2020-01-13 13:18:55 +00:00
Calle Wilund
3eda3122af cdc: Move mutation augment from cql3::modification_statement to storage proxy
Using the attached service object
2020-01-13 13:18:55 +00:00
Juliusz Stasiewicz
27dfda0b9e main/transport: using the infrastructure of system.clients
Resolves #4820. Execution path in main.cc now cleans up system.clients
table if it exists (this is done on startup). Also, server.cc now calls
functions that notify about cql clients connecting/disconnecting.
2020-01-13 14:07:04 +01:00
Pavel Emelyanov
148da64a7e storage_servce: Move load_broadcaster away
This simplifies the storage_service API and fixes the
complaint about shared_ptr usage instead of unique_ptr.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-13 13:55:09 +03:00
Pavel Emelyanov
b6e1e6df64 misc_services: Introduce load_meter
There's a lonely get_load_map() call on storage_service that
needs only load broadcaster, always runs on shard 0 and that's it.

Next patch will move this whole stuff into its own helper no-shard
container and this is preparation for this.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-13 13:53:08 +03:00
Gleb Natapov
5753ab7195 lwt: drop invoke_on in paxos_state prepare and accept
Since lwt requests are now running on an owning shard there is no longer
a need to invoke a cross-shard call at the paxos_state level. RPC calls may
still arrive at a wrong shard, so we need to make a cross-shard call there.
2020-01-13 10:26:02 +02:00
Gleb Natapov
d28dd4957b lwt: Process lwt request on a owning shard
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by
returning an error from the lwt code if the shard is incorrect, specifying
the shard the request should be moved to. The error is processed by the
transport code, which jumps to the correct shard and re-processes the
incoming message there.
2020-01-13 10:26:02 +02:00
Piotr Sarna
3853594108 alternator-test: turn off TLS self-signed verification
Two test cases did not ignore TLS self-signed warnings, which are used
locally for testing HTTPS.

Fixes #5557

Tests(test_health, test_authorization)
Message-Id: <8bda759dc1597644c534f94d00853038c2688dd7.1578394444.git.sarna@scylladb.com>
2020-01-10 15:31:30 +02:00
Rafael Ávila de Espíndola
5313828ab8 cql3: Fix indentation
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200109025855.10591-2-espindola@scylladb.com>
2020-01-09 10:42:55 +02:00
Rafael Ávila de Espíndola
4da6dc1a7f cql3: Change a lambda capture order to match another
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200109025855.10591-1-espindola@scylladb.com>
2020-01-09 10:42:49 +02:00
Avi Kivity
6d454d13ac db/schema_tables: make gratuitous generic lambdas in do_merge_schema() concrete
Those gratuitous lambdas make life harder for IDE users by hiding the actual
types from the IDEs.
Message-Id: <20200107154746.1918648-1-avi@scylladb.com>
2020-01-08 17:43:18 +01:00
Avi Kivity
454074f284 Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz
"
The original fix (10f6b125c8) didn't
take into account that if there was a failed memtable flush (Refs
flush) but is not a flushable memtable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable as well, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.

Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.

Fixes #3717
"

* tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla:
  database: Avoid OOMing with flush continuations after failed memtable flush
  lsa: Introduce operator bool() to occupancy_stats
  lsa: Expose region_impl::evictable_occupancy in the region class
2020-01-08 16:58:54 +02:00
Gleb Natapov
feed544c5d paxos: fix truncation time checking during learn stage
The comparison is done in milliseconds, not microseconds.

Fixes #5566

Message-Id: <20200108094927.GN9084@scylladb.com>
2020-01-08 14:37:07 +01:00
Gleb Natapov
2832f1d9eb storage_service: move start_native_transport into a thread
The code runs only once and it is simpler if it runs in a seastar thread.
2020-01-08 14:57:57 +02:00
Gleb Natapov
7fb2e8eb9f transport: change make_result to takes a reference to cql result instead of shared_ptr
2020-01-08 14:57:57 +02:00
Avi Kivity
0bde5906b3 Merge "cql3: detect and handle int overflow in aggregate functions #5537" from Benny
"
Fix overflow handling in sum() and avg().

sum:
 - aggregated into __int128
 - detect overflow when computing result and log a warning if found

avg:
 - fix division function to divide the accumulator type _sum (__int128 for integers) by _count

Add unit tests for both cases

Test:
  - manual test against Cassandra 3.11.3 to make sure the results in the scylla unit test agree with it.
  - unit(dev), cql_query_test(debug)

Fixes #5536
"

* 'cql3-sum-overflow' of https://github.com/bhalevy/scylla:
  test: cql_query_test: test avg overflow
  cql3: functions: protect against int overflow in avg
  test: cql_query_test: test sum overflow
  cql3: functions: detect and handle int overflow in sum
  exceptions: sort exception_code definitions
  exceptions: define additional cassandra CQL exceptions codes
2020-01-08 10:39:38 +02:00
Avi Kivity
d649371baa Merge "Fix crash on SELECT SUM(udf(...))" from Rafael
"
We were failing to start a thread when the UDF call was nested in an
aggregate function call like SUM.
"

* 'espindola/fix-sum-of-udf' of https://github.com/espindola/scylla:
  cql3: Fix indentation
  cql3: Add missing with_thread_if_needed call
  cql3: Implement abstract_function_selector::requires_thread
  remove make_ready_future call
2020-01-08 10:25:42 +02:00
Benny Halevy
dafbd88349 query: initialize read_command timestamp to now
This was initialized to api::missing_timestamp but
should be set to either a client provided-timestamp or
the server's.

Unlike write operations, this timestamp need not be unique
as the one generated by client_state::get_timestamp.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-2-bhalevy@scylladb.com>
2020-01-08 10:19:07 +02:00
Benny Halevy
39325cf297 storage_proxy: fix int overflow in service::abstract_read_executor::execute
exec->_cmd->read_timestamp may be initialized by default to api::min_timestamp,
causing:
  service/storage_proxy.cc:3328:116: runtime error: signed integer overflow: 1577983890961976 - -9223372036854775808 cannot be represented in type 'long int'
  Aborting on shard 1.

Do not optimize cross-dc repair if read_timestamp is missing (or just negative).
We're interested in reads that happen within write_timeout of a write.
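
A guard of this kind might look like the sketch below. The function name,
its parameters and the exact comparison are hypothetical; the only point
is that rejecting negative timestamps first makes the subtraction safe.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using timestamp_type = int64_t;

// Stand-in for a "no timestamp" marker equal to the minimum int64 value.
constexpr timestamp_type missing_timestamp_sketch =
    std::numeric_limits<timestamp_type>::min();

// True when the read happened within write_timeout_us after the write.
// Hypothetical helper, not the storage_proxy code.
inline bool read_shortly_after_write(timestamp_type read_ts,
                                     timestamp_type write_ts,
                                     int64_t write_timeout_us) {
    if (read_ts < 0 || write_ts < 0) {
        // Missing or negative timestamp: skip the optimization. This also
        // guarantees that the subtraction below cannot overflow.
        return false;
    }
    const int64_t delta = read_ts - write_ts;
    return delta >= 0 && delta <= write_timeout_us;
}
```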

Fixes #5556

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-1-bhalevy@scylladb.com>
2020-01-08 10:18:59 +02:00
Raphael S. Carvalho
390c8b9b37 sstables: Move STCS implementation to source file
A header-only implementation can potentially create a problem with duplicate symbols.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200107154258.9746-1-raphaelsc@scylladb.com>
2020-01-08 09:55:35 +02:00
Benny Halevy
20a0b1a0b6 test: cql_query_test: test avg overflow
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:50:50 +02:00
Benny Halevy
1c81422c1b cql3: functions: protect against int overflow in avg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
9053ef90c7 test: cql_query_test: test sum overflow
Add unit tests for summing up int's and bigint's
with possible handling of overflow.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
e97a111f64 cql3: functions: detect and handle int overflow in sum
Detect integer overflow in cql sum functions and throw an error.
Note that Cassandra quietly truncates the sum if it doesn't fit
in the input type but we would rather break compatibility in this
case. See https://issues.apache.org/jira/browse/CASSANDRA-4914?focusedCommentId=14158400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14158400

Fixes #5536

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
98260254df exceptions: sort exception_code definitions
Be compatible with Cassandra source.
It's easier to maintain this way.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:21 +02:00
Benny Halevy
30d0f1df75 exceptions: define additional cassandra CQL exceptions codes
As of e9da85723a

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:40:57 +02:00
Rafael Ávila de Espíndola
282228b303 cql3: Fix indentation
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola
4316bc2e18 cql3: Add missing with_thread_if_needed call
This fixes an assert when doing sum(udf(...)).

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola
d301d31de0 cql3: Implement abstract_function_selector::requires_thread
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:24 -08:00
Rafael Ávila de Espíndola
dc9b3b8ff2 remove make_ready_future call
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:10:27 -08:00
Calle Wilund
9f6b22d882 cdc: Assign self to storage proxy object
2020-01-07 12:01:58 +00:00
Calle Wilund
fc5904372b storage_proxy: Add (optional) cdc service object pointer member
The cdc service is assigned from outside, post construction, mainly
because of the chickens and eggs in main startup. Would be nice to
have it unconditionally, but this is workable.
2020-01-07 12:01:58 +00:00
Calle Wilund
d6003253dd storage_proxy: Move mutate_counters to private section
It is (and shall be) called only from inside storage proxy,
and we would like this to be reflected in the interface
so our eventual moving of cdc logic into the mutate call
chains become easier to verify and comprehend.
2020-01-07 12:01:58 +00:00
Calle Wilund
b6c788fccf cdc: Add augmentation call to cdc service
To eventually replace the free function.
The main difference is that this is built to both handle batches correctly
and to eventually allow hanging cdc object on storage proxy,
and caches on the cdc object.
2020-01-07 12:01:58 +00:00
Piotr Sarna
04dc8faec9 test: add a case for multiple base regular columns in view key
The test case checks that having two base regular columns
in the materialized view key (not obtainable via CQL),
still works fine when values are inserted or deleted.
If TTL were involved and these columns had different expiration
rules, the case would be more complicated, but it's not possible
for a user to reach that case - neither with CQL, nor with alternator.
2020-01-07 12:19:06 +01:00
Piotr Sarna
155a47cc55 view: handle multiple regular base columns in view pk
Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.
This patch is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.

Fixes #5006

Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)

Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>
2020-01-07 12:18:39 +01:00
Avi Kivity
6e0a073b2e mutation_partition: use type-aware printing of the clustering row
Now that position_in_partition_view has type-aware printing, use it
to provide a human readable version of clustering keys.
Message-Id: <20191231151315.602559-2-avi@scylladb.com>
2020-01-07 12:17:11 +01:00
Avi Kivity
488c42408a position_in_partition_view: add type-aware printer
If the position_in_partition_view represents a clustering key,
we can now see it with the clustering key decoded according to
the schema.
Message-Id: <20191231151315.602559-1-avi@scylladb.com>
2020-01-07 12:15:09 +01:00
Piotr Sarna
54315f89cd db,view: fix checking if partition key is empty
Previous implementation did not take into account that a column
in a partition key might exist in a mutation, but in a DEAD state
- if it's deleted. There are no regressions for CQL, while for
alternator and its capability of having two regular base columns
in a view key, this additional check must be performed.
2020-01-07 12:05:36 +01:00
Avi Kivity
3a3c20d337 schema_tables: de-templatize diff_table_or_view()
This reduces code bloat and makes the code friendlier for IDEs, as the
IDE now understands the type of create_schema.
Message-Id: <20191231134803.591190-1-avi@scylladb.com>
2020-01-07 11:56:54 +01:00
Avi Kivity
e5e42672f5 sstables: reduce bloat from sstables::write_simple()
sstables::write_simple() has quite a lot of boilerplate
which gets replicated into each template instance. Move
all of that into a non-template do_write_simple(), leaving
only things that truly depend on the component being written
in the template, and encapsulating them with a
noncopyable_function.

An explicit template instantiation was added, since this
is used in a header file. Before, it likely worked by
accident and stopped working when the template became
small enough to inline.

Tests: unit (dev)
Message-Id: <20200106135453.1634311-1-avi@scylladb.com>
2020-01-07 11:56:11 +01:00
Avi Kivity
8f7f56d6a0 schema_tables: make gratuitous generic lambda in create_tables_from_partitions() concrete
The generic lambda made IDE searches for create_table_from_table_row() fail.
Message-Id: <20191231135210.591972-1-avi@scylladb.com>
2020-01-07 11:49:10 +01:00
Avi Kivity
92fd83d3af schema_tables: make gratuitoous generic lambda in create_table_from_name() concrete
The lambda made IDE searches for read_table_mutations fail.
Message-Id: <20191231135103.591741-1-avi@scylladb.com>
2020-01-07 11:48:56 +01:00
Avi Kivity
dd6dd97df9 schema_tables: make gratuitous generic lambda in merge_tables_and_views() concrete
The generic lambda made IDE searches for create_table_from_mutations fail.
Message-Id: <20191231135059.591681-1-avi@scylladb.com>
2020-01-07 11:48:39 +01:00
Avi Kivity
c63cf02745 canonical_mutation: add pretty printing
Add type-aware printing of canonical_mutation objects.
2020-01-07 12:06:31 +02:00
Avi Kivity
e093121687 mutation_partition_view: add virtual visitor
mutation_partition_view now supports a compile-time resolved visitor.
This is performant but results in bloat when the performance is not
needed. Furthermore, the template function that applies the object
to the visitor is private and out-of-line, to reduce compile time.

To allow visitation on mutation_partition_view objects, add a virtual
visitor type and a non-template accept function.

Note: mutation_partition_visitor is very similar to the new type,
but different enough to break the template visitor which is used
to implement the new visitor.

The new visitor will be used to implement pretty printing for
canonical_mutation.
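
The relationship between the two visitor styles can be sketched as
follows. Everything here is a simplified, hypothetical model (rows are
plain ints, and the class names are invented); it only shows a private
template visitation routine being reused to implement a public
non-template accept() that takes a virtual visitor.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Virtual visitor interface: one vtable call per visited element.
struct row_visitor_sketch {
    virtual ~row_visitor_sketch() = default;
    virtual void accept_row(int row) = 0;
};

class partition_view_sketch {
    std::vector<int> _rows;

    // Template visitation: resolved at compile time, one instance per
    // visitor type.
    template <typename Visitor>
    void do_accept(Visitor& v) const {
        for (int r : _rows) {
            v.accept_row(r);
        }
    }
public:
    explicit partition_view_sketch(std::vector<int> rows)
        : _rows(std::move(rows)) {}

    // Non-template entry point: instantiates do_accept() exactly once,
    // for the virtual interface.
    void accept(row_visitor_sketch& v) const { do_accept(v); }
};

// Example virtual visitor that builds a printable string.
struct printing_visitor_sketch : row_visitor_sketch {
    std::string out;
    void accept_row(int row) override {
        out += "row " + std::to_string(row) + ";";
    }
};

inline std::string print_rows(const partition_view_sketch& p) {
    printing_visitor_sketch v;
    p.accept(v);
    return v.out;
}
```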
2020-01-07 12:06:31 +02:00
Avi Kivity
75d9909b27 collection_mutation_view: add type-aware pretty printer
Add a way for the user to associate a type with a collection_mutation_view
and get a nice printout.
2020-01-07 12:06:29 +02:00
Rafael Ávila de Espíndola
b80852c447 main: Explicitly allow scylla core dumps
I have not looked into the security reason for disabling it when
a program has file capabilities.

Fixes #5560

[avi: remove extraneous semicolon]
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200106231836.99052-1-espindola@scylladb.com>
2020-01-07 11:15:59 +02:00
Rafael Ávila de Espíndola
07f1cb53ea tests: run with ASAN_OPTIONS='disable_coredump=0:abort_on_error=1'
These are the same options we use in seastar.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200107001513.122238-1-espindola@scylladb.com>
2020-01-07 11:11:49 +02:00
Takuya ASADA
238a25a0f4 docker: fix typo of scylla-jmx script path (#5551)
The path should /opt/scylladb/jmx, not /opt/scylladb/scripts/jmx.

Fixes #5542
2020-01-07 10:54:16 +02:00
Asias He
401854dbaf repair: Avoid duplicated partition_end write
Consider this:

1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down which calls
   wait_for_writer_done(). A duplicate partition_end is written.

To fix, track the partition_start and partition_end written, avoid
unpaired writes.
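
A rough sketch of the pairing logic, replaying the five steps above. The
class partition_writer_sketch, its member names and the string "log" are
hypothetical; the real repair writer operates on mutation fragments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical writer that remembers whether a partition is open.
class partition_writer_sketch {
    std::vector<std::string> _out;
    bool _partition_opened = false;
public:
    void write_partition_start(const std::string& key) {
        _out.push_back("start:" + key);
        _partition_opened = true;
    }
    void write_row(const std::string& row) {
        _out.push_back("row:" + row);
    }
    void write_partition_end() {
        // Emit partition_end only when it pairs with an earlier start.
        if (_partition_opened) {
            _out.push_back("end");
            _partition_opened = false;
        }
    }
    // Teardown path: closes the current partition if one is still open.
    void finish() { write_partition_end(); }
    const std::vector<std::string>& output() const { return _out; }
};

// Replays steps 1-5: the repair stops right after p1 was fully written.
inline std::vector<std::string> replay_interrupted_repair() {
    partition_writer_sketch w;
    w.write_partition_start("p1");  // 1
    w.write_row("r1");              // 2
    w.write_partition_end();        // 3
    // 4: error before partition_start of p2
    w.finish();                     // 5: must not write a second end
    return w.output();
}
```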

Backports: 3.1 and 3.2
Fixes: #5527
2020-01-06 14:06:02 +02:00
Eliran Sinvani
e64445d7e5 debian-reloc: Propagate PRODUCT variable to renaming command in debian pkg
commit 21dec3881c introduced
a bug that will cause scylla debian build to fail. This is
because the commit relied on the environment PRODUCT variable
to be exported (and as a result, to propagate to the rename
command that is executed by find in a subshell).
This commit fixes it by explicitly passing the PRODUCT variable
into the rename command.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20200106102229.24769-1-eliransin@scylladb.com>
2020-01-06 12:31:58 +02:00
Asias He
38d4015619 gossiper: Remove HIBERNATE status from dead state
In scylla, the replacing node is set as HIBERNATE status. It is the only
place we use HIBERNATE status. The replacing node is supposed to be
alive and updating its heartbeat, so it is not supposed to be in dead
state.

This patch fixes the following problem in replacing.

   1) start n1, n2
   2) n2 is down
   3) start n3 to replace n2, but kill n3 in the middle of the replace
   4) start n4 to replace n2

After step 3 and step 4, the old n3 will stay in gossip forever until a
full cluster shutdown. Note n3 will only stay in gossip but not in the
system.peers table. Users will see annoying, endless log messages like
the following on all the nodes:

   rpc - client $ip_of_n3:7000: fail to connect: Connection refused

Fixes: #5449
Tests: replace_address_test.py + manual test
2020-01-06 11:47:31 +02:00
Amos Kong
c5ec1e3ddc scylla_ntp_setup: check redhat variant version by prase_version (#5434)
VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7"
scylla_ntp_setup fails on OEL7.7 with a ValueError:

- ValueError: invalid literal for int() with base 10: '7.7'

This patch changes redhat_version() to return the version string, and
compares it using parse_version().

Fixes #5433

Signed-off-by: Amos Kong <amos@scylladb.com>
2020-01-06 11:43:14 +02:00
Asias He
145fd0313a streaming: Fix map access in stream_manager::get_progress
When the progress is queried, e.g., by nodetool netstats,
the progress info might not be updated yet.

Fix it by checking the map before accessing it, to avoid errors like:

std::out_of_range (_Map_base::at)
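
The general pattern (look up with find() instead of at(), and fall back
to a default when the key is absent) is sketched below. The types
progress_info_sketch and progress_map and the function name are
hypothetical and do not mirror the stream_manager data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-peer progress record.
struct progress_info_sketch {
    uint64_t bytes_sent;
    uint64_t bytes_received;
};

using progress_map = std::unordered_map<std::string, progress_info_sketch>;

// Returns the recorded progress, or zeroes if the peer has no entry yet.
// Unlike map.at(peer), this never throws std::out_of_range.
inline progress_info_sketch get_progress_or_default(const progress_map& m,
                                                    const std::string& peer) {
    auto it = m.find(peer);
    if (it == m.end()) {
        return progress_info_sketch{};  // value-initialized: all zero
    }
    return it->second;
}
```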

Fixes: #5437
Tests: nodetool_additional_test.py:TestNodetool.netstats_test
2020-01-06 10:31:15 +02:00
Rafael Ávila de Espíndola
98cd8eddeb tests: Run with halt_on_error=1:abort_on_error=1
This depends on the just emailed fixes to undefined behavior in
tests. With this change we should quickly notice if a change
introduces undefined behavior.

Fixes #4054

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>

Message-Id: <20191230222646.89628-1-espindola@scylladb.com>
2020-01-05 17:20:31 +02:00
Rafael Ávila de Espíndola
dc5ecc9630 enum_option_test: Add explicit underlying types to enums
We expect to be able to create variables with out of range values, so
these enums need explicit underlying types.
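
As a small illustration (not taken from enum_option_test): for an
unscoped enum without a fixed underlying type, converting a value outside
the range of its enumerators is undefined behavior, whereas with an
explicit underlying type any value of that type is allowed. The enum and
helper names below are hypothetical.

```cpp
#include <cassert>

// With ": int" every int value is a valid value of the enum, so an
// out-of-range constant can be created without undefined behavior.
enum fruit_sketch : int { apple = 0, banana = 1 };

constexpr fruit_sketch make_out_of_range() {
    return static_cast<fruit_sketch>(42);
}

inline bool is_known(fruit_sketch f) {
    return f == apple || f == banana;
}
```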

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200102173422.68704-1-espindola@scylladb.com>
2020-01-05 17:20:31 +02:00
Nadav Har'El
f0d8dd4094 merge: CDC rolling upgrade
Merged pull request https://github.com/scylladb/scylla/pull/5538 from
Avi Kivity and Piotr Jastrzębski.

This series prepares CDC for rolling upgrade. This consists of
reducing the footprint of cdc, when disabled, on the schema, adding
a cluster feature, and redacting the cdc column when transferring
it to other nodes. The latter is needed because we'll want to backport
this to 3.2, which doesn't have canonical_mutations yet.
2020-01-05 17:13:12 +02:00
Gleb Natapov
720c0aa285 commitlog: update last sync timestamp when cycle a buffer
If the in-memory buffer does not have enough space for an incoming mutation,
it is written into a file, but the code missed updating the timestamp of the
last sync, so we may sync too often.
Message-Id: <20200102155049.21291-9-gleb@scylladb.com>
2020-01-05 16:13:59 +02:00
Gleb Natapov
14746e4218 commitlog: drop segment gate
The code that enters the gate never defers before leaving, so the gate
behaves like a flag. Let's use the existing flag to prohibit adding data to a
closed segment.
Message-Id: <20200102155049.21291-8-gleb@scylladb.com>
2020-01-05 16:13:59 +02:00
Gleb Natapov
f8c8a5bd1f test: fix error reporting in commitlog_test
Message-Id: <20200102155049.21291-7-gleb@scylladb.com>
2020-01-05 16:13:58 +02:00
Gleb Natapov
680330ae70 commitlog: introduce segment::close() function.
Currently segment closing code is spread over several functions and
activated based on the _closed flag. Make segment closing explicit
by moving all the code into close() function and call it where _closed
flag is set.
Message-Id: <20200102155049.21291-6-gleb@scylladb.com>
2020-01-05 16:13:55 +02:00
Gleb Natapov
a1ae08bb63 commitlog: remove unused segment::flush() parameter
Message-Id: <20200102155049.21291-5-gleb@scylladb.com>
2020-01-05 16:13:55 +02:00
Gleb Natapov
1e15e1ef44 commitlog: cleanup segment sync()
Call cycle() only once.
Message-Id: <20200102155049.21291-4-gleb@scylladb.com>
2020-01-05 16:13:54 +02:00
Gleb Natapov
3d3d2c572e commitlog: move segment shutdown code from sync()
Currently sync() does two completely different things based on the
shutdown parameter. Separate the code into two different functions.
Message-Id: <20200102155049.21291-3-gleb@scylladb.com>
2020-01-05 16:13:54 +02:00
Gleb Natapov
89afb92b28 commitlog: drop superfluous this
Message-Id: <20200102155049.21291-2-gleb@scylladb.com>
2020-01-05 16:13:53 +02:00
Piotr Jastrzebski
95feeece0b scylla_tables: treat empty cdc props as disabled
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
396e35bf20 cdc: add schema_change test for cdc_options
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables generated after CDC was enabled
and a table with CDC enabled is created,
in order to make sure that the digest computed
including CDC column does not change spuriously as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
c08e6985cd cdc: allow cluster rolling upgrade
Addition of cdc column in scylla_tables changes how schema
digests are calculated, and affects the ABI of schema update
messages (adding a column changes other columns' indexes
in frozen_mutation).

To fix this, extend the schema_tables mechanism with support
for the cdc column, and adjust schemas and mutations to remove
that column when sending schemas during upgrade.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
caa0a4e154 tests: disable CDC in schema_change_tests
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
129af99b94 cdc: Return reference from cluster_supports_cdc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
4639989964 cdc: Add CDC_OPTIONS schema_feature
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Avi Kivity
c150f2e5d7 schema_tables, cdc: don't store empty cdc columns in scylla_tables
An empty cdc column in scylla_tables is hashed differently from
a missing column. This causes schema mismatch when a schema is
propagated to another node, because the other node will redact
the schema column completely if the cluster feature isn't enabled,
and an empty value is hashed differently from a missing value.

Store a tombstone instead. Tombstones are removed before
digesting, so they don't affect the outcome.
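
The effect described above can be illustrated with a small sketch. It is not Scylla's digest algorithm: the `digest` function, the `TOMBSTONE` marker and the row layout are hypothetical stand-ins, used only to show why an empty value hashes differently from a missing column while a tombstone dropped before digesting does not.

```python
import hashlib

# Hypothetical marker standing in for a tombstone (a deletion record).
TOMBSTONE = object()

def digest(row):
    """Hash a row given as {column_name: bytes or TOMBSTONE}.

    Tombstones are skipped before hashing, mirroring the statement that
    they are removed before digesting.
    """
    h = hashlib.sha256()
    for column in sorted(row):
        value = row[column]
        if value is TOMBSTONE:
            continue  # removed before digesting
        h.update(column.encode())
        h.update(b"\0")
        h.update(value)
        h.update(b"\0")
    return h.hexdigest()

missing = {"name": b"t"}                        # column redacted entirely
empty = {"name": b"t", "cdc": b""}              # empty value stored
tombstone = {"name": b"t", "cdc": TOMBSTONE}    # tombstone stored instead
```

With these inputs, `digest(empty)` differs from `digest(missing)`, while `digest(tombstone)` equals `digest(missing)`.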

This change also undoes the changes in 386221da84 ("schema_tables:
 handle 'cdc' options") to schema_change_test
test_merging_does_not_alter_tables_which_didnt_change. That change
enshrined the breakage into the test, instead of fixing the root cause,
which was that we added an extra mutation to the schema (for
cdc options, which were disabled).
2020-01-05 14:36:18 +02:00
Rafael Ávila de Espíndola
3d641d4062 lua: Use existing cpp_int cast logic
Different versions of boost have different rules for what conversions
from cpp_int to smaller integers are allowed.

We already had a function that worked with all supported versions, but
it was not being use by lua.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200104041028.215153-1-espindola@scylladb.com>
2020-01-05 12:10:54 +02:00
Rafael Ávila de Espíndola
88b5aadb05 tests: cql_test_env: wait for two futures starting internal services
I noticed this while looking at the crashes next is currently
experiencing.

While I have no idea if this fixes the issue, it does avoid broken
future warnings (for no_sharded_instance_exception) in a debug build.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200103201540.65324-1-espindola@scylladb.com>
2020-01-05 12:09:59 +02:00
Avi Kivity
4b8e2f5003 Update seastar submodule
* seastar 0525bbb08...36cf5c5ff (6):
  > memcached: Fix use after free in shutdown
  > Revert "task: stop wrapping tasks with unique_ptr"
  > task: stop wrapping tasks with unique_ptr
  > http: Change exception formating to the generic seastar one
  > Merge "Avoid a few calls to ~exception_ptr" from Rafael
  > tests: fix core generation with asan
2020-01-03 15:48:53 +02:00
Nadav Har'El
44c2a44b54 alternator-test: test for ConditionExpression feature
This patch adds a very comprehensive test for the ConditionExpression
feature, i.e., the newer syntax of conditional writes replacing
the old-style "Expected" - for the UpdateItem, PutItem and DeleteItem
operations.

I wrote these tests while closely following the DynamoDB ConditionExpression
documentation, and attempted to cover all conceivable features, subfeatures
and subcases of the ConditionExpression syntax - to serve as a test for a
future support for this feature in Alternator (see issue #5053).

As usual, all these tests pass on AWS DynamoDB, but because we haven't yet
implemented this feature in Alternator, all but one xfail on Alternator.

Refs #5053.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191229143556.24002-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
aad5eeab51 alternator: better error messages when Alternator port is taken
If Alternator is requested to be enabled on a specific port but the port is
already taken, the boot fails as expected - but the error log is confusing;
It currently looks something like this:

WARN  2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
... (many more messages about the server shutting down)
INFO  2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

There are two problems here. First, the "WARN" should really be an "ERROR",
because it causes the server to be shut down and the user must see this error.
Second, the final line in the log, something the user is likely to see first,
contains only the ultimate cause for the exception (an address already in use)
but not what this address was needed for.

This patch solves both issues, and the log now looks like:

ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
...
INFO  2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191224124127.7093-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
1f64a3bbc9 alternator: error on unsupported ReturnValues option
We don't yet support the ReturnValues option on PutItem, UpdateItem or
DeleteItem operations (see issue #5053), but if a user tries to use such
an option anyway, we silently ignore this option. It's better to fail,
reporting the unsupported option.

In this patch we check the ReturnValues option and if it is anything but
the supported default ("NONE"), we report an error.

Also added a test to confirm this fix. The test verifies that "NONE" is
allowed, and something which is unsupported (e.g., "DOG") is not ignored
but rather causes an error.
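
The check described above can be sketched as follows. This is an illustration in Python rather than Alternator's actual code; `ValidationError` and `check_return_values` are hypothetical names, since the commit message does not say which error type is reported.

```python
class ValidationError(Exception):
    """Hypothetical error type; the real one is not named in the commit."""

def check_return_values(request):
    """Reject any ReturnValues option other than the default "NONE".

    `request` is the decoded JSON body of a PutItem, UpdateItem or
    DeleteItem call. A missing option is treated as "NONE".
    """
    value = request.get("ReturnValues", "NONE")
    if value != "NONE":
        raise ValidationError(f"Unsupported ReturnValues={value}")
```

A request with `"ReturnValues": "NONE"` (or without the option) passes, while `"ReturnValues": "DOG"` raises instead of being silently ignored.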

Refs #5053.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191216193310.20060-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola
dc93228b66 reloc: Turn the default flags into common flags
These are flags we always want to enable. In particular, we want them
to be used by the bots, but the bots run this script with
--configure-flags, so they were being discarded.

We put the user option later so that they can override the common
options.

Fixes #5505

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola
d4dfb6ff84 build-id: Handle the binary having multiple PT_NOTE headers
There is no requirement that all notes be placed in a single
PT_NOTE. It looks like recent lld's actually put each section in its
own PT_NOTE.

This change looks for build-id in all PT_NOTE headers.

Fixes #5525

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191227000311.421843-1-espindola@scylladb.com>
2020-01-03 15:48:20 +02:00
Avi Kivity
1e9237d814 dist: redhat: use parallel compression for rpm payload
rpm compression uses xz, which is painfully slow. Adjust the
compression settings to run on all threads.

The xz utility documentation suggests that 0 threads is
equivalent to all CPUs, but apparently the library interface
(which rpmbuild uses) doesn't think the same way.

Message-Id: <20200101141544.1054176-1-avi@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
de1171181c user defined types: fix support for case-sensitive type names
In the current code, support for case-sensitive (quoted) user-defined type
names is broken. For example, a test doing:

    CREATE TYPE "PHone" (country_code int, number text)
    CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk))

Fails - the first line creates the type with the case-sensitive name PHone,
but the second line wrongly ends up looking for the lowercased name phone,
and fails with an exception "Unknown type ks.phone".

The problem is in cql3_type_name_impl. This class is used to convert a
type object into its proper CQL syntax - for example frozen<list<int>>.
The problem is that for a user-defined type, we forgot to quote its name
if not lowercase, and the result is wrong CQL; For example, a list of
PHone will be written as list<PHone> - but this is wrong because the CQL
parser, when it sees this expression, lowercases the unquoted type name
PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>.

The solution is for cql3_type_name_impl to use for a user-defined type
its get_name_as_cql_string() method instead of get_name_as_string().

get_name_as_cql_string() is a new method which prints the name of the
user type as it should be in a CQL expression, i.e., quoted if necessary.
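
A minimal sketch of the quoting rule, written in Python for illustration only. The function name `name_as_cql_string` is hypothetical, and the sketch assumes that a name needs quotes unless it matches `[a-z][a-z0-9_]*`; reserved keywords are ignored here.

```python
import re

# Names matching this pattern survive the parser's lowercasing unchanged.
_UNQUOTED = re.compile(r"[a-z][a-z0-9_]*\Z")

def name_as_cql_string(name):
    """Return `name` as it should appear in a CQL type expression."""
    if _UNQUOTED.match(name):
        return name
    # Quote the name and double any embedded quote characters.
    return '"' + name.replace('"', '""') + '"'

def list_of(type_name):
    """Build the CQL expression for a list of the given user type."""
    return f"list<{name_as_cql_string(type_name)}>"
```

Here `list_of("PHone")` yields `list<"PHone">`, while `list_of("phone")` yields `list<phone>`.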

The bug in the above test was apparently caused when our code serialized
the type name to disk as the string PHone (without any quoting), and then
later deserialized it using the CQL type parser, which converted it into
a lowercase phone. With this patch, the type's name is serialized as
"PHone", with the quotes, and deserialized properly as the type PHone.
While the extra quotes may seem excessive, they are necessary for the
correct CQL type expression - remember that the type expression may be
significantly more complex, e.g., frozen<list<"PHone">> and all of this,
including the quotes, is necessary for our parser to be able to translate
this string back into a type object.

This patch may cause breakage to existing databases which used case-
sensitive user-defined types, but I argue that these use cases were
already broken (as demonstrated by this test) so we won't break anything
that actually worked before.

Fixes #5544

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200101160805.15847-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Pavel Emelyanov
34f8762c4d storage_service: Drop _update_jobs
This field is write-only.
Leftover from 83ffae1 (storage_service: Drop block_until_update_pending_ranges_finished)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226091210.20966-1-xemul@scylladb.com>
2020-01-03 15:48:20 +02:00
Pavel Emelyanov
f2b20e7083 cache_hitrate_calculator: Do not reinvent the peering_sharded_service
The class in question wants to run its own instances on different
shards; for this it keeps a reference to its sharded self to call
invoke_on() on. Seastar has a handy peering_sharded_service<> for
exactly this, and using it makes the code nicer and shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226112401.23960-1-xemul@scylladb.com>
2020-01-03 15:48:19 +02:00
Rafael Ávila de Espíndola
bbed9cac35 cql3: move function creation to a .cc file
We had a lot of code in a .hh file that, while using templates, was
only used for creating functions during startup.

This moves it to a new .cc file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200101002158.246736-1-espindola@scylladb.com>
2020-01-03 15:48:19 +02:00
Benny Halevy
c0883407fe scripts: Add cpp-name-format: pretty printer
Pretty-print cpp-names, useful for deciphering complex backtraces.

For example, the following line:
    service::storage_proxy::init_messaging_service()::{lambda(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>)#1}::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360

Is formatted as:
    service::storage_proxy::init_messaging_service()::{
      lambda(
        seastar::rpc::client_info const&,
        seastar::rpc::opt_time_point,
        std::vector<
          frozen_mutation,
          std::allocator<frozen_mutation>
        >,
        db::consistency_level,
        std::optional<tracing::trace_info>
      )#1
    }::operator()(
      seastar::rpc::client_info const&,
      seastar::rpc::opt_time_point,
      std::vector<
        frozen_mutation,
        std::allocator<frozen_mutation>
      >,
      db::consistency_level,
      std::optional<tracing::trace_info>
    ) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360
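
The general idea of such a formatter can be sketched in a few lines of Python. This is a simplified illustration, not the cpp-name-format script itself: the function names are hypothetical, a bracket group is only broken across lines when it contains a top-level comma, and operators such as `->` or `operator<` are not handled.

```python
OPEN, CLOSE = "(<{", ")>}"

def matching(s, i):
    """Return the index of the bracket that closes the one at s[i]."""
    depth = 0
    for j in range(i, len(s)):
        if s[j] in OPEN:
            depth += 1
        elif s[j] in CLOSE:
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced brackets")

def split_top_level(s):
    """Split s on commas that are not nested inside any bracket."""
    parts, depth, start = [], 0, 0
    for i, c in enumerate(s):
        if c in OPEN:
            depth += 1
        elif c in CLOSE:
            depth -= 1
        elif c == "," and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return parts

def pretty(s, indent=0, step=2):
    """Indent the comma-separated arguments of each bracket group."""
    out, i = [], 0
    while i < len(s):
        if s[i] in OPEN:
            j = matching(s, i)
            parts = split_top_level(s[i + 1:j])
            if len(parts) > 1:
                pad = " " * (indent + step)
                body = ",\n".join(pad + pretty(p, indent + step, step) for p in parts)
                out.append(s[i] + "\n" + body + "\n" + " " * indent + s[j])
            else:
                out.append(s[i] + pretty(s[i + 1:j], indent, step) + s[j])
            i = j + 1
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

For instance, `pretty("std::vector<frozen_mutation, std::allocator<frozen_mutation> >")` puts `frozen_mutation` and `std::allocator<frozen_mutation>` on separate indented lines between `std::vector<` and `>`.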

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191226142212.37260-1-bhalevy@scylladb.com>
2020-01-01 12:08:12 +02:00
Rafael Ávila de Espíndola
75817d1fe7 sstable: Add checks to help track problems with large_data_handler use after free
I can't quite figure out how we were trying to write a sstable with
the large data handler already stopped, but the backtrace suggests a
good place to add extra checks.

This patch adds two checks: one at the start and one at the end of
sstable::write_components. The first one should give us better
backtraces if the large_data_handler is already stopped. The second
one should help catch some race condition.

Refs: #5470
Message-Id: <20191231173237.19040-1-espindola@scylladb.com>
2020-01-01 12:03:31 +02:00
Rafael Ávila de Espíndola
3c34e2f585 types: Avoid an unaligned load in json integer serialization
The patch also adds a test that makes the fixed issue easier to
reproduce.

Fixes #5413
Message-Id: <20191231171406.15980-1-espindola@scylladb.com>
2019-12-31 19:23:42 +02:00
Gleb Natapov
bae5cb9f37 commitlog: remove unused argument during segment creation
Since 99a5a77234 all segments are created
equal and "active" argument is never true, so drop it.

Message-Id: <20191231150639.GR9084@scylladb.com>
2019-12-31 17:14:03 +02:00
Rafael Ávila de Espíndola
aa535a385d enum_option_test: Add an explicit underlying type to an enum
We expect to be able to create a variable with an out of range value,
so the enum needs an explicit underlying type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191230222029.88942-1-espindola@scylladb.com>
2019-12-31 16:59:00 +02:00
Nadav Har'El
48a914c291 Fix uninitialized members
Merged pull request https://github.com/scylladb/scylla/pull/5532 from
Benny Halevy:

Initialize bool members in row_level_repair and _storage_service that
were causing ubsan errors.

Fixes #5531
2019-12-31 10:32:54 +02:00
Takuya ASADA
aa87169670 dist/debian: add procps on Depends
We require procps package to use sysctl on postinst script for scylla-kernel-conf.

Fixes #5494

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191218234100.37844-1-syuu@scylladb.com>
2019-12-30 19:30:35 +02:00
Avi Kivity
972127e3a8 atomic_cell: add type-aware pretty printing
The standard printer for atomic_cell prints the value as hex,
because atomic_cell does not include the type. Add a type-aware
printer that allows the user to provide the type.
2019-12-30 18:27:04 +02:00
Avi Kivity
19f68412ad atomic_cell: move pretty printers from database.cc to atomic_cell.cc
atomic_cell.cc is the logical home for atomic_cell pretty printers,
and since we plan to add more pretty printers, start by tidying up.
2019-12-30 18:20:30 +02:00
Eliran Sinvani
21dec3881c debian-reloc: rename buld product to the name specified in SCYLLA-VERSION-GEN
When the product name is other than "scylla", the debian
packaging scripts go over all files that start with "scylla-"
and change the prefix to be the actual product name.
However, if there are no such files in the directory
the script will fail since the renaming command will
get the wildcard string instead of an actual file name.
This patch replaces the command with a command with
an equivalent desired effect that only operates on files
if there are any.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191230143250.18101-1-eliransin@scylladb.com>
2019-12-30 17:45:50 +02:00
Takuya ASADA
263385cb4b dist: stop replacing /usr/lib/scylla with symlink (#5530)
Since we merged /usr/lib/scylla with /opt/scylladb, we removed
/usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb.
However, RPM does not support replacing a directory with a symlink.
We work around this with a dirty hack in an RPM scriptlet, but it causes
multiple issues on upgrade/downgrade.
(See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/)

To minimize Scylla upgrade/downgrade issues on the user side, it's better
to keep the /usr/lib/scylla directory.
Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb,
we can create a symlink for each setup script, like
/usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>.

Fixes #5522
Fixes #4585
Fixes #4611
2019-12-30 13:52:24 +02:00
Hagit Segev
9d454b7dc6 reloc/build_rpm.sh: Fix '--builddir' option handling (#5519)
The '--builddir' option value is assigned to the "builddir" variable,
which is wrong. The correct variable is "BUILDDIR" so use that instead
to fix the '--builddir' option.

Also, add logging to the script when executing the "dist/redhat_build.rpm.sh"
script to simplify debugging.
2019-12-30 13:25:22 +02:00
Benny Halevy
8aa5d84dd8 storage_service: initialize _is_bootstrap_mode
Hit the following ubsan error with bootstrap_test:TestBootstrap.manual_bootstrap_test in debug mode:
  service/storage_service.cc:3519:37: runtime error: load of value 190, which is not a valid value for type 'bool'

The use site is:
  service::storage_service::is_cleanup_allowed(seastar::basic_sstring<char, unsigned int, 15u, true>)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const at /local/home/bhalevy/dev/scylla/service/storage_service.cc:3519

While at it, initialize `_initialized` to false as well, just in case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-30 11:44:58 +02:00
Benny Halevy
474ffb6e54 repair: initialize row_level_repair: _zero_rows
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'

Fixes #5531

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-30 11:44:58 +02:00
Fabiano Lucchese
d7795b1efa scylla_setup: Support for enforcing optimal Linux clocksource setting (#5499)
A Linux machine typically has multiple clocksources with different
performance characteristics. Selecting a high-performance clocksource
might result in better performance for ScyllaDB, so this should be
considered whenever starting it up.

This patch introduces the possibility of enforcing an optimized Linux
clocksource in Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing the clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in the scylla_server configuration
file. This parameter is read by perftune.py which, if it is set to "yes", proceeds
to set the clocksource (non-persistently). On x86, the TSC clocksource is used.

Fixes #4474
Fixes #5474
Fixes #5480
2019-12-30 10:54:14 +02:00
Avi Kivity
e223154268 cdc: options: return an empty options map when cdc is disabled
This is compatible with 3.1 and below, which didn't have that schema
field at all.
2019-12-29 16:34:37 +02:00
Benny Halevy
27e0aee358 docs/debugging.md: fix anchor links
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191229074136.13516-1-bhalevy@scylladb.com>
2019-12-29 16:26:26 +02:00
Pavel Solodovnikov
aba9a11ff0 cql: pass variable_specifications via lw_shared_ptr
Instances of `variable_specifications` are passed around as
shared_ptr's, which are redundant in this case since the class
is marked as `final`. Use `lw_shared_ptr` instead since we know
for sure it's not a polymorphic pointer.

Tests: unit(debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>
2019-12-29 16:26:26 +02:00
Benny Halevy
4c884908bb directories: Keep a unique set of directories to initialize
If any two directories of data/commitlog/hints/view_hints
are the same we still end up running verify_owner_and_mode
and disk_sanity(check_direct_io_support) in parallel
on the same directories and hit #5510.

This change uses std::set rather than std::vector to
collect a unique set of directories that need initialization.
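
The deduplication idea can be sketched in Python as follows. The function name and the example paths are hypothetical; the actual change is in C++ and uses std::set.

```python
def directories_to_init(data_dirs, commitlog, hints, view_hints):
    """Collect each directory once, even if several options share a path."""
    unique = set(data_dirs)
    unique.update([commitlog, hints, view_hints])
    # Sorted only to make the result deterministic for this example.
    return sorted(unique)

dirs = directories_to_init(
    ["/var/lib/scylla/data"],
    "/var/lib/scylla/commitlog",
    "/var/lib/scylla/hints",
    "/var/lib/scylla/hints",  # same path as hints: initialized only once
)
```

Because the shared path appears once in `dirs`, the per-directory checks cannot run twice in parallel on it.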

Fixes #5510

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>
2019-12-29 16:26:26 +02:00
Gleb Natapov
60a851d3a5 commitlog: always flush segments atomically with writing
db::commitlog::segment::batch_cycle() assumes that after a write
for a certain position completes (as reported by
_pending_ops.wait_for_pending()) it will also be flushed, but this is
true only if writing and flushing are atomic wrt _pending_ops lock.
It usually is unless flush_after is set to false when cycle() is
called. In this case only writing is done under the lock. This
is exactly what happens when a segment is closed. Flush is skipped
because zero header is added after the last entry and then flushed, but
this optimization breaks batch_cycle() assumption. Fix it by flushing
after the write atomically even if a segment is being closed.

Fixes #5496

Message-Id: <20191224115814.GA6398@scylladb.com>
2019-12-24 14:52:23 +02:00
Pavel Emelyanov
a5cdfea799 directories: Do not mess with per-shard base dir
The hints and view_hints directory has per-shard sub-dirs,
and the directories code tries to create, check and lock
all of them, including the base one.

The manipulations in question are excessive -- it's enough
to check and lock either the base dir, or all the per-shard
ones, but not everything. Let's take the latter approach for
its simplicity.

Fixes #5510

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Looks-good-to: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223142429.28448-1-xemul@scylladb.com>
2019-12-24 14:49:28 +02:00
Benny Halevy
f8f5db42ca dbuild: try to pull image if not present locally
Pekka Enberg <penberg@scylladb.com> wrote:
> Image might not be present, but the subsequent "docker run" command will automatically pull it.

Just letting "docker run" fail produces a rather confusing error message
referring to docker help, but we want to provide the user
with our own help. So still fail early, but also try to pull the image
if "docker image inspect" fails, indicating it's not present locally.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-4-bhalevy@scylladb.com>
2019-12-24 11:13:23 +02:00
Benny Halevy
ee2f97680a dbuild: just die when no image-id is provided
Suggested-by: Pekka Enberg <penberg@scylladb.com>
> This will print all the available Docker images,
> many (most?) of them completely unrelated.
> Why not just print an error saying that no image was specified,
> and then perhaps print usage.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-3-bhalevy@scylladb.com>
2019-12-24 11:13:22 +02:00
Benny Halevy
87b2f189f7 dbuild: s/usage/die/
Suggested-by: Dejan Mircevski <dejan@scylladb.com>
> The use pattern of this function strongly suggests a name like `die`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-2-bhalevy@scylladb.com>
2019-12-24 11:13:21 +02:00
Benny Halevy
718e9eb341 table: move_sstables_from_staging: fix use after free of shared_sstable
Introduced in 4b3243f5b9

Reproducible with materialized_views_test:TestMaterializedViews.mv_populating_from_existing_data_during_node_remove_test
and read_amplification_test:ReadAmplificationTest.no_read_amplification_on_repair_with_mv_test

==955382==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200023de18 at pc 0x00000051d788 bp 0x7f8a0563fcc0 sp 0x7f8a0563fcb0
READ of size 8 at 0x60200023de18 thread T1 (reactor-1)
    #0 0x51d787 in seastar::lw_shared_ptr<sstables::sstable>::lw_shared_ptr(seastar::lw_shared_ptr<sstables::sstable> const&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:289
    #1 0x10ba189 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstable>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530
    #2 0x109c4f1 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstable>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1556
    #3 0x106941a in do_for_each<__gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >, table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:618
    #4 0x1069203 in operator() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:626
    #5 0x10ba589 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36
    #6 0x10ba668 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44
    #7 0x10ba7c0 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563
    ...

0x60200023de18 is located 8 bytes inside of 16-byte region [0x60200023de10,0x60200023de20)
freed by thread T1 (reactor-1) here:
    #0 0x7f8a153b796f in operator delete(void*) (/lib64/libasan.so.5+0x11096f)
    #1 0x6ab4d1 in __gnu_cxx::new_allocator<seastar::lw_shared_ptr<sstables::sstable> >::deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128
    #2 0x612052 in std::allocator_traits<std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::deallocate(std::allocator<seastar::lw_shared_ptr<sstables::sstable> >&, seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470
    #3 0x58fdfb in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::_M_deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351
    #4 0x52a790 in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332
    #5 0x52a99b in std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680
    #6 0xff60fa in ~<lambda> /local/home/bhalevy/dev/scylla/table.cc:2477
    #7 0xff7202 in operator() /local/home/bhalevy/dev/scylla/table.cc:2496
    #8 0x106af5b in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1573
    #9 0x102f5d5 in futurize_apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1645
    #10 0x102f9ee in operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/semaphore.hh:488
    #11 0x109d2f1 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36
    #12 0x109d42c in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44
    #13 0x109d595 in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563
    ...

Fixes #5511

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191222214326.1229714-1-bhalevy@scylladb.com>
2019-12-23 15:20:41 +02:00
Konstantin Osipov
476fbc60be test.py: prepare to remove custom colors
Add dbuild dependency on python3-colorama,
which will be used in test.py instead of a hand-made palette.

[avi: update tools/toolchain/image]
Message-Id: <20191223125251.92064-2-kostja@scylladb.com>
2019-12-23 15:13:22 +02:00
Pavel Emelyanov
d361894b9d batchlog_manager: Speed up token_metadata endpoints counting a bit
In this place we only need to know the number of endpoints,
while current code additionally shuffles them before counting.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:45 +02:00
Pavel Emelyanov
6e06c88b4c token_metadata: Remove unused helper
There are two _identical_ methods in token_metadata class:
get_all_endpoints_count() and number_of_endpoints().
The former is called; the latter is not used, so
let's remove it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:43 +02:00
Pavel Emelyanov
2662d9c596 migration_manager: Remove run_may_throw() first argument
It's unused in this function. Also this helps getting
rid of global instances of components.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:42 +02:00
Pavel Emelyanov
703b16516a storage_service: Remove unused helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:41 +02:00
Takuya ASADA
e0071b1756 reloc: don't archive dist/ami/files/*.rpm on relocatable package
We should skip archiving dist/ami/files/*.rpm in the relocatable package,
since they are not used.
The same goes for packer and variables.json.

Fixes #5508

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191223121044.163861-1-syuu@scylladb.com>
2019-12-23 14:19:51 +02:00
Tomasz Grabiec
28dec80342 db/schema_tables: Add trace-level logging of schema digesting
This greatly helps to narrow down the source of schema digest mismatch
between nodes. The intended use is to enable this logger on disagreeing
nodes and trigger schema digest recalculation and observe which
mutations differ in digest and then examine their content.

Message-Id: <1574872791-27634-1-git-send-email-tgrabiec@scylladb.com>
2019-12-23 12:28:22 +02:00
Konstantin Osipov
1116700bc9 test.py: do not return 0 if there are failed tests
Fix a return value regression introduced when switching to asyncio.

Message-Id: <20191222134706.16616-2-kostja@scylladb.com>
2019-12-22 16:14:32 +02:00
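The kind of regression fixed in 1116700bc9 can be illustrated with a minimal sketch. The names `run_test` and `run_all` are hypothetical and are not taken from test.py; the point is only that the coroutine's result has to be turned into the process exit status.

```python
import asyncio

async def run_test(name: str, passes: bool) -> bool:
    # Stand-in for launching a test binary and awaiting its result.
    await asyncio.sleep(0)
    return passes

async def run_all(tests: dict) -> int:
    # Run all tests concurrently and collect their pass/fail results.
    results = await asyncio.gather(*(run_test(n, ok) for n, ok in tests.items()))
    failed = results.count(False)
    # A non-zero status must be reported if anything failed.
    return 1 if failed else 0

exit_code = asyncio.run(run_all({"a_test": True, "b_test": False}))
# A real script would end with sys.exit(exit_code); dropping the value
# returned by asyncio.run() makes the script exit with 0 regardless.
```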
Asias He
7322b749e0 repair: Do not return working_row_buf_nr in get combined row hash verb
In commit b463d7039c (repair: Introduce
get_combined_row_hash_response), working_row_buf_nr is returned in
REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It is
scheduled to be part of 3.1 release. However, by accident it was not
backported to 3.1.

In order for repair to be compatible between 3.1 and 3.2, we need to drop
the working_row_buf_nr in the 3.2 release.

Fixes: #5490
Backports: 3.2
Tests: Run repair in a mixed 3.1 and 3.2 cluster
2019-12-21 20:13:15 +02:00
Takuya ASADA
8eaecc5ed6 dist/common/scripts/scylla_setup: add swap existance check
Show warnings when no swap is configured on the node.

Closes #2511

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191220080222.46607-1-syuu@scylladb.com>
2019-12-21 20:03:58 +02:00
Pavel Solodovnikov
5a15bed569 cql3: return result_set by cref in cql3::result::result_set
Changes summary:
* make `cql3::result_set` movable-only
* change signature of `cql3::result::result_set` to return by cref
* adjust available call sites to the aforementioned method to accept cref

Motivation behind this change is elimination of dangerous API,
which can easily set a trap for developers who don't expect that
result_set would be returned by value.

There is no point in copying the `result_set` around, so make
`cql3::result::result_set` to cache `result_set` internally in a
`unique_ptr` member variable and return a const reference so to
minimize unnecessary copies here and there.

Tests: unit(debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191220115100.21528-1-pa.solodovnikov@scylladb.com>
2019-12-21 16:56:42 +02:00
Takuya ASADA
3a6cb0ed8c install.sh: drop limits.d from nonroot mode
The file is only required for root mode.

Fixes #5507

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191220101940.52596-1-syuu@scylladb.com>
2019-12-21 15:26:08 +02:00
Botond Dénes
08bb0bd6aa mutation_fragment_stream_validator: wrap exceptions into own exception type
So a higher level component using the validator to validate a stream can
catch only validation errors, and let any other incidental exception
through.

This allows building data correctors on top of the
`mutation_fragment_stream_validator`, by filtering a fragment stream
through a validator, catching invalid fragment stream exceptions and
dropping the respective fragments from the stream.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>
2019-12-20 12:05:00 +01:00
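The idea in 08bb0bd6aa, a dedicated exception type that lets a caller drop invalid fragments while other errors propagate, can be sketched as follows. This is an illustration only: the class names are hypothetical, and the validation rule (positions must be strictly increasing) is an assumption chosen to keep the example small.

```python
class InvalidFragmentStreamError(Exception):
    """Hypothetical stand-in for the validator's own exception type."""

class Validator:
    def __init__(self):
        self._last = None

    def check(self, position):
        # Assumed rule: positions must be strictly increasing.
        if self._last is not None and position <= self._last:
            raise InvalidFragmentStreamError(f"{position} after {self._last}")
        self._last = position

def drop_invalid(positions):
    """Filter a stream through the validator, dropping invalid items only."""
    validator = Validator()
    kept = []
    for p in positions:
        try:
            validator.check(p)
        except InvalidFragmentStreamError:
            continue  # validation error: drop this fragment
        kept.append(p)  # any other exception propagates to the caller
    return kept

cleaned = drop_invalid([1, 2, 2, 5, 3, 6])
```

Because only `InvalidFragmentStreamError` is caught, an incidental failure (here, a `TypeError` from comparing mixed types) is not silently swallowed.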
Rafael Ávila de Espíndola
91c7f5bf44 Print build-id on startup
Fixes #5426

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191218031556.120089-1-espindola@scylladb.com>
2019-12-19 15:43:04 +02:00
Avi Kivity
440ad6abcc Revert "relocatable: Check that patchelf didn't mangle the PT_LOAD headers"
This reverts commit 237ba74743. While it
works for the scylla executable, it fails for iotune, which is built
by seastar. It should be reinstated after we pass the correct link
parameters to the seastar build system.
2019-12-19 11:20:34 +02:00
Pekka Enberg
c0aea19419 Merge "Add a timeout for housekeeping for offline installs" from Amnon
"
This series solves an issue with scylla_setup and prevents it from
waiting forever if housekeeping cannot look for the new Scylla version.

Fixes #5302

It should be backported to versions that support offline installations.
"

* 'scylla_setup_timeout' of git://github.com/amnonh/scylla:
  scylla_setup: do not wait forever if no reply is return housekeeping
  scylla_util.py: Add optional timeout to out function
2019-12-19 08:18:19 +02:00
Rafael Ávila de Espíndola
8d777b3ad5 relocatable: Use a super long path for the dynamic linker
Having a long path allows patchelf to change the interpreter without
changing the PT_LOAD headers and therefore without moving the
build-id out of the first page.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-1-espindola@scylladb.com>
2019-12-18 19:10:59 +02:00
Pavel Solodovnikov
c451f6d82a LWT: Fix required participants calculation for LOCAL_SERIAL CL
Suppose we have a multi-dc setup (e.g. 9 nodes distributed across
3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]).

When a query that uses LWT is executed with LOCAL_SERIAL consistency
level, the `storage_proxy::get_paxos_participants` function
incorrectly calculates the number of required participants to serve
the query.

In the example above it's calculated to be 5 (i.e. the number of
nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL,
which is equivalent to LOCAL_QUORUM cl in this case).

This behavior results in an exception being thrown when executing
the following query with LOCAL_SERIAL cl:

INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS

Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'}

Tests: unit(dev), dtest(consistency_test.py)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>
2019-12-18 16:58:32 +01:00
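The arithmetic behind c451f6d82a can be checked with a short sketch. It assumes a replication factor of 3 in each of the three datacenters, which matches the "Requires 5, alive 3" error quoted above; the function names are hypothetical and do not mirror `storage_proxy::get_paxos_participants`.

```python
def quorum_for(replication_factor: int) -> int:
    """Majority of replicas: floor(rf / 2) + 1."""
    return replication_factor // 2 + 1

# Assumed replication factor per datacenter, as in the commit's example.
rf_per_dc = {"dc1": 3, "dc2": 3, "dc3": 3}

def required_participants(consistency: str, local_dc: str) -> int:
    if consistency == "LOCAL_SERIAL":
        # Only replicas in the coordinator's datacenter count.
        return quorum_for(rf_per_dc[local_dc])
    # SERIAL: quorum over the replicas of all datacenters.
    return quorum_for(sum(rf_per_dc.values()))

serial = required_participants("SERIAL", "dc1")
local_serial = required_participants("LOCAL_SERIAL", "dc1")
```

With these numbers, `serial` is 5 and `local_serial` is 2, so a LOCAL_SERIAL query with 3 live local replicas should succeed once the local quorum is used.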
Botond Dénes
cd6bf3cb28 scylla-gdb.py: static_vector: update for changed storage
The actual buffer is now in a member called 'data'. Leave the old
`dummy.dummy` and `dummy` as fall-back. This seems to change every
Fedora release.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191218153544.511421-1-bdenes@scylladb.com>
2019-12-18 17:39:56 +02:00
Tomasz Grabiec
5865d08d6c migration_manager: Recalculate schema only on shard 0
Schema is node-global, update_schema_version_and_announce() updates
all shards.  We don't need to recalculate it from every shard, so
install the listeners only on shard 0. Reduces noise in the logs.

Message-Id: <1574872860-27899-1-git-send-email-tgrabiec@scylladb.com>
2019-12-18 16:43:26 +02:00
Pavel Emelyanov
998f51579a storage_service: Rip join_ring config option
The option in question apparently does not work: several sharded objects
are start()-ed (and thus instantiated) in join_token_ring, while the
instances of these objects are used during init of other stuff.

This leads to broken seastar local_is_initialized assertion on sys_dist_ks,
but reading the code shows more examples, e.g. the auth_service is started
on join, but is used for thrift and cql servers initialization.

The suggestion is to remove the option instead of fixing. The is_joined
logic is kept since on-start joining still can take some time and it's safer
to report real status from the API.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191203140717.14521-1-xemul@scylladb.com>
2019-12-18 12:45:13 +02:00
Nadav Har'El
8157f530f5 merge: CDC: handle schema changes
Merged pull request https://github.com/scylladb/scylla/pull/5366 from Calle Wilund:

Moves schema creation/alter/drop awareness to use new "before" callbacks from
migration manager, and adds/modifies log and streams table as part of the base
table modification.

Makes schema changes semi-atomic per node. While this does not deal with updates
coming in before a schema change has propagated through the cluster, it now falls into the
same pit as when this happens without CDC.

Added side effect is also that now schemas are transparent across all subsystems,
not just cql.

Patches:
  cdc_test: Add small test for altering base schema (add column)
  cdc: Handle schema changes via migration manager callbacks
  migration_manager: Invoke "before" callbacks for table operations
  migration_listener: Add empty base class and "before" callbacks for tables
  cql_test_env: Include cdc service in cql tests
  cdc: Add sharded service that does nothing.
  cdc: Move "options" to separate header to avoid to much header inclusion
  cdc: Remove some code from header
2019-12-17 23:04:36 +02:00
Avi Kivity
1157ee16a5 Update seastar submodule
* seastar 00da4c8760...0525bbb08f (7):
  > future: Simplify future_state_base::any move constructor
  > future: don't create temporary tuple on future::get().
  > future: don't instantiate new future on future::then_wrapped().
  > future: clean-up the Result handling in then_wrapped().
  > Merge "Fix core dumps when asan is enabled" from Rafael
  > future: Move ignore to the base class
  > future: Don't delete in ignore
2019-12-17 19:47:50 +02:00
Botond Dénes
638623b56b configure.py: make build.ninja target depend on SCYLLA-VERSION-GEN
Currently `SCYLLA-VERSION-GEN` is not a dependency of any target and
hence changes done to it will not be picked up by ninja. To trigger a
rebuild and hence version changes to appear in the `scylla` target
binary, one has to do `touch configure.py`. This is counterintuitive
and frustrating to people who don't know about it and wonder why their
changed version is not appearing as the output of `scylla --version`.

This patch makes `SCYLLA-VERSION-GEN` a dependency of `build.ninja`,
making the `build.ninja` target out-of-date whenever
`SCYLLA-VERSION-GEN` is changed and hence will trigger a rerun of
`configure.py` when the next target is built, allowing a build of e.g.
`scylla` to pick up any changes done to the version automatically.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191217123955.404172-1-bdenes@scylladb.com>
2019-12-17 17:40:04 +02:00
Avi Kivity
7152ba0c70 Merge "tests: automatically search for unit tests" from Kostja
"
This patch set rearranges the test files so that
it is now possible to search for tests automatically,
and adds this functionality to test.py
"

* 'test.py.requeue' of ssh://github.com/scylladb/scylla-dev:
  cmake: update CMakeLists.txt to scan test/ rather than tests/
  test.py: automatically lookup all unit and boost tests
  tests: move all test source files to their new locations
  tests: move a few remaining headers
  tests: move another set of headers to the new test layout
  tests: move .hh files and resources to new locations
  tests: remove executable property from data_listeners_test.cc
2019-12-17 17:32:18 +02:00
Amnon Heiman
dd42f83013 scylla_setup: do not wait forever if no reply is return housekeeping
When scylla is installed without network connectivity, the check for
whether a newer version is available can cause scylla_setup to wait forever.

This patch adds a limit to the time scylla_setup will wait for a reply.

When there is no reply, a relevant error will be shown saying that it was
unable to check for a newer version, but this will not block the setup
script.

Fixes #5302

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-12-17 14:56:47 +02:00
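A bounded wait of the kind described in dd42f83013 can be sketched with the standard `subprocess` module. The helper name `check_output_with_timeout` is hypothetical and is not the `out` function from scylla_util.py; the child commands are placeholders for the housekeeping check.

```python
import subprocess
import sys

def check_output_with_timeout(cmd, timeout=None):
    """Run cmd and return its stdout, or None if it exceeds the timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run(); the caller can print a
        # warning and continue instead of blocking the setup.
        return None
    return done.stdout.strip()

# Placeholder commands: one that hangs (like an unreachable server),
# and one that answers immediately.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('4.0.0')"]
```

Calling `check_output_with_timeout(slow, timeout=0.5)` returns `None` after about half a second rather than waiting for the child to finish.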
Nadav Har'El
aa1de5a171 merge: Synchronize snapshot and staging sstable deletion using sem
Merged pull request https://github.com/scylladb/scylla/pull/5343 from
Benny Halevy.

Fixes #5340

Hold the sstable_deletion_sem in table::move_sstables_from_subdirs to
serialize access to the staging directory. It now synchronizes snapshot,
compaction deletion of sstables, and view_update_generator moving of
sstables from staging.

Tests:

    unit (dev) [except test_user_function_timestamp_return that fails for me locally, but also on master]
    snapshot_test.py (dev)
2019-12-17 14:06:02 +02:00
Juliusz Stasiewicz
7fdc8563bf system_keyspace: Added infrastructure for table `system.clients'
I used the following as a reference:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java
At this moment there is only info about IP, the client's outgoing port,
client 'type' (i.e. CQL/thrift/alternator), shard ID and username.
Column `request_count' is NOT present and CK consists of
(`port', `client_type'), contrary to what C* has: (`port').

Code that notifies `system.clients` about new connections goes
to top-level files `connection_notifier.*`. Currently only CQL
clients are observed, but enum `client_type` can be used in future
to notify about connections with other protocols.
2019-12-17 11:31:28 +01:00
Benny Halevy
4b3243f5b9 table: move_sstables_from_staging_in_thread with _sstable_deletion_sem
Hold the _sstable_deletion_sem while moving sstables from the staging directory
so as not to move them out from under table::snapshot.

Fixes #5340

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
0446ce712a view_update_generator::start: use variable binding
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
5d7c80c148 view_update_generator::start: fix indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
02784f46b9 view_update_generator: handle errors when processing sstable
The consumer may throw; in this case, break from the loop and retry.

move_sstable_from_staging_in_thread may theoretically throw too,
ignore the error in this case since the sstable was already processed,
individual move failures are already ignored and moving from staging
will be retried upon restart.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
abda12107f sstables: move_to_new_dir: add do_sync_dirs param
To be used for "batch" move of several sstables from staging
to the base directory, allowing the caller to sync the directories
once when all are moved rather than for each one of them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
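The batching enabled by abda12107f can be sketched as follows: each move skips the directory sync, and the caller syncs both directories once at the end. This is a POSIX-only illustration; the Python function names are hypothetical and only the `do_sync_dirs` parameter name comes from the commit.

```python
import os
import tempfile

def sync_dir(path: str) -> None:
    """Flush directory metadata (such as renames) to stable storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def move_to_new_dir(name, src_dir, dst_dir, do_sync_dirs=True) -> None:
    os.rename(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    if do_sync_dirs:
        sync_dir(src_dir)
        sync_dir(dst_dir)

def move_batch(names, src_dir, dst_dir) -> None:
    # Move every file without syncing, then sync each directory once.
    for name in names:
        move_to_new_dir(name, src_dir, dst_dir, do_sync_dirs=False)
    sync_dir(src_dir)
    sync_dir(dst_dir)

with tempfile.TemporaryDirectory() as root:
    staging = os.path.join(root, "staging")
    base = os.path.join(root, "base")
    os.mkdir(staging)
    os.mkdir(base)
    for name in ("a.db", "b.db"):
        open(os.path.join(staging, name), "w").close()
    move_batch(["a.db", "b.db"], staging, base)
    moved = sorted(os.listdir(base))
    left = os.listdir(staging)
```

For N files this issues two directory syncs instead of 2N.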
Benny Halevy
6efef84185 sstable: return future from move_to_new_dir
distributed_loader::probe_file needlessly creates a seastar
thread for it and the next patch will use it as part of
a parallel_for_each loop to move a list of sstables
(and sync the directories once at the end).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
0d2a7111b2 view_update_generator: sstable_with_table: std::move constructor args
Just a small optimization.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:19:55 +02:00
Nadav Har'El
fc85c49491 alternator: error on unsupported parallel scan
We do not yet support the parallel Scan options (TotalSegments, Segment),
as reported in issue #5059. But even before implementing this feature, it
is important that we produce an error if a user attempts to use it - instead
of outright ignoring this parameter. This is what this patch does.

The patch also adds a full test, test_scan.py::test_scan_parallel, for the
parallel scan feature. The test passes on DynamoDB, and still xfails
on Alternator after this patch - but now the Scan request fails immediately
reporting the unsupported option - instead of what the pre-patch code did:
returning the wrong results and the test failing just when the results
do not match the expectations.

Refs #5059.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191217084917.26191-1-nyh@scylladb.com>
2019-12-17 11:27:56 +02:00
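The behaviour introduced by fc85c49491, failing fast on an unsupported option instead of ignoring it, can be sketched like this. The error class, function names and message text are hypothetical; only the option names `TotalSegments` and `Segment` come from the commit message.

```python
class ValidationError(Exception):
    """Hypothetical error type reported back to the client."""

UNSUPPORTED_SCAN_OPTIONS = ("TotalSegments", "Segment")

def validate_scan_request(request: dict) -> None:
    # Reject the request up front rather than silently ignoring the option
    # and returning results the caller did not ask for.
    for option in UNSUPPORTED_SCAN_OPTIONS:
        if option in request:
            raise ValidationError(f"Scan option {option} is not yet supported")

def try_scan(request: dict) -> str:
    try:
        validate_scan_request(request)
    except ValidationError as e:
        return str(e)
    return "ok"

plain = try_scan({"TableName": "t"})
parallel = try_scan({"TableName": "t", "TotalSegments": 4, "Segment": 0})
```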
Avi Kivity
f7d69b0428 Revert "Merge "bouncing lwt request to an owning shard" from Gleb"
This reverts commit 64cade15cc, reversing
changes made to 9f62a3538c.

This commit is suspected of corrupting the response stream.

Fixes #5479.
2019-12-17 11:06:10 +02:00
Rafael Ávila de Espíndola
237ba74743 relocatable: Check that patchelf didn't mangle the PT_LOAD headers
Should avoid issue #4983 showing up again.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-2-espindola@scylladb.com>
2019-12-16 20:18:32 +02:00
Avi Kivity
3b7aca3406 Merge "db: Don't create a reference to nullptr" from Rafael
"
Only the first patch is needed to fix the undefined behavior, but the
followup ones simplify the memory management around user types.
"

* 'espindola/fix-5193-v2' of ssh://github.com/espindola/scylla:
  db: Don't use lw_shared_ptr for user_types_metadata
  user_types_metadata: don't implement enable_lw_shared_from_this
  cql3: pass a const user_types_metadata& to prepare_internal
  db: drop special case for top level UDTs
  db: simplify db::cql_type_parser::parse
  db: Don't create a reference to nullptr
  Add test for loading a schema with a non native type
2019-12-16 17:10:58 +02:00
Konstantin Osipov
d6bc7cae67 cmake: update CMakeLists.txt to scan test/ rather than tests/
A follow up on directory rename.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
e079a04f2a test.py: automatically lookup all unit and boost tests
2019-12-16 17:47:42 +03:00
Konstantin Osipov
1c8736f998 tests: move all test source files to their new locations
1. Move tests to test (using singular seems to be a convention
   in the rest of the code base)
2. Move boost tests to test/boost, other
   (non-boost) unit tests to test/unit, tests which are
   expected to be run manually to test/manual.

Update configure.py and test.py with new paths to tests.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
2fca24e267 tests: move a few remaining headers
Move sstable_test.hh, test_table.hh and cql_assertions.hh from tests/ to
test/lib or test/boost and update dependent .cc files.
Move tests/perf_sstable.hh to test/perf/perf_sstable.hh
2019-12-16 17:47:42 +03:00
Konstantin Osipov
b9bf1fbede tests: move another set of headers to the new test layout
Move another small subset of headers to test/
with the same goals:
- preserve bisectability
- make the revision history traceable after a move

Update dependent files.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
8047d24c48 tests: move .hh files and resources to new locations
The plan is to move the unstructured content of tests/ directory
into the following directories of test/:

test/lib - shared header and source files for unit tests
test/boost - boost unit tests
test/unit - non-boost unit tests
test/manual - tests intended to be run manually
test/resource - binary test resources and configuration files

In order to not break git bisect and preserve the file history,
first move most of the header files and resources.
Update paths to these files in .cc files, which are not moved.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
644595e15f tests: remove executable property from data_listeners_test.cc
The executable flag must have been committed to git by mistake.
2019-12-16 17:47:41 +03:00
Benny Halevy
d2e00abe13 tests: commitlog_test: test_allocation_failure: improve error reporting
We're seeing the following error from test from time to time:
  fatal error: in "test_allocation_failure": std::runtime_error: Did not get expected exception from writing too large record

This is not reproducible and the error string does not contain
enough information to figure out what happened exactly, therefore
this patch adds an exception if the call succeeded unexpectedly
and also prints the unexpected exception if one was caught.

Refs #4714

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191215052434.129641-1-bhalevy@scylladb.com>
2019-12-16 15:38:48 +01:00
Asias He
6b7344f6e5 streaming: Fix typo in stream_result_future::maybe_complete
s/progess/progress/

Refs: #5437
2019-12-16 11:12:03 +02:00
Dejan Mircevski
f3883cd935 dbuild: Fix podman invocation (#5481)
The is_podman check was depending on `docker -v` printing "podman" in
the output, but that doesn't actually work, since podman prints $0.
Use `docker --help` instead, which will output "podman".

Also return podman's return status, which was previously being
dropped.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-16 11:11:48 +02:00
Avi Kivity
00ae4af94c Merge "Sanitize and speed-up (a bit) directories set up" from Pavel
"
On start there are two things that scylla does on data/commitlog/etc.
dirs: locks and verifies permissions. Right now these two actions are
managed by different approaches, it's convenient to merge them.

Also, the directories class introduced in this set lays the groundwork for
better --workdir option handling. In particular, right now the db::config
entries are modified after options parse to update directories with
the workdir prefix. With the directories class at hand, we will be able
to stop doing this.
"

* 'br-directories-cleanup' of https://github.com/xemul/scylla:
  directories: Make internals work on fs::path
  directories: Cleanup adding dirs to the vector to work on
  directories: Drop seastar::async usage
  directories: Do touch_and_lock and verify sequentially
  directories: Do touch_and_lock in parallel
  directories: Move the whole stuff into own .cc file
  directories: Move all the dirs code into .init method
  file_lock: Work with fs::path, not sstring
2019-12-15 16:02:46 +02:00
Takuya ASADA
5e502ccea9 install.sh: setup workdir correctly on nonroot mode
Specify correct workdir on nonroot mode, to set correct path of
data / commitlog / hints directories at once.

Fixes #5475

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191213012755.194145-1-syuu@scylladb.com>
2019-12-15 16:00:57 +02:00
Avi Kivity
c25d51a4ea Revert "scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)"
This reverts commit 4333b37f9e. It breaks upgrades,
and the user question is not informative enough for the user to make a correct
decision.

Fixes #5478.
Fixes #5480.
2019-12-15 14:37:40 +02:00
Pavel Emelyanov
23a8d32920 directories: Make internals work on fs::path
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
373fcfdb3e directories: Cleanup adding dirs to the vector to work on
The unordered_set is turned into vector since for fs::path
there's no hash() method that's needed for set.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
14437da769 directories: Drop seastar::async usage
Now the only future-returning operation remaining is the call to
parallel_for_each(), all the rest is non-blocking preparation,
so we can drop the seastar::async and just return the future
from parallel_for_each.

The indentation is now good, as in the previous patch it was prepared
just for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
06f4f3e6d8 directories: Do touch_and_lock and verify sequentially
The goal is to drop the seastar::async() usage.

Currently we have two places that return futures -- calls to
parallel_for_each-s.  We can either chain them together or,
since both are working on the same set of directories, chain
actions inside them.

For code simplicity I propose to chain actions.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
8d0c820aa1 directories: Do touch_and_lock in parallel
The list of paths that should be touch-and-locked is already
at hand; this shortens the code and makes it slightly faster
(in theory).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
71a528d404 directories: Move the whole stuff into own .cc file
In order not to pollute the root dir place the code in
utils/ directory, "utils" namespace.

While doing this -- move the touch_and_lock from the
class declaration.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Benny Halevy
9ec98324ed messaging_service: unregister_handler: return rpc unregister_handler future
Now that seastar returns it.

Fixes https://github.com/scylladb/scylla/issues/5228

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191212143214.99328-1-bhalevy@scylladb.com>
2019-12-12 16:38:36 +02:00
Pavel Emelyanov
f2b3c17e66 directories: Move all the dirs code into .init method
The seastar::async usage is temporary, added for bisect-safety,
soon it will go away. For this reason the indentation in the
.init method is not "canonical", but is prepared for one-patch
drop of the seastar::async.

The hinted_handoff_enabled arg is there, as it's not just a
parameter on config, it had been parsed in main.cc.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 17:33:11 +03:00
Pavel Emelyanov
82ef2a7730 file_lock: Work with fs::path, not sstring
The main.cc code that converts sstring to fs::path
will be patched soon, the file_desc::open belongs
to seastar and works on sstrings.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 17:32:10 +03:00
Konstantin Osipov
bc482ee666 test.py: remove an unused option
Message-Id: <20191204142622.89920-2-kostja@scylladb.com>
2019-12-12 15:53:35 +02:00
Avi Kivity
64cade15cc Merge "bouncing lwt request to an owning shard" from Gleb
"
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt.  It works by
returning an error from the lwt code if the current shard is the wrong one;
the error specifies the shard the request should be moved to. The error is
processed by the transport code, which jumps to the correct shard and
re-processes the incoming message there.
"

* 'gleb/bounce_lwt_request' of github.com:scylladb/seastar-dev:
  lwt: take raw lock for entire cas duration
  lwt: drop invoke_on in paxos_state prepare and accept
  lwt: Process lwt request on a owning shard
  storage_service: move start_native_transport into a thread
  transport: change make_result to takes a reference to cql result instead of shared_ptr
2019-12-12 15:50:22 +02:00
Nadav Har'El
9f62a3538c alternator: fix BEGINS_WITH operator for blobs
The implementation of Expected's BEGINS_WITH operator on blobs was
incorrect, naively comparing the base64-encoded strings, which doesn't
work. This patch fixes the code to compare the decoded strings.

The reason why the BEGINS_WITH test missed this bug was that we forgot
to check the blob case and only tested the string case; So this patch
also adds the missing test - which reproduces this bug, and verifies
its fix.

Fixes #5457

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191211115526.29862-1-nyh@scylladb.com>
2019-12-12 14:02:56 +01:00
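Why the naive comparison in 9f62a3538c fails can be seen with a two-byte prefix. The sketch below uses the standard `base64` module; the helper name `begins_with_blob` is hypothetical and does not mirror the Alternator code.

```python
import base64

def begins_with_blob(value_b64: str, prefix_b64: str) -> bool:
    """Compare the decoded bytes, not the base64 text."""
    value = base64.b64decode(value_b64)
    prefix = base64.b64decode(prefix_b64)
    return value.startswith(prefix)

value_b64 = base64.b64encode(b"abc").decode()  # "YWJj"
prefix_b64 = base64.b64encode(b"ab").decode()  # "YWI="

# Naive check on the encoded text: padding and the shared 6-bit group
# make the encoded prefix differ from the start of the encoded value.
naive = value_b64.startswith(prefix_b64)
# Correct check on the decoded bytes.
correct = begins_with_blob(value_b64, prefix_b64)
```

Here `b"abc"` does begin with `b"ab"`, yet `"YWJj"` does not begin with `"YWI="`, so `naive` is `False` while `correct` is `True`.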
Dejan Mircevski
27b8b6fe9d cql3: Fix needs_filtering() for clustering columns
The LIKE operator requires filtering, so needs_filtering() must check
is_LIKE().  This already happens for partition columns, but it was
overlooked for clustering columns in the initial implementation of
LIKE.

Fixes #5400.

Tests: unit(dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-12 01:19:13 +02:00
Benny Halevy
d1bcb39e7f hinted handoff: log message after removing hints directory (#5372)
To be used by dtest as an indicator that endpoint's hints
were drained and hints directory is removed.

Refs #5354

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-12 01:16:19 +02:00
Rafael Ávila de Espíndola
3b61cf3f0b db: Don't use lw_shared_ptr for user_types_metadata
The user_types_metadata can simply be owned by the keyspace. This
simplifies the code since we never have to worry about nulls and the
ownership is now explicit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
a55838323b user_types_metadata: don't implement enable_lw_shared_from_this
It looks like this was done just to avoid including
user_types_metadata.hh, which seems a bit much considering that it
requires adding specialization to the seastar namespace.

A followup patch will also stop using lw_shared_ptr for
user_types_metadata.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
f7c2c60b07 cql3: pass a const user_types_metadata& to prepare_internal
We never modify the user_types_metadata via prepare_internal, so we
can pass it a const reference.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
99cb8965be db: drop special case for top level UDTs
This was originally done in 7f64a6ec4b,
but that commit was reverted in
8517eecc28.

The revert was done because the original change would call parse_raw
for non UDT types. Unlike the old patch, this one doesn't change the
behavior of non UDT types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
7ae9955c5f db: simplify db::cql_type_parser::parse
The variant of db::cql_type_parser::parse that has a
user_types_metadata argument was only used from the variant that
didn't. This inlines one in the other.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
2092e1ef6f db: Don't create a reference to nullptr
The user_types variable can be null during db startup since we have to
create types before reading the system table defining user types.

This avoids undefined behavior, but it is unlikely that it was causing
more serious problems since the variable is only used when creating
user types and we don't create any until after all system tables are
read, in which case the user_types variable is not null.

Fixes #5193

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
6143941535 Add test for loading a schema with a non native type
This would have found the error with the previous version of the patch
series.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:43:34 -08:00
Gleb Natapov
64cfb9b1f6 lwt: take raw lock for entire cas duration
It will prevent parallel update by the same coordinator and should
reduce contention.
2019-12-11 14:41:31 +02:00
Gleb Natapov
898d2330a2 lwt: drop invoke_on in paxos_state prepare and accept
Since lwt requests are now running on an owning shard there is no longer
a need to invoke cross shard call.
2019-12-11 14:41:31 +02:00
Gleb Natapov
964c532c4f lwt: Process lwt request on a owning shard
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt.  It works by
returning an error from the lwt code if the current shard is the wrong one;
the error specifies the shard the request should be moved to. The error is
processed by the transport code, which jumps to the correct shard and
re-processes the incoming message there.
2019-12-11 14:41:31 +02:00
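The bounce mechanism described in 964c532c4f can be sketched in a few lines. Everything here is hypothetical: the exception name, the function names, and in particular the modulo mapping from token to shard, which is a toy stand-in for the real sharding function.

```python
SHARD_COUNT = 4

class BounceToShard(Exception):
    """Hypothetical error carrying the shard the request should move to."""
    def __init__(self, shard: int):
        super().__init__(f"move to shard {shard}")
        self.shard = shard

def owning_shard(token: int) -> int:
    # Assumption: a toy mapping, not the real token-to-shard function.
    return token % SHARD_COUNT

def run_lwt(current_shard: int, token: int) -> str:
    owner = owning_shard(token)
    if current_shard != owner:
        # Wrong shard: report where the request belongs instead of running.
        raise BounceToShard(owner)
    return f"cas for token {token} executed on shard {current_shard}"

def handle_request(arrival_shard: int, token: int) -> str:
    """Transport layer: retry once on the shard named in the error."""
    try:
        return run_lwt(arrival_shard, token)
    except BounceToShard as bounce:
        return run_lwt(bounce.shard, token)

result = handle_request(arrival_shard=0, token=10)
```

The request is moved at most once, before any LWT work starts, rather than hopping to the owning shard repeatedly during processing.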
Gleb Natapov
54be057af3 storage_service: move start_native_transport into a thread
The code runs only once and it is simple if it runs in a seastar thread.
2019-12-11 14:41:31 +02:00
Gleb Natapov
007ba3e38e transport: change make_result to takes a reference to cql result instead of shared_ptr
2019-12-11 14:41:31 +02:00
Nadav Har'El
9e5c6995a3 alternator-test: add tests for ReturnValues parameter
This patch adds comprehensive tests for the ReturnValues parameter of
the write operations (PutItem, UpdateItem, DeleteItem), which can return
pre-write or post-write values of the modified item. The tests are in
a new test file, alternator-test/test_returnvalues.py.

This feature is not yet implemented in Alternator, so all the new
tests xfail on Alternator (and all pass on AWS).

Refs #5053

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127163735.19499-1-nyh@scylladb.com>
2019-12-11 13:26:39 +01:00
Nadav Har'El
ab69bfc111 alternator-test: add xfailing tests for ScanIndexForward
This patch adds tests for Query's "ScanIndexForward" parameter, which
can be used to return items in reversed sort order.
We test that a Limit works and returns the given number of *last* items
in the sort order, and also that such reverse queries can be resumed,
i.e., paging works in the reverse order.

These tests pass against AWS DynamoDB, but fail against Alternator (which
doesn't support ScanIndexForward yet), so they are marked xfail.

Refs #5153.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127114657.14953-1-nyh@scylladb.com>
2019-12-11 13:26:39 +01:00
Pekka Enberg
6bc18ba713 storage_proxy: Remove reference to MBean interface
The JMX interface is implemented by the scylla-jmx project, not scylla.
Therefore, let's remove this historical reference to MBeans from
storage_proxy.

Message-Id: <20191211121652.22461-1-penberg@scylladb.com>
2019-12-11 14:24:28 +02:00
Avi Kivity
63474a3380 Merge "Add experimental_features option" from Dejan
"
Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser.

Fixes #5338
"

* 'vecexper' of https://github.com/dekimir/scylla:
  config: Add `experimental_features` option
  utils: Add enum_option
2019-12-11 14:23:08 +02:00
Avi Kivity
56b9bdc90f Update seastar submodule
* seastar e440e831c8...00da4c8760 (7):
  > Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi
Fixes #5443.
  > install-dependencies.sh: fix arch dependencies
  > Merge " rpc: fix use-after-free during rpc teardown vs. rpc server message handling" from Benny
  > Merge "testing: improve the observability of abandoned failed futures" from Botond
  > rework the fair_queue tester
  > directory_test: Update to use run instead of run_deprecated
  > log: support fmt 6.0 branch with chrono.h for log
2019-12-11 14:17:49 +02:00
Benny Halevy
105c8ef5a9 messaging_service: wait on unregister_handler
Prepare for returning future<> from seastar rpc
unregister_handler.

Refs https://github.com/scylladb/scylla/issues/5228

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191208153924.1953-1-bhalevy@scylladb.com>
2019-12-11 14:17:41 +02:00
Nadav Har'El
06c3802a1a storage_proxy: avoid overflow in view-backlog delay calculation
In the calculate_delay() code for view-backlog flow control, we calculate
a delay and cap it at a "budget" - the remaining timeout. This timeout is
measured in milliseconds, but the capping calculation converted it into
microseconds, which overflowed if the timeout is very large. This causes
some tests which enable the UB sanitizer to fail.

We fix this problem by comparing the delay to the budget in millisecond
resolution, not in microsecond resolution. Then, if the calculated delay
is short enough, we return it using its full microsecond resolution.
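
A minimal sketch of the capping logic described above, in Python. This is not the actual `calculate_delay()` code; the function name and the explicit `INT64_MAX` bound are assumptions used to model a signed 64-bit microsecond counter:

```python
INT64_MAX = 2**63 - 1  # range of a signed 64-bit microsecond counter


def capped_delay_us(delay_us: int, budget_ms: int) -> int:
    """Cap a microsecond delay at a millisecond budget without ever
    converting a large budget to microseconds."""
    if delay_us // 1000 < budget_ms:
        # The delay fits in the budget: keep its full microsecond resolution.
        return delay_us
    # Here budget_ms <= delay_us // 1000, so budget_ms * 1000 <= delay_us
    # and the conversion stays within the range that delay_us already fits in.
    return budget_ms * 1000


huge_budget_ms = INT64_MAX // 10
# The naive conversion of such a budget would not fit in 64 bits:
assert huge_budget_ms * 1000 > INT64_MAX
```

The comparison is done at the coarser resolution, and the budget is converted to microseconds only when it is known to be no larger than the delay.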

Fixes #5412

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191205131130.16793-1-nyh@scylladb.com>
2019-12-11 14:10:54 +02:00
Nadav Har'El
2824d8f6aa Merge: alternator: Fix EQ operator for sets
Merged pull request https://github.com/scylladb/scylla/pull/5453
from Piotr Sarna:

Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:

[1, 3, 2] == [1, 2, 3]

Fixes #5021
2019-12-11 13:20:25 +02:00
Piotr Sarna
421db1dc9d alternator-test: remove XFAIL from set EQ test
With this series merged, test_update_expected_1_eq_set from
test_expected.py suite starts passing.
2019-12-11 12:07:39 +01:00
Piotr Sarna
a8e45683cb alternator: add EQ comparison for sets
Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:
[1, 3, 2] == [1, 2, 3]
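
A minimal sketch of the idea in Python, assuming attributes are represented as single-key typed JSON objects such as `{"NS": ["1", "2"]}`. The function name `attribute_eq` is hypothetical, and this is not the C++ implementation:

```python
SET_TYPES = {"SS", "NS", "BS"}


def attribute_eq(a: dict, b: dict) -> bool:
    """Compare two single-key typed attribute values."""
    if a.keys() != b.keys():
        return False  # different type tags are never equal
    (type_tag, value_a), = a.items()
    value_b = b[type_tag]
    if type_tag in SET_TYPES:
        # Sets compare equal regardless of the stored element order.
        return sorted(value_a) == sorted(value_b)
    # Everything else is compared as plain JSON, where order matters.
    return value_a == value_b
```

The sketch compares set elements as strings, so it does not normalize numbers (for example "1" versus "1.0").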

Fixes #5021
2019-12-11 12:07:39 +01:00
Piotr Sarna
fb37394995 schema_tables: notify table deletions before creations
If a set of mutations contains both an entry that deletes a table
and an entry that adds a table with the same name, it's expected
to be a replacement operation (delete old + create new),
rather than a useless "try to create a table even though it exists
already and then immediately delete the original one" operation.
As such, notifications about the deletions should be performed
before notifications about the creations. The place that originally
suffered from this wrong order is view building - which in this case
created an incorrect duplicated entry in the view building bookkeeping,
and then immediately deleted it, resulting in having old, deprecated
entries with stale UUIDs lying in the build queue and never proceeding,
because the underlying table is long gone.
The issue is fixed by ensuring the order of notifications:
 - drops are announced first, view drops are announced before table drops;
 - creations follow, table creations are announced before views;
 - finally, changes to tables and views are announced;
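
The ordering above can be sketched as a stable sort over a rank table. This is a Python illustration only; the `(action, kind, name)` event tuples and the `RANK` table are hypothetical and do not mirror the actual schema_tables code:

```python
# Lower rank is announced earlier.
RANK = {
    ("drop", "view"): 0,     # view drops before table drops
    ("drop", "table"): 1,
    ("create", "table"): 2,  # table creations before view creations
    ("create", "view"): 3,
    ("update", "table"): 4,  # changes come last
    ("update", "view"): 4,
}


def order_notifications(events):
    """events: iterable of (action, kind, name) tuples."""
    # sorted() is stable, so events of equal rank keep their relative order.
    return sorted(events, key=lambda e: RANK[(e[0], e[1])])
```

With this order, a "drop t" followed by "create t" in the same mutation set is always seen by listeners as a replacement.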

Fixes #4382

Tests: unit(dev), mv_populating_from_existing_data_during_node_stop_test
2019-12-11 12:48:29 +02:00
Benny Halevy
d544df6c3c dist/ami/build_ami.sh: support incremental build of rpms (#5191)
Iterate over an array holding all rpm names to see if any
of them is missing from `dist/ami/files`. If they are missing,
look them up in build/redhat/RPMS/x86_64 so that if reloc/build_rpm.sh
was run manually before dist/ami/build_ami.sh we can just collect
the built rpms from its output dir.

If we're still missing any rpms, then run reloc/build_rpm.sh
and copy the required rpms from build/redhat/RPMS/x86_64.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
2019-12-11 12:48:29 +02:00
Amnon Heiman
f43285f39a api: replace swagger definition to use long instead of int (#5380)
In swagger 1.2 int is defined as int32.

We originally used int following the jmx definition, but in practice
we use uint and int64 internally in many places.

While the API formats the type correctly, an external system that uses
a swagger-based code generator can run into type problems.

This patch replaces all uses of int in a return type with long, which is defined as int64.

Changing the return type has no impact on the system, but it does help
external systems that use a code generator based on swagger.

Fixes #5347

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-12-11 12:48:29 +02:00
Nadav Har'El
2abac32f2e Merged: alternator: Implement CONTAINS and NOT_CONTAINS in Expected
Merged pull request https://github.com/scylladb/scylla/pull/5447
by Dejan Mircevski.

Adds the last missing operators in the "Expected" parameter and re-enables
their tests.

Fixes #5034.
2019-12-11 12:48:29 +02:00
Cem Sancak
86b8036502 Fix DPDK mode in prepare script
Fixes #5455.
2019-12-11 12:48:29 +02:00
Calle Wilund
35089da983 conf/config: Add better descriptive text on server/client encryption
Provide some explanation on prio strings + direction to gnutls manual.
Document client auth option.
Remove confusing/misleading statement on "custom options"

Message-Id: <20191210123714.12278-1-calle@scylladb.com>
2019-12-11 12:48:28 +02:00
Dejan Mircevski
32af150f1d alternator: Implement NOT_CONTAINS operator in Expected
Enable existing NOT_CONTAINS test, add NOT_CONTAINS to the list of
recognized operators, implement check_NOT_CONTAINS, and hook it up to
verify_expected_one().

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 15:31:47 -05:00
Dejan Mircevski
bd2bd3c7c8 alternator: Implement CONTAINS operator in Expected
Enable existing CONTAINS test, implement check_CONTAINS, and hook it
up to verify_expected_one().

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 15:31:47 -05:00
Dejan Mircevski
5a56fd384c config: Add experimental_features option
When the user wants to turn on only some experimental features, they
can use this new option.  The existing `experimental` option is
preserved for backwards compatibility.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 11:47:03 -05:00
Piotr Sarna
9504bbf5a4 alternator: move unwrap_set to serialization header
The utility function for unwrapping a set is going to be useful
across source files, so it's moved to serialization.hh/serialization.cc.
2019-12-10 15:08:47 +01:00
Piotr Sarna
4660e58088 alternator: move rjson value comparison to rjson.hh
The comparison struct is going to be useful across source files,
so it's moved into rjson header, where it conceptually belongs anyway.
2019-12-10 15:08:47 +01:00
Botond Dénes
db0e2d8f90 scylla-gdb.py: document and add safety net to seastar::thread related commands
Almost all commands provided by `scylla-gdb.py` are safe to use. The
worst that could happen if they fail is that you won't get the desired
information. There is one notable exception: `scylla thread`. If
anything goes wrong while this command is executed - gdb crashes, a bug
in the command, etc. - there is a good chance the process under
examination will crash. Sometimes this is fine, but other times e.g.
when live debugging a production node, this is unacceptable.
To avoid any accidents add documentation to all commands working with
`seastar::thread`. And since most people don't read documentation,
especially when debugging under pressure, add a safety net to the
`scylla thread` command. When run, this command will now warn of the
dangers and will ask for explicit acknowledgment of the risk of crash,
by means of passing an `--iamsure` flag. When this flag is missing, it
will refuse to run. I am sure this will be very annoying but I am also
sure that the avoided crashes are worth it.

As part of making `scylla thread` safe, its argument parsing code is
migrated to `argparse`. This changes the usage but this should be fine
because it is well documented.
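
A minimal sketch of such a safety net with `argparse`, runnable outside gdb. Only the `--iamsure` flag comes from the description above; the function name, the positional `thread` argument, and the error text are hypothetical:

```python
import argparse


def parse_thread_args(argv):
    """Parse arguments for a dangerous command and insist on an explicit ack."""
    parser = argparse.ArgumentParser(prog="scylla thread")
    parser.add_argument("--iamsure", action="store_true",
                        help="acknowledge that this command may crash the process")
    # Hypothetical positional argument naming the thread to operate on.
    parser.add_argument("thread", help="thread to operate on")
    args = parser.parse_args(argv)
    if not args.iamsure:
        # Refuse to run unless the risk was explicitly acknowledged.
        raise RuntimeError("refusing to run: pass --iamsure to accept the risk of a crash")
    return args
```

Making the flag opt-in means the default invocation is the safe one: it only prints the refusal and touches nothing in the debugged process.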

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191129092838.390878-1-bdenes@scylladb.com>
2019-12-10 11:51:57 +02:00
Eliran Sinvani
765db5d14f build_ami: Trim ami description attribute to the allowed size
The ami description attribute is only allowed to be 255
characters long. When build_ami.sh generates an ami, it
generates an ami description which is a concatenation
of all of the components version strings. It can
happen that the description string is too long which
eventually causes the ami build to fail. This patch
trims the description string to 255 characters.
It is ok since the individual versions of the components
are also saved in tags attached to the image.

Tests:
 1. Reproduced with a long description and
    validated that it doesn't fail after the fix.

Fixes #5435

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191209141143.28893-1-eliransin@scylladb.com>
2019-12-10 11:51:57 +02:00
Fabiano Lucchese
4333b37f9e scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)
A Linux machine typically has multiple clocksources with different
performance characteristics. Setting a high-performance clocksource might result in
better performance for ScyllaDB, so this should be considered whenever
starting it up.

This patch adds the option of enforcing an optimized Linux
clocksource in Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing the clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in the scylla_server configuration
file. This parameter is read by perftune.py which, if it is set to "yes", proceeds
to set the clocksource (non-persistently). On x86, the TSC clocksource is
used.

Fixes #4474
2019-12-10 11:51:57 +02:00
Pavel Emelyanov
3a21419fdb features: Remove _FEATURE suffix from hinted_handoff feature name
All the other features are named w/o one. The internal const-s
are all different, but I'm fixing it separately.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191209154310.21649-1-xemul@scylladb.com>
2019-12-10 11:51:57 +02:00
Dejan Mircevski
a26bd9b847 utils: Add enum_option
This allows us to accept command-line options with a predefined set of
valid arguments.
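
The idea can be sketched in Python as a lookup into a fixed set of values that rejects anything else with a message listing the valid choices. This is not the C++ `enum_option` template; the `Feature` members and both function names are hypothetical:

```python
from enum import Enum


class Feature(Enum):
    # Hypothetical feature names, for illustration only.
    LWT = "lwt"
    CDC = "cdc"


def parse_enum_option(value, enum_type):
    """Map a command-line string to a member of enum_type, or fail loudly."""
    try:
        return enum_type(value)
    except ValueError:
        valid = ", ".join(member.value for member in enum_type)
        raise ValueError(f"invalid value {value!r}; expected one of: {valid}") from None


def parse_features(values):
    """Parse a list of option arguments into enum members."""
    return [parse_enum_option(v, Feature) for v in values]
```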

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-09 09:45:59 -05:00
Calle Wilund
7c5e4c527d cdc_test: Add small test for altering base schema (add column) 2019-12-09 14:35:04 +00:00
Calle Wilund
cb0117eb44 cdc: Handle schema changes via migration manager callbacks
This allows us to create/alter/drop log and desc tables "atomically"
with the base, by including these mutations in the original mutation
set, i.e. batch create/alter tables.

Note that population does not happen until types are actually
already put into database (duh), thus there _is_ still a gap
between creating cdc and it being truly usable. This may or may
not need handling later.
2019-12-09 14:35:04 +00:00
Rafael Ávila de Espíndola
761b19cee5 build: Split the build and host linker flags
A general build system knows about 3 machines:

* build: where the building is running
* host: where the built software will run
* target: the machine the software will produce code for

The target machine is only relevant for compilers, so we can ignore
it.

Until now we could ignore the build and host distinction too. This
patch adds the first difference: don't use host ld_flags when linking
build tools (gen_crc_combine_table).

The reason for this change is to make it possible to build with
-Wl,--dynamic-linker pointing to a path that will exist on the host
machine, but may not exist on the build machine.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191207030408.987508-1-espindola@scylladb.com>
2019-12-09 15:54:57 +02:00
Calle Wilund
27183f648d migration_manager: Invoke "before" callbacks for table operations
Potentially allowing (cdc) augmentation of mutations.

Note: only does the listener part in seastar::thread, to avoid
changing call behaviour.
2019-12-09 12:12:09 +00:00
Calle Wilund
f78a3bf656 migration_listener: Add empty base class and "before" callbacks for tables
Empty base type makes for less boilerplate in implementations.
The "before" callbacks are for listeners who need to potentially
react/augment type creation/alteration _before_ actually
committing type to schema tables (and holding the semaphore for this).

I.e. it is for cdc to add/modify log/desc tables "atomically" with base.
2019-12-09 12:12:09 +00:00
Calle Wilund
4e406105b1 cql_test_env: Include cdc service in cql tests 2019-12-09 12:12:09 +00:00
Calle Wilund
a21e140169 cdc: Add sharded service that does nothing.
But can be used to hang functionality into eventually.
2019-12-09 12:12:09 +00:00
Calle Wilund
2787b0c4f8 cdc: Move "options" to separate header to avoid too much header inclusion
cdc should not contaminate the whole universe.
2019-12-09 12:12:09 +00:00
fastio
8f326b28f4 Redis: Combine all the source files redis/commands/* into redis/commands.{hh,cc}
Fixes: #5394

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-08 13:54:33 +02:00
Avi Kivity
9c63cd8da5 sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417)
The vm.swappiness sysctl controls the kernel's preference for swapping
anonymous memory vs page cache. Since Scylla uses very large amounts
of anonymous memory, and tiny amounts of page cache, the correct setting
is to prefer swapping page cache. If the kernel swaps anonymous memory
the reactor will stall until the page fault is satisfied. On the other
hand, page cache pages usually belong to other applications, usually
backup processes that read Scylla files.

This setting has been used in production in Scylla Cloud for a while
with good results.

Users can opt out by not installing the scylla-kernel-conf package
(same as with the other kernel tunables).
2019-12-08 13:04:25 +02:00
Avi Kivity
0e319e0359 Update seastar submodule
* seastar 166061da3...e440e831c (8):
  > Fail tests on ubsan errors
  > future: make a couple of asserts more strict
  > future: Move make_ready out of line
  > config: Do not allow zero rates
Fixes #5360
  > future: add new state to avoid temporaries in get_available_state().
  > future: avoid temporary future_state on get_available_state().
  > future: inline future::abandoned
  > noncopyable_function: Avoid uninitialized warning on empty types
2019-12-06 18:33:23 +02:00
Piotr Sarna
0718ff5133 Merge 'min/max on collections returns human-readable result' from Juliusz
Previously, scylla used min/max(blob)->blob overload for collections,
tuples and UDTs; effectively making the results being printed as blobs.
This PR adds "dynamically"-typed min()/max() functions for compound types.

These types can be complicated, like map<int,set<tuple<..., and created
in runtime, so functions for them are created on-demand,
similarly to tojson(). The comparison remains unchanged - underneath
this is still byte-by-byte weak lex ordering.

Fixes #5139

* jul-stas/5139-minmax-bad-printing-collections:
  cql_query_tests: Added tests for min/max/count on collections
  cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
2019-12-06 16:40:17 +01:00
Juliusz Stasiewicz
75955beb0b cql_query_tests: Added tests for min/max/count on collections
This tests new min/max function for collections and tuples. CFs
in test suite were named according to types being tested, e.g.
`cf_map<int,text>', which is not a valid CF name. Therefore, these
names required "escaping" of invalid characters, here: simply
replacing with '_'.
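
A minimal Python sketch of that kind of escaping, assuming valid characters are ASCII letters, digits, and underscore. The helper name `escape_cf_name` is hypothetical; the tests themselves are C++:

```python
import re


def escape_cf_name(type_name: str) -> str:
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", type_name)
```

For example, `escape_cf_name("cf_map<int,text>")` yields `cf_map_int_text_`.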
2019-12-06 12:15:49 +01:00
Juliusz Stasiewicz
9efad36fb8 cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
Before:
cqlsh> insert into ks.list_types (id, val) values (1, [3,4,5]);
cqlsh> select max(val) from ks.list_types;

 system.max(val)
------------------------------------------------------------
 0x00000003000000040000000300000004000000040000000400000005

After:
cqlsh> select max(val) from ks.list_types;

 system.max(val)
--------------------
 [3, 4, 5]

This is accomplished similarly to `tojson()`/`fromjson()`: functions
are generated on demand from within `cql3::functions::get()`.
Because collections can have a variety of types, including UDTs
and tuples, it would be impossible to statically define max(T t)->T
for every T. Until now, max(blob)->blob overload was used.

Because `impl_max/min_function_for` is templated with the
input/output type, which can be defined in runtime, we need type-erased
("dynamic") versions of these functors. They work identically, i.e.
they compare byte representations of lhs and rhs with
`bytes::operator<`.
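
A small Python sketch of what comparing byte representations means for a list of ints. The layout used here (a 4-byte element count, then a 4-byte length and a 4-byte big-endian value per element) is an assumption inferred from the blob in the "Before" output, which it appears to reproduce for `[3, 4, 5]`. The function names are hypothetical:

```python
import struct


def serialize_int_list(values):
    """Element count, then each element as a 4-byte length plus a 4-byte int."""
    out = struct.pack(">i", len(values))
    for v in values:
        out += struct.pack(">i", 4) + struct.pack(">i", v)
    return out


def max_by_bytes(rows):
    # Python compares bytes objects lexicographically, byte by byte.
    return max(rows, key=serialize_int_list)
```

Note that a byte-wise order is not always the numeric order: in this layout a list with more elements sorts after a shorter one, and negative ints compare as large unsigned bytes.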

Resolves #5139
2019-12-06 12:14:51 +01:00
Avi Kivity
a18a921308 docs: maintainer.md: use command line to merge multi-commit pull requests
If you merge a pull request that contains multiple patches via
the github interface, it will document itself as the committer.

Work around this brain damage by using the command line.
2019-12-06 10:59:46 +01:00
Botond Dénes
7b37a700e1 configure.py: make tests explicitly depend on libseastar_testing.a
So that changes to libseastar_testing.a make all test targets out of
date.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191205142436.560823-1-bdenes@scylladb.com>
2019-12-05 19:30:34 +02:00
Piotr Sarna
3a46b1bb2b Merge "handle hints on separate connection and scheduling group" from Piotr
Introduce a new verb dedicated to receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by the coordinator during a write.

The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of request saturates the connection, negatively impacting the other one.

Information about new RPC support is propagated through new gossip feature HINTED_HANDOFF_SEPARATE_CONNECTION.

Fixes #4974.

Tests: unit(release)
2019-12-05 17:25:26 +01:00
Calle Wilund
c11874d851 gms::inet_address: Use special ostream formatting to match Java
To make gms::inet_address::to_string() similar in output to origin.
The sole purpose being quick and easy fix of API/JMX ipv6
formatting of endpoints etc, where strings are used as lexical
comparisons instead of textual representation.

A better, but more work, solution is to fix the scylla-jmx
bridge to do explicit parse + re-format of addresses, but there
are many such callpoints.

An even better solution would be to fix nodetool to not make this
mistake of doing lexical comparisons, but then we risk breaking
merge compatibility. But could be an option for a separate
nodeprobe impl.

Message-Id: <20191204135319.1142-1-calle@scylladb.com>
2019-12-05 17:01:26 +02:00
Gleb Natapov
4893bc9139 tracing: split adding prepared query parameters from stopping of a trace
Currently the query_options object is passed to a trace stopping function,
which makes it mandatory to keep it alive until the end of the
query. The reason for that is to add prepared statement parameters to
the trace.  All other query options that we want to put in the trace are
copied into trace_state::params_values, so let's copy prepared statement
parameters there too. The trace-enabled case will become a little bit more
expensive, but on the other hand we can drop a continuation that keeps the
query_options object alive from the fast path. It is safe to drop the call
to stop_foreground_prepared() here since the tracing will be stopped
in process_request_one().

Message-Id: <20191205102026.GJ9084@scylladb.com>
2019-12-05 17:00:47 +02:00
Tomasz Grabiec
aa173898d6 Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz
Selected semaphores' names are now included in exception messages in
case of timeout or when admission queue overflows.

Resolves #5281
2019-12-05 14:19:56 +01:00
Nadav Har'El
5b2f35a21a Merge "Redis: fix the options related to Redis API, fix the DEL and GET command"
Merged pull request https://github.com/scylladb/scylla/pull/5381 by
Peng Jian, fixing multiple small issues with Redis:

* Rename the options related to Redis API, and describe them clearly.
* Rename redis_transport_port to redis_port
* Rename redis_transport_port_ssl to redis_ssl_port
* Rename redis_default_database_count to redis_database_count
* Remove unnecessary option enable_redis_protocol
* Modify the default value of options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM

* Fix the DEL command: support deleting multiple keys in one command.

* Fix the GET command: return the empty string when the requested key does not exist.

* Fix the redis-test/test_del_non_existent_key: mark xfail.
2019-12-05 11:58:34 +02:00
Avi Kivity
85822c7786 database: fix schema use-after-move in make_multishard_streaming_reader
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.

Fix by evaluating full_slice before moving the schema.

Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.

Fixes #5419.
2019-12-05 11:58:34 +02:00
Piotr Sarna
79c3a508f4 table: Reduce read amplification in view update generation
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs #5409
Fixes #4615
Fixes #5418
2019-12-05 11:58:34 +02:00
Konstantin Osipov
6a5e7c0e22 tests: reduce the number of iterations of dynamic_bitset_test
The execution time of this test dominates, by a serious margin, the
total test execution time in dev/release mode: reducing it
improves the test.py turnaround by over 70%.

Message-Id: <20191204135315.86374-2-kostja@scylladb.com>
2019-12-05 11:58:34 +02:00
Avi Kivity
07427c89a2 gdb: change 'scylla thread' command to access fs_base register directly
Currently, 'scylla thread' uses arch_prctl() to extract the value of
fsbase, used to reference thread local variables. gdb 8 added support
for directly accessing the value as $fs_base, so use that instead. This
works from core dumps as well as live processes, as you don't need to
execute inferior functions.

The patch is required for debugging threads in core dumps, but not
sufficient, as we still need to set $rip and $rsp, and gdb still[1]
doesn't allow this.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=9370
2019-12-05 11:58:34 +02:00
Piotr Dulikowski
adfa7d7b8d messaging_service: don't move unsigned values in handlers
Performing std::move on integral types is pointless. This commit gets
rid of moves of values of `unsigned` type in rpc handlers.
2019-12-05 00:58:31 +01:00
Piotr Dulikowski
77d2ceaeba storage_proxy: handle hints through separate rpc verb 2019-12-05 00:51:52 +01:00
Piotr Dulikowski
2609065090 storage_proxy: move register_mutation handler to local lambda
This refactor makes it possible to reuse the lambda in following
commits.
2019-12-05 00:51:52 +01:00
Piotr Dulikowski
6198ee2735 hh: introduce HINTED_HANDOFF_SEPARATE_CONNECTION feature
The feature introduced by this commit declares that hints can be sent
using the new dedicated RPC verb. Before using the new verb, nodes need
to know if other nodes in the cluster will be able to handle the new
RPC verb.
2019-12-05 00:51:52 +01:00
Piotr Dulikowski
2e802ca650 hh: add HINT_MUTATION verb
Introduce a new verb dedicated to receiving and sending hints:
HINT_MUTATION. It is handled on the streaming connection, which is
separate from the one used for handling mutations sent by the coordinator
during a write.

The intent of using a separate connection is to increase fairness while
handling hints and user requests - this way, a situation can be avoided
in which one type of request saturates the connection, negatively
impacting the other one.
2019-12-05 00:51:49 +01:00
Avi Kivity
fd951a36e3 Merge "Let compaction wait on background deletions" from Benny
"
In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done.
However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted.

This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish.

Fixes #4909

Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction
"
2019-12-04 11:18:41 +02:00
Takuya ASADA
c9d8606786 dist/common/scripts/scylla_ntp_setup: relax RHEL version check
We may be able to use the chrony setup script on future versions of RHEL/CentOS,
so it is better to run chrony setup when the RHEL version is >= 8, not only 8.

Note that on Fedora it still provides ntp/ntpdate package, so we run
ntp setup on it for now. (same on debian variants)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191203192812.5861-1-syuu@scylladb.com>
2019-12-04 10:59:14 +02:00
Juliusz Stasiewicz
430b2ad19d commitlog+region_group: timeout exceptions with names
`segment_manager' now uses a decorated version of `timed_out_error'
with hardcoded name. On the other hand `region_group' uses named
`on_request_expiry' within its `expiring_fifo'.
2019-12-03 19:07:19 +01:00
Avi Kivity
91d3f2afce docs: maintainers.md: fix typo in git push --force-with-lease
Just one lease, not many.

Reported by Piotr Sarna.
2019-12-03 18:17:46 +01:00
Calle Wilund
56a5e0a251 commitlog_replayer: Ensure applied frozen_mutation is safe during apply
Fixes #5211

In 79935df959 replay apply-call was
changed from one with no continuation to one with. But the frozen
mutation arg was still just lambda local.

Change to use do_with for this case as well.

Message-Id: <20191203162606.1664-1-calle@scylladb.com>
2019-12-03 18:28:01 +02:00
Juliusz Stasiewicz
d043393f52 db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore
Exception messages contain semaphore's name (provided in ctor).
This affects the queue overflow exception as well as timeout
exception. Also, custom throwing function in ctor was changed
to `prethrow_action', i.e. metrics can still be updated there but
now callers have no control over the type of the exception being
thrown. This affected `restricted_reader_max_queue_length' test.
`reader_concurrency_semaphore'-s docs are updated accordingly.
2019-12-03 15:41:34 +01:00
Amos Kong
e26b396f16 scylla-docker: fix default data_directories in scyllasetup.py (#5399)
Use default data_file_directories if it's not assigned in scylla.yaml

Fixes #5398

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 13:58:17 +02:00
Rafael Ávila de Espíndola
1cd17887fa build: strip debug when configured with --debuginfo 0
In a build configured with --debuginfo 0 the scylla binary still ends
up with some debug info from the libraries that are statically linked
in.

We should avoid compiling subprojects (including seastar) with debug
info when none is needed, but this at least avoids it showing up in
the binary.

The main motivation for this is that it is confusing to get a binary
with *some* debug info in it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191127215843.44992-1-espindola@scylladb.com>
2019-12-03 12:41:04 +02:00
Tomasz Grabiec
0a453e5d30 Merge "Use fragmented buffers for collection de/serialization" from Botond
This series refactors the collection de/serialization code to use
fragmented buffers, avoiding the large allocations and the associated
pains when working with large collections. Currently all operations that
involve collections require deserializing them, executing the operation,
then serializing them again to their internal storage format. The
de/serialization operations happen in linearized buffers, which means
that we have to allocate a buffer large enough to hold the *entire*
collection. This can cause immense pressure on the memory allocator,
which, in the face of memory fragmentation, might be unable to serve the
allocation at all. We've seen this causing all sorts of nasty problems,
including but not limited to: failing compactions, failing memtable
flushes, OOM crashes, etc.

Users are strongly discouraged from using large collections, yet they
are still a fact of life and have been haunting us since forever.

The proper solution for these problems would be to come up with an
in-memory format for collections, however that is a major effort, with a
lot of unknowns. This is something we plan on doing at some point but
until it happens we should make life less painful for those with large
collections.

The goal of this series is to avoid the need of allocating these large
buffers. Serialization now happens into a `bytes_ostream` which
automatically fragments the values internally. Deserialization happens
with `utils::linearizing_input_stream` (introduced by this series), which
linearizes only the individual collection cells, but not the entire
collection.
An important goal of this series was to introduce the least amount of
risk, and hence the least amount of code. This series does not try to
make a revolution and completely revamp and optimize the
de/serialization codepaths. These codepaths have their days numbered so
investing a lot of effort into them is in vain. We can apply incremental
optimizations where we deem it necessary.

Fixes: #5341
2019-12-03 10:31:34 +01:00
fastio
01599ffbae Redis API: Support the syntax of deleting multiple keys in one DEL command, fix the returning value for GET command.
Support deleting multiple keys in one DEL command.
Returning the number of keys that were actually deleted is still not supported.
Return an empty string to the client for the GET command when the requested key does not exist.

Fixes: #5334

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-03 17:27:40 +08:00
fastio
039b83ad3b Redis API: Rename options related to Redis API, describe them clearly, and remove unnecessary one.
Rename option redis_transport_port to redis_port, which the redis transport listens on for clients.
Rename option redis_transport_port_ssl to redis_ssl_port, which the redis TLS transport listens on for clients.
Rename option redis_database_count. Set the redis database count.
Rename option redis_keyspace_opitons to redis_keyspace_replication_strategy_options. Set the replication strategy for redis keyspace.
Remove option enable_redis_protocol, which is unnecessary.

Fixes: #5335

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-03 17:13:35 +08:00
Nadav Har'El
7b93360c8d Merge: redis: skip processing request of EOF
Merged pull request https://github.com/scylladb/scylla/pull/5393/ by
Amos Kong:
When testing Redis commands with echo and nc, a redundant error appears at the end.
Checking with strace shows that when the client reads nothing more from stdin, it
shuts down its side of the socket, so the redis server reads nothing (0 bytes) from
the socket. But the server still tries to process the empty command and returns an error.

$ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379
| ...
|    read(0, "*1\r\n$4\r\nping\r\n", 8192)   = 14
|    select(5, [4], [4], [], NULL)           = 1 (out [4])
|>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14
|    select(5, [0 4], [], [], NULL)          = 1 (in [0])
|    recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket)
|    read(0, "", 8192)                       = 0
|>>> shutdown(4, SHUT_WR)                    = 0
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32
|    write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG
|    -ERR unknown command ''
|    ) = 32
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0
|    close(1)                                = 0
|    close(4)                                = 0

Current result:
  $ echo -n -e '' |nc localhost 6379
  -ERR unknown command ''
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
  -ERR unknown command ''

Expected:
  $ echo -n -e '' |nc localhost 6379
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
2019-12-03 10:40:20 +02:00
Avi Kivity
83feb9ea77 tools: toolchain: update frozen image
Commit 96009881d8 added diffutils to the dependencies via
Seastar's install-dependencies.sh, after it was inadvertently
dropped in 1164ff5329 (update to Fedora 31; diffutils is no
longer brought in as a side effect of something else).

Regenerate the image to include diffutils.

Ref #5401.
2019-12-03 10:36:55 +02:00
Amos Kong
fb9af2a86b redis-test: add test_raw_cmd.py
This patch adds subtests for EOF processing; they read and write the socket
directly using raw protocol commands.

We can add more tests in the future; tests that use the Redis module would hide
some protocol errors.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 10:47:56 +08:00
Amos Kong
4fa862adf4 redis: skip processing request of EOF
When testing Redis commands with echo and nc, a redundant error appears at the end.
Checking with strace shows that when the client reads nothing more from stdin, it
shuts down its side of the socket, so the redis server reads nothing (0 bytes) from
the socket. But the server still tries to process the empty command and returns an error.

$ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379
| ...
|    read(0, "*1\r\n$4\r\nping\r\n", 8192)   = 14
|    select(5, [4], [4], [], NULL)           = 1 (out [4])
|>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14
|    select(5, [0 4], [], [], NULL)          = 1 (in [0])
|    recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket)
|    read(0, "", 8192)                       = 0
|>>> shutdown(4, SHUT_WR)                    = 0
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32
|    write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG
|    -ERR unknown command ''
|    ) = 32
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0
|    close(1)                                = 0
|    close(4)                                = 0

Current result:
  $ echo -n -e '' |nc localhost 6379
  -ERR unknown command ''
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
  -ERR unknown command ''

Expected:
  $ echo -n -e '' |nc localhost 6379
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
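
The fix described above can be illustrated with a minimal Python sketch. It is
not the Scylla implementation: `serve` and `process` are hypothetical names,
and an in-memory stream stands in for the client socket. The point is only that
a zero-byte read signals EOF and must end the loop without being handed to the
command processor.

```python
import io

def process(request):
    # Hypothetical command processor: answers a RESP-encoded PING,
    # and reports anything else as an unknown command.
    if request == b"*1\r\n$4\r\nping\r\n":
        return b"+PONG\r\n"
    return b"-ERR unknown command ''\r\n"

def serve(stream):
    """Read requests until EOF and collect the replies."""
    replies = []
    while True:
        data = stream.read(8192)
        if not data:
            # Zero bytes read: the peer shut down its side of the
            # connection. Stop here instead of processing an empty command.
            break
        replies.append(process(data))
    return replies

ping_replies = serve(io.BytesIO(b"*1\r\n$4\r\nping\r\n"))
empty_replies = serve(io.BytesIO(b""))
```

With this loop, an empty input produces no reply at all, and a single PING
produces only `+PONG`.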

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 10:47:56 +08:00
Rafael Ávila de Espíndola
bb114de023 dbuild: Fix confusion about relabeling
podman needs to relabel directories in exactly the same cases docker
does. The difference is that podman cannot relabel /tmp.

The reason it was working before is that in practice anyone using
dbuild has already relabeled any directories that need relabeling,
with the exception of /tmp, since it is recreated on every boot.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-2-espindola@scylladb.com>
2019-12-02 18:38:16 +02:00
Rafael Ávila de Espíndola
867cdbda28 dbuild: Use a temporary directory for /tmp
With this we don't have to use --security-opt label=disable.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-1-espindola@scylladb.com>
2019-12-02 18:38:14 +02:00
Botond Dénes
1d1f8b0d82 tests: mutation_test: add large collection allocation test
Checking that there are no large allocations when a large collection is
de/serialized.
2019-12-02 17:13:53 +02:00
Avi Kivity
28355af134 docs: add maintainer's handbook (#5396)
This is a list of recipes used by maintainers to maintain
scylla.git.
2019-12-02 15:01:54 +02:00
Calle Wilund
8c6d6254cf cdc: Remove some code from header 2019-12-02 13:00:19 +00:00
Botond Dénes
4c59487502 collection_mutation: don't linearize the buffer on deserialization
Use `utils::linearizing_input_stream` for the deserialization of the
collection. Allows for avoiding the linearization of the entire cell
value, instead only linearizing individual values as they are
deserialized from the buffer.
2019-12-02 10:10:31 +02:00
Botond Dénes
690e9d2b44 utils: introduce linearizing_input_stream
`linearizing_input_stream` allows transparently reading linearized
values from a fragmented buffer. This is done by linearizing on-the-fly
only those read values that happen to be split across multiple
fragments. This reduces the size of the largest allocation from the size
of the entire buffer (when the entire buffer is linearized) to the size
of the largest read value. This is a huge gain when the buffer contains
loads of small objects, and modest gains when the buffer contains few
large objects. But even in the worst case the size of the largest
allocation will be less or equal compared to the case where the entire
buffer is linearized.
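
As an illustration only, the idea can be sketched in Python. This is not the
C++ `linearizing_input_stream`; the class name `LinearizingReader` and the
`largest_copy` counter are assumptions made for the example. A value that lies
within one fragment is returned as a view without copying, and only a value
that straddles fragments is copied into a contiguous buffer.

```python
class LinearizingReader:
    """Reads values of a given size from a list of byte fragments."""

    def __init__(self, fragments):
        # Skip empty fragments so the current fragment always has data left.
        self._fragments = [memoryview(f) for f in fragments if len(f)]
        self._index = 0         # index of the current fragment
        self._offset = 0        # read position inside the current fragment
        self.largest_copy = 0   # largest linearization performed so far

    def _advance(self, n):
        self._offset += n
        if self._offset == len(self._fragments[self._index]):
            self._index += 1
            self._offset = 0

    def read(self, size):
        if self._index < len(self._fragments):
            frag = self._fragments[self._index]
            if len(frag) - self._offset >= size:
                # Fast path: the value is contiguous, return a view.
                value = frag[self._offset:self._offset + size]
                self._advance(size)
                return value
        # Slow path: the value is split across fragments, copy just this value.
        out = bytearray()
        while len(out) < size:
            if self._index >= len(self._fragments):
                raise EOFError("not enough data in the buffer")
            frag = self._fragments[self._index]
            take = min(size - len(out), len(frag) - self._offset)
            out += frag[self._offset:self._offset + take]
            self._advance(take)
        self.largest_copy = max(self.largest_copy, size)
        return memoryview(bytes(out))

reader = LinearizingReader([b"abc", b"defg", b"hi"])
first = bytes(reader.read(2))    # contiguous inside the first fragment
second = bytes(reader.read(3))   # spans the first and second fragments
```

The largest allocation made by the reader is bounded by the largest single
value read, not by the total size of the fragments.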

This stream is planned to be used as glue code between the fragmented
cell value and the collection deserialization code which expects to be
reading linearized values.
2019-12-02 10:10:31 +02:00
Botond Dénes
065d8d37eb tests: random-utils: get_string(): add overload that takes engine parameter 2019-12-02 10:10:31 +02:00
Botond Dénes
2f9307c973 collection_mutation: use a fragmented buffer for serialization
For the serialization `bytes_ostream` is used.
2019-12-02 10:10:31 +02:00
Botond Dénes
fc5b096f73 imr: value_writer::write_to_destination(): don't dereference chunk iterator eagerly
Currently the loop which writes the data from the fragmented origin to
the destination, moves to the next chunk eagerly after writing the value
of the current chunk, if the current chunk is exhausted.
This presents a problem when we are writing the last piece of data from
the last chunk, as the chunk will be exhausted and we eagerly attempt to
move to the next chunk, which doesn't exist and dereferencing it will
fail. The solution is to not be eager about moving to the next chunk and
only attempt it if we actually have more data to write and hence expect
more chunks.
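
The difference between the eager and the lazy loop can be shown with a small
Python analogy. It is a sketch, not the IMR code: `copy_eager` and `copy_lazy`
are hypothetical names, and an exhausted Python iterator stands in for
dereferencing a chunk that does not exist.

```python
def copy_eager(chunk_iter, size):
    """Buggy variant: moves to the next chunk as soon as one is exhausted."""
    out = bytearray()
    chunk = next(chunk_iter)
    pos = 0
    while len(out) < size:
        take = min(size - len(out), len(chunk) - pos)
        out += chunk[pos:pos + take]
        pos += take
        if pos == len(chunk):
            # Fails after the last chunk, even though no data is left to copy.
            chunk = next(chunk_iter)
            pos = 0
    return bytes(out)

def copy_lazy(chunk_iter, size):
    """Fixed variant: fetches a chunk only when more data is still needed."""
    out = bytearray()
    chunk = b""
    pos = 0
    while len(out) < size:
        if pos == len(chunk):
            chunk = next(chunk_iter)
            pos = 0
        take = min(size - len(out), len(chunk) - pos)
        out += chunk[pos:pos + take]
        pos += take
    return bytes(out)

lazy_result = copy_lazy(iter([b"ab", b"cd"]), 4)

try:
    copy_eager(iter([b"ab", b"cd"]), 4)
    eager_failed = False
except StopIteration:
    eager_failed = True
```

In the lazy variant the loop condition is checked before any attempt to
advance, so the end of the chunk sequence is never touched when the data ends
exactly at a chunk boundary.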
2019-12-02 10:10:31 +02:00
Botond Dénes
875314fc4b bytes_ostream: make it a FragmentRange
The presence of `const_iterator` seems to be a requirement as well
although it is not part of the concept. But perhaps it is just an
assumption made by code using it.
2019-12-02 10:10:31 +02:00
Botond Dénes
4054ba0c45 serialization: accept any CharOutputIterator
Not just bytes::output_iterator. Allow writing into streams other than
just `bytes`. In fact we should be very careful with writing into
`bytes` as they require potentially large contiguous allocations.

The `write()` method is now templatized also on the type of its first
argument, which now accepts any CharOutputIterator. Due to our poor
usage of namespace this now collides with `write` defined inside
`db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to
be templatized on the data type it reads from, and de-templatizing it
resolves the clash.
2019-12-02 10:10:31 +02:00
Botond Dénes
07007edab9 bytes_ostream: add output_iterator
To allow it being used for serialization code, which works in terms of
output iterators.
2019-12-02 10:10:31 +02:00
Takuya ASADA
c5a95210fe dist/common/scripts/scylla_setup: list virtio-blk devices correctly on interactive RAID setup
Currently interactive RAID setup prompt does not list virtio-blk devices due to
following reasons:
 - We fail to match the '-p' option in 'lsblk --help' output due to misuse of a
   regex function, so list_block_devices() always skips using lsblk output.
 - We don't check for the existence of /dev/vd* when we skip using lsblk.
 - We mistakenly excluded virtio-blk devices from 'lsblk -pnr' output using the
   '-e' option, but we actually need them.

To fix the problem we need to use re.search() instead of re.match() to match
'-p' option on 'lsblk --help', need to add '/dev/vd*' on block device list,
then need to stop '-e 252' option on lsblk which excludes virtio-blk.
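
The re.match() versus re.search() difference is easy to reproduce. The sketch
below is illustrative only: `HELP_TEXT` is a made-up excerpt of help output
(the real `lsblk --help` text varies by version), and the pattern and the
function name `supports_paths_option` are assumptions, not the ones used in
scylla_setup.

```python
import re

# Hypothetical excerpt of `lsblk --help` output, for illustration only.
HELP_TEXT = """Usage:
 lsblk [options] [<device> ...]

Options:
 -n, --noheadings     don't print headings
 -p, --paths          print complete device path
"""

def supports_paths_option(help_text):
    # re.search() scans the whole string for the pattern.
    return re.search(r"\s-p,", help_text) is not None

# re.match() only tries to match at the very beginning of the string,
# so it cannot find an option that is listed further down.
matched_at_start = re.match(r"\s-p,", HELP_TEXT) is not None
found_anywhere = supports_paths_option(HELP_TEXT)
```

Here `matched_at_start` is False and `found_anywhere` is True, which mirrors
why the option check never succeeded with re.match().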

Additionally, it is better to parse the 'TYPE' field of lsblk output; we should skip
'loop' devices and 'rom' devices since these are not disk devices.

Fixes #4066

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191201160143.219456-1-syuu@scylladb.com>
2019-12-01 18:36:48 +02:00
Takuya ASADA
124da83103 dist/common/scripts: use chrony as NTP server on RHEL8/CentOS8
We need to use chrony as NTP server on RHEL8/CentOS8, since it dropped
ntpd/ntpdate.

Fixes #4571

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191101174032.29171-1-syuu@scylladb.com>
2019-12-01 18:35:03 +02:00
Nadav Har'El
b82417ba27 Merge "alternator: Implement Expected operators LE, GE, and BETWEEN"
Merged pull request https://github.com/scylladb/scylla/pull/5392 from
Dejan Mircevski.

Refs #5034

The patches:
  alternator: Implement LE operator in Expected
  alternator: Implement GE operator in Expected
  alternator: Make cmp diagnostic a value, not funct
  utils: Add operator<< for big_decimal
  alternator: Implement BETWEEN operator in Expected
2019-12-01 16:11:11 +02:00
Nadav Har'El
8614c30bcf Merge "implement echo command"
Merged pull request https://github.com/scylladb/scylla/pull/5387 from
Amos Kong:

This patch implements the echo command, which returns the string back to the client.

Reference:

    https://redis.io/commands/echo
2019-12-01 10:29:57 +02:00
Amos Kong
49fee4120e redis-test: add test_echo
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-30 13:32:00 +08:00
Amos Kong
3e2034f07b redis: implement echo command
This patch implements the echo command, which returns the string back to the client.

Reference:
- https://redis.io/commands/echo

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-30 13:30:35 +08:00
Dejan Mircevski
dcb1b360ba alternator: Implement BETWEEN operator in Expected
Enable existing BETWEEN test, and add some more coverage to it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 16:47:21 -05:00
Dejan Mircevski
c43b286f35 utils: Add operator<< for big_decimal
... and remove an existing duplicate from lua.cc.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 15:32:09 -05:00
Dejan Mircevski
e0d77739cc alternator: Make cmp diagnostic a value, not funct
All check_compare diagnostics are static strings, so there's no need
to call functions to get them.  Instead of a function, make diagnostic
a simple value.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 15:09:05 -05:00
Dejan Mircevski
65cb84150a alternator: Implement GE operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 12:29:08 -05:00
Dejan Mircevski
f201f0eaee alternator: Implement LE operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 11:59:52 -05:00
Avi Kivity
96009881d8 Update seastar submodule
* seastar 8eb6a67a4...166061da3 (3):
  > install-dependencies.sh: add diffutils
  > reactor: replace std::optional (in _network_stack_ready) with compat::optional
  > noncopyable_function: disable -Wuninitialized warning in noncopyable_function_base

Ref #5386.
2019-11-29 12:50:48 +02:00
Tomasz Grabiec
6562c60c86 Merge "test.py: terminate children upon signal" from Kostja
Allows a signal to terminate the outstanding
test tasks, to avoid dangling children.
2019-11-29 12:05:03 +02:00
Pekka Enberg
bb227cf2b4 Merge "Fix default directories in Scylla setup scripts" from Amos
"Fix two problems in scylla_io_setup:

 - Problem 1: paths of default directories are invalid, introduced by
   commit 5ec1915 ("scylla_io_setup: assume default directories under
   /var/lib/scylla").

 - Problem 2: wrong path join, introduced by commit 31ddb21
   ("dist/common/scripts: support nonroot mode on setup scripts").

Fix a problem in scylla_io_setup, scylla_fstrim and scylla_blocktune.py:

  - Fixed default scylla directories when they aren't assigned in
    scylla.yaml"

Fixes #5370

Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>

* 'scylla_io_setup' of git://github.com/amoskong/scylla:
  use parse_scylla_dirs_with_default to get scylla directories
  scylla_io_setup: fix data_file_directories check
  scylla_util: introduce helper to process the default scylla directories
  scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
  scylla_io_setup: fix path join of default scylla directories
2019-11-29 12:05:03 +02:00
Ultrabug
61f1e6e99c test.py: fix undefined variable 'options' in write_xunit_report() 2019-11-28 19:06:22 +03:00
Ultrabug
5bdc0386c4 test.py: comparison to False should be 'if cond is False:' 2019-11-28 19:06:22 +03:00
Ultrabug
737b1cff5e test.py: use isinstance() for type comparison 2019-11-28 19:06:22 +03:00
Konstantin Osipov
c611325381 test.py: terminate children upon signal
Use asyncio as a more modern way to work with concurrency,
Process signals in an event loop, terminate all outstanding
tests before exiting.

Breaking change: this commit requires Python 3.7 or
newer to run this script. The patch adds a version
check and a message to enforce it.
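
A minimal sketch of this pattern, under stated assumptions: `run_one`,
`run_all` and the `simulate_signal_after` parameter are hypothetical and do not
come from test.py, and `asyncio.sleep` stands in for running a real test
process. Signal handlers are registered on the event loop and cancel every
outstanding task.

```python
import asyncio
import signal
import sys

# Mirrors the version requirement stated in the commit message.
if sys.version_info < (3, 7):
    sys.exit("Python 3.7 or newer is required to run this script")

async def run_one(name, seconds):
    # Stand-in for running a single test.
    await asyncio.sleep(seconds)
    return name

async def run_all(jobs, simulate_signal_after=None):
    loop = asyncio.get_running_loop()
    tasks = [asyncio.ensure_future(run_one(n, s)) for n, s in jobs]

    def terminate():
        # Cancel everything still running; finished tasks are unaffected.
        for task in tasks:
            task.cancel()

    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, terminate)
        except (NotImplementedError, RuntimeError, ValueError):
            # Not supported on this platform or outside the main thread.
            pass

    if simulate_signal_after is not None:
        # For demonstration: behave as if a signal arrived after a delay.
        loop.call_later(simulate_signal_after, terminate)

    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Cancelled tasks show up as CancelledError instances; drop them.
    return [r for r in results if not isinstance(r, BaseException)]

finished = asyncio.run(
    run_all([("fast", 0.0), ("slow", 10.0)], simulate_signal_after=0.05))
```

Only the task that completed before the simulated signal is reported; the
long-running one is cancelled rather than left dangling.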
2019-11-28 19:06:22 +03:00
Botond Dénes
cf24f4fe30 imr: move documentation to docs/
Where all the other documentation is, and hence where people would be
looking for it.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191128144612.378244-1-bdenes@scylladb.com>
2019-11-28 16:47:52 +02:00
Avi Kivity
36dd0140a8 Update seastar submodule
* seastar 5c25de907a...8eb6a67a4b (1):
  > util/backtrace.hh: add missing print.hh include
2019-11-28 16:47:16 +02:00
Benny Halevy
7aef39e400 tracing: one_session_records: keep local tracing ptr
Similar to trace_state keep shared_ptr<tracing> _local_tracing_ptr
in one_session_records when constructed so it can be used
during shutdown.

Fixes #5243

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-28 15:24:10 +01:00
Gleb Natapov
75499896ab client_state: store _user as optional instead of shared_ptr
_user cannot outlive client_state class instance, so there is no point
in holding it in shared_ptr.

Tested: debug test.py and dtest auth_test.py

Message-Id: <20191128131217.26294-5-gleb@scylladb.com>
2019-11-28 15:48:59 +02:00
Gleb Natapov
1538cea043 cql: modification_statement: store _restrictions as optional instead of shared_ptr
_restrictions can be optional since its lifetime is managed by
modification_statement class explicitly.

Message-Id: <20191128131217.26294-4-gleb@scylladb.com>
2019-11-28 15:48:54 +02:00
Gleb Natapov
ce5d6d5eee storage_service: store thrift server as an optional instead of shared_ptr
Only do_stop_rpc_server uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.

Message-Id: <20191128131217.26294-3-gleb@scylladb.com>
2019-11-28 15:48:51 +02:00
Gleb Natapov
b9b99431a8 storage_service: store cql server as an optional instead of shared_ptr
Only do_stop_native_transport() uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.

Message-Id: <20191128131217.26294-2-gleb@scylladb.com>
2019-11-28 15:48:47 +02:00
Avi Kivity
2b7e97514a Update seastar submodule
* seastar 6f0ef32514...5c25de907a (7):
  > shared_future: Fix crash when all returned futures time out
Fixes #5322.
  > future: don't create temporaries on get_value().
  > reactor: lower the default stall threshold to 200ms
  > reactor: Simplify network initialization
  > reactor: Replace most std::function with noncopyable_function
  > futures: Avoid extra moves in SEASTAR_TYPE_ERASE_MORE mode
  > inet_address: Make inet_address == operator ignore scope (again)
2019-11-28 14:48:01 +02:00
Juliusz Stasiewicz
fa12394dfe reader_concurrency_semaphore: cosmetic changes
Added line breaks, replaced unused include, included seastarx.hh
instead of `using namespace seastar`.
2019-11-28 13:39:08 +01:00
Nadav Har'El
fde336a882 Merged "5139 minmax bad printing"
Merged pull request https://github.com/scylladb/scylla/pull/5311 from
Juliusz Stasiewicz:

This is a partial solution to #5139 (only for two types) because of the
above and because collections are much harder to do. They are coming in
a separate PR.
2019-11-28 14:06:43 +02:00
Juliusz Stasiewicz
3b9ebca269 tests/cql_query_test: add test for aggregates on inet+time_type
This is a test of the max(), min() and count() system functions on
arguments of types `net::inet_address` and `time_native_type`.
2019-11-28 11:20:43 +01:00
Juliusz Stasiewicz
9c23d89531 cql3/functions: add missing min/max/count for inet and time type
References #5139. Aggregate functions, like max(), when invoked
on `inet_address' and `time_native_type' used to choose
max(blob)->blob overload, with casting of argument and result to
bytes. This is because appropriate calls to
`aggregate_fcts::make_XXX_function()' were missing. This commit
adds them. Functioning remains the same but now clients see
user-friendly representations of aggregate result, not binary.

Comparing inet addresses without inet::operator< is performed by
a trick, where ADL is bypassed by wrapping the name of std::min/max
and providing an overload of the wrapper for the inet type.
2019-11-28 11:18:31 +01:00
Pavel Emelyanov
8532093c61 cql: The cql_server does not need proxy reference
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191127153842.4098-1-xemul@scylladb.com>
2019-11-28 10:58:46 +01:00
Amos Kong
e2eb754d03 use parse_scylla_dirs_with_default to get scylla directories
Use default data_file_directories/commitlog_directory if it's not assigned
in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 15:48:14 +08:00
Amos Kong
bd265bda4f scylla_io_setup: fix data_file_directories check
Use default data_file_directories if it's not assigned in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 15:47:56 +08:00
Amos Kong
123c791366 scylla_util: introduce helper to process the default scylla directories
Currently we support setting workdir in scylla.yaml, yet the setup scripts
hardcode '/var/lib/scylla' in many places.

Some setup scripts get scylla directories by parsing scylla.yaml; introduce
parse_scylla_dirs_with_default(), which adds default values if scylla directories
aren't assigned in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:54:32 +08:00
Amos Kong
b75061b4bc scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:38:01 +08:00
Amos Kong
ada0e92b85 scylla_io_setup: fix path join of default scylla directories
Currently we check invalid paths for some default scylla directories;
the directories don't exist, so the tuning is always skipped. This is caused
by two problems.

Problem 1: paths of default directories are invalid

Introduced by commit 5ec191536e, we try to tune some scylla default directories
if they exist. But the directory paths we try are wrong.

For example:
- What we check: /var/lib/scylla/commitlog_directory
- Correct one: /var/lib/scylla/commitlog

Problem 2: wrong path join

Introduced by commit 31ddb2145a, default_path might be replaced from
'/var/lib/scylla/' to '/var/lib/scylla'.

Our code tries to check an invalid path that is wrongly joined, e.g.:
'/var/lib/scyllacommitlog'
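
The wrong join can be reproduced in a few lines. This is a sketch, not the
scylla_io_setup code: `naive_join` and `safe_join` are hypothetical names, and
`posixpath` (what `os.path` resolves to on Linux) is used so the result does
not depend on the platform running the example.

```python
import posixpath

def naive_join(default_path, name):
    # Plain concatenation only works if default_path ends with '/'.
    return default_path + name

def safe_join(default_path, name):
    # posixpath.join inserts the separator only when it is missing.
    return posixpath.join(default_path, name)

broken = naive_join("/var/lib/scylla", "commitlog")
with_slash = safe_join("/var/lib/scylla/", "commitlog")
without_slash = safe_join("/var/lib/scylla", "commitlog")
```

`broken` is '/var/lib/scyllacommitlog', while both `safe_join` calls yield
'/var/lib/scylla/commitlog' regardless of the trailing slash.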

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:37:58 +08:00
Amos Kong
d4a26f2ad0 scylla_util: get_scylla_dirs: return default data/commitlog directories if they aren't set (#5358)
The default values of data_file_directories and commitlog_directory were
commented out by commit e0f40ed16a. This causes scylla_util.py:get_scylla_dirs()
to fail when checking the values.

This patch changed get_scylla_dirs() to return default data/commitlog
directories if they aren't set.

Fixes #5358 

Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-27 13:52:05 +02:00
Nadav Har'El
cb1ed5eab2 alternator-test: test Query's Limit parameter
Add a test, test_query.py::test_query_limit, to verify that the Limit
parameter correctly limits the number of rows returned by the Query.
This was supposed to already work correctly - but we never had a test for
it. As we hoped, the test passes (on both Alternator and DynamoDB).

Another test, test_query.py::test_query_limit_paging, verifies that
paging can be done with any setting of Limit. We already had tests
for paging of the Scan operation, but not for the Query operation.

Refs #5153

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-27 12:27:26 +01:00
Nadav Har'El
c01ca661a0 alternator-test: Select parameter of Query and Scan
This is a comprehensive test for the "Select" parameter of Query and Scan
operations, but only for the base-table case, not index, so another future
patch should add similar tests in test_gsi.py and test_lsi.py as well.

The main use of the Select parameter is to allow returning just the count
of items, instead of their content, but it also has other esoteric options,
all of which we test here.

The test currently succeeds on AWS DynamoDB, demonstrating that the test
is correct, but fails on Alternator because the "Select" parameter is not
yet supported. So the test is marked xfail.

Refs #5058

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-27 12:22:33 +01:00
Botond Dénes
9d09f57ba5 scylla-gdb.py: scylla_smp_queues: use lazy initialization
Currently the command tries to read all seastar smp queues in its
initialization code in the constructor. This constructor is run each
time `scylla-gdb.py` is sourced in `gdb` which leads to slowdowns and
sometimes also annoying errors because the sourcing happens in the wrong
context and seastar symbols are not available.
Avoid this by running this initializing code lazily, on the first
invocation.
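
The lazy-initialization pattern can be sketched as follows. This is an
assumption-laden illustration, not the scylla-gdb.py code: the class
`SmpQueuesCommand`, its `loader` argument and the `invoke` signature are
hypothetical, and no gdb API is used.

```python
class SmpQueuesCommand:
    """Command object whose expensive state is loaded on first use."""

    def __init__(self, loader):
        # The constructor runs every time the script is sourced,
        # so it must stay cheap and must not touch program symbols.
        self._loader = loader
        self._queues = None

    def _ensure_loaded(self):
        if self._queues is None:
            # First invocation: do the expensive read now.
            self._queues = self._loader()
        return self._queues

    def invoke(self):
        return len(self._ensure_loaded())

load_calls = []

def fake_loader():
    # Stand-in for reading the queues from the debugged process.
    load_calls.append(1)
    return ["queue-0", "queue-1"]

cmd = SmpQueuesCommand(fake_loader)
calls_after_construction = len(load_calls)
count = cmd.invoke()
cmd.invoke()
calls_after_two_invocations = len(load_calls)
```

Constructing the command performs no load; the loader runs once on the first
invocation and the result is reused afterwards.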

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191127095408.112101-1-bdenes@scylladb.com>
2019-11-27 12:04:57 +01:00
Tomasz Grabiec
87b72dad3e Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov
This patchset adds missing "const" function qualifiers throughout
the Scylla code base, which would make code less error-prone.

The changeset incorporates Kostja's work regarding const qualifiers
in the cql code hierarchy along with a follow-up patch addressing the
review comment of the corresponding patch set (the patch subject is
"cql: propagate const property through prepared statement tree.").
2019-11-27 10:56:20 +01:00
Rafael Ávila de Espíndola
91b43f1f06 dbuild: fix podman with selinux enabled
With this change I am able to run tests using docker-podman. The
option also exists in docker.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126194101.25221-1-espindola@scylladb.com>
2019-11-26 21:50:56 +02:00
Rafael Ávila de Espíndola
480055d3b5 dbuild: Fix missing docker options
With the recent changes docker was missing a few options. In
particular, it was missing -u.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126194347.25699-1-espindola@scylladb.com>
2019-11-26 21:45:31 +02:00
Rafael Ávila de Espíndola
c0a2cd70ff lua: fix test with boost 1.66
The boost 1.67 release notes say

Changed maximum supported year from 10000 to 9999 to resolve various issues

So change the test to use a larger number so that we get an exception
with both boost 1.66 and boost 1.67.

Fixes #5344

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126180327.93545-1-espindola@scylladb.com>
2019-11-26 21:17:15 +02:00
Pavel Solodovnikov
55a1d46133 cql: some more missing const qualifiers
There are several virtual functions in public interfaces named "is_*"
that clearly should be marked as "const", so fix that.
2019-11-26 17:57:51 +03:00
Pavel Solodovnikov
412f1f946a cql: remove "mutable" on _opts in select_statement
_opts initialization can be safely done in the constructor, hence no need to make it mutable.
2019-11-26 17:55:10 +03:00
Piotr Sarna
d90dbd6ab0 Merge "support podman as a replacement to docker" from Avi
Docker on Fedora 31 is flakey, and is not supported at all on RHEL 8.
Podman is a drop-in replacement for docker; this series adds support
for using podman in dbuild.

Apart from actually working on Fedora 31 hosts,
podman is nicer in being more secure and not requiring a daemon.

Fixes #5332
2019-11-26 15:17:49 +01:00
Tomasz Grabiec
5c9fe83615 Merge "Sanitize sub-modules shutting down" from Pavel
As suggested in issue #4586 here is the helper that prints
"shutting down foo" message, then shuts the foo down, then
prints the "[it] was successful" one. In between it catches
the exception (if any) and logs a warning.

By "then" I mean literally then, not the seastar's then() :)

Fixes: #4586
2019-11-26 15:14:22 +02:00
Piotr Sarna
9c5a5a5ac2 treewide: add names to semaphores
By default, semaphore exceptions bring along very little context:
either that a semaphore was broken or that it timed out.
In order to make debugging easier without introducing significant
runtime costs, a notion of named semaphore is added.
A named semaphore is simply a semaphore with statically defined
name, which is present in its errors, bringing valuable context.
A semaphore defined as:

  auto sem = semaphore(0);

will present the following message when it breaks:
"Semaphore broken"
However, a named semaphore:

  auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"});

will present a message with at least some debugging context:

  "Semaphore broken: io_concurrency_sem"

It's not much, but it would really help in pinpointing bugs
without having to inspect core dumps.

At the same time, it does not incur any costs for normal
semaphore operations (except for its creation), but instead
only uses more CPU in case an error is actually thrown,
which is considered rare and not to be on the hot path.

Refs #4999

Tests: unit(dev), manual: hardcoding a failure in view building code
2019-11-26 15:14:21 +02:00
Avi Kivity
6fbb724140 conf: remove unsupported options from scylla.yaml (#5299)
These unsupported options do nothing except to confuse users who
try to tune them.

Options removed:

hinted_handoff_throttle_in_kb
max_hints_delivery_threads
batchlog_replay_throttle_in_kb
key_cache_size_in_mb
key_cache_save_period
key_cache_keys_to_save
row_cache_size_in_mb
row_cache_save_period
row_cache_keys_to_save
counter_cache_size_in_mb
counter_cache_save_period
counter_cache_keys_to_save
memory_allocator
saved_caches_directory
concurrent_reads
concurrent_writes
concurrent_counter_writes
file_cache_size_in_mb
index_summary_capacity_in_mb
index_summary_resize_interval_in_minutes
trickle_fsync
trickle_fsync_interval_in_kb
internode_authenticator
native_transport_max_threads
native_transport_max_concurrent_connections
native_transport_max_concurrent_connections_per_ip
rpc_server_type
rpc_min_threads
rpc_max_threads
rpc_send_buff_size_in_bytes
rpc_recv_buff_size_in_bytes
internode_send_buff_size_in_bytes
internode_recv_buff_size_in_bytes
thrift_framed_transport_size_in_mb
concurrent_compactors
compaction_throughput_mb_per_sec
sstable_preemptive_open_interval_in_mb
inter_dc_stream_throughput_outbound_megabits_per_sec
cross_node_timeout
streaming_socket_timeout_in_ms
dynamic_snitch_update_interval_in_ms
dynamic_snitch_reset_interval_in_ms
dynamic_snitch_badness_threshold
request_scheduler
request_scheduler_options
throttle_limit
default_weight
weights
request_scheduler_id
2019-11-26 15:14:21 +02:00
Amos Kong
817f34d1a9 ami: support new aws instance types: c5d, m5d, m5ad, r5d, z1d (#5330)
Currently scylla_io_setup is skipped in scylla_setup because these new
instance types are not supported.

I manually executed scylla_io_setup, and the scylla-server started and worked
well.

Let's apply this patch first, then check if there is some new problem in
ami-test.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-26 15:14:21 +02:00
Konstantin Osipov
90346236ac cql: propagate const property through prepared statement tree.
cql_statement is a class representing a prepared statement in Scylla.
It is used concurrently during execution, so it is important that its
state is not changed by execution.

Add const qualifier to the execution methods family, throughout the
cql hierarchy.

Mark a few places which do mutate prepared statement state during
execution as mutable. While these are not affecting production today,
as code ages, they may become a source of latent bugs and should be
moved out of the prepared state or evaluated at prepare eventually:

cf_property_defs::_compaction_strategy_class
list_permissions_statement::_resource
permission_altering_statement::_resource
property_definitions::_properties
select_statement::_opts
2019-11-26 14:18:17 +03:00
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base 2019-11-26 02:24:49 +03:00
Pavel Emelyanov
50a1ededde main: Remove now unused defer-with-log helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
a0f92d40ee main: Shut down sighup handler with verbose helper
And (!) fix the misprinted variable name.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
0719369d83 repair: Remove extra logging on shutdown
The shutdown start/finish messages are already printed in verbose_shutdown()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
2d64fc3a3e main: Shut down database with verbose_shutdown helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
636c300db5 main: Shut down prometheus with verbose_shutdown()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

---

v2:
- Have stop earlier so that an exception in start/listen does
  not prevent prometheus.stop from being called
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
804b152527 main: Sanitize shutting down callbacks
As suggested in issue #4586, here is a helper that prints a
"shutting down foo" message, then shuts foo down, then
prints "shutting down foo was successful". In between,
it catches the exception (if any) and logs a warning.
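
A minimal sketch of the pattern described above, written in Python rather
than Scylla's C++ for brevity. The name verbose_shutdown is taken from the
later commits in this series; the signature, the logger name, the message
wording and the choice to swallow the exception instead of rethrowing it
are all assumptions made for illustration, not the actual helper.

```python
import logging

log = logging.getLogger("init")  # hypothetical logger name


def verbose_shutdown(what, stop):
    """Log before and after calling stop(); warn if it raises.

    Hypothetical illustration: returns True on success, False on failure.
    """
    log.info("Shutting down %s", what)
    try:
        stop()
    except Exception as exc:
        # Assumption: the failure is reported but not propagated, so the
        # remaining shutdown steps still get a chance to run.
        log.warning("Shutting down %s failed: %s", what, exc)
        return False
    log.info("Shutting down %s was successful", what)
    return True
```

Each subsystem would then be stopped with a call such as
verbose_shutdown("database", db_stop), where db_stop is whatever callable
stops that subsystem.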

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:45:49 +03:00
Nadav Har'El
4160b3630d Merge "Return preimage from CDC only when it's enabled"
Merged pull request https://github.com/scylladb/scylla/pull/5218
from Piotr Jastrzębski:

Users should be able to decide whether they need preimage or not. There is
already an option for that but it's not respected by the implementation.
This PR adds support for this functionality.

Tests: unit(dev).

Individual patches:
  cdc: Don't take storage_proxy as transformer::pre_image_select param
  cdc::append_log_mutations: use do_with instead of shared_ptr
  cdc::append_log_mutations: fix undefined behavior
  cdc: enable preimage in test_pre_image_logging test
  cdc: Return preimage only when it's requested
  cdc: test both enabled and disabled preimage in test_pre_image_logging
2019-11-25 14:32:17 +02:00
Pavel Emelyanov
f6ac969f1e mm: Stop migration manager
Before stopping the db itself, stop the migration service.
It must be stopped before RPC, but RPC is not stopped yet
itself, so we should be safe here.

Here's the tail of the resulting logs:

INFO  2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager
INFO  2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service
INFO  2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service
INFO  2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started
INFO  2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished
INFO  2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server
INFO  2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete.

Also -- stop the mm on drain before the commitlog is stopped.
[Tomasz: mm needs the cl because pulling schema changes from other nodes
involves applying them into the database. So cl/db needs to be
stopped after mm is stopped.]

The drain logs would look like

...
INFO  2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service
INFO  2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service
INFO  2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED:

and then on stop

...
INFO  2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager
INFO  2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started
INFO  2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished
INFO  2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server
INFO  2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete.

Fixes #5300

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191125080605.7661-1-xemul@scylladb.com>
2019-11-25 12:59:01 +01:00
Asias He
6ec602ff2c repair: Fix rx_hashes_nr metrics (#5213)
In get_full_row_hashes_with_rpc_stream and
repair_get_row_diff_with_rpc_stream_process_op which were introduced in
the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not
updated correctly.

In the test we have 3 nodes and run repair on node3; we make sure the
following metrics are correct.

assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'],
            node3_metrics['scylla_repair_rx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'],
            node3_metrics['scylla_repair_tx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'],
            node3_metrics['scylla_repair_rx_row_nr'])
assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'],
            node3_metrics['scylla_repair_tx_row_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'],
            node3_metrics['scylla_repair_rx_row_bytes'])
assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'],
            node3_metrics['scylla_repair_tx_row_bytes'])

Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test
Fixes: #5339
Backports: 3.2
2019-11-25 13:57:37 +02:00
Piotr Jastrzebski
2999cb5576 cdc: test both enabled and disabled preimage in test_pre_image_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
222b94c707 cdc: Return preimage only when it's requested
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
c94a5947b7 cdc: enable preimage in test_pre_image_logging test
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
595c9f9d32 cdc::append_log_mutations: fix undefined behavior
The code was iterating over a collection that was modified
at the same time. Iterators were used for that and collection
modification can invalidate all iterators.
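
A Python analogue of one common way to avoid this hazard, shown only as an
illustration: the commit message does not say how the C++ code was fixed.
The function name and the "log:" prefix are hypothetical. The number of
elements to visit is fixed before the loop and elements are accessed by
index, so appending to the collection cannot disturb the traversal.

```python
def append_log_entries(mutations):
    """Append one log entry per original mutation, in place."""
    # Fix the number of elements to visit up front; entries appended
    # inside the loop are therefore never visited themselves.
    original_count = len(mutations)
    for i in range(original_count):
        mutations.append(f"log:{mutations[i]}")
    return mutations
```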

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
f0f44f9c51 cdc::append_log_mutations: use do_with instead of shared_ptr
This will not only save some allocations but also improve
code readability.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
b8d9158c21 cdc: Don't take storage_proxy as transformer::pre_image_select param
transformer has access to storage_proxy through its _ctx field.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Nadav Har'El
3eab6cd549 Merged "toolchain: update to Fedora 31"
Merged pull request https://github.com/scylladb/scylla/pull/5310 from
Avi Kivity:

This is a minor update as gcc and boost versions did not change. A notable
update is patchelf 0.10, which adds support for large binaries.

A few minor issues exposed by the update are fixed in preparatory patches.

Patches:
  dist: rpm: correct systemd post-uninstall scriptlet
  build: force xz compression on rpm binary payload
  tools: toolchain: update to Fedora 31
2019-11-24 13:38:45 +02:00
Tomasz Grabiec
e3d025d014 row_cache: Fix abort on bad_alloc during cache update
Since 90d6c0b, cache will abort when trying to detach partition
entries while they're updated. This should never happen. It can happen
though, when the update fails on bad_alloc, because the cleanup guard
invalidates the cache before it releases partition snapshots (held by
"update" coroutine).

Fix by destroying the coroutine first.

Fixes #5327.

Tests:
  - row_cache_test (dev)

Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
2019-11-24 12:06:51 +02:00
Rafael Ávila de Espíndola
8599f8205b rpmbuild: don't use dwz
By default rpm uses dwz to merge the debug info from various
binaries. Unfortunately, it looks like addr2line has not been updated
to handle this:

// This works
$ addr2line  -e build/release/scylla 0x1234567

$ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug

// now this fails
$ addr2line -e build/release/scylla 0x1234567

I think the issue is

https://sourceware.org/bugzilla/show_bug.cgi?id=23652

Fixes #5289

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123015734.89331-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
25d5d39b3c reloc: Force using sha1 for build-ids
The default build-id used by lld is xxhash, which is 8 bytes long. rpm
requires build-ids to be at least 16 bytes long
(https://github.com/rpm-software-management/rpm/issues/950). We force
using sha1 for now. That has no impact in gold and bfd since that is
their default. We set it in here instead of configure.py to not slow
down regular builds.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123020801.89750-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
b5667b9c31 build: don't compress debug info in executables
By default we were compressing debug info only in release
executables. The idea, if I understand it correctly, is that those are
the ones we ship, so we want a more compact binary.

I don't think that was doing anything useful. The compression is just
gzip, so when we ship a .tar.xz, having the debug info compressed
inside the scylla binary probably reduces the overall compression a
bit.

When building an rpm the situation is amusing. As part of the rpm
build process the debug info is decompressed and extracted to an
external file.

Given that most of the link time goes to compressing debug info, it is
probably a good idea to just skip that.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123022825.102837-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Tomasz Grabiec
d84859475e Merge "Refactor test.py and cleanup resources" from Kostja
Structure the code to be able to introduce futures.
Apply trivial cleanups.
Switch to asyncio and use it to work with processes and
handle signals. Cleanup all processes upon signal.
2019-11-24 11:35:29 +02:00
Tomasz Grabiec
e166fdfa26 Merge "Optimize LWT query phase" from Vladimir Davydov
This patch implements a simple optimization for LWT: it makes PAXOS
prepare phase query locally and return the current value of the modified
key so that a separate query is not necessary. For more details see
patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial
preparatory refactoring.
2019-11-24 11:35:29 +02:00
Pavel Solodovnikov
4879db70a6 system_keyspace: support timeouts in queries to system.paxos table.
Also introduce supplementary `execute_cql_with_timeout` function.

Remove redundant comment for `execute_cql`.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191121214148.57921-1-pa.solodovnikov@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
bf5f864d80 paxos: piggyback result query on prepare response
Current LWT implementation uses at least three network round trips:
 - first, execute PAXOS prepare phase
 - second, query the current value of the updated key
 - third, propose the change to participating replicas

(there's also learn phase, but we don't wait for it to complete).

The idea behind the optimization implemented by this patch is simple:
piggyback the current value of the updated key on the prepare response
to eliminate one round trip.

To generate less network traffic, only the closest to the coordinator
replica sends data while other participating replicas send digests which
are used to check data consistency.
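
A rough sketch of the data-plus-digests check described above. The
function names are hypothetical and SHA-256 is an assumption made for
illustration; the commit message does not name the digest algorithm.

```python
import hashlib


def digest(value: bytes) -> bytes:
    """Digest a replica would send instead of the full value (assumed SHA-256)."""
    return hashlib.sha256(value).digest()


def replies_consistent(data_reply: bytes, digest_replies) -> bool:
    """Check the one full data reply against the digests from other replicas."""
    expected = digest(data_reply)
    return all(d == expected for d in digest_replies)
```

If any digest differs from the digest of the data reply, the coordinator
knows the replicas disagree and cannot use the piggybacked value as is.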

Note, this patch changes the API of some RPC calls used by PAXOS, but
this should be okay as long as the feature is in the early development
stage and marked experimental.

To assess the impact of this optimization on LWT performance, I ran a
simple benchmark that starts a number of concurrent clients each of
which updates its own key (uncontended case) stored in a cluster of
three AWS i3.2xlarge nodes located in the same region (us-west-1) and
measures the aggregate bandwidth and latency. The test uses shard-aware
gocql driver. Here are the results:

                latency 99% (ms)    bandwidth (rq/s)    timeouts (rq/s)
    clients     before  after       before  after       before  after
          1          2      2          626    637            0      0
          5          4      3         2616   2843            0      0
         10          3      3         4493   4767            0      0
         50          7      7        10567  10833            0      0
        100         15     15        12265  12934            0      0
        200         48     30        13593  14317            0      0
        400        185     60        14796  15549            0      0
        600        290     94        14416  15669            0      0
        800        568    118        14077  15820            2      0
       1000        710    118        13088  15830            9      0
       2000       1388    232        13342  15658           85      0
       3000       1110    363        13282  15422          233      0
       4000       1735    454        13387  15385          329      0

That is, this optimization improves max LWT bandwidth by about 15%
and allows running 3-4x more clients while maintaining the same level
of system responsiveness.
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
6160b9017d commitlog: make sure a file is closed
If allocate or truncate throws, we have to close the file.

Fixes #4877

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191114174810.49004-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
3d1d4b018f paxos: remove unnecessary move constructor invocations
invoke_on() guarantees that captured objects won't be destroyed until the
future returned by the invoked function is resolved so there's no need
to move key, token, proposal for calling paxos_state::*_impl helpers.
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
cfb079b2c9 types: Refactor duplicated value_cast implementation
The two implementations of value_cast were almost identical.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-3-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
ef2e96c47c storage_proxy: factor out helper to sort endpoints by proximity
We need it for PAXOS.
2019-11-24 11:35:29 +02:00
Nadav Har'El
854e6c8d7b alternator-test: test_health_only_works_for_root_path: remove wrong check
The test_health_only_works_for_root_path test checks that while Alternator's
HTTP server responds to a "GET /" request with success ("health check"), it
should respond to different URLs with failures (page not found).

One of the URLs it tested was "/..", but unfortunately some versions of
Python's HTTP client canonicalize this request to just a "/", causing the
request to unexpectedly succeed - and the test to fail.

So this patch just drops the "/.." check. A few other nonsense URLs are
attempted by the test - e.g., "/abc".

Fixes #5321

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
63d4590336 storage_proxy: move digest_algorithm upper
We need it for PAXOS.

Mark it as static inline while we are at it.
2019-11-24 11:35:29 +02:00
Nadav Har'El
43d3e8adaf alternator: make DescribeTable return table schema
One of the fields still missing in DescribeTable's response (Refs #5026)
was the table's schema - KeySchema and AttributeDefinitions.

This patch adds this missing feature, and enables the previously-xfailing
test test_describe_table_schema.

A complication of this patch is that in a table with secondary indexes,
we need to return not just the base table's schema, but also the indexes'
schema. The existing tests did not cover that feature, so we add here
two more tests in test_gsi.py for that.

One of these secondary-index schema tests, test_gsi_2_describe_table_schema,
still fails, because it outputs a range-key which Scylla added to a view
because of its own implementation needs, but wasn't in the user's
definition of the GSI. I opened a separate issue #5320 for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
f5c2a23118 serializer: add reference_wrapper handling
Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats
reference_wrapper<T> wrapped in std::optional<> or std::variant<>, or
std::tuple<> as T.

We need it to avoid copying query::result while serializing
paxos::promise.
2019-11-24 11:35:29 +02:00
Botond Dénes
89f9b89a89 scylla-gdb.py: scylla task_histogram: scan all tasks with -a or -s 0
Currently even if `-a` or `-s 0` is provided, `scylla task_histogram`
will scan a limited number of pages due to a bug in the scan loop's stop
condition, which triggers a stop once the default sample limit is
reached. Fix the loop by skipping this check when the user wants to scan
all tasks.
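
The corrected stop condition can be sketched as follows. This is not the
code from scylla-gdb.py: the function name, the parameters and the default
limit value are assumptions made for illustration.

```python
DEFAULT_SAMPLE_LIMIT = 20000  # hypothetical default, not the real value


def scan_pages(pages, sample_limit=DEFAULT_SAMPLE_LIMIT, scan_all=False):
    """Collect samples page by page, honoring the limit unless scanning all."""
    # A limit of 0 or an explicit scan_all both mean "no limit".
    unlimited = scan_all or sample_limit == 0
    samples = []
    for page in pages:
        samples.extend(page)
        # Only apply the limit when the user did not ask for a full scan.
        if not unlimited and len(samples) >= sample_limit:
            break
    return samples
```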

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
1452653fbc query_context: fix use after free of timeout_config in execute_cql_with_timeout
timeout_config is used by reference by cql3::query_processor::process(),
see cql3::query_options, so the caller must make sure it doesn't go away.
2019-11-24 11:35:29 +02:00
Avi Kivity
ff7e78330c tools: toolchain: dbuild: work around "podman logs --follow" hang
At least some versions of 'podman logs --follow' hang when the
container eventually exits (also happens with docker on recent
versions). Fortunately, we don't need to use 'podman logs --follow'
and can use the more natural non-detached 'podman run', because
podman does not proxy SIGTERM and instead shuts down the container
when it receives it.

So, to work around the problem, use the same code path in interactive
and non-interactive runs, when podman is in use instead of docker.
2019-11-22 13:59:05 +02:00
Avi Kivity
702834d0e4 tools: dbuild: avoid uid/gid/selinux hacks when using podman
With docker, we went to considerable lengths to ensure that
access to mounted volume was done using the calling user, including
supplementary groups. This avoids root-owned files being left around
after a build, and ensures that access to group-shared files (like
/var/cache/ccache) works as expected.

All of this is unnecessary and broken when using podman. Podman
uses a proxy to access files on behalf of the container, so naturally
all access is done using the calling user's identity. Since it remaps
user and group IDs, assigning the host uid/gid is meaningless. Using
--userns host also breaks, because sudo no longer works.

Fix this by making all the uid/gid/selinux games specific to docker and
ignore them when using podman. To preserve the functionality of tools
that depend on $HOME, set that according to the host setting.
2019-11-22 13:58:29 +02:00
Tomasz Grabiec
9d7f8f18ab database: Avoid OOMing with flush continuations after failed memtable flush
The original fix (10f6b125c8) didn't
take into account that if there was a failed memtable flush (Refs
flush) but is not a flushable memtable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable as well, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.

Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.

Fixes #3717
2019-11-22 12:08:36 +01:00
Tomasz Grabiec
fb28543116 lsa: Introduce operator bool() to occupancy_stats 2019-11-22 12:08:28 +01:00
Tomasz Grabiec
a69fda819c lsa: Expose region_impl::evictable_occupancy in the region class 2019-11-22 12:08:10 +01:00
Avi Kivity
1c181c1b85 tools: dbuild: don't mount duplicate volumes
podman refuses to start with duplicate volumes, which routinely
happen if the toplevel directory is the working directory. Detect
this and avoid the duplicate.
2019-11-22 10:13:30 +02:00
Konstantin Osipov
b8b5834cf1 test.py: simplify message output in run_test() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
90a8f79d7e test.py: use UnitTest class where possible 2019-11-21 23:16:22 +03:00
Konstantin Osipov
8cd8cfc307 test.py: rename harness command line arguments to 'options'
The UnitTest class juggles the name 'args' quite a bit to
construct the command line for a unit test, so let's separate
the harness command line arguments from the unit test command line
arguments by consistently calling the harness command line
arguments 'options' and the unit test command line arguments 'args'.

Rename usage() to parse_cmd_line().
2019-11-21 23:16:22 +03:00
Konstantin Osipov
e5d624d055 test.py: consolidate argument handling in UnitTest constructor
Create unique UnitTest objects in find_tests() for each found match,
including repeat, to ensure each test has its own unique id.
This will also be used to store execution state in the test.
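
A simplified sketch of the idea, not the actual test.py code: the
constructor arguments and the find_tests() signature below are
assumptions. Each (mode, test, repeat) combination gets its own object
and therefore its own id.

```python
import itertools


class UnitTest:
    """One runnable test instance with a process-wide unique id."""

    _ids = itertools.count(1)

    def __init__(self, name, args, mode):
        self.id = next(UnitTest._ids)
        self.name = name
        self.args = list(args)
        self.mode = mode


def find_tests(names, modes, repeat):
    """Create a separate UnitTest for every mode, name and repetition."""
    tests = []
    for mode in modes:
        for name in names:
            for _ in range(repeat):
                tests.append(UnitTest(name, [], mode))
    return tests
```

Because repeated runs no longer share an object, per-run output such as an
xml report can be keyed by the id without one run overwriting another.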
2019-11-21 23:16:22 +03:00
Konstantin Osipov
dd60673cef test.py: move --collectd to standard args 2019-11-21 23:16:22 +03:00
Konstantin Osipov
fe12f73d7f test.py: introduce class UnitTest 2019-11-21 23:16:22 +03:00
Konstantin Osipov
bbcdee37f7 test.py: add add_test_list() to find_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
4723afa09c test.py: add long tests with add_test() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
13f1e2abc6 test.py: store the non-default seastar arguments along with definition 2019-11-21 23:16:22 +03:00
Konstantin Osipov
72ef11eb79 test.py: introduce add_test() to find_tests()
To avoid code duplication, and to build upon later.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
b50b24a8a7 test.py: avoid an unnecessary loop in find_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
a5103d0092 test.py: move args.repeat processing to find_tests()
It somewhat stands in the way of using asyncio

This patch also implements a more comprehensive
fix for #5303, since we not only have --repeat, but
run some tests in different configurations, in which
case xml output is also overwritten.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
0f0a49b811 test.py: introduce print_summary() and write_xunit_report()
(One more moving of the code around).
2019-11-21 23:16:22 +03:00
Konstantin Osipov
22166771ef test.py: rename test_to_run tests_to_run 2019-11-21 23:16:22 +03:00
Konstantin Osipov
1d94d9827e test.py: introduce run_all_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
29087e1349 test.py: move out run_test() routine
(Trivial code refactoring.)
2019-11-21 23:16:22 +03:00
Konstantin Osipov
79506fc5ab test.py: introduce find_tests()
Trivial code refactoring.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
a44a1c4124 test.py: remove print_status_succint
(Trivial code cleanup.)
2019-11-21 23:16:22 +03:00
Konstantin Osipov
b9605c1d37 test.py: move mode list evaluation to usage() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
0c4df5a548 test.py: add usage() 2019-11-21 23:16:22 +03:00
Pavel Emelyanov
e0f40ed16a cli: Add the --workdir|-W option
When starting scylla daemon as non-root the initialization fails
because standard /var/lib/scylla is not accessible by regular users.
Making the default dir accessible for user is not very convenient
either, as it will cause conflicts if two or more instances of scylla
are in use.

This problem can be resolved by specifying --commitlog-directory,
--data-file-directories, etc. on start, but it's too much typing. I
propose to revive Nadav's --home option that allows moving all the
directories under the same prefix in one go.

Unlike Nadav's approach the --workdir option doesn't do any tricky
manipulations with existing directories. Instead, as Pekka suggested,
the individual directories are placed under the workdir if and only
if the respective option is NOT provided. Otherwise the directory
configuration is taken as is regardless of whether it's an absolute or
relative path.
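
The rule above can be sketched in Python as follows. This is not the
Scylla implementation: the option subset, the sub-directory names and the
function name are assumptions made for illustration.

```python
import os

# Hypothetical subset of directory options and their sub-directory names.
SUBDIRS = {
    "commitlog_directory": "commitlog",
    "hints_directory": "hints",
}


def resolve_directories(workdir, options):
    """Place each unset directory under workdir; keep explicit values as is."""
    resolved = {}
    for opt, subdir in SUBDIRS.items():
        value = options.get(opt)
        if value is None:
            # Option not provided: derive the path from the workdir.
            resolved[opt] = os.path.join(workdir, subdir)
        else:
            # Option provided: taken as is, absolute or relative.
            resolved[opt] = value
    return resolved
```

For example, with a workdir of "/tmp/scylla" and only commitlog_directory
set to "cl", the commitlog stays at the relative path "cl" while the hints
directory ends up under "/tmp/scylla".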

The value substitution is done early on start. Avi suggested that
this is unsafe wrt HUP config re-read and proper paths must be
resolved on the fly, but this patch doesn't address that yet, here's
why.

First of all, the respective options are MustRestart now and the
substitution is done before HUP handler is installed.

Next, commitlog and data_file values are copied on start, so marking
the options as LiveUpdate won't have any effect.

Finally, the existing named_value::operator() returns a reference,
so returning a calculated (and thus temporary) value is not possible
(from my current understanding, correct me if I'm wrong). Thus if we
want the *_directory() to return calculated value all callers of them
must be patched to call something different (e.g. *_directory.get() ?)
which will lead to more confusion and errors.

Changes v3:
 - the option is --workdir back again
 - the existing *directory are only affected if unset
 - default config doesn't have any of these set
 - added the short -W alias

Changes v2:
 - the option is --home now
 - all other paths are changed to be relative

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191119130059.18066-1-xemul@scylladb.com>
2019-11-21 15:07:39 +02:00
Rafael Ávila de Espíndola
5417c5356b types: Move get_castas_fctn to cql3
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-9-espindola@scylladb.com>
2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola
f06d6df4df types: Simplify casts to string
These now just use the to_string member functions, which makes it
possible to move the code to another file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-8-espindola@scylladb.com>
2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola
786b1ec364 types: Move json code to its own file
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-7-espindola@scylladb.com>
2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola
af8e207491 types: Avoid using deserialize_value in json code
This makes it independent of internal functions and makes it possible
to move it to another file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-6-espindola@scylladb.com>
2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola
ed65e2c848 types: Move cql3_kind to the cql3 directory
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-5-espindola@scylladb.com>
2019-11-21 12:08:47 +02:00
Rafael Ávila de Espíndola
bd560e5520 types: Fix dynamic types of some data_value objects
I found these mismatched types while converting some member functions
to standalone functions, since they have to use the public API that
has more type checks.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-4-espindola@scylladb.com>
2019-11-21 12:08:46 +02:00
Rafael Ávila de Espíndola
0d953d8a35 types: Add a test for value_cast
We had no tests on when value_cast throws or when it moves the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-2-espindola@scylladb.com>
2019-11-21 12:08:45 +02:00
Konstantin Osipov
002ff51053 lua: make sure the latest master builds on Debian/Ubuntu
Use pkg-config to search for Lua dependencies rather
than hard-code include and link paths.

Avoid using boost internals, not present in earlier
versions of boost.

Reviewed-by: Rafael Avila de Espindola <espindola@scylladb.com>
Message-Id: <20191120170005.49649-1-kostja@scylladb.com>
2019-11-21 07:57:12 +02:00
Pavel Solodovnikov
d910899d61 configure.py: support multi-threaded linking via gold
Use `-Wl,--threads` flag to enable multi-threaded linking when
using `ld.gold` linker.

Additional compilation test is required because it depends on whether
or not the `gold` linker has been compiled with `--enable-threads` option.

This patch introduces a substantial improvement to the link times of
`scylla` binary in release and debug modes (around 30 percent).

Local setup reports the following numbers with release build for
linking only build/release/scylla:

Single-threaded mode:
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.30
Multi-threaded mode:
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.57

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191120163922.21462-1-pa.solodovnikov@scylladb.com>
2019-11-20 19:28:00 +02:00
Nadav Har'El
89d6d668cb Merge "Redis API in Scylla"
Merged patch series from Peng Jian, adding optionally-enabled Redis API
support to Scylla. This feature is experimental, and partial - the extent
of this support is detailed in docs/redis/redis.md.

Patches:
   Document: add docs/redis/redis.md
   redis: Redis API in Scylla
   Redis API: graft redis module to Scylla
   redis-test: add test cases for Redis API
2019-11-20 16:59:13 +02:00
Piotr Sarna
086e744f8f scripts/find-maintainer: refresh maintainers list
This commit attempts to make the maintainers list up-to-date
to the best of my knowledge, because it got really stale over time.

Message-Id: <eab6d3f481712907eb83e91ed2b8dbfa0872155f.1574261533.git.sarna@scylladb.com>
2019-11-20 16:56:31 +02:00
Glauber Costa
73aff1fc95 api: export system uptime via REST
This will be useful for tools like nodetool that want to query the uptime
of the system.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190619110850.14206-1-glauber@scylladb.com>
2019-11-20 16:44:11 +02:00
Tomasz Grabiec
9a686ac551 Merge "scylla-gdb: active sstables: support k_l/mc sstable readers" from Benny
Fixes #5277
2019-11-19 23:49:39 +01:00
Avi Kivity
1164ff5329 tools: toolchain: update to Fedora 31
This is a minor update as gcc and boost versions do not change.

glibc-langpack-en no longer gets pulled in by default. As it is required
by some locale use somewhere, it is added to the explicit dependencies.
2019-11-20 00:08:30 +02:00
Avi Kivity
301c835cbf build: force xz compression on rpm binary payload
Fedora 31 switched the default compression to zstd, which isn't readable
by some older rpm distributions (CentOS 7 in particular). Tell it to use
the older xz compression instead, so packages produced on Fedora 31 can
be installed on older distributions.
2019-11-20 00:08:24 +02:00
Avi Kivity
3ebd68ef8a dist: rpm: correct systemd post-uninstall scriptlet
The post-uninstall scriptlet requires a parameter, but older versions
of rpm survived without it. Fedora 31's rpm is more strict, so supply
this parameter.
2019-11-20 00:03:49 +02:00
Peng Jian
e6adddd8ef redis-test: add test cases for Redis API
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-20 04:56:16 +08:00
Peng Jian
f2801feb66 Redis API: graft redis module to Scylla
In this document, the detailed design and implementation of Redis API in
Scylla is provided.

v2: build: work around ragel 7 generated code bug (suggested by Avi)
    Ragel 7 incorrectly emits some unused variables that don't compile.
    As a workaround, sed them away.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-20 04:55:58 +08:00
Peng Jian
0737d9e84d redis: Redis API in Scylla
Scylla has many advantages and useful features. If Redis is built on top of
Scylla, it gets those features automatically. It has achieved great progress
in cluster master management, data persistence, failover and replication.

The benefits to users are ease of use and development in their production
environment, and taking advantage of Scylla.

Using Ragel to parse the Redis request, the server obtains the command name
and the parameters from the request, invokes Scylla's internal API to
read and write the data, then replies to the client.
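
The parsing step can be illustrated with a small hand-written parser. This is only a sketch under stated assumptions: the commit uses a Ragel-generated parser, which is not reproduced here, and the function name `parse_request` is hypothetical. It handles a single request encoded as a Redis protocol array of bulk strings and returns the command name and its parameters.

```python
def parse_request(data):
    """Parse one request (an array of bulk strings) into (command, args).

    Hypothetical helper for illustration; not the Ragel parser from the commit.
    """
    pos = 0

    def read_line():
        # Return the bytes up to the next CRLF and advance past it.
        nonlocal pos
        end = data.index(b"\r\n", pos)
        line = data[pos:end]
        pos = end + 2
        return line

    header = read_line()
    if not header.startswith(b"*"):
        raise ValueError("expected array header")
    count = int(header[1:])

    parts = []
    for _ in range(count):
        size_line = read_line()
        if not size_line.startswith(b"$"):
            raise ValueError("expected bulk string header")
        size = int(size_line[1:])
        parts.append(data[pos:pos + size])
        pos += size + 2  # skip the payload and its trailing CRLF

    if not parts:
        raise ValueError("empty request")
    # The first element is the command name, the rest are its parameters.
    return parts[0].decode("ascii").upper(), parts[1:]


if __name__ == "__main__":
    print(parse_request(b"*2\r\n$3\r\nget\r\n$3\r\nkey\r\n"))
```

A real server would then dispatch on the command name and call into the storage layer; that part is outside the scope of this sketch.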

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-11-20 04:55:56 +08:00
Peng Jian
708a42c284 Document: add docs/redis/redis.md
In this document, the detailed design and implementation of Redis API in
Scylla is provided.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-11-20 04:46:33 +08:00
Nadav Har'El
9b9609c65b merge: row_marker: correct row expiry condition
Merged patch set by Piotr Dulikowski:

This change corrects condition on which a row was considered expired by its
TTL.

The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.

Fixes #4263,
Fixes #5290.

Tests:

    unit(dev)
    python test described in issue #5290
2019-11-19 18:14:15 +02:00
Amnon Heiman
9df10e2d4b scylla_util.py: Add optional timeout to out function
It is useful to have an option to limit the execution time of a shell
script.

This patch adds an optional timeout parameter. If the parameter is
provided, the command returns with a failure once the duration has passed.
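
A minimal sketch of the idea, not the actual scylla_util.py code: the helper name `out` comes from the commit title, while the signature, the list-of-arguments calling convention and the use of `subprocess.run` are assumptions made for illustration.

```python
import subprocess
import sys


def out(cmd, timeout=None):
    """Run cmd (a list of arguments) and return its stripped stdout.

    Hypothetical signature. timeout is in seconds; None waits forever.
    Raises subprocess.TimeoutExpired if the command runs too long and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        check=True,
        timeout=timeout,
    )
    return result.stdout.decode("utf-8").strip()


if __name__ == "__main__":
    print(out([sys.executable, "-c", "print('hello')"], timeout=10))
    try:
        out([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
    except subprocess.TimeoutExpired:
        print("command timed out")
```

Leaving the default at None keeps existing callers unchanged; only callers that pass a timeout see the new failure mode.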

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-11-19 17:30:28 +02:00
Nadav Har'El
b38c3f1288 Merge "Add separate counters for accesses to system tables"
Merged patch series from Juliusz Stasiewicz:

Welcome to my first PR to Scylla!
The task was intended as a warm-up ("noob") exercise; its description is
here: #4182. Sorry, I also couldn't help it and did some scouting: edited
descriptions of some metrics and shortened a few annoyingly long LoC.
2019-11-19 15:21:56 +02:00
Piotr Dulikowski
9be842d3d8 row_marker: tests for row expiration 2019-11-19 13:45:30 +01:00
Tomasz Grabiec
5e4abd75cc main: Abort on EBADF and ENOTSOCK by default
Those are typically symptoms of use-after-free or memory corruption in
the program. It's better to catch such errors sooner rather than later.

The situation is also dangerous because the invalid access may land on
a valid descriptor other than the one intended for the operation. The
operation may then be performed on the wrong file and result in
corruption.

Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>
2019-11-19 13:07:33 +02:00
Piotr Dulikowski
589313a110 row_marker: correct expiration condition
This change corrects condition on which a row was considered expired by
its TTL.

The logic that decides when a row becomes expired was inconsistent with
the logic that decides if a single cell is expired. A single cell
becomes expired when `expiry_timestamp <= now`, while a row became
expired when `expiry_timestamp < now` (notice the strict inequality).
For rows inserted with TTL, this caused non-key cells to expire (change
their values to null) one second before the row disappeared. Now, row
expiry logic uses non-strict inequality.
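
The off-by-one can be shown with a small model. This is an illustrative sketch only: the function names are hypothetical and timestamps are plain integers in seconds, not the types used in row_marker.

```python
def cell_is_expired(expiry_timestamp, now):
    # Cells expire as soon as the expiry time is reached.
    return expiry_timestamp <= now


def row_is_expired_old(expiry_timestamp, now):
    # Behaviour before the fix: strict inequality.
    return expiry_timestamp < now


def row_is_expired_fixed(expiry_timestamp, now):
    # Behaviour after the fix: same non-strict comparison as for cells.
    return expiry_timestamp <= now


if __name__ == "__main__":
    expiry = 100
    for now in (99, 100, 101):
        print(now,
              cell_is_expired(expiry, now),
              row_is_expired_old(expiry, now),
              row_is_expired_fixed(expiry, now))
```

At `now == expiry` the old row check still reports the row as live while its cells are already expired, which is the one-second window described above.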

Fixes: #4263, #5290.

Tests:
- unit(dev)
- python test described in issue #5290
2019-11-19 11:46:59 +01:00
Pekka Enberg
505f2c1008 test.py: Append test repeat cycle to output XML filename
Currently, we overwrite the same XML output file for each test repeat
cycle. This can cause invalid XML to be generated if the XML contents
don't match exactly for every iteration.

Fix the problem by appending the test repeat cycle in the XML filename
as follows:

  $ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test

  $ ls -1 *.xml
  jenkins_test.release.vint_serialization_test.0.boost.xml
  jenkins_test.release.vint_serialization_test.1.boost.xml
  jenkins_test.release.vint_serialization_test.2.boost.xml
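
The naming scheme can be sketched as follows. The function name and parameters are hypothetical, and the field order is inferred only from the listing above; this is not the code from test.py.

```python
def xml_output_name(jenkins_prefix, mode, test_name, repeat_cycle, kind="boost"):
    """Build a per-iteration XML file name so repeats do not overwrite each other.

    Hypothetical helper; the layout mirrors the file names listed above.
    """
    return "{}.{}.{}.{}.{}.xml".format(
        jenkins_prefix, mode, test_name, repeat_cycle, kind)


if __name__ == "__main__":
    for cycle in range(3):
        print(xml_output_name("jenkins_test", "release",
                              "vint_serialization_test", cycle))
```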


Fixes #5303.

Message-Id: <20191119092048.16419-1-penberg@scylladb.com>
2019-11-19 11:30:47 +02:00
Rafael Ávila de Espíndola
750adee6e3 lua: fix build with boost 1.67 and older vs fmt
It is not completely clear why the fmt base code fails with boost
1.67, but it is easy to avoid.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191118210540.129603-1-espindola@scylladb.com>
2019-11-19 11:14:00 +02:00
Tomasz Grabiec
ff567649fa Merge "gossip: Limit number of pending gossip ACK and ACK2 messages" from Asias
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, an unbounded number of
pending ACK messages can consume an unbounded amount of memory, which
eventually causes OOM.

To fix, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from a peer has the
latest information, it is safe to drop the previous SYN message and
keep only the latest one. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.
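
The queuing rule can be sketched with a tiny per-peer state machine. The class and method names are hypothetical and the real gossiper is asynchronous C++; this only models the "one ACK in flight, one pending SYN" invariant.

```python
class PeerSynQueue:
    """Per-peer state: at most one ACK in flight and one pending SYN."""

    def __init__(self):
        self.ack_in_flight = False
        self.pending_syn = None

    def on_syn(self, syn):
        """Return the SYN to handle now, or None if it was queued."""
        if self.ack_in_flight:
            # Overwrite any older queued SYN: only the newest one is kept.
            self.pending_syn = syn
            return None
        self.ack_in_flight = True
        return syn

    def on_ack_sent(self):
        """Call when the ACK finished sending. Return the next SYN, if any."""
        syn, self.pending_syn = self.pending_syn, None
        # Handling the queued SYN starts a new ACK immediately.
        self.ack_in_flight = syn is not None
        return syn


if __name__ == "__main__":
    q = PeerSynQueue()
    print(q.on_syn("syn1"))   # handled right away
    print(q.on_syn("syn2"))   # queued
    print(q.on_syn("syn3"))   # replaces syn2
    print(q.on_ack_sent())    # syn3 is handled next
```

Because a newer SYN simply replaces the queued one, memory per peer stays bounded regardless of how slowly ACKs are sent.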
2019-11-18 10:52:38 +01:00
Benny Halevy
f9e93bba38 sstables: compaction: move cleanup parameter to compaction_descriptor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>
2019-11-18 10:52:20 +01:00
Avi Kivity
1fe062aed4 Merge "Add basic UDF support" from Rafael
"

This patch series adds only UDF support, UDA will be in the next patch series.

With this, all CQL types are mapped to Lua. Right now we set up a new
Lua state and copy the values for each argument and return. This will
be optimized once profiled.

We require --experimental to enable UDF in case there is some change
to the table format.
"

* 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits)
  Lua: Document the conversions between Lua and CQL
  Lua: Implement decimal subtraction
  Lua: Implement decimal addition
  Lua: Implement support for returning decimal
  Lua: Implement decimal to string conversion
  Lua: Implement decimal to floating point conversion
  Lua: Implement support for decimal arguments
  Lua: Implement support for returning varint
  Lua: Implement support for returning duration
  Lua: Implement support for duration arguments
  Lua: Implement support for returning inet
  Lua: Implement support for inet arguments
  Lua: Implement support for returning time
  Lua: Implement support for time arguments
  Lua: Implement support for returning timeuuid
  Lua: Implement support for returning uuid
  Lua: Implement support for uuid and timeuuid arguments
  Lua: Implement support for returning date
  Lua: Implement support for date arguments
  Lua: Implement support for returning timestamp
  ...
2019-11-17 16:38:19 +02:00
Konstantin Osipov
48f3ca0fcb test.py: use the configured build modes from ninja mode_list
Add mode_list rule to ninja build and use it by default when searching
for tests in test.py.

Now it is no longer necessary to explicitly specify the test mode when
invoking test.py.

(cherry picked from commit a211ff30c7f2de12166d8f6f10d259207b462d4b)
2019-11-17 13:42:10 +01:00
Nadav Har'El
2fb2eb27a2 sstables: allow non-traditional characters in table name
The goal of this patch is to fix issue #5280, a rather serious Alternator
bug, where Scylla fails to restart when an Alternator table has secondary
indexes (LSI or GSI).

Traditionally, Cassandra allows table names to contain only alphanumeric
characters and underscores. However, most of our internal implementation
doesn't actually have this restriction. So Alternator uses the characters
':' and '!' in the table names to mark global and local secondary indexes,
respectively. And this actually works. Or almost...

This patch fixes a problem of listing, during boot, the sstables stored
for tables with such non-traditional names. The sstable listing code
needlessly assumes that the *directory* name, i.e., the CF names, matches
the "\w+" regular expression. When an sstable is found in a directory not
matching such regular expression, the boot fails. But there is no real
reason to require such a strict regular expression. So this patch relaxes
this requirement, and allows Scylla to boot with Alternator's GSI and LSI
tables and their names which include the ":" and "!" characters, and in
fact any other name allowed as a directory name.
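
The effect of relaxing the pattern can be illustrated as below. The table names are made up, and the relaxed pattern shown here (any non-empty string without a path separator) is an assumption for the sketch, not the expression used in the patch.

```python
import re

# Strict pattern described above: word characters only.
STRICT_NAME = re.compile(r"\w+")
# Assumed relaxed pattern: anything usable as a single path component.
RELAXED_NAME = re.compile(r"[^/]+")


def accepted(pattern, name):
    """Return True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("users", "users:by_email", "users!by_date"):
        print(name, accepted(STRICT_NAME, name), accepted(RELAXED_NAME, name))
```

With the strict pattern the two index-style names are rejected, which corresponds to the boot failure described above.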

Fixes #5280.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191114153811.17386-1-nyh@scylladb.com>
2019-11-17 14:27:47 +02:00
Shlomi Livne
3e873812a4 Document backport queue and procedure (#5282)
This document adds information about how fixes are tracked to be
backported into releases and what is the procedure that is followed to
backport those fixes.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2019-11-17 01:45:24 -08:00
Benny Halevy
c215ad79a9 scylla-gdb: resolve: add startswith parameter
Allow filtering the resolved addresses by a startswith string.

The common use case is resolving vtable ptrs, when the output of
`find_vptrs` may be too long for the memory of the host running gdb.
In this case the number of vtable ptrs is considerably smaller than
the total number of objects returned by find_ptrs (e.g. 462 vs. 69625
in an OOM core I examined from scylla --smp=2 --memory=1024M)
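
The filtering can be sketched independently of gdb. Everything here is hypothetical: `resolve_all`, the `resolve` callback and the sample symbol table stand in for the real scylla-gdb.py code, which works on gdb values.

```python
def resolve_all(addresses, resolve, startswith=None):
    """Yield (address, name) for addresses that resolve to a symbol.

    If startswith is given, keep only names beginning with that prefix,
    so the caller never accumulates the uninteresting results.
    """
    for addr in addresses:
        name = resolve(addr)
        if name is None:
            continue
        if startswith is not None and not name.startswith(startswith):
            continue
        yield addr, name


if __name__ == "__main__":
    # Made-up symbol table used instead of a live gdb session.
    symbols = {
        0x1000: "vtable for example::reader",
        0x2000: "example::some_function()",
        0x3000: "vtable for example::writer",
    }
    for addr, name in resolve_all(symbols, symbols.get, startswith="vtable for"):
        print(hex(addr), name)
```

Using a generator keeps memory use proportional to the matches rather than to the number of addresses scanned.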

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-17 11:40:54 +02:00
Benny Halevy
2f688dcf08 scylla-gdb.py: find_single_sstable_readers: fix support for sstable_mutation_reader
provide template arguments for k_l and m readers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-17 11:02:05 +02:00
Kamil Braun
a67e887dea sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285)
CQL tracing would only report file I/O involving one sstable, even if
multiple sstables were read from during the query.

Steps to reproduce:

- create a table with NullCompactionStrategy
- insert row, flush memtables
- insert row, flush memtables
- restart Scylla
- tracing on
- select * from table

The trace would only report DMA reads from one of the two sstables.

Kudos to @denesb for catching this.

Related issue: #4908
2019-11-17 00:38:37 -08:00
Tomasz Grabiec
a384d0af76 Merge "A set of cleanups over main() code" from Pavel E.
There are ... signs of massive start/stop code rework in the
main() function. While fixing the sub-modules interdependencies
during start/stop I've polished these signs too, so here are the
simplest ones.
2019-11-15 15:25:18 +01:00
Pavel Emelyanov
1dc490c81c tracing: Move register_tracing_keyspace_backend forward decl into proper header
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
7e81df71ba main: Shorten developer_mode() evaluation
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
1bd68d87fc main: Do not carry pctx all over the code
v2:
- do not use struct initialization extension

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
655b6d0d1e main: Hide start_thrift
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
26f2b2ce5e main,db: Kill some unused .hh includes
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
f5b345604f main: Factor out get_conf_sub
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
924d52573d main: Remove unused return_value variable (and capture)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
2195edb819 gitignore: Add tags file
This file is generated by ctags utility for navigation, so it
is not to be tracked by git.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191031221339.19030-1-xemul@scylladb.com>
2019-11-14 16:50:11 +01:00
Gleb Natapov
e0668f806a lwt: change format of partition key serialization for system.paxos table
Serialize provided partition_key in such a way that the serialized value
will hash to the same token as the original key. This way when system.paxos
table is updated the update is shard local.

Message-Id: <20191114135449.GU10922@scylladb.com>
2019-11-14 15:07:16 +01:00
Avi Kivity
19b665ea6b Merge "Correctly handle null/unset frozen collection/UDT columns in INSERT JSON." from Kamil
"
When using INSERT JSON with frozen collection/UDT columns, if the columns were left unspecified or set to null, the statement would create an empty non-null value for these columns instead of using null values as it should have. For example:

cqlsh:b> create table t (k text primary key, l frozen<list<int>>, m frozen<map<int, int>>, s frozen<set<int>>, u frozen<ut>);
cqlsh:b> insert into t JSON '{"k": "insert_json"}';
cqlsh:b> select * from t;
 k                 | l    | m    | s    | u
-------------------+------+------+------+------
       insert_json |     [] |     {} |     {} |

This PR fixes this.
Resolves #5246 and closes #5270.
"

* 'frozen-json' of https://github.com/kbr-/scylla:
  tests: add null/unset frozen collection/UDT INSERT JSON test
  cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON
  cql3: decouple execute from term binding in user_type::setter
2019-11-14 15:29:30 +02:00
Avi Kivity
4544aa0b34 Update seastar submodule
* seastar 75e189c6ba...6f0ef32514 (6):
  > Merge "Add named semaphores" from Piotr
  > parallel_for_each_state: pass rvalue reference to add_future
  > future: Pass rvalue to uninitialized_wrapper::uninitialized_set.
  > dependencies: Add libfmt-dev to debian
  > log: Fix logger behavior when logging both to stdout and syslog.
  > README.md: list Scylla among the projects using Seastar
2019-11-14 15:01:18 +02:00
Juliusz Stasiewicz
1cfa458409 metrics: separate counters for `system' KS accesses
Resolves #4182. Metrics per system tables are accumulated separately,
depending on the origin of query (DB internals vs clients).
2019-11-14 13:14:39 +01:00
Vladimir Davydov
ab42b72c6d cql: fix SERIAL consistency check for batch statements
If CONSISTENCY is set to SERIAL or LOCAL SERIAL, all write requests must
fail according to Cassandra's documentation. However, batched writes
bypass this check. Fix this.
2019-11-14 12:15:39 +01:00
Vladimir Davydov
25aeefd6f3 cql: fix CAS consistency level validation
This patch resurrects Cassandra's code validating a consistency level
for CAS requests. Basically, it makes CAS requests use a special
function instead of validate_for_write to make error messages more
coherent.

Note, we don't need to resurrect requireNetworkTopologyStrategy as
EACH_QUORUM should work just fine for both CAS and non-CAS writes.
Looks like it is just an artefact of a rebase in the Cassandra
repository.
2019-11-14 12:15:39 +01:00
Juliusz Stasiewicz
b1e4d222ed cql3: cosmetics - improved description of metrics 2019-11-14 10:35:42 +01:00
Avi Kivity
cd075e9132 reloc: do not install dependencies when building the relocatable package
The dependencies are provided by the frozen toolchain. If a dependency
is missing, we must update the toolchain rather than rely on build-time
installation, which is not reproducible (as different package versions
are available at different times).

Luckily "dnf install" does not update an already-installed package. Had
that been the case, none of our builds would have been reproducible, since
packages would be updated to the latest version as of the build time rather
than the version selected by the frozen toolchain.

So, to prevent missing packages in the frozen toolchain from translating
into an unreproducible build, remove the support for installing dependencies
from reloc/build_reloc.sh. We still parse the --nodeps option in case some
script uses it.

Fixes #5222.

Tests: reloc/build_reloc.sh.
2019-11-14 09:37:14 +02:00
Gleb Natapov
552c56633e storage_proxy: do not release mutation if not all replies were received
MV backpressure code frees mutation for delayed client replies earlier
to save memory. The commit 2d7c026d6e that
introduced the logic claimed to do it only when all replies are received,
but this is not the case. Fix the code to free only when all replies
are received for real.

Fixes #5242

Message-Id: <20191113142117.GA14484@scylladb.com>
2019-11-13 16:23:19 +02:00
Raphael S. Carvalho
3e70523111 distributed_loader: Release disk space of SSTables deleted by resharding
Resharding is responsible for scheduling the deletion of the sstables it
resharded, but it was not refreshing the cache of the shards those
sstables belong to, which means the cache was incorrectly holding references
to them even after they were deleted. As a consequence, sstables
deleted by resharding did not have their disk space freed until the cache
was refreshed by a subsequent procedure that triggers it.

Fixes #5261.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191107193550.7860-1-raphaelsc@scylladb.com>
2019-11-13 16:03:27 +02:00
Avi Kivity
6aed3b7471 Merge "cql: trivial cleanup" from Vova
* 'cql-trivial-cleanup' of ssh://github.com/scylladb/scylla-dev:
  cql: rename modification_statement::_sets_a_collection to _selects_a_collection
  cql: rename _column_conditions to _regular_conditions
  cql: remove unnecessary optional around prefetch_data
2019-11-13 15:12:10 +02:00
Avi Kivity
1cb9f9bdfe Merge "Use a fixed-size bitset for column set" from Kostja
"
Use a fixed-size, rather than a dynamically growing
bitset for column mask. This avoids unnecessary memory
reallocation in the most common case.
"

* 'column_set' of ssh://github.com/scylladb/scylla-dev:
  schema: pre-allocate the bitset of column_set
  schema: introduce schema::all_columns_count()
  schema: rename column_mask to column_set
2019-11-13 15:08:13 +02:00
Tomasz Grabiec
f68e17eb52 Merge "Partition/row hit/miss counters for memtable write operations" from Piotr D.
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.

Tests: unit(release)
2019-11-13 13:11:51 +01:00
Juliusz Stasiewicz
8318a6720a cql3: error msg w/ arg counts for prepared stmts with wrong arg cnt
Fixes #3748. Very small change: added argument count (expectation vs. reality)
to error msg within `invalid_request_exception'.
2019-11-13 13:43:37 +02:00
Nadav Har'El
ccb9038c69 alternator: Implement Expected operators LT and GT
Merged patch series from Dejan Mircevski. Implements the "LT" and "GT"
operators of the Expected update option (i.e., conditional updates),
and enables the pre-existing tests for them.
2019-11-13 12:07:44 +02:00
Konstantin Osipov
6159c012db schema: pre-allocate the bitset of column_set
The number of columns is usually small, and avoiding
a resize speeds up bit manipulation functions.
2019-11-13 11:41:51 +03:00
Konstantin Osipov
e95d675567 schema: introduce schema::all_columns_count()
schema::all_columns_count() will be used to reserve
memory of the column_set bitmask.
2019-11-13 11:41:42 +03:00
Konstantin Osipov
191acec7ab schema: rename column_mask to column_set
Since it contains a precise set of columns, it's more
accurate to call it a set, not a mask. Besides, the name
column_mask is already used for column options on storage
level.
2019-11-13 11:41:30 +03:00
Kamil Braun
d6446e352e tests: add null/unset frozen collection/UDT INSERT JSON test
When using INSERT JSON with null/unspecified frozen collection/UDT
columns, the columns should be set to null.

See #5270.
2019-11-12 18:24:47 +01:00
Vladimir Davydov
8110178e5d cql: rename modification_statement::_sets_a_collection to _selects_a_collection
This is merely to avoid confusion: we use _sets prefix to indicate that
there are operations over static/regular columns (_sets_static_columns,
_sets_regular_columns), but _sets_a_collection is set for both operations
and conditions. So let's rename it to _selects_a_collection and add some
comments.
2019-11-12 20:15:42 +03:00
Vladimir Davydov
a19192950e cql: rename _column_conditions to _regular_conditions
It's weird that modification_statement has _static_conditions for
conditions on static columns and _column_conditions for conditions on
regular columns, as if conditions on static columns are not column
conditions. Let's rename _column_conditions to _regular_conditions to
avoid confusion.
2019-11-12 20:15:35 +03:00
Konstantin Osipov
0ad0369684 cql: remove unnecessary optional around prefetch_data 2019-11-12 20:15:24 +03:00
Kamil Braun
6c04c5bed5 cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON
Before this commit, an empty non-null value was created for
frozen collection/UDT columns when an INSERT JSON statement was executed
with the value left unspecified or set to null.
This was incompatible with Cassandra which inserted a null (dead cell).

Fixes #5270.
2019-11-12 18:05:01 +01:00
Kamil Braun
0ad7d71f31 cql3: decouple execute from term binding in user_type::setter
This commit makes it possible to pass a bound value terminal
directly to the setter.
Continuation of commit bfe3c20035.
2019-11-12 18:02:21 +01:00
Takuya ASADA
614ec6fc35 install.sh: drop --pkg option, use .install file on .deb package
The --pkg option of install.sh was introduced for .deb packaging, since it
requires a different install directory for each subpackage.
But we are actually able to use "debian/tmp" as a shared install directory,
and then specify which package owns each file using .install files.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191030203142.31743-1-syuu@scylladb.com>
2019-11-12 16:50:37 +02:00
Piotr Dulikowski
59fbbb993f memtables: add partition/row hit/miss counters
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.
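
How the four counters relate can be shown with a toy model. The counter names come from the list above; the `Memtable` and `MemtableStats` classes and the single-row `write_row` operation are assumptions for the sketch and do not reflect the real data structures.

```python
class MemtableStats:
    """Counters named after the metrics listed above."""

    def __init__(self):
        self.memtable_partition_writes = 0
        self.memtable_partition_hits = 0
        self.memtable_row_writes = 0
        self.memtable_row_hits = 0


class Memtable:
    """Toy memtable: partition key -> {clustering key -> value}."""

    def __init__(self, stats):
        self.stats = stats
        self.partitions = {}

    def write_row(self, partition_key, clustering_key, value):
        self.stats.memtable_partition_writes += 1
        if partition_key in self.partitions:
            # The partition already existed in this memtable.
            self.stats.memtable_partition_hits += 1
        rows = self.partitions.setdefault(partition_key, {})

        self.stats.memtable_row_writes += 1
        if clustering_key in rows:
            # The write overwrites a row already present in the memtable.
            self.stats.memtable_row_hits += 1
        rows[clustering_key] = value


if __name__ == "__main__":
    stats = MemtableStats()
    mt = Memtable(stats)
    mt.write_row("p1", "r1", 1)
    mt.write_row("p1", "r2", 2)
    mt.write_row("p1", "r1", 3)
    print(vars(stats))
```

The miss counts mentioned in the commit title follow as writes minus hits, so they need no counters of their own in this model.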

Tests: unit(release)
2019-11-12 13:35:41 +01:00
Piotr Dulikowski
48f7b2e4fb table: move out table::stats to table_stats
This change was done in order to be able to forward-declare
the table::stats structure.
2019-11-12 13:35:41 +01:00
Avi Kivity
cf7291462d Merge "cql3/functions: add missing min/max/count functions for ascii type" from Piotr
"
Adds missing overloads of functions count, min, max for type ascii.

Now they work:

cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE ks;
cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii);
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl');
cqlsh:ks> SELECT * FROM test_ascii;

 id | value
----+-------
  1 |  efgh
  0 |  abcd
  2 |  ijkl

(3 rows)
cqlsh:ks> SELECT count(value) FROM test_ascii;

 system.count(value)
---------------------
                   3

(1 rows)
cqlsh:ks> SELECT min(value) FROM test_ascii;

 system.min(value)
-------------------
              abcd

(1 rows)
cqlsh:ks> SELECT max(value) FROM test_ascii;

 system.max(value)
-------------------
              ijkl

(1 rows)

Tests:

- unit(release)
- cql_group_functions_tests.py (with added check for ascii type)

Fixes #5147.
"

* '5147-fix-min-max-count-for-ascii' of https://github.com/piodul/scylla:
  tests/cql_query_test: add aggregate functions test
  cql3/functions: add missing min/max/count for ascii
2019-11-12 14:15:14 +02:00
Piotr Dulikowski
41cb16a526 tests/cql_query_test: add aggregate functions test
Adds a test for min, max and avg functions for those primitive types for
which those functions are working at the moment.
2019-11-12 13:01:34 +01:00
Piotr Dulikowski
6d78d7cc69 cql3/functions: add missing min/max/count for ascii
Adds missing overloads of functions `count`, `min`, `max` for
type `ascii`. Now they work:

cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy',
'replication_factor': 1};
cqlsh> USE ks;
cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii);
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl');
cqlsh:ks> SELECT * FROM test_ascii;

 id | value
----+-------
  1 |  efgh
  0 |  abcd
  2 |  ijkl

(3 rows)
cqlsh:ks> SELECT count(value) FROM test_ascii;

 system.count(value)
---------------------
                   3

(1 rows)
cqlsh:ks> SELECT min(value) FROM test_ascii;

 system.min(value)
-------------------
              abcd

(1 rows)
cqlsh:ks> SELECT max(value) FROM test_ascii;

 system.max(value)
-------------------
              ijkl

(1 rows)

Tests:
- unit(release)
- cql_group_functions_tests.py (with added check for `ascii` type)

Fixes #5147.
2019-11-12 13:01:34 +01:00
Rafael Ávila de Espíndola
10bcbaf348 Lua: Document the conversions between Lua and CQL
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
6ffddeae5e Lua: Implement decimal subtraction
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
aba8e531d1 Lua: Implement decimal addition
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bb84eabbb3 Lua: Implement support for returning decimal
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bc17312a86 Lua: Implement decimal to string conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
e83d5bf375 Lua: Implement decimal to floating point conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b568bf4f54 Lua: Implement support for decimal arguments
This is just the minimum to pass a value to Lua. Right now you can't
actually do anything with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
6c3f050eb4 Lua: Implement support for returning varint
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dc377abd68 Lua: Implement support for returning duration
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
c3f021d2e4 Lua: Implement support for duration arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9208b2f498 Lua: Implement support for returning inet
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
64be94ab01 Lua: Implement support for inet arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
faf029d472 Lua: Implement support for returning time
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
772f2a4982 Lua: Implement support for time arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
484f498534 Lua: Implement support for returning timeuuid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9c2daf6554 Lua: Implement support for returning uuid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ae1a1a4085 Lua: Implement support for uuid and timeuuid arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
f8aeed5beb Lua: Implement support for returning date
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
384effa54b Lua: Implement support for date arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
63bc960152 Lua: Implement support for returning timestamp
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ee95756f62 Lua: Implement support for timestamp arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
1c6d5507b4 Lua: Implement support for returning counter
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
0d9d53b5da Lua: Implement support for counter arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
74c4e58b6b Lua: Add a test for nested types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b226511ce8 Lua: Implement support for returning maps
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
5c8d1a797f Lua: Implement support for map arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b5b15ce4e6 Lua: Implement support for returning set
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
cf7ba441e4 Lua: Implement support for set arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
02f076be43 Lua: Implement support for returning udt
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
92c8e94d9a Lua: Implement support for udt arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
a7c3f6f297 Lua: Implement support for returning list
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
688736f5ff Lua: Implement support for returning tuple
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ab5708a711 Lua: Implement support for list and tuple arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
534f29172c Lua: Implement support for returning boolean
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b03c580493 Lua: Implement support for boolean arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dcfe397eb6 Lua: Implement support for returning floating point
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
cf4b7ab39a Lua: Implement support for returning blob
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
3d22433cd4 Lua: Implement support for blob arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dd754fcf01 Lua: Implement support for returning ascii
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
affb1f8efd Lua: Implement support for returning text
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
f8ed347ee7 Lua: Implement support for string arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
0e4f047113 Lua: Implement a visitor for return values
This adds support for all integer types. Followup commits will
implement the missing types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
34b770e2fb Lua: Push varint as decimal
This makes it substantially simpler to support both varint and
decimal, which will be implemented in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9b3cab8865 Lua: Implement support for varint to integer conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
5a40264d97 Lua: Implement support for varint arguments
Right now it is not possible to do anything with the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
3230b8bd86 Lua: Implement support for floating point arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9ad2cc2850 Lua: Implement a visitor for arguments
With this we support all simple integer types. Followup patches will
implement the missing types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ee1d87a600 Lua: Plug in the interpreter
This adds a wrapper around the Lua interpreter so that function
executions are interruptible and return futures.

With this patch it is possible to write and use simple UDFs that take
and return integer values.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bc3bba1064 Lua: Add lua.cc and lua.hh skeleton files
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
7015e219ca Lua: Link with liblua
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
61200ebb04 Lua: Add config options
This patch just adds the config options that we will expose for the
lua runtime.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
d9337152f3 Use threads when executing user functions
This adds a requires_thread predicate to functions and propagates that
up until we get to code that already returns futures.

We can then use the predicate to decide if we need to use
seastar::async.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
52b48b415c Test that schema digests with UDFs don't change
This refactors test_schema_digest_does_not_change to also test a
schema with user defined functions and user defined aggregates.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
fc72a64c67 Add schema propagation and storage for UDF
With this it is possible to create user defined functions and
aggregates and they are saved to disk and the schema change is
propagated.

It is just not possible to call them yet.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ce6304d920 UDF: Add a feature and config option to track if udf is enabled
It can only be enabled with --experimental.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:40:47 -08:00
Rafael Ávila de Espíndola
dd17dfcbef Reject "OR REPLACE ... IF NOT EXISTS" in the grammar
The parser now rejects having both OR REPLACE and IF NOT EXISTS in the
same statement.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
e7e3dab4aa Convert UDF parsing code to c++
For now this just constructs the corresponding c++ classes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
5c45f3b573 Update UDF syntax
This updates UDF syntax to the current specification.

In particular, this removes DETERMINISTIC and adds "CALLED ON NULL
INPUT" and "RETURNS NULL ON NULL INPUT".

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
c75cd5989c transport: Add support for FUNCTION and AGGREGATE to schema_change
While at it, modernize the code a bit and add a test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
dac3cf5059 Clear functions between cql_test_env runs
At some point we should make the function list non static, but this
allows us to write tests for now.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
de1a970b93 cql: convert functions to add, remove and replace functions
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
33f9d196f9 Add iterator version of functions::find
This avoids allocating a std::vector and is more flexible since the
iterator can be passed to erase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
7f9dadee5c Implement functions::type_equals.
Since the types are uniqued we can just use ==.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
5cef5a1b38 types: Add a friend visitor over data_value
This is a simple wrapper that allows code that is not in the types
hierarchy to visit a data_value.

Will be used by UDF.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
9bf9a84e4d types: Move the data_value visitor to a header
It will be used by the UDF implementation.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Yaron Kaikov
4a9b2a8d96 dist/docker: Add SCYLLA_REPO_URL argument to Dockerfile (#5264)
This change adds a SCYLLA_REPO_URL argument to Dockerfile, which defines
the RPM repository used to install Scylla from.

When building a new Docker image, users can specify the argument by
passing the --build-arg SCYLLA_REPO_URL=<url> option to the docker build
command. If the argument is not specified, the same RPM repository is
used as before, retaining the old default behavior.

We intend to use this in release engineering infrastructure to specify
RPM repositories for nightly builds of release branches (for example,
3.1.x), which are currently only using the stable RPMs.
2019-11-07 09:21:05 +02:00
Pavel Emelyanov
486e3f94d0 deps: Add libunistring-dev to debian
With this, the previous patch to seastar, and (suddenly) the xenial repo
for scylla-libthrift010-dev and scylla-antlr35-c++-dev, the build on
debian buster finally passes.

Signed-off-by: Pavel Emelyanov <xemul@scyladb.com>
Message-Id: <CAHTybb-QFyJ7YQW0b6pjhY_xUr-_b1w_O3K1=1FOwrNM55BkLQ@mail.gmail.com>
2019-11-01 09:03:39 +02:00
Dejan Mircevski
859883b31d alternator: Implement GT operator in Expected
Add cmp_gt and use it in check_compare() to handle the GT case.  Also
reactivate GT tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 17:18:22 -04:00
Dejan Mircevski
0f7d837757 alternator: Factor out check_compare()
Code for check_LT(), check_GT(), etc. will be nearly identical, so
factor it out into a single function that takes a comparator object.
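
A minimal sketch of the factoring described above, written in Python rather than the C++ of the actual patch. The names `check_compare`, `check_lt` and `check_gt` mirror the commit message, but the signatures and the handling of missing or mismatched values are assumptions made for illustration only.

```python
import operator
from typing import Any, Callable


def check_compare(actual: Any, expected: Any,
                  cmp: Callable[[Any, Any], bool]) -> bool:
    """Shared body for ordering checks; `cmp` is the comparator object."""
    if actual is None:
        # Assumption: a missing attribute never satisfies an ordering check.
        return False
    if type(actual) is not type(expected):
        # Assumption: values of different types are not comparable.
        return False
    return cmp(actual, expected)


def check_lt(actual: Any, expected: Any) -> bool:
    return check_compare(actual, expected, operator.lt)


def check_gt(actual: Any, expected: Any) -> bool:
    return check_compare(actual, expected, operator.gt)
```

With this shape, adding another operator only requires passing a different comparator.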

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 17:01:29 -04:00
Dejan Mircevski
a47b768959 alternator: Implement LT operator in Expected
Add check_LT() function and reactivate LT tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 16:07:29 -04:00
Dejan Mircevski
ceae3c182f alternator: Overload base64_decode on rjson::value
In 1ca9dc5d47, it was established that the correct way to
base64-decode a JSON value is via string_view, rather than directly
from GetString().

This patch adds a base64_decode(rjson::value) overload, which
automatically uses the correct procedure.  It saves typing, ensures
correctness (fixing one incorrect call found), and will come in handy
for future EXPECTED comparisons.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 15:56:03 -04:00
Dejan Mircevski
9955f0342f alternator: Make unwrap_number() visible
unwrap_number() is now a public function in serialization.hh instead
of a static function visible only in executor.cc.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 10:46:30 -04:00
Nadav Har'El
3f859adebd Merge: Fix filtering static columns on empty partitions
Merged patch series from Piotr Sarna:

An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.

Fixes #5248
2019-10-31 10:50:21 +02:00
Pavel Emelyanov
5fe4757725 docs: The scylla's dpdk config is boolean
The docs say one can pass --disable-dpdk, but that is not so. It is seastar's
configure.py that has a tristate dpdk option; scylla's option can only be
enabled.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Message-Id: <CAHTybb-rxP8DbH-wW4Zf-w89iuCirt6T6-PjZAUfVFj7C5yb=A@mail.gmail.com>
2019-10-31 10:12:17 +02:00
Vladimir Davydov
9ea8114f8c cql: fix CAS metric label
"type" label is already in use for the counter type ("derive", "gauge",
etc). Using the same label for "cas" / "non-cas" overwrites it. Let's
instead call the new label "conditional" and use "yes" / "no" for its
value, as suggested by Kostja.
Message-Id: <3082b16e4d6797f064d58da95fb4e50b59ab795c.1572451480.git.vdavydov@scylladb.com>
2019-10-30 17:14:17 +01:00
Avi Kivity
398c482cd0 Merge "combined reader gallop mode" from Piotr
"
When a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger enters gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode.

In gallop mode, fragments returned by the galloping reader are compared with the best fragment from _fragment_heap. If the galloping reader's fragment wins, it is returned directly. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations.

In the current implementation, when the end of partition is encountered while in gallop mode, gallop mode is ended unconditionally.

A microbenchmark was added in order to test performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization.

As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping.

Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode)

test name                            before    after     improvement
one_row                              49.070ns  48.287ns  1.60%
single_active                        61.574us  61.235us  0.55%
many_overlapping                     488.193us 514.977us -5.49%
disjoint_interleaved                 57.462us  57.111us  0.61%
disjoint_ranges                      56.545us  56.006us  0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us  36.36%

Same results, normalized per mutation fragment:

test name                            before   after    improvement
one_row                              16.36ns  16.10ns  1.60%
single_active                        109.46ns 108.86ns 0.55%
many_overlapping                     216.97ns 228.88ns -5.49%
disjoint_interleaved                 102.15ns 101.53ns 0.61%
disjoint_ranges                      100.52ns 99.57ns  0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%

Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.

Tests: unit(release)
Fixes #3593.
"

* '3593-combined_reader-gallop-mode' of https://github.com/piodul/scylla:
  mutation_reader: gallop mode microbenchmark
  mutation_reader: combined reader gallop tests
  mutation_reader: gallop mode for combined reader
  mutation_reader: refactor prepare_next
2019-10-30 17:34:47 +02:00
Piotr Sarna
dd00470a44 tests: add a test case for filtering on static columns
The test case covers filtering with an empty partition.

Refs #5248
2019-10-30 15:34:10 +01:00
Piotr Sarna
ca6fe598ec cql3: fix filtering on a static column for empty partitions
An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.

Fixes #5248
2019-10-30 15:31:54 +01:00
Tomasz Grabiec
9da3aec115 Merge "Mutation diff improvements" from Benny
- accept diff_command option
- standard input support
2019-10-30 13:40:58 +01:00
Tomasz Grabiec
0d9367e08f Merge "Scyllatop: one pass update of multiple metrics" from Benny
Update previous results dictionary using the update_metrics method.
It calls metric_source.query_list to get a list of results (similar to discover()) then for each line in the response it updates results dictionary.

New results may be appended depending on the do_append parameter (True by default).

Previously, with Prometheus, each metric.update called query_list, resulting in O(n^2) work when all metrics were updated, as in the scylla_top dtest, which caused a test timeout when testing a debug build.
(E.g. dtest-debug/216/testReport/scyllatop_test/TestScyllaTop/default_start_test/)
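
The difference between the two update patterns can be sketched as follows. This is an illustration only: `FakeMetricSource`, `update_each` and the exact signature of `update_metrics` are hypothetical stand-ins, not scyllatop's real classes; only the names `query_list`, `update_metrics` and `do_append` come from the message above.

```python
from typing import Dict, List, Tuple


class FakeMetricSource:
    """Hypothetical metric source that counts how often it is queried."""

    def __init__(self, data: Dict[str, float]) -> None:
        self._data = dict(data)
        self.queries = 0

    def query_list(self) -> List[Tuple[str, float]]:
        self.queries += 1
        return list(self._data.items())


def update_each(source: FakeMetricSource, results: Dict[str, float]) -> None:
    """Old pattern: every metric triggers its own full query (O(n^2) overall)."""
    for name in list(results):
        for key, value in source.query_list():
            if key == name:
                results[name] = value


def update_metrics(source: FakeMetricSource, results: Dict[str, float],
                   do_append: bool = True) -> None:
    """One pass: a single query, then one dictionary update per response line."""
    for key, value in source.query_list():
        if key in results or do_append:
            results[key] = value
```

In this sketch, updating n known metrics costs n calls to `query_list` with `update_each` and exactly one call with `update_metrics`.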
2019-10-30 13:38:39 +01:00
Tomasz Grabiec
b7b0a53b50 Merge "Add metrics for light-weight transactions" from Vova
This patch set adds metrics useful for analyzing light-weight
transaction performance. The same metrics are available in Cassandra.
2019-10-30 12:09:03 +01:00
Vladimir Davydov
f0075ba845 cql: account cas requests separately
This patch adds "type" label to the following CQL metrics:

  inserts
  updates
  deletes
  batches
  statements_in_batches

The label is set to "cas" for conditional statements and "non-cas" for
unconditional statements.

Note, for a batch to be accounted as CAS, it is enough to have just one
conditional statement. In this case all statements within the batch are
accounted as CAS as well.
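
The batch rule can be sketched as below. This is a Python illustration, not the C++ accounting code; `account_batch` and the counter layout are hypothetical, while the metric names and the "cas" / "non-cas" label values are taken from the message above.

```python
from collections import Counter
from typing import Sequence


def account_batch(statement_is_conditional: Sequence[bool],
                  counters: Counter) -> str:
    """Account one batch; a single conditional statement makes it all CAS."""
    label = "cas" if any(statement_is_conditional) else "non-cas"
    counters[("batches", label)] += 1
    # Every statement in the batch inherits the batch's label.
    counters[("statements_in_batches", label)] += len(statement_is_conditional)
    return label
```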
2019-10-30 13:44:35 +03:00
Piotr Dulikowski
81883a9f2e mutation_reader: gallop mode microbenchmark
This microbenchmark tests performance of the galloping reader
optimization. A combining reader that merges results from four other
readers is created. Each sub-reader provides a range of 32 clustering
rows that is disjoint from others. All sub-readers return rows from
the same partition. An improvement can be observed after introducing the
galloping reader optimization.

As for other benchmarks from the "combined" group, results are pretty
close to the old ones. The only one that seems to have suffered slightly
is combined.many_overlapping.

Median times from a single run of perf_mutation_readers.combined:
(1s run duration, 5 runs per benchmark, release mode)

test name                            before    after     improvement
one_row                              49.070ns  48.287ns  1.60%
single_active                        61.574us  61.235us  0.55%
many_overlapping                     488.193us 514.977us -5.49%
disjoint_interleaved                 57.462us  57.111us  0.61%
disjoint_ranges                      56.545us  56.006us  0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us  36.36%

Same results, normalized per mutation fragment:

test name                            before   after    improvement
one_row                              16.36ns  16.10ns  1.60%
single_active                        109.46ns 108.86ns 0.55%
many_overlapping                     216.97ns 228.88ns -5.49%
disjoint_interleaved                 102.15ns 101.53ns 0.61%
disjoint_ranges                      100.52ns 99.57ns  0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%

Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.
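
The improvement column is consistent with the relative reduction in time, (before - after) / before. The commit does not state the formula, so the helper below is an assumption that happens to reproduce the figures in the tables.

```python
def improvement_percent(before: float, after: float) -> float:
    """Relative reduction in time, in percent (negative means a slowdown)."""
    return (before - after) / before * 100.0


# Values copied from the tables above.
gallop_case = improvement_percent(127.039, 80.849)
overlapping_case = improvement_percent(488.193, 514.977)
print(f"{gallop_case:.2f}% {overlapping_case:.2f}%")
```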
2019-10-30 09:51:18 +01:00
Piotr Dulikowski
29d6842db9 mutation_reader: combined reader gallop tests
2019-10-30 09:51:18 +01:00
Piotr Dulikowski
2b4ca0c562 mutation_reader: gallop mode for combined reader
When a single reader contributes a stream of fragments and keeps
winning over other readers, mutation_reader_merger enters gallop
mode, in which it is assumed that the reader will keep winning over
other readers. Currently, a reader needs to contribute 3 fragments
to enter that mode.

In gallop mode, fragments returned by the galloping reader are
compared with the best fragment from _fragment_heap. If the galloping
reader's fragment wins, it is returned directly. Otherwise, gallop
mode ends and merging is performed as in the general case, which
involves heap operations.

In the current implementation, when the end of partition is encountered
while in gallop mode, the gallop mode is ended unconditionally.
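
The idea can be sketched on plain sorted integer sequences. This is a simplified Python illustration, not the mutation_reader_merger code: `merge_with_gallop` is a hypothetical name, the readers are ordinary iterables, and partition boundaries (and thus the end-of-partition rule) are left out. Only the threshold of 3 comes from the description above.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

GALLOP_THRESHOLD = 3  # consecutive wins needed to enter gallop mode


def merge_with_gallop(sources: List[Iterable[int]]) -> Iterator[int]:
    """Merge sorted sources, bypassing the heap while one source keeps winning."""
    iters = [iter(s) for s in sources]
    heap: List[Tuple[int, int]] = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))
    heapq.heapify(heap)

    last_winner: Optional[int] = None
    streak = 0
    while heap:
        value, idx = heapq.heappop(heap)
        yield value
        streak = streak + 1 if idx == last_winner else 1
        last_winner = idx
        nxt = next(iters[idx], None)
        # Gallop: compare the reader's next item with the heap's best item
        # and return it directly while it still wins, with no heap operations.
        while (streak >= GALLOP_THRESHOLD and nxt is not None
               and (not heap or (nxt, idx) <= heap[0])):
            yield nxt
            nxt = next(iters[idx], None)
        if nxt is not None:
            # Either not galloping or the reader just lost: general case.
            heapq.heappush(heap, (nxt, idx))
```

When the sources cover disjoint ranges, as in the disjoint-rows benchmark, most items in this sketch are produced inside the inner loop and never touch the heap.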

Fixes #3593.
2019-10-30 09:51:18 +01:00
Piotr Dulikowski
2a46a09e7c mutation_reader: refactor prepare_next
Move the logic responsible for adding readers at a partition boundary
into `maybe_add_readers_at_partition_boundary`, and the logic for
advancing one reader into `prepare_one`. This allows reusing this logic
outside `prepare_next`.
2019-10-30 09:49:12 +01:00
Avi Kivity
623071020e commitlog: change variadic stream in read_log_file to future<struct>
Since seastar::streams are based on future/promise, variadic streams
suffer the same fate as variadic futures - deprecation and eventual
removal.

This patch therefore replaces a variadic stream in commitlog::read_log_file()
with a non-variadic stream, via a helper struct.

Tests: unit (dev)
2019-10-29 19:25:12 +01:00
Botond Dénes
271ab750a6 scylla-gdb.py: add replica section to scylla memory
Recently, scylla memory started to go beyond just providing raw stats
about the occupancy of the various memory pools, to additionally also
provide an overview of the "usual suspects" that cause memory pressure.
As part of this, recently 46341bd63f
added a section of the coordinator stats. This patch continues this
trend and adds a replica section, with the "usual suspects":
* read concurrency semaphores
* execution stages
* read/write operations

Example:

    Replica:
      Read Concurrency Semaphores:
        user sstable reads:        0/100, remaining mem:      84347453 B, queued: 0
        streaming sstable reads:   0/ 10, remaining mem:      84347453 B, queued: 0
        system sstable reads:      0/ 10, remaining mem:      84347453 B, queued: 0
      Execution Stages:
        data query stage:
          03 "service_level_sg_0"             4967
             Total                            4967
        mutation query stage:
             Total                            0
        apply stage:
          03 "service_level_sg_0"             12608
          06 "statement"                      3509
             Total                            16117
      Tables - Ongoing Operations:
        pending writes phaser (top 10):
                  2 ks.table1
                  2 Total (all)
        pending reads phaser (top 10):
               3380 ks.table2
                898 ks.table1
                410 ks.table3
                262 ks.table4
                 17 ks.table8
                  2 system_auth.roles
               4969 Total (all)
        pending streams phaser (top 10):
                  0 Total (all)

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029164817.99865-1-bdenes@scylladb.com>
2019-10-29 18:03:06 +01:00
Vladimir Davydov
e510288b6f api: wire up column_family cas-related statistics
2019-10-29 19:26:18 +03:00
Vladimir Davydov
b75862610e paxos_state: account paxos round latency
This patch adds the following per table stats:

  cas_prepare_latency
  cas_propose_latency
  cas_commit_latency

They are equivalent to the CasPrepare, CasPropose, and CasCommit metrics
exposed by Cassandra.
2019-10-29 19:26:18 +03:00
Vladimir Davydov
21c3c98e5b api: wire up storage_proxy cas-related statistics
2019-10-29 19:26:18 +03:00
Vladimir Davydov
c27ab87410 storage_proxy: add cas request accounting
This patch implements accounting of Cassandra's metrics related to
lightweight transactions, namely:

  cas_read_latency              transactional read latency (histogram)
  cas_write_latency             transactional write latency (histogram)
  cas_read_timeouts             number of transactional read timeouts
  cas_write_timeouts            number of transactional write timeouts
  cas_read_unavailable          number of transactional read
                                unavailable errors
  cas_write_unavailable         number of transactional write
                                unavailable errors
  cas_read_unfinished_commit    number of transaction commit attempts
                                that occurred on read
  cas_write_unfinished_commit   number of transaction commit attempts
                                that occurred on write
  cas_write_condition_not_met   number of transaction preconditions
                                that did not match current values
  cas_read_contention           how many contended reads were
                                encountered (histogram)
  cas_write_contention          how many contended writes were
                                encountered (histogram)
2019-10-29 19:25:47 +03:00
Vladimir Davydov
967a9e3967 storage_proxy: zap ballot_and_contention
Pass contention by reference to begin_and_repair_paxos(), where it is
incremented on every sleep. Rationale: we want to account the total
number of times query() / cas() had to sleep, either directly or within
begin_and_repair_paxos(), no matter if the function failed or succeeded.
2019-10-29 19:22:18 +03:00
Botond Dénes
49aa8ab8a0 scylla-gdb.py: add compatibility with Scylla 3.0
Even though every Scylla version has its own scylla-gdb.py, because we
don't backport any fixes or improvements, practically we end up always
using master's version when debugging older versions of Scylla too. This
is made harder by the fact that both Scylla's and its dependencies'
(most notably that of libstdc++ and boost) code is constantly changing
between releases, requiring edits to scylla-gdb.py to make it usable
with past releases.

This patch attempts to make it easier to use scylla-gdb.py with past
releases, more specifically Scylla 3.0. This is achieved by wrapping
problematic lines in a `try: except:` and putting the backward
compatible version in the `except:` clause. These lines have comments
with the version they provide support for, so they can be removed when
said version is not supported anymore.
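
The pattern looks roughly like the sketch below. It uses plain Python objects instead of gdb values so that it runs outside a debugger; `NewLayout`, `OldLayout`, `container_size` and the field names are hypothetical, and the narrow `except AttributeError` stands in for whatever error a missing member raises in the real script.

```python
class NewLayout:
    """Hypothetical object layout in the current release."""

    def __init__(self, size: int) -> None:
        self._impl = {"size": size}


class OldLayout:
    """Hypothetical object layout as it looked in an older release."""

    def __init__(self, size: int) -> None:
        self._size = size


def container_size(obj) -> int:
    try:
        return obj._impl["size"]
    except AttributeError:
        # Scylla 3.0 compatibility (illustrative); remove once unsupported.
        return obj._size
```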

I did not attempt to provide full coverage, I only fixed up problems
that surfaced when using my favourite commands with 3.0.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029155737.94456-1-bdenes@scylladb.com>
2019-10-29 17:05:19 +01:00
Botond Dénes
e48f301e95 repair: repair_cf_range(): extract result of local checksum calculation only once
The loop collects the results of the checksum calculations and logs
any errors. The error logging includes `checksums[0]` which corresponds
to the checksum calculation on the local node. This violates the
assumption of the code following the loop, which assumes that the future
of `checksums[0]` is intact after the loop terminates. However this is
only true when the checksum calculation is successful and is false when
it fails, as in this case the loop extracts the error and logs it. When
the code after the loop checks again whether said calculation failed, it
will get a false negative and will go ahead and attempt to extract the
value, triggering an assert failure.
Fix by making sure that even in the case of failed checksum calculation,
the result of `checksums[0]` is extracted only once.

Fixes: #5238
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>
2019-10-29 17:00:37 +01:00
Avi Kivity
60ea29da90 Update seastar submodule
* seastar 2963970f6b...75e189c6ba (7):
  > posix-stack: Do auto-resolve of ipv6 scope iff not set for link-local dests
  > README.md: Add redpanda and smf to 'Projects using Seastar'
  > unix_domain_test: don't assume that at temporary_buffer is null terminated
  > socket_address: Use offsetof instead of null pointer
  > README: add projects using seastar section to readme
  > Adjustments for glibc 2.30 and hwloc 2.0
  > Mark future::failed() as const
2019-10-29 14:34:10 +02:00
Gleb Natapov
0e9df4eaf8 lwt: mark lwt as experimental
We may want to change paxos tables format and change internode protocol,
so hide lwt behind experimental flag for now.

Message-Id: <20191029102725.GM2866@scylladb.com>
2019-10-29 14:33:48 +02:00
Benny Halevy
79d5fed40b mutation_fragment_stream_validator: validate end of stream in partition_key filter
Currently end of stream validation is done in the destructor,
but the validator may be destructed prematurely, e.g. on
exception, as seen in https://github.com/scylladb/scylla/issues/5215

This patch adds an on_end_of_stream() method explicitly called by
consume_pausable_in_thread.  Also, the respective concepts for
ParitionFilter, MutationFragmentFilter and a new one for the
on_end_of_stream method were unified as FlattenedConsumerFilter.

Refs #5215

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)
2019-10-29 12:35:33 +01:00
Benny Halevy
d5f53bc307 mutation_fragment_stream_validator: validate partition key monotonicity
Fixes #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 736360f823621f7994964fee77f37378ca934c56)
2019-10-29 12:35:33 +01:00
Gleb Natapov
e5e44bfda2 client_state: fix get_timestamp_for_paxos() to always advance a timestamp
Message-Id: <20191029102336.GL2866@scylladb.com>
2019-10-29 13:07:33 +02:00
Tomasz Grabiec
c2a4c915f3 Merge "Fix a few issues with CAS requests" from Vladimir D.
There are a few issues at the CQL layer, because of which the result of
a CAS request execution may differ between Scylla and Cassandra. Mostly,
it happens when static columns are involved. The goal of this patch set
is to fix these issues, thus making Scylla's implementation of CAS yield
the same results as Cassandra's.
2019-10-29 11:50:15 +01:00
Rafael Ávila de Espíndola
c74864447b types: Simplify validate_visitor for strings
We have different types for ascii and utf8, so there is no need for
an extra if.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191024232911.22700-1-espindola@scylladb.com>
2019-10-29 11:02:55 +02:00
Nadav Har'El
d69ab1b588 CDC: (atomic) delta + (non-optional) pre-image data columns
Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski:

Adds delta and pre-image data column writes for the atomic columns in a
cdc-enabled table.

Note that in this patch set it is still unconditional. Adding option support
comes in next set.

Uses code more or less derived from alternator to select the pre-image, using
the raw query interface, so the overhead of query generation should be fairly low.
Pre-image and delta mutations are mixed in with the actual modification
mutations to generate the full cdc log (sans post-image).
2019-10-29 09:39:28 +02:00
Calle Wilund
7db393fe12 cdc_test: Add helper methods + preimage test
Add filtering, sorting etc helpers + simple pre-image test

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-29 07:49:05 +01:00
Vladimir Davydov
65b86d155e cql: add static row to CAS failure result if there are static conditions
Even if no rows match clustering key restrictions of a conditional
statement with static columns conditions, we still must include the
static column value into the CAS failure result set. For example,
the following conditional DELETE statement

  create table t(k int, c int, s int static, v int, primary key(k, c));
  insert into t(k, s) values(1, 1);
  delete v from t where k=1 and c=1 if v=1 and s=1;

must return

  [applied=False, v=null, s=1]

not just

  [applied=False, v=null, s=null]

To fix that, set partition_slice::option::always_return_static_content
for querying rows used for checking conditions so that we have the
static row in update_parameters::prefetch_data even if no regular row
matches clustering column restrictions. Plus modify cas_request::
applies_to() so that it sets is_in_cas_result_set flag for the static
row in case there are static column conditions, but the result set
happens to be empty.

As pointed out by Tomek, there's another reason to set partition_slice::
option::always_return_static_content apart from building a correct
result set on CAS failure. There could be a batch with two statements,
one with clustering key restrictions which select no row, and another
statement with only static column conditions. If we didn't enable this
flag, we wouldn't get a static row even if it exists, and static column
conditions would evaluate as if the static row didn't exist, for
example, the following batch

  create table t(k int, c int, s int static, primary key(k, c));
  insert into t(k, s) values(1, 1);
  begin batch
  insert into t(k, c) values(1, 1) if not exists
  update t set s = 2 where k = 1 if s = 1
  apply batch;

would fail although it clearly must succeed.
2019-10-28 22:30:37 +03:00
Vladimir Davydov
e0b31dd273 query: add flag to return static row on partition with no rows
A SELECT statement that has clustering key restrictions isn't supposed
to return static content if no regular rows match the restrictions,
see #589. However, for the CAS statement we do need to return static
content on failure so this patch adds a flag that allows the caller to
override this behavior.
2019-10-28 21:50:44 +03:00
Vladimir Davydov
57d284d254 cql: exclude statements not checked by cas from result set
Apart from conditional statements, there may be other reading statements
in a batch, e.g. manipulating lists. We must not include rows fetched
for them into the CAS result set. For instance, the following CAS batch:

  create table t(p int, c int, i int, l list<int>, primary key(p, c));
  insert into t(p, c, i) values(1, 1, 1);
  insert into t(p, c, i, l) values(1, 2, 1, [1, 2, 3]);
  begin batch
  update t set i=3 where p=1 and c=1 if i=2
  update t set l=l-[2] where p=1 and c=2
  apply batch;

is supposed to return

  [applied] | p | c | i
  ----------+---+---+---
     False  | 1 | 1 | 1

not

  [applied] | p | c | i
  ----------+---+---+---
     False  | 1 | 1 | 1
     False  | 1 | 2 | 1

To filter out such collateral rows from the result set, let's mark rows
checked by conditional statements with a special flag.
2019-10-28 21:50:43 +03:00
Vladimir Davydov
74b9e80e4c cql: fix EXISTS check that applies only to static columns
If a CQL statement only updates static columns, i.e. has no clustering
key restrictions, we still fetch a regular row so that we can check it
against EXISTS condition. In this case we must be especially careful: we
can't simply pass the row to modification_statement::applies_to, because
it may turn out that the row has no static columns set, i.e. there's in
fact no static row in the partition. So we filter out such rows without
static columns right in cas_request::applies_to before passing them
further to modification_statement::applies_to.

Example:

  create table t(p int, c int, s int static, primary key(p, c));
  insert into t(p, c) values(1, 1);
  insert into t(p, s) values(1, 1) if not exists;

The conditional statement must succeed in this case.
2019-10-28 21:49:37 +03:00
Vladimir Davydov
8fbf344f03 cql: ignore clustering key if statement checks only static columns
In case a CQL statement has only static column conditions, we must
ignore clustering key restrictions.

Example:

  create table t(p int, c int, s int static, v int, primary key(p, c));
  insert into t(p, s) values(1, 1);
  update t set v=1 where p=1 and c=1 if s=1;

This conditional statement must successfully insert row (p=1, c=1, v=1)
into the table even though there's no regular row with p=1 and c=1 in
the table before it's executed, because the statement condition only
applies to the static column s, which exists and matches.
2019-10-28 21:13:19 +03:00
Vladimir Davydov
54cf903bb2 cql: differentiate static from regular EXISTS conditions
If a modification statement doesn't have a clustering column restriction
while the table has static columns, then EXISTS condition just needs to
check if there's a static row in the partition, i.e. it doesn't need to
select any regular rows. Let's treat such an EXISTS condition like a static
column condition so that we can ignore its clustering key range while
checking CAS conditions.
2019-10-28 21:13:05 +03:00
Vladimir Davydov
934a87999f cql: turn prefetch_data::row into struct
This will allow us to add helper methods and store extra info in each
row. For example, we can add a method for checking if a row has static
columns. Also, to build CAS result set, we need to differentiate rows
fetched to check conditions from those fetched for reading operations.
Using struct as row container will allow us to store this information in
each prefetched row.
2019-10-28 21:12:52 +03:00
Vladimir Davydov
bdd62b8bc3 cql: remove static column check from create_clustering_ranges
The check is pointless, because we check exactly the same while
preparing the statement, see process_where_clause() method of
modification_statement.
2019-10-28 21:12:43 +03:00
Vladimir Davydov
a8ddbffa75 cql: fix applies_only_to_static_columns check
Currently, we set _sets_regular_columns/_sets_static_columns flags when
adding regular/static conditions to modification_statement. We use them
in applies_only_to_static_columns() function that returns true iff
_sets_static_columns is set and _sets_regular_columns is clear. We
assume that if this function returns true then the statement only deals
with static columns and so must not have clustering key restrictions.
Usually, that's true, but there's one exception: DELETE FROM ...
statement that deletes whole rows. Technically, this statement doesn't
have any column operations, i.e. _sets_regular_columns flag is clear.
So if such a statement happens to have a static condition, we will
assume that it only applies to static columns and mistakenly raise an
error.

Example:

  create table t(k int, c int, s int static, v int, primary key(k, c));
  delete from t where k=1 and c=1 if s=1;

To fix this, let's not set the above mentioned flags when adding
conditions and instead check if _column_conditions array is empty in
applies_only_to_static_columns().
2019-10-28 21:12:36 +03:00
Vladimir Davydov
fbb11dac11 cql: set conditions before processing where clause
modification_statement::process_where_clause() assumes that both
operations and conditions have been added to the statement when it's
called: it uses this information to raise an error in case the statement
restrictions are incompatible with operations or conditions. Currently,
operations are set before this function is called, but not conditions.
This results in "Invalid restrictions on clustering columns since
the {} statement modifies only static columns" error while trying to
execute the following statements:

  create table t(k int, c int, s int static, v int, primary key(k, c));
  delete s from t where k=1 and c=1 if v=1;
  update t set s=1 where k=1 and c=1 if v=1;

Fix this by always initializing conditions before processing WHERE
clause.
2019-10-28 21:12:22 +03:00
Botond Dénes
edc1750297 scylla-gdb.py: introduce scylla smp-queues
Print a histogram of the number of async work items in the shard's
outgoing smp queues.
Example:

    (gdb) scylla smp-queues
        10747 17 ->  3 ++++++++++++++++++++++++++++++++++++++++
          721 17 -> 19 ++
          247 17 -> 20 +
          233 17 -> 10 +
          210 17 -> 14 +
          205 17 ->  4 +
          204 17 ->  5 +
          198 17 -> 16 +
          197 17 ->  6 +
          189 17 -> 11 +
          181 17 ->  1 +
          179 17 -> 13 +
          176 17 ->  2 +
          173 17 ->  0 +
          163 17 ->  8 +
            1 17 ->  9 +

Useful for identifying the target shard, when `scylla task_histogram`
indicates a high number of async work items.

To produce the histogram the command goes over all virtual objects in
memory and identifies the source and target queues of each
`seastar::smp_message_queue::async_work_item` object. Practically the
source queue will always be that of the current shard. As this scales
with the number of virtual objects in memory, it can take some time to
run. An alternative implementation would be to instead read the actual
smp queues, but the code of that is scary so I went for the simpler and
more reliable solution.
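
The counting and rendering step described above can be sketched as
follows. This is only an illustration in plain Python, not the code of
scylla-gdb.py: the function name smp_queue_histogram, the (source,
target) tuple input and the bar width of 40 are assumptions.

```python
from collections import Counter

def smp_queue_histogram(work_items, width=40):
    """Render one line per (source, target) shard pair, busiest first.

    work_items is an iterable of (source_shard, target_shard) tuples,
    one per async work item found in memory (hypothetical input format).
    """
    counts = Counter(work_items)
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    for (src, dst), n in counts.most_common():
        # Scale the bar to the busiest pair, but always draw at least one '+'.
        bar = "+" * max(1, n * width // top)
        lines.append(f"{n:>10} {src:>2} -> {dst:>2} {bar}")
    return lines

for line in smp_queue_histogram([(17, 3)] * 4 + [(17, 19)]):
    print(line)
```

Sorting by count puts the most loaded target shard on the first line,
which is the one worth looking at first.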

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191028132456.37796-1-bdenes@scylladb.com>
2019-10-28 15:42:55 +02:00
Tomasz Grabiec
3b37027598 Merge "lwt: implement basic lightweight transactions support" from Kostja
This patch set introduces light-weight transactions support to
ScyllaDB. It is a subset of the full series, which adds
basic LWT support and which has been reviewed thus far.
2019-10-28 11:45:28 +01:00
Tomasz Grabiec
f745819ed7 Merge "lwt: paxos protocol implementation" from Gleb
This is the Paxos implementation for LWT. LWT itself is not included in the
patch, so the code is essentially not wired yet (except the read path).
2019-10-28 11:29:40 +01:00
Avi Kivity
f8ba96efcf Merge "test_udt_mutations fixes" from Benny
"
mutation_test/test_udt_mutations kept failing on my machine and I tracked it down to the 3rd patch in this series (use int64_t constants for long_type). While at it, this series also fixes a comment and the end iterator in BOOST_REQUIRE(std::all_of(...))

mutation_test: test_udt_mutations: fixup udt comment
mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
mutation_test: test_udt_mutations: use int64_t constants for long_type

Test: mutation_test(dev, debug)
"

* 'test_udt_mutations-fixes' of https://github.com/bhalevy/scylla:
  mutation_test: test_udt_mutations: use int64_t constants for long_type
  mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
  mutation_test: test_udt_mutations: fixup udt comment
2019-10-28 10:43:52 +02:00
Calle Wilund
36328acf60 cql_assertions: Change signature to accept sstring 2019-10-28 06:16:12 +01:00
Calle Wilund
7d98f735ee cdc: Add static columns to data/preimage mutations
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
19bba5608a cdc: Create and perform a pre-image select for mutations
As well as generate pre-image rows in resulting log mutation

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
d4ee1938c7 cdc: Add modification record for regular atomic values in mutations
Fills in the data columns for regular columns iff they are
atomic (not unfrozen collections)
2019-10-28 06:16:12 +01:00
Calle Wilund
3fdcbd9dff cdc: Set row op in log
Adds actual operation (part delete, range delete, update) to
cdc log
2019-10-28 06:16:12 +01:00
Calle Wilund
8a6b72f47e cdc: Add pre-image select generator method
Based on a mutation, creates a pre-image select operation.

Note, this uses raw proxy query to shortcut parsing etc,
instead of trying to cache by generated query. Hypothesis is that
this is essentially faster.

The routine assumes all rows in a mutation touch same static/regular
columns. If this is not always true it will need additional
calculations.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
d74f32b07a cql3::untyped_result_set: Add constructor from cql3::result_set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
3ed7a9dd69 cql3::untyped_result_set: Add view getter to make non-intrusive read cheaper
Also use in actual data conversion.
2019-10-28 06:16:12 +01:00
Calle Wilund
451bb7447d cdc: Add log / log data column operation types and make data cols tuples of these
Makes static/regular data columns tuple<op, value, ttl> as per spec.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Konstantin Osipov
e555dc502e lwt: implement basic lightweight transactions support
Support single-statement conditional updates as well as batches.

This patch almost fully rewrites column_condition.cc, implementing
is_satisfied_by().

Most of the remaining complications in column_condition implementation
come from the need to properly handle frozen and multi-cell
collection in predicates - up until now it was not possible
to compare entire collection values between each other. This is further
complicated since multi-cell lists and sets are returned as maps.

We can no longer assume that the columns fetched by prefetch operation
are non-frozen collections. IF EXISTS/IF NOT EXISTS condition
fetches all columns, besides, a column may be needed to check other
condition.

When fetching the old row for LWT or to apply updates on list/columns,
we now calculate precisely the list of columns to fetch.

The primary key columns are also included in CAS batch result set,
and are thus also prefetched (the user needs them to figure out which
statements failed to apply).

The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502
but does deviate from the origin in handling of conditions on static
row cells. This is addressed in future series.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
67e68dabf0 lwt: ensure we don't crash when we get a LIKE 2019-10-27 23:42:49 +03:00
Konstantin Osipov
f8f36d066c lwt: check for unsupported collection type in condition element access
We don't support conditions with element access on non-frozen UDTs,
check that only supported collection types are supplied.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
c9f0adf616 lwt: rewrite cql3::raw::column_condition::prepare()
Restructure the code to avoid quite a bit of code duplication.
2019-10-27 23:42:47 +03:00
Konstantin Osipov
c2217df4d8 lwt: reorganize column_condition declaration and add comments 2019-10-27 23:42:03 +03:00
Konstantin Osipov
22b0240fe7 lwt: remove useless code in column_condition.hh
Each column_condition and raw::column_condition construction case had a
static method wrapping its constructor, simply supplying some defaults.

This neither improves clarity nor maintainability.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
3e25b83391 lwt: propagate if_exists condition from the parser to AST
UPDATE ... IF EXISTS is legal, but IF EXISTS condition
was not propagated from the parser to AST (raw::update_statement).
2019-10-27 23:42:03 +03:00
Konstantin Osipov
df28985295 lwt: introduce cql_statement_opt_metadata
cql_statement_opt_metadata is an interim node
in cql (prepared) statement hierarchy parenting
modification_statement and batch_statement. If there
is IF condition in such statements, they return a result set,
and thus have a result set metadata.

The metadata itself is filled in a subsequent patch.
2019-10-27 23:42:03 +03:00
Vladimir Davydov
c8869e803e lwt: remove commented out validateWhereClauseForConditions
This logic was implemented in validate_where_clause_for_conditions()
method of modification_statement class.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
eb5e82c6a1 lwt: add CAS where clause validation
Add checks for conditional modification statement limitations:
- WHERE clustering_key IN (list) IF condition is not supported
  since a condition is evaluated for a single row/cell, so
  allowing multiple rows to match the WHERE clause would create
  ambiguity,
- the same is true for conditional range deletions.
- ensure all clustering restrictions are eq for conditional delete

  We must not allow statements like

  create table t(p int, c int, v int, primary key (p, c));
  delete from t where p=1 and c>0 if v=1;

  because there may be more than one row in a partition satisfying
  WHERE clause, in which case it's unclear which of them should satisfy
  IF condition: all or just one.

  Raising an error on such a statement is consistent with Cassandra's
  behavior.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
203eb3eccc lwt: sleep a random amount of time when retrying CAS
Sleep a random interval between 0 and 100 ms before retrying CAS.
Reuse sleep function, make the distribution object thread local.
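
The retry policy can be illustrated with a short sketch. It is written
in Python rather than the C++ of the patch, and the names retry_cas,
cas_backoff_ms and max_attempts are assumptions made for the example;
only the 0 to 100 ms interval comes from the message above.

```python
import random
import time

MAX_BACKOFF_MS = 100  # upper bound of the interval named in the commit message

def cas_backoff_ms(rng):
    """Pick a uniformly distributed delay between 0 and MAX_BACKOFF_MS."""
    return rng.uniform(0, MAX_BACKOFF_MS)

def retry_cas(attempt, max_attempts=5, rng=None, sleep=time.sleep):
    """Call attempt() until it returns True, sleeping randomly between tries.

    attempt, max_attempts and the injectable sleep are hypothetical;
    they exist only to keep the sketch self-contained and testable.
    """
    if rng is None:
        rng = random.Random()
    for i in range(max_attempts):
        if attempt():
            return True
        if i + 1 < max_attempts:
            # Randomizing the delay makes it less likely that contending
            # coordinators retry in lockstep and collide again.
            sleep(cas_backoff_ms(rng) / 1000.0)
    return False
```

The sleep function is passed in so that a caller (or a test) can replace
it with a non-blocking one.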
2019-10-27 23:42:03 +03:00
Konstantin Osipov
0674fab05c lwt: implement storage_proxy::cas()
Introduce service::cas_request abstract base class
which can be used to parameterize Paxos logic.

Implement storage_proxy::cas() - compare and swap - the storage proxy
entry point for lightweight transactions.
2019-10-27 23:42:03 +03:00
Gleb Natapov
70adf65341 storage_proxy: make mutation holder responsible for mutation operation
Currently the code that manipulates mutations during write needs to
check what kind of mutations are those and (sometimes) choose different
code paths. This patch encapsulates the differences in virtual
functions of mutation_holder object, so that high level code will not
concern itself with the details. The functions that are added:
apply_locally(), apply_remotely() and store_hint().
2019-10-27 23:21:51 +03:00
Gleb Natapov
b3e01a45d7 lwt: storage_proxy: implement paxos protocol
This patch adds all functionality needed for Paxos protocol. The
implementation does not strictly adhere to Paxos paper since the original
paper allows setting a value only once, while for LWT we need to be able
to make another Paxos round after "learn" phase completes, which requires
things like repair to be introduced.
2019-10-27 23:21:51 +03:00
Gleb Natapov
8d6201a23b lwt: Add RPC verbs needed for paxos implementation
Paxos protocol has three stages: prepare, accept, learn. This patch adds
rpc verb for each of those stages. To stay compatible with Cassandra's
terminology, the patch calls those stages: prepare, propose, commit.
2019-10-27 23:21:51 +03:00
Gleb Natapov
d1774693bf lwt: Define state needed by paxos and persist it
Paxos protocol relies on replicas having a state that persists over
crashes/restarts. This patch defines such state and stores it in the
database itself in the paxos table to make it persistent.

The stored state is:
  in_progress_ballot    - promised ballot
  proposal              - accepted value
  proposal_ballot       - the ballot of the accepted value
  most_recent_commit    - most recently learned value
  most_recent_commit_at - the ballot of the most recently learned value
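
How the three stages could read and update this state is sketched
below. It is a toy model in Python, not the code of the patch: ballots
are plain integers (an assumption, chosen for brevity), values are
strings, and the acceptance rules are the textbook ones.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PaxosState:
    """Per-key replica state, using the field names listed above."""
    in_progress_ballot: int = 0              # promised ballot
    proposal: Optional[str] = None           # accepted value
    proposal_ballot: int = 0                 # ballot of the accepted value
    most_recent_commit: Optional[str] = None # most recently learned value
    most_recent_commit_at: int = 0           # ballot of the learned value

    def prepare(self, ballot: int) -> bool:
        """Promise to ignore older ballots; refuse if one at least as new was promised."""
        if ballot <= self.in_progress_ballot:
            return False
        self.in_progress_ballot = ballot
        return True

    def propose(self, ballot: int, value: str) -> bool:
        """Accept the value unless a newer ballot has been promised."""
        if ballot < self.in_progress_ballot:
            return False
        self.in_progress_ballot = ballot
        self.proposal, self.proposal_ballot = value, ballot
        return True

    def commit(self, ballot: int, value: str) -> None:
        """Learn a decided value, keeping only the most recent one."""
        if ballot > self.most_recent_commit_at:
            self.most_recent_commit = value
            self.most_recent_commit_at = ballot
```

Because every field is persisted, a replica that restarts can still
refuse ballots older than the ones it promised before the crash.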
2019-10-27 23:21:51 +03:00
Gleb Natapov
15b935b95d lwt: add data structures needed for paxos implementation
This patch adds two data structures that will be used by paxos. First
one is "proposal" which contains a ballot and a mutation representing
a value paxos protocol is trying to set. Second one is
"prepare_response" which is a value returned by paxos prepare stage.
It contains currently accepted value (if any) and most recently
learned value (again if any). The latter is used to "repair" replicas
that missed previous "learn" message.
2019-10-27 23:21:51 +03:00
Benny Halevy
1895fb276e mutation_test: test_udt_mutations: use int64_t constants for long_type
Otherwise they are decomposed and serialized as 4-byte int32.

For example, on my machine cell[1] looked like this:
{0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}}

and it failed cells_equal against:
{0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}}
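
The width mismatch itself can be shown with a small sketch. It uses
Python's struct module as a stand-in for the serialization code, which
is an assumption made for illustration, and it does not try to
reproduce the cell dumps above.

```python
import struct

def serialize_int32(value):
    """4-byte big-endian encoding, what a plain int constant ends up as."""
    return struct.pack(">i", value)

def serialize_int64(value):
    """8-byte big-endian encoding, the width long_type expects."""
    return struct.pack(">q", value)

print(serialize_int32(3).hex())
print(serialize_int64(3).hex())
```

A reader that expects eight bytes but is handed four will pick up
whatever follows the value in memory, so the result can differ between
machines.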

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 20:51:29 +02:00
Benny Halevy
fec772538c mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:49:25 +02:00
Benny Halevy
9c8cf9f51d mutation_test: test_udt_mutations: fixup udt comment
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:47:43 +02:00
Benny Halevy
76581e7f14 docs/debugging.md: fix gdb command for retrieving shared libraries information
The correct command is `info sharedlibrary`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191027153541.27286-1-bhalevy@scylladb.com>
2019-10-27 18:15:09 +02:00
Dejan Mircevski
2a136ba1bc alternator: Fix race condition in set_routes()
server::set_routes() was setting the value of server::_callbacks.
This led to a race condition, as set_routes() is invoked on every
shard simultaneously.  It is also unnecessary, since _callbacks can be
initialized in the constructor.

Fixes #5220.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-27 12:31:24 +02:00
Avi Kivity
27ef73f4f1 Merge "Report file I/O in CQL tracing when reading from sstables." from Kamil
"
Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future.
Use this file to trace reads from SSTable data and index files.

Fixes #4908.
"

* 'traced_file' of https://github.com/kbr-/scylla:
  sstables: report sstable index file I/O in CQL tracing
  sstables: report sstable data file I/O in CQL tracing
  tracing: add traced_file class
2019-10-26 22:53:37 +03:00
Avi Kivity
2b856a7317 Merge "Support non-frozen UDTs." from Kamil
"
This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted.

I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code.

Fixes #2201.
"

* 'udt' of https://github.com/kbr-/scylla: (64 commits)
  tests: too many UDT fields check test
  collection_mutation: add a FIXME.
  tests: add a non-frozen UDT materialized view test
  tests: add a UDT mutation test.
  tests: add a non-frozen UDT "JSON INSERT" test.
  tests: add a non-frozen UDT to for_each_schema_change.
  tests: more non-frozen UDT tests.
  tests: move some UDT tests from cql_query_test.cc to new file.
  types: handle trailing nulls in tuples/UDTs better.
  cql3: enable deleting single fields of non-frozen UDTs.
  cql3: enable setting single fields of a non-frozen UDT.
  cql3: enable non-frozen UDTs.
  cql3: introduce user_types::marker.
  cql3: generalize function_call::make_terminal to UDTs.
  cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
  cql3: use a dedicated setter operation for inserting user types.
  cql3: introduce user_types::value.
  types: introduce to_bytes_opt_vec function.
  cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
  cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
  ...
2019-10-26 22:53:37 +03:00
Piotr Sarna
657e7ef5a5 alternator: add alternator health check
The health check is performed simply by issuing a GET request
to the alternator port - it returns the following status 200
response when the server is healthy:

$ curl -i localhost:8000
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 23
Server: Seastar httpd
Date: 21 Oct 2019 12:55:33 GMT

healthy: localhost:8000

This commit comes with a test.
Fixes #5050
Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>
2019-10-26 18:14:18 +03:00
Botond Dénes
01e913397a tests: memtable_test: flush_reader_test: compare compacted mutations
To filter out artificial differences due to different representation of
an equivalent set of writes.

Fixes: #5207

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191024103718.29266-1-bdenes@scylladb.com>
2019-10-26 18:14:18 +03:00
Kamil Braun
432ef7c9af sstables: report sstable index file I/O in CQL tracing
Use tracing::make_traced_file when reading from the index file in
index_reader.
2019-10-25 14:10:28 +02:00
Kamil Braun
394c36835a sstables: report sstable data file I/O in CQL tracing
Use tracing::make_traced_file when creating an sstable input_stream.
To achieve that, trace_state needs to be plumbed down through some
functions.
2019-10-25 14:10:28 +02:00
Kamil Braun
a8c9d1206a tracing: add traced_file class
This is a thin wrapper over the `seastar::file` class which adds
CQL trace messages before and after I/O operations.
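
The wrapping pattern looks roughly like the sketch below. It is a
Python analogy, not the seastar-based class from the patch: TracedFile,
the trace callback and the message texts are all hypothetical.

```python
import io

class TracedFile:
    """Wrap a file-like object and emit a trace message around each read."""

    def __init__(self, wrapped, trace, name):
        self._wrapped = wrapped
        self._trace = trace   # callable taking one message string
        self._name = name     # label used in messages, e.g. "data file"

    def read(self, size=-1):
        self._trace(f"{self._name}: starting read of {size} bytes")
        data = self._wrapped.read(size)
        self._trace(f"{self._name}: finished read, got {len(data)} bytes")
        return data

messages = []
traced = TracedFile(io.BytesIO(b"abcdef"), messages.append, "data file")
traced.read(4)
print(messages)
```

Keeping the wrapper thin means callers use it exactly like the file it
wraps, and tracing stays out of the read logic itself.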
2019-10-25 14:10:24 +02:00
Kamil Braun
2889edea3e tests: too many UDT fields check test 2019-10-25 12:05:10 +02:00
Kamil Braun
adfc04ebec collection_mutation: add a FIXME.
We could use iterators over cells instead of a vector of cells
in collection_mutation(_view)_description. Then some use cases could
provide iterators that construct the cells "on the fly".
2019-10-25 12:05:10 +02:00
Kamil Braun
45d2a96980 tests: add a non-frozen UDT materialized view test 2019-10-25 12:05:10 +02:00
Kamil Braun
e0c233ede1 tests: add a UDT mutation test. 2019-10-25 12:05:08 +02:00
Kamil Braun
a21d12faae tests: add a non-frozen UDT "JSON INSERT" test. 2019-10-25 12:04:44 +02:00
Kamil Braun
ae3464da45 tests: add a non-frozen UDT to for_each_schema_change. 2019-10-25 12:04:44 +02:00
Kamil Braun
b87b700e66 tests: more non-frozen UDT tests. 2019-10-25 12:04:44 +02:00
Kamil Braun
474742ac5d tests: move some UDT tests from cql_query_test.cc to new file. 2019-10-25 12:04:44 +02:00
Kamil Braun
612de1f4e3 types: handle trailing nulls in tuples/UDTs better.
Comparing user types after adding new fields was bugged.
In the following scenario:

create type ut (a int);
create table cf (a int primary key, b frozen<ut>);
insert into cf (a, b) values (0, (0));
alter type ut add b int;
select * from cf where b = {a:0,b:null};

the row with a = 0 should be returned, even though the value stored in the database is shorter
(by one null) than the value given by the user. Until now it wouldn't
have.
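
The comparison rule can be sketched as follows. This is a simplified
Python model, assuming fields are given as a list with None standing
for null; the helper names are hypothetical and the real code compares
serialized values.

```python
def strip_trailing_nulls(fields):
    """Drop null (None) fields from the end of a tuple/UDT value."""
    fields = list(fields)
    while fields and fields[-1] is None:
        fields.pop()
    return fields

def udt_values_equal(a, b):
    """Treat values as equal if they differ only in trailing nulls."""
    return strip_trailing_nulls(a) == strip_trailing_nulls(b)

# Stored before "alter type": (0). Given by the user afterwards: {a:0, b:null}.
print(udt_values_equal([0], [0, None]))
```

Only trailing nulls are ignored; a null in the middle of a value still
counts as a field.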
2019-10-25 12:04:44 +02:00
Kamil Braun
1a9034e38a cql3: enable deleting single fields of non-frozen UDTs.
This was already possible by setting the field to null, but now it
supports the DELETE syntax.
2019-10-25 12:04:44 +02:00
Kamil Braun
4d271051dd cql3: enable setting single fields of a non-frozen UDT.
The commit introduces the necessary modifications to the grammar,
a set_field raw operation, and a setter_by_field operation.
2019-10-25 12:04:44 +02:00
Kamil Braun
e74b5deb5d cql3: enable non-frozen UDTs.
Add a cluster feature for non-frozen UDTs.

If the cluster supports non-frozen UDTs, do not return an error
message when trying to create a table with a non-frozen user type.
2019-10-25 12:04:44 +02:00
Kamil Braun
7ac7a3994d cql3: introduce user_types::marker.
cql3::user_types::marker is a dedicated cql3::abstract_marker for user
type placeholders in prepared CQL queries. When bound, it returns a
user_types::value.
2019-10-25 12:04:44 +02:00
Kamil Braun
36999c94f4 cql3: generalize function_call::make_terminal to UDTs.
Use the dedicated user_types::value.
There is no way this code can be executed now, so I left a TODO.
2019-10-25 12:04:44 +02:00
Kamil Braun
49a7461345 cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
For user types, use its dedicated setter and value.
2019-10-25 12:04:44 +02:00
Kamil Braun
40f9ce2781 cql3: use a dedicated setter operation for inserting user types.
cql3::user_types::setter is a dedicated cql3::operation
for inserting and updating user types. It handles the multi-cell
(non-frozen) case.
2019-10-25 12:04:44 +02:00
Kamil Braun
51be1e3e9d cql3: introduce user_types::value.
This is a dedicated multi_item_terminal for user type values.
Will be useful in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
abe6c2d3d2 types: introduce to_bytes_opt_vec function.
It converts a vector<bytes_view_opt> to a vector<bytes_opt>.
Used in a bunch of places.
2019-10-25 12:04:44 +02:00
Kamil Braun
8ff2aebd76 cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
Previously it returned vector<cql3::raw_value>, even though we don't use
unset values when setting a UDT value (fields that are not provided
become nulls. That's how C* does it).
This simplifies future implementation of user_types::{value, setter}.
2019-10-25 12:04:44 +02:00
Kamil Braun
f0a3af6adc cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
This is used in error messages and may be useful.
2019-10-25 12:04:44 +02:00
Kamil Braun
c89de228e3 cql3: generalize some error messages to UDTs 2019-10-25 12:04:44 +02:00
Kamil Braun
fd3bc27418 cql3: disallow non-frozen UDTs when creating secondary indexes 2019-10-25 12:04:44 +02:00
Kamil Braun
ff0bd0bb7a cql3: check for nested non-frozen UDTs in create_type_statement. 2019-10-25 12:04:44 +02:00
Kamil Braun
adf857e9ed cql3: add cql3_type::is_user_type.
This will be used in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
6ccb1ee19f cql3: generalize create_table_statement::raw_statement::prepare to UDTs.
Check for UDT with nested non-frozen collection.
Check for UDT with COMPACT STORAGE.
Check for UDT inside PRIMARY KEY.
2019-10-25 12:04:44 +02:00
Kamil Braun
a8c7670722 types: add multi_cell field to user_type_impl.
is_value_compatible_with_internal and update_user_type were generalized
to the non-frozen case.

For now, all user_type_impls in the code are non-multi-cell (frozen).
This will be changed in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
b904d04925 cql3: add a TODO to implement column_conditions for UDTs.
This will become relevant after LWT is implemented.
2019-10-25 12:04:44 +02:00
Kamil Braun
44534a4a0a sstables: generalize some comments to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
b38b8af0f2 schema: generalize compound_name to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
270cf2b289 query-result-set: generalize result_set_builder to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
2ada219f2c view: generalize create_virtual_column and maybe_make_virtual to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
574e1cd514 tests: generalize timestamp_based_splitting_writer and bucket_writer to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
6da89e40df tests: generalize random_schema.cc:generate_collection to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
0fbfb67cbb tests: generalize mutation_test.cc summaries to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
a3a2f65fbf types: generalize serialize_for_cql to UDTs.
Also introduces a helper "linearized" function, which implements
a pattern occurring in all serialize_for_cql_aux functions.
2019-10-25 12:04:44 +02:00
Kamil Braun
05d4b2e1a4 tests: generalize data_model.cc:mutation_description::build to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
338fde672a mp_row_consumer: generalize consume_cell (kl) and consume_column (mc) to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
5e447e3250 mutation_partition_view: generalize read_collection_cell to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
90927c075a converting_mutation_partition_applier: generalize accept_cell to UDTs. 2019-10-25 12:04:42 +02:00
Kamil Braun
d9baff0e4b collection_mutation: generalize collection_mutation.cc:difference to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
a344019b25 collection_mutation: generalize collection_mutation_view::last_update to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
691f00408d collection_mutation: generalize merge to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
7f5cd8e8ce collection_mutation: generalize collection_mutation_view_description::materialize to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
20b42b1155 collection_mutation: generalize collection_mutation_view::is_any_live to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
323370e4ba collection_mutation: generalize deserialize_collection_mutation to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
393974df3b cql3: make {lists,maps,sets}::value::from_serialized take const {}_type&.
This will simplify the code a bit where from_serialized is used
after switching to visitors. Also reduces the number of shared_ptr
copies.
2019-10-25 10:49:19 +02:00
Kamil Braun
4327bba0db types: introduce (de)serialize_field_index functions.
These functions are used to translate field indices, which are used to
identify fields inside UDTs, from/to a serialized representation to be
stored inside sstables and mutations.
They do it in a way that is compatible with C*.
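
A round trip of such an index might look like the sketch below. The
encoding is assumed here to be a 2-byte big-endian integer; that width
is an assumption of this example and is not stated in the message
above.

```python
import struct

def serialize_field_index(idx):
    """Encode a UDT field index as 2 big-endian bytes (assumed format)."""
    return struct.pack(">h", idx)

def deserialize_field_index(data):
    """Decode a field index produced by serialize_field_index."""
    (idx,) = struct.unpack(">h", data)
    return idx

print(serialize_field_index(2).hex())
print(deserialize_field_index(serialize_field_index(2)))
```

With a fixed-width big-endian encoding, the serialized indices sort in
the same order as the fields they identify.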
2019-10-25 10:49:19 +02:00
Kamil Braun
90d05eb627 cql3: reject too long user-defined types 2019-10-25 10:49:19 +02:00
Kamil Braun
0f8f950b74 cql3: optimize multi_item_terminal::get_elements().
Now it returns const std::vector<bytes_opt>& instead of
std::vector<bytes_opt>.
2019-10-25 10:49:19 +02:00
Kamil Braun
4374982de0 types: collection_type_impl::to_value becomes serialize_for_cql.
The purpose of collection_type_impl::to_value was to serialize a
collection for sending over CQL. The corresponding function in origin
is called serializeForNativeProtocol, but the name is a bit lengthy,
so I settled for serialize_for_cql.

The method now became a free-standing function, using the visit
function to perform a dispatch on the collection type instead
of a virtual call. This also makes it easier to generalize it to UDTs
in future commits.

Remove the old serialize_for_native_protocol with a FIXME: implement
inside. It was already implemented (to_value), just called differently.

Remove dead methods: enforce_limit and serialized_values. The
corresponding methods in C* are auxiliary methods used inside
serializeForNativeProtocol. In our case, the entire algorithm
is wholly written in serialize_for_cql.
2019-10-25 10:49:19 +02:00
Kamil Braun
e5c0a992ef cql3: make cql3_type::raw::to_string private.
It only needs to be used in operator<<, which is a friend
of cql3_type::raw.
2019-10-25 10:42:58 +02:00
Kamil Braun
ff4d857a9d cql3: remove a dynamic_pointer_cast to user_type_impl.
There exists a method to check if something is a user type:
is_user_type(); use it instead.
2019-10-25 10:42:58 +02:00
Kamil Braun
d8f8908d34 types: introduce user_type_impl::idx_of_field method.
Each field of a user type has its index inside the type.
This method allows to find it easily, which is needed in a bunch of
places.
2019-10-25 10:42:58 +02:00
Kamil Braun
c77643a345 cql3: make cql3_type::_frozen protected. Add is_frozen() method.
No one modifies _frozen from the outside.
Moving the field to `protected` makes it harder to introduce bugs.
2019-10-25 10:42:58 +02:00
Kamil Braun
d83ebe1092 collection_mutation: move collection_type_impl::difference to collection_mutation.hh. 2019-10-25 10:42:58 +02:00
Kamil Braun
7e3bbe548c collection_mutation: move collection_type_impl::merge to collection_mutation.hh. 2019-10-25 10:42:58 +02:00
Kamil Braun
a41277a7cd collection_mutation: move collection_type_impl::last_update to collection_mutation_view 2019-10-25 10:42:58 +02:00
Kamil Braun
30802f5814 collection_mutation: move collection_type_impl::is_any_live to collection_mutation_view 2019-10-25 10:42:58 +02:00
Kamil Braun
e16ba76c2e collection_mutation: move collection_type_impl::is_empty to collection_mutation_view. 2019-10-25 10:42:58 +02:00
Kamil Braun
bbdb438d89 collection_mutation: easier (de)serialization of collection_mutation(s).
`collection_type_impl::serialize_mutation_form`
became `collection_mutation(_view)_description::serialize`.

Previously callers had to cast their data_type down to collection_type
to use serialize_mutation_form. Now it's done inside `serialize`.
In the future `serialize` will be generalized to handle UDTs.

`collection_type_impl::deserialize_mutation_form`
became a free standing function `deserialize_collection_mutation`
with similar benefits. Actually, no one needs to call this function
manually because of the next paragraph.

A common pattern consisting of linearizing data inside a `collection_mutation_view`
followed by calling `deserialize_mutation_form` has been abstracted out
as a `with_deserialized` method inside collection_mutation_view.

serialize_mutation_form_only_live was removed,
because it hadn't been used anywhere.
2019-10-25 10:42:58 +02:00
Kamil Braun
e4101679e4 collection_mutation: generalize constructor of collection_mutation to abstract_type.
The constructor doesn't use anything specific to collection_type_impl.
In the future it will also handle non-frozen user types.
2019-10-25 10:42:58 +02:00
Kamil Braun
b1d16c1601 types: move collection_type_impl::mutation(_view) out of collection_type_impl.
collection_type_impl::mutation became collection_mutation_description.
collection_type_impl::mutation_view became collection_mutation_view_description.
These classes now reside inside collection_mutation.hh.

Additional documentation has been written for these classes.

Related function implementations were moved to collection_mutation.cc.

This makes it easier to generalize these classes to non-frozen UDTs in future commits.
The new names (together with documentation) better describe their purpose.
2019-10-25 10:19:45 +02:00
Kamil Braun
c0d3e6c773 atomic_cell: move collection_mutation(_view) to a new file.
The classes 'collection_mutation' and 'collection_mutation_view'
were moved to a separate header, collection_mutation.hh.

Implementations of functions that operate on these classes,
including some methods of collection_type_impl, were moved
to a separate compilation unit, collection_mutation.cc.

This makes it easier to modify these structures in future commits
in order to generalize them for non-frozen User Defined Types.

Some additional documentation has been written for collection_mutation.
2019-10-25 10:19:45 +02:00
Kamil Braun
c90ea1056b Remove mutation_partition_applier.
It had been replaced by partition_builder
in commit dc290f0af7.
2019-10-25 10:19:45 +02:00
Asias He
f32ae00510 gossip: Limit number of pending gossip ACK2 messages
Similar to "gossip: Limit number of pending gossip ACK messages", limit
the number of pending gossip ACK2 messages in gossiper::handle_ack_msg.

Fixes #5210
2019-10-25 12:44:28 +08:00
Asias He
15148182ab gossip: Limit number of pending gossip ACK messages
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, unbounded ACK
messages can consume an unlimited amount of memory, which eventually
causes OOM.

To fix, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from a peer has the
latest information, it is safe to drop the previous SYN message and
keep the latest one only. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.
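
The queueing rule described above can be sketched roughly as follows. This is only an illustration of "keep the newest SYN while an ACK is in flight"; the names SynHandler, PeerState, on_syn and on_ack_sent are hypothetical and are not taken from the gossiper code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PeerState:
    ack_in_flight: bool = False        # an ACK to this peer is still being sent
    pending_syn: Optional[str] = None  # at most one queued SYN, always the newest


class SynHandler:
    """Hypothetical per-peer bookkeeping, not the actual gossiper implementation."""

    def __init__(self) -> None:
        self.peers: Dict[str, PeerState] = {}
        # Records (peer, syn) for every ACK that was started, for inspection.
        self.sent_acks: List[Tuple[str, str]] = []

    def on_syn(self, peer: str, syn: str) -> None:
        state = self.peers.setdefault(peer, PeerState())
        if state.ack_in_flight:
            # Overwrite any older queued SYN: the newest one supersedes it.
            state.pending_syn = syn
            return
        self._start_ack(peer, state, syn)

    def on_ack_sent(self, peer: str) -> None:
        state = self.peers[peer]
        state.ack_in_flight = False
        if state.pending_syn is not None:
            # Handle the single queued SYN now that the previous ACK is done.
            syn, state.pending_syn = state.pending_syn, None
            self._start_ack(peer, state, syn)

    def _start_ack(self, peer: str, state: PeerState, syn: str) -> None:
        state.ack_in_flight = True
        self.sent_acks.append((peer, syn))
```

With this shape, memory per peer is bounded by one in-flight ACK plus one queued SYN, no matter how many SYN messages arrive while the ACK is being sent.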

Fixes #5210
2019-10-25 12:44:28 +08:00
Nadav Har'El
8bffb800e1 alternator: Use system_auth.roles for alternator authorization
Merged patch series from Piotr Sarna:

This series couples system_auth.roles with authorization routines
in alternator. The `salted_hash` field, which is every user's hashed
password, is used as a secret key for the signature generation
in alternator.
This series also adds related expiration verifications for alternator
signatures.
It also comes with more test cases and docs updates.

Tests: alternator(local, remote), manual

Piotr Sarna (11):
  alternator: add extracting key from system_auth.roles
  alternator: futurize verify_signature function
  alternator: move the api handler to a separate function
  alternator: use keys from system_auth.roles for authorization
  alternator: add key cache to authorization
  alternator-test: add a wrong password test
  alternator: verify that the signature has not expired
  alternator: add additional datestamp verification
  alternator-test: add tests for expired signatures
  docs: update alternator entry for authorization
  alternator-test: add authorization to README

 alternator-test/conftest.py                |   2 +-
 alternator-test/test_authorization.py      |  44 ++++++++-
 alternator-test/test_describe_endpoints.py |   2 +-
 alternator/auth.hh                         |  15 ++-
 alternator/server.hh                       |  10 +-
 alternator/auth.cc                         |  62 +++++++++++-
 alternator/server.cc                       | 106 ++++++++++++---------
 alternator-test/README.md                  |  28 ++++++
 docs/alternator/alternator.md              |   7 +-
 9 files changed, 221 insertions(+), 55 deletions(-)
2019-10-23 20:51:08 +03:00
Tomasz Grabiec
e621db591e Merge "Fix TTL serialization breakage" from Avi
Commit 93270dd changed gc_clock to be 64-bit, to fix the Y2038
problem. While 64-bit tombstone::deletion_time is serialized in a
compatible way, TTLs (gc_clock::duration) were not.

This patchset reverts TTL serialization to the 32-bit serialization
format, and also allows opting-in to the 64-bit format in case a
cluster was installed with the broken code. Only Scylla 3.1.0 is
vulnerable.

Fixes #4855

Tests: unit (dev)
2019-10-23 18:23:26 +02:00
Tomasz Grabiec
71720be4f7 Merge "storage_service: Reject nodetool cleanup when there is pending ranges" from Asias
From Shlomi:

4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)

I get the error on c-s once decommission ends
  java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated
The problem is that when a node gets new ranges (e.g., the bootstrapping node, or
the existing nodes after a node is removed or decommissioned), nodetool cleanup will
remove data within the new ranges which the node just got from other nodes.

To fix, we should reject nodetool cleanup when there are pending ranges on that node.

Note, rejecting nodetool cleanup is not a full protection because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start to reject until we have a full protection solution.

Refs: #5045
2019-10-23 17:45:41 +02:00
Avi Kivity
2970578677 config: add configuration option for 3.1.0 heritage clusters
Scylla 3.1.0 broke the serialization format for TTLs. Later versions
corrected it, but if a cluster was originally installed as 3.1.0,
it will use the broken serialization forever. This configuration option
allows upgrades from 3.1.0 to succeed, by enabling the broken format
even for later versions.
2019-10-23 18:36:35 +03:00
Avi Kivity
bf4c319399 gc_clock, serialization: define new serialization for gc_clock::duration (aka TTLs)
Scylla 3.1.0 inadvertently changed the serialization format of TTLs
(internally represented as gc_clock::duration) from 32-bit to 64-bit,
as part of preparation for Y2038 (which comes earlier for TTLed cells).
This breaks mutations transported in a mixed cluster.

To fix this, we revert back to the 32-bit format, unless we're in a 3.1.0-
heritage cluster, in which case we use the 64-bit format. Overflow of
a TTL is not a concern, since TTLs are capped to 20 years by the TTL layer.
An assertion is added to verify this.
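
A rough sketch of the format selection described above, for illustration only. The function name, the heritage flag parameter, and the little-endian byte order are assumptions made for this example; only the 32-bit/64-bit split and the 20-year cap come from the description.

```python
import struct

# 20 years expressed in seconds; comfortably below 2**31.
MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60


def serialize_ttl(ttl_seconds: int, heritage_3_1_0: bool = False) -> bytes:
    """Hypothetical helper: pick the wire width for a TTL."""
    if heritage_3_1_0:
        # Cluster originally installed as 3.1.0: keep the 64-bit format.
        return struct.pack("<q", ttl_seconds)
    # The TTL layer caps values at 20 years, so they must fit in 32 bits.
    assert 0 <= ttl_seconds <= MAX_TTL_SECONDS
    return struct.pack("<i", ttl_seconds)
```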

This patch only defines a variable to indicate we're in
a 3.1.0 heritage cluster, but a way to set it is left to
a later patch.
2019-10-23 18:36:33 +03:00
Avi Kivity
771e028c1a Update seastar submodule
* seastar 6bcb17c964...2963970f6b (4):
  > Merge "IPv6 scope support and network interface impl" from Calle
  > noncopyable_function: do not copy uninitialized data
  > Merge "Move smp and smp queue out of reactor" from Asias
  > Consolidate posix socket implementations
2019-10-23 16:43:02 +03:00
Piotr Sarna
472e3cb4e1 alternator-test: add authorization to README
The README paragraph informs about turning on authorization with:
   alternator-enforce-authorization: true
and has a short note on how to set up the secret key for tests.
2019-10-23 15:05:39 +02:00
Piotr Sarna
280eb28324 docs: update alternator entry for authorization
The document now mentions that secret keys are extracted
from the system_auth.roles table.
2019-10-23 15:05:39 +02:00
Piotr Sarna
ebb0af3500 alternator-test: add tests for expired signatures
The first test case ensures that expired signatures are not accepted,
while the second one checks that signatures with dates that reach out
too far into the future are also refused.
2019-10-23 15:05:39 +02:00
Piotr Sarna
a0a33ae4f3 alternator: add additional datestamp verification
The authorization signature contains both a full obligatory date header
and a shortened datestamp - an additional verification step ensures that
the shortened stamp matches the full date.
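
As an illustration, the check can be sketched like this, assuming the AWS Signature Version 4 conventions: a full date of the form YYYYMMDD'T'HHMMSS'Z' and a shortened datestamp of the form YYYYMMDD. The function name is hypothetical and is not the one used in alternator.

```python
from datetime import datetime


def datestamp_matches(full_date: str, datestamp: str) -> bool:
    """Return True if the short datestamp agrees with the full date header."""
    # Parsing also rejects malformed full dates by raising ValueError.
    parsed = datetime.strptime(full_date, "%Y%m%dT%H%M%SZ")
    return parsed.strftime("%Y%m%d") == datestamp
```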
2019-10-23 15:05:39 +02:00
Piotr Sarna
718cba10a1 alternator: verify that the signature has not expired
AWS signatures have a 15min expiration policy. For compatibility,
the same policy is applied for alternator requests. The policy also
ensures that signatures extending more than 15 minutes into the future
are treated as unsafe and thus not accepted.
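
A minimal sketch of such a two-sided window check. The function name is hypothetical, and treating the 15-minute boundary itself as still valid is an assumption.

```python
from datetime import datetime, timedelta, timezone

SIGNATURE_WINDOW = timedelta(minutes=15)


def signature_is_fresh(signed_at: datetime, now: datetime,
                       window: timedelta = SIGNATURE_WINDOW) -> bool:
    """Accept only signatures dated within `window` of the current time."""
    # abs() covers both cases: expired signatures and ones dated in the future.
    return abs(now - signed_at) <= window
```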
2019-10-23 15:05:39 +02:00
Piotr Sarna
e90c4a8130 alternator-test: add a wrong password test
The additional test case submits a request as a user that is expected
to exist (in the local setup), but the provided password is incorrect.
It also updates test_wrong_key_access so it uses an empty string
for trying to authenticate as a nonexistent user - in order to cover
more corner cases.
2019-10-23 15:05:39 +02:00
Piotr Sarna
524b03dea5 alternator: add key cache to authorization
In order to avoid fetching keys from system_auth.roles system table
on every request, a cache layer is introduced. And in order not to
reinvent the wheel, the existing implementation of loading_cache
with max size 1024 and a 1 minute timeout is used.
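
The idea of a size-bounded cache with a timeout can be sketched as below. This is not loading_cache; KeyCache and its loader callback are hypothetical, and the eviction policy shown (drop the oldest-loaded entry) is an assumption chosen to keep the example short.

```python
import time
from collections import OrderedDict


class KeyCache:
    """Hypothetical cache: at most max_size entries, each valid for ttl_seconds."""

    def __init__(self, loader, max_size=1024, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader          # called on a miss, e.g. to query the roles table
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # user -> (key, time_loaded)

    def get(self, user):
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]            # still fresh: no lookup needed
        key = self._loader(user)
        self._entries[user] = (key, now)
        self._entries.move_to_end(user)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the oldest-loaded entry
        return key
```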
2019-10-23 15:05:39 +02:00
Piotr Sarna
6dee7737d7 alternator: use keys from system_auth.roles for authorization
Instead of having a hardcoded secret key, the server now verifies
an actual key extracted from system_auth.roles system table.
This commit comes with a test update - instead of 'whatever':'whatever',
the credentials used for a local run are 'alternator':'secret_pass',
which matches the initial contents of system_auth.roles table,
which acts as a key store.

Fixes #5046
2019-10-23 15:05:39 +02:00
Piotr Sarna
388b492040 alternator: move the api handler to a separate function
The lambda used for handling the api request has grown a little bit
too large, so it's moved to a separate method. Along with it,
the callbacks are now remembered inside the class itself.
2019-10-23 15:05:39 +02:00
Piotr Sarna
a93cf12668 alternator: futurize verify_signature function
The verify_signature utility will later be coupled with Scylla
authorization. In order to prepare for that, it is first transformed
into a function that returns future<>, and it also becomes a member
of class server. The reason for making it a member function is that
it will make it easier to implement a server-local key cache.
2019-10-23 15:05:39 +02:00
Piotr Sarna
dc310baa2d alternator: add extracting key from system_auth.roles
As a first step towards coupling alternator authorization with Scylla
authorization, a helper function for extracting the key (salted_hash)
belonging to the user is added.
2019-10-23 15:05:39 +02:00
Asias He
f876580740 storage_service: Reject nodetool cleanup when there is pending ranges
From Shlomi:

4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)

I get the error on c-s once decommission ends
  java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated

The problem is that when a node gets new ranges (e.g., the bootstrapping node, or
the existing nodes after a node is removed or decommissioned), nodetool cleanup will
remove data within the new ranges which the node just got from other nodes.

To fix, we should reject nodetool cleanup when there are pending ranges on that node.

Note, rejecting nodetool cleanup is not a full protection because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start to reject until we have a full protection solution.

Refs: #5045
2019-10-23 19:20:36 +08:00
Asias He
a39c8d0ed0 Revert "storage_service: remove storage_service::_is_bootstrap_mode."
It will be needed by "storage_service: Reject nodetool cleanup when
there is pending ranges"

This reverts commit dbca327b46.
2019-10-23 19:20:36 +08:00
Raphael S. Carvalho
fc120a840d compaction: dont rely on undefined behavior when making garbage collected writer
Argument evaluation order is unspecified, so it's not guaranteed that
c->make_garbage_collected_sstable_writer() is called before
compaction is moved to run().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191023052647.9066-1-raphaelsc@scylladb.com>
2019-10-23 11:04:51 +03:00
Benny Halevy
3b3611b57a mutation_diff: standard input support
Also, now that the file name is properly quoted,
it may contain space characters.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-23 08:29:58 +03:00
Benny Halevy
6feb4d5207 mutation_diff: accept diff_command option
To support using diff tools other than colordiff

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-23 08:29:47 +03:00
Tomasz Grabiec
dfac542466 Merge "extend multi-cell list & set type support" from Kostja
Make it possible to compare multi-cell lists and sets serialized
as maps with literal values and serialize them to network using
a standard format (vector of values).

This is a pre-requisite patch for column condition evaluation
in light-weight transactions.
2019-10-23 07:39:57 +03:00
Nadav Har'El
774f8aa4b8 docs/debugging.md: add guide on how to debug cores
Merged patch series from Botond Dénes:

This series extends the existing docs/debugging.md with a detailed guide
on how to debug Scylla coredumps. The intended target audience is
developers who are debugging their first core, hence the level of
details (hopefully enough). That said, this should be just as useful for
seasoned debuggers just quickly looking up some snippet they can't
remember exactly. A Throubleshooting chapter is also added in this
series for commonly-met problems.

I decided to create this guide after myself having struggled for more
than a day on just opening(!) a coredump that was produced on Ubuntu.
As my main source, I used the How-to-debug-a-coredump page from the
internal wiki, which contains much useful information on debugging
coredumps. However, I found it to be missing some crucial information, as
well as being very terse, thus being primarily useful for experienced
debuggers who can fill in the blanks. The reason I'm not extending said
wiki page is that I think this information should not be hidden in some
internal wiki page. Also, docs/debugging.md now seems to be a much
better base for such a document. This document was started as a
comprehensive debugging manual for beginners (but not just).

You will notice that the information on how to debug cores from
CentOS/Redhat is quite sparse. This is because I have no experience
with such cores, so for now the respective chapters are just stubs. I
intend to complete them in the future after having gained the necessary
experience and knowledge, however those being in possession of said
knowledge are more than welcome to send a patch. :)

Botond Dénes (4):
	docs/debugging.md: demote 'Starting GDB' and 'Using GDB'
	docs/debugging.md: fix formatting issues
	docs/debugging.md: add 'Debugging coredumps' subchapter
	docs/debugging.md: add 'Throubleshooting' subchapter

  docs/debugging.md | 240 +++++++++++++++++++++++++++++++++++++++++++---
  1 file changed, 228 insertions(+), 12 deletions(-)
2019-10-23 07:39:57 +03:00
Rafael Ávila de Espíndola
b3372be679 install-dependencies: Add Lua
Add lua as a dependency in preparation for UDF. This is the first
patch since it has to go in before to allow for a frozen toolchain
update.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
[avi: update frozen toolchain image]
Message-Id: <20191018231442.11864-2-espindola@scylladb.com>
2019-10-23 07:39:57 +03:00
Konstantin Osipov
a30c08e04e lwt: support for multi-cell set & list value serialization 2019-10-22 17:40:42 +03:00
Piotr Jastrzebski
eb8ae06ced cdc: Return db_context::builder by reference
from its with_* functions.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-22 17:13:43 +03:00
Konstantin Osipov
605755e3f6 lwt: support for multi-cell map & list comparison with literal values
Multi-cell lists and maps may be stored in different formats: as sorted
vectors of pairs of values, when retrieved from storage, or as sorted
vectors of values, when created from parser literals or supplied as
parameter values.

Implement a specialized compare for use when receiver and parameter
representation don't match.

Add helpers.
2019-10-22 17:07:33 +03:00
Raphael S. Carvalho
3b6583990d sstables: Fix sluggish backlog controller with incremental compaction
The problem is that backlog tracker is not being updated properly after
incremental compaction.
When replacing sstables earlier, we tell backlog tracker that we're done
with exhausted sstables[1], but we *don't* tell it about the new, sealed
sstables created that will replace the exhausted ones.
[1]: exhausted sstable is one that can be replaced earlier by compaction.
We need to notify backlog tracker about every sstable replacement which
was triggered by incremental compaction.
Otherwise, backlog for a table that enables incremental compaction will
be lower than it actually should be. That's because new sstables being
tracked as partial decrease the backlog, whereas the exhausted ones
increase it.
The formula for a table's backlog is basically:
backlog(sstable set + compacting(1) - partial(2))
(1) compacting includes all compaction's input sstables, but the
exhausted ones are removed from it (correct behavior).
(2) partial includes all compaction's output sstables, but the ones
that replaced the exhausted sstables aren't removed from it (incorrect
behavior).
This problem is fixed by making backlog tracker *fully* aware of the early
replacement, not only the exhausted sstables, but also the new sstables
that replaced the exhausted ones. The new sstables need to be moved
inside the tracker from partial state to the regular one.

Fixes #5157.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191016002838.23811-1-raphaelsc@scylladb.com>
2019-10-22 16:19:57 +03:00
Vladimir Davydov
6c6689f779 cql: refactor statement accounting
Rather than passing a pointer to a cql_stats member corresponding to
the statement type, pass a reference to a cql_stats object and use
statement_type, which is already stored in modification_statement, for
determining which counter to increment. This will allow us to account
conditional statements, which will have a separate set of counters,
right in modification_statement::execute() - all we'll need to do is
add the new counters and bump them in case execute_with_condition is
called.

While we are at it, remove extra inclusions from statement_type.hh so as
not to introduce any extra dependencies for cql_stats.hh users.

Message-Id: <20191022092258.GC21588@esperanza>
2019-10-22 12:39:14 +03:00
Nadav Har'El
51fc6c7a8e make static_row optional to reduce memory footprint
Merged patch series from Avi Kivity:

The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even if it is not present, due to row's overhead.

Change it to be optional by allocating it as an external object rather
than inlined into mutation_partition. This adds overhead when the
static row is present (17 bytes for the reference, back reference,
and lsa allocator overhead).

perf_simple_query appears to be marginally (2%) faster. Footprint is
reduced by ~9% for a cache entry, 12% in memtables. More details are
provided in the patch commitlog.

Tests: unit (debug)

Avi Kivity (4):
  managed_ref: add get() accessor
  managed_ref: add external_memory_usage()
  mutation_partition: introduce lazy_row
  mutation_partition: make static_row optional to reduce memory
    footprint

 cell_locking.hh                          |   2 +-
 converting_mutation_partition_applier.hh |   4 +-
 mutation_partition.hh                    | 284 ++++++++++++++++++++++-
 partition_builder.hh                     |   4 +-
 utils/managed_ref.hh                     |  12 +
 flat_mutation_reader.cc                  |   2 +-
 memtable.cc                              |   2 +-
 mutation_partition.cc                    |  45 +++-
 mutation_partition_serializer.cc         |   2 +-
 partition_version.cc                     |   4 +-
 tests/multishard_mutation_query_test.cc  |   2 +-
 tests/mutation_source_test.cc            |   2 +-
 tests/mutation_test.cc                   |  12 +-
 tests/sstable_mutation_test.cc           |  10 +-
 14 files changed, 355 insertions(+), 32 deletions(-)
2019-10-22 12:25:15 +03:00
Avi Kivity
bc03b0fd47 Merge "Some refactoring of node startup code" from Kamil
"
The node startup code (in particular the functions storage_service::prepare_to_join and storage_service::join_token_ring) is complicated and hard to understand.

This patch set aims to simplify it at least a bit by removing some dead code, moving code around so it's easier to understand and adding some comments that explain what the code does.
I did it to help me prepare for implementing generation and gossiping of CDC streams.
"

* 'bootstrap-refactors' of https://github.com/kbr-/scylla:
  storage_service: more comments in join_token_ring
  db: remove system_keyspace::update_local_tokens
  db: improve documentation for update_tokens and get_saved_tokens in system_keyspace
  storage_service: remove storage_service::_is_bootstrap_mode.
  storage_service: simplify storage_service::bootstrap method
  storage_service: fix typo in handle_state_moving
  storage_service: remove unnecessary use of stringstream
  storage_service: remove redundant call to update_tokens during join_token_ring
  storage_service: remove storage_service::set_tokens method.
  storage_service: remove is_survey_mode
  storage_service::handle_state_normal: tokens_to_update* -> owned_tokens
  storage_service::handle_state_normal: remove local_tokens_to_remove
  db::system_keyspace::update_tokens: take tokens by const ref
  db::system_keyspace::prepare_tokens: make static, take tokens by const ref
  token_metadata::update_normal_tokens: take tokens by const ref
2019-10-22 12:11:11 +03:00
Asias He
0a52ecb6df gossip: Fix max generation drift measure
Assume n1 and n2 in a cluster with generation number g1, g2. The
cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1
reboots with generation g1' which is time based, n2 will see
g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update.

To fix, check the generation drift with generation value this node would
get if this node were restarted.
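
The difference between the two checks can be sketched as follows. The function names are hypothetical, and generations are assumed to be start times in seconds; MAX_GENERATION_DIFFERENCE is taken to be one year, as stated above.

```python
MAX_GENERATION_DIFFERENCE = 365 * 24 * 60 * 60  # one year, in seconds


def accepts_generation_old(reference_generation: int, remote_generation: int) -> bool:
    """Old check: compare against a generation recorded long ago."""
    return remote_generation <= reference_generation + MAX_GENERATION_DIFFERENCE


def accepts_generation_new(now_seconds: int, remote_generation: int) -> bool:
    """New check: compare against the generation this node would get if it
    restarted right now, i.e. the current time."""
    return remote_generation <= now_seconds + MAX_GENERATION_DIFFERENCE
```

For a cluster that has been up for 400 days, a node rebooting today gets a generation 400 days past the old reference, which the old check rejects and the new check accepts.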

This is a backport of CASSANDRA-10969.

Fixes #5164
2019-10-21 20:20:55 +02:00
Kamil Braun
f1c26bf5c9 storage_service: more comments in join_token_ring
Explain why a call to update_normal_tokens is needed.
2019-10-21 11:11:03 +02:00
Kamil Braun
fb1e35f032 db: remove system_keyspace::update_local_tokens
That was dead code.
2019-10-21 11:11:03 +02:00
Kamil Braun
1b0c8e5d99 db: improve documentation for update_tokens and get_saved_tokens in system_keyspace 2019-10-21 11:11:03 +02:00
Kamil Braun
dbca327b46 storage_service: remove storage_service::_is_bootstrap_mode.
The flag did nothing. It was used in one place to check if there's a
bug, but it can easily be proven by reading the code that the check
would never pass.
2019-10-21 11:11:03 +02:00
Kamil Braun
b757a19f84 storage_service: simplify storage_service::bootstrap method
The storage_service::bootstrap method took a parameter: tokens to
bootstrap with. However, this method is only called in one place
(join_token_ring) with only one parameter: _bootstrap_tokens. It doesn't
make sense to call this method anywhere else with any other parameter.

This commit also adds a comment explaining what the method does and
moves it into the private section of storage_service.
2019-10-21 11:11:03 +02:00
Kamil Braun
84b41bd89b storage_service: fix typo in handle_state_moving 2019-10-21 11:11:03 +02:00
Kamil Braun
2ff4f9b8f4 storage_service: remove unnecessary use of stringstream 2019-10-21 11:11:03 +02:00
Kamil Braun
06cc7d409d storage_service: remove redundant call to update_tokens during join_token_ring
When a non-seed node was bootstrapping, system_keyspace::update_tokens
was called twice: first right after the tokens were generated (or
received if we were replacing a different node) in the call to
`bootstrap`, and then later in join_token_ring. The second call was
redundant.

The join_token_ring call was also redundant if we were not bootstrapping
and had tokens saved previously (e.g. when restarting). In that case we
would have read them from LOCAL and then save the same tokens again.

This commit removes the redundant call and inserts calls to
update_tokens where they are necessary, when new tokens are generated.
The aim is to make the code easier to understand.

It also adds a comment which explains why the tokens don't need to be
generated in one of the cases.
2019-10-21 11:11:03 +02:00
Kamil Braun
a223864f81 storage_service: remove storage_service::set_tokens method.
After commit 36ccf72f3c, this method
was used only in one place.
Its name did not make it obvious what it does and when it is safe to call it.
This commit pulls out the code from set_tokens to the point where it was
called (join_token_ring). The code is only possible to understand in
context.

This code was also saving the tokens to the LOCAL table before
retrieving them from this table again. There is no point in doing that:
1. there are no races, since when join_token_ring is running, it is the
only function which can call system_keyspace::update_tokens (which saves them to the
LOCAL table). There can be no multiple instances of join_token_ring.
2. Even if there was a race, this wouldn't fix anything. The tokens we
retrieve from LOCAL by calling get_local_tokens().get0() could already
be different in the LOCAL table when the get0() returns.
2019-10-21 11:09:59 +02:00
Kamil Braun
36ccf72f3c storage_service: remove is_survey_mode
That was dead, untested code, making it unnecessarily hard
to implement new features.
2019-10-21 10:38:49 +02:00
Kamil Braun
602c7268cc storage_service::handle_state_normal: tokens_to_update* -> owned_tokens
Replace the two variables:
    tokens_to_update_in_metadata
    tokens_to_update_in_system_keyspace
which were exactly the same, with one variable owned_tokens.
The new name describes what the variable IS instead of what it's used for.

Add a comment to clarify what "owned" means: those are the tokens the
node chose and any collision was resolved positively for this node.

Move the variable definition further down in the code, where it's
actually needed.
2019-10-21 10:38:49 +02:00
Kamil Braun
2db07c697f storage_service::handle_state_normal: remove local_tokens_to_remove
That was dead code.
Removing tokens is handled inside remove_endpoint, using the
endpoints_to_remove set.
2019-10-21 10:38:49 +02:00
Kamil Braun
8c8a17a0fe db::system_keyspace::update_tokens: take tokens by const ref 2019-10-21 10:38:49 +02:00
Kamil Braun
00dcea3478 db::system_keyspace::prepare_tokens: make static, take tokens by const ref 2019-10-21 10:38:49 +02:00
Kamil Braun
e4ac4db1c5 token_metadata::update_normal_tokens: take tokens by const ref 2019-10-21 10:38:45 +02:00
Nadav Har'El
765dc86de4 Fix legacy token column handling for local indexes
Merged patch series from Piotr Sarna:

Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.

Branch: master, 3.1

Tests: unit(dev) + manual confirmation that a correct legacy column is picked
2019-10-20 16:04:40 +03:00
Nadav Har'El
631846a852 CDC: Implement minimal version that logs only primary key of each change
Merge a patch series from Piotr Jastrzębski (haaawk):

This PR introduces CDC in its minimal version.

It is now possible to create a table with CDC enabled or to enable/disable
CDC on an existing table. The CDC log and description are managed
when CDC is enabled or disabled for a table.

For now only primary key of the changed data is logged.

To be able to co-locate cdc streams with related base table partitions it
was needed to propagate the information about the number of shards per node.
This was done through gossip.

There is an assumption that all the nodes use the same value for
sharding_ignore_msb_bits. If it does not hold we would have to gossip
sharding_ignore_msb_bits around together with the number of shards.

Fixes #4986.

Tests: unit(dev, release, debug)
2019-10-20 11:41:01 +03:00
Botond Dénes
4aa734f238 scylla-gdb.py: scylla generate_object_graph: use correct obj in edges
Currently, the function that generates the graph edges (and vertices)
with a breadth-first traversal of the object graph accidentally uses the
object that is the starting point of the graph as the "to" part of each
edge. This results in the graph having each of its edges point to the
starting point, as if all objects in it referenced said object directly.
Fix by using the currently examined object.
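
A simplified sketch of the traversal, to show where the "to" end of each edge comes from. This is not the scylla-gdb.py code; build_edges and the referrers mapping (object to the objects that reference it) are hypothetical stand-ins.

```python
from collections import deque


def build_edges(start, referrers):
    """Breadth-first walk that returns (from, to) edges of the object graph."""
    edges = []
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ref in referrers.get(current, ()):
            # The edge must point at `current`. Using `start` here would make
            # every edge point to the starting object, which was the bug.
            edges.append((ref, current))
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return edges
```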

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191018113019.95093-1-bdenes@scylladb.com>
2019-10-18 13:48:20 +02:00
Botond Dénes
4dff50b7a4 docs/debugging.md: add 'Troubleshooting' subchapter
To the 'Debugging Scylla with GDB' chapter.
2019-10-18 10:08:23 +03:00
Botond Dénes
77ea086975 docs/debugging.md: add 'Debugging coredumps' subchapter
To the 'Debugging Scylla with GDB' chapter. The '### Debugging
relocatable binaries built with the toolchain' subchapter is demoted to
be just a section in this new subchapter. It is also renamed to
'Relocatable binaries'.
This subchapter intends to be a complete guide on how to debug coredumps
from how to obtain the correct version of all the binaries all the way
to how to correctly open the core with GDB.
2019-10-18 10:08:23 +03:00
Pekka Enberg
f01d0e011c Update seastar submodule
* seastar e888b1df...6bcb17c9 (4):
  > iotune: don't crash in sequential read test if hitting EOF
  > Remove FindBoost.cmake from install files
  > Merge "Move reactor backend out of reactor" from Asias
  > fair_queue: Add fair_queue.cc
2019-10-18 08:45:22 +03:00
Piotr Jastrzebski
2b26e3c904 test: change test_partition_key_logging to test_primary_key_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
997be35ef3 modification_statement: log in cdc clustering key of a change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
d8718a4ffc test: add test_partition_key_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
96c800ed0b modification_statement: log in cdc partition key of a change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
a1edb68b16 test: check that alter table with cdc manages log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
a45c894032 alter_table_statement: handle 'with cdc ='
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
629cdb5065 test: check that drop table with cdc removes log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
57c3377b1f cql_test_env: add require_table_does_not_exist assertion
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
50d53cd43e drop_table_statement: remove cdc log and desc if cdc is enabled
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
b9d6635fc5 test: check that create table with cdc sets up log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
81a34168a3 create_table_statement: handle 'with cdc ='
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:14 +02:00
Piotr Jastrzebski
6e29f5e826 create_table_statement: prepare announce_migration for cdc
This patch wraps the announce_migration logic into a lambda
that will be used both when cdc is used and when it's not.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
a9e43f4e86 test: add test_with_cdc_parameter
At the moment, this test only checks that table
creation and alteration set the cdc_options property
on a table correctly.

Future patches will extend this test to cover more
CDC aspects.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
8c6d860402 cql3: add cdc table property
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
386221da84 schema_tables: handle 'cdc' options
cdc options will be stored in scylla_tables to preserve
compatibility with Cassandra.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
8df942a320 schema_builder: handle schema::_cdc_options
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
ca9536a771 schema: add _cdc_options field
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
f079dce7b1 snitch: Provide getter for ignore_msb_bits of an endpoint
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
afe520ad77 gossip: Add application_state::IGNORE_MSB_BITS
We would like to share with other nodes
the value of ignore_msb_bits property used by the node.

This is needed because CDC will operate on
streams of changes. Each shard on each node
will have its own stream that will be identified
by a stream_id. Stream_id will be selected in
such a way that using stream_id as partition key
will locate partition identified by stream_id on
a node and shard that the stream belongs to.

To be able to generate such stream_id we need
to know ignore_msb_bits property value for each node.

IMPORTANT NOTE: At this point CDC does not support
topology changes. It will work only on a stable cluster.
Support for topology modifications will be added in
later steps.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
b9d5851830 snitch: Provide getter for shard_count of an endpoint
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
a66d7cfe57 gossip: Add application_state::SHARD_COUNT
We would like to share with other nodes
the number of shards available at the node.

This is needed because CDC will operate on
streams of changes. Each shard on each node
will have its own stream that will be identified
by a stream_id. Stream_id will be selected in
such a way that using stream_id as partition key
will locate partition identified by stream_id on
a node and shard that the stream belongs to.

To be able to generate such stream_id we need
to know how many shards are on each node.

IMPORTANT NOTE: At this point CDC does not support
topology changes. It will work only on a stable cluster.
Support for topology modifications will be added in
later steps.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
f7ce8e4f2b cdc: Add flag guarding its usage
At first, CDC will only be enabled when experimental flag is on.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Tomasz Grabiec
d7c3e48e8c Merge "Prepare modification_statement for LWT" from Kostja
Refactor modification_statement to enable lightweight
transaction implementation.

This patch set re-arranges the logic of
modification_statement::get_mutations() and uses
a column mask to identify the columns to prefetch.
It also pre-computes a few modification statement properties
at prepare, assuming the prepared statement is invalidated if
the underlying schema changes.
2019-10-17 10:51:00 +02:00
Konstantin Osipov
5d3bf03811 lwt: pre-compute modification_statement properties at prepare
They are used more extensively with introduction of lightweight
transactions, and pre-computing makes it easier to reason about
complexity of the scenarios where they are involved.
2019-10-16 22:44:44 +03:00
Konstantin Osipov
6e0f76ea60 lwt: use column mask to build partition_slice
Pre-compute column mask of columns to prefetch when preparing
a modification statement and use it to build partition_slice
object for read command. Fetch only the required columns.

Lightweight transactions build upon this by adding the
columns used in conditions and in the CAS result set to the column
mask of columns to read. Batch statements unite all column
masks to build a single relation for all rows modified by
conditional statements of a batch.
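
A minimal sketch of the idea, using a Python integer as the bitset. The helper names `column_mask` and `columns_in` and the ordinal values are made up for illustration.

```python
def column_mask(ordinal_ids):
    """Build a bitset with one bit set per column ordinal."""
    mask = 0
    for ordinal in ordinal_ids:
        mask |= 1 << ordinal
    return mask

def columns_in(mask):
    """List the column ordinals whose bits are set in the mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

update_mask = column_mask([3])        # column that must be read before the write
condition_mask = column_mask([1, 4])  # columns referenced by conditions
batch_mask = update_mask | condition_mask  # union across statements of a batch
print(columns_in(batch_mask))
```

Uniting masks is a single bitwise OR, so only the columns some statement actually needs end up in the read.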
2019-10-16 22:44:37 +03:00
Konstantin Osipov
f32a7a0763 lwt: move option set for modification statement read command
Move the option set for read command to update_parameters
class, since this class encapsulates the logic of working
with the read command result.
2019-10-16 22:41:00 +03:00
Konstantin Osipov
c0f0ab5edd lwt: introduce column mask
Introduce a bitset container which can be used to compute
all columns used in a query.

Add a partition_slice constructor which uses the bitset.
2019-10-16 22:40:55 +03:00
Konstantin Osipov
a00b9a92b3 lwt: refactor modification statement get_mutations()
Refactor get_mutations() so that the read command and
apply_updates() functions can be used in lightweight transactions.

Move read_command creation to its own method, as well as apply_updates().
Rewrite get_mutations() using the new API.

Avoid unnecessary shared pointers.
2019-10-16 22:32:51 +03:00
Tomasz Grabiec
7b7e4be049 Merge "lwt: introduce column_definition::ordinal_id" from Kostja
Introduce a column definition ordinal_id and use it in boosted
update_parameters::prefetch_data as a column index of a full row.

Lightweight transactions prefetch data and return a result set.
Make sure update_parameters::prefetch_data can serve as a
single representation of prefetched list cells as well as
condition cells and as a CAS result set.

I have a lot of plans for column_definition::ordinal_id, it
simplifies a lot of operations with columns and will also be
used for building a bitset of columns used in a query
or in multiple queries of a batch.
2019-10-16 15:11:10 +02:00
Konstantin Osipov
a2b629c3a1 lwt: boost update_parameters to serve as a CAS result set
In modification_statement/batch_statement, we need to prefetch data to
1) apply list operations
2) evaluate CAS conditions
3) return CAS result set.

Boost update_parameters::prefetch_data to serve as a single result set
for all of the above. In case of a batch, store multiple rows for
multiple clustering keys involved in the batch.

Use an ordered set for columns and rows to make sure 3) CAS result set
is returned to the client in an ordered manner.

Deserialize the primary key and add it to result set rows since
it is returned to the client as part of CAS result set.

Index columns using ordinal_id - this allows having a single
set for all columns and makes columns easy to look up.

Remove an extra memcpy to build view objects when looking
up a cell by primary key, use partition_key/clustering_key
objects for lookup.
2019-10-16 15:56:50 +03:00
Konstantin Osipov
a450c25946 lwt: remove dead code in cql3/update_parameters.hh
2019-10-16 15:48:40 +03:00
Konstantin Osipov
a4ccbece5c lwt: remove an unnecessary optional around prefetch_data
Get rid of an unnecessary optional around
update_parameters::prefetch_data.

update_parameters won't own prefetch_data in the future anyway,
since prefetch_data can be shared among multiple modification
statements of a batch, each statement having its own options
and hence its own update_parameters instance.
2019-10-16 15:48:25 +03:00
Konstantin Osipov
7a399ebe0d lwt: move prefetch_data_builder to update_parameters.cc
Move prefetch_data_builder class from modification_statement.cc
to update_parameters.cc.

We're going to share the same builder to build a result set
for condition evaluation and to apply updates of batch statements, so we
need to share it.

No other changes.
2019-10-16 15:48:08 +03:00
Konstantin Osipov
fa73421198 lwt: introduce column_definition::ordinal_id
Make sure every column in the schema, be it a column of partition
key, clustering key, static or regular one, has a unique ordinal
identifier.

This makes it easy to compute the set of columns used in a query,
as well as index row cells.

Allow looking up a column definition in the schema by ordinal id.
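
As a rough sketch of the concept, ordinals can be handed out by numbering all columns of a schema consecutively. The numbering order used here (partition key, clustering key, static, regular) and the function name are assumptions for illustration.

```python
from itertools import chain

def assign_ordinal_ids(partition_key, clustering_key, static, regular):
    """Give every column a unique ordinal, whatever its kind."""
    names = chain(partition_key, clustering_key, static, regular)
    return {name: ordinal for ordinal, name in enumerate(names)}

ids = assign_ordinal_ids(["pk"], ["ck"], ["s"], ["v1", "v2"])
# Reverse index: look up a column by its ordinal id.
by_ordinal = {ordinal: name for name, ordinal in ids.items()}
print(ids, by_ordinal[1])
```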
2019-10-16 15:46:25 +03:00
Avi Kivity
543e6974b9 Merge "Fix Incremental Compaction Efficiency" from Raphael
"
Incremental compaction code to release exhausted sstables was inefficient because
it was basically preventing any release from ever happening. So a new solution is
implemented to make incremental compaction approach actually efficient while
being cautious about not introducing data resurrection. This solution consists of
storing GC'able tombstones in a temporary sstable and keeping it till the end of
compaction. Overhead is avoided by not enabling it for strategies that don't work
with runs composed of multiple fragments.

Fixes #4531.

tests: unit, longevity 1TB for incremental compaction
"

* 'fix_incremental_compaction_efficiency/v6' of https://github.com/raphaelsc/scylla:
  tests: Check that partition is not resurrected on compaction failure
  tests: Add sstable compaction test for gc-only mutation compactor consumer
  sstables: Fix Incremental Compaction Efficiency
2019-10-16 15:15:53 +03:00
Tomasz Grabiec
054b53ac06 Merge "Introduce scylla generate_object_graph and improve scylla find and scylla fiber" from Botond
Introduce `scylla generate_object_graph`, a command which generates a
visual object graph, where vertices are objects and edges are
references. The graph starts from the object specified by the user. The
graph allows visual inspection of the object graph and hopefully allows
the user to identify the object in question.

Add the `--resolve` flag to `scylla find`. When specified, `scylla find`
will attempt to resolve the first pointer in the found objects as a vtable
pointer. If successful the pointer as well as the resolved  symbol will
be added to the listing.

In the listing of `scylla fiber` also print the starting task (as the
first item).
2019-10-15 20:11:16 +02:00
Tomasz Grabiec
c76f905497 Merge "scylla-gdb.py: improve the toolbox for investigating OOMs (but not just)" from Botond
This mini-series contains assorted improvements that I found very useful
while debugging OOM crashes in the past weeks:
* A wrapper for `std::list`.
* A wrapper for `std::variant`.
* Making `scylla find` usable from python code.
* Improvements to `scylla sstables` and `scylla task_histogram`
  commands.
* The `$downcast_vptr()` convenience function.
* The `$dereference_lw_shared_ptr()` convenience function.

Convenience functions in gdb are similar to commands, with some key
differences:
* They have a defined argument list.
* They can return values.
* They can be part of any gdb expression in which functions are allowed.

This makes them very useful for doing operations on values and then
returning them so that the developer can use them in the gdb shell.
2019-10-15 19:54:09 +02:00
Avi Kivity
acc433b286 mutation_partition: make static_row optional to reduce memory footprint
The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even if it is not present, due to row's overhead.

Change it to be optional by using lazy_row instead of row. Some call
sites treewide were adjusted to deal with the extra indirection.

perf_simple_query appears to improve by about 1%, from 163 krps to 165 krps,
though it's hard to be sure due to noisy measurements.

memory_footprint comparisons (before/after):

mutation footprint:		       mutation footprint:
 - in cache:	 1096		        - in cache:	992
 - in memtable:	 854		        - in memtable:	750
 - in sstable:	 351		        - in sstable:	351
 - frozen:	 540		        - frozen:	540
 - canonical:	 827		        - canonical:	827
 - query result: 342		        - query result: 342

 sizeof(cache_entry) = 112	        sizeof(cache_entry) = 112
 -- sizeof(decorated_key) = 36	        -- sizeof(decorated_key) = 36
 -- sizeof(cache_link_type) = 32        -- sizeof(cache_link_type) = 32
 -- sizeof(mutation_partition) = 200    -- sizeof(mutation_partition) = 96
 -- -- sizeof(_static_row) = 112        -- -- sizeof(_static_row) = 8
 -- -- sizeof(_rows) = 24	        -- -- sizeof(_rows) = 24
 -- -- sizeof(_row_tombstones) = 40     -- -- sizeof(_row_tombstones) = 40

 sizeof(rows_entry) = 232	        sizeof(rows_entry) = 232
 sizeof(lru_link_type) = 16	        sizeof(lru_link_type) = 16
 sizeof(deletable_row) = 168	        sizeof(deletable_row) = 168
 sizeof(row) = 112		        sizeof(row) = 112
 sizeof(atomic_cell_or_collection) = 8  sizeof(atomic_cell_or_collection) = 8

Tests: unit (dev)
2019-10-15 15:42:05 +03:00
Avi Kivity
88613e6882 mutation_partition: introduce lazy_row
lazy_row adds indirection to the row class, in order to reduce storage requirements
when the row is not present. The intent is to use it for the static row, which is
not present in many schemas, and is often not present in writes even in schemas that
have a static row.

Indirection is done using managed_ref, which is lsa-compatible.

lazy_row implements most of row's methods, and a few more:
 - get(), get_existing(), and maybe_create(): bypass the abstraction and access
   the underlying row
 - some methods that accept a row parameter also have an overload with a lazy_row
   parameter
2019-10-15 15:42:05 +03:00
Avi Kivity
efe8fa6105 managed_ref: add external_memory_usage()
Like other managed containers, add external_memory_usage() so we can account
for a partition's memory footprint in memtable/cache.
2019-10-15 15:41:42 +03:00
Botond Dénes
71923577a4 docs/debugging.md: fix formatting issues
2019-10-15 14:40:24 +03:00
Botond Dénes
4babd116d8 docs/debugging.md: demote 'Starting GDB' and 'Using GDB'
They really belong to the 'Introduction' chapter, instead of being
separate chapters of their own.
2019-10-15 14:40:20 +03:00
Pekka Enberg
0c1dad0838 Merge "Misc documentation cleanup" from Botond
"Delete README-DPDK.md, move IDL.md to docs/ and fix
docs/review-checklist.md to point to scylla's coding style document,
instead of seastar's."

* 'documentation-cleanup/v3' of https://github.com/denesb/scylla:
  docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
  docs: mv coding-style.md docs/
  rm README-DPDK.md
  docs: mv IDL.md docs/
2019-10-15 12:53:49 +02:00
Pekka Enberg
b466d7ee33 Merge "Misc documentation cleanup" from Botond
"Delete README-DPDK.md, move IDL.md to docs/ and fix
docs/review-checklist.md to point to scylla's coding style document,
instead of seastar's."

* 'documentation-cleanup/v3' of https://github.com/denesb/scylla:
  docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
  docs: mv coding-style.md docs/
  rm README-DPDK.md
  docs: mv IDL.md docs/
2019-10-15 08:53:22 +03:00
Benny Halevy
fef3342a34 test: random_schema::make_ckeys: fix infinite loop
Allow returning fewer random clustering keys than requested since
the schema may limit the total number we can generate, for example,
if there is only one boolean clustering column.
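
The idea can be sketched as a generator loop with a bounded number of attempts. This is an illustration, not the code in random_schema: `make_unique_keys`, its parameters, and the attempt cap are all hypothetical.

```python
import random

def make_unique_keys(generate, wanted, max_attempts_per_key=10):
    """Collect up to `wanted` distinct keys, giving up after a bounded
    number of attempts instead of looping forever."""
    keys = set()
    attempts = 0
    while len(keys) < wanted and attempts < wanted * max_attempts_per_key:
        keys.add(generate())
        attempts += 1
    return sorted(keys)

rng = random.Random(0)
# A single boolean column can yield at most two distinct keys.
keys = make_unique_keys(lambda: rng.choice([False, True]), wanted=5)
print(len(keys))
```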

Fixes #5161

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-15 08:52:39 +03:00
Botond Dénes
544f38ea6d docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
2019-10-15 08:23:08 +03:00
Botond Dénes
56df6fbd58 docs: mv coding-style.md docs/
It is not discoverable in its current location (root directory) due to
the sheer number of source files in there.
2019-10-15 08:23:08 +03:00
Botond Dénes
c0706e52ce rm README-DPDK.md
Probably a leftover from the era when seastar and scylla shared the same
git repo.
2019-10-15 08:23:01 +03:00
Botond Dénes
061ac53332 docs: mv IDL.md docs/
Documentation should be in docs/.
2019-10-15 08:21:09 +03:00
Piotr Sarna
9e98b51aaa view: fix view_info select statement for local indexes
Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.
2019-10-14 17:14:19 +02:00
Piotr Sarna
2ee8c6f595 index: add is_global_index() utility
The helper function is useful for determining if given schema
represents a global index.
2019-10-14 17:13:32 +02:00
Botond Dénes
b2e10a3f2f scylla-gdb.py: introduce scylla generate_object_graph
When investigating OOMs, a prominent pattern is a size class that is
exploded, using up most of the available memory alone. If one is lucky,
the objects causing the OOM are instances of some virtual class, making
their identification easy. Other times the objects are referenced by
instances of some virtual class, allowing their identification with some
work. However, there are cases where neither these objects nor their
direct referrers are instances of virtual classes. This is the case
`scylla generate_object_graph` intends to help with.

scylla generate_object_graph, as its name suggests, generates the
object graph of the requested object. The object graph is a directed
graph, where vertices are objects and edges are references between them,
going from referrer to referee. The vertices contain information
such as the address of the object, its size, whether it is live or not
and, if applicable, the address and symbol name of its vtable. The edges
contain the list of offsets the referrer has references at. The
generated graph is an image, which allows the visual inspection of the
object graph, allowing the developer to notice patterns and hopefully
identify the problematic objects.

The graph is generated with the help of Graphviz. The command
generates `.dot` files which can be converted to images with the help of
the `dot` utility. The command can do this if the output file is one of
the supported image formats (e.g. `png`), otherwise only the `.dot` file
is generated, leaving the actual image generation to the user.
2019-10-14 16:21:18 +03:00
Botond Dénes
f9e8e54603 scylla-gdb.py: boost scylla find
Add `--resolve` flag, which will make the command attempt to resolve the
first pointer of the found objects as a vtable pointer. If this is
successful the vtable pointer as well as the symbol name will be added
to the listing. This in particular makes backtracing continuation chains
a breeze, as the continuation object the searched one depends on can be
found at glance in the resulting listing (instead of having to manually
probe each item).

The arguments of `scylla find` are now parsed via `argparse`. While at
it, support for all the size classes supported by the underlying `find`
command was added, in addition to `w` and `g`. However, the syntax for
specifying the size class has changed: it now has to be
specified with the `-s|--size` command line argument, instead of passing
`-w` or `-g`.
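
A minimal sketch of what such an `argparse` setup could look like. Only the `-s|--size` and `--resolve` options come from the description above; the positional `value` argument, the default size class, and the help texts are assumptions for illustration.

```python
import argparse

def make_find_parser():
    parser = argparse.ArgumentParser(prog="scylla find")
    # Size classes of gdb's `find` command: byte, halfword, word, giant.
    parser.add_argument("-s", "--size", choices=["b", "h", "w", "g"],
                        default="g", help="size class of the searched value")
    parser.add_argument("--resolve", action="store_true",
                        help="try to resolve the first pointer as a vtable pointer")
    # Hypothetical positional argument: the value to search for.
    parser.add_argument("value", type=lambda v: int(v, 0))
    return parser

args = make_find_parser().parse_args(["-s", "w", "--resolve", "0x1234"])
print(args.size, args.resolve, hex(args.value))
```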
2019-10-14 16:21:18 +03:00
Botond Dénes
0773104f32 scylla_fiber: also print the task that is the starting point of the fiber
Or in other words, the task that is the argument of the search. Example:
    (gdb) scylla fiber 0x60001a305910
    Starting task: (task*) 0x000060001a305910 0x0000000004aa5260 vtable for seastar::continuation<...> + 16
    #0  (task*) 0x0000600016217c80 0x0000000004aa5288 vtable for seastar::continuation<...> + 16
    #1  (task*) 0x000060000ac42940 0x0000000004aa2aa0 vtable for seastar::continuation<...> + 16
    #2  (task*) 0x0000600023f59a50 0x0000000004ac1b30 vtable for seastar::continuation<...> + 16
2019-10-14 13:36:25 +03:00
Botond Dénes
1a8846c04a scylla-gdb.py: move the code finding text_start and text_end to get_text_range()
This code is currently duplicated in `find_vptrs()` and
`scylla_task_histogram`. Refactor it out into a function.
The code is also improved in two ways:
* Make the search stricter, ensuring (hopefully) that indeed the
  executable's text section is found, not that of the first object in
  the `gdb file` listing.
* Throw an exception in the case when the search fails.
2019-10-14 13:25:28 +03:00
Raphael S. Carvalho
7f1a2156c7 table: Don't account for shared SSTables in compaction backlog tracker
We don't want to add shared sstables to table's backlog tracker because:
1) table's backlog tracker has only an influence on regular compaction
2) shared sstables are never regular compacted, they're worked by
resharding which has its own backlog tracker.

Such sstables belong to more than one shard, meaning that currently
they're added to backlog tracker of all shards that own them.
But the thing is that such sstables end up being resharded in a shard
that may be completely random. So increasing backlog of all shards
such sstables belong to, won't lead to faster resharding. Also, table's
backlog tracker is supposed to deal only with regular compaction.

Accounting for shared sstables in table's tracker may lead to incorrect
speed up of regular compactions because the controller is not aware
that some relevant part of the backlog is due to pending resharding.
The fix is about ignoring sstables that will be resharded and let
table's backlog tracker account only for sstables that can be worked on
by regular compaction, and rely on resharding controlling itself
with its own tracker.
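
The filtering rule can be sketched as follows. The `SSTable` class and the use of total bytes as a stand-in for backlog are simplifications made for illustration, not the tracker's actual computation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SSTable:
    name: str
    size: int
    owner_shards: frozenset  # shards that own data in this sstable

def regular_backlog(sstables):
    """Sum only sstables owned by a single shard; shared ones are left
    to resharding and its own tracker."""
    return sum(s.size for s in sstables if len(s.owner_shards) == 1)

tables = [
    SSTable("local-1", 100, frozenset({0})),
    SSTable("shared-1", 400, frozenset({0, 1})),
    SSTable("local-2", 50, frozenset({0})),
]
print(regular_backlog(tables))
```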
NOTE: this doesn't fix the resharding controlling issue completely,
as described in #4952. We'll still need to throttle regular compaction
on behalf of resharding. So subsequent work may be about:
- move resharding to its own priority class, perhaps streaming.
- make resharding's backlog tracker account for sstables in all of
its pending jobs, not only the ongoing ones (currently limited to 1 per shard).
- limit compaction shares when resharding is in progress.
This patch only fixes the issue of the controller for regular compaction
accounting for sstables that are handled exclusively by resharding.

Fixes #5077.
Refs #4952.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>
2019-10-13 10:14:13 +03:00
Raphael S. Carvalho
88611d41d0 sstables: Fix major compaction's space amplification with incremental compaction
Incremental compaction efficiency depends on all references to compacted
sstables being released, because the file descriptors of sstable
components are only closed once the sstable object is destructed.
Incremental compaction is not working for major compaction because references
to released sstables are being kept in the compaction manager, which prevents
their disk usage from being released. So the space amplification would be
the same as with a non-incremental approach, i.e. it needs twice the amount of
used disk space for the table(s). With this issue fixed, the database
becomes very major-compaction friendly: the space requirement becomes very
low, a constant which is roughly the number of fragments currently being compacted
multiplied by the fragment size (1GB by default), for each table involved.

Fixes #5140.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191003211927.24153-1-raphaelsc@scylladb.com>
2019-10-13 09:55:11 +03:00
Raphael S. Carvalho
17c66224f7 tests: Check that partition is not resurrected on compaction failure
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-13 00:06:51 -03:00
Raphael S. Carvalho
6301a10fd7 tests: Add sstable compaction test for gc-only mutation compactor consumer
Make sure gc'able-tombstone-only sstable is properly generated with data that
comes from regular compaction's input sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-12 21:38:53 -03:00
Raphael S. Carvalho
91260cf91b sstables: Fix Incremental Compaction Efficiency
Compaction prevents data resurrection from happening by checking that there's
no way data shadowed by a GC'able tombstone will survive alone, after
a failure for example.

Consider the following scenario:
We have two runs A and B, each divided into 5 fragments, A1..A5, B1..B5.

They have the following token ranges:

 A:  A1=[0, 3]   A2=[4, 7]  A3=[8, 11]   A4=[12, 15]    A5=[16,18]
B is the same as A's ranges, offset by 1:

 B:  B1=[1,4]    B2=[5,8]  B3=[9,12]    B4=[13,16]    B5=[17,19]

Let's say we are finished flushing output until position 10 in the compaction.
We are currently working on A3 and B3, so obviously those cannot be deleted.
Because B2 overlaps with A3, we cannot delete B2 either.
Otherwise, B2 could have a GC'able tombstone that shadows data in A3, and after
B2 is gone, dead data in A3 could be resurrected *on failure*.
Now, A2 overlaps with B2 which we couldn't delete yet, so we can't delete A2.
Now A2 overlaps with B1 so we can't delete B1. And B1 overlaps with A1 so
we can't delete A1. So we can't delete any fragment.
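
The chain of dependencies in this scenario can be reproduced with a short sketch. It assumes a fragment must be kept if it is not yet exhausted or if it overlaps any fragment that must be kept; the function names are made up for illustration.

```python
def overlaps(a, b):
    """True if the closed token ranges a and b intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def releasable(fragments, position):
    """Return the fragments that can be deleted once output has been
    flushed up to `position`, under the old overlap-based rule."""
    # Fragments still being read (or not yet reached) are always held.
    held = {n for n, (_, last) in fragments.items() if last >= position}
    changed = True
    while changed:  # propagate the dependency until nothing changes
        changed = False
        for name, rng in fragments.items():
            if name not in held and any(overlaps(rng, fragments[h]) for h in held):
                held.add(name)
                changed = True
    return set(fragments) - held

run_a = {"A1": (0, 3), "A2": (4, 7), "A3": (8, 11), "A4": (12, 15), "A5": (16, 18)}
run_b = {"B1": (1, 4), "B2": (5, 8), "B3": (9, 12), "B4": (13, 16), "B5": (17, 19)}
print(releasable({**run_a, **run_b}, 10))
```

With both runs the result is empty, matching the conclusion above. With run A alone, A1 and A2 would be releasable, since nothing overlaps them.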

The problem with this approach is obvious, fragments can potentially not be
released due to data dependency, so incremental compaction efficiency is
severely reduced.
To fix it, let's not purge GC'able tombstones right away in the mutation
compactor step. Instead, let's have compaction write them to a separate
sstable run that is deleted at the end of compaction.
By making sure that tombstone information from all compacting sstables is not
lost, we no longer need incremental compaction to impose so many
restrictions on which fragments can be released. Now, any sstable whose data
is safe in a new sstable can be released right away. In addition, incremental
compaction will only take place if the compaction procedure is working with at
least one multi-fragment sstable run.

Fixes #4531.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-12 21:36:03 -03:00
Kamil Braun
ef9d5750c8 view: fix bug in virtual columns.
When creating a virtual column of non-frozen map type,
the wrong type was used for the map's keys.

Fixes #5165.
2019-10-11 20:47:06 +03:00
Avi Kivity
f12feec2c9 Update seastar submodule
* seastar 1f68be436f...e888b1df9c (8):
  > sharded: Make map work with mapper that returns a future
  > cmake: Remove FindBoost.cmake
  > Reduce noncopyable_function instruction cache footprint
  > doc: add Loops section to the tutorial
  > Merge "Move file related code out of reactor" from Asias
  > Merge "Move the io_queue code out of reactor" from Asias
  > cmake: expose seastar_perf_testing lib
  > future: class doc: explain why discarding a future is bad

 - main.cc now includes new file io_queue.hh
 - perf tests now include seastar perf utilities via user, not
   system, includes since those are not exported
2019-10-10 18:17:28 +03:00
Nadav Har'El
33027a36b4 alternator: Add authorization
Merged patch set from Piotr Sarna:

Refs #5046

This commit adds handling of the "Authorization:" header in incoming requests.
The signature sent in the authorization is recomputed server-side
and compared with what the client sent. In case of a mismatch,
UnrecognizedClientException is returned.
The signature computation is based on boto3 Python implementation
and uses gnutls to compute HMAC hashes.
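
The recompute-and-compare step can be sketched as below. This assumes an AWS Signature Version 4 style key derivation; building the string to sign from the request is out of scope here, and all function names and sample values are hypothetical.

```python
import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def signing_key(secret: str, date: str, region: str, service: str) -> bytes:
    """Derive the signing key through a chain of HMAC-SHA256 steps."""
    k_date = _hmac(("AWS4" + secret).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

def signature(secret, date, region, service, string_to_sign) -> str:
    key = signing_key(secret, date, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(client_signature, secret, date, region, service, string_to_sign) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    expected = signature(secret, date, region, service, string_to_sign)
    return hmac.compare_digest(expected, client_signature)

params = ("secret", "20191010", "us-east-1", "dynamodb", "example-string-to-sign")
good = signature(*params)
print(verify(good, *params), verify("0" * 64, *params))
```

A failed comparison is the case in which the server would answer with UnrecognizedClientException.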

This series is rebased on a previous HTTPS series in order to ease
merging these two. As such, it depends on the HTTPS series being
merged first.

Tests: alternator(local, remote)

The series also comes with a simple authorization test and a docs update.

Piotr Sarna (6):
  alternator: migrate split() function to string_view
  alternator: add computing the auth signature
  config: add alternator_enforce_authorization entry
  alternator: add verifying the auth signature
  alternator-test: add a basic authorization test case
  docs: update alternator authorization entry

 alternator-test/test_authorization.py |  34 ++++++++
 configure.py                          |   1 +
 alternator/{server.hh => auth.hh}     |  22 ++---
 alternator/server.hh                  |   3 +-
 db/config.hh                          |   1 +
 alternator/auth.cc                    |  88 ++++++++++++++++++++
 alternator/server.cc                  | 112 +++++++++++++++++++++++---
 db/config.cc                          |   1 +
 main.cc                               |   2 +-
 docs/alternator/alternator.md         |   7 +-
 10 files changed, 241 insertions(+), 30 deletions(-)
 create mode 100644 alternator-test/test_authorization.py
 copy alternator/{server.hh => auth.hh} (58%)
 create mode 100644 alternator/auth.cc
2019-10-10 15:57:46 +03:00
Nadav Har'El
df62499710 docs/isolation.md: copy-edit
Minor spelling and syntax corrections. No new content or semantic changes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191010093457.20439-1-nyh@scylladb.com>
2019-10-10 15:17:28 +03:00
Piotr Dulikowski
c04e8c37aa distributed_loader: populate non-system keyspaces in parallel
Before this change, when populating non-system keyspaces, each data
directory was scanned and for each entry (keyspace directory),
a keyspace was populated. This was done in a serial fashion - populating
of one keyspace was not started until the previous one was done.

Loading keyspaces in such fashion can introduce unnecessary waiting
in case of a large number of keyspaces in one data directory. Population
process is I/O intensive and barely uses CPU.

This change enables parallel loading of keyspaces per data directory.
Populating the next keyspace does not wait for the previous one.
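
The difference between the two loading strategies can be sketched as follows. This is only an illustration in Python with asyncio, not the C++/Seastar code from the patch; `populate_keyspace` and the sleep that stands in for disk I/O are hypothetical.

```python
import asyncio


async def populate_keyspace(name: str) -> str:
    # Hypothetical stand-in for the I/O-bound scan of one keyspace directory.
    await asyncio.sleep(0.01)
    return name


async def populate_sequential(keyspaces):
    # Old behaviour: the next keyspace starts only after the previous one is done.
    return [await populate_keyspace(ks) for ks in keyspaces]


async def populate_parallel(keyspaces):
    # New behaviour: start every population, then wait for all of them together.
    return list(await asyncio.gather(*(populate_keyspace(ks) for ks in keyspaces)))


if __name__ == "__main__":
    names = [f"ks{i}" for i in range(5)]
    print(asyncio.run(populate_parallel(names)))
```

Both functions return the same result; only the amount of waiting differs, which is why the gain depends on how I/O-bound the population is.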

A benchmark was performed measuring startup time, with the following
setup:
  - 1 data directory,
  - 200 keyspaces,
  - 2 tables in each keyspace, with the following schema:
      CREATE TABLE tbl (a int, b int, c int, PRIMARY KEY(a, b))
        WITH CLUSTERING ORDER BY (b DESC),
  - 1024 rows in each table, with values (i, 2*i, 3*i) for i in 0..1023,
  - ran on 6-core virtual machine running on i7-8750H CPU,
  - compiled in dev mode,
  - parameters: --smp 6 --max-io-requests 4 --developer-mode=yes
      --datadir $DIR --commitlog-directory $DIR
      --hints-directory $DIR --view-hints-directory $DIR

The benchmark tested:
  - boot time, by comparing timestamp of the first message in log,
    and timestamp of the following message:
      "init - Scylla version ... initialization completed."
  - keyspace population time, by comparing timestamps of messages:
      "init - loading non-system sstables"
    and
      "init - starting view update generator"

The benchmark was run 5 times for sequential and parallel version,
with the following results:
  - sequential: boot 31.620s, keyspace population 6.051s
  - parallel:   boot 29.966s, keyspace population 4.360s

Keyspace population time decreased by ~27.95%, and overall boot time
by ~5.23%.
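
The percentages above follow directly from the reported timings. A short Python check of the arithmetic (the helper name is just for illustration):

```python
def percent_decrease(before: float, after: float) -> float:
    # Relative improvement of `after` over `before`, in percent.
    return (before - after) / before * 100.0


# Timings taken from the benchmark results listed above (seconds).
population = percent_decrease(6.051, 4.360)
boot = percent_decrease(31.620, 29.966)

print(f"keyspace population: {population:.2f}%")
print(f"boot: {boot:.2f}%")
```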

Tests: unit(release)

Fixes #2007
2019-10-10 15:12:23 +03:00
Piotr Sarna
6ca55d3c83 docs: update alternator authorization entry
The entry now contains a comment that computing a signature works,
but is still based on a hardcoded key.
2019-10-10 13:51:00 +02:00
Piotr Sarna
23798b7301 alternator-test: add a basic authorization test case
The test case ensures that passing wrong credential results
in getting an UnrecognizedClientException.
2019-10-10 13:51:00 +02:00
Piotr Sarna
97cbb9a2c7 alternator: add verifying the auth signature
The signature sent in the "Authorization:" header is now verified
by computing the signature server-side with a matching secret key
and confirming that the signatures match.
Currently the secret key is hardcoded to be "whatever" in order
to work with current tests, but it should be replaced
by a proper key store.

Refs #5046
2019-10-10 13:51:00 +02:00
Piotr Sarna
e245b54502 config: add alternator_enforce_authorization entry
The config entry will be used to turn authorization for alternator
requests on and off. The default is currently off, since the key store
is not implemented yet.
2019-10-10 13:51:00 +02:00
Piotr Sarna
589a22d078 alternator: add computing the auth signature
A function for computing the auth signature from user requests
is added, along with helper functions. The implementation
is based on gnutls's HMAC.
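
For orientation, here is a minimal sketch of the chained-HMAC signature computation, assuming the AWS Signature Version 4 scheme that boto3 uses for DynamoDB requests. It is written in Python with the standard hmac module and is not the gnutls-based C++ code from this patch; building the canonical request and the string to sign is left out, and all function names are illustrative.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, datestamp: str, region: str, service: str) -> bytes:
    # Signature Version 4 derives the signing key through a chain of HMACs.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), datestamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def compute_signature(secret_key: str, datestamp: str, region: str,
                      service: str, string_to_sign: str) -> str:
    # `string_to_sign` is assumed to be already built from the canonical request.
    key = derive_signing_key(secret_key, datestamp, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def signatures_match(computed: str, received: str) -> bool:
    # Constant-time comparison, a common precaution when verifying signatures.
    return hmac.compare_digest(computed, received)
```

A server would recompute the signature from the incoming request with the stored secret key and reject the request when `signatures_match()` returns False.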

Refs #5046
2019-10-10 13:51:00 +02:00
Piotr Sarna
ca58b46b4c alternator: migrate split() function to string_view
The implementation of string split was based on sstring type for
simplicity, but it turns out that more generic std::string_view
will be beneficial later to avoid unneeded string copying.
Unfortunately boost::split does not cooperate well with string views,
so a simple manual implementation is provided instead.
2019-10-10 13:50:59 +02:00
Botond Dénes
52afbae1e5 README.md: add links to other documentation sources
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191010103926.34705-3-bdenes@scylladb.com>
2019-10-10 14:15:01 +03:00
Botond Dénes
e52712f82c docs: add README.md
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191010103926.34705-2-bdenes@scylladb.com>
2019-10-10 14:14:09 +03:00
Amnon Heiman
64c2d28a7f database: Add counter for the number of schema changes
Schema changes can have a big effect on performance; typically they should
be rare events.

It is useful to monitor how frequently the schema changes.
This patch adds a counter that increases each time the schema changes.

After this patch the metrics would look like:

scylla_database_schema_changed{shard="0",type="derive"} 2

Fixes #4785

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-10-08 17:54:49 +02:00
Asias He
b89ced4635 streaming: Do not open rpc stream connection if reader has no data
We can use the reader::peek() to check if the reader contains any data.
If not, do not open the rpc stream connection. It helps to reduce the
port usage.
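
The idea can be sketched in Python as follows. This is an illustration only: `PeekableReader`, `stream()` and `open_connection` are hypothetical names, and the real code works with Seastar readers and rpc streams.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class PeekableReader:
    """Wraps an iterable and allows looking at the next item without consuming it."""

    _END = object()

    def __init__(self, items: Iterable[str]) -> None:
        self._it: Iterator[str] = iter(items)
        self._next = next(self._it, self._END)

    def peek(self) -> Optional[str]:
        # Returns None when the reader is exhausted.
        return None if self._next is self._END else self._next

    def __iter__(self) -> "PeekableReader":
        return self

    def __next__(self) -> str:
        if self._next is self._END:
            raise StopIteration
        item = self._next
        self._next = next(self._it, self._END)
        return item


def stream(reader: PeekableReader, open_connection: Callable[[], List[str]]) -> int:
    # Skip opening the connection entirely when there is nothing to send.
    if reader.peek() is None:
        return 0
    connection = open_connection()
    sent = 0
    for fragment in reader:
        connection.append(fragment)
        sent += 1
    return sent
```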

Refs: #4943
2019-10-08 10:31:02 +02:00
Konstantin Osipov
94006d77b1 lwt: add cas_contention_timeout_in_ms to config
Make the default conform to the origin.
Message-Id: <20191006154532.54856-3-kostja@scylladb.com>
2019-10-08 00:02:35 +02:00
Konstantin Osipov
383e17162a lwt: implement query_options::check_serial_consistency()
Both in a single-statement transaction and in a batch
we expect that serial consistency is provided. Move the
check to query_options class and make it available for
reuse.

Keep get_serial_consistency() around for use in
transport/server.cc.
Message-Id: <20191006154532.54856-2-kostja@scylladb.com>
2019-10-08 00:02:35 +02:00
Piotr Sarna
36a1905e98 storage_proxy: handle unstarted write cancelling
When another node is reported to be down, view updates queued
for it are cancelled, but some of them may already be initiated.
Until now, cancelling such a write resulted in an exception,
but on conceptual level it's not really an exception, since
this behaviour is expected.
Previous version of this patch was based on introducing a special
exception type that was later handled specially, but it's not clear
if it's a good direction. Instead, this patch simply makes this
path non-exceptional, as was originally done by Nadav in the first
version of the series that introduced handling unstarted write
cancellations. Additionally, a message containing the information
that a write is cancelled is logged with debug level.
2019-10-07 16:55:36 +03:00
Vladimir Davydov
e8bcb34ed4 api: drop /storage_proxy/metrics/cas_read/condition_not_met
There's no such metric in Cassandra (although Cassandra's docs mistakenly
say it exists). Having it would make no sense anyway so let's drop it.

Message-Id: <b4f7a6ad278235c443cb8ea740bfa6399f8e4ee1.1570434332.git.vdavydov@scylladb.com>
2019-10-07 16:54:39 +03:00
Piotr Sarna
5ab134abef alternator-test: update HTTPS section of README
README.md has 3 fixes applied:
 - s/alternator_tls_port/alternator_https_port
 - conf directory is mentioned more explicitly
 - it now correctly states that the self-signed certificate
   warning *is* explicitly ignored in tests
Message-Id: <e5767f7dbea260852fc2fa9b613e1bebf490cc78.1570444085.git.sarna@scylladb.com>
2019-10-07 14:51:16 +03:00
Avi Kivity
8ed6f94a16 Merge "Fix handling of schema alters and eviction in cache" from Tomasz
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
  memtable flush may allow stale data to be populated into cache.

Fixes #5135, Cache reads may miss some writes if schema alter followed by a
  read happened concurrently with preempted partition entry update.

Fixes #5127, Cache populating read concurrent with schema alter may use the
  wrong schema version to interpret sstable data.

Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
  fail or cause a node crash after schema alter.
"

* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
  tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
  tests: row_cache_stress_test: Verify all entries are evictable at the end
  tests: row_cache_stress_test: Exercise single-partition reads
  tests: row_cache_stress_test: Add periodic schema alters
  tests: memtable_snapshot_source: Allow changing the schema
  tests: simple_schema: Prepare for schema altering
  row_cache: Record upgraded schema in memtable entries during update
  memtable: Extract memtable_entry::upgrade_schema()
  row_cache, mvcc: Prevent locked snapshots from being evicted
  row_cache: Make evict() not use invalidate_unwrapped()
  mvcc: Introduce partition_snapshot::touch()
  row_cache, mvcc: Do not upgrade schema of entries which are being updated
  row_cache: Use the correct schema version to populate the partition entry
  delegating_reader: Optimize fill_buffer()
  row_cache, memtable: Use upgrade_schema()
  flat_mutation_reader: Introduce upgrade_schema()
2019-10-07 14:43:36 +03:00
Nadav Har'El
f2f0f5eb0f alternator: add https support
Merged patch series from Piotr Sarna:

This series adds HTTPS support for Alternator.
The series comes with --https option added to alternator-test, which makes
the test harness run all the tests with HTTPS instead of HTTP. All the tests
pass, albeit with security warnings that a self-signed x509 certificate was
used and it should not be trusted.

Fixes #5042
Refs scylladb/seastar#685

Patches:
  docs: update alternator entry on HTTPS
  alternator-test: suppress the "Unverified HTTPS request" warning
  alternator-test: add HTTPS info to README.md
  alternator-test: add HTTPS to test_describe_endpoints
  alternator-test: add --https parameter
  alternator: add HTTPS support
  config: add alternator HTTPS port
2019-10-07 12:38:20 +03:00
Avi Kivity
969113f0c9 Update seastar submodule
* seastar c21a7557f9...1f68be436f (6):
  > scheduling: Add per scheduling group data support
  > build: Include dpdk as a single object in libseastar.a
  > sharded: fix foreign_ptr's move assignment
  > build: Fix DPDK libraries linking in pkg-config file
  > http server: https using tls support
  > Make output_stream blurb Doxygen
2019-10-07 12:18:49 +03:00
Nadav Har'El
754add1688 alternator: fix Expected's BEGINS_WITH error handling
The BEGINS_WITH condition in conditional updates (via Expected) requires
that the given operand be either a string or a binary. Any other operand
should result in a validation exception - not a failed condition as we
generate now.

This patch fixes the test for this case so it will succeed against
Amazon DynamoDB (before this patch it fails - this failure was masked by
a typo before commit 332ffa77ea). The patch
then fixes our code to handle this case correctly.

Note that BEGINS_WITH handling of wrong types is now asymmetrical: A bad
type in the operand is now handled differently from a bad type in the
attribute's value. We add another check to the test to verify that this
is the case.
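
The asymmetry can be sketched like this. It is a simplified Python model, not the Alternator C++ code: values are single-entry dicts in the DynamoDB style (`{"S": ...}`, `{"B": ...}`), binary values are assumed to be already decoded to bytes, and the function and exception names are illustrative.

```python
class ValidationException(Exception):
    """Raised for a malformed request, as opposed to a failed condition."""


def check_begins_with(attribute, operand) -> bool:
    (operand_type, operand_value), = operand.items()
    # A bad operand type is a request validation error.
    if operand_type not in ("S", "B"):
        raise ValidationException("BEGINS_WITH operand must be a string or a binary")
    # A missing attribute or one of a different type is just a failed condition.
    if attribute is None:
        return False
    (attr_type, attr_value), = attribute.items()
    if attr_type != operand_type:
        return False
    return attr_value.startswith(operand_value)
```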

Fixes #5141

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191006080553.4135-1-nyh@scylladb.com>
2019-10-06 17:16:55 +03:00
Botond Dénes
d0fa5dc34d scylla-gdb.py: introduce the downcast_vptr convenience function
When debugging, one constantly has to inspect objects for which only
a "virtual pointer" is available, that is a pointer that points to a
common parent class or interface.
Finding the concrete type and downcasting the pointer is easy enough but
why do it manually when it is possible to automate it trivially?
$downcast_vptr() returns any virtual pointer given to it, cast to the
actual concrete object.
Example:
    (gdb) p $1
    $2 = (flat_mutation_reader::impl *) 0x60b03363b900
    (gdb) p $downcast_vptr(0x60b03363b900)
    $3 = (combined_mutation_reader *) 0x60b03363b900
    # The return value can also be dereferenced on the spot.
    (gdb) p *$downcast_vptr($1)
    $4 = {<flat_mutation_reader::impl> = {_vptr.impl = 0x46a3ea8 <vtable
    for combined_mutation_reader+16>, _buffer = {_impl = {<std::al...
2019-10-04 17:45:47 +03:00
Botond Dénes
434a41d39b scylla-gdb.py: introduce the dereference_lw_shared_ptr convenience function
Dereferencing a `seastar::lw_shared_ptr` is another tedious manual
task. The stored pointer (`_p`) has to be cast to the right subclass
of `lw_shared_ptr_counter_base`, which involves inspecting the code,
then writing a cast expression that gdb is willing to parse. This
is something machines are so much better at doing.
`$dereference_lw_shared_ptr` returns a pointer to the actual pointed-to
object, given an instance of `seastar::lw_shared_ptr`.
Example:
    (gdb) p $1._read_context
    $2 = {_p = 0x60b00b068600}
    (gdb) p $dereference_lw_shared_ptr($1._read_context)
    $3 = {<seastar::enable_lw_shared_from_this<cache::read_context>>
    = {<seastar::lw_shared_ptr_counter_base> = {_count = 1}, ...
2019-10-04 17:45:47 +03:00
Botond Dénes
f5de002318 scylla-gdb.py: scylla_sstables: also print the sstable filename
And expose the method that obtains the file-name of an sstable object to
python code.
2019-10-04 17:45:32 +03:00
Botond Dénes
ad7a668be9 scylla-gdb.py: scylla_task_histogram: expose internal parameters
Make all the parameters of the sampling tweakable via command line
arguments. I strived to keep full backward compatibility, but due to the
limitations of `argparse` there is one "breaking" change. The optional
positional size argument is now a non-positional argument as `argparse`
doesn't support optional positional arguments.
Added documentation for both the command itself as well as for all the
arguments.
2019-10-04 17:44:40 +03:00
Botond Dénes
7767cc486e scylla-gdb.py: make scylla_find usable from python code 2019-10-04 17:44:40 +03:00
Botond Dénes
9cdea440ef scylla-gdb.py: add std_variant, a wrapper for std::variant
Allows conveniently obtaining the active member via calling `get()`.
2019-10-04 17:44:40 +03:00
Botond Dénes
55e9097dd9 scylla-gdb.py: add std_list, a wrapper for an std::list
std_list makes an `std::list` instance accessible from python code just
like a regular (read-only) python container.
2019-10-04 17:44:40 +03:00
Botond Dénes
b8f0b3ba93 std_optional: fix get()
Apparently there is now another layer of indirection: `std::_Storage`.
2019-10-04 17:43:40 +03:00
Tomasz Grabiec
020a537ade tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read 2019-10-04 11:38:13 +02:00
Tomasz Grabiec
ebedefac29 tests: row_cache_stress_test: Verify all entries are evictable at the end 2019-10-04 11:38:12 +02:00
Tomasz Grabiec
1b95f5bf60 tests: row_cache_stress_test: Exercise single-partition reads
make_single_key_reader() currently doesn't actually create
single-partition readers because it doesn't set
mutation_reader::forwarding::no when it creates individual
readers. The readers will default to mutation_reader::forwarding::yes
and actually create scanning readers in preparation for
fast-forwarding across partitions.

Fix by passing mutation_reader::forwarding::no.
2019-10-04 11:38:12 +02:00
Tomasz Grabiec
81dd17da4e tests: row_cache_stress_test: Add periodic schema alters
Reproduces #5127.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
2fc144e1a8 tests: memtable_snapshot_source: Allow changing the schema 2019-10-03 22:03:29 +02:00
Tomasz Grabiec
22dde90dba tests: simple_schema: Prepare for schema altering
Currently, methods of simple_schema assume that table's schema doesn't
change. Accessors like get_value() assume that rows were generated
using simple_schema::_s. Because of that, the column_definition& for
the "v" column is cached in the instance. That column_definition&
cannot be used to access objects created with a different schema
version. To allow using simple_schema after schema changes,
column_definition& caching is now tagged with the table schema version
of origin. Methods which access schema-dependent objects, like
get_value(), are now accepting schema& corresponding to the objects.

Also, it's now possible to tell simple_schema to use a different
schema version in its generator methods.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
e6afc89735 row_cache: Record upgraded schema in memtable entries during update
Cache update may defer in the middle of moving of partition entry
from a flushed memtable to the cache. If the schema was changed since
the entry was written, it upgrades the schema of the partition_entry
first but doesn't update the schema_ptr in memtable_entry. The entry
is removed from the memtable afterward. If a memtable reader
encounters such an entry, it will try to upgrade it assuming it's
still at the old schema.

That is undefined behavior in general, which may include:

 - read failures due to bad_alloc, if fixed-size cells are interpreted
   as variable-sized cells, and we misinterpret a value for a huge
   size

 - wrong read results

 - node crash

This doesn't result in a permanent corruption, restarting the node
should help.

The more rows there are in a partition, the more likely it is to
happen. It's unlikely to happen with single-row partitions.

Introduced in 70c7277.

Fixes #5128.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
ea461a3884 memtable: Extract memtable_entry::upgrade_schema() 2019-10-03 22:03:29 +02:00
Tomasz Grabiec
90d6c0b9a2 row_cache, mvcc: Prevent locked snapshots from being evicted
If the whole partition entry is evicted while being updated from the
memtable, a subsequent read may populate the partition using the old
version of data if it attempts to do it before cache update advances
past that partition. Partial eviction is not affected because
populating reads will notice that there is a newer snapshot
corresponding to the updater.

This can happen only in OOM situations where the whole cache gets evicted.

Affects only tables with multi-row partitions, which are the only ones
that can experience the update of partition entry being preempted.

Introduced in 70c7277.

Fixes #5134.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
57a93513bd row_cache: Make evict() not use invalidate_unwrapped()
invalidate_unwrapped() calls cache_entry::evict(), which cannot be
called concurrently with cache update. invalidate() serializes it
properly by calling do_update(), but evict() doesn't. The purpose of
evict() is to stress eviction in tests, which can happen concurrently
with cache update. Switch it to use memory reclaimer, so that it's
both correct and more realistic.

evict() is used only in tests.
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
c88a4e8f47 mvcc: Introduce partition_snapshot::touch() 2019-10-03 22:03:28 +02:00
Tomasz Grabiec
25e2f87a37 row_cache, mvcc: Do not upgrade schema of entries which are being updated
When a read enters a partition entry in the cache, it first upgrades
it to the current schema of the cache. The same happens when an entry
is updated after a memtable flush. Upgrading the entry is currently
performed by squashing all versions and replacing them with a single
upgraded version. That has a side effect of detaching all snapshots
from the partition entry. Partition entry update on memtable flush is
writing into a snapshot. If that snapshot is detached by a schema
upgrade, the entry will be missing writes from the memtable which fall
into continuous ranges in that entry which have not yet been updated.

This can happen only if the update of the entry is preempted and the
schema was altered during that, and a read hit that partition before
the update went past it.

Affects only tables with multi-row partitions, which are the only ones
that can experience the update of partition entry being preempted.

The problem is fixed by locking updated entries and not upgrading
schema of locked entries. cache_entry::read() is prepared for this,
and will upgrade on-the-fly to the cache's schema.

Fixes #5135
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
0675088818 row_cache: Use the correct schema version to populate the partition entry
The sstable reader which populates the partition entry in the cache is
using the schema of the partition entry snapshot, which will be the
schema of the cache at the time the partition was entered. If there
was a schema change after the cache reader entered the partition but
before it created the sstable reader, the cache populating reader will
interpret sstable fragments using the wrong schema version. That is
more likely if partitions have many rows, and the front of the
partition is populated. With single-row partitions that's unlikely to
happen.

That is undefined behavior in general, which may include:

 - read failures due to bad_alloc, if fixed-size cells are
   interpreted as variable-sized cells, and we misinterpret
   a value for a huge size

 - wrong read results

 - node crash

This doesn't result in a permanent corruption, restarting the node
should help.

Fixes #5127.
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
10992a8846 delegating_reader: Optimize fill_buffer()
Use move_buffer_content_to() which is faster than fill_buffer_from()
because it doesn't involve popping and pushing the fragments across
buffers. We save on size estimation costs.
2019-10-03 22:03:28 +02:00
Piotr Sarna
07ac3ea632 docs: update alternator entry on HTTPS
The HTTPS entry is updated - it's now supported, but still
misses the same features as HTTP - CRC headers, etc.
2019-10-03 19:10:30 +02:00
Piotr Sarna
b63077a8dc alternator-test: suppress the "Unverified HTTPS request" warning
Running with --https and a self-signed certificate results in a flood
of expected warnings, that the connection is not to be trusted.
These warnings are silenced, as users running a local test with --https
usually use self-signed certificates.
2019-10-03 19:10:30 +02:00
Piotr Sarna
e65fd490da alternator-test: add HTTPS info to README.md
A short paragraph about running tests with `--https` and configuring
the cluster to work correctly with this parameter is added to README.md.
2019-10-03 19:10:30 +02:00
Piotr Sarna
0d28d7f528 alternator-test: add HTTPS to test_describe_endpoints
The test_describe_endpoints test spawns another client connection
to the cluster, so it needs to be HTTPS-aware in order to work properly
with --https parameter.
2019-10-03 19:10:30 +02:00
Piotr Sarna
9fd77ed81d alternator-test: add --https parameter
Running with --https parameter will result in sending the requests
via HTTPS instead of HTTP. By default, port 8043 is used for a local
cluster. Before running pytest --https, make sure that Scylla
was properly configured to initialize an HTTPS alternator server
by providing the alternator_tls_port parameter.

The HTTPS-based connection runs with verification disabled,
otherwise it would not work with self-signed certificates,
which are useful for tests.
2019-10-03 19:10:30 +02:00
Piotr Sarna
e1b0537149 alternator: add HTTPS support
By providing a server based on a TLS socket, it's now possible
to serve HTTPS requests in alternator. The HTTPS server is enabled
by setting its port in scylla.yaml: alternator_tls_port=XXXX.
Alternator TLS relies on the existing TLS configuration,
which is provided by certificate, keyfile, truststore, priority_string
options.

Fixes #5042
2019-10-03 19:10:30 +02:00
Piotr Sarna
b42eb8b80a config: add alternator HTTPS port
The config variable will be used to set up a TLS-based server
for serving alternator HTTPS requests.
2019-10-03 19:10:29 +02:00
Nadav Har'El
9d4e71bbc6 alternator-test: fix misleading xfail message
The test test_update_expression_function_nesting() fails because DynamoDB
doesn't allow an expression like list_append(list_append(:val1, :val2), :val3)
but Alternator doesn't check for this (and supports this expression).

The "xfail" message was outdated, suggesting that the test fails because
the "SET" expression isn't supported - but it is. So replace the message
by a more accurate one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190915104708.30471-1-nyh@scylladb.com>
2019-10-03 18:45:03 +03:00
Nadav Har'El
9747019e7b alternator: implement additional Expected operators
Merged patch set from Dejan Mircevski implementing some of the
missing operators for Expected: NE, IN, NULL and NOT_NULL.

Patches:
  alternator: Factor out Expected operand checks
  alternator: Implement NOT_NULL operator in Expected
  alternator: Implement NULL operator in Expected
  alternator: Fix expected_1_null testcase
  alternator: Implement IN operator in Expected
  alternator: Implement NE operator in Expected
  alternator: Factor out common code in Expected
2019-10-03 18:12:38 +03:00
Konstantin Osipov
25ffd36d21 lwt: prepare the expression tree for IF condition evaluation
Frozen empty lists/maps/sets are not equal to a null value,
while multi-cell empty lists/maps/sets are equal to a null value.

Return a NULL value for an empty multi-cell set or list
if we know the receiver is not frozen - this makes it
easy to compare the parameter with the receiver.
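
A minimal Python model of that rule, with a hypothetical `bound_value()` helper (the real change is in the C++ expression tree): an empty collection bound to a non-frozen receiver is turned into a null, while a frozen receiver keeps the empty collection as a distinct value.

```python
from typing import Optional, Union

Collection = Union[list, set, dict]


def bound_value(value: Collection, receiver_is_frozen: bool) -> Optional[Collection]:
    # For a multi-cell (non-frozen) receiver an empty collection means "no cells",
    # which is the same thing as NULL.
    if not receiver_is_frozen and len(value) == 0:
        return None
    return value
```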

Add a test case for inserting an empty list or set
- the result is indistinguishable from NULL value.
Message-Id: <20191003092157.92294-2-kostja@scylladb.com>
2019-10-03 14:56:25 +02:00
Avi Kivity
3cb081eb84 Merge " hinted handoff: fix races during shutdown and draining" from Vlad
"
Fix races that may lead to use-after-free events and file system level exceptions
during shutdown and drain.

The root cause of use-after-free events in question is that space_watchdog blocks on
end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as
it's accessed even if the corresponding end_point_hints_manager instance
is destroyed in the context of manager::drain_for().

File system exceptions may occur when space_watchdog attempts to scan a
directory while it's being deleted from the drain_for() context.
In case of such an exception new hints generation is going to be blocked
- including for materialized views, till the next space_watchdog round (in 1s).

Issues that are fixed are #4685 and #4836.

Tested as follows:
 1) Patched the code in order to trigger the race with (a lot) higher
    probability and running slightly modified hinted handoff replace
    dtest with a debug binary for 100 times. A side effect of this
    testing was the discovery of #4836.
 2) Using the same patch as above tested that there are no crashes and
    nodes survive stop/start sequences (they were not without this series)
    in the context of all hinted handoff dtests. Ran the whole set of
    tests with dev binary for 10 times.
"

* 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla:
  hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
  hinted handoff: make taking file_update_mutex safe
  db::hints::manager::drain_for(): fix alignment
  db::hints::manager: serialize calls to drain_for()
  db::hints: cosmetics: indentation and missing method qualifier
2019-10-03 14:38:00 +03:00
Tomasz Grabiec
aad1307b14 row_cache, memtable: Use upgrade_schema() 2019-10-03 13:28:33 +02:00
Tomasz Grabiec
3177732b35 flat_mutation_reader: Introduce upgrade_schema() 2019-10-03 13:28:33 +02:00
Asias He
a9b95f5f01 repair: Fix tracker::start and tracker::done in case of error
The operation after gate.enter() in tracker::start() can fail and throw;
we should call gate.leave() in such a case to avoid unbalanced enter and
leave calls. tracker::done() has a similar issue.

Fix it by removing the gate enter and leave logic in tracker start and
done. A helper tracker::run() is introduced to take care of the gate and
repair status.
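
The shape of such a helper can be sketched in Python with try/finally. This is only a model of the pattern: `Gate` and `Tracker` here are toy classes with illustrative status strings, not the Seastar gate or the actual repair tracker.

```python
class Gate:
    """Toy gate that only counts how many operations are currently inside."""

    def __init__(self) -> None:
        self.entered = 0

    def enter(self) -> None:
        self.entered += 1

    def leave(self) -> None:
        self.entered -= 1


class Tracker:
    def __init__(self) -> None:
        self.gate = Gate()
        self.status = {}

    def run(self, repair_id: int, func) -> None:
        # enter() and leave() are paired in one place, so they stay balanced
        # even when the repair function throws.
        self.gate.enter()
        try:
            self.status[repair_id] = "RUNNING"
            func()
            self.status[repair_id] = "SUCCESSFUL"
        except Exception:
            self.status[repair_id] = "FAILED"
            raise
        finally:
            self.gate.leave()
```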

In addition, the error log is improved. It now logs exceptions on all
shards in the summary. e.g.,

[shard 0] repair - repair id 1 failed: std::runtime_error
({shard 0: std::runtime_error (error0), shard 1: std::runtime_error (error1)})

Fixes #5074
2019-10-03 13:33:02 +03:00
Botond Dénes
00b432b61d querier_cache: correctly account entries evicted on insertion in the population
Currently, the population stat is not increased for entries that are
evicted immediately on insert, however the code that does the eviction
still decreases the population stat, leading to an imbalance and in some
cases the underflow of the population stat. To fix, unconditionally
increase the population stat upon inserting an entry, regardless of
whether it is immediately evicted or not.
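
A toy Python model of the accounting, assuming a simple size-bounded cache with unique keys (`QuerierCacheSketch` is a made-up name and the eviction policy is simplified):

```python
from collections import OrderedDict


class QuerierCacheSketch:
    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.population = 0

    def _evict_oldest(self) -> None:
        self.entries.popitem(last=False)
        # Eviction always decrements the stat.
        self.population -= 1

    def insert(self, key, querier) -> None:
        # Keys are assumed to be unique in this sketch.
        self.entries[key] = querier
        # Increment unconditionally, even if the entry is evicted right below,
        # so that every decrement in _evict_oldest() has a matching increment.
        self.population += 1
        while len(self.entries) > self.max_entries:
            self._evict_oldest()
```

If the increment were skipped for entries that are evicted immediately, a cache with no room left would drive `population` below zero, which for an unsigned counter is the underflow described above.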

Fixes: #5123

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>
2019-10-03 11:49:44 +03:00
Dejan Mircevski
ac98385d04 alternator: Factor out Expected operand checks
Put all AttributeValuelist size verification under
verify_operand_count(), rather than have some cases invoke
verify_operand_count() while others verify it in check_*() functions.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 17:11:58 -04:00
Dejan Mircevski
de18b3240b alternator: Implement NOT_NULL operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:23:59 -04:00
Dejan Mircevski
75960639a4 alternator: Implement NULL operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:19:14 -04:00
Dejan Mircevski
e4fd5f3ef0 alternator: Fix expected_1_null testcase
Testcase "For NULL, AttributeValueList must be empty" accidentally
used NOT_NULL instead of NULL.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:19:14 -04:00
Dejan Mircevski
b7ac510581 alternator: Implement IN operator in Expected
Add check_IN() and a switch case that invokes it.  Reactivate IN
tests.  Add a testcase for non-scalar attribute values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:17:38 -04:00
Dejan Mircevski
56efa55a06 alternator: Implement NE operator in Expected
Recognize "NE" as a new operator type, add check_NE() function, invoke
it in verify_expected_one(), and reactivate NE tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 14:47:13 -04:00
Dejan Mircevski
af0462d127 alternator: Factor out common code in Expected
Operand-count verification will be repeated a lot as more operators
are implemented, so factor it out into verify_operand_count().

Also move `got` null checks to check_* functions, which reduces
duplication at call sites.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 14:36:57 -04:00
Konstantin Osipov
e8c13efb41 lwt: move mutation hashers to mutation.hh
Prepare mutation hashers for reuse in CAS implementation.
Message-Id: <20190930202409.40561-2-kostja@scylladb.com>
2019-10-01 19:49:31 +02:00
Konstantin Osipov
6cde985946 lwt: remove code that no longer serves as a reference
Remove ifdef'ed Java code, since LWT implementation
is based on the current state of the origin.
Message-Id: <20190930201022.40240-2-kostja@scylladb.com>
2019-10-01 19:46:15 +02:00
Konstantin Osipov
4d214b624b lwt: ensure enum_set::of is constexpr.
This allows using it to initialize const static members.
Message-Id: <20190930200530.40063-2-kostja@scylladb.com>
2019-10-01 19:45:56 +02:00
Tomasz Grabiec
3b9bf9d448 Merge "storage_proxy: replace variadic futures with structs" from Avi
Seastar variadic futures are deprecated, so replace with structs to
avoid nasty deprecation warnings.
2019-10-01 19:32:55 +02:00
Avi Kivity
162730862d storage_proxy: remove variadic future from query_partition_key_range_concurrent()
Seastar variadic futures are deprecated, so replace with a nice struct.
2019-09-30 21:33:44 +03:00
Avi Kivity
968b34a2b4 storage_proxy: remove variadic future from digest_read_resolver
Seastar variadic futures are deprecated, so replace with a nice
struct.
2019-09-30 21:32:17 +03:00
Avi Kivity
90096da9f3 managed_ref: add get() accessor
While a managed_ref emulates a reference more closely than it does
a pointer, it is still nullable, so add a get() (similar to
unique_ptr::get()) that can be nullptr if the reference is null.

The immediate use will be mutation_partition::_static_row, which
is often empty and takes up about 10% of a cache entry.
2019-09-30 20:55:36 +03:00
Nadav Har'El
c9aae13fae docs/alternator/getting-started.md: fix indentation in example code
The example Python code had wrong indentation, and wouldn't actually
work if naively copy-pasted. Noticed by Noam Hasson.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190929091440.28042-1-nyh@scylladb.com>
2019-09-30 13:03:29 +03:00
Avi Kivity
c6b66d197b Merge "Couple of preparatory patches for lwt" from Gleb
"
This is a collection of assorted patches that will be needed for LWT.
Most of them are trivial, but one touches a lot of files, so it has a
good chance of causing rebase headaches (I already had to rebase it on
top of Alternator). Let's push them earlier instead of carrying them in
the lwt branch.
"

* 'gleb/lwt-prepare-v2' of github.com:scylladb/seastar-dev:
  lwt: make _last_timestamp_micros static
  lwt: Add client_state::get_timestamp_for_paxos() function
  lwt: Pass client_state reference all the way to storage_proxy::query
  exceptions: Add a constructor for unavailable_exception that allows providing a custom message
  serializer: Add std::variant support
  lwt: Add missing functions to utils/UUID_gen.hh
2019-09-29 13:02:26 +03:00
Avi Kivity
9e990725d9 Merge "Simplify and explain from_varint_to_integer #5031" from Rafael
"
This is the second version of the patch series. The previous one was just the second patch; this one adds more tests and another patch to make it easier to test that the new code has the same behavior as the old one.
"

* 'espindola/overflow-is-intentional' of https://github.com/espindola/scylla:
  types: Simplify and explain from_varint_to_integer
  Add more cast tests
2019-09-29 11:27:55 +03:00
Tomasz Grabiec
b0e0f29b06 db: read: Filter-out sstables using its first and last keys
Affects single-partition reads only.

Refs #5113

When executing a query on the replica we do several things in order to
narrow down the sstable set we read from.

For tables which use LeveledCompactionStrategy, we store sstables in
an interval set and we select only sstables whose partition ranges
overlap with the queried range. Other compaction strategies don't
organize the sstables and will select all sstables at this stage. The
reasoning behind this is that for non-LCS compaction strategies the
sstables' ranges will typically overlap and using interval sets in
this case would not be effective and would result in quadratic (in
sstable count) memory consumption.

The assumption for overlap does not hold if the sstables come from
repair or streaming, which generates non-overlapping sstables.

At a later stage, for single-partition queries, we use the sstables'
bloom filter (kept in memory) to drop sstables which surely don't
contain given partition. Then we proceed to sstable indexes to narrow
down the data file range.

Tables which don't use LCS will do unnecessary I/O to read index pages
for single-partition reads if the partition is outside of the
sstable's range and the bloom filter is ineffective (Refs #5112).

This patch fixes the problem by consulting sstable's partition range
in addition to the bloom filter, so that the non-overlapping sstables
will be filtered out with certainty and not depend on bloom filter's
efficiency.

It's also faster to drop sstables based on the keys than the bloom
filter.
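
As an illustration of the filtering order described above, here is a minimal sketch in Python (not the project's C++ code). The SSTable class, its fields, and select_sstables() are hypothetical names, plain integers stand in for partition keys, and a "saturated" flag stands in for an ineffective bloom filter that answers "maybe" for every key.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    # Hypothetical stand-in for an sstable: only what the sketch needs.
    name: str
    first_key: int
    last_key: int
    keys: frozenset = frozenset()
    bloom_saturated: bool = False  # models an ineffective bloom filter

    def bloom_may_contain(self, key):
        # A saturated filter answers "maybe" for every key.
        return self.bloom_saturated or key in self.keys

    def range_may_contain(self, key):
        # Exact check against the sstable's first and last keys.
        return self.first_key <= key <= self.last_key


def select_sstables(sstables, key):
    # Check the key range first, then consult the bloom filter.
    return [s for s in sstables
            if s.range_may_contain(key) and s.bloom_may_contain(key)]


# Three non-overlapping sstables, as produced by repair or streaming.
sstables = [
    SSTable("a", 0, 99, bloom_saturated=True),
    SSTable("b", 100, 199, bloom_saturated=True),
    SSTable("c", 200, 299, bloom_saturated=True),
]
selected = [s.name for s in select_sstables(sstables, 150)]
print(selected)
```

With the bloom filter alone all three sstables would be read; the range check keeps only the one whose key range covers the queried key.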

Tests:
  - unit (dev)
  - manual using cqlsh

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>
2019-09-28 19:42:57 +03:00
Tomasz Grabiec
b93cc21a94 sstables: Fix partition key count estimation for a range
The method sstable::estimated_keys_for_range() was severely
under-estimating the number of partitions in an sstable for a given
token range.

The first reason is that it underestimated the number of sstable index
pages covered by the range, by one. In extreme, if the requested range
falls into a single index page, we will assume 0 pages, and report 1
partition. The reason is that we were using
get_sample_indexes_for_range(), which returns entries with the keys
falling into the range, not entries for pages which may contain the
keys.

A single page can have a lot of partitions though. By default, there
is a 1:20000 ratio between summary entry size and the data file size
covered by it. If partitions are small, that can be many hundreds of
partitions.

Another reason is that we underestimate the number of partitions in an
index page. We multiply the number of pages by:

   (downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval)
     / _components->summary.header.sampling_level

Using defaults, that means multiplying by 128. In the cassandra-stress
workload a single partition takes about 300 bytes in the data file and
summary entry is 22 bytes. That means a single page covers 22 * 20'000
= 440'000 bytes of the data file, which contains about 1'466
partitions. So we underestimate by an order of magnitude.
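
The arithmetic above can be checked with a short Python sketch. The sizes are the ones quoted in this message; the three sampling values are assumed defaults, chosen only because they reproduce the stated multiplier of 128.

```python
# Figures quoted in the commit message.
summary_entry_size = 22        # bytes per summary entry
data_to_summary_ratio = 20_000 # data file bytes covered per summary byte
partition_size = 300           # bytes per partition (cassandra-stress)

# Assumed defaults that yield the multiplier of 128 mentioned above.
base_sampling_level = 128
min_index_interval = 128
sampling_level = 128

old_keys_per_page = (base_sampling_level * min_index_interval) // sampling_level
bytes_per_page = summary_entry_size * data_to_summary_ratio
actual_keys_per_page = bytes_per_page // partition_size
underestimation_factor = actual_keys_per_page / old_keys_per_page

print(old_keys_per_page, bytes_per_page, actual_keys_per_page)
print(round(underestimation_factor, 1))
```

The ratio comes out at roughly 11, which matches the "order of magnitude" claim.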

Underestimating the number of partitions will result in too small
bloom filters being generated for the sstables which are the output of
repair or streaming. This will make the bloom filters ineffective
which results in reads selecting more sstables than necessary.

The fix is to base the estimation on the number of index pages which
may contain keys for the range, and multiply that by the average key
count per index page.
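
A rough Python sketch of the difference between the two ways of counting pages follows. It is not the sstable code: page_first_keys, last_key and the function names are hypothetical, integers stand in for tokens, and the average is simply the total key count divided by the page count.

```python
from bisect import bisect_left, bisect_right


def pages_with_sampled_key_in_range(page_first_keys, lo, hi):
    # Old behaviour: count summary entries whose key lies inside [lo, hi].
    return bisect_right(page_first_keys, hi) - bisect_left(page_first_keys, lo)


def pages_that_may_contain_range(page_first_keys, last_key, lo, hi):
    # New behaviour: count index pages that may hold keys from [lo, hi].
    if hi < page_first_keys[0] or lo > last_key:
        return 0  # the range does not overlap the sstable at all
    first = max(bisect_right(page_first_keys, lo) - 1, 0)
    last = bisect_right(page_first_keys, hi) - 1
    return last - first + 1


def estimate_keys(page_first_keys, last_key, total_keys, lo, hi):
    avg_keys_per_page = total_keys // len(page_first_keys)
    pages = pages_that_may_contain_range(page_first_keys, last_key, lo, hi)
    return pages * avg_keys_per_page


# Hypothetical summary: four index pages over keys 0..9999.
page_first_keys = [0, 2500, 5000, 7500]
last_key = 9999
total_keys = 10_000

# A range that falls entirely inside the third page.
old_pages = pages_with_sampled_key_in_range(page_first_keys, 5100, 5200)
new_estimate = estimate_keys(page_first_keys, last_key, total_keys, 5100, 5200)
print(old_pages, new_estimate)
```

For a range inside a single page the old count is zero pages, while the new estimate is one page's worth of keys; a range that does not overlap the sstable yields zero.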

Fixes #5112.
Refs #4994.

The output of test_key_count_estimation:

Before:

count = 10000
est = 10112
est([-inf; +inf]) = 512
est([0; 0]) = 128
est([0; 63]) = 128
est([0; 255]) = 128
est([0; 511]) = 128
est([0; 1023]) = 128
est([0; 4095]) = 256
est([0; 9999]) = 512
est([5000; 5000]) = 1
est([5000; 5063]) = 1
est([5000; 5255]) = 1
est([5000; 5511]) = 1
est([5000; 6023]) = 128
est([5000; 9095]) = 256
est([5000; 9999]) = 256
est(non-overlapping to the left) = 1
est(non-overlapping to the right) = 1

After:

count = 10000
est = 10112
est([-inf; +inf]) = 10112
est([0; 0]) = 2528
est([0; 63]) = 2528
est([0; 255]) = 2528
est([0; 511]) = 2528
est([0; 1023]) = 2528
est([0; 4095]) = 5056
est([0; 9999]) = 10112
est([5000; 5000]) = 2528
est([5000; 5063]) = 2528
est([5000; 5255]) = 2528
est([5000; 5511]) = 2528
est([5000; 6023]) = 5056
est([5000; 9095]) = 7584
est([5000; 9999]) = 7584
est(non-overlapping to the left) = 0
est(non-overlapping to the right) = 0

Tests:
  - unit (dev)

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>
2019-09-28 19:36:43 +03:00
Piotr Sarna
10f90d0e25 types: remove deprecated comment
The comment does not apply anymore, as this definition is no longer
in database.hh.
Message-Id: <a0b6ff851e1e3bcb5fcd402fbf363be7af0219af.1569580556.git.sarna@scylladb.com>
2019-09-27 19:32:17 +02:00
Dejan Mircevski
9a89e0c5ec dbuild: Update README on interactive mode
`dbuild` was recently (24c732057) updated to run in interactive mode
when given no arguments; we can now update the README to mention that.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-27 16:33:27 +02:00
Dejan Mircevski
f8638d8ae1 alternator: Add build byproducts to .gitignore
Add .pytest_cache and expressions.tokens to the top-level .gitignore.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-27 16:18:45 +02:00
Dejan Mircevski
332ffa77ea alternator: Actually use BEGINS_WITH in its tests
For some reason, BEGINS_WITH tests used EQ as comparison operator.

Tests: pytest test_expected.py

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-26 22:41:34 +03:00
Tomasz Grabiec
5b0e48f25b Merge "toppartitions: don't transport schema_ptr across shards" from Avi
When the toppartitions operation gathers results, it copies partition
keys with their schema_ptr:s. When these schema_ptr:s are copied
or destroyed, they can cause leaks or premature frees of the schema
in its original shard since reference count operations are not atomic.

Fix that by converting the schema_ptr to a global_schema_ptr during
transportation.

Fixes #5104 (direct bug)
Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node)
Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node)

Tests: new test added that fails with the existing code in debug mode,
manual toppartitions test
2019-09-26 17:09:54 +02:00
Avi Kivity
36b4d55b28 tests: add test for toppartitions cross-shard schema_ptr copy 2019-09-26 17:40:46 +03:00
Avi Kivity
670f398a8a toppartitions: do not copy schema_ptr:s in item keys across shards
Copying schema_ptrs across shards results in memory corruption since
lw_shared_ptr does not use atomic operations for reference counts.
Prevent that by converting schema_ptr:s to global_schema_ptr:s before
shipping them across shards in the map operation, and converting them
back to local schema_ptr:s in the reduce operation.
2019-09-26 17:26:40 +03:00
Avi Kivity
f015bd69b7 toppartitions: compare schemas using schema::id(), not pointer to schema
This allows keys from different stages in the schema's life to compare equal.
This is safe since the partition key cannot change, unlike the rest of the schema.

More importantly, it will allow us to compare keys made local after a pass through
global_schema_ptr, which does not guarantee that the schema_ptr conversion will be
the same even when starting with the same global_schema_ptr.
2019-09-26 17:15:46 +03:00
Avi Kivity
ea4976a128 schema_registry: mark global_schema_ptr move constructor noexcept
Throwing move constructors are a pain, so we should try to make
them noexcept. Currently, global_schema_ptr's move constructor
throws an exception if used illegally (moving from a different shard);
this patch changes it to an assert, on the grounds that this error
is impossible to recover from.

The direct motivation for the patch is the desire to store objects
containing a global_schema_ptr in a chunked_vector, to move lists
of partition keys across shards for the toppartitions functionality.
chunked_vector currently requires noexcept move constructors for its
value_type.
2019-09-26 16:56:59 +03:00
Avi Kivity
ba64ec78cf messaging_service: use rpc::tuple instead of variadic futures for rpc
Since variadic future<> is deprecated, switch to rpc::tuple for multiple
return values in rpc calls. This is more or less mechanical translation.
2019-09-26 12:09:31 +02:00
Tomasz Grabiec
9183e28f2c Merge "Recreate dependent user types" from Rafael
When a user type changes we were not recreating other user types that
use it. This patch series fixes that and makes it clear which code is
responsible for it.

In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.

At runtime a user type has a direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.

Fixes #5049
2019-09-26 12:06:32 +02:00
Gleb Natapov
e0b303b432 lwt: make _last_timestamp_micros static
If each client_state has its own copy of the variable two clients may
generate timestamps that clash and needlessly create contention. Making
the variable shared between all client_state on the same shard will make
sure this will not happen to two clients on the same shard. It may still
happen for two clients on two different shards or two different nodes.
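
A minimal Python sketch of the idea, modelling a single shard: the last issued timestamp is a class attribute shared by every ClientState instance instead of a per-instance field. The class, the method name and the max(now, last + 1) rule are assumptions made for illustration, not the actual client_state code.

```python
import time


class ClientState:
    # Shared by all instances, like a static member in C++.
    # One shard is modelled as one thread, so no locking is shown.
    _last_timestamp_micros = 0

    def __init__(self, clock=None):
        # The clock is injectable so the sketch can be run deterministically.
        self._clock = clock or (lambda: time.time_ns() // 1000)

    def get_timestamp(self):
        now = self._clock()
        # Never go backwards and never hand out the same value twice.
        ClientState._last_timestamp_micros = max(
            now, ClientState._last_timestamp_micros + 1)
        return ClientState._last_timestamp_micros


# Two clients whose clocks return the same microsecond.
client_a = ClientState(clock=lambda: 1_000)
client_b = ClientState(clock=lambda: 1_000)
t_a = client_a.get_timestamp()
t_b = client_b.get_timestamp()
print(t_a, t_b)
```

With a per-instance counter both clients would have returned the same value; with the shared one the second call is bumped by one microsecond.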
2019-09-26 11:44:00 +03:00
Gleb Natapov
622d21f740 lwt: Add client_state::get_timestamp_for_paxos() function
Paxos needs a unique timestamp that is greater than some other
timestamp, so that the next round has a better chance of succeeding.
Add a function that returns such a timestamp.
2019-09-26 11:44:00 +03:00
Gleb Natapov
e72a105b5e lwt: Pass client_state reference all the way to storage_proxy::query
client_state holds a state to generate monotonically increasing unique
timestamp. Queries with a SERIAL consistency level need it to generate
a paxos round.
2019-09-26 11:44:00 +03:00
Gleb Natapov
556f65e8a1 exceptions: Add a constructor for unavailable_exception that allows providing a custom message 2019-09-26 11:44:00 +03:00
Gleb Natapov
209414b4eb serializer: Add std::variant support 2019-09-26 11:44:00 +03:00
Gleb Natapov
f9209e27d4 lwt: Add missing functions to utils/UUID_gen.hh
Some lwt related code is missing in our UUID implementation. Add it.
2019-09-26 11:44:00 +03:00
Rafael Ávila de Espíndola
5af8b1e4a3 types: recreate dependent user types.
In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.

At runtime a user type has a direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.

Fixes #5049

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
4c3209c549 types: Don't include dependent user types in update.
The way schema changes propagate is by editing the system tables and
comparing the before and after state.

When a user type A uses another user type B and we modify B, the
representation of A in the system table doesn't change, so this code
was not producing any changes on the diff that the receiving side
uses.

Deleting it makes it clear that it is the receiver's responsibility to
handle dependent user types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
34eddafdb0 types: Don't modify the type list in db::cql_type_parser::raw_builder
With this patch db::cql_type_parser::raw_builder creates a local copy
of the list of existing types and uses that internally. By doing that
build() should have no observable behavior other than returning the
new types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
d6b2e3b23b types: pass a reference to prepare_internal
We were never passing a null pointer and never saving a copy of the
lw_shared_ptr. Passing a reference is more flexible as not all callers
are required to hold the user_types_metadata in a lw_shared_ptr.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:40:30 -07:00
Avi Kivity
03260dd910 Update seastar submodule
* seastar b56a8c5045...c21a7557f9 (3):
  > net: socket::{set,get}_reuseaddr() should not be virtual
  > iotune: print verbose message in case of shutdown errors
  > iotune: close test file on shutdown

Fixes #4946.
2019-09-25 16:08:32 +03:00
Tomasz Grabiec
06b9818e98 Merge "storage_proxy: tolerate view_update_write_response_handler id not found on shutdown" from Benny
1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand.
2. Lookup the view_update_write_response_handler id before calling  timeout_cb and tolerate it not found.
   Just log a warning if this happened.

Fixes #5032
2019-09-25 14:49:42 +02:00
Avi Kivity
83bc59a89f Merge "mvcc: Fix incorrect schema version being used to copy the mutation when applying (#5099)" from Tomasz
"
Currently affects only counter tables.

Introduced in 27014a2.

mutation_partition(s, mp) is incorrect because it uses s to interpret
mp, while it should use mp_schema.

We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during table schema altering when we receive the
mutation from a node which hasn't processed the schema change yet.

This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.

Fixes #5095.
"

* 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla:
  mvcc: Fix incorrect schema version being used to copy the mutation when applying
  mutation_partition: Track and validate schema version in debug builds
  tests: Use the correct schema to access mutation_partition
2019-09-25 15:30:22 +03:00
Tomasz Grabiec
11440ff792 mvcc: Fix incorrect schema version being used to copy the mutation when applying
Currently affects only counter tables.

Introduced in 27014a2.

mutation_partition(s, mp) is incorrect, because it uses s to interpret
mp, while it should use mp_schema.

We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during alter when we receive the
mutation from a node which hasn't processed the schema change yet.

This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.

Fixes #5095.
2019-09-25 11:28:07 +02:00
Tomasz Grabiec
bce0dac751 mutation_partition: Track and validate schema version in debug builds
This patch makes mutation_partition validate the invariant that it's
supposed to be accessed only with the schema version which it conforms
to.

Refs #5095
2019-09-25 10:27:06 +02:00
Avi Kivity
721fa44c4f Update seastar submodule
* seastar e51a1a8ed9...b56a8c5045 (3):
  > net: add support for UNIX-domain sockets
  > future: Warn on promise::set_exception with no corresponding future or task
  > Merge "Handle exceptions in repeat_until_value and misc cleanups" from Rafael
2019-09-25 11:21:57 +03:00
Benny Halevy
e9388b3f03 storage_proxy::drain_on_shutdown fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
b7c7af8a75 storage_proxy: validate id from view_update_handlers_list
Handle a race where a write handler is removed from _response_handlers
but not yet from _view_update_handlers_list.

Fixes #5032

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
1fea5f5904 storage_proxy: refactor remove_response_handler
Refactor remove_response_handler_entry out of remove_response_handler,
to be called on a valid iterator found by _response_handlers.find(id).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
592c4bcfc2 storage_proxy: remove_response_handler: assert id was found
Help identify cases like seen in #5032 where the handler id
wasn't found from the on_down -> timeout_cb path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Raphael S. Carvalho
571fa94eb5 sstables/compaction_manager: Don't perform upgrade on shared SSTables
compaction_manager::perform_sstable_upgrade() fails when it feeds
compaction mechanism with shared sstables. Shared sstables should
be ignored when performing upgrade and so wait for reshard to pick
them up in parallel. Whenever a shared sstable is brought up either
on restart or via refresh, reshard procedure kicks in.
Reshard picks the highest supported format so the upgrade for
shared sstable will naturally take place.

Fixes #5056.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>
2019-09-25 11:18:40 +03:00
Asias He
19e8c14ad1 gossiper: Improve the gossip timer callback lock handling (#5097)
- Update the outdated comments in do_stop_gossiping. It was
  storage_service not storage_proxy that used the lock. More
  importantly, storage_service does not use it any more.

- Drop the unused timer_callback_lock and timer_callback_unlock API

- Use with_semaphore to make sure the semaphore usage is balanced.

- Add log in gossiper::do_stop_gossiping when it tries to take the
  semaphore to help debug hang during the shutdown.

Refs: #4891
Refs: #4971
2019-09-25 10:46:38 +03:00
Tomasz Grabiec
4d9b176aaa tests: Use the correct schema to access mutation_partition 2019-09-24 19:46:57 +02:00
Botond Dénes
425cc0c104 doc: add debugging.md
A documentation file that is intended to be a place for anything
debugging related: getting started tutorial, tips and tricks and
advanced guides.
For now it contains a short introduction, some selected links to
more in-depth documentation and some tips and tricks that I could think
of off the top of my head.
One of those tricks describes how to load cores obtained from
relocatable packages inside the `dbuild` container. I originally
intended to add that to `tools/toolchain/README.md` but was convinced
that `docs/debugging.md` would be a better place for this.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924133110.15069-1-bdenes@scylladb.com>
2019-09-24 20:18:45 +03:00
Botond Dénes
d57ab83bc8 querier_cache: add inserted stat
Recently we have seen a case where the population stat of the cache was
corrupt, either due to misaccounting or some more serious corruption.
When debugging something like that it would have been useful to know how
many items have been inserted to the cache. I also believe that such a
counter could be useful generally as well.

Refs: #4918

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924083429.43038-1-bdenes@scylladb.com>
2019-09-24 10:52:49 +02:00
Avi Kivity
8e8a048ada Merge "lsa: Assert no cross-shard region locking #5090" from Tomasz
"
We observed an abort on bad_alloc which was not caused by real OOM,
but could be explained by cache region being locked from a different
shard, which is not allowed, concurrently with memory reclamation.

It's impossible now to prove this, or, if that was indeed the case, to
determine which code path was attempting such lock. This patch adds an
assert which would catch such incorrect locking at the attempt.

Refs #4978

Tests:
 - unit (dev, release, debug)
"

* 'assert-no-xshard-lsa-locking' of https://github.com/tgrabiec/scylla:
  lsa: Assert no cross-shard region locking
  tests: Make managed_vector_test a seastar test
2019-09-23 19:52:47 +03:00
Avi Kivity
79d17f3c80 Update seastar submodule
* seastar 2a526bb120...e51a1a8ed9 (2):
  > rpc: introduce rpc::tuple as a way to move away from variadic future
  > shared_future: don't warn on broken futures
2019-09-23 19:50:40 +03:00
Avi Kivity
1b8009d10c sstables: compaction_manager: #include seastarx.hh
Make it easier for the IDE to resolve references to the seastar
namespace. In any case include files should be stand-alone and not
depend on previously included files.
2019-09-23 16:12:49 +02:00
Avi Kivity
07af9774b3 relocatable: erase build directory from executable and debug info
The build directory is meaningless, since it is typically some
directory in a continuous integration server. That means someone
debugging the relocatable package needs to issue the gdb command
'set substitute-path' with the correct arguments, or they lose
source debugging. Doing so in the relocatable package build saves
this step.

The default build is not modified, since a typical local build
benefits from having the paths hardcoded, as the debugger will
find the sources automatically.
2019-09-23 13:08:15 +02:00
Tomasz Grabiec
eb08ab7ed9 lsa: Assert no cross-shard region locking
We observed an abort on bad_alloc which was not caused by real OOM,
but could be explained by cache region being locked from a different
shard, which is not allowed, concurrently with memory reclamation.

It's impossible now to prove this, or, if that was indeed the case, to
determine which code path was attempting such lock. This patch adds an
assert which would catch such incorrect locking at the attempt.

Refs #4978
2019-09-23 12:51:29 +02:00
Tomasz Grabiec
8bedcd6696 tests: Make managed_vector_test a seastar test
LSA will depend on seastar reactor being present.
2019-09-23 12:51:24 +02:00
Raphael S. Carvalho
b4cf429aab sstables/LCS: Fix increased write amplification due to incorrect SSTable demotion
LCS demotes a SSTable from a given level when it thinks that level is inactive.
Inactive level means N rounds (compaction attempt) without any activity in it,
in other words, no SSTable has been promoted to it.
The problem happens because the metadata that tracks inactiveness of each level
can be incorrectly updated when there's an ongoing compaction. LCS has parallel
compaction disabled. So if a table finds itself running a long operation like
cleanup that blocks minor compaction, LCS could incorrectly think that many
levels need demotion, and by the time cleanup finishes, some demotions would
incorrectly take place.
This problem is fixed by only updating the counter that tracks inactiveness
when compaction completes, so it's not incorrectly updated when there's an
ongoing compaction for the table.

Fixes #4919.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190917235708.8131-1-raphaelsc@scylladb.com>
2019-09-22 10:46:38 +03:00
Eliran Sinvani
280715ad45 Storage proxy: protect against infinite recursion in query_partition_key_range_concurrent
A recent fix to #3767 limited the amount of ranges that
can return from query_ranges_to_vnodes_generator. This with
the combination of a large amount of token ranges can lead to
an infinite recursion. The algorithm multiplies by a factor of
2 (actually a shift left by one) the amount of requested
tokens in each recursion iteration. As long as the requested
number of ranges is greater than 0, the recursion is implicit,
and each call is scheduled separately since the call is inside
a continuation of a map reduce.
But if the amount of iterations is large enough (~32) the
counter for requested ranges zeros out and from that moment on
two things will happen:
1. The counter will remain 0 forever (0*2 == 0)
2. The map reduce future will be immediately available and this
will result in the continuation being invoked immediately.
The latter causes the recursive call to be a "regular" recursive call
thus, through the stack and not the task queue of the scheduler, and
the former causes this recursion to be infinite.
The combination creates a stack that keeps growing and eventually
overflows resulting in undefined behavior (due to memory overrun).

This patch prevents the problem from happening: it limits the growth of
the concurrency counter to twice the last amount of tokens returned
by the query_ranges_to_vnodes_generator. It also makes sure the counter
does not get stuck at zero.
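
The counter behaviour described above can be simulated in Python. This is only a sketch: the 32-bit width is an assumption suggested by the "~32" iterations mentioned above, and grow_capped() is one plausible reading of the fix, not the actual patch.

```python
MASK32 = 0xFFFFFFFF  # assumption: the counter is 32 bits wide


def grow_unbounded(concurrency):
    # Old behaviour: shift left by one, silently dropping the overflowed bit.
    return (concurrency << 1) & MASK32


def grow_capped(concurrency, last_returned_ranges):
    # Sketch of the fix: cap at twice the last returned amount, never zero.
    doubled = (concurrency << 1) & MASK32
    return max(1, min(doubled, 2 * last_returned_ranges))


def iterations_until_zero(start, grow, limit=100):
    # Returns how many growth steps it takes to reach zero, or None.
    value = start
    for i in range(1, limit + 1):
        value = grow(value)
        if value == 0:
            return i
    return None


unbounded_steps = iterations_until_zero(1, grow_unbounded)
capped_steps = iterations_until_zero(1, lambda c: grow_capped(c, 10))
print(unbounded_steps, capped_steps)
```

Starting from 1, the unbounded counter reaches zero after 32 doublings and stays there, while the capped variant never reaches zero within the simulated steps.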

Testing: * Unit test in dev mode.
         * Modified add 50 dtest that reproduce the problem

Fixes #4944

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190922072838.14957-1-eliransin@scylladb.com>
2019-09-22 10:33:31 +03:00
Gleb Natapov
73e3d0a283 messaging_service: enable reuseaddr on messaging service rpc
Fixes #4943

Message-Id: <20190918152405.GV21540@scylladb.com>
2019-09-19 11:43:03 +03:00
Rafael Ávila de Espíndola
4d0916a094 commitlog: Handle gate_closed_exception
Before this patch, if the _gate is closed, with_gate throws and
forward_to is not executed. When the promise<> p is destroyed it marks
its _task as a broken promise.

What happens next depends on the branch.

On master, we warn when the shared_future is destroyed, so this patch
changes the warning from a broken_promise to a gate closed.

On 3.1, we warn when the promises in shared_future::_peers are
destroyed since they no longer have a future attached: The future that
was attached was the "auto f" just before the with_gate call, and it
is destroyed when with_gate throws. The net result is that this patch
fixes the warning in 3.1.

I will send a patch to seastar to make the warning on master more
consistent with the warning in 3.1.

Fixes #4394

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190917211915.117252-1-espindola@scylladb.com>
2019-09-17 23:41:21 +02:00
Avi Kivity
60656d1959 Update seastar submodule
* seastar 84d8e9fe9b...2a526bb120 (1):
  > iotune: fix exception handling in case test file creation fails

Fixes #5001.
2019-09-16 19:39:14 +03:00
Glauber Costa
c9f2d1d105 do not crash in user-defined operations if the controller is disabled
Scylla currently crashes if we run manual operations like nodetool
compact with the controller disabled. While we neither like nor
recommend running with the controller disabled, due to some corner cases
in the controller algorithm we are not yet at the point in which we can
deprecate this and are sometimes forced to disable it.

The reason for the crash is that manual operations will invoke
_backlog_of_shares, which returns what is the backlog needed to
create a certain number of shares. That scans the existing control
points, but when we run without the controller there are no control
points and we crash.

Backlog doesn't matter if the controller is disabled, and the return
value of this function will be immaterial in this case. So to avoid the
crash, we return something right away if the controller is disabled.

Fixes #5016

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-09-16 18:26:57 +02:00
Avi Kivity
d77171e10e build: adjust libthread_db file name to match gdb expectations
gdb searches for libthread_db.so using its canonical name of libthread_db.so.1 rather
than the file name of libthread_db-1.0.so, so use that name to store the file in the
archive.

Fixes #4996.
2019-09-16 14:48:42 +02:00
Avi Kivity
7502985112 Update seastar submodule
* seastar b3fb4aaab3...84d8e9fe9b (8):
  > Use aio fsync if available
  > Merge "fix some tcp connection bugs and add reuseaddr option to a client socket" from Gleb
  > lz4: use LZ4_decompress_safe
  > reactor: document seastar::remove_file()
  > core/file.hh: remove redundant std::move()
  > core/{file,sstring}: do not add `const` to return value
  > http/api_docs: always call parent constructor
  > Add input_stream blurb
2019-09-16 11:52:55 +03:00
Piotr Sarna
feec3825aa view: degrade shutdown bookkeeping update failures log to warn
Currently, if updating bookkeeping operations for view building fails,
we log the error message and continue. However, during shutdown,
some errors are more likely to happen due to existing issues
like #4384. To differentiate actual errors from semi-expected
errors during shutdown, the latter are now logged with a warning
level instead of error.

Fixes #4954
2019-09-16 10:13:06 +03:00
Piotr Sarna
f912122072 main: log unexpected errors thrown on shutdown (#4993)
Shutdown routines are usually implemented via the deferred_action
mechanism, which runs a function in its destructor. We thus expect
the function to be noexcept, but unfortunately it's not always
the case. Throwing in the destructor results in terminating the program
anyway, but before we do that, the exception can be logged so it's
easier to investigate and pinpoint the issue.

Example output before the patch:
INFO  2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder
terminate called without an active exception
Aborting on shard 0.
Backtrace:
  0x000000000184a9ad
(...)

Example output after the patch:
INFO  2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder
ERROR 2019-09-10 12:49:05,858 [shard 0] init - Unexpected error on shutdown: std::runtime_error (Hello there!)
terminate called without an active exception
Aborting on shard 0.
Backtrace:
  0x000000000184a9ad
(...)
2019-09-16 09:42:55 +03:00
Rafael Ávila de Espíndola
1d9ba4c79b types: Simplify and explain from_varint_to_integer
This simplifies the implementation of from_varint_to_integer and
avoids using the fact that a static_cast from cpp_int to uint64_t
seems to just keep the low 64 bits.

The boost release notes
(https://www.boost.org/users/history/version_1_67_0.html) implies that
the conversion function should return the maximum value a uint64_t can
hold if the original value is too large.

The idea of using a & with ~0 is a suggestion from the boost release
notes.
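
To illustrate the masking idea with Python's arbitrary-precision integers (not the C++ code in this patch): AND-ing with an all-ones mask keeps only the low bits, after which the result can be reinterpreted as a signed value. The helper name truncate_to_signed() and its bits parameter are hypothetical.

```python
def truncate_to_signed(value, bits=64):
    # Keep only the low `bits` bits, the equivalent of `value & ~0` on a
    # fixed-width unsigned type.
    low = value & ((1 << bits) - 1)
    # Reinterpret the bit pattern as a two's-complement signed integer.
    if low >= 1 << (bits - 1):
        low -= 1 << bits
    return low


print(truncate_to_signed((1 << 64) + 5))   # value too large for 64 bits
print(truncate_to_signed(-1))              # negative values survive
print(truncate_to_signed((1 << 32) + 7, bits=32))
```

The overflow is intentional here: values that do not fit wrap around instead of saturating at the maximum.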

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-15 14:44:54 -07:00
Rafael Ávila de Espíndola
6611e9faf7 Add more cast tests
These cover converting a varint to a value smaller than 64 bits.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-15 14:44:54 -07:00
Benny Halevy
c22ad90c04 scyllatop: livedata, metric: expire absent metrics
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 19:48:09 +03:00
Benny Halevy
6e807a56e1 scyllatop: livedata: update all metrics based on new discovered list
Update current results dictionary using the Metric.discover method.

New results are added and missing results are marked as absent.
(Both full metrics or specific keys)

Previously, with prometheus, each metric.update called query_list,
resulting in O(n^2) behavior when all metrics were updated, as in the
scylla_top dtest - causing a test timeout when testing the debug build.
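
A rough sketch of the single-pass update described above, assuming the
discovery step returns a dict keyed by metric symbol. The function name
update_all and the use of None as the "absent" marker are hypothetical,
not taken from scyllatop.

```python
def update_all(current: dict, discovered: dict) -> dict:
    """Merge one freshly discovered snapshot into the current results.

    Both arguments map a metric symbol to its value. The snapshot is
    fetched once, so the merge is linear in the number of metrics.
    """
    updated = {}
    for symbol in current:
        # Present metrics get the new value, missing ones become None (absent).
        updated[symbol] = discovered.get(symbol)
    for symbol, value in discovered.items():
        # Metrics seen for the first time are added to the results.
        updated.setdefault(symbol, value)
    return updated
```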

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 19:45:34 +03:00
Benny Halevy
16de4600a0 scyllatop: metric: return discover results as dict
So that we can easily search by symbol for updating
multiple results in a single pass.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
02707621d4 scyllatop: metric: update_info in discover
So that all metric information can be retrieved in a single pass.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
3861460d3b scyllatop: metric: refactor update method
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
99ab60fc27 scyllatop: metric: add_to_results
In preparation for changing results to a dict,
use a method to add a new metric to the results.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
b489556807 scyllatop: metric: refactor discover and discover_with_help
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
8f7c721907 scyllatop: livedata: get rid of _setupUserSpecifiedMetrics
Add self._metricPatterns member and merge _setupUserSpecifiedMetrics
with _initializeMetrics.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
c17aee0dd3 scyllatop: add debug logging
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Tomasz Grabiec
79935df959 commitlog: replay: Respect back-pressure from memtable space to prevent OOM
Commit log replay was bypassing memtable space back-pressure, and if
replay was faster than memtable flush, it could lead to OOM.

The fix is to call database::apply_in_memory() instead of
table::apply(). The former blocks when memtable space is full.

Fixes #4982.

Tests:
  - unit (release)
  - manual, replay with memtable flush failing and without failing

Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>
2019-09-15 11:51:56 +03:00
Tomasz Grabiec
3c49b2960b gdb: Introduce 'scylla memtables'
Example output:

(gdb) scylla memtables
table "ks_truncate"."standard1":
  (memtable*) 0x60c0005a5500: total=131072, used=131072, free=0, flushed=0
table "keyspace1"."standard1":
  (memtable*) 0x60c0005a6000: total=5144444928, used=4512728524, free=631716404, flushed=0
  (memtable*) 0x60c0005a8a80: total=426901504, used=374294312, free=52607192, flushed=0
  (memtable*) 0x60c000eb6a80: total=0, used=0, free=0, flushed=0
table "system_traces"."sessions_time_idx":
  (memtable*) 0x60c0005a4d80: total=131072, used=131072, free=0, flushed=0


Message-Id: <1568133476-22463-1-git-send-email-tgrabiec@scylladb.com>
2019-09-15 10:39:55 +03:00
Kamil Braun
9bf4fe669f Auto-expand replication_factor for NetworkTopologyStrategy (#4667)
If the user supplies the 'replication_factor' to the 'NetworkTopologyStrategy' class,
it will expand into a replication factor for each existing DC for their convenience.
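
The expansion can be pictured with the following Python sketch. It is an
illustration only: the function name is hypothetical, and letting an
explicitly given per-DC value take precedence over the expanded one is an
assumption, not something stated in this commit.

```python
def expand_replication_factor(options: dict, datacenters: list) -> dict:
    """Replace a generic 'replication_factor' with one entry per datacenter."""
    expanded = {k: v for k, v in options.items() if k != "replication_factor"}
    rf = options.get("replication_factor")
    if rf is not None:
        for dc in datacenters:
            # Assumption: a per-DC value given explicitly is left untouched.
            expanded.setdefault(dc, rf)
    return expanded
```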

Resolves #4210.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-09-15 10:38:09 +03:00
Tomasz Grabiec
8517eecc28 Revert "Simplify db::cql_type_parser::parse"
This reverts commit 7f64a6ec4b.

Fixes #5011

The reverted commit exposes #3760 for all schemas, not only those
which have UDTs.

The problem is that table schema deserialization now requires keyspace
to be present. If the replica hasn't received schema changes which
introduce the keyspace yet, the write will fail.
2019-09-12 12:45:21 +02:00
Nadav Har'El
67a07e9cbc README.md: mention Alternator
Mention on the top-level README.md that Scylla by default is compatible
with Cassandra, but also has experimental support for DynamoDB's API.
Provide links to alternator/alternator.md and alternator/getting-started.md
with more information about this feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190911080913.10141-1-nyh@scylladb.com>
2019-09-11 18:01:58 +03:00
Avi Kivity
c08921b55a Merge "Alternator - Add support for DynamoDB Compatible API in Scylla" from Nadav & Piotr
"
In this patch set, written by Piotr Sarna and myself, we add Alternator - a new
Scylla feature adding compatibility with the API of Amazon DynamoDB(TM).
DynamoDB's API uses JSON-encoded requests and responses which are sent over
an HTTP or HTTPS transport. It is described in detail on Amazon's site:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/

Our goal is that any application written to use Amazon DynamoDB could
be run, unmodified, against Scylla with Alternator enabled. However, at this
stage the Alternator implementation is incomplete, and some of DynamoDB's
API features are not yet supported. The extent of Alternator's compatibility
with DynamoDB is described in the document docs/alternator/alternator.md
included in this patch set. The same document also describes Alternator's
design (and also points to a longer design document).

By default, Scylla continues to listen only to Cassandra API requests and not
DynamoDB API requests. To enable DynamoDB-API compatibility, you must set
the alternator-port configuration option (via command line or YAML) to the port on
which you wish to listen for DynamoDB API requests. For more information, see
docs/alternator/alternator.md. The document docs/alternator/getting-started.md
also contains some examples of how to get started with Alternator.
"

* 'alternator' of https://github.com/nyh/scylla: (272 commits)
  Added comments about DAX, monitoring and more
  alternator: fix usage of client_state
  alternator-test: complete test_expected.py for rest of comparison operators
  alternator-test: reproduce bug in Expected with EQ of set value
  alternator: implement the Expected request parameter
  alternator: add returning PAY_PER_REQUEST billing mode
  alternator: update docs/alternator.md on GSI/LSI situation
  Alternator: Add getting started document for alternator
  move alternator.md to its own directory
  alternator-test: add xfail test for GSI with 2 regular columns
  alternator/executor.cc: Latencies should use steady_clock
  alternator-test: fix LSI tests
  alternator-test: fix test_describe_endpoints.py for AWS run
  alternator-test: test_describe_endpoints.py without configuring AWS
  alternator: run local tests without configuring AWS
  alternator-test: add LSI tests
  alternator-test: bump create table time limit to 200s
  alternator: add basic LSI support
  alternator: rename reserved column name "attrs"
  alternator: migrate make_map_element_restriction to string view
  ...
2019-09-11 18:01:05 +03:00
Dor Laor
7d639d058e Added comments about DAX, monitoring and more 2019-09-11 18:01:05 +03:00
Nadav Har'El
c953aa3e20 alternator-test: complete test_expected.py for rest of comparison operators
This patch adds tests for all the missing comparison operators in the
Expected parameter (the old-style parameter for conditional operations).
All these new tests are now xfailing on Alternator (and succeeding on
DynamoDB), because these operators are not yet implemented in Alternator
(we only implemented EQ and BEGINS_WITH, so far - the rest are easy but
need to be implemented).

The test_expected.py is now hopefully comprehensive, covering the entire
feature set of the "Expected" parameter and all its various cases and
subcases.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190910092208.23461-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
23bb3948ee alternator-test: reproduce bug in Expected with EQ of set value
Our implementation of the "EQ" operator in Expected (conditional
operation) just compares the JSON representation of the values.
This is almost always correct, but unfortunately incorrect for
sets - where we can have two equal sets despite having a
different order.

This patch just adds an (xfailing) test for this bug.

The bug itself can be fixed in the future in one of several ways
including changing the implementation of EQ, or changing the
serialization of sets so they'll always be sorted in the same
way.
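
To illustrate the problem, here is a small Python sketch (not Alternator's
C++ code) that compares DynamoDB-style typed values. The function names are
hypothetical. The sketch treats the set types SS, NS and BS as unordered and
ignores numeric normalization inside number sets.

```python
import json

SET_TYPES = {"SS", "NS", "BS"}  # DynamoDB string, number and binary sets


def naive_eq(a: dict, b: dict) -> bool:
    """Compare the serialized JSON text, which is sensitive to element order."""
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)


def set_aware_eq(a: dict, b: dict) -> bool:
    """Compare typed values, ignoring element order for set types."""
    (type_a, val_a), = a.items()
    (type_b, val_b), = b.items()
    if type_a != type_b:
        return False
    if type_a in SET_TYPES:
        return sorted(val_a) == sorted(val_b)
    return val_a == val_b
```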

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909125147.16484-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
13d657b20d alternator: implement the Expected request parameter
In this patch we implement the Expected parameter for the UpdateItem,
PutItem and DeleteItem operations. This parameter allows a conditional
update - i.e., do an update only if the existing value of the item
matches some condition.
This is the older form of conditional updates, but is still used by many
applications, including Amazon's Tic-Tac-Toe demo.

As usual, we do not yet provide isolation guarantees for read-modify-write
operations - the item is simply read before the modification, and there is
no protection against concurrent operation. This will of course need to be
addressed in the future.

The Expected parameter has a relatively large number of variations, and most
of them are supported by this code, except that currently only two comparison
operators are supported (EQ and BEGINS_WITH) out of the 13 listed in the
documentation. The rest will be implemented later.
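
As a rough illustration of what evaluating Expected involves, the Python
sketch below checks an already-read item against an Expected map. It is not
the code from this patch: the function name is hypothetical, only the EQ and
BEGINS_WITH operators are handled (as in the patch), and BEGINS_WITH is
limited to string values for brevity.

```python
def condition_holds(item: dict, expected: dict) -> bool:
    """Return True if every per-attribute condition in `expected` holds."""
    for name, cond in expected.items():
        current = item.get(name)
        if cond.get("Exists") is False:
            # The attribute is required to be absent.
            if current is not None:
                return False
            continue
        if "Value" in cond:
            # Short form: a bare Value means an equality check.
            operator, operand = "EQ", cond["Value"]
        else:
            operator = cond["ComparisonOperator"]
            operand = cond["AttributeValueList"][0]
        if current is None:
            return False
        if operator == "EQ":
            ok = current == operand
        elif operator == "BEGINS_WITH":
            ok = ("S" in current and "S" in operand
                  and current["S"].startswith(operand["S"]))
        else:
            raise NotImplementedError(operator)
        if not ok:
            return False
    return True
```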

This patch also includes comprehensive tests for the Expected feature.
These tests are almost exhaustive, except for one missing part (labeled FIXME) -
among the 13 comparison operations, the tests only check the EQ and BEGINS_WITH
operators. We'll later need to add checks to the rest of them as well.
As usual, all the tests pass on Amazon DynamoDB, and after this patch all
of them succeed on Alternator too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190905125558.29133-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
c5fc48d1ee alternator: add returning PAY_PER_REQUEST billing mode
In order for Spark jobs to work correctly, a hardcoded PAY_PER_REQUEST
billing mode entry is returned when describing a table with
a DescribeTable request.
Also, one test case in test_describe_table.py is no longer marked XFAIL.
Message-Id: <a4e6d02788d8be48b389045e6ff8c1628240197c.1567688894.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b58eadd6c9 alternator: update docs/alternator.md on GSI/LSI situation
Update docs/alternator.md on the current level of compatibility of our
GSI and LSI implementation vs. DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190904120730.12615-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Eliran Sinvani
a6f600c54f Alternator: Add getting started document for alternator
This patch adds a getting started document for alternator,
it explains how to start up a cluster that has an alternator
API port open and how to test that it works using either an
application or some simple and minimal python scripts.
The goal of the document is to get a user to have an up and
running docker based cluster with alternator support in the
shortest time possible.
2019-09-11 18:01:05 +03:00
Eliran Sinvani
573ff2de35 move alternator.md to its own directory
As part of trying to make alternator more accessible
to users, we expect more documents to be created so
it seems like a good idea to give all of the alternator
docs their own directory.
2019-09-11 18:01:05 +03:00
Piotr Sarna
6579a3850a alternator-test: add xfail test for GSI with 2 regular columns
When updating the second regular base column that is also a view
key, the code in Scylla will assume it only needs to update an entry
instead of replacing an old one. This leads to inconsistencies
exposed in the test case.
Message-Id: <5dfeb9f61f986daa6e480e9da4c7aabb5a09a4ec.1567599461.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Amnon Heiman
722b4b6e98 alternator/executor.cc: Latencies should use steady_clock
To get correct latency estimates, the executor should use a clock with
a higher resolution.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
b470137cea alternator-test: fix LSI tests
LSI tests are amended, so they no longer needlessly XPASS:
 * two xpassing tests are no longer marked XFAIL
 * there's an additional test for partial projection
   that succeeds on DynamoDB and does not yet work in alternator
Message-Id: <0418186cb6c8a91de84837ffef9ac0947ea4e3d3.1567585915.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
dc1d577421 alternator-test: fix test_describe_endpoints.py for AWS run
The previous patch fixed test_describe_endpoints.py for a local run
without an AWS configuration. But when running with "--aws", we do
need to use that AWS configuration, and this patch fixes this case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
897dffb977 alternator-test: test_describe_endpoints.py without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.

The previous patch already fixed this for most tests, this patch fixes the
same issue in test_describe_endpoints.py, which had a separate copy of the
problematic code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b39101cd04 alternator: run local tests without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.
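
A sketch of the idea in Python, assuming the settings end up as keyword
arguments to boto3.resource('dynamodb', ...). The helper name, the default
URL and the placeholder values are hypothetical. The keyword names
endpoint_url, region_name, aws_access_key_id and aws_secret_access_key are
the ones boto3 accepts.

```python
def dynamodb_connection_kwargs(use_aws: bool,
                               local_url: str = "http://localhost:8000") -> dict:
    """Build keyword arguments for boto3.resource('dynamodb', **kwargs)."""
    if use_aws:
        # Rely on the user's own AWS configuration (region and credentials).
        return {}
    return {
        "endpoint_url": local_url,
        # Boto3 insists on these, but a local server does not need real ones.
        "region_name": "us-east-1",
        "aws_access_key_id": "dummy",
        "aws_secret_access_key": "dummy",
    }
```

With boto3 installed, the result would be used as
boto3.resource('dynamodb', **dynamodb_connection_kwargs(use_aws=False)).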

Also modified the README to be clearer, and more focused on the local
runs.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708121420.7485-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
efff187deb alternator-test: add LSI tests
Cases for local secondary indexes are added - loosely based on
test_gsi.py suite.
2019-09-11 18:01:05 +03:00
Piotr Sarna
927dc87b9c alternator-test: bump create table time limit to 200s
Unfortunately the previous 100s limit proved to be not enough
for creating tables with both local and global indexes attached
to them. Empirically 200s was chosen as a safe default,
as the longest test oscillated around 100s with the deviation of 10s.
2019-09-11 18:01:05 +03:00
Piotr Sarna
2fcd1ff8a9 alternator: add basic LSI support
With this patch, LocalSecondaryIndexes can be added to a table
during its creation. The implementation is heavily shared
with GlobalSecondaryIndexes and as such suffers from the same TODOs:
projections, describing more details in DescribeTable, etc.
2019-09-11 18:01:05 +03:00
Nadav Har'El
7b8917b5cb alternator: rename reserved column name "attrs"
We currently reserve the column name "attrs" for a map of attributes,
so the user is not allowed to use this name as a name of a key.

We plan to lift this reservation in a future patch, but until we do,
let's at least choose a more obscure name to forbid - in this patch ":attrs".
It is even less likely that a user will want to use this specific name
as a column name.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190903133508.2033-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ef7903a90f alternator: migrate make_map_element_restriction to string view
In order to elide unnecessary copying and allow more copy elision
in the future, make_map_element_restriction helper function
uses string_view instead of a const string reference.
Message-Id: <1a3e82e7046dc40df604ee7fbea786f3853fee4d.1567502264.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
fc946ddfba alternator: clean error, not a crash, on reserved column name
Currently, we reserve the name ATTRS_COLUMN_NAME ("attrs") - the user
cannot use it as a key column name (key of the base table or GSI or LSI)
because we use this name for the attribute map we add to the schema.

Currently, if the user does attempt to create such a key column, the
result is undefined (sometimes corrupt sstables, sometimes outright crashes).
This patch fixes it to become a clean error, saying that this column name is
currently reserved.

The test test_create_table_special_column_name now cleanly fails, instead
of crashing Scylla, so it is converted from "skip" to "xfail".

Eventually we need to solve this issue completely (e.g., in rare cases
rename columns to allow us to reserve a name like ATTRS_COLUMN_NAME,
or alternatively, instead of using a fixed name ATTRS_COLUMN_NAME pick a
different one different from the key column names). But until we do,
better fail with a clear error instead of a crash.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190901102832.7452-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
d64980f2ae alternator-test: add initial test_condition_expression file
The file initially consists of a very simple case that succeeds
with `--aws` and expectedly fails without it, because the expression
is not implemented yet.
2019-09-11 18:01:05 +03:00
Piotr Sarna
80edc00f62 alternator-test: add tests for unsupported expressions
The test cases are marked XFAIL, as their expressions are not yet
supported in alternator. With `--aws`, they pass.
2019-09-11 18:01:05 +03:00
Pekka Enberg
380a7be54b dist/docker: Add support for Alternator
This adds "alternator-address" and "alternator-port" configuration
options to the Docker image, so people can enable Alternator through
"docker run" with:

  docker run --name some-scylla -d <image> --alternator-port=8080
Message-Id: <20190902110920.19269-1-penberg@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
3fae8239fa alternator: throw on unsupported expressions
When an unsupported expression parameter is encountered -
KeyConditionExpression, ConditionExpression or FilterExpression
are such - alternator will return an error instead of ignoring
the parameter.
2019-09-11 18:01:05 +03:00
Amnon Heiman
811df711fb alternator/executor: update the latencies histogram
This patch updates the latencies histogram for get, put, delete and
update.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Amnon Heiman
4a6d1f5559 alternator/stats metrics: use labels and estimated histogram
This patch makes two changes to the alternator stats:
1. It adds an estimated_histogram for the get, put, update and delete
operations.

2. It changes the metrics naming so that the operation is a label, which
makes the metrics easier to handle, operate on and display.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
de53ed7cdd alternator_test: mark test_gsi_3 as passing
The test_gsi_3, involving creating a GSI with two key columns which weren't
previously a base key, now passes, so drop the "xfail" marker.

We still have problems with such materialized views, but not in the simple
scenario tested by test_gsi_3.

Later we should create a new test for the scenario which still fails, if
any.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
0e6338ffd9 alternator: allow creating GSI with 2 base regular columns
Creating an underlying materialized view with 2 regular base columns
is risky in Scylla, as the second column's liveness will not be correctly
taken into account when ensuring view row liveness.
Still, in case specific conditions are met:
 * the regular base column value is always present in the base row
 * no TTLs are involved
then the materialized view will behave as expected.

Creating a GSI with 2 base regular columns issues a warning,
as it should be performed with care.
Message-Id: <5ce8642c1576529d43ea05e5c4bab64d122df829.1567159633.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
3325e76c6f alternator: fix default BillingMode
It is important that BillingMode should default to PROVISIONED, as it
does on DynamoDB. This allows old clients, which don't specify
BillingMode at all, to specify ProvisionedThroughput as allowed with
PROVISIONED.

Also added a test case for this case (where BillingMode is absent).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829193027.7982-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
395a97e928 alternator: correct error on missing index or table
When querying on a missing index, DynamoDB returns different errors in
case the entire table is missing (ResourceNotFoundException) or the table
exists and just the index is missing (ValidationException). We didn't
make this distinction, and always returned ValidationException, but this
confuses clients that expect ResourceNotFoundException - e.g., Amazon's
Tic-Tac-Toe demo.
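
The distinction can be summarized by this small Python sketch. The function
name is hypothetical and the real check lives in Alternator's C++ code.

```python
from typing import Optional


def missing_resource_error(table_exists: bool, index_exists: bool) -> Optional[str]:
    """Pick the DynamoDB error type for a query on a table's index."""
    if not table_exists:
        return "ResourceNotFoundException"  # the entire table is missing
    if not index_exists:
        return "ValidationException"  # the table exists, the index does not
    return None  # nothing is missing, no error
```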

This patch adds a test for the first case (the completely missing table) -
we already had a test for the second case - and returns the correct
error codes. As usual the test passes against DynamoDB as well as Alternator,
ensuring they behave the same.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829174113.5558-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
62c4ed8ee3 alternator: improve request logging
We needlessly split the trace-level log message for the request to two
messages - one containing just the operation's name, and one with the
parameters. Moreover we printed them in the opposite order (parameters
first, then the operation). So this patch combines them into one log
message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829165341.3600-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
f755c22577 alternator-test: reproduce bug with using "attrs" as key column name
Alternator puts in the Scylla table a column called "attrs" for all the
non-key attributes. If the user happens to choose the same name, "attrs",
for one of the key columns, the result of writing two different columns
with the same name is a mess and corrupt sstables.

This test reproduces this bug (and works against DynamoDB of course).

Because the test doesn't cleanly fail, but rather leaves Scylla in a bad
state from which it can't fully recover, the test is marked as "skip"
until we fix this bug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190828135644.23248-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
6b27eaf4d0 alternator: remove redundant key checks in UpdateItem
Updating key columns is not allowed in UpdateItem requests,
but the series introducing GSI support for regular columns
also introduced redundant duplicate checks of this kind.
This condition is already checked in resolve_update_path helper function
and existing test_update_expression_cannot_modify_key test makes sure that
the condition is checked.
Message-Id: <00f83ab631f93b263003fb09cd7b055bee1565cd.1567086111.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
04a117cda3 alternator-test: improve test_update_expression_cannot_modify_key
The test test_update_expression_cannot_modify_key() verifies that an
update expression cannot modify one of the key columns. The existing
test only tried the SET and REMOVE actions - this patch makes the
test more complete by also testing the ADD and DELETE actions.

This patch also makes the expected exception more picky - we now
expect that the exception message contains the word "key" (as it,
indeed, does on both DynamoDB and Alternator). If we get any other
exception, there may be a problem.

The test passed before this patch, and passes now as well - it's just
stricter now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829135650.30928-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
81a97b2ac0 alternator-test: add test case for GSI with both keys
A case which adds a global secondary index on a table with both
hash and sort keys is added.
2019-09-11 18:01:05 +03:00
Piotr Sarna
615603877c alternator: use from_single_value instead of from_singular in ck
The code previously used clustering_key::from_singular() to compute
a clustering key value. It works fine, but has two issues:
1. involves one redundant deserialization stage compared to
   from_single_value
2. does not work with compound clustering keys, which can appear
   when using indexes
2019-09-11 18:01:05 +03:00
Piotr Sarna
4474ceceed alternator-test: enable passing tests
With more GSI features implemented, tests with XPASS status are promoted
to being enabled.

One test case (test_gsi_describe) is partially done as DescribeTable
now contains index names, but we could try providing more attributes
(e.g. IndexSizeBytes and ItemCount from the test case), so the test
is left in the XFAIL state.
2019-09-11 18:01:05 +03:00
Piotr Sarna
f922d6d771 alternator: Add 'mismatch' to serialization error message
In order to match the tests and origin more properly, the error message
for mismatched types is updated so it contains the word 'mismatch'.
2019-09-11 18:01:05 +03:00
Piotr Sarna
9dceea14f9 alternator: add describing GSI in DescribeTable
The DescribeTable response now contains the list of index names
as well. None of the attributes of the list are marked as 'required'
in the documentation, so currently the implementation provides
index names only.
2019-09-11 18:01:05 +03:00
Piotr Sarna
938a06e4c0 alternator: allow adding GSI-related regular columns to schema
In order to be able to create a Global Secondary Index over a regular
column, this column is upgraded from being a map entry to being a full
member of the schema. As such, it's possible to use this column
definition in the underlying materialized view's key.
2019-09-11 18:01:05 +03:00
Piotr Sarna
2a123925ca alternator: add handling regular columns with schema definitions
In order to prepare alternator for adding regular columns to schema,
i.e. in order to create a materialized view over them,
the code is changed so that updating no longer assumes that only keys
are included in the table schema.
2019-09-11 18:01:05 +03:00
Piotr Sarna
befa2fdc80 alternator: start fetching all regular columns
Since in the future we may want to have more regular columns
in alternator tables' schemas, the code is changed accordingly,
so all regular columns will be fetched instead of just the attribute
map.
2019-09-11 18:01:05 +03:00
Piotr Sarna
53044645aa alternator: avoid creating empty collection mutations
If no regular column attributes are passed to PutItem, the attr
collector serializes an empty collection mutation nonetheless
and sends it. It's redundant, so instead, if the attr collector
is empty, the collection does not get serialized and sent to replicas.
2019-09-11 18:01:05 +03:00
Nadav Har'El
317954fe19 alternator-test: add license blurbs
Add copyright and license blurbs to all alternator-test source files.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825161018.10358-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
c9eb9d9c76 alternator: update license blurbs
Update all the license blurbs to the one we use in the open-source
Scylla project, licensed under the AGPL.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825160321.10016-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
d6e671b04f alternator: add initial tracing to requests
Each request provides basic tracing information about itself.

Example output from tracing:

cqlsh> select request, parameters from system_traces.sessions
           where session_id = 39813070-c4ea-11e9-8572-000000000000;
 request          | parameters
------------------+-----------------------------------------------------
 Alternator Query | {'query': '{"TableName": "alternator_test_15664",
                    "KeyConditions": {"p": {"AttributeValueList":
                    [{"S": "T0FE0QCS0X"}], "ComparisonOperator": "EQ"}}}'}

cqlsh> select session_id, activity from system_traces.events
           where session_id = 39813070-c4ea-11e9-8572-000000000000;
 session_id                           | activity
--------------------------------------+-----------------------------
 39813070-c4ea-11e9-8572-000000000000 |                    Querying
 39813070-c4ea-11e9-8572-000000000000 | Performing a database query
2019-09-11 18:01:05 +03:00
Piotr Sarna
cb791abb9d alternator: enable query tracing
Probabilistic tracing can be enabled via REST API. Alternator will
from now on create tracing sessions for its operations as well.

Examples:

 # trace around 0.1% of all requests
curl -X POST http://localhost:10000/storage_service/trace_probability?probability=0.001
 # trace everything
curl -X POST http://localhost:10000/storage_service/trace_probability?probability=1
2019-09-11 18:01:05 +03:00
Piotr Sarna
6c8c31bfc9 alternator: add client state
Keeping an instance of client_state is a convenient way of being able
to use tracing for alternator. It's also currently used in paging,
so adding a client state to executor removes the need of keeping
a dummy value.
2019-09-11 18:01:05 +03:00
Piotr Sarna
1ca9dc5d47 alternator: use correct string views in serialization
String views used in JSON serialization should use not only the pointer
returned by rapidjson, but also the string length, as it may contain
\0 characters.
Additionally, one unnecessary copy is elided.
2019-09-11 18:01:05 +03:00
Nadav Har'El
32b898db7b alternator: docs/alternator.md: link to a longer document
Add a link to a longer document (currently, around 40 pages) about
DynamoDB's features and how we implemented or may implement them in
Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825121201.31747-2-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
a5c3d11ccb alternator: document choice of RF
After changing the choice of RF in a previous patch, let's update the
relevant part of docs/alternator.md.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825121201.31747-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
d20ec9f492 alternator: expand docs/alternator.md
Expand docs/alternator.md with new sections about how to run Alternator,
and a very brief introduction to its design.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818164628.12531-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
9b0ef1a311 alternator: refuse CreateTable if uses unsupported features
If a user tries to create a table with an unsupported feature -
a local secondary index, a user-defined encryption key or supporting
streams (CDC), let's refuse the table creation, so the application
doesn't continue thinking this feature is available to it.

The "Tags" feature is also not supported, but it is more harmless
(it is used mostly for accounting purposes) so we do not fail the
table creation because of it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818125528.9091-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ab25472034 alternator: migrate to visitor pattern in serialization
Types can now be processed with a visitor pattern, which is neater
than a chain of if statements.
Message-Id: <256429b7593d8ad8dff737d8ddb356991fb2a423.1566386758.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
42d2910f2c alternator: add from_string with raw pointer to rjson
from_string is a family of functions that create rjson values from
strings - now it's extended with accepting raw pointer and size.
Message-Id: <d443e2e4dcc115471202759ecc3641ec902ed9e4.1566386758.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
2f53423a2f alternator: automatically choose RF: 1 or 3
In CQL, before a user can create a table, they must create a keyspace to
contain this table and, among other things, specify this keyspace's RF.

But in the DynamoDB API, there is no "create keyspace" operation - the
user just creates a table, and there is no way, and no opportunity,
to specify the requested RF. Presumably, Amazon always uses the same
RF for all tables, most likely 3, although this is not officially
documented anywhere.

The existing code creates the keyspace during Scylla boot, with RF=1.
This RF=1 always works, and is a good choice for a one-node test run,
but was a really bad choice for a real cluster with multiple nodes, so
this patch fixes this choice:

With this patch, the keyspace creation is delayed - it doesn't happen
when the first node of the cluster boots, but only when the user creates
the first table. Presumably, at that time, the cluster is already up,
so at that point we can make the obvious choice automatically: a one-node
cluster will get RF=1, a >=3 node cluster will get RF=3. The choice of
RF is logged - and the choice of RF=1 is considered a warning.

Note that with this patch, keyspace creation is still automatic as it
was before. The user may manually create the keyspace via CQL, to
override this automatic choice. In the future we may also add additional
keyspace configuration options via configuration flags or new REST
requests, and the keyspace management code will also likely change
as we start to support clusters with multiple regions and global
tables. But for now, I think the automatic method is easiest for
users who want to test-drive Alternator without reading lengthy
instructions on how to set up the keyspace.
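
The automatic choice described above can be sketched as follows. This is
only an illustration in Python, not Alternator's C++ code: the helper name
choose_replication_factor and the logger name are hypothetical, and the
handling of a two-node cluster (treated here like a one-node cluster) is an
assumption, since this message only spells out the one-node and >=3-node
cases.

```python
import logging

log = logging.getLogger("alternator_sketch")  # hypothetical logger name


def choose_replication_factor(node_count: int) -> int:
    """Pick the keyspace RF from the number of nodes seen at table creation."""
    if node_count >= 3:
        rf = 3
        log.info("Creating keyspace with RF=%d", rf)
    else:
        # Assumption: anything below three nodes falls back to RF=1.
        rf = 1
        # The commit message says the RF=1 choice is logged as a warning.
        log.warning("Creating keyspace with RF=%d (%d node(s) in cluster)",
                    rf, node_count)
    return rf
```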

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190820180610.5341-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
1a1935eb72 alternator-test: add a test for wrong BEGINS_WITH target type
The test ensures that passing a non-compatible type to BEGINS WITH,
e.g. a number, results in a validation error.
Tested both locally and remotely.
Message-Id: <894a10d3da710d97633dd12b6ac54edccc18be82.1566291989.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b7b998568f alternator: add to CreateTable verification of BillingMode setting
We allow BillingMode to be set to either PAY_PER_REQUEST (the default)
or PROVISIONED, although neither mode is fully implemented: In the former
case the payment isn't accounted, and in the latter case the throughput
limits are not enforced.
But other settings for BillingMode are now refused, and we add a new test
to verify that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818122919.8431-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
66a2af4f7d alternator-test: require a new-enough boto library
The alternator tests want to exercise many of the DynamoDB API features,
so they need a recent enough version of the client libraries, boto3
and botocore. In particular, only in botocore 1.12.54, released a year
ago, was support for BillingMode added - and we rely on this to create
pay-per-request tables for our tests.

Instead of letting the user run with an old version of this library and
get dozens of mysterious errors, in this patch we add a test to conftest.py
which cleanly aborts the test if the libraries aren't new enough, and
recommends a "pip" command to upgrade these libraries.
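
A version gate of this kind might look roughly like the sketch below. It is
not the actual conftest.py code: the function names and the exact wording of
the error (including the suggested pip command) are assumptions; only the
minimum botocore version, 1.12.54, comes from this message.

```python
MIN_BOTOCORE = (1, 12, 54)  # first botocore release with BillingMode support


def parse_version(text: str) -> tuple:
    """Turn '1.12.54' into (1, 12, 54), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_botocore(version: str) -> None:
    """Raise a clear error if the given botocore version is too old."""
    if parse_version(version) < MIN_BOTOCORE:
        raise RuntimeError(
            "botocore %s is too old, need at least %s; try: "
            "pip install --upgrade boto3 botocore"
            % (version, ".".join(map(str, MIN_BOTOCORE))))
```

In a real test setup the version string would come from the installed
library (botocore exposes it as botocore.__version__).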

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190819121831.26101-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
64bf2b29a8 alternator-test: exhaustive tests for DescribeTable operation
The DescribeTable operation is currently implemented to return the
minimal information that libraries and applications usually need from
it, namely verifying that some table exists. However, this operation
is actually supposed to return a lot more information fields (e.g.,
the size of the table, its creation date, and more) which we currently
don't return.

This patch adds a new test file, test_describe_table.py, testing all
these additional attributes that DescribeTable is supposed to return.
Several of the tests are marked xfail (expected to fail) because we
did not implement these attributes yet.

The test is exhaustive except for attributes that have to do with four
major features which will be tested together with these features: GSI,
LSI, streams (CDC), and backup/restore.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190816132546.2764-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
fbd2f5077d alternator: enable timeouts on requests
Currently Alternator starts all Scylla requests (including both reads
and writes) without any timeout set. Because of bugs and/or network
problems, requests can theoretically hang and waste Scylla resources for
hours, long after the client has given up on them and closed their
connection.

The DynamoDB protocol doesn't let a user specify which timeout to use,
so we should just use something "reasonable", in this patch 10 seconds.
Remember that all DynamoDB read and write requests are small (even scans
just scan a small piece), so 10 seconds should be above and beyond
anything we actually expect to see in practice.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190812105132.18651-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b2bd3bbc1f alternator: add "--alternator-address" configuration parameter
So far we had the "--alternator-port" option to configure the port
on which the Alternator server listens, but the server always listened
on all addresses. It is important to also be able to configure the listen
address - it is useful in tests running several instances of Scylla on
the same machine, and useful in multi-homed machines with several interfaces.

So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0
(to listen on all interfaces). It works like the many other "--*-address"
options that Scylla already has.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808204641.28648-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
ea41dd2cf8 alternator: docs/alternator.md more about filtering support
Give more details about what is, and what isn't, currently
supported in filtering of Scan (and Query) results.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190811094425.30951-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
88eed415bd alternator: fix indentation
It turns out that recent rjson patches introduced some buggy
tabs instead of spaces due to bad IDE configuration. The indentation
is restored to spaces.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3c11428d8d alternator-test: add QueryFilter validation cases
QueryFilter validation was lately supplemented with non-key column
checks, which is hereby tested.
2019-09-11 18:01:05 +03:00
Piotr Sarna
0e0dc14302 alternator-test: add scan case for key equality filtering
With key equality filtering enabled, a test case for scanning is provided.
2019-09-11 18:01:05 +03:00
Piotr Sarna
f1641caa41 alternator: add filtering for key equality
Until now, filtering in alternator was possible only for non-key
column equality relations. This commit adds support for equality
relations for key columns.
2019-09-11 18:01:05 +03:00
Piotr Sarna
a2828f9daa alternator: add validation to QueryFilter
QueryFilter, according to docs, can only contain non-key attributes.
2019-09-11 18:01:05 +03:00
Piotr Sarna
d055658fff alternator: add computing key bounds from filtering
Alternator allows passing hash and sort key restrictions
as filters - it is, however, better to incorporate these restrictions
directly into partition and clustering ranges, if possible.
It's also necessary, as optimizations inside restrictions_filter
assume that it will not be fed unneeded rows - e.g. if filtering
is not needed on partition key restrictions, they will not be checked.
2019-09-11 18:01:05 +03:00
Piotr Sarna
9c05051b59 alternator: extract getting key value subfunction
Currently the only utility function for getting key bytes
from JSON was to parse a document with the following format:
"key_column_name" : { "key_column_type" : VALUE }.
However, it's also useful to parse only the inner document, i.e.:
{ "key_column_type" : VALUE }.
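
The two input shapes can be illustrated with a small Python sketch. The
function names below are hypothetical and the real code is C++ that produces
key bytes; this only shows how the outer parser can be built on top of the
inner one.

```python
import json


def key_value_from_typed(typed_value: dict):
    """Parse the inner document {"key_column_type": VALUE} into (type, value)."""
    if len(typed_value) != 1:
        raise ValueError("expected exactly one type/value pair")
    type_name, value = next(iter(typed_value.items()))
    return type_name, value


def key_value_from_named(document: dict, column_name: str):
    """Parse "key_column_name": {"key_column_type": VALUE} via the inner parser."""
    if column_name not in document:
        raise ValueError("missing key column: " + column_name)
    return key_value_from_typed(document[column_name])


doc = json.loads('{"p": {"S": "hello"}}')
print(key_value_from_named(doc, "p"))
print(key_value_from_typed({"N": "42"}))
```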
2019-09-11 18:01:05 +03:00
Piotr Sarna
c84019116a alternator: make make_map_element_restriction static
The function has no outside users and thus does not need to be exposed.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3ee99a89b1 alternator: register filtering metrics
Three metrics related to filtering are added to alternator:
 - total rows read during filtering operations
 - rows read and matched by filtering
 - rows read and dropped by filtering
2019-09-11 18:01:05 +03:00
Piotr Sarna
b3e35dab26 alternator: add bumping filtering stats
When filtering is used in querying or scanning, the number of total
filtered rows is added to stats.
2019-09-11 18:01:05 +03:00
Piotr Sarna
a6d098d3eb alternator: add cql_stats to alternator stats
Some underlying operations (e.g. paging) make use of cql_stats
structure from CQL3. As such, cql_stats structure is added
to alternator stats in order to gather and use these statistics.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3ae54892cd alternator: fix a comment typo
s/Miscellenous/Miscellaneous/g
2019-09-11 18:01:05 +03:00
Piotr Sarna
ccf778578a alternator: register read-before-write stats
Read-before-write stat counters were already introduced, but the metrics
need to be added to a metric group as well in order to be available
for users.
2019-09-11 18:01:05 +03:00
Nadav Har'El
6f81d0cb15 alternator: initial support for GSI
This patch adds partial support for GSI (Global Secondary Index) in
Alternator, implemented using a materialized view in Scylla.

This initial version only supports the specific cases of the index indexing
a column which was already part of the base table's key - e.g., indexing
what used to be a sort key (clustering key) in the base table. Indexing
of non-key attributes (which today live in a map) is not yet supported in
this version.

Creation of a table with GSIs is supported, and so is deleting the table.
UpdateTable which adds a GSI to an existing table is not yet supported.
Query and Scan operations on the index are supported.
DescribeTable does not yet list the GSIs as it should.

Seven previously-failing tests now pass, so their "xfail" tag is removed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808090256.12374-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
33611acf44 alternator: add stats for read-before-write
A simple metric counting how many read-before-writes were executed
is added.
Message-Id: <d8cc1e9d77e832bbdeff8202a9f792ceb4f1e274.1565274797.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ae59340c15 alternator: complement rjson.hh comments
Some comments in rjson.hh header file were not clear and are hereby
amended.
Message-Id: <7fa4e2cf39b95c176af31fe66f404a6a51a25bec.1565275276.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
5eb583ab09 alternator: remove missing key FIXME
The case for missing key in update_item was already properly fixed
along with migrating from libjsoncpp to rapidjson, but one FIXME
remained in the code by mistake.

Message-Id: <94b3cf53652aa932a661153c27aa2cb1207268c7.1565271432.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
436f806341 alternator: remove decimal_type FIXME
Decimal precision problems were already solved by commit
d5a1854d93c9448b1d22c2d02eb1c46a286c5404, but one FIXME
remained in the code by mistake.

Message-Id: <381619e26f8362a8681b83e6920052919acf1142.1565271198.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
b29b753196 alternator: add comments to rjson
The rapidjson library needs to be used with caution in order to
provide maximum performance and avoid undefined behavior.
Comments added to rjson.hh describe provided methods and potential
pitfalls to avoid.
Message-Id: <ba94eda81c8dd2f772e1d336b36cae62d39ed7e1.1565270214.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
7b02c524d0 alternator: remove a pointer-based workaround for future<json>
With libjsoncpp we were forced to work around the problem of
non-noexcept constructors by using an intermediate unique pointer.
Objects provided by rapidjson have correct noexcept specifiers,
so the workaround can be dropped.
2019-09-11 18:01:04 +03:00
Piotr Sarna
cb29d6485e alternator: migrate to rapidjson library
Profiling alternator implied that JSON parsing takes up a fair amount
of CPU, and as such should be optimized. libjsoncpp is a standard
library for handling JSON objects, but it also proves slower than
rapidjson, which is hereby used instead.
The results indicated that libjsoncpp used roughly 30% of CPU
for a single-shard alternator instance under stress, while rapidjson
dropped that usage to 18% without optimizations.
Future optimizations should include eliding object copying, string copying
and perhaps experimenting with different JSON allocators.
2019-09-11 18:01:04 +03:00
Piotr Sarna
0fd1354ef9 alternator: add handling rapidjson errors in the server
If a JSON parsing error is encountered, it is transformed
to a validation exception and returned to the user in JSON form.
2019-09-11 18:01:04 +03:00
Piotr Sarna
7064b3a2bf alternator: add rapidjson helper functions
Migrating from libjsoncpp to rapidjson proved to be beneficial
for parsing performance. As a first step, a set of helper functions
is provided to ease the migration process.
2019-09-11 18:01:04 +03:00
Piotr Sarna
0b0bfc6e54 alternator: add missing namespaces to status_type
error.hh file implicitly assumed that seastar:: namespace is available
when it's included, which is not always the case. To remedy that,
seastar::httpd namespace is used explicitly.
2019-09-11 18:01:04 +03:00
Nadav Har'El
56309db085 alternator: correctly catch table-already-exists exception
Our CreateTable handler assumed that the function
migration_manager::announce_new_column_family()
returns a failed future if the table already exists. But in some of
our code branches, this is not the case - the function itself throws
instead of returning a failed future. The solution is to use
seastar::futurize_apply() to handle both possibilities (direct exception
or future holding an exception).

This fixes a failure of the test_table.py::test_create_table_already_exists
test case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:04 +03:00
Nadav Har'El
d74b203dee alternator: add docs/alternator.md
This adds a new document, docs/alternator.md, about Alternator.

The scope of this document should be expanded in the future. We begin
here by introducing Alternator and its current compatibility level with
Amazon DynamoDB, but it should later grow to explain the design of Alternator
and how it maps the DynamoDB data model onto Scylla's.

Whether this document should remain a short high-level overview, or a long
and detailed design document, remains an open question.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190805085340.17543-1-nyh@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
75ee13e5f2 dependencies: add rapidjson
The rapidjson fast JSON parsing library is used instead of libjsoncpp
in the Alternator subproject.

[avi: update toolchain image to include the new dependency]

Message-Id: <a48104dec97c190e3762f927973a08a74fb0c773.1564995712.git.sarna@scylladb.com>
2019-09-11 18:00:44 +03:00
Nadav Har'El
5eaf73a292 alternator: fix sharing of a seastar::shared_ptr between threads
The function attrs_type() returns a supposed singleton, but because
it is a seastar::shared_ptr we can't use the same one for multiple
threads, and need to use a separate one per thread.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804163933.13772-1-nyh@scylladb.com>
2019-09-11 16:06:05 +03:00
Nadav Har'El
1b1ede9288 alternator: fix cross-shard use of CQL type objects
The CQL type singletons like utf8_type et al. are separate for separate
shards and cannot be used across shards. So whatever hash tables we use
to find them also need to be per-shard. If we fail to do this, we
get errors running the debug build with multiple shards.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804165904.14204-1-nyh@scylladb.com>
2019-09-11 16:05:39 +03:00
Nadav Har'El
7eae889513 alternator-test: some more GSI tests
Expand the GSI test suite. The most important new test is
test_gsi_key_not_in_index(), where the index's key includes just one of
the base table's key columns, but not a second one. In this case, the
Scylla implementation will nevertheless need to add the second key column
to the view (as a clustering key), even though it isn't considered a key
column by the DynamoDB API.
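
The rule tested by test_gsi_key_not_in_index() can be sketched as below.
This is an illustration only: view_key_columns is a hypothetical helper, and
the order in which leftover base-key columns are appended is an assumption.

```python
def view_key_columns(base_key, index_key):
    """Return (partition_key, clustering_key) column lists for the view.

    base_key and index_key are ordered lists: the hash key first, then the
    optional sort key.
    """
    partition = [index_key[0]]
    clustering = list(index_key[1:])
    # Every base-table key column must still appear in the view's key, so
    # columns the index did not name are appended as clustering columns.
    for column in base_key:
        if column not in index_key:
            clustering.append(column)
    return partition, clustering


# Base table keyed on (p, c); the index is keyed on c alone.
print(view_key_columns(["p", "c"], ["c"]))
```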

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190718085606.7763-1-nyh@scylladb.com>
2019-09-11 16:05:38 +03:00
Nadav Har'El
10ad60f7de alternator: ListTables should not list materialized views
Our ListTables implementation uses get_column_families(), which lists both
base tables and materialized views. We will use materialized views to
implement DynamoDB's secondary indexes, and those should not be listed in
the results of ListTables.

The patch also includes a test for this.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190717133103.26321-2-nyh@scylladb.com>
2019-09-11 16:04:29 +03:00
Nadav Har'El
676ada4576 alternator-test: move list_tables to util.py
The list_tables() utility function was used only in test_table.py
but I want to use it elsewhere too (in GSI test) so let's move it
to util.py.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190717133103.26321-1-nyh@scylladb.com>
2019-09-11 16:04:28 +03:00
Piotr Sarna
f3963865f5 alternator: make set_sum exception more user-friendly
As in case of set_diff, an exception message in set_sum should include
the user-provided request (ADD) rather than our internal helper function
set_sum.
2019-09-11 16:03:27 +03:00
Piotr Sarna
9dd8644e4a alternator-tests: enable DELETE case for sets
UpdateExpression's case for DELETE operation for sets is enabled.
2019-09-11 16:03:26 +03:00
Piotr Sarna
2b215b159c alternator: implement set DELETE
UpdateExpression's DELETE operation for set is implemented on top
of set_diff helper function.
2019-09-11 16:02:25 +03:00
Piotr Sarna
fe72a6740c alternator: add set difference helper function
A function for computing the set difference of two sets represented
as JSON is added.
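
As a rough Python illustration (not the C++ helper itself), a difference of
two DynamoDB-style JSON sets such as {"SS": [...]} could be computed like
this. The error texts are made up for the sketch, and the sketch does not
deal with what should happen when the result is empty.

```python
SET_TYPES = ("SS", "NS", "BS")


def set_diff(v1: dict, v2: dict) -> dict:
    """Return the elements of set v1 that are not in set v2, keeping v1's order."""
    type1, items1 = next(iter(v1.items()))
    type2, items2 = next(iter(v2.items()))
    if type1 not in SET_TYPES or type2 not in SET_TYPES:
        raise ValueError("operands must be sets")
    if type1 != type2:
        raise ValueError("set types do not match: %s vs %s" % (type1, type2))
    removed = set(items2)
    return {type1: [item for item in items1 if item not in removed]}


print(set_diff({"SS": ["a", "b", "c"]}, {"SS": ["b"]}))
```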
2019-09-11 16:01:03 +03:00
Nadav Har'El
e13c56be0b alternator: fail attempt to create table with GSI
Although we do not support GSI yet, until now we silently ignored
CreateTable's GSI parameter, and the user wouldn't know the table
wasn't created as intended.

In this patch, GSI is still unsupported, but now CreateTable will
fail with an error message that GSI is not supported.

We need to change some of the tests which test the error path, and
expect an error - but should not consider a table creation error
as the expected error.

After this patch, test_gsi.py still fails all the tests on
Alternator, but much more quickly :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711161420.18547-1-nyh@scylladb.com>
2019-09-11 16:00:01 +03:00
Piotr Sarna
336c90daaa alternator-test: add stub case for set add duplication
The test case for adding two sets with common values is added.
This case is a stub, because boto3 transforms the result into a Python
set, which removes duplicates on its own. A proper TODO is left
in order to migrate this case to a lower-level API and check
the returned JSON directly for lack of duplicates.
2019-09-11 16:00:00 +03:00
Piotr Sarna
67c95cb303 alternator-test: enable tests for ADD operation
Tests for UpdateExpression::ADD are enabled.
2019-09-11 15:59:59 +03:00
Piotr Sarna
f29c2f6895 alternator: add ADD operation
UpdateExpression is now able to perform ADD operation on both numbers
and sets.
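
The two behaviours of ADD can be sketched in Python as follows. This is an
illustration under assumptions rather than the Alternator implementation:
add_values is a hypothetical name, values use the DynamoDB JSON encoding
({"N": "..."} or {"SS": [...]}), and Python's Decimal stands in for whatever
numeric type the C++ code uses.

```python
from decimal import Decimal


def add_values(old: dict, operand: dict) -> dict:
    """ADD for numbers (arithmetic sum) and for sets (union without duplicates)."""
    old_type, old_value = next(iter(old.items()))
    new_type, new_value = next(iter(operand.items()))
    if old_type != new_type:
        raise ValueError("ADD operands must have the same type")
    if old_type == "N":
        return {"N": str(Decimal(old_value) + Decimal(new_value))}
    if old_type in ("SS", "NS", "BS"):
        merged, seen = [], set()
        for item in list(old_value) + list(new_value):
            if item not in seen:  # keep the first occurrence only
                seen.add(item)
                merged.append(item)
        return {old_type: merged}
    raise ValueError("ADD supports only numbers and sets")


print(add_values({"N": "1"}, {"N": "2"}))
print(add_values({"SS": ["a", "b"]}, {"SS": ["b", "c"]}))
```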
2019-09-11 15:59:00 +03:00
Piotr Sarna
a5f2926056 alternator: add helper function for adding sets
A helper function that allows creating a set sum out of two sets
represented in JSON is added.
2019-09-11 15:57:41 +03:00
Piotr Sarna
18686ff288 alternator: add unwrap_set
It will be needed later to implement adding sets.
2019-09-11 15:56:15 +03:00
Piotr Sarna
09993cf857 alternator: add get_item_type_string helper function
It will be useful later for ensuring that parameters for various
functions have matching types.
2019-09-11 15:52:31 +03:00
Nadav Har'El
d54c82209c alternator: fix Query verification of appropriate key columns
The Query operation's conditions can be used to search for a particular
hash key or both hash and sort keys - but not any other combinations.
We previously forgot to verify most errors, so in this patch we add
missing verifications - and tests to confirm we fail the query when
DynamoDB does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711132720.17248-1-nyh@scylladb.com>
2019-09-11 15:51:27 +03:00
Nadav Har'El
fbe63ddcc4 alternator-test: more GSI tests
Add more tests for GSI - tests that DescribeTable describes the GSI,
and test the case of more than one GSI for a base table.

Unfortunately, creating an empty table with two GSIs routinely takes
on DynamoDB more than a full minute (!), so because we now have a
test with two GSIs, I had to increase the timeout in create_test_table().

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711112911.14703-1-nyh@scylladb.com>
2019-09-11 15:51:26 +03:00
Piotr Sarna
a3be9dda7f alternator-test: enable if_not_exists-related tests
Test cases that relied on the implementation of if_not_exists are
enabled.
2019-09-11 15:51:25 +03:00
Piotr Sarna
cec82490d2 alternator: implement if_not_exists
The if_not_exists function is implemented on the basis of recently added
read-before-write mechanism.
2019-09-11 15:50:22 +03:00
Piotr Sarna
b14e3c0e72 alternator: rename holds_path to a more generic name
The holds_path() utility function is actually used to check if a value
needs read before write, so its name is changed to more fitting
check_needs_read_before_write.
2019-09-11 15:49:19 +03:00
Nadav Har'El
5fc7b0507e alternator: fix bug in collection mutations
Alternator currently keeps an item's attributes inside a map, and we
had a serious bug in the way we build mutations for this map:

We didn't know there was a requirement to build this mutation sorted by
the attribute's name. When we neglect to do this sorting, this confuses
Scylla's merging algorithms, which assume collection cells are thus
sorted, and the result can be duplicate cells in a collection, and the
visible effect is a mutation that seems to be ignored - because both
old and new values exist in the collection.

So this patch includes a new helper class, "attribute_collector", which
helps collect attribute updates (put and del) and extract them in correctly
sorted order. This helper class also eliminates some duplication of
arcane code to create collection cells or deletions of collection cells.
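
The idea behind such a collector can be sketched in Python as below. This is
not the C++ attribute_collector class: the method names are hypothetical, and
sorting by the UTF-8 bytes of the attribute name is an assumption about the
ordering Scylla expects.

```python
class AttributeCollector:
    """Collect attribute puts and deletions, then emit them sorted by name."""

    def __init__(self):
        self._updates = {}

    def put(self, name, value):
        self._updates[name] = ("put", value)

    def delete(self, name):
        self._updates[name] = ("del", None)

    def sorted_updates(self):
        # Emit the cells ordered by attribute name, regardless of the order
        # in which the updates were collected.
        ordered = sorted(self._updates.items(),
                         key=lambda pair: pair[0].encode("utf-8"))
        return [(name, op, value) for name, (op, value) in ordered]


collector = AttributeCollector()
collector.put("b", 1)
collector.delete("a")
collector.put("c", 2)
print(collector.sorted_updates())
```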

This patch includes a simple test that previously failed, and one xfail
test that failed just because of this bug (this was the test that exposed
this bug). Both tests now succeed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190709160858.6316-1-nyh@scylladb.com>
2019-09-11 15:48:18 +03:00
Nadav Har'El
5cce53fed9 alternator-test: exhaustive tests for GSI
This patch adds what is hopefully an exhaustive test suite for the
global secondary indexing (GSI) feature, and all its various
complications and corner cases of how GSIs can be created, deleted,
named, written, read, and more (the tests are heavily documented to
explain what they are testing).

All these tests pass on DynamoDB, and fail on Alternator, so they are
marked "xfail". As we develop the GSI feature in Alternator piece by
piece, we should make these tests start to pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708160145.13865-1-nyh@scylladb.com>
2019-09-11 15:48:17 +03:00
Nadav Har'El
9eea90d30d alternator-test: another test for BatchWriteItem
This adds another test for BatchWriteItem: That if one of the operations is
invalid - e.g., has a wrong key type - the entire batch is rejected, and
none of its operations are done - even the valid ones.

The test succeeds, because we already handle this case correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190707134610.30613-1-nyh@scylladb.com>
2019-09-11 15:48:16 +03:00
Nadav Har'El
01f4cf1373 alternator-test: test UpdateItem's SET with #reference
Test an operation like SET #one = #two, where the RHS has a reference
to a name, rather than the name itself. Also verify that DynamoDB
gives an error if ExpressionAttributeNames includes names needed
by neither the left- nor the right-hand side of such assignments.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708133311.11843-1-nyh@scylladb.com>
2019-09-11 15:48:15 +03:00
Piotr Sarna
e482f27e2f alternator-test: add test for reading key before write
The test case checks if reading keys in order to use their values
in read-before-write updates works fine.
2019-09-11 15:48:14 +03:00
Piotr Sarna
7b605d5bec alternator-test: add test case for nested read-before-write
A test for read-before-write in nested paths (inside a function call
or inside a +/- operator) is added.
2019-09-11 15:48:13 +03:00
Piotr Sarna
da795d8733 alternator-test: enable basic read-before-write cases
With unsafe read-before-write implemented, simple cases can be enabled
by removing their xfail flag.
2019-09-11 15:48:12 +03:00
Piotr Sarna
2e473b901a alternator: fix indentation
2019-09-11 15:48:09 +03:00
Piotr Sarna
bf13564a9d alternator: add unsafe read-before-write to update_item
In order to serve update requests that depend on read-before-write,
a proper helper function which fetches the existing item with a given
key from the database is added.
This read-before-write mechanism is not considered safe, because it
provides no linearizability guarantees and offers no synchronization
protection. As such, it should be considered a placeholder that works
fine on a single machine and/or with no concurrent access to the same key.
2019-09-11 15:45:21 +03:00
Piotr Sarna
2fb711a438 alternator: add context parameters to calculate_value
The calculate_value utility function is going to need more context
in order to resolve paths present in the right-hand side of update_item
operators: update_info and schema.
2019-09-11 15:40:17 +03:00
Piotr Sarna
cbe1836883 alternator: add allowing key columns when resolving path
Historically, resolving a path checked for key columns, which are not
allowed to be on the left-hand side of the assignment. However, path
resolving will now also be used for right-hand side, where it should
be allowed to use the key value.
2019-09-11 15:39:15 +03:00
Piotr Sarna
20a6077fb3 alternator: add optional previous item to calculate_value
In order to implement read-before-write in the future, calculate_value
now accepts an additional parameter: previous_item. If read-before-write
was performed, previous_item will contain an item for the given key
which already exists in the database at the time of the update.
2019-09-11 15:38:13 +03:00
Piotr Sarna
784aaaa8ff alternator: move describe_item implementation up
It will be needed later to add read-before-write to update_item.
2019-09-11 15:37:13 +03:00
Nadav Har'El
bd4dfa3724 alternator-test: move create_test_table() to util.py
This patch moves the create_test_table() utility function, which creates
a test table with a unique name, from the fixtures (conftest.py) to
util.py. This will allow reusing this function in tests which need to
create tables but not through the existing fixtures. In particular
we will need to do this for GSI (global secondary index) tests
in the next patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708104438.5830-1-nyh@scylladb.com>
2019-09-11 15:37:12 +03:00
Nadav Har'El
ce13a0538c alternator-test: expand tests of duplicate items in BatchWriteItem
The tests we had for BatchWriteItem's refusal to accept duplicate keys
only used test_table_s, with just a hash key. This patch adds tests
for test_table, i.e., a table with both hash and sort keys - to check
that we check duplicates in that case correctly as well.

Moreover, the expanded tests also verify that although identical
keys are not allowed, keys with just one component (hash or sort key)
the same but the other not the same - are fine.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190705191737.22235-1-nyh@scylladb.com>
2019-09-11 15:37:11 +03:00
Nadav Har'El
9bc2685a92 alternator-test: run local tests without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.

Also modified the README to be clearer, and more focused on the local
runs.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708121420.7485-1-nyh@scylladb.com>
2019-09-11 15:37:10 +03:00
Nadav Har'El
cb42c75e0a alternator-test: don't hardcode us-east-1 region
For "--aws" tests, use the default region chosen by the user in the
AWS configuration (~/.aws/config or environment variable), instead of
hard-coding "us-east-1".

Patch by Pekka Enberg.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708105852.6313-1-nyh@scylladb.com>
2019-09-11 15:37:09 +03:00
Piotr Sarna
8f9e720f10 alternator-test: enable precision test for add
With big_decimal-based implementation, the precision test passes.
Message-Id: <6d631a43901a272cb9ebd349cb779c9677ce471e.1562318971.git.sarna@scylladb.com>
2019-09-11 15:37:08 +03:00
Piotr Sarna
78e495fac3 alternator: allow arithmetics without losing precision
Calculating value represented as 'v1 + v2' or 'v1 - v2' was previously
implemented with a double type, which offers limited precision.
From now on, these computations are based on big_decimal, which
allows returning values without losing precision.
This patch depends on 'add big_decimal arithmetic operators' series.
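
The difference between the two approaches is easy to reproduce in Python,
with float playing the role of the old double-based code and decimal.Decimal
standing in for big_decimal. This is an analogy only; note that Python's
default decimal context is limited to 28 significant digits.

```python
from decimal import Decimal


def add_as_double(v1: str, v2: str) -> float:
    """Old approach: convert to binary floating point, then add."""
    return float(v1) + float(v2)


def add_as_decimal(v1: str, v2: str) -> Decimal:
    """New approach: add the decimal representations directly."""
    return Decimal(v1) + Decimal(v2)


print(add_as_double("0.1", "0.2") == 0.3)
print(add_as_decimal("0.1", "0.2") == Decimal("0.3"))
```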
Message-Id: <f741017fe3d3287fa70618068bdc753bfc903e74.1562318971.git.sarna@scylladb.com>
2019-09-11 15:36:08 +03:00
Piotr Sarna
466f25b1e8 alternator-test: enable batch duplication cases
With duplication checks implemented, batch write and delete tests
no longer need to be marked @xfail.
Message-Id: <6c5864607e06e8249101bd711dac665743f78d9f.1562325663.git.sarna@scylladb.com>
2019-09-11 15:36:07 +03:00
Piotr Sarna
eb7ada8387 alternator: add checking for duplicate keys in batches
Batch writes and batch deletes do not allow multiple entries
for the same key. This patch implements checking for duplicated
entries and throws an error if applicable.
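
A duplicate check of this kind can be sketched in Python as follows. The
helper name and the error text are hypothetical; keys are assumed to be
DynamoDB-style JSON objects, and a canonical JSON dump is used here simply
as a convenient hashable form.

```python
import json


def check_no_duplicate_keys(keys):
    """Raise ValueError if the same full key appears twice in a batch."""
    seen = set()
    for key in keys:
        # sort_keys makes the dump independent of attribute order.
        canonical = json.dumps(key, sort_keys=True)
        if canonical in seen:
            raise ValueError("duplicate key in batch: " + canonical)
        seen.add(canonical)


# Same hash key but different sort keys: not a duplicate.
check_no_duplicate_keys([
    {"p": {"S": "a"}, "c": {"S": "x"}},
    {"p": {"S": "a"}, "c": {"S": "y"}},
])
```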
Message-Id: <450220ba74f26a0893430cb903e4749f978dfd31.1562325663.git.sarna@scylladb.com>
2019-09-11 15:35:01 +03:00
Nadav Har'El
b810fa59c4 alternator-test: move utility functions to a new "util.py"
Move some common utility functions to a common file "util.py"
instead of repeating them in many test files.

The utility functions include random_string(), random_bytes(),
full_scan(), full_query(), and multiset() (the more general
version, which also supports freezing nested dicts).
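
For readers unfamiliar with the idea, a multiset() helper that tolerates
nested dicts might look roughly like this. It is a sketch written for this
note, not the code in util.py, and the freeze() helper name is an assumption.

```python
from collections import Counter


def freeze(item):
    """Recursively convert dicts, lists and sets into hashable equivalents."""
    if isinstance(item, dict):
        return frozenset((key, freeze(value)) for key, value in item.items())
    if isinstance(item, (list, tuple)):
        return tuple(freeze(value) for value in item)
    if isinstance(item, set):
        return frozenset(freeze(value) for value in item)
    return item


def multiset(items):
    """Count items regardless of order, so two result lists can be compared."""
    return Counter(freeze(item) for item in items)


a = [{"p": "a", "m": {"x": 1}}, {"p": "b"}]
b = [{"p": "b"}, {"m": {"x": 1}, "p": "a"}]
print(multiset(a) == multiset(b))
```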

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190705081013.1796-1-nyh@scylladb.com>
2019-09-11 15:35:00 +03:00
Nadav Har'El
2fb77ed9ad alternator: use std::visit for reading std::variant
The idiomatic way to use an std::variant depending on the type it holds is to use
std::visit. This modern API makes it unnecessary to write many boiler-plate
functions to test and cast the type of the variant, and makes it impossible
to forget one of the options. So in this patch we throw out the old ways,
and welcome the new.

Thanks to Piotr Sarna for the idea.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190704205625.20300-1-nyh@scylladb.com>
2019-09-11 15:33:57 +03:00
Nadav Har'El
4d07e2b7c5 alternator: support BatchGetItem
This patch adds to Alternator an implementation of the BatchGetItem
operation, which allows starting a number of GetItem requests in parallel
in a single request.

The implementation is almost complete - the only missing feature is the
ability to ask only for non-top-level attributes in ProjectionExpression.
Everything else should work, and this patch also includes tests which,
as usual, pass on DynamoDB and now also on Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:33:50 +03:00
Nadav Har'El
d1a5512a35 alternator: fix second boot
Amazingly, it appears we never tested booting Alternator a second time :-)

Our initialization code creates a new keyspace, and was supposed to ignore
the error if this keyspace already existed - but we thought the error would
come as an exceptional future, which it didn't - it came as a thrown
exception. So we need to change handle_exception() to a try/catch.

With this patch, I can kill Alternator and it will correctly start again.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:22:48 +03:00
Nadav Har'El
374162f759 alternator: generate error on spurious key columns
Operations which take a key as parameter, namely GetItem, UpdateItem,
DeleteItem and BatchWriteItem's DeleteRequest, already fail if the given
key is missing one of the necessary key attributes, or has the wrong types
for them. But they should also fail if the given key has spurious
attributes beyond those actually needed in a key.
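The three failure cases (missing attribute, wrong type, spurious attribute) can be sketched as follows. This is an illustrative Python sketch under assumptions: validate_key and the key_schema mapping are hypothetical names, not Alternator's actual code.

```python
def validate_key(key, key_schema):
    """Check that `key` has exactly the attributes named in `key_schema`.

    `key_schema` maps each key attribute name to its expected DynamoDB
    type letter ("S", "B" or "N"). Hypothetical helper.
    """
    for name, expected_type in key_schema.items():
        if name not in key:
            raise ValueError(f"missing key attribute: {name}")
        if list(key[name].keys()) != [expected_type]:
            raise ValueError(f"wrong type for key attribute: {name}")
    # The check added by this patch: nothing beyond the schema is allowed.
    spurious = set(key) - set(key_schema)
    if spurious:
        raise ValueError(f"spurious key attributes: {sorted(spurious)}")
```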

So this patch adds this check, and tests to confirm that we do these checks
correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:21:50 +03:00
Nadav Har'El
da4da6afbf alternator: fix PutItem to really replace item.
The PutItem operation, and also the PutRequest of BatchWriteItem, are
supposed to completely replace the item - not to merge the new value with
the previous value. We implemented this wrongly - we just wrote the new
item forgetting a tombstone to remove the old item.

So this patch fixes these operations, and adds tests which confirm the
fix (as usual, these tests pass on DynamoDB, failed on Alternator before
this patch, and pass after the patch).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:20:55 +03:00
Nadav Har'El
a0fffcebde alternator: add support for DeleteRequest in BatchWriteItem
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:20:01 +03:00
Nadav Har'El
83b91d4b49 alternator: add DeleteItem
Add support for the DeleteItem operation, which deletes an item.

The basic deletion operation is supported. Still not supported are:

1. Parameters to conditionally delete (ConditionExpression or Expected)
2. Parameters to return pre-delete content
3. ReturnItemCollectionMetrics (statistics relevant for tables with LSI)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:19:46 +03:00
Nadav Har'El
b09603ed9b alternator: cleaner error on DeleteRequest
In BatchWriteItem, we currently only support the PutRequest operation.
If a user tries to use DeleteRequest (which we don't support yet), he
will get a bizarre error. Let's test the request type more carefully,
and print a better error message. This will also be the place where
eventually we'll actually implement the DeleteRequest.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:16:02 +03:00
Nadav Har'El
a7f7ce1a73 alternator-test: tests for BatchWriteItem
This patch adds more comprehensive tests for the BatchWriteItem operation,
in a new file batch_test.py. The one test we already had for it was also
moved from test_item.py here.

Some of the tests still xfail for two reasons:
1. Support for the DeleteRequest operation of BatchWriteItem is missing.
2. Tests that forbid duplicate keys in the same request are missing.

As usual, all tests succeed on DynamoDB, and hopefully (I tried...)
cover all the BatchWriteItem features.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:16:01 +03:00
Nadav Har'El
a8dd3044e2 alternator: support (most of) ProjectionExpression
DynamoDB has two similar parameters - AttributesToGet and
ProjectionExpression - which are supported by the GetItem, Scan and
Query operations. Until now we supported only the older AttributesToGet,
and this patch adds support for the newer ProjectionExpression.

Besides having a different syntax, the main difference between
AttributesToGet and ProjectionExpression is that the latter also
allows fetching only a specific nested attribute, e.g., a.b[3].c.
We do not support this feature yet, although it would not be
hard to add it: With our current data representation, it means
fetching the top-level attribute 'a', whose value is a JSON, and then
post-filtering it to take out only the '.b[3].c'. We'll do that
later.
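The post-filtering step described above could look roughly like this Python sketch. It assumes the path has already been parsed into a list of components (strings for map members, integers for list indexes); extract_path is a hypothetical name, and the sketch returns only the nested value rather than the response shape DynamoDB uses.

```python
def extract_path(value, components):
    """Follow `components` into a decoded JSON `value`.

    A str component selects a map member, an int selects a list index.
    Returns None if the path does not exist. Illustrative sketch only.
    """
    for comp in components:
        if isinstance(comp, int):
            if not isinstance(value, list) or not 0 <= comp < len(value):
                return None
            value = value[comp]
        else:
            if not isinstance(value, dict) or comp not in value:
                return None
            value = value[comp]
    return value

# Top-level attribute 'a' as fetched from storage, then filtered by '.b[3].c'.
attribute_a = {"b": [0, 1, 2, {"c": "hello"}]}
nested = extract_path(attribute_a, ["b", 3, "c"])
```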

This patch also adds more test cases to test_projection_expression.py.
All tests except three which check the nested attributes now pass,
and those three xfail (they succeed on DynamoDB, and fail as expected
on Alternator), reminding us what still needs to be done.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:15:01 +03:00
Nadav Har'El
98c4e646a5 alternator-test: tests for yet-unimplemented ProjectionExpression
Our GetItem, Query and Scan implementations support the AttributesToGet
parameter to fetch only a subset of the attributes, but we don't yet
support the more elaborate ProjectionExpression parameter, which is
similar but has a different syntax and also allows specifying nested
document paths.

This patch adds extensive testing of all the ProjectionExpression features.
All these tests pass against DynamoDB, but fail against the current
Alternator so they are marked "xfail". These tests will be helpful for
developing the ProjectionExpression feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:15:00 +03:00
Nadav Har'El
7c9e64ed81 alternator-test: more tests for AttributesToGet parameter
The AttributesToGet parameter - saying which attributes to fetch for each
item - is already supported in the GetItem, Query and Scan operations.
However, we only had a test for it for Scan. This patch adds
similar tests also for the GetItem and Query operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:59 +03:00
Nadav Har'El
9c53f33003 alternator-test: another test for top-level attribute overwrite
Yet another test for overwriting a top-level attribute which contains
a nested document - here, overwriting it by just a string.

This test passes. In the current implementation we don't yet support
updates to specific attribute paths (e.g. a.b[3].c) but we do fully
support writing and overwriting top-level attributes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:58 +03:00
Nadav Har'El
f6fa971e96 alternator: initial implementation of "+" and "-" in UpdateExpression
This patch implements the last (finally!) syntactic feature of the
UpdateExpression - the ability to do SET a=val1+val2 (where, as
before, each of the values can be a reference to a value, an
attribute path, or a function call).

The implementation is not perfect: It adds the values as double-precision
numbers, which can lose precision. So the patch adds a new test which
checks that the precision isn't lost - a test that currently fails
(xfail) on Alternator, but passes on DynamoDB. The pre-existing test
for adding small integers now passes on Alternator.
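The precision loss is easy to reproduce outside Alternator. The following Python sketch uses the standard decimal module as a stand-in for an arbitrary-precision number type; the operand values are chosen only for illustration.

```python
from decimal import Decimal

a, b = "12345678901234567890", "1"

# Added as double-precision floats: a double has about 15-17 significant
# decimal digits, so the low digits of the 20-digit operand are lost.
float_sum = float(a) + float(b)

# Added as arbitrary-precision decimals: the result is exact.
decimal_sum = Decimal(a) + Decimal(b)
```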

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:01 +03:00
Nadav Har'El
a5af962d80 alternator: support the list_append() function in UpdateExpression
In the previous patch we added function-call support in the UpdateExpression
parser. In this patch we add support for one such function - list_append().
This function takes two values, confirms they are lists, and concatenates
them. After this patch only one function remains unimplemented:
if_not_exists().

We also split the test we already had for list_append() into two tests:
One uses only value references (":val") and passes after this patch.
The second test also uses references to other attributes and will only
work after we start supporting read-modify-write.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:13:07 +03:00
Nadav Har'El
9d2eba1c75 alternator: parse more types of values in UpdateExpression
Until this patch, in update expressions like "SET a = :val", we only
allowed the right-hand-side of the assignment to be a reference to a
value stored in the request - like ":val" in the above example.

But DynamoDB also allows the value to be an attribute path (e.g.,
"a.b[3].c"), and it can also be a function of a bunch of other values.
This patch adds support for parsing all these value types.

This patch only adds the correct parsing of these additional types of
values, but they are still not supported: reading existing attributes
(i.e., read-modify-write operations) is still not supported, and
neither of the two functions which UpdateExpression needs to support
is supported yet. Nevertheless, the parsing is now correct, and the
"unknown_function" test starts to pass.

Note that DynamoDB allows the right-hand side of an assignment to be
not only a single value, but also value+value and value-value. This
possibility is not yet supported by the parser and will be added
later.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:12:06 +03:00
Piotr Sarna
cb50207c7b alternator-test: add initial filtering test for scans
Currently the only supported case is equality on non-key attributes.
More complex filtering tests are also included in test_query.py.
2019-09-11 15:12:05 +03:00
Piotr Sarna
b5eb3aed10 alternator-test: add initial filtering test for query
The test cases verify that equality-based filtering on non-key
attributes works fine. It also contains test stubs for key filtering
and non-equality attribute filtering.
2019-09-11 15:12:04 +03:00
Piotr Sarna
319e946d8f alternator-test: diversify attribute values in filled test table
Filled test table used to have identical non-key attributes for all
rows. These values are now diversified in order to allow writing
filtering test cases.
2019-09-11 15:12:03 +03:00
Piotr Sarna
e4516617eb alternator: add filtering to Query
Query requests now accept QueryFilter parameter.
2019-09-11 15:11:10 +03:00
Piotr Sarna
4ea02bec89 alternator: enable filtering for Scan
Scans can now accept ScanFilter parameter to perform filtering
on returned rows.
2019-09-11 15:10:12 +03:00
Piotr Sarna
8cb078f757 alternator: add initial filtering implementation
Filtering is currently only implemented for the equality operator
on non-key attributes.
Next steps (TODO) involve:
1. Implementing filtering for key restrictions
2. Implementing non-key attribute filtering for operators other than EQ.
   It, in turn, may involve introducing 'map value restrictions' notion
   to Scylla, since now it only allows equality restrictions on map
   values (alternator attributes are currently kept in a CQL map).
3. Implementing FilterExpression in addition to deprecated QueryFilter
2019-09-11 15:08:50 +03:00
Nadav Har'El
aa94e7e680 alternator: clean up parsing of attribute-path components
Before this patch, we read either an attribute name like "name" or
a reference to one "#name", as one type of token - NAME.
However, while attribute paths indeed can use either one, in some other
contexts - such as a function name - only "name" is allowed, so we
need to distinguish between two types of tokens: NAME and NAMEREF.

While separating those, I noticed that we incorrectly allowed a "#"
followed by *zero* alphanumeric characters to be considered a NAMEREF,
which it shouldn't. In other words, NAMEREF should have ALNUM+, not ALNUM*.
Same for VALREF, which can't be just a ":" with nothing after it.
So this patch fixes these mistakes, and adds tests for them.
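The ALNUM* versus ALNUM+ difference can be shown with regular expressions. This is a Python sketch, not the ANTLR grammar from the patch, and the character class [A-Za-z0-9] is an assumption standing in for ALNUM.

```python
import re

# Buggy token shape: "#" followed by ZERO or more alphanumerics.
NAMEREF_OLD = re.compile(r"#[A-Za-z0-9]*")
# Fixed token shapes: at least one alphanumeric is required.
NAMEREF_NEW = re.compile(r"#[A-Za-z0-9]+")
VALREF_NEW = re.compile(r":[A-Za-z0-9]+")

def is_token(pattern, text):
    """True if the whole of `text` is a single token of this shape."""
    return pattern.fullmatch(text) is not None
```

With these definitions, is_token(NAMEREF_OLD, "#") is true (the bug), while is_token(NAMEREF_NEW, "#") and is_token(VALREF_NEW, ":") are both false.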

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:08:36 +03:00
Nadav Har'El
13476c8202 alternator: complain about unused values or names in UpdateExpression
DynamoDB complains, and fails an update, if the update contains in
ExpressionAttributeNames or ExpressionAttributeValues names which aren't
used by the expression.

Let's do the same, although sadly this means more work to track which
of the references we've seen and which we haven't.
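A rough Python sketch of the bookkeeping involved. It scans the expression with a regular expression instead of a real parser, which is a simplification, and find_unused_references is a hypothetical name.

```python
import re

def find_unused_references(expression, names, values):
    """Return the supplied "#name" / ":value" references never used.

    `names` and `values` play the role of ExpressionAttributeNames and
    ExpressionAttributeValues. Simplified: a real implementation would
    record references while parsing rather than re-scan the text.
    """
    used = set(re.findall(r"[#:][A-Za-z0-9]+", expression))
    supplied = set(names) | set(values)
    return sorted(supplied - used)

unused = find_unused_references(
    "SET #a = :x",
    {"#a": "a", "#b": "b"},
    {":x": 1, ":y": 2},
)
```

A non-empty result is the condition under which the update should fail.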

This patch makes two previously xfail (expected fail) tests become
successful tests on Alternator (they always succeeded against DynamoDB).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:35 +03:00
Nadav Har'El
c4fc02082b alternator-test: complete test for UpdateItem's UpdateExpression
The existing tests in test_update_expression.py thoroughly tested the
UpdateExpression features which we currently support. But tests for
features which Alternator *doesn't* yet support were partial.

In this patch, we add a large number of new tests to
test_update_expression.py aiming to cover ALL the features of
UpdateExpression, regardless of whether we already support it in
Alternator or not. Every single feature and esoteric edge-case I could
discover is covered in these tests - and as far as I know these tests
now cover the *entire* UpdateExpression feature. All the tests succeed
on DynamoDB, and confirm our understanding of what DynamoDB actually does
on all these cases.

After this patch, test_update_expression.py is a whopper, with 752 lines of
code and 37 separate test functions. 23 out of these 37 tests are still
"xfail" - they succeed on DynamoDB but fail on Alternator, because of
several features we are still missing. Those missing features include
direct updates of nested attributes, read-modify-write updates (e.g.,
"SET a=b" or "SET a=a+1"), functions (e.g., "SET a = list_append(a, :val)"),
the ADD and DELETE operations on sets, and various other small missing
pieces.

The benefit of this whopper test is two-fold: First, it will allow us
to test our implementation as we continue to fill it (i.e., "test-
driven development"). Second, all these tested edge cases basically
"reverse engineer" how DynamoDB's expression parser is supposed to work,
and we will need this knowledge to implement the still-missing features of
UpdateExpression.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:34 +03:00
Nadav Har'El
ede5943401 alternator-test: test for UpdateItem's UpdateExpression
This patch adds an extensive array of tests for UpdateItem's UpdateExpression
support, which was introduced in the previous patch.

The tests include verification of various edge cases of the parser, support
for ":value" and "#name" references, functioning SET and REMOVE operations,
combinations of multiple such operations, and much more.

As usual, all these tests were run and succeed on DynamoDB, as well as on
Alternator - to confirm Alternator behaves the same as DynamoDB.

There are two tests marked "xfail" (expected to fail), because Alternator
still doesn't support the attribute copy syntax (e.g., "SET a = b",
doing a read-before-write).

There are some additional areas which we don't support - such as the DELETE
and ADD operations or SET with functions - but those areas aren't yet tested
in these tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:33 +03:00
Nadav Har'El
4baa0d3b67 alternator: enable support for UpdateItem's UpdateExpression
For the UpdateItem operation, so far we supported updates via the
AttributeUpdates parameter, specifying which attributes to set or remove
and how. But this parameter is considered deprecated, and DynamoDB supports
a more elaborate way to modify attributes, via an "UpdateExpression".

In the previous patch we added a function to parse such an UpdateExpression,
and in this patch we use the result of this parsing to actually perform
the required updates.

UpdateExpression is only partially supported after this patch. The basic
"SET" and "REMOVE" operations are supported, but various other cases aren't
fully supported and will be fixed in followup patches. The following
patch will add extensive tests to confirm exactly what works correctly
with the new UpdateExpression support.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:34 +03:00
Nadav Har'El
829bafd181 alternator: add expression parsers
The DynamoDB protocol is based on JSON, and most DynamoDB requests describe
the operation and its parameters via JSON objects such as maps and lists.
However, in some types of requests an "expression" is passed as a single
string, and we need to parse this string. These cases include:
1. Attribute paths, such as "a[3].b.c", are used in projection
 expressions as well as inside other expressions described below.
2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
 used in conditional updates, filters, and other places.
3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"

This patch introduces the framework to parse these expressions, and
an implementation of parsing update expressions. These update expressions
will be used in the UpdateItem operation in the next patch.

All these expression syntaxes are very simple: Most of them could be
parsed as regular expressions, or at most a simple hand-written lexical
analyzer and recursive-descent parser. Nevertheless, we decided to specify
these parsers in the same ANTLR3 language already used in the Scylla
project for parsing CQL, hopefully making these parsers easier to reason
about, and easier to change if needed - and reducing the amount of boiler-
plate code.
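
To illustrate how simple these syntaxes are, here is a hand-written Python sketch that splits an attribute path into components. It is the kind of alternative the paragraph above mentions, not the ANTLR3 grammar the patch actually adds, and the exact character rules for names are assumptions.

```python
import re

# Assumed token shapes: a name (optionally a "#name" reference) and "[digits]".
_NAME = re.compile(r"#?[A-Za-z][A-Za-z0-9_]*")
_INDEX = re.compile(r"\[(\d+)\]")

def parse_path(path):
    """Split a path like "a[3].b.c" into ["a", 3, "b", "c"]."""
    m = _NAME.match(path)
    if m is None:
        raise ValueError(f"path must start with a name: {path!r}")
    components = [m.group()]
    pos = m.end()
    while pos < len(path):
        if path[pos] == ".":
            # ".name" selects a member of a nested map.
            m = _NAME.match(path, pos + 1)
            if m is None:
                raise ValueError(f"expected a name at offset {pos + 1}")
            components.append(m.group())
        else:
            # "[n]" selects an element of a nested list.
            m = _INDEX.match(path, pos)
            if m is None:
                raise ValueError(f"unexpected character at offset {pos}")
            components.append(int(m.group(1)))
        pos = m.end()
    return components
```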

The parsing of update expressions is mostly complete except that in SET
actions, only the "path = value" form is supported and not yet
forms such as "path1 = path2" (which does read-before-write) or
"path1 = path1 + value" or "path = function(...)".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:12 +03:00
Nadav Har'El
f0f50607a7 alternator-test: split nested-document tests to new file
We need to write more tests for various cases of handling
nested documents and nested attributes. Let's collect them
all in the same test file.

This patch mostly moves existing code, but also adds one
small test, test_nested_document_attribute_write, which
just writes a nested document and reads it back (it's
mostly covered by the existing test_put_and_get_attribute_types,
but is specifically about a nested document).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:11 +03:00
Nadav Har'El
12abe8e797 alternator-test: make local test the default
We usually run Alternator tests against the local Alternator - testing
against AWS DynamoDB is rarer, and usually just done when writing the
test. So let's make "pytest" without parameters default to testing locally.
To test against AWS, use "pytest --aws" explicitly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:10 +03:00
Piotr Sarna
b67f22bfc6 alternator: move related functions to serialization.cc
Existing functions related to serialization and deserialization
are moved to serialization.cc source file.
Message-Id: <fb49a08b05fdfcf7473e6a7f0ac53f6eaedc0144.1559646761.git.sarna@scylladb.com>
2019-09-11 15:06:05 +03:00
Piotr Sarna
fdba9866fc alternator: apply new serialization to reads and writes
Attributes for reads (GetItem, Query, Scan, ...) and writes (PutItem,
UpdateItem, ...) are now serialized and deserialized in binary form
instead of raw JSON, provided that their type is S, B, BOOL or N.
Optimized serialization for the rest of the types will be introduced
as follow-ups.
Message-Id: <6aa9979d5db22ac42be0a835f8ed2931dae208c1.1559646761.git.sarna@scylladb.com>
2019-09-11 15:02:21 +03:00
Piotr Sarna
b3fd4b5660 alternator: add simple attribute serialization routines
Attributes used to be written into the database in raw JSON format,
which is far from optimal. This patch introduces more robust
serialization routines for simple alternator types: S, B, BOOL, N.
Serialization uses the first byte to encode attribute type
and follows with serializing data in binary form.
More complex types (sets, lists, etc.) are currently still
serialized in raw JSON and will be optimized in follow-up patches.
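The "type byte followed by binary data" layout can be sketched in Python as below. The tag values, the function names, and the choice to keep numbers as text are all assumptions for illustration; the commit log does not give the real encoding.

```python
import base64

# Hypothetical one-byte type tags (the real values are not given here).
TAGS = {"S": 0, "B": 1, "BOOL": 2, "N": 3}
TAG_TO_TYPE = {tag: name for name, tag in TAGS.items()}

def serialize(attr):
    """Encode a one-entry attribute dict such as {"S": "hi"} as bytes."""
    (type_name, value), = attr.items()
    tag = bytes([TAGS[type_name]])
    if type_name == "S":
        return tag + value.encode("utf-8")
    if type_name == "B":
        return tag + base64.b64decode(value)  # store raw bytes, not base64
    if type_name == "BOOL":
        return tag + (b"\x01" if value else b"\x00")
    return tag + value.encode("ascii")  # "N": kept as text in this sketch

def deserialize(data):
    """Inverse of serialize(): the first byte selects the decoder."""
    type_name, payload = TAG_TO_TYPE[data[0]], data[1:]
    if type_name == "S":
        return {"S": payload.decode("utf-8")}
    if type_name == "B":
        return {"B": base64.b64encode(payload).decode("ascii")}
    if type_name == "BOOL":
        return {"BOOL": payload == b"\x01"}
    return {"N": payload.decode("ascii")}
```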
Message-Id: <10955606455bbe9165affb8ac8fba4d9e7c3705f.1559646761.git.sarna@scylladb.com>
2019-09-11 15:01:07 +03:00
Piotr Sarna
27f00d1693 alternator: move error class to a separate header
Error class definitions were previously in server.hh, but they
are separate entities - future .cc files can use the errors without
the need of including server definitions.
Message-Id: <b5689e0f4c9f9183161eafff718f45dd8a61b653.1559646761.git.sarna@scylladb.com>
2019-09-11 14:52:58 +03:00
Nadav Har'El
52810d1103 configure.py: move alternator source files to separate list
For some unknown reason we put the list of alternator source files
in configure.py inside the "api" list. Let's move it into a separate
list.

We could have just put it in the scylla_core list, but that would cause
frequent and annoying patch conflicts when people add alternator source
files and Scylla core source files concurrently.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:52:39 +03:00
Nadav Har'El
d4b3c493ad alternator: stub support for UpdateItem with UpdateExpression
So far for UpdateItem we only supported the old-style AttributeUpdates
parameter, not the newer UpdateExpression. This patch begins the path
to supporting UpdateExpression. First, trying to use *both* parameters
should result in an error, and this patch does this (and tests this).
Second, passing neither parameter is allowed, and should result in
an *empty* item being created.

Finally, since today we do not yet support UpdateExpression, this patch
will cause UpdateItem to fail if UpdateExpression is used, instead of
silently being ignored as we did so far.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:40 +03:00
Nadav Har'El
04856a81f5 alternator-tests: two simple test for nested documents
This patch adds two simple tests for nested documents, which pass:

test_nested_document_attribute_overwrite() tests what happens when
we UpdateItem a top-level attribute to a dictionary. We already tested
this works on an empty item in a previous test, but now we check what
happens when the attribute already existed, and already was a dictionary,
and now we update it to a new dictionary. In the test attribute a was
{b:3, c:4} and now we update it to {c:5}. The test verifies that the new
dictionary completely replaces the old one - the two are not merged.
The new value of the attribute is just {c:5}, *not* {b:3, c:5}.

The second test verifies that the AttributeUpdates parameter of
UpdateItem cannot be used to update just a nested attribute.
Any dots in the attribute name are considered an actual dot - not
part of a path of attribute names.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:39 +03:00
Nadav Har'El
b782d1ef8d alternator-test: test_query.py: change item list comparison
Comparing two lists of items without regard for order is not trivial.
For this reason some tests in test_query.py only compare arrays of sort
keys, and those tests are fine.

But other tests used a trick of converting a list of items into a set
via set_of_frozen_elements() and comparing these sets. This trick is almost
correct, but it can miss cases where items repeat.

So in this patch, we replace the set_of_frozen_elements() approach by
a similar one using a multiset (set with repetitions) instead of a set.
A multiset in Python is "collections.Counter". This is the same approach
we also started to use in test_scan.py in a recent patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:38 +03:00
Nadav Har'El
15f47a351e alternator: remove unused code
Remove the incomplete and unused function to convert DynamoDB type names
to ScyllaDB type objects:

DynamoDB has a different set of types relevant for keys and for attributes.
We already have a separate function, parse_key_type(), for parsing key
types, and for attributes - we don't currently parse the type names at
all (we just save them as JSON strings), so the function we removed here
wasn't used, and was in fact #if'ed out. It was never completed, and it now
started to decay (the type for numbers is wrong), so we're better off
completely removing it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:50:44 +03:00
Nadav Har'El
b63bd037ea alternator: implement correct "number" type for keys
This patch implements a fully working number type for keys, and now
Alternator fully and correctly supports every key type - strings, byte
arrays, and numbers.

The patch also adds a test which verifies that Scylla correctly sorts
number sort keys, and also correctly retrieves them to the full precision
guaranteed by DynamoDB (38 decimal digits).

The implementation uses Scylla's "decimal" type, which supports arbitrary
precision decimal floating point, and in particular supports the precision
specified by DynamoDB. However, "decimal" is actually over-qualified for
this use, so might not be optimal for the more specific requirements of
DynamoDB. So a FIXME is left to optimize this case in the future.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:47 +03:00
Nadav Har'El
cb1b2b1fc2 alternator-test: test_scan.py: change item list comparison
Comparing two lists of items without regard for order is not trivial.
test_scan.py currently has two ways of doing this, both unsatisfactory:

1. We convert each list to a set via set_of_frozen_elements(), and compare
   the sets. But this comparison can miss cases where items repeat.

2. We use sorted() on the list. This doesn't work on Python 3 because
   it removed the ability to compare (with "<") dictionaries.

So in this patch, we replace both by a new approach, similar to the first
one except we use a multiset (set with repetitions) instead of a set.
A multiset in Python is "collections.Counter".
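A minimal sketch of the multiset comparison, assuming items are plain Python dicts as returned by the client library. The freeze() helper is a hypothetical name; multiset() mirrors the utility mentioned elsewhere in this log, but this body is not copied from it.

```python
from collections import Counter

def freeze(value):
    """Recursively turn dicts, lists and sets into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    return value

def multiset(items):
    """Order-insensitive view of `items` that still counts repetitions."""
    return Counter(freeze(item) for item in items)

item = {"p": "x", "c": "y"}
# A plain set cannot tell one copy of an item from two; a Counter can.
same_as_set = {freeze(i) for i in [item, item]} == {freeze(item)}
same_as_multiset = multiset([item, item]) == multiset([item])
```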

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:46 +03:00
Nadav Har'El
4a1b6bf728 alternator-test: drop "test_2_tables" fixture
Creating and deleting tables is the slowest part of our tests,
so we should lower the number of tables our tests create.

We had a "test_2_tables" fixture as a way to create two
tables, but since our tests already create other tables
for testing different key types, it's faster to reuse those
tables - instead of creating two more unused tables.

On my system, a "pytest --local", running all 38 tests
locally, drops from 25 seconds to 20 seconds.

As a bonus, we also have one fewer fixture ;-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:45 +03:00
Nadav Har'El
013fb1ae38 alternator-text: fix errors in len/length variable name
Also change "xrange" to "range" to appease Python 3

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:44 +03:00
Nadav Har'El
30a123d8ad DynamoDB limits the size of hash keys to 2048 bytes, sort keys
to 1024 bytes, and the entire item to 400 KB which therefore also
limits the size of one attribute. This test checks that we can
reach up to these limits, with binary keys and attributes.

The test does *not* check what happens once we exceed these
limits. In such a case, DynamoDB throws an error (I checked that
manually) but Alternator currently simply succeeds. If in the
future we decide to add artificial limits to Alternator as well,
we should add such tests as well.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:43 +03:00
Nadav Har'El
b91eca28bd alternator-test: don't use "len" as a parameter name
"len" is an unfortunate choice for a variable name, in case one
day the implementation may want to call the built-in "len" function.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:42 +03:00
Nadav Har'El
e21e0e6a37 alternator-test: test sort-key ordering - for both string and binary keys
We already have a test for *string* sort-key ordering of items returned
by the Scan operation, and this test adds a similar test for the Query
operation. We verify that items are retrieved in the desired sorted
order (sorted by the aptly-named sort key) and not in creation order
or any other wrong order.

But beyond just checking that Query works as expected (it should,
given it uses the same machinery as Scan), the nice thing about this
test is that it doesn't create a new table - it uses a shared table
and creates one random partition inside it. This makes this test
faster and easier to write (no need for a new fixture), and most
importantly - easily allows us to write similar tests for other
key types.

So this patch also tests the correct ordering of *binary* sort keys.
It helped expose bugs in previous versions of the binary key implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:41 +03:00
Nadav Har'El
1d058cf753 alternator-test: test item operations with binary keys
Simple tests for item operations (PutItem, GetItem) with binary key instead
of string for the hash and sort keys. We need to be able to store such
keys, and then retrieve them correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:40 +03:00
Nadav Har'El
4bfd5d7ed1 alternator: add support for bytes as key columns
Until now we only supported string for key columns (hash or sort key).
This patch adds support for the bytes type (a.k.a binary or blob) as well.
The last missing type to be supported in keys is the number type.

Note that in JSON, bytes values are represented with base64 encoding,
so we need to decode them before storing the decoded value, and re-encode
when the user retrieves the value. The decoding is important not just
for saving storage space (the encoding is 4/3 the size of the decoded)
but also for correct *sorting* of the binary keys.
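The sorting point can be checked with two one-byte keys. This Python sketch uses the standard base64 module; the key values are chosen only to make the effect visible.

```python
import base64

raw_keys = [b"\x00", b"\xff"]
# base64 maps b"\x00" to "AA==" and b"\xff" to "/w==".
encoded = [base64.b64encode(k).decode("ascii") for k in raw_keys]

# Sorting the decoded bytes gives the order a user expects.
sorted_raw = sorted(raw_keys)
# Sorting by the base64 text does not: "/" sorts before "A" in ASCII.
sorted_by_encoding = sorted(raw_keys, key=base64.b64encode)
```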

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:35 +03:00
Nadav Har'El
57b46a92d7 alternator: add base64 encoding and decoding functions
The DynamoDB API uses base64 encoding to encode binary blobs as JSON
strings. So we need functions to do these conversions.

This code was "inspired" by https://github.com/ReneNyffenegger/cpp-base64
but doesn't actually copy code from it.

I didn't write any specific unit tests for this code, but it will be
exercised and tested in a following patch which tests Alternator's use
of these functions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:46:13 +03:00
Piotr Sarna
0980fde9d5 alternator-test: add dedicated BEGINS_WITH case to Query
BEGINS_WITH behaves in a special way when a key postfix
consists of <255> bytes. The initial test does not use that
and instead checks UTF-8 characters, but once bytes type
is implemented for keys, it should also test specifically for
corner cases, like strings that consist of <255> byte only.
Message-Id: <fe10d7addc1c9d095f7a06f908701bb2990ce6fe.1558603189.git.sarna@scylladb.com>
2019-09-11 14:46:12 +03:00
Piotr Sarna
5bc7bb00e0 alternator-test: rename test_query_with_paginator
Paginator is an implementation detail and does not belong in the name,
and thus the test is renamed to test_query_basic_restrictions.
Message-Id: <849bc9d210d0faee4bb8479306654f2a59e18517.1558524028.git.sarna@scylladb.com>
2019-09-11 14:46:11 +03:00
Piotr Sarna
9e2ecf5188 alternator: fix string increment for BEGINS_WITH
BEGINS_WITH statement increments a string in order to compute
the upper bound for a clustering range of a query.
Unfortunately, previous implementation was not correct,
as it appended a <0> byte if the last character was <255>,
instead of incrementing a last-but-one character.
If the string contains <255> bytes only, the returned upper bound
is infinite.
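The corrected increment can be sketched in Python on byte strings. The name prefix_upper_bound and the use of None to stand for an infinite bound are assumptions for illustration.

```python
def prefix_upper_bound(prefix):
    """Smallest byte string greater than every string starting with `prefix`.

    Returns None when no finite bound exists (the prefix is empty or
    consists only of 0xFF bytes).
    """
    # Trailing 0xFF bytes cannot be incremented, so drop them ...
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        return None
    # ... and increment the last remaining byte instead.
    return stripped[:-1] + bytes([stripped[-1] + 1])
```

For example, the prefix b"a\xff" yields the bound b"b", whereas appending a zero byte would give b"a\xff\x00", which is smaller than matching keys such as b"a\xff\x01".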
Message-Id: <3a569f08f61fca66cc4f5d9e09a7188f6daad578.1558524028.git.sarna@scylladb.com>
2019-09-11 14:45:17 +03:00
Nadav Har'El
7b9180cd99 alternator: common get_read_consistency() function
We had several places in the code that need to parse the
ConsistentRead flag in the request. Let's add a function
that does this, and while at it, checks for more error
cases and also returns LOCAL_QUORUM and LOCAL_ONE instead
of QUORUM and ONE.
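
A Python sketch of what such a helper might do. The mapping of a true flag to LOCAL_QUORUM and a false or absent flag to LOCAL_ONE is an inference from the text above, and the error type is an assumption.

```python
def get_read_consistency(request):
    """Map the request's ConsistentRead flag to a consistency level name."""
    value = request.get("ConsistentRead", False)
    # One of the "more error cases": reject anything that is not a boolean.
    if not isinstance(value, bool):
        raise ValueError("ConsistentRead must be a boolean")
    return "LOCAL_QUORUM" if value else "LOCAL_ONE"
```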

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:44:24 +03:00
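
A rough Python sketch of what such a helper might look like follows. It is not the actual C++ function: the mapping of true to LOCAL_QUORUM and false to LOCAL_ONE, the default of false when the flag is absent, and the use of ValueError are all assumptions made for illustration.

```python
def get_read_consistency(request: dict) -> str:
    """Map the ConsistentRead flag of a request to a consistency level.

    Illustrative only; the real function lives in Alternator's C++ code.
    """
    # Assumption: a missing flag means an eventually consistent read.
    value = request.get("ConsistentRead", False)
    if not isinstance(value, bool):
        # One of the extra error cases: the flag must be a JSON boolean.
        raise ValueError("ConsistentRead must be a boolean")
    return "LOCAL_QUORUM" if value else "LOCAL_ONE"
```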
Nadav Har'El
56907bf6c6 alternator: for writes, use LOCAL_QUORUM instead of QUORUM
As Shlomi suggested in the past, it is more likely that when we
eventually support global tables, we will use LOCAL_QUORUM,
not QUORUM. So let's switch to that now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:44:20 +03:00
Nadav Har'El
8c347cc786 alternator-test: verify that table with only hash key also works
So far, all of the tests in test_item.py (for PutItem, GetItem, UpdateItem),
were arbitrarily done on a test table with both hash key and sort key
(both with string type). While this covers most of the code paths, we still
need to verify that the case where there is *not* a sort key, also works
fine. E.g., maybe we have a bug where a missing clustering key is handled
incorrectly or an error is incorrectly reported in that case?

But in this patch we add tests for the hash-key-only case, and see that
it already works correctly. No bug :-)

We add a new fixture test_table_s for creating a test table with just
a single string key. Later we'll probably add more of these test tables
for additional key types.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:41:16 +03:00
Nadav Har'El
c53b2ebe4d alternator-test: also test for missing part of key
Another type of key type error can be to forget part of the key
(the hash or sort key). Let's test that too (it already works correctly,
no need to patch the code).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:41:15 +03:00
Nadav Har'El
f58abb76d6 alternator: gracefully handle wrong key types
When a table has a hash key or sort key of a certain type (this can
be string, bytes, or number), one cannot try to choose an item using
values of different types.

We previously did not handle this case gracefully, and PutItem handled
it particularly badly - writing malformed data to the sstable and basically
hanging Scylla. In this patch we fix the pk_from_json() and ck_from_json()
functions to verify the expected type, and fail gracefully if the user
sent the wrong type.

This patch also adds tests for these failures, for the GetItem, PutItem,
and UpdateItem operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:40:23 +03:00
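
The type check described in the entry above can be sketched in Python as follows. The function key_value_from_json is hypothetical (the real checks are in the C++ pk_from_json() and ck_from_json()); it assumes the usual DynamoDB JSON encoding in which a value is a one-entry map from a type letter to the payload, such as {"S": "hello"}.

```python
def key_value_from_json(value: dict, expected_type: str):
    """Extract a key value, failing cleanly when its type is wrong.

    Hypothetical sketch; `expected_type` is "S", "B" or "N" as declared
    in the table's schema.
    """
    if len(value) != 1:
        raise ValueError("key value must have exactly one type tag")
    # Unpack the single (type letter, payload) pair.
    ((actual_type, payload),) = value.items()
    if actual_type != expected_type:
        raise ValueError(
            f"key type mismatch: expected {expected_type}, got {actual_type}"
        )
    return payload
```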
Nadav Har'El
9ee912d5cf alternator: correct handling of missing item in GetItem
According to the documentation, trying to GetItem a non-existent item
should result in an empty response - NOT a response with an empty "Item"
map as we did before this patch.

This patch fixes this case, and adds a test case for it. As usual,
we verify that the test case also works on Amazon DynamoDB, to verify
DynamoDB really behaves the way we think it does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:39:32 +03:00
Nadav Har'El
7f73f561d5 alternator: fix support for empty items
If an empty item (i.e., no attributes except the key) is created, or an item
becomes empty (by deleting its existing attributes), the empty item must be
maintained - it cannot just disappear. To do this in Scylla, we must add a
row marker - otherwise an empty attribute map is not enough to keep the
row alive.

This patch includes 4 test cases for the various ways an empty item can be
created or a non-empty item can be emptied, and verifies that the empty item
can be correctly retrieved (as usual, to verify that our expectation of
"correctness" is indeed correct, we run the same tests against DynamoDB).
All these 4 tests failed before this patch, and now succeed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:38:40 +03:00
Nadav Har'El
95ed2f7de8 alternator: remove two unused lines of code
These lines of code were superfluous and their result unused: the
make_item_mutation() function finds the pk and ck on its own.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:37:49 +03:00
Nadav Har'El
eb81b31132 alternator: add statistics
This patch adds a statistics framework to Alternator: Executor has (for
each shard) a _stats object which contains counters for various events,
and also is in charge of making these counters visible via Scylla's regular
metrics API (http://localhost:9180/metrics).

This patch includes a counter for each of DynamoDB's operation types,
and we increase the ones we support when handled. We also added counters
for total operations and unsupported operations (operation types we don't
yet handle). In the future we can easily add many more counters: Define
the counter in stats.hh, export it in stats.cc, and increment it
where relevant in executor.cc (or server.cc).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:36:26 +03:00
Piotr Sarna
d267e914ad alternator-test: add initial Query test
The test covers simple restrictions on primary keys.
Message-Id: <2a7119d380a9f8572210571c565feb8168d43001.1558356119.git.sarna@scylladb.com>
2019-09-11 14:36:25 +03:00
Piotr Sarna
b309c9d54b alternator: implement basic Query
The implementation covers the following restrictions
 - equality for hash key;
 - equality, <, <=, >, >=, between, begins_with for sort key.
Message-Id: <021989f6d0803674cbd727f9b8b3815433ceeea5.1558356119.git.sarna@scylladb.com>
2019-09-11 14:36:16 +03:00
Piotr Sarna
8571046d3e alternator: move do_query to separate function
A fair portion of code from scan() will be used later to implement
query(), so it's extracted as a helper function.
Message-Id: <d3bc163a1cb2032402768fcbc6a447192fba52a4.1558356119.git.sarna@scylladb.com>
2019-09-11 14:31:31 +03:00
Nadav Har'El
4a8b2c794d alternator-test: another edge case for Scan with AttributesToGet
Ask to retrieve only an attribute name which *none* of the items have.
The result should be a silly list of empty items, and indeed it is.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:30 +03:00
Nadav Har'El
c766d1153d alternator-test: shorten test_scan.py by reusing full_scan more
Use full_scan() in another test instead of open-coding the scan.
There are two more tests that could have used full_scan(), but
since they seem to be specifically adding more assertions or
using a different API ("paginators"), I decided to leave them
as-is. But new tests should use full_scan().

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:29 +03:00
Nadav Har'El
2666b29c77 alternator-test: test AttributesToGet parameter in Scan request
This is a short, but extensive, test of the AttributesToGet parameter
to Scan, allowing to select for output only some of the attributes.

The AttributesToGet feature has several non-obvious features. Firstly,
it doesn't require that any key attributes be selected. So since each
item may have different non-key attributes, some scanned items may
be missing some of the selected columns, and some of the items may
even be missing *all* the selected columns - in which case DynamoDB
returns an empty item (and doesn't entirely skip this item). This
test covers all these cases, and it adds yet another item to the
'filled_test_table' fixture, one which has different attributes,
so we can see these issues.

As usual, this test passes in both DynamoDB and Alternator, to
assure we correspond to the *right* behavior, not just what we
think is right.

This test actually exposed a bug in the way our code returned
empty items (items which had none of the selected columns),
a bug which was fixed by the previous patch.

Instead of having yet another copy of table-scanning code, this
patch adds a utility function full_scan(), to scan an entire
table (with optional extra parameters for the scan) and return
the result as an array. We should simplify existing tests in
test_scan.py by using this new function.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:28 +03:00
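
A utility like the full_scan() mentioned in the entry above could look roughly like this. It is a sketch, not the code from the test suite: it assumes `table` is a boto3 DynamoDB Table resource (or anything with a compatible scan() method) and follows the LastEvaluatedKey / ExclusiveStartKey paging protocol of the Scan API.

```python
def full_scan(table, **kwargs):
    """Scan an entire table, following pagination, and return all items.

    Sketch only. Extra keyword arguments (for example AttributesToGet)
    are passed through to every scan() call.
    """
    response = table.scan(**kwargs)
    items = list(response["Items"])
    # A LastEvaluatedKey in the response means more pages remain.
    while "LastEvaluatedKey" in response:
        response = table.scan(
            ExclusiveStartKey=response["LastEvaluatedKey"], **kwargs
        )
        items.extend(response["Items"])
    return items
```

A test could then call something like full_scan(table, AttributesToGet=["a"]) and compare the returned list against the expected items.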
Avi Kivity
446faba49c Merge "dbuild: add --image option, help, and usage" from Benny
* tag 'dbuild-image-help-usage-v1' of github.com:bhalevy/scylla:
  dbuild: add usage
  dbuild: add help option
  dbuild: list available images when no image arg is given
  dbuild: add --image option
2019-09-11 14:30:45 +03:00
Nadav Har'El
f871a4bc87 alternator: fix bug in returning an empty item in a Scan
When a Scan selects only certain attributes, and none of the key
attributes are selected, for some of the scanned items *nothing*
will remain to be output, but still Dynamo outputs an empty item
in this case. Our code had a bug where after each item we "moved"
the object leaving behind a null object, not an empty map, so a
completely empty item wasn't output as an empty map as expected,
and resulted in boto3 failing to parse the response.

This simple one-line patch fixes the bug, by resetting the item
to an empty map after moving it out.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:30:37 +03:00
Piotr Sarna
8525b14271 alternator: add lookup table for requests
Instead of using a really long if-else chain, requests are now
looked up via a routing table.
Message-Id: <746a34b754c3070aa9cbeaf98a6e7c6781aaee65.1557914794.git.sarna@scylladb.com>
2019-09-11 14:29:59 +03:00
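
The idea of replacing an if-else chain with a routing table can be sketched in Python as below. The handler functions, the ROUTES name and the error type are placeholders invented for illustration; the actual change is in Alternator's C++ server.

```python
def create_table(request: dict) -> dict:
    # Placeholder handler; a real one would create the table.
    return {"handled": "CreateTable"}


def put_item(request: dict) -> dict:
    # Placeholder handler; a real one would write the item.
    return {"handled": "PutItem"}


# Routing table: operation name -> handler function.
ROUTES = {
    "CreateTable": create_table,
    "PutItem": put_item,
}


def dispatch(operation: str, request: dict) -> dict:
    """Look the operation up in the table instead of chaining if/else."""
    handler = ROUTES.get(operation)
    if handler is None:
        raise ValueError(f"Unsupported operation {operation}")
    return handler(request)
```

Adding a new operation then means adding one entry to the table rather than another branch.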
Piotr Sarna
f3440f2e4a alternator-test: migrate filled_test_table to use batches
Filled test table fixture now takes advantage of batch writes
in order to run faster.
Message-Id: <e299cdffa9131d36465481ca3246199502d65e0c.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:58 +03:00
Piotr Sarna
4c3bdd3021 alternator-test: add batch writing test case
Message-Id: <a950799dd6d31db429353d9220b63aa96676a7a7.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:57 +03:00
Piotr Sarna
c0ecd1a334 alternator: add basic BatchWriteItem
The initial implementation only supports PutRequest requests,
without serving DeleteRequest properly.
Message-Id: <451bcbed61f7eb2307ff5722de33c2e883563643.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:50 +03:00
Nadav Har'El
9a0c13913d alternator: improve where DescribeEndpoints gets its information
Instead of blindly returning "localhost:8000" in response to
DescribeEndpoints and for sure causing us problems in the future,
the right thing to do is to return the same domain name which the
user originally used to get to us, be it "localhost:8000" or
"some.domain.name:1234". But how can we know what this domain name
was? Easy - this is why HTTP 1.1 added a mandatory "Host:" header,
and the DynamoDB driver I tested (boto3) adds it as expected,
indeed with the expected value of "localhost:8000" on my local setup.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:25:22 +03:00
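
A small Python sketch of the approach in the entry above: echo back whatever the client put in its Host header. The function name, the fallback to "localhost:8000" when the header is missing, and the CachePeriodInMinutes value are assumptions for illustration; the header lookup also assumes the key is spelled exactly "Host".

```python
def describe_endpoints(headers: dict) -> dict:
    """Build a DescribeEndpoints response from the request's Host header.

    Sketch only, not the Alternator implementation.
    """
    # Assumption: fall back to the old hard-coded value if Host is absent.
    address = headers.get("Host", "localhost:8000")
    return {
        "Endpoints": [
            # The cache period here is an arbitrary illustrative value.
            {"Address": address, "CachePeriodInMinutes": 1440}
        ]
    }
```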
Nadav Har'El
a4a3b2fe43 alternator-test: test for sort order of items in a single partition
Although different partitions are returned by a Scan in (seemingly)
random order, items in a single partition need to be returned sorted
by their sort key. This adds a test to verify this.

This patch adds to the filled_test_table fixture, which until now
had just one item in each partition, another partition (with the key
"long") with 164 additional items. The test_scan_sort_order_string
test then scans this table, and verifies that the items are really
returned in sorted order.

The sort order is, of course, string order. So we have the first
item with sort key "1", then "10", then "100", then "101", "102",
etc. When we implement numeric keys we'll need to add a version
of this test which uses a numeric clustering key and verifies the
sort order is numeric.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:25:21 +03:00
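
The difference between string order and numeric order mentioned in the entry above is easy to reproduce in plain Python. The sketch assumes, for illustration only, that the sort keys are the decimal strings "1" through "164".

```python
# Assumption: the 164 sort keys are the strings "1", "2", ..., "164".
keys = [str(i) for i in range(1, 165)]

# String (lexicographic) order, which a string sort key uses.
string_order = sorted(keys)

# Numeric order, which a future numeric sort key would need.
numeric_order = sorted(keys, key=int)

print(string_order[:5])   # ['1', '10', '100', '101', '102']
print(numeric_order[:5])  # ['1', '2', '3', '4', '5']
```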
Nadav Har'El
32c388b48c alternator: fix clustering key setup
Because of a typo, we incorrectly set the table's sort key as a second
partition key column instead of a clustering key column. This has bad
but subtle consequences - such as that the items are *not* sorted
according to the sort key. So in this patch we fix the typo.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:24:30 +03:00
Nadav Har'El
29e0f68ee0 alternator: add initial implementation of DescribeEndpoints
DescribeEndpoints is not a very important API (and by default, clients
don't use it) but I wanted to understand how DynamoDB responds to it,
and what better way than to write a test :-)

And then, if we already have a test, let's implement this request in
Scylla as well. This is a silly implementation, which always returns
"localhost:8000". In the future, this will need to be configurable -
we're not supposed here to return *this* server's IP address, but rather
a domain name which can be used to get to all servers.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:22:47 +03:00
Avi Kivity
211b0d3eb4 Merge "sstables, gdb: Centralize tracking of sstable instances" from Tomasz
"
Currently, GDB scripts locate sstables by scanning the heap for
bag_sstable_set containers. That has disadvantages:

  - not all containers are considered

  - it's extremely slow on large heaps

  - fragile, new containers can be added, and we won't even know

This series fixes all of the above by adding a per-shard sstable tracker
which tracks sstable objects in a linked-list.
"

* 'sstable-tracker' of github.com:tgrabiec/scylla:
  gdb: Use sstable tracker to get the list of sstables
  gdb: Make intrusive_list recognize member_hook links
  sstables: Track whether sstable was already open or not
  sstables: Track all instances of sstable objects
  sstables: Make sstable object not movable
  sstables: Move constructor out of line
2019-09-11 14:22:41 +03:00
Nadav Har'El
982b5e60e7 alternator: unify and improve TableName field handling
Most of the request types need a TableName parameter, specifying the
name of the table they operate on. There's a lot of boilerplate code
required to get this table name and verify that it is valid (the parameter
exists, is a string, passes DynamoDB's naming rules, and the table
actually exists), which resulted in a lot of code duplication - and
in some cases missing checks.

So this patch introduces two utility functions, get_table_name()
and get_table(), to fetch a table name or the schema of an existing
table, from the request, with all necessary validation. If validation
fails, the appropriate api_error() is thrown so the user gets the
right error message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:21:53 +03:00
Nadav Har'El
b8fc783171 alternator-test: clean up conftest.py
Remove unused random-string code from conftest.py, and also add a
TODO comment on how we should speed up the filled_test_table fixture by
using a batch write - when that becomes available in Alternator.
(right now this fixture takes almost 4 seconds to prepare on a local
Alternator, and a whopping 3 minutes (!) to prepare on DynamoDB).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:21:52 +03:00
Piotr Sarna
a4387079ac alternator-test: add initial scan test
Message-Id: <c28ff1d38930527b299fe34e9295ecd25607398c.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:51 +03:00
Piotr Sarna
b6d148c9e0 alternator-test: add filled test table fixture
The fixture creates a test table and fills it with random data,
which can be later used for testing reads.
Message-Id: <649a8b8928e1899c5cbd82d65d745a464c1163c8.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:50 +03:00
Piotr Sarna
4def674731 alternator: implement basic scan
The most basic version of Scan request is implemented.
It still contains a list of TODOs, among them support for the Segments
parameter for scan parallelism.
Message-Id: <5d1bfc086dbbe64b3674b0053e58a0439e64909b.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:39 +03:00
Piotr Sarna
0ce3866fb5 alternator: lower debug messages verbosity in the HTTP server
The HTTP server still uses WARN log level to log debug messages,
which is way higher than necessary. These messages are degraded
to TRACE level.
Message-Id: <59559277f2548d4046001bebff45ab2d3b7063b5.1557744617.git.sarna@scylladb.com>
2019-09-11 14:12:40 +03:00
Nadav Har'El
d45220fb39 alternator-test: simplify test_put_and_get_attribute_types
The test test_put_and_get_attribute_types needlessly named all the
different attributes and their variables, causing a lot of repetition
and chance for mistakes when adding additional attributes to the test.

In this rewrite, we only have a list of items, and automatically build
attributes with them as values (using sequential names for the attributes)
and check we read back the same item (Python's dict equality operator
checks the equality recursively, as expected).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:39 +03:00
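
The rewrite described in the entry above (a plain list of values, with attribute names generated automatically) might be sketched like this. The helper name build_item, the key attribute name "p" and the "attr<N>" naming scheme are made up for illustration and are not taken from the test file.

```python
def build_item(key: str, values: list) -> dict:
    """Build an item whose non-key attributes are named sequentially.

    Hypothetical helper: each value in `values` becomes the attribute
    "attr0", "attr1", and so on.
    """
    item = {"p": key}
    item.update({f"attr{i}": value for i, value in enumerate(values)})
    return item


# Adding another type to the test only means appending one more value.
values = ["hello", 42, True, None, [1, "a"], {"nested": {"x": 1}}]
item = build_item("some_key", values)
```

A test would then put `item`, read it back, and compare the two dicts with `==`, relying on Python's recursive dict equality as the commit message notes.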
Nadav Har'El
ea32841dab alternator-test: test all attribute types
Although we planned to initially support only string types, it turns out
for the attributes (*not* the key), we actually support all types already,
including all scalar types (string, number, bool, binary and null) and
more complex types (list, nested document, and sets).

This adds a test which PutItem's these types and verifies that we can
retrieve them.

Note that this test deals with top-level attributes only. There is no
attempt to modify only a nested attribute (and with the current code,
it wouldn't work).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:38 +03:00
Nadav Har'El
c645538061 alternator-test: rewrite ListTables test
In our tests, we cannot really assume that ListTables returns *only*
the tables we created for the test, or even that a page size of 100 will
be enough to list our 3 tables. The issue is that on a shared DynamoDB, or
in hypothetical cases where multiple tests are run in parallel, or previous
tests had catastrophic errors and failed to clean up, we have no idea how
many unrelated tables there are in the system. There may be hundreds of
them.  So every ListTables test will need to use paging.

So in this re-implementation, we begin with a list_tables() utility function
which calls ListTables multiple times to fetch all tables, and return the
resulting list (we assume this list isn't so huge it becomes unreasonable
to hold it in memory). We then use this utility function to fetch the table
list with various page sizes, and check that the test tables we created are
listed in the resulting list.

There's no longer a separate test for "all" tables (really was a page of 100
tables) and smaller pages (1,2,3,4) - we now have just one test that does the
page sizes 1,2,3,4, 50 and 100.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:37 +03:00
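
The list_tables() utility described in the entry above could be sketched as follows. This is not the code from the test suite; it assumes `client` is a boto3 DynamoDB client (or a compatible stand-in) and uses the ListTables paging fields LastEvaluatedTableName and ExclusiveStartTableName.

```python
def list_tables(client, limit: int) -> list:
    """Call ListTables repeatedly with page size `limit` and return
    the names of all tables.

    Sketch only; assumes the full list fits comfortably in memory.
    """
    names = []
    kwargs = {"Limit": limit}
    while True:
        response = client.list_tables(**kwargs)
        names.extend(response.get("TableNames", []))
        last = response.get("LastEvaluatedTableName")
        if last is None:
            # No continuation marker: this was the final page.
            return names
        kwargs["ExclusiveStartTableName"] = last
```

A test can then loop over page sizes such as 1, 2, 3, 4, 50 and 100 and assert that each of its own tables appears in the returned list, without assuming the list contains nothing else.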
Piotr Sarna
6b83e17b74 alternator: add tests to ListTables command
Test cases cover both listing appropriate table names
and pagination.
Message-Id: <e7d5f1e5cce10c86c47cdfb4d803149488935ec0.1557402320.git.sarna@scylladb.com>
2019-09-11 14:12:36 +03:00
Piotr Sarna
dfbf4ffe0f alternator-test: add 2 tables fixture
For some tests, more than 1 table is needed, so another fixture
that provides two additional test tables is added.
Message-Id: <75ae9de5cc1bca19594db1f0bc03260f83459380.1557402320.git.sarna@scylladb.com>
2019-09-11 14:12:35 +03:00
Piotr Sarna
b6dde25bcc alternator: implement ListTables
ListTables is used to extract all table names created so far.
Message-Id: <04f4d804a40ff08a38125f36351e56d7426d2e3d.1557402320.git.sarna@scylladb.com>
2019-09-11 14:10:54 +03:00
Piotr Sarna
b73a9f3744 alternator: use trace level for debug messages
In the early development stage, warn level was used for all
debug messages, while it's more appropriate to use 'trace' or 'debug'.
Message-Id: <419ca5a22bc356c6e47fce80b392403cefbee14d.1557402320.git.sarna@scylladb.com>
2019-09-11 14:10:02 +03:00
Nadav Har'El
4ed9aa4fb4 alternator-test: cleanup in conftest.py
This patch cleans up some comments and reorganizes some functions in
conftest.py, where the test_table fixture was defined. The goal is to
later add additional types of test tables with different schemas (e.g.,
just a partition key, different key types, etc.) without too much
code duplication.

This patch doesn't change anything functional in the tests, and they
still pass ("pytest --local" runs all tests against the local Alternator).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:10:01 +03:00
Nadav Har'El
5c564b7117 alternator: make ck_from_json() easier to use
The ck_from_json() utility function is easier to use if it handles
the no-clustering-key case itself, as the callers need that too, instead of
requiring them to handle the no-clustering-key case separately.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:09:06 +03:00
Nadav Har'El
3ae0066aae alternator: add support for UpdateItem's DELETE operation
So far we supported UpdateItem only with PUT operations - this patch
adds support for DELETE operations, to delete specific attributes from
an item.

Only the case of a missing value is supported. DynamoDB also provides
the ability to pass the old value, and only perform the deletion if
the value and/or its type is still up-to-date - but we don't support
this yet and fail such a request if it is attempted.

This patch also includes a test for this case in alternator-test/

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:08:57 +03:00
Nadav Har'El
81679d7401 alternator-test: add tests for UpdateItem
Add initial tests for UpdateItem. Only the features currently supported
by our code (only string attributes, only "PUT" action) are tested.

As usual, this test (like all others) was tested to pass on both DynamoDB
and Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:03:10 +03:00
Nadav Har'El
0c2a440f7f alternator: add initial UpdateItem implementation
Add an initial UpdateItem implementation. As with PutItem and GetItem, we
are still limited to string attributes. This initial implementation
of UpdateItem implements only the "PUT" action (not "DELETE" and
certainly not "ADD") and not any of the more advanced options.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:03:00 +03:00
Piotr Sarna
686d1d9c3c alternator: add attrs_column() helper function
Message-Id: <d93ae70ccd27fe31d0bc6915a20d83d7a85342cf.1557223199.git.sarna@scylladb.com>
2019-09-11 13:08:52 +03:00
Piotr Sarna
6ad9b10317 alternator: make constant names more explicit
KEYSPACE and ATTRS constants refer to their names, not objects,
so they're named more explicitly.
Message-Id: <14b1f00d625e041985efbc4cbde192bd447cbf03.1557223199.git.sarna@scylladb.com>
2019-09-11 13:07:14 +03:00
Piotr Sarna
2975ca668c alternator: remove inaccessible return statement
Message-Id: <afaef20e7e110fa23271fb8c3dc40cec0716efb6.1557223199.git.sarna@scylladb.com>
2019-09-11 13:06:21 +03:00
Piotr Sarna
6e8db5ac6a alternator: inline keywords
It was decided that all alternator-specific keywords can be inlined
in code instead of defining them as constants.
Message-Id: <6dffb9527cfab2a28b8b95ac0ad614c18027f679.1557223199.git.sarna@scylladb.com>
2019-09-11 13:04:38 +03:00
Nadav Har'El
50a69174b3 alternator: some cleanups in validate_table_name()
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:03:44 +03:00
Nadav Har'El
0e06d82a1f alternator: clean up api_error() interface
All operation-generated error messages should have the 400 HTTP error
code. It's a real nag to have to type it every time. So make it the
default.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:01:47 +03:00
Nadav Har'El
0634629a79 alternator-test: test for error on creating an already-existing table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:01:46 +03:00
Nadav Har'El
6fe6cf0074 alternator: correct error when trying to CreateTable an existing table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:00:54 +03:00
Nadav Har'El
871dd7b908 alternator: fix return object from PutItem
Without special options, PutItem should return nothing (an empty
JSON result). Previously we had trouble doing this, because instead
of returning an empty JSON result, we converted an empty string into
JSON :-) So the existing code had an ugly workaround which worked,
sort of, for the Python driver but not for the Java driver.

The correct fix, in this patch, is to invent a new type json_string
which is a string *already* in JSON and doesn't need further conversion,
so we can use it to return the empty result. PutItem now works from
YCSB's Java driver.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:00:47 +03:00
Nadav Har'El
ae1ee91f3c alternator-test: more examples in README.md
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:56:07 +03:00
Nadav Har'El
886438784c alternator-test: test table name limit of 222 bytes, instead of 255.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:56:06 +03:00
Nadav Har'El
28e7fa20ed alternator: limit table names to 222 bytes
Although we would like to allow table names up to 255 bytes, this is not
currently possible because Scylla tacks on an additional 33 bytes to create
a directory name, and directory names are limited to 255 bytes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:55:07 +03:00
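
The arithmetic behind the 222-byte limit in the entry above, together with a name check, can be sketched in Python. The constant and function names are hypothetical. The character set and the 3-character minimum follow DynamoDB's published table naming rules; the 33 and 255 figures come from the commit message.

```python
import re

MAX_DIRECTORY_NAME = 255   # directory name limit, per the message above
DIRECTORY_SUFFIX = 33      # bytes Scylla appends to form the directory name
MAX_TABLE_NAME = MAX_DIRECTORY_NAME - DIRECTORY_SUFFIX  # 222

# DynamoDB table names may contain only these ASCII characters.
_VALID_NAME = re.compile(r"[a-zA-Z0-9_.-]+")


def validate_table_name(name: str) -> None:
    """Raise ValueError if `name` is not an acceptable table name.

    Sketch only. Because the allowed characters are all ASCII, the
    character count equals the byte count once the pattern matches.
    """
    if not _VALID_NAME.fullmatch(name):
        raise ValueError("table name contains invalid characters")
    if not 3 <= len(name) <= MAX_TABLE_NAME:
        raise ValueError(
            f"table name must be between 3 and {MAX_TABLE_NAME} characters"
        )
```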
Nadav Har'El
a702e5a727 alternator-test: verify appropriate error when invalid key type is used
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:55:06 +03:00
Nadav Har'El
8af58b0801 alternator: better key type parsing
The supported key types are just S(tring), B(lob), or N(umber).
Other types are valid for attributes, but not for keys, and should
not be accepted. And wrong types used should result in the appropriate
user-visible error.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:54:12 +03:00
Nadav Har'El
6cdcf5abac alternator-test: additional cases of invalid schemas in CreateTable
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:54:11 +03:00
Nadav Har'El
9839183157 alternator: better invalid schema detection for CreateTable
To be correct, CreateTable's input parsing needs to work in reverse from
what it did: First, the key columns are listed in KeySchema, and then
each of these (and potentially more, e.g., from indexes) needs to appear
in AttributeDefinitions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:53:22 +03:00
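
The parsing order described in the entry above (start from KeySchema, then look each key up in AttributeDefinitions) can be sketched in Python. The function name key_types and the use of ValueError are illustrative; the field names AttributeName, AttributeType and KeySchema are those of the DynamoDB CreateTable request, and the S/B/N restriction for keys is the one stated in a neighbouring entry of this log.

```python
# Only these attribute types are valid for key columns.
KEY_ATTRIBUTE_TYPES = {"S", "B", "N"}


def key_types(key_schema: list, attribute_definitions: list) -> dict:
    """Return {key attribute name: type} for a CreateTable request.

    Sketch only: every attribute named in KeySchema must be defined
    in AttributeDefinitions, with a type that is allowed for keys.
    """
    defined = {
        d["AttributeName"]: d["AttributeType"] for d in attribute_definitions
    }
    result = {}
    for key in key_schema:
        name = key["AttributeName"]
        if name not in defined:
            raise ValueError(f"key {name} missing from AttributeDefinitions")
        if defined[name] not in KEY_ATTRIBUTE_TYPES:
            raise ValueError(f"type {defined[name]} is not valid for key {name}")
        result[name] = defined[name]
    return result
```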
Nadav Har'El
8bfbc1bae5 alternator-test: tests for CreateTable with bad schema
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:53:21 +03:00
Benny Halevy
0f01a4c1b8 dbuild: add usage
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:53:02 +03:00
Benny Halevy
f43bffdf9c dbuild: add help option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:52:50 +03:00
Nadav Har'El
dc34c92899 alternator: better error handling for schema errors in CreateTable
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:52:31 +03:00
Nadav Har'El
77de0af40f alternator-test: test for PutItem to nonexistent table
We expect to see the right error code, not some "internal error".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:52:30 +03:00
Nadav Har'El
ca3553c880 alternator: PutItem: appropriate error for a non-existent table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:38 +03:00
Nadav Har'El
275a07cf10 alternator-test: add another column to test_basic_string_put_and_get()
Just to make sure our success isn't limited to just a single non-key
attribute, let's add another one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:37 +03:00
Nadav Har'El
6ca72b5fed alternator: GetItem should by default return all the columns, not none
The test

  pytest --local test_item.py::test_basic_string_put_and_get

Now passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:31 +03:00
Benny Halevy
c840c43fa7 dbuild: list available images when no image arg is given
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:51:26 +03:00
Nadav Har'El
9920143fb7 alternator: change empty return of PutItem
Without any arguments, PutItem should return no data at all. But somehow,
for reasons I don't understand, the boto3 driver is confused by an
empty JSON, thinking it isn't JSON at all. If we return a structure with
an empty "attributes" fields, boto3 is happy.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:49:20 +03:00
Nadav Har'El
8dec31d23b alternator: add initial implementation of DeleteTable
Add an initial implementation of DeleteTable, enough for making the

   pytest --local test_table.py::test_create_and_delete_table

Pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:45:42 +03:00
Nadav Har'El
41d4b88e78 alternator: on unknown operation, return standard API error
When given an unknown operation (we didn't implement yet many of them...)
we should throw the appropriate api_error, not some random exception.

This allows the client to understand the operation is not supported
and stop retrying - instead of retrying thinking this was a weird
internal error.

For example the test
   pytest --local test_table.py::test_create_and_delete_table

Now fails immediately, saying Unsupported operation DeleteTable.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:45:04 +03:00
Nadav Har'El
1b1921bc94 alternator: fix JSON in DescribeTable response
The structure's name in DescribeTable's output is supposed to be called
"Table", not "TableDescription". Putting it in the wrong place caused the
driver's table creation waiters to fail.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:44:14 +03:00
Nadav Har'El
6a455035ba alternator: validate table name in CreateTable
Validate table name in CreateTable, and if it doesn't fit DynamoDB's
requirement, return the appropriate error as drivers expect.

With this patch, test_table.py::test_create_table_unsupported_names
now passes (albeit with a one minute pause - this is a bug with keep-alive
support...).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:24 +03:00
Nadav Har'El
0da214c2fe alternator-test: test_create_table_unsupported_names minor fix
Check the expected error message to contain just ValidationException
instead of an overly specific text message from DynamoDB, so we aren't
so constrained in our own messages' wording.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:23 +03:00
Nadav Har'El
4f721a0637 alternator-test: test for creating table with very long name
Dynamo allows table names up to 255 characters, but when this is tested on
Alternator, the results are disastrous: mkdir with such a long directory
name fails, Scylla considers this an unrecoverable "I/O error", and exits
the server.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:22 +03:00
Nadav Har'El
6967dd3d8f test-table: test DescribeTable on non-existent table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:21 +03:00
Nadav Har'El
d0cdc65b4c Add "--local" option to run test against local Scylla installation
For example "pytest --local test_item.py"

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:21 +03:00
Nadav Har'El
079c7c3737 test_item.py: basic string put and get test
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:20 +03:00
Nadav Har'El
4550f3024d test_table fixture: be quicker to realize table was created.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:19 +03:00
Nadav Har'El
f1f76ed17b test_table fixture: automatically delete
Automatically delete the test table when the test ends.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:18 +03:00
Nadav Har'El
a946e255c6 test_item.py: start testing CRUD operations
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:17 +03:00
Nadav Har'El
4d7d871930 Start to use "test fixtures"
Start to use "test fixtures" defined in conftest.py: The connection to
the DynamoDB API, and also temporary tables, can be reused between multiple
tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:16 +03:00
Nadav Har'El
6984ccf462 Add some table tests and README
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:15 +03:00
Nadav Har'El
f66ec337f7 alternator: very initial implementation of DescribeTable
This initial implementation is enough to pass a test of getting a
failure for a non-existent table -
test_table.py::test_describe_table_non_existent_table
and to recognize an existing table. But it's still missing a lot
of fields for an existing table (among others, the schema).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:41:32 +03:00
Nadav Har'El
ad9eb0a003 alternator: errors should be output from server as Dynamo drivers expect
Exceptions from the handlers need to be output in a certain way - as
a JSON with specific fields - as DynamoDB drivers expect them to be.
If a handler throws an alternator::api_error with these specific fields,
they are output, but any other exception is converted into the same
format as an "Internal Error".

After this patch, executor code can throw an alternator::api_error and
the client will receive this error in the right format.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:40:55 +03:00
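The error translation described above might look roughly like this. It is only a sketch: `ApiError` and `render_error` are hypothetical stand-ins for alternator::api_error and the server's output code, and the `__type`/`message` field names, the 500 status and the "InternalServerError" type are assumptions of this example rather than details taken from the patch.

```python
import json


class ApiError(Exception):
    """Hypothetical stand-in for alternator::api_error."""

    def __init__(self, http_code, error_type, message):
        super().__init__(message)
        self.http_code = http_code
        self.error_type = error_type
        self.message = message


def render_error(exc):
    """Return (http_status, json_body) for any exception raised by a handler."""
    if not isinstance(exc, ApiError):
        # Any other exception is reported in the same format as an internal error.
        exc = ApiError(500, "InternalServerError", str(exc))
    # Field names are an assumption of this sketch.
    body = json.dumps({"__type": exc.error_type, "message": exc.message})
    return exc.http_code, body


status, body = render_error(ApiError(400, "ResourceNotFoundException", "no such table"))
fallback_status, fallback_body = render_error(ValueError("boom"))
```

The point of the single funnel is that handler code only has to raise the typed error; the formatting lives in one place.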
Nadav Har'El
db49bc6141 alternator: add alternator::api_error exception type
DynamoDB error messages are returned in JSON format and expect specific
information: Some HTTP error code (often but not always 400), a string
error "type" and a user-readable message. Code that wants to return
user-visible exceptions should use this type, and in the next patch we
will translate it to the appropriate JSON string.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:39:26 +03:00
Nadav Har'El
9d72bc3167 alternator: table creation time is in seconds
The "Timestamp" type returned for CreationDateTime can be one of several
things but if it is a number, it is supposed to be the time in *seconds*
since the epoch - not in milliseconds. Returning milliseconds as we
wrongly did causes boto3 (AWS's Python driver) to throw a parse exception
on this response.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:38:41 +03:00
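The seconds-versus-milliseconds distinction in the commit above can be illustrated with a small sketch. The function name and the sample timestamp are hypothetical; the only claim is the unit conversion itself.

```python
from datetime import datetime, timezone


def creation_date_time(created_ms):
    """Convert an internal millisecond timestamp to seconds since the epoch."""
    return created_ms / 1000.0


created_ms = 1568194721000  # hypothetical creation time, in milliseconds
seconds = creation_date_time(created_ms)
parsed = datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Interpreting the unconverted millisecond value as seconds would place the date tens of thousands of years in the future, which is the kind of value a client-side timestamp parser is likely to reject.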
Nadav Har'El
c0518183c2 alternator: require alternator-port configuration
Until now, we always opened the Alternator port along with Scylla's
regular ports (CQL etc.). This should really be made optional.

With this patch, by default Alternator does NOT start and does not
open a port. Run Scylla with --alternator-port=8000 to open an Alternator
API port on port 8000, as was the default until now. It's also possible
to set this in scylla.yaml.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:38:31 +03:00
Piotr Sarna
2ec78164bc alternator: add minimal HTTP interface
The interface works on port 8000 by default and provides
the most basic alternator operations - it's an incomplete
set without validation, meant to allow testing as early as possible.
2019-09-11 12:34:18 +03:00
Benny Halevy
443e0275ab dbuild: add --image option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 11:46:33 +03:00
Tomasz Grabiec
06154569d5 gdb: Use sstable tracker to get the list of sstables 2019-09-10 17:05:19 +02:00
Tomasz Grabiec
a141d30eca gdb: Make intrusive_list recognize member_hook links
GDB now gives "struct boost::intrusive::member_hook" from template_arguments()
2019-09-10 17:05:19 +02:00
Tomasz Grabiec
c014c79d4b sstables: Track whether sstable was already open or not
Some sstable objects correspond to sstables which are being written
and are not sealed yet. Such sstables don't have all the fields
filled-in. Tools which calculate statistics (like GDB scripts) need to
distinguish such sstables.
2019-09-10 17:05:18 +02:00
Tomasz Grabiec
33bef82f6b sstables: Track all instances of sstable objects
Will make it easier to collect statistics about sstable in-memory metadata.
2019-09-10 17:05:16 +02:00
Tomasz Grabiec
fd74504e87 sstables: Make sstable object not movable
Will be easier to add non-movable fields.

We don't really need it to be movable, all instances should be managed
by a shared pointer.
2019-09-10 17:04:54 +02:00
Tomasz Grabiec
589c7476e0 sstables: Move constructor out of line 2019-09-10 17:04:54 +02:00
Tomasz Grabiec
785fe281e7 gdb: scylla sstables: Print table name
Message-Id: <1568121825-32008-1-git-send-email-tgrabiec@scylladb.com>
2019-09-10 16:36:21 +03:00
Glauber Costa
6651f96a70 sstables: do not keep sharding information from scylla metadata in memory (#4915)
There is no reason to keep parts of the Scylla Metadata component in memory
after it is read, parsed, and its information fed into the SSTable.

We have seen systems in which the Scylla metadata component is one
of the heaviest memory users, more than the Summary and Filter.

In particular, we use the token metadata, which is the largest part of the
Scylla component, to calculate a single integer -> the shards that are
responsible for this SSTable. Once we do that, we never use it again.

Tests: unit (release/debug), + manual scylla write load + reshard.

Fixes #4951

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-09-09 22:28:51 +03:00
Tomasz Grabiec
a09479e63c Merge "Validate position in partition monotonicity" from Benny
Introduce mutation_fragment_stream_validator class and use it as a
Filter to flat_mutation_reader::consume_in_thread from
sstable::write_components to validate partition region and optionally
clustering key monotonicity.

Fixes #4803
2019-09-09 15:38:31 +02:00
Benny Halevy
42f6462837 config: enable_sstable_key_validation by default in debug build
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
34d306b982 config: add enable_sstable_key_validation option
Key monotonicity validation adds overhead, both to store the last key and to
compare against it, so provide an option to enable/disable it (disabled by default).

Refs #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
507c99c011 mutation_fragment_stream_validator: add compare_keys flag
Storing and comparing keys is expensive.
Add a flag to enable/disable this feature (disabled by default).
Without the flag, only the partition region monotonicity is
validated, allowing repeated clustering rows, regardless of
clustering keys.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
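The effect of the compare_keys flag described above can be sketched as follows. This is an illustrative model only: `Region` and `StreamValidator` are hypothetical names, keys are plain integers, and the sketch returns False on a violation where the real validator may react differently.

```python
from enum import IntEnum


class Region(IntEnum):
    """Hypothetical ordering of partition regions within one partition."""
    PARTITION_START = 0
    STATIC_ROW = 1
    CLUSTERED = 2
    PARTITION_END = 3


class StreamValidator:
    def __init__(self, compare_keys=False):
        self.compare_keys = compare_keys
        self.prev_region = None
        self.prev_key = None

    def __call__(self, region, key=None):
        """Return True if this fragment may follow the previously accepted one."""
        prev = self.prev_region
        if region == Region.PARTITION_START:
            ok = prev in (None, Region.PARTITION_END)
        elif prev is None or prev == Region.PARTITION_END:
            ok = False  # a partition must be opened first
        elif region == Region.CLUSTERED:
            ok = prev <= Region.CLUSTERED  # clustered fragments may repeat
            if ok and self.compare_keys and prev == Region.CLUSTERED:
                ok = key > self.prev_key   # keys must strictly increase
        else:
            ok = region > prev             # other regions may not repeat or go back
        if ok:
            self.prev_region, self.prev_key = region, key
        return ok


stream = [
    (Region.PARTITION_START, None),
    (Region.CLUSTERED, 1),
    (Region.CLUSTERED, 1),      # repeated clustering key
    (Region.PARTITION_END, None),
]
lenient = StreamValidator(compare_keys=False)
strict = StreamValidator(compare_keys=True)
lenient_results = [lenient(region, key) for region, key in stream]
strict_results = [strict(region, key) for region, key in stream]
```

With the flag off, the repeated clustering row passes because only the region order is checked; with the flag on, the same row is rejected at the cost of remembering and comparing the previous key.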
Benny Halevy
bc2ef1d409 mutation_fragment: declare partition_region operator<< in header file
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
496467d0a2 sstables: writer: Validate input mutation fragment stream
Fixes #4803
Refs #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
a37acee68f position_in_partition: define operator=(position_in_partition_view)
The respective constructor is explicit.
Define this assignment operator to be used by flat_mutation_reader
mutation_fragment_stream_validator filter so that it can use
mutation_fragment::position() verbatim and keep its internal
state as position_in_partition.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
41b60b8bc5 compaction: s/filter_func/make_partition_filter/
It expresses the purpose of this function better
as suggested by Tomasz Grabiec.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
24c7320575 dbuild: run interactive shell by default
If not given any other args to run, just run an interactive shell.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190909113140.9130-1-bhalevy@scylladb.com>
2019-09-09 15:15:57 +03:00
Nadav Har'El
2543760ee6 docs/metrics.md: document additional "labels"
Recently we started making more use of metric labels: several
metrics which share the same name, but differ in the value of some label
such as "group" (for different scheduling groups).

This patch documents this feature in docs/metrics.md, gives the example of
scheduling groups, and explains a couple more relevant Prometheus syntax
tricks.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909113803.15383-1-nyh@scylladb.com>
2019-09-09 15:15:57 +03:00
Botond Dénes
59a96cd995 scylla-gdb.py: introduce scylla task-queues
This command provides an overview of the reactor's task queues.
Example:
   id name                             shares  tasks
 A 00 "main"                           1000.00 4
   01 "atexit"                         1000.00 0
   02 "streaming"                       200.00 0
 A 03 "compaction"                      171.51 1
   04 "mem_compaction"                 1000.00 0
*A 05 "statement"                      1000.00 2
   06 "memtable"                          8.02 0
   07 "memtable_to_cache"               200.00 0

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190906060039.42301-1-bdenes@scylladb.com>
2019-09-09 15:15:57 +03:00
Avi Kivity
8e8975730d Update seastar submodule
* seastar cb7026c16f...b3fb4aaab3 (10):
  > Revert "scheduling groups: Adding per scheduling group data support"
  > scheduling groups: Adding per scheduling group data support
  > rpc: check that two servers are not created with the same streaming id
  > future: really ignore exceptions in ignore_ready_future
  > iostream: Constify eof() function
  > apply.hh: add missing #include for size_t
  > scheduling_group_demo: add explicit yields since future::get() no longer does
  > Fix buffer size used when calling accept4()
  > future-util: reduce allocations and continuations in parallel_for_each
  > rpc: lz4_decompressor: Add a static constexpr variable decleration for Cpp14 compatibility
2019-09-09 15:15:34 +03:00
Gleb Natapov
9e9f64d90e messaging_service: configure different streaming domain for each rpc server
A streaming domain identifies a server across shards. Each server should
have a different one.

Fixes: #4953

Message-Id: <20190908085327.GR21540@scylladb.com>
2019-09-08 14:05:40 +03:00
Piotr Sarna
01410c9770 transport: make sure returning connection errors happens inside the gate.
Previously, the gate could get
closed too early, which would result in shutting down the server
before it had an opportunity to respond to the client.

Refs #4818
2019-09-08 13:23:20 +03:00
Avi Kivity
5663218fac Merge "types: Fix decimal to integer and varint to integer conversion" from Rafael
"
The release notes for boost 1.67.0 include:

Breaking Change: When converting a multiprecision integer to a narrower type, if the value is too large (or negative) to fit in the smaller type, then the result is either the maximum (or minimum) value of the target

Since we just moved out of boost 1.66, we have to update our code.

This fixes issue #4960
"

* 'espindola/fix-4960' of https://github.com/espindola/scylla:
  types: fix varint to integer conversion
  types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer
  types: fix decimal to integer conversion
  types: extract helper for converting a decimal to a cppint
  types: rename and detemplate make_castas_fctn_from_decimal_to_integer
2019-09-08 10:45:42 +03:00
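The difference between the two narrowing behaviours at issue in this series can be sketched as follows. This is an illustration under a stated assumption: that the CQL semantics the series restores keep the low-order bits (as Java's narrowing conversions do), while the newer boost behaviour clamps to the target's range. Both function names are hypothetical.

```python
def narrow_wrapping(value, bits):
    """Keep the low-order bits and reinterpret them as two's complement."""
    mask = (1 << bits) - 1
    low = value & mask
    return low - (1 << bits) if low >> (bits - 1) else low


def narrow_saturating(value, bits):
    """Clamp to the representable range of a signed integer of this width."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


too_big = 2 ** 31                      # one past the largest 32-bit signed value
wrapped = narrow_wrapping(too_big, 32)
clamped = narrow_saturating(too_big, 32)
```

The two functions agree for every value that fits in the target type and differ only for out-of-range input, which is why the change surfaced in cast tests rather than in ordinary use.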
Avi Kivity
244218e483 Merge "simplify date type" from Rafael
"
With this patch series one has to be explicit to create a date_type_impl and now there is only the one documented difference between date_type_impl and timestamp_type_impl.
"

* 'espindola/simplify-date-type' of https://github.com/espindola/scylla:
  types: Reduce duplication around date_type_impl
  types: Don't use date_type_native_type when we want a timestamp
  types: Remove timestamp_native_type
  types: Don't specialize data_type_for for db_clock::time_point
  types: Make it harder to create date_type
2019-09-08 10:21:48 +03:00
Rafael Ávila de Espíndola
3bac4ebac7 types: Reduce duplication around date_type_impl
According to the comments, the only difference between date_type_impl
and timestamp_type_impl is the comparison function.

This patch makes that explicit by merging all code paths except:

* The warning when converting between the two
* The compare function

The date_type_impl type can still be user visible via very old
sstables or via the thrift protocol. It is not clear if we still need
to support either, but with this patch it is easy to do so.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
36d40b4858 types: Don't use date_type_native_type when we want a timestamp
In these cases it is pretty clear that the original code wanted to
create a timestamp_type data_value but was creating a date_type one
because of the old defaults.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
01cd21c04d types: Remove timestamp_native_type
Now that we know that anything expecting a date_type has been
converted to date_type_native_type, switch to using
db_clock::time_point when we want a timestamp_type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
df6c2d1230 types: Don't specialize data_type_for for db_clock::time_point
This also moves every user to date_type_native_type. A followup patch
will convert to timestamp_type when appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
e09fa2dcff types: Make it harder to create date_type
date_type was replaced with timestamp_type, but it was very easy to
create a date_type instead of a timestamp_type by accident.

This patch changes the code so that a date_type is no longer
implicitly used when constructing a data_value. All existing code that
was depending on this is converted to explicitly using
date_type_native_type. A followup patch will convert to timestamp_type
when appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Gleb Natapov
f78b2c5588 transport: remove remaining cruft related to cql's server load balancing
Commit 7e3805ed3d removed the load balancing code from cql
server, but it did not remove most of the cruft that load balancing
introduced. Most of the complexity (and probably the main reason the
code never worked properly) is around service::client_state class which
is copied before being passed to the request processor (because in the past
the processing could have happened on another shard) and then merged back
into the "master copy" because a request processing may have changed it.

This commit removes all this copying. The client_request is passed as a
reference all the way to the lowest layer that needs it and its copy
construction is removed to make sure nobody copies it by mistake.

tests: dev, default c-s load of 3 node cluster

Message-Id: <20190906083050.GA21796@scylladb.com>
2019-09-07 18:17:53 +03:00
Avi Kivity
3b5aa13437 Merge "Optimize type find" from Rafael
"
This avoids a double dispatch on _kind and also removes a few shared_ptr copies.

The extra work was a small regression from the recent types refactoring.
"

* 'espindola/optimize_type_find' of https://github.com/espindola/scylla:
  types: optimize type find implementation
  types: Avoid shared_ptr copies
2019-09-07 18:14:36 +03:00
Gleb Natapov
5b9dc00916 test: fix query_processor_test::test_query_counters to use SERIAL consistency correctly
With SERIAL consistency it is not possible to scan a table, only to read
a single partition.

Message-Id: <20190905143023.GQ21540@scylladb.com>
2019-09-07 18:07:01 +03:00
Gleb Natapov
e52ebfb957 cql3: remove unused next_timestamp() function
next_timestamp() just calls get_timestamp() directly and nobody uses it
anyway.

Message-Id: <20190905101648.GO21540@scylladb.com>
2019-09-05 17:20:21 +03:00
Botond Dénes
783277fb02 stream_session: STREAM_MUTATION_FRAGMENTS: print errors in receive and distribute phase
Currently when an error happens during the receive and distribute phase
it is swallowed and we just return a -1 status to the remote. We only
log errors that happen during responding with the status. This means
that when streaming fails, we only know that something went wrong, but
the node on which the failure happened doesn't log anything.

Fix by also logging errors happening in the receive and distribute
phase. Also mention the phase in which the error happened in both error
log messages.

Refs: #4901
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>
2019-09-05 13:43:00 +02:00
Rafael Ávila de Espíndola
dd81e94684 types: fix varint to integer conversion
The previous code was using the boost::multiprecision::cpp_int to
integer conversion, but that doesn't have the same semantics as cql
for signed numbers.

This fixes the dtest cql_cast_test.py:CQLCastTest.cast_varint_test.

Fixes #4960

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola
263e18b625 types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer
It will be used when converting varint to integer too.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola
2d453b8e17 types: fix decimal to integer conversion
The previous code was using the boost::multiprecision::cpp_rational to
integer conversion, but that doesn't have the same semantics as cql.

This patch avoids creating a cpp_rational in the first place and works
just with integers.

This fixes the dtest cql_cast_test.py:CQLCastTest.cast_decimal_test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola
fb760774dd types: extract helper for converting a decimal to a cppint
It will also be used in the decimal to integer conversion.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:07 -07:00
Rafael Ávila de Espíndola
40e6882906 types: rename and detemplate make_castas_fctn_from_decimal_to_integer
It was only ever used for varint.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 14:54:47 -07:00
Avi Kivity
301246f6c0 storage_proxy: protect _view_update_handlers_list iterators from invalidation
on_down() iterates over _view_update_handlers_list, but it yields during iteration,
and while it yields, elements in that list can be removed, resulting in a
use-after-free.

Prevent this by registering iterators that can be potentially invalidated, and
any time we remove an element from the list, check whether we're removing an element
that is being pointed to by a live iterator. If that is the case, advance the iterator
so that it points at a valid element (or at the end of the list).

Fixes #4912.

Tests: unit (dev)
2019-09-04 17:19:28 +03:00
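The iterator-registration idea described above can be modelled with a short sketch. It is not the storage_proxy code: `SafeList` and `Cursor` are hypothetical names, and list indices stand in for intrusive-list iterators.

```python
class SafeList:
    """List whose registered cursors stay valid when elements are removed."""

    def __init__(self, items):
        self.items = list(items)
        self.cursors = []

    def remove(self, item):
        index = self.items.index(item)
        del self.items[index]
        for cursor in self.cursors:
            # A cursor past the removed slot shifts left so it keeps pointing
            # at the same element. A cursor exactly at the slot keeps its
            # index, so it now points at the next element or at the end.
            if cursor.index > index:
                cursor.index -= 1


class Cursor:
    def __init__(self, owner):
        self.owner = owner
        self.index = 0
        owner.cursors.append(self)     # register as potentially invalidated

    def done(self):
        return self.index >= len(self.owner.items)

    def current(self):
        return self.owner.items[self.index]

    def advance(self):
        self.index += 1

    def release(self):
        self.owner.cursors.remove(self)


handlers = SafeList(["a", "b", "c"])
cursor = Cursor(handlers)
visited = []
while not cursor.done():
    item = cursor.current()
    visited.append(item)
    cursor.advance()
    if item == "a":
        # Simulates another task running while the loop yields: it removes
        # the element the cursor is currently pointing at.
        handlers.remove("b")
cursor.release()
```

The loop never touches the removed element and still reaches everything that remains, which is the property the fix is after.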
Tomasz Grabiec
9f5826fd4b Merge "Use canonical mutations for background schema sync" from Botond
Currently the background schema sync (push/pull) uses frozen mutation to
send the schema mutations over the wire to the remote node. For this to
work correctly, both nodes have to have the exact same schema for the
system schema tables, as attempting to unpack the frozen mutation with
the wrong schema leads to undefined behaviour.
To avoid this, and to ensure that syncing schema between nodes with
different versions of the schema tables' schema is well defined, we migrate
the background schema sync to use canonical mutations for the transfer of
the schema mutations. Canonical mutations are immune to this problem, as they
support deserializing with any version of the schema, older or newer
one.

The foreground schema sync mechanisms -- the on-demand schema pulls on
reads and writes -- already use canonical mutations to transmit the
schema mutations.

It is important to note that due to this change, column-level
incompatibilities between the schema mutations and the schema used to
deserialize them will be hidden. This is undesired and should be fixed
in a follow-up (#4956). Table level incompatibilities are detected and
schema mutations containing such mutations will be rejected just like before.

This patch adds canonical mutation support to the two background schema
sync verbs:
* `DEFINITIONS_UPDATE` (schema push)
* `MIGRATION_REQUEST` (schema pull)

Both verbs still support the old frozen mutation schema transfer, albeit
that path is now much less efficient. After all nodes are upgraded, the
pull verb can effectively avoid sending frozen mutations altogether,
completely migrating to canonical mutations. Unfortunately this was not
possible for the push verb, so that one now has an overhead as it needs
to send both the frozen and canonical mutations.

Fixes: #4273
2019-09-04 13:58:14 +02:00
Benny Halevy
bc29520eb8 flat_mutation_reader: consume_in_thread: add mutation_filter
For validating mutation_fragment's monotonicity.

Note: forwarding constructor allows implicit conversion by
current callers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-04 13:42:37 +03:00
Rafael Ávila de Espíndola
000514e7cc sstable: close file_writer if an exception is thrown
The previous code was not exception safe and would eventually cause a
file to be destroyed without being closed, causing an assert failure.

Unfortunately it doesn't seem to be possible to test this without
error injection, since using an invalid directory fails before this
code is executed.

Fixes #4948

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190904002314.79591-1-espindola@scylladb.com>
2019-09-04 13:28:55 +03:00
Botond Dénes
7adc764b6e messaging_service: add canonical_support to schema pull and push verbs
The verbs are:
* DEFINITIONS_UPDATE (push)
* MIGRATION_REQUEST (pull)

Support was added in a backward-compatible way. The push verb sends
both the old frozen mutation parameter, and the new optional canonical
mutation parameter. It is expected that new nodes will use the latter,
while old nodes will fall back to the former. The pull verb has a new
optional `options` parameter, which for now contains a single flag:
`remote_supports_canonical_mutation_retval`. This flag, if set, means
that the remote node supports the new canonical mutation return value,
thus the old frozen mutations return value can be left empty.
2019-09-04 10:32:44 +03:00
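The pull-side negotiation described above can be sketched as follows. Only the flag name `remote_supports_canonical_mutation_retval` comes from the commit message; `PullOptions`, `PullResponse`, `to_frozen` and `handle_migration_request` are hypothetical names, and strings stand in for mutations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PullOptions:
    remote_supports_canonical_mutation_retval: bool = False


@dataclass
class PullResponse:
    frozen: List[str] = field(default_factory=list)
    canonical: Optional[List[str]] = None


def to_frozen(canonical):
    # Placeholder for the costly canonical -> frozen conversion.
    return "frozen(" + canonical + ")"


def handle_migration_request(schema, options=None):
    """Answer a schema pull; `options` is None when an old node is asking."""
    if options is not None and options.remote_supports_canonical_mutation_retval:
        # New requester: leave the frozen return value empty.
        return PullResponse(frozen=[], canonical=list(schema))
    # Old requester: convert back to frozen mutations.
    return PullResponse(frozen=[to_frozen(m) for m in schema])


schema = ["m1", "m2"]
to_old_node = handle_migration_request(schema)
to_new_node = handle_migration_request(schema, PullOptions(True))
```

Because the requester announces what it understands, the responder can pick a single representation. The push side has no such handshake in this description, which is why it has to send both.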
Botond Dénes
d9a8ff15d8 service::migration_manager: add canonical_mutation merge_schema_from() overload
Add an overload which takes a vector of canonical mutations. Going
forward, this is the overload to use.
2019-09-04 10:32:44 +03:00
Botond Dénes
e02b93cae1 schema_tables: convert_schema_to_mutations: return canonical_mutations
In preparation for the schema push/pull migrating to use canonical
mutations, convert the method producing the schema mutations to return a
vector of canonical mutations. The only user, MIGRATION_REQUEST verb,
converts the canonical mutations back to frozen mutations. This is very
inefficient, but this path will only be used in mixed clusters. After
all nodes are upgraded the verb will be sending the canonical mutations
directly instead.
2019-09-04 08:47:20 +03:00
Rafael Ávila de Espíndola
b100f95adc types: optimize type find implementation
This turns find into a template so there is only one switch over the
kind of each type in the search.

To evaluate the change in code size, I added [[noinline]] to
find and obtained the following results.

In the before case the release figures have an extra term
because the functions are sufficiently complex to trigger gcc to split
them in hot + cold.

before:
                      dev                         release (hot + cold split)
find                  0x35f               = 863   0x3d5 + 0x112               = 1255
references_duration   0x62 + 0x22 + 0x8   = 140   0x55 + 0x1f + 0x2a + 0x8    = 166
references_user_type  0x6b + 0x26 + 0x111 = 418   0x65 + 0x1f + 0x32 + 0x11b  = 465

after:
                      dev                          release
find                  0xd6 + 0x1b4        = 650    0xd2 + 0x1f5               = 711
references_duration   0x13                = 19     0x13                       = 19
references_user_type  0x1a                = 26     0x21                       = 33

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-03 08:23:21 -07:00
Rafael Ávila de Espíndola
e0065b414e types: Avoid shared_ptr copies
They are somewhat expensive (in code size at least) and not needed
everywhere.

Inside the getter the variables are 'const data_type&', so we can
return that. Everything still works when a copy is needed, but in code
that just wants to check a property we avoid the copy.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-03 07:43:35 -07:00
Benny Halevy
bdfb73f67d scripts/create-relocatable-package: ldd: print executable name in exception
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190903080511.534-1-bhalevy@scylladb.com>
2019-09-03 15:34:38 +03:00
Avi Kivity
294a86122e Merge "nonroot installer" from Takuya
"
This is nonroot installer patchset v9.
"

* 'nonroot_v9' of https://github.com/syuu1228/scylla:
  dist/common/scripts: support nonroot mode on setup scripts
  reloc/python3: add install.sh on python relocatable package
  install.sh: add --nonroot mode
  dist/common/systemd: untemplataize *.service, use drop-in units instead
  dist/debian: delete debian/*.install, debian/*.dirs
2019-09-03 15:33:20 +03:00
Piotr Sarna
7b297865e1 transport: wait for the connections to finish when stopping (#4818)
During CQL request processing, a gate is used to ensure that
the connection is not shut down until all ongoing requests
are done. However, the gate might have been left too early
if the database was not ready to respond immediately - which
could result in trying to respond to an already closed connection
later. This issue is solved by postponing leaving the gate
until the continuation chain that handles the request is finished.

Refs #4808
2019-09-03 14:49:11 +03:00
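The gate behaviour described above can be modelled with asyncio. This is a loose analogy, not the seastar gate API: the `Gate` class and the handler below are hypothetical, and a short sleep stands in for a database that cannot answer immediately.

```python
import asyncio


class Gate:
    """Counts in-flight requests; close() waits until all of them have left."""

    def __init__(self):
        self.count = 0
        self.closed = False
        self.idle = asyncio.Event()
        self.idle.set()

    def enter(self):
        if self.closed:
            raise RuntimeError("gate closed")
        self.count += 1
        self.idle.clear()

    def leave(self):
        self.count -= 1
        if self.count == 0:
            self.idle.set()

    async def close(self):
        self.closed = True
        await self.idle.wait()


async def handle_request(gate, log):
    gate.enter()
    try:
        await asyncio.sleep(0.01)      # the database is not ready immediately
        log.append("response sent")
    finally:
        gate.leave()                   # only after the whole chain has finished


async def main():
    gate, log = Gate(), []
    request = asyncio.create_task(handle_request(gate, log))
    await asyncio.sleep(0)             # let the request enter the gate
    await gate.close()                 # shutdown waits for the request
    log.append("connection closed")
    await request
    return log


log = asyncio.run(main())
```

Leaving the gate before the sleep, as in the bug being fixed, would let `close()` return first and the response would be written to an already closed connection.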
Avi Kivity
8fb59915bb Merge "Minor cleanup patches for sstables" from Asias
* 'cleanup_sstables' of https://github.com/asias/scylla:
  sstables: Move leveled_compaction_strategy implementation to source file
  sstables: Include dht/i_partitioner.hh for dht::partition_range
2019-09-03 14:47:44 +03:00
Takuya ASADA
31ddb2145a dist/common/scripts: support nonroot mode on setup scripts
Since nonroot mode requires running everything as a non-privileged user,
most of the setup scripts cannot be used in nonroot mode.
We only provide the following functions in nonroot mode:
 - EC2 check
 - IO setup
 - Node exporter installer
 - Dev mode setup
The rest of the functions will be skipped by scylla_setup.
To implement nonroot mode in the setup scripts, scylla_util provides
utility functions to abstract the differences in directory structure between
a normal installation and nonroot mode.
2019-09-03 20:06:35 +09:00
Takuya ASADA
cfa8885ae1 reloc/python3: add install.sh on python relocatable package
To support nonroot installation of scylla-python3, add install.sh to the
scylla-python3 relocatable package.
2019-09-03 20:06:30 +09:00
Takuya ASADA
2de14e0800 install.sh: add --nonroot mode
This implements a way to install Scylla that does not require root privileges,
is not distribution dependent, and does not use a package manager.
2019-09-03 20:06:24 +09:00
Takuya ASADA
cde798dba5 dist/common/systemd: untemplataize *.service, use drop-in units instead
Since a systemd unit can override parameters using a drop-in unit, we don't need
mustache templates for them.

Also, drop the --disttype and --target options from install.sh since they are
not required anymore, and introduce --sysconfdir instead for non-redhat distributions.
2019-09-03 20:06:15 +09:00
Takuya ASADA
49a360f234 dist/debian: delete debian/*.install, debian/*.dirs
Since ac9b115, we switched to install.sh on Debian so we don't rely on .deb
specific packaging scripts anymore.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2019-09-03 20:06:09 +09:00
Benny Halevy
7827e3f11d tests: test_large_data: do not stop database
Now that compaction returns only after the compacted sstables are
deleted, we no longer need to stop the database to force waiting
for deletes (that were previously done asynchronously).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
19b67d82c9 table::on_compaction_completion: fix indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
8dd6e13468 table::on_compaction_completion: wait for background deletes
Don't let background deletes accumulate uncontrollably.

Fixes #4909

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
da6645dc2c table: refresh_snapshot before deleting any sstables
The row cache must not hold references to any sstable we're
about to delete.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:29 +03:00
Nadav Har'El
6c4ad93296 api/compaction_manager: do not hold map on the stack
Merged patch series by Amnon Heiman:

This patch fixes a bug where a map is held on the stack and then used
by a future.

Instead, the map is now moved to the relevant lambda function.

Fixes #4824
2019-09-01 13:16:34 +03:00
Avi Kivity
e962beea20 toolchain: update to Fedora 30 and gcc 9.2
In Fedora 30 we have a new boost version, so we no longer need to
use our patched boost, so we also remove the scylladb/toolchain copr.
2019-09-01 12:05:26 +03:00
Piotr Sarna
23c891923e main: make sure view_builder doesn't propagate semaphore errors
Stopping services which occurs in a destructor of deferred_action
should not throw, or it will end the program with
terminate(). View builder breaks a semaphore during its shutdown,
which results in propagating a broken_semaphore exception,
which in turn results in throwing an exception during stop().get().
In order to fix that issue, semaphore exceptions are explicitly
ignored, since they're expected to appear during shutdown.

Fixes #4875
2019-09-01 11:59:57 +03:00
Tomasz Grabiec
c8f8a9450f Merge "Improve cpu instruction set support checks" from Avi
To prevent termination with SIGILL, tighten the instruction set
support checks. First, check for CLMUL too. Second, add a check in
scylla_prepare to catch the problem early.

Fixes #4921.
2019-08-30 16:54:04 +02:00
Avi Kivity
07010af44c scylla_prepare: verify processor satisfies instruction set requirements
Scylla requires the CLMUL and SSE 4.2 instruction sets and will fail without them.
There is a check in main(), but that happens after the code is running and it may
already be too late. Add a check in scylla_prepare which runs before the main
executable.
2019-08-29 15:34:29 +03:00
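An early check of the kind described above could look roughly like this. It is a sketch rather than the code added to scylla_prepare: the function name is hypothetical, and it assumes the Linux /proc/cpuinfo flag names `sse4_2` and `pclmulqdq` for the two instruction sets.

```python
REQUIRED_FLAGS = {"sse4_2", "pclmulqdq"}


def missing_cpu_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that the first CPU in cpuinfo does not list."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return set(required) - set(value.split())
    return set(required)   # no flags line found: treat everything as missing


# Hypothetical cpuinfo excerpt; a real check would read /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse4_2 avx\n"
missing = missing_cpu_flags(sample)
```

Running such a check before the main executable starts means the failure is a readable message rather than a SIGILL.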
Avi Kivity
9579946e72 main: extend CPU feature check to verify that PCLMUL is available
Since 79136e895f, we use the pclmul instruction set,
so check it is there.
2019-08-29 15:13:32 +03:00
Gleb Natapov
e61a86bbb2 to_string: Add operator<< overload for std::tuple.
Message-Id: <20190829100902.GN21540@scylladb.com>
2019-08-29 13:35:02 +03:00
Rafael Ávila de Espíndola
036f51927c sstables: Remove unused include
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190827210424.37848-1-espindola@scylladb.com>
2019-08-28 11:32:44 +03:00
Benny Halevy
869b518dca sstables: auto-delete unsealed sstables
Fixes #4807

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190827082044.27223-1-bhalevy@scylladb.com>
2019-08-28 09:46:17 +03:00
Botond Dénes
969aa22d51 configure.py: promote unused result warning to error
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190827111428.6829-2-bdenes@scylladb.com>
2019-08-28 09:46:17 +03:00
Botond Dénes
480b42b84f tests/gossip_test: silence discarded future warning
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190827111428.6829-1-bdenes@scylladb.com>
2019-08-28 09:46:17 +03:00
Avi Kivity
d85339e734 Update seastar submodule
* seastar 20bfd61955...cb7026c16f (2):
  > net: dpdk: suppress discarded future warning
  > Merge "Optimize promises in then/then_wrapped" from Rafael
2019-08-28 09:46:17 +03:00
Avi Kivity
f1d73d0c13 Merge "systemd: put scylla processes in systemd slices. #4743" from Glauber
"
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.

We have also recently seen that memory usage from other applications'
anonymous and page cache memory can bring the system to OOM.

Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by SystemD in the form of slices: a
hierarchical structure to which controllers can be attached.

In true systemd way, the hierarchy is implicit in the filenames of the
slice files. A "-" symbol defines the hierarchy, so the files that this
patch presents, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.

Later we mark the Services needed to run scylla as belonging to one
or the other through the Slice= directive.

Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.

Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.

One can then run something like:

sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool

(or even better, the backup tool can itself be a systemd timer)
"

* 'slices' of https://github.com/glommer/scylla:
  systemd: put scylla processes in systemd slices.
  move postinst steps to an external script
2019-08-26 20:16:55 +03:00
Benny Halevy
20083be9f6 sstables: delete_atomically: fix misplaced parenthesis in pending_delete_log warning message
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190818064637.9207-1-bhalevy@scylladb.com>
2019-08-26 19:50:21 +03:00
Avi Kivity
b9e9d7d379 Merge "Resolve discarded future warnings" from Botond
"
The warning for discarded futures will only become useful once we can
silence all present warnings and flip the flag to turn it into an error.
Then it will start being useful in finding new, accidental discarding of
futures.
This series silences all remaining warnings in the Scylla codebase. For
those cases where it was obvious that the future is discarded on
purpose, with the author taking all necessary precautions (handling
exceptions), the warning was simply silenced by casting the future to
void and adding a relevant comment. Where the discarding seems to have
been done in error, I have fixed the code to not discard it. At the rest
of the sites I added a FIXME to fix the discarding.
"

* 'resolve-discarded-future-warnings/v4.2' of https://github.com/denesb/scylla:
  treewide: silence discarded future warnings for questionable discards
  treewide: silence discarded future warnings for legit discards
  tests: silence discarded future warnings
  tests/cql_query_test.cc: convert some tests to thread
2019-08-26 19:40:25 +03:00
Botond Dénes
136fc856c5 treewide: silence discarded future warnings for questionable discards
This patch silences the remaining discarded future warnings, those
where it cannot be determined with reasonable confidence that this was
indeed the actual intent of the author, or that the discarding of the
future could lead to problems. For all those places a FIXME is added,
with the intent that these will be soon followed-up with an actual fix.
I deliberately haven't fixed any of these, even if the fix seems
trivial. It is too easy to overlook a bad fix mixed in with so many
mechanical changes.
2019-08-26 19:28:43 +03:00
Botond Dénes
fddd9a88dd treewide: silence discarded future warnings for legit discards
This patch silences those future discard warnings where it is clear that
discarding the future was actually the intent of the original author,
*and* they took the necessary precautions (handling errors). The patch
also adds some trivial error handling (logging the error) in some
places, which were lacking this, but otherwise look ok. No functional
changes.
2019-08-26 18:54:44 +03:00
Botond Dénes
cff4c4932d tests: silence discarded future warnings 2019-08-26 18:54:44 +03:00
Botond Dénes
486fa8c10c tests/cql_query_test.cc: convert some tests to thread
Some tests are currently discarding futures unjustifiably, however
adding code to wait on these futures is quite inconvenient due to the
continuation style code of these tests. Convert them to run in a seastar
thread to make the fix easier.
2019-08-26 18:54:44 +03:00
Tomasz Grabiec
ac5ff4994a service: Announce the new schema version when features are enabled
Introduced in c96ee98.

We call update_schema_version() after features are enabled and we
recalculate the schema version. This method is not updating gossip
though. The node will still use its database::version() to decide on
syncing, so it will not sync and stay inconsistent in gossip until the
next schema change.

We should call update_schema_version_and_announce() instead so that
the gossip state is also updated.

There is no actual schema inconsistency, but the joining node will
think there is and will wait indefinitely. Making a random schema
change would unblock it.

Fixes #4647.

Message-Id: <1566825684-18000-1-git-send-email-tgrabiec@scylladb.com>
2019-08-26 17:54:59 +03:00
Avi Kivity
a7b82af4c3 Update seastar submodule
* seastar afc5bbf511...20bfd61955 (18):
  > reactor: closing file used to check if direct_io is supported
  > future: set_coroutine(): s/state()/_state/
  > tests/perf/perf_test.hh: suppress discarded future warning
  > tests: rpc: fix memory leak in timeout wraparound tests
  > Revert "future-util: reduce allocations and continuations in parallel_for_each"
  > reactor: fix rename_priority_class() build failure in C++14 mode
  > future: mark future_state_base::failed() as unlikely
  > future-util: reduce allocations and continuations in parallel_for_each
  > future-utils: generalize when_all_estimate_vector_capacity()
  > output_stream: Add comment on sequentiality
  > docs/tutorial.md: minor cleanups in first section
  > core: fix a race in execution stages (Fixes #4856, fixes #4766)
  > semaphore: use semaphore's clock type in with_semaphore()/get_units()
  > future: fix doxygen documentation for promise<>
  > sharded: fixed detecting stop method when building with clang
  > reactor: fixed locking error in rename_priority_class
  > Assert that append_challenged_posix_file_impl are closed.
  > rpc: correctly handle huge timeouts
2019-08-26 15:37:58 +03:00
Asias He
3ea1255020 storage_service: Use sleep_abortable instead of sleep (#4697)
Make the sleep abortable so that it is able to break the loop during
shutdown.
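
To illustrate why an abortable sleep lets a retry loop exit promptly, here is a
minimal Python asyncio analog. It is only a sketch: the function below borrows
the name sleep_abortable for readability, but it is not Seastar's API, and the
asyncio.Event merely stands in for an abort source triggered at shutdown.

```python
import asyncio


class SleepAborted(Exception):
    """Raised when a sleep is interrupted by an abort request."""


async def sleep_abortable(seconds, abort_event):
    """Sleep for `seconds`, but raise SleepAborted as soon as abort_event is set."""
    try:
        await asyncio.wait_for(abort_event.wait(), timeout=seconds)
    except asyncio.TimeoutError:
        return  # the full duration elapsed without an abort
    raise SleepAborted()


async def retry_loop(abort_event):
    """Hypothetical periodic loop; returns how many iterations were started."""
    attempts = 0
    try:
        while True:
            attempts += 1
            # With a plain sleep, shutdown would have to wait out the full 60 s.
            await sleep_abortable(60, abort_event)
    except SleepAborted:
        return attempts


async def main():
    abort_event = asyncio.Event()
    task = asyncio.create_task(retry_loop(abort_event))
    await asyncio.sleep(0.01)  # let the loop start sleeping
    abort_event.set()          # stands in for the shutdown request
    return await task


attempts = asyncio.run(main())
print(attempts)
```

The loop is interrupted during its first 60-second sleep instead of blocking
shutdown until the sleep expires.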

Fixes #4885
2019-08-26 13:35:44 +03:00
Asias He
2f24fd9106 sstables: Move leveled_compaction_strategy implementation to source file
It is better than putting everything in header.
2019-08-26 16:49:48 +08:00
Asias He
b69138c4e4 sstables: Include dht/i_partitioner.hh for dht::partition_range
Get rid of one FIXME.
2019-08-26 16:35:18 +08:00
Nadav Har'El
b60d201a11 API: column_family.cc Add get_built_indexes implementation
Merged patch series from Amnon Heiman amnon@scylladb.com

This patch adds an implementation of the get built index API and removes a
FIXME.

The API returns a list of the secondary indexes that belong to a column family
and have already been fully built.

Example:

CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);

$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]
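
As a hedged illustration of how the request path in the curl example is formed,
the Python sketch below joins keyspace and table with ':' and percent-encodes
the result, which is where the %3A comes from. The helper name
built_indexes_path is hypothetical; the host, port and path are taken from the
example above.

```python
from urllib.parse import quote


def built_indexes_path(keyspace: str, table: str) -> str:
    """Build the request path used in the curl example (hypothetical helper).

    Keyspace and table are joined with ':' and percent-encoded,
    so the colon appears as %3A in the URL.
    """
    return "/column_family/built_indexes/" + quote(f"{keyspace}:{table}", safe="")


# Unquoted CQL identifiers are case-insensitive and stored in lowercase,
# so the table created as mytableID is addressed as mytableid.
url = "http://localhost:10000" + built_indexes_path("scylla_demo", "mytableid")
print(url)
```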
2019-08-25 18:37:44 +03:00
Amnon Heiman
2d3185fa7d column_family.cc: remove unhandled future
The sum_ratio struct is a helper struct that is used when calculating
a ratio over multiple shards.

Originally it was created with the idea that it might need to use a future;
in practice the future was never used and was ignored.

This patch removes the future from the implementation and eliminates an
unhandled future warning from the compilation.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-25 16:51:14 +03:00
Amnon Heiman
21dee3d8ef API:column_family.cc Add get_build_index implementation
This patch adds an implementation of the get built index API and removes a
FIXME.

The API returns the list of built secondary indexes that belong to a column family.

Example:

CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE scylla_demo.mytableID (     uid uuid,     text text,     time timeuuid,     PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);

$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-25 16:46:49 +03:00
Juliana Oliveira
711ed76c82 auth: standard_role_manager: read null columns as false
When a role is created through the `create role` statement, the
'is_superuser' and 'can_login' columns are set to false by default.
Likewise, `list roles`, `alter roles` and `* roles` operations
expect to find a boolean when reading the same columns.

This is not the case, though, when a user directly inserts to
`system_auth.roles` and doesn't set those columns. Even though
manually creating roles is not a desired day-to-day operation,
it is an insert just like any other and it should work.

`* roles` operations, on the other hand, are not prepared for
these deviations. If a user manually creates a role and doesn't
set boolean values to those columns, `* roles` will return all
sorts of errors. This happens because `* roles` is explicitly
expecting a boolean and casting for it.

This patch makes `* roles` more friendly by treating the
boolean column as `false`, within the `* roles` context, if the
actual value is `null`; the stored `null` value is not changed.
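
The read-side rule can be sketched in a few lines of Python. This is an
illustration only, not the patch itself: the helper name flag_or_false and the
in-memory row are hypothetical stand-ins for the role manager and a row read
from `system_auth.roles`.

```python
from typing import Optional


def flag_or_false(value: Optional[bool]) -> bool:
    """Interpret a nullable boolean column as False when it is unset."""
    return False if value is None else value


# A role inserted directly into the table without the boolean columns
# (hypothetical in-memory stand-in for a query result).
row = {"role": "alice", "is_superuser": None, "can_login": None}

is_superuser = flag_or_false(row["is_superuser"])
can_login = flag_or_false(row["can_login"])

# Only the interpretation changes; the stored values remain null.
print(is_superuser, can_login, row["is_superuser"])
```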

Fixes #4280

Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20190816032617.61680-1-juliana@scylladb.com>
2019-08-25 11:52:43 +03:00
Pekka Enberg
118a141f5d scylla_blocktune.py: Kill btrfs related FIXME
The scylla_blocktune.py has a FIXME for btrfs from 2016, which is no
longer relevant for Scylla deployments, as Red Hat dropped support for
the file system in 2017.

Message-Id: <20190823114013.31112-1-penberg@scylladb.com>
2019-08-24 20:40:08 +03:00
Botond Dénes
18581cfb76 multishard_mutation_query: create_reader(): use the caller's priority class
The priority class the shard reader was created with was hardcoded to be
`service::get_local_sstable_query_read_priority()`. At the time this
code was written, priority classes could not be passed to other shards,
so this method, receiving its priority class parameters from another
shard, could not use it. This is now fixed, so we can just use whatever
the caller wants us to use.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190823115111.68711-1-bdenes@scylladb.com>
2019-08-23 16:10:43 +02:00
Tomasz Grabiec
080989d296 Merge "cql3: cartesian product limits" from Avi
Cartesian products (generated by IN restrictions) can grow very large,
even for short queries. This can overwhelm server resources.

Add limit checking for cartesian products, and configuration items for
users that are not satisfied with the default of 100 records fetched.

Fixes #4752.

Tests: unit (dev), manual test with SIGHUP.
2019-08-21 19:35:59 +02:00
Avi Kivity
67b0d379e0 main: add glue between db::config and cql3::cql_config
Copy values between the flat db::config and the hierarchical cql_config, adding
observers to keep the values updated.
2019-08-21 19:35:59 +02:00
Avi Kivity
8c7ad1d4cd cql: single_column_clustering_key_restrictions: limit cartesian products
Cartesian products (via IN restrictions) make it easy to generate huge
primary key sets with simple queries, overflowing server resources. Limit
them in the coordinator and report an exception instead of trying to
execute a query that would consume all of our memory.
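
A rough Python sketch of the kind of check described here: multiply the sizes
of the IN lists and refuse the query when the product exceeds a limit. The
function and exception names are hypothetical, and the default of 100 is taken
from the merge message for this series rather than from the code.

```python
from math import prod


class CartesianProductTooLarge(Exception):
    """Raised when IN restrictions would expand to too many key combinations."""


def check_cartesian_product(in_list_sizes, limit=100):
    """Return the number of key combinations, or raise if it exceeds `limit`.

    `in_list_sizes` holds the number of values in each IN restriction.
    Names and the default limit are assumptions made for this sketch.
    """
    size = prod(in_list_sizes)
    if size > limit:
        raise CartesianProductTooLarge(
            f"IN restrictions expand to {size} combinations, limit is {limit}"
        )
    return size


print(check_cartesian_product([3, 4]))  # small product, accepted

try:
    check_cartesian_product([10, 10, 10])  # three short lists, 1000 combinations
except CartesianProductTooLarge as e:
    print("rejected:", e)
```

Three IN lists of only ten values each already exceed the limit, which is the
"huge primary key sets with simple queries" effect the message describes.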

A unit test is added.
2019-08-21 19:35:59 +02:00
Avi Kivity
3a44fa9988 cql3, treewide: introduce empty cql3::cql_config class and propagate it
We need a way to configure the cql interpreter and runtime. So far we relied
on accessing the configuration class via various backdoors, but that causes
its own problems around initialization order and testability. To avoid that,
this patch adds an empty cql_config class and propagates it from main.cc
(and from tests) to the cql interpreter via the query_options class, which is
already passed everywhere.

Later patches will fill it with contents.
2019-08-21 19:35:59 +02:00
Rafael Ávila de Espíndola
86c29256eb types: Fix references_user_type
This was broken since the type refactoring. It was checking the static
type, which is always abstract_type. Unfortunately we only had dtests
for this.

This can probably be optimized to avoid the double switch over kind,
but it is probably better to do the simple fix first.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190821155354.47704-1-espindola@scylladb.com>
2019-08-21 19:13:59 +03:00
Dejan Mircevski
ea9d358df9 cql3: Optimize LIKE regex construction
Currently we create a regex from the LIKE pattern for every row
considered during filtering, even though the pattern is always the
same.  This is wasteful, especially since we require costly
optimization in the regex compiler.  Fix it by reusing the regex
whenever the pattern is unchanged since the last call.
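
The caching idea can be sketched in Python as follows. This is an analog under
stated assumptions, not the cql3 implementation: the class name LikeMatcher is
hypothetical, the LIKE-to-regex translation is simplified (no escape handling),
and the compilation counter exists only to make the reuse visible.

```python
import re


class LikeMatcher:
    """Match values against a LIKE pattern, recompiling only when it changes."""

    def __init__(self):
        self._pattern = None
        self._regex = None
        self.compilations = 0  # kept only to demonstrate the reuse

    @staticmethod
    def _to_regex(pattern: str) -> str:
        # Simplified translation: '%' matches any run of characters,
        # '_' matches exactly one; everything else is taken literally.
        out = []
        for ch in pattern:
            if ch == "%":
                out.append(".*")
            elif ch == "_":
                out.append(".")
            else:
                out.append(re.escape(ch))
        return "".join(out)

    def matches(self, pattern: str, value: str) -> bool:
        if pattern != self._pattern:
            # Pattern changed since the last call: compile and remember it.
            self._regex = re.compile(self._to_regex(pattern), re.DOTALL)
            self._pattern = pattern
            self.compilations += 1
        return self._regex.fullmatch(value) is not None


matcher = LikeMatcher()
rows = ["scylla", "seastar", "cassandra"]
hits = [r for r in rows if matcher.matches("s%", r)]
print(hits, matcher.compilations)
```

Filtering three rows with the same pattern compiles the regex once instead of
once per row.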

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-21 16:45:47 +03:00
Piotr Sarna
526f4c42aa storage_proxy: fix iterator liveness issue in on_down (#4876)
The loop over view update handlers used a guard in order to ensure
that the object is not prematurely destroyed (thus invalidating
the iterator), but the guard itself was not in the right scope.
Fixed by replacing a 'for' loop with a 'while' loop, which moves
the iterator incrementation inside the scope in which it's still
guarded and valid.

Fixes #4866
2019-08-21 15:44:43 +03:00
Avi Kivity
4ef7429c4a build: build seastar in build directory
Currently, seastar is built in seastar/build/{mode}. This means we have two build
directories: build/{mode} and seastar/build/{mode}.

This patch changes that to have only a single build directory (build/{mode}). It
does that by calling Seastar's cmake directly instead of through Seastar's
./configure.py.  However, to support dpdk, if that is enabled it calls cmake
through Seastar's ./cooking.sh (similar to what Seastar's ./configure.py does).

All ./configure.py flags are translated to cmake variables, in the same way that
Seastar does.

Contains fix from Rafael to pass the flags for the correct mode.
2019-08-21 13:10:17 +02:00
Rafael Ávila de Espíndola
278b6abb2b Improve documentation on the system.large_* tables
This clarifies that "rows" are clustering rows and that there is no
information about individual collection elements.

The patch also documents some properties common to all these tables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190820171204.48739-1-espindola@scylladb.com>
2019-08-21 10:36:25 +03:00
Vlad Zolotarov
d253846c91 hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
The endpoint directories scanned by space_watchdog may get deleted
by the manager::drain_for().

If a deleted directory is given to a lister::scan_dir() this will end up
in an exception and as a result a space_watchdog will skip this round
and hinted handoff is going to be disabled (for all agents including MVs)
for the whole space_watchdog round.

Let's make sure this doesn't happen by serializing the scanning and deletion
using end_point_hints_manager::file_update_mutex.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 11:46:46 -04:00
Vlad Zolotarov
b34c36baa2 hinted handoff: make taking file_update_mutex safe
end_point_hints_manager::file_update_mutex is taken by space_watchdog
but while space_watchdog is waiting for it the corresponding
end_point_hints_manager instance may get destroyed by manager::drain_for()
or by manager::stop().

This will end up in a use-after-free event.

Let's change the end_point_hints_manager's API in a way that would prevent
such an unsafe locking:

   - Introduce the with_file_update_mutex().
   - Make end_point_hints_manager::file_update_mutex() method private.

Fixes #4685
Fixes #4836

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 11:26:19 -04:00
Vlad Zolotarov
dbad9fcc7d db::hints::manager::drain_for(): fix alignment
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Vlad Zolotarov
7a12b46fc9 db::hints::manager: serialize calls to drain_for()
If drain_for() is running together with itself: one instance for the local
node and one for some other node, erasing of elements from the _ep_managers
map may lead to a use-after-free event.

Let's serialize drain_for() calls with a semaphore.
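
As a loose illustration of serializing an asynchronous operation with a
semaphore, here is a Python asyncio sketch. The HintsManager class is a toy,
hypothetical stand-in; only the names drain_for and _ep_managers echo the
message above, and the counters exist solely to show that calls no longer
overlap.

```python
import asyncio


class HintsManager:
    """Toy stand-in (hypothetical) for a manager owning per-endpoint state."""

    def __init__(self, endpoints):
        self._ep_managers = {ep: object() for ep in endpoints}
        self._drain_lock = asyncio.Semaphore(1)  # at most one drain at a time
        self._active = 0
        self.max_concurrent = 0

    async def drain_for(self, endpoint):
        async with self._drain_lock:
            self._active += 1
            self.max_concurrent = max(self.max_concurrent, self._active)
            # Yield point: without the semaphore another drain could run here
            # and mutate the map while this one is still using it.
            await asyncio.sleep(0)
            self._ep_managers.pop(endpoint, None)
            self._active -= 1


async def main():
    mgr = HintsManager(["10.0.0.1", "10.0.0.2"])
    await asyncio.gather(mgr.drain_for("10.0.0.1"), mgr.drain_for("10.0.0.2"))
    return mgr


mgr = asyncio.run(main())
print(mgr.max_concurrent, len(mgr._ep_managers))
```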

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Vlad Zolotarov
09600f1779 db::hints: cosmetics: indentation and missing method qualifier
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Avi Kivity
698b72b501 relocatable: switch from run-time relocation to install-time relocation
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes #4673.
2019-08-20 00:25:43 +02:00
Botond Dénes
4cb873abfe query::trim_clustering_row_ranges_to(): fix handling of non-full prefix keys
Non-full prefix keys are currently not handled correctly as all keys
are treated as if they were full prefixes, and therefore they represent
a point in the key space. Non-full prefixes however represent a
sub-range of the key space and therefore require null extending before
they can be treated as a point.
As a quick reminder, `key` is used to trim the clustering ranges such
that they only cover positions >= key. Thus,
`trim_clustering_row_ranges_to()` does the equivalent of intersecting
each range with (key, inf). When `key` is a prefix, this would exclude
all positions that are prefixed by key as well, which is not desired.

Fixes: #4839
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190819134950.33406-1-bdenes@scylladb.com>
2019-08-20 00:24:51 +02:00
Avi Kivity
21d6f0bb16 Merge "Add LIKE test cases for all non-string types #4859" from Dejan
"
Follow-up to #4610, where a review comment asked for test coverage on all types. Existing tests cover all the types admissible in LIKE, while this PR adds coverage for all inadmissible types.

Tests: unit (dev)
"

* 'like-nonstring' of https://github.com/dekimir/scylla:
  cql_query_test: Add LIKE tests for all types
  cql_query_test: Remove LIKE-nonstring-pattern case
  cql_query_test: Move a testcase elsewhere in file
2019-08-20 00:24:51 +02:00
Tomasz Grabiec
6813ae22b0 Merge "Handle termination signals during streaming" from Avi
In b197924, we changed the shutdown process not to rely on the global
reactor-defined exit, but instead added a local variable to hold the
shutdown state. However, we did not propagate that state everywhere,
and now streaming processes are not able to abort.

Fix that by enhancing stop_signal with a sharded<abort_source> member
that can be propagated to services. Propagate it to storage_service
and thence to boot_strapper and range_streamer so that streaming
processes can be aborted.

Fixes #4674
Fixes #4501

Tests: unit (dev), manual bootstrap test
2019-08-20 00:24:51 +02:00
Avi Kivity
2c7435418a Merge "database: assign proper io priority for streaming view updates" from Piotr
"
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.

Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...}

Tests: unit(dev)
"

* 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla:
  db,view: wrap view update generation in stream scheduling group
  database: assign proper io priority for streaming view updates
2019-08-20 00:24:51 +02:00
Pekka Enberg
d0eecbf3bb api/storage_proxy: Wire up hinted-handoff status to API
We support hinted-handoff now, so let's return its status via the API.

Message-Id: <20190819080006.18070-1-penberg@scylladb.com>
2019-08-20 00:24:50 +02:00
Piotr Sarna
3cc5a04301 db,view: wrap view update generation in stream scheduling group
Generating view updates is used by streaming, so the service itself
should also run under the matching scheduling group.
2019-08-20 00:24:50 +02:00
Piotr Sarna
1ab07b80b4 database: assign proper io priority for streaming view updates
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.
2019-08-20 00:24:50 +02:00
Tomasz Grabiec
b9447d0319 Revert "relocatable: switch from run-time relocation to install-time relocation"
This reverts commit 4ecce2d286.

Should be committed via the next branch.
2019-08-20 00:22:30 +02:00
Avi Kivity
4ecce2d286 relocatable: switch from run-time relocation to install-time relocation
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes #4673.
2019-08-20 00:20:19 +02:00
Glauber Costa
da260ecd61 systemd: put scylla processes in systemd slices.
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.

We have also recently seen that memory usage from other applications'
anonymous and page cache memory can bring the system to OOM.

Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by SystemD in the form of slices: a
hierarchical structure to which controllers can be attached.

In true systemd way, the hierarchy is implicit in the filenames of the
slice files. A "-" symbol defines the hierarchy, so the files that this
patch presents, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.

Later we mark the Services needed to run scylla as belonging to one
or the other through the Slice= directive.
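
To make the filename-based hierarchy concrete, the Python sketch below expands
a slice unit name into the chain of slices implied by its dashes. It is an
illustration only and not part of the patch; the helper name slice_ancestry is
hypothetical.

```python
def slice_ancestry(unit: str) -> list:
    """Return the slice hierarchy implied by a slice unit name, outermost first.

    Illustrative helper (hypothetical name): systemd places "a-b.slice"
    inside "a.slice", so each dash adds one nesting level.
    """
    suffix = ".slice"
    if not unit.endswith(suffix):
        raise ValueError(f"not a slice unit: {unit!r}")
    parts = unit[: -len(suffix)].split("-")
    return ["-".join(parts[: i + 1]) + suffix for i in range(len(parts))]


print(slice_ancestry("scylla-server.slice"))
print(slice_ancestry("scylla-helper.slice"))
```

Both units share the same top-level "scylla" slice, which is why limits applied
to the helper slice are accounted separately from the server slice.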

Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.

Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.

One can then run something like:

```
   sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool
```

(or even better, the backup tool can itself be a systemd timer)

Changes from last version:
- No longer use the CPUQuota
- Minor typo fixes
- postinstall fixup for small machines

Benchmark results:
==================

Test: read from disk, with 100% disk util using a single i3.xlarge (4 vCPUs).
We have to fill the cache as we read, so this should stress CPU, memory and
disk I/O.

cassandra-stress command:
```
  cassandra-stress read no-warmup duration=5m -rate threads=20 -node 10.2.209.188 -pop dist=uniform\(1..150000000\)
```

Baseline results:

```
Results:
Op rate                   :   13,830 op/s  [READ: 13,830 op/s]
Partition rate            :   13,830 pk/s  [READ: 13,830 pk/s]
Row rate                  :   13,830 row/s [READ: 13,830 row/s]
Latency mean              :    1.4 ms [READ: 1.4 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.8 ms [READ: 2.8 ms]
Latency 99.9th percentile :    3.4 ms [READ: 3.4 ms]
Latency max               :   12.0 ms [READ: 12.0 ms]
Total partitions          :  4,149,130 [READ: 4,149,130]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Question 1:
===========

Does putting scylla in a special slice affect its performance ?

Results with Scylla running in a slice:

```
Results:
Op rate                   :   13,811 op/s  [READ: 13,811 op/s]
Partition rate            :   13,811 pk/s  [READ: 13,811 pk/s]
Row rate                  :   13,811 row/s [READ: 13,811 row/s]
Latency mean              :    1.4 ms [READ: 1.4 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.2 ms [READ: 2.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.6 ms]
Latency 99.9th percentile :    3.3 ms [READ: 3.3 ms]
Latency max               :   23.2 ms [READ: 23.2 ms]
Total partitions          :  4,151,409 [READ: 4,151,409]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion* : No significant change

Question 2:
===========

What happens when there is a CPU hog running in the same server as scylla?

CPU hog:

```
   taskset -c 0 /bin/sh -c "while true; do true; done" &
   taskset -c 1 /bin/sh -c "while true; do true; done" &
   taskset -c 2 /bin/sh -c "while true; do true; done" &
   taskset -c 3 /bin/sh -c "while true; do true; done" &
   sleep 330
```

Scenario 1: CPU hog runs freely:

```
Results:
Op rate                   :    2,939 op/s  [READ: 2,939 op/s]
Partition rate            :    2,939 pk/s  [READ: 2,939 pk/s]
Row rate                  :    2,939 row/s [READ: 2,939 row/s]
Latency mean              :    6.8 ms [READ: 6.8 ms]
Latency median            :    5.3 ms [READ: 5.3 ms]
Latency 95th percentile   :   11.0 ms [READ: 11.0 ms]
Latency 99th percentile   :   14.9 ms [READ: 14.9 ms]
Latency 99.9th percentile :   17.1 ms [READ: 17.1 ms]
Latency max               :   26.3 ms [READ: 26.3 ms]
Total partitions          :    884,460 [READ: 884,460]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Scenario 2: CPU hog runs inside scylla-helper slice

```
Results:
Op rate                   :   13,527 op/s  [READ: 13,527 op/s]
Partition rate            :   13,527 pk/s  [READ: 13,527 pk/s]
Row rate                  :   13,527 row/s [READ: 13,527 row/s]
Latency mean              :    1.5 ms [READ: 1.5 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile :    3.8 ms [READ: 3.8 ms]
Latency max               :   18.7 ms [READ: 18.7 ms]
Total partitions          :  4,069,934 [READ: 4,069,934]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion*: With systemd slice we can keep the performance very close to
baseline

Question 3:
===========

What happens when there is an I/O hog running in the same server as scylla?

I/O hog: (Data in the cluster is 2x size of memory)

```
while true; do
	find /var/lib/scylla/data -type f -exec grep glauber {} +
done
```

Scenario 1: I/O hog runs freely:

```
Results:
Op rate                   :    7,680 op/s  [READ: 7,680 op/s]
Partition rate            :    7,680 pk/s  [READ: 7,680 pk/s]
Row rate                  :    7,680 row/s [READ: 7,680 row/s]
Latency mean              :    2.6 ms [READ: 2.6 ms]
Latency median            :    1.3 ms [READ: 1.3 ms]
Latency 95th percentile   :    7.8 ms [READ: 7.8 ms]
Latency 99th percentile   :   10.9 ms [READ: 10.9 ms]
Latency 99.9th percentile :   16.9 ms [READ: 16.9 ms]
Latency max               :   40.8 ms [READ: 40.8 ms]
Total partitions          :  2,306,723 [READ: 2,306,723]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Scenario 2: I/O hog runs in the scylla-helper systemd slice:

```
Results:
Op rate                   :   13,277 op/s  [READ: 13,277 op/s]
Partition rate            :   13,277 pk/s  [READ: 13,277 pk/s]
Row rate                  :   13,277 row/s [READ: 13,277 row/s]
Latency mean              :    1.5 ms [READ: 1.5 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile :    3.5 ms [READ: 3.5 ms]
Latency max               :  183.4 ms [READ: 183.4 ms]
Total partitions          :  3,984,080 [READ: 3,984,080]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion*: With systemd slice we can keep the performance very close to
baseline
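
To put "very close to baseline" into numbers, the short Python sketch below
expresses each run's op rate as a percentage of the baseline. The figures are
copied from the result blocks above; the labels are just descriptive names
chosen for this sketch.

```python
BASELINE_OPS = 13_830  # op/s, baseline run

# Op rates (op/s) copied from the result blocks in this message.
runs = {
    "scylla in slice": 13_811,
    "CPU hog, free": 2_939,
    "CPU hog, scylla-helper slice": 13_527,
    "I/O hog, free": 7_680,
    "I/O hog, scylla-helper slice": 13_277,
}

relative = {name: round(100 * ops / BASELINE_OPS, 1) for name, ops in runs.items()}
for name, pct in relative.items():
    print(f"{name:30s} {pct:5.1f}% of baseline")
```

Unconstrained hogs drop throughput to roughly a fifth (CPU) and a half (I/O) of
the baseline, while the same hogs confined to the scylla-helper slice leave it
at about 96% or better.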

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-08-19 14:31:28 -04:00
Avi Kivity
c32f9a8f7b dht: check for aborts during streaming
Propagate the abort_source from main() into boot_strapper and range_stream and
check for aborts at strategic points. This includes aborting running stream_plans
and aborting sleeps between retries.

Fixes #4674
2019-08-18 20:41:07 +03:00
Avi Kivity
5af6f5aa22 main: expose SIGINT/SIGTERM as abort_source
In order to propagate stop signals, expose them as sharded<abort_source>. This
allows propagating the signal to all shards, and integrating it with
sleep_abortable().

Because sharded<abort_source>::stop() will block, we'll now require stop_signal
to run in a thread (which is already the case).
2019-08-18 20:15:26 +03:00
Avi Kivity
20aed3398d Merge "Simplify types" from Rafael
"
This is hopefully the last large refactoring on the way of UDF.

In UDF we have to convert internal types to Lua and back. Currently
almost all our types are hidden in types.cc and expose functionality
via virtual functions. While it should be possible to add
convert_{to|from}_lua virtual functions, that seems like a bad design.

In compilers, the type definition is normally public and different
passes know how to reason about each type. The alias analysis knows
about int and floats, not the other way around.

This patch series is inspired by both the LLVM RTTI
(https://www.llvm.org/docs/HowToSetUpLLVMStyleRTTI.html) and
std::variant.

The series makes the types public, adds a visit function and converts
the various virtual methods to just use visit. As a small example of
why this is useful, it then moves a bit of cql3 and json specific
logic out of types.cc and types.hh. In a similar way, the UDF code
will be able to use visit to convert objects to Lua.

In comparison with the previous versions, this series doesn't require the intermediate step of converting void* to data_value& in a few member functions.

This version also has fewer double dispatches, and I am fairly confident it has all the tools for avoiding all double dispatches.
"

* 'simplify-types-v3' of https://github.com/espindola/scylla: (80 commits)
  types: Move abstract_type visit to a header
  types: Move uuid_type_impl to a header
  types: Move inet_addr_type_impl to a header
  types: Move varint_type_impl to a header
  types: Move timeuuid_type_impl to a header
  types: Move date_type_impl to a header
  types: Move bytes_type_impl to a header
  types: Move utf8_type_impl to a header
  types: Move ascii_type_impl to a header
  types: Move string_type_impl to a header
  types: Move time_type_impl to a header
  types: Move simple_date_type_impl to a header
  types: Move timestamp_type_impl to a header
  types: Move duration_type_impl to a header
  types: Move decimal_type_impl to a header
  types: Move floating point types  to a header
  types: Move boolean_type_impl to a header
  types: Move integer types to a header
  types: Move integer_type_impl to a header
  types: Move simple_type_impl to a header
  ...
2019-08-18 19:04:05 +03:00
Takuya ASADA
f574112301 dist/debian: handle --dist correctly
Since ac9b115, the --dist option is mistakenly ignored.
It should set the 'housekeeping' template variable to 'enable'.

Fixes #4857

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190816120127.14099-1-syuu@scylladb.com>
2019-08-18 15:00:33 +03:00
Avi Kivity
14d40cc659 Update seastar submodule
* seastar fe2b5b0c6...afc5bbf51 (4):
  > Merge "handle discarded futures or suppress warning" from Benny
  > Remove variadic futures from the Seastar implementation
  > Revert "Merge "handle discarded futures or suppress warning" from Benny"
  > io_queue: Forward declare smp class
2019-08-17 12:18:18 +03:00
Dejan Mircevski
48bb89fcb7 cql_query_test: Add LIKE tests for all types
As requested in a prior code review [1], ensure that LIKE cannot be
used on any non-string type.

[1] https://github.com/scylladb/scylla/pull/4610#pullrequestreview-255590129

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:55:35 -04:00
Dejan Mircevski
ef071bf7ce cql_query_test: Remove LIKE-nonstring-pattern case
This testcase was previously commented out, pending a fix that cannot
be made.  Currently it is impossible to validate the marker-value type
at filtering time.  The value is entered into the options object under
its presumed type of string, regardless of what it was made from.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:07:44 -04:00
Dejan Mircevski
20e688e703 cql_query_test: Move a testcase elsewhere in file
Somehow this test case sits in the middle of LIKE-operator tests:
test_alter_type_on_compact_storage_with_no_regular_columns_does_not_crash

Move it so LIKE test cases are contiguous.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:07:44 -04:00
Glauber Costa
ffc328c924 move postinst steps to an external script
There are systemd-related steps done in both rpm and deb builds.
Move that to a script so we avoid duplication.

The tests have so far been somewhat distribution-specific, so they need
to be adapted a bit.

Also note that this also fixes a bug with rpm as a side-effect:
rpm does not call daemon-reload after potentially changing the
systemd files (it is only implied during postun operations, that
happen during uninstall). daemon-reload was called explicitly for
debian packages, and now it is called for both.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-08-15 10:43:17 -04:00
Rafael Ávila de Espíndola
7f0a434cfa types: Move abstract_type visit to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
dccefd1ddb types: Move uuid_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
038728a381 types: Move inet_addr_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
1966416cb3 types: Move varint_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
9229f99c86 types: Move timeuuid_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
993f132619 types: Move date_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
a299ed3b9b types: Move bytes_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
09ac2a1bc6 types: Move utf8_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
da472a65ec types: Move ascii_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
b98bac65b0 types: Move string_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
3e5b1e2630 types: Move time_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
909df932ac types: Move simple_date_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
8f3bebb6e8 types: Move timestamp_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
3260153d35 types: Move duration_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
2f6a26b1c1 types: Move decimal_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
480ca52b59 types: Move floating point types to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
6a4ec7488e types: Move boolean_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
404b26d3fa types: Move integer types to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
bd3e725605 types: Move integer_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
03aca28dc5 types: Move simple_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
e8ba37fa5a types: Move counter_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
cb03c79a48 types: Move empty_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
1cb7127bf3 types: Make abstract_type::serialize a static helper
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
b175657ee7 types: Devirtualize abstract_type::validate
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
bf96f1111c types: Make abstract_type::serialized_size a static helper
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:41 -07:00
Rafael Ávila de Espíndola
6831e05471 types: Move functions that use abstract_type::serialized_size out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
047e34a31d types: Remove serialize_value
It is no longer needed.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
1e0663c56c types: Devirtualize abstract_type::from_string
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
68b26047cc types: Devirtualize abstract_type::serialize
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
18da5f9001 types: Devirtualize abstract_type::from_json_object
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
b987b2dcbe types: Devirtualize abstract_type::to_json_string
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
b4bc888eac types: Refactor abstract_type::serialized_size
The following logic was duplicated:

* For all types, if value is null, the result is zero.
* For non collection types, if the native object is empty, the result
  is zero.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
968365b7e3 types: Devirtualize abstract_type::serialized_size
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
793bc50d69 types: Delete abstract_type::validate_collection_member
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
37686964f0 types: Devirtualize abstract_type::hash
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
396f5c7656 types: Devirtualize abstract_type::native_typeid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
492043a77d types: Devirtualize abstract_type::native_value_delete
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
4d849d7742 types: Devirtualize abstract_type::native_value_clone
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
ba887b7e56 types: Delete abstract_type::native_value_destroy
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
5c0e78d70c types: Delete abstract_type::native_value_move
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
2bc6471a1e types: Delete abstract_type::native_value_copy
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
33394dfdc1 types: Delete abstract_type::native_value_size
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
c22ca2f9c9 types: Delete abstract_type::native_value_alignment
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
37c0f5b985 types: Devirtualize get_string
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
f633f70616 types: Devirtualize abstract_type::is_value_compatible_with_internal
It now is a static helper.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
19c9a033d9 types: Devirtualize abstract_type::is_compatible_with
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
d245d08045 types: Devirtualize abstract_type::is_string
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
ae30d78ca9 types: Devirtualize abstract_type::equal
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
f087756684 types: Implement less with compare
We defined less for some types and compare for others. There is no
type for which compare is substantially more expensive, so define it
for all types and implement less with compare.
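
A minimal sketch of the idea, assuming values are compared as raw bytes.
The names `bytes_compare` and `bytes_less` are hypothetical and are not
taken from types.cc:

```cpp
#include <string_view>

// Hypothetical three-way comparison: returns a negative value, zero or a
// positive value when a sorts before, equal to or after b.
inline int bytes_compare(std::string_view a, std::string_view b) {
    return a.compare(b);
}

// less is derived from compare, so only one ordering has to be maintained.
inline bool bytes_less(std::string_view a, std::string_view b) {
    return bytes_compare(a, b) < 0;
}
```

With this shape, a type only has to provide the three-way comparison, and
the boolean ordering follows from it.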

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
9bbf55e9c0 types: Devirtualize abstract_type::compare
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
a5daa8d258 types: Devirtualize abstract_type::less
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
a3e898a648 types: Devirtualize abstract_type::deserialize
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
8145faa66f types: Inline is_byte_order_comparable into only user
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
325418db16 types: Devirtualize abstract_type::is_byte_order_comparable
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
d2b063877b types: Devirtualize abstract_type::is_byte_order_equal
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
21da060b24 types: Devirtualize abstract_type::update_user_type
The type walking is similar to what the find function does, but
refactoring it doesn't seem worth it if these are the only two uses.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
ae6e96a1e2 types: Refactor references_duration and references_user_type
With this patch the logic for walking all nested types is moved to a
helper function. It also fixes reversed_type_impl not being handled in
references_duration.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
25a5631a46 types: Devirtualize abstract_type::references_user_type
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
544337f380 types: Devirtualize abstract_type::references_duration
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
a6b48bda03 types: Devirtualize abstract_type::is_native
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
f5b4fe5685 types: Devirtualize abstract_type::is_atomic
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
ec09fb94cb types: Devirtualize abstract_type::is_multi_cell
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
1bea7747ce types: Devirtualize abstract_type::is_tuple
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
1581805a8d types: Devirtualize abstract_type::is_collection
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
1137695cb2 types: Devirtualize abstract_type::is_counter
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
d3ba0d132a types: Devirtualize abstract_type::is_user_type
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
0ff539500f types: Devirtualize abstract_type::cql3_type_name_impl
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
5314b489e3 types: Devirtualize abstract_type::get_cql3_kind_impl
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
2f0c64844f types: Devirtualize abstract_type::is_reversed
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
33d2ec8e1c types: Devirtualize abstract_type::underlying_type
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
064db9b92e types: Devirtualize abstract_type::to_string_impl
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
69d6fd21d2 types: Add a listlike_collection_type_impl class
With this we can share code that wants to access the element type of
set and list.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
a4837301a6 types: Move _is_multi_cell to collection_type_impl
It was duplicated in each concrete collection type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
de6d6c46a1 types: Remove collection_type_impl::kind
All uses have been switched to abstract_type::kind.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
c80c19459e types: Add a visitor over data_value
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
5701051857 types: Add a generic visit over abstract_type
The API is inspired by std::variant.

This bridges the runtime type of an abstract_type object to a compile
time overload resolution. For example, it is possible to have a single
lambda to visit a string_type_impl, but it corresponds to two leaf
types (ascii and utf8).
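
A simplified sketch of this dispatch, assuming a closed hierarchy with only
three leaves. The classes below are cut-down stand-ins that reuse the names
from this series; `describe` and `describe_type` are hypothetical, and the
real code covers many more types:

```cpp
#include <cstdlib>
#include <string>

// One enum value per leaf type (only three leaves in this sketch).
enum class kind { ascii, utf8, int32 };

struct abstract_type {
    explicit abstract_type(kind k) : _kind(k) {}
    kind get_kind() const { return _kind; }
private:
    kind _kind;
};

struct string_type_impl : abstract_type {
    explicit string_type_impl(kind k) : abstract_type(k) {}
};
struct ascii_type_impl final : string_type_impl {
    ascii_type_impl() : string_type_impl(kind::ascii) {}
};
struct utf8_type_impl final : string_type_impl {
    utf8_type_impl() : string_type_impl(kind::utf8) {}
};
struct int32_type_impl final : abstract_type {
    int32_type_impl() : abstract_type(kind::int32) {}
};

// Maps the runtime kind to a call with the concrete static type, so that
// ordinary overload resolution picks the handler.
template <typename Func>
decltype(auto) visit(const abstract_type& t, Func&& f) {
    switch (t.get_kind()) {
    case kind::ascii: return f(static_cast<const ascii_type_impl&>(t));
    case kind::utf8:  return f(static_cast<const utf8_type_impl&>(t));
    case kind::int32: return f(static_cast<const int32_type_impl&>(t));
    }
    std::abort(); // unreachable for a valid kind
}

// Hypothetical visitor: one overload handles both string leaves.
struct describe {
    std::string operator()(const string_type_impl&) const { return "string"; }
    std::string operator()(const int32_type_impl&) const { return "int32"; }
};

inline std::string describe_type(const abstract_type& t) {
    return visit(t, describe{});
}
```

Because the visitor is called with the leaf type, the `string_type_impl`
overload is selected for both `ascii_type_impl` and `utf8_type_impl` through
the usual derived-to-base conversion.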

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
e5c7deaeb5 types: Add a kind to abstract_type
The type hierarchy is closed, so we can give each leaf an enum value.

This will be used to implement a visitor pattern and reduce code
duplication.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
5c098eb7d0 types: Add more tests for abstract_type::to_string_impl
The corresponding code is correct, but I noticed no tests would fail
if it was broken while refactoring it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
096de10eee types: Remove abstract_type::equals
All types are interned, so we can just compare the pointers.
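
A minimal sketch of why interning makes pointer comparison sufficient. The
registry, `type_impl` and `intern_type` below are hypothetical and much
simpler than the real type system; the sketch is also not thread-safe:

```cpp
#include <map>
#include <memory>
#include <string>

struct type_impl {
    std::string name;
};
using type_ptr = std::shared_ptr<const type_impl>;

// Returns the single shared instance for a given name, creating it on
// first use. Two equal types therefore always have the same address.
inline type_ptr intern_type(const std::string& name) {
    static std::map<std::string, type_ptr> registry;
    auto it = registry.find(name);
    if (it == registry.end()) {
        it = registry.emplace(name,
                std::make_shared<const type_impl>(type_impl{name})).first;
    }
    return it->second;
}
```

With such a registry, `intern_type("int") == intern_type("int")` compares
pointers only and never needs a structural comparison.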

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
6a8ffb35ff types: Make a few concrete_type member functions public
These only use public member functions from data_value, so there is no
reason for not making them public too.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Gleb Natapov
1779c3b7f6 move admission control semaphore from cql server to storage_service
There are two reasons for the move. First is that cql server lifetime
is shorter than the storage_proxy one, and the latter stores references to
the semaphore in each service_permit it holds. Second - we want thrift
(and in the future other user APIs) to share the same admission control
memory pool.

Fixes #4844

Message-Id: <20190814142614.GT17984@scylladb.com>
2019-08-14 18:49:56 +03:00
Gleb Natapov
a1e9e6faa2 storage_service: remove outdated comment
We in fact do stop cql server in storage_service::drain_on_shutdown()
which is called in main.cc during shutdown.

Message-Id: <20190814085027.GP17984@scylladb.com>
2019-08-14 11:52:49 +03:00
Avi Kivity
9f512509c7 github: remove github pull request template (#4833)
Since we do accept pull requests (in a long-running experiment), the
pull request template suggesting not to use them is inaccurate, and
many requesters forget to remove the boilerplate.

Remove the outdated template.
2019-08-14 09:28:39 +03:00
Pekka Enberg
595434a554 Merge "docker: relax permission checks" from Avi
"Commit e3f7fe4 added file owner validation to prevent Scylla from
 crashing when it tries to touch a file it doesn't own. However, under
 docker, we cannot expect to pass this check since user IDs are from
 different namespaces: the process runs in a container namespace, but the
 data files usually come from a mounted volume, and so their uids are
 from the host namespace.

 So we need to relax the check. We do this by reverting b1226fb, which
 causes Scylla to run as euid 0 in docker, and by special-casing euid 0
 in the ownership verification step.

 Fixes #4823."

* 'docker-euid-0' of git://github.com/avikivity/scylla:
  main: relax file ownership checks if running under euid 0
  Revert "dist/docker/redhat: change user of scylla services to 'scylla'"
2019-08-13 19:55:05 +03:00
Tomasz Grabiec
64ff1b6405 cql: alter type: Format field name as text instead of hex
Fixes #4841

Message-Id: <1565702635-26214-1-git-send-email-tgrabiec@scylladb.com>
2019-08-13 16:25:48 +03:00
Tomasz Grabiec
34cff6ed6b types: Fix abort on type alter which affects a compact storage table with no regular columns
Fixes #4837

Message-Id: <1565702247-23800-1-git-send-email-tgrabiec@scylladb.com>
2019-08-13 16:25:02 +03:00
Avi Kivity
1ed3356e0e main: relax file ownership checks if running under euid 0
During startup, we check that the data files are owned by our euid.
But in a container environment, this is impossible to enforce because
uid/username mappings are different between the host and the container,
and the data files are likely to be mounted from the host.

To allow for such environments, relax the checks if euid=0. This
both matches what happens in a container (programs run as root) and
the kernel access checks (euid 0 can do anything).

We can reconsider this when container uid mapping is better developed.

Fixes #4823.
Fixes #4536.
2019-08-13 14:36:08 +03:00
Avi Kivity
ca28fdc37d Revert "dist/docker/redhat: change user of scylla services to 'scylla'"
This reverts commit b1226fb15a. When the
data volume is mounted from the host (as is usual in container
deployments), we can't expect that the files will be owned by the
in-container scylla user. So that commit didn't really fix #4536.

A follow-up patch will relax the check so it passes in a container
environment.
2019-08-13 14:36:00 +03:00
Pekka Enberg
fed38f5179 reloc/build_reloc.sh: Add '--configure-flags' command line option
This adds a '--configure-flags FLAGS' command line option, which
overrides the flags passed to scylla.git 'configure.py' script. We need
this for flexibility of custom builds in Jenkins pipelines, for example.

Message-Id: <20190813095428.13590-1-penberg@scylladb.com>
2019-08-13 14:05:25 +03:00
Tomasz Grabiec
0cf4fab2ca Merge "Multishard combining reader more robust reader recreation" from Botond
Make the reader recreation logic more robust, by moving away from
deciding which fragments have to be dropped based on a bunch of
special cases, instead replacing this with a general logic which just
drops all already seen fragments (based on their position).  Special
handling is added for the case when the last position is a range
tombstone with a non full prefix starting position.  Reproducer unit
tests are added for both cases.

Refs #4695
Fixes #4733
2019-08-13 11:53:07 +02:00
Gleb Natapov
00c4078af3 cache_hitrate_calculator: do not ignore a future returned from gossiper::add_local_application_state
We should wait for a future returned from add_local_application_state() to
resolve before issuing new calculation, otherwise two
add_local_application_state() may run simultaneously for the same state.

Fixes #4838.

Message-Id: <20190812082158.GE17984@scylladb.com>
2019-08-13 11:48:38 +03:00
Botond Dénes
fe58324fb9 tests: test_multishard_combining_reader_as_mutation_source: don't copy mutations cross shard
It's illegal. Freeze-unfreeze them instead when crossing shard
boundaries.
2019-08-13 10:16:02 +03:00
Botond Dénes
d746fb59a7 mutation_reader_test: harden test_multishard_combining_reader_as_mutation_source
Add `single_fragment_buffer` test variable. When set, the shard readers
are created with a max buffer size of 1, effectively causing them to
read a single fragment at a time. This, when combined with
`evict_readers=true` will stress the recreate reader logic to the max.
2019-08-13 10:16:02 +03:00
Botond Dénes
899afc0661 flat_mutation_reader_assertions: produces_range_tombstone(): be more lenient
Be more tolerant with different but equivalent representation of range
deletions. When expecting a range tombstone, keep reading range
tombstones while these can be merged with the cumulative range
tombstone, resulting from the merging of the previous range tombstones.
This results in tests tolerating range tombstones that are split into
several, potentially overlapping range tombstones, representing the
same underlying deletion.
2019-08-13 10:16:02 +03:00
Botond Dénes
53e1dca5ca tests/mutation_source_test: generate_mutation_sets() add row that falls into deleted prefix
This is tailored to the multishard_combining_reader, to make sure it
doesn't lose rows following a range tombstone with a prefix starting
position (whose prefix their keys fall into).
2019-08-13 09:47:55 +03:00
Botond Dénes
6bfe468a17 multishard_combining_reader: remote_reader::recreate_reader(): restore indentation 2019-08-13 09:47:55 +03:00
Botond Dénes
68353acc1c multishard_combining_reader: remote_reader: use next instead of last pos
Currently the remote reader uses the last seen fragment's position to
calculate the position the reader should continue from when the reader
is recreated after having been evicted. Recently it was discovered that
this logic breaks down badly when this last position is a non-full
clustering prefix (a range tombstone start bound). In this case, if only
the last position is available, there is no good way of computing the
starting position. Starting after this position will potentially miss
any rows that fall into the prefix (the current behaviour). Starting
from before it will cause all range tombstones with said prefix to be
re-emitted, causing other problems. A better solution is to exploit the
fact that sometimes we also know what the next fragment is.
These "some" times are the exact times that are problematic with the
current approach -- when the last fragment is a range tombstone.
Exploiting this extra knowledge allows for a much better way for
calculating the starting position: instead of maintaining the last
position, we maintain the next position, which is always safe to start
from. This is not always possible, but in many cases we can know for
sure what the next position is, for example if the last position was a
static row we can be sure the next position is the first clustering
position (or partition end). In the few cases where we cannot calculate
the next position we fall back to the previous logic and start from
*after* the last position. The good news is that in these remaining
cases (the last fragment is a clustering row) it is safe to do so.

This patch also does some refactoring of the remote-reader internals,
all fill-buffer related logic is grouped together in a single
`fill_buffer()` method.
2019-08-13 09:47:55 +03:00
Botond Dénes
3949189918 multishard_combining_reader: remote_reader::do_fill_buffer(): reorganize drop logic
To make it more readable.
2019-08-13 09:47:55 +03:00
Botond Dénes
20c06adf80 position_in_partition: add for_partition_start() 2019-08-13 09:47:55 +03:00
Botond Dénes
87973498a1 query: refactor trim_clustering_row_ranges_to()
Allow expressing `pos` in term of a `position_in_partition_view`, which
allows finer control of the exact position, allowing specifying position
before, at or after a certain key.
The previous overload is kept for backward compatibility, invoking the
new overload behind the curtains.
2019-08-13 09:47:55 +03:00
Botond Dénes
3a5e7db9b6 tests: add unit test for query::trim_clustering_row_ranges_to()
We are about to do a major refactoring of this method. Add extensive
unit tests to ensure we don't break it in the process.
2019-08-13 09:47:55 +03:00
Botond Dénes
1b4e88b972 position_in_partition_view: add get_bound_weight() 2019-08-13 09:47:55 +03:00
Avi Kivity
0d0ee20f76 Merge "Implement sstable_info API command (info on sstables)" from Calle
"
Refs #4726

Implement the api portion of a "describe sstables" command.

Adds rest types for collecting both fixed and dynamic attributes, some grouped. Allows extensions to add attributes as well. (Hint hint)
"

* 'sstabledesc' of https://github.com/elcallio/scylla:
  api/storage_service: Add "sstable_info" command
  sstables/compress: Make compressor pointer accessible from compression info
  sstables.hh: Add attribute description API to file extension
  sstables.hh: Add compression component accessor
  sstables.hh: Make "has_component" public
2019-08-12 21:16:08 +03:00
Dejan Mircevski
8be147d069 cql3: Handle empty LIKE pattern
Match SQL's LIKE in allowing an empty pattern, which matches only
an empty text field.
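
The intended semantics can be illustrated with a small stand-alone matcher.
This is a sketch, not the cql3 implementation: `like_match` is a
hypothetical name, escape sequences are not handled, and `_` matches one
byte rather than one character:

```cpp
#include <cstddef>
#include <string_view>

// '%' matches any run of bytes (including none), '_' matches exactly one.
inline bool like_match(std::string_view pattern, std::string_view text) {
    std::size_t p = 0, t = 0;
    std::size_t star = std::string_view::npos; // position of the last '%' seen
    std::size_t resume = 0;                    // text position to retry from
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '%') {
            star = p++;          // remember the wildcard, match nothing yet
            resume = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '_' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != std::string_view::npos) {
            p = star + 1;        // backtrack: let '%' absorb one more byte
            t = ++resume;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '%') {
        ++p;                     // trailing '%' may match the empty rest
    }
    return p == pattern.size();
}
```

An empty pattern leaves nothing to consume, so it matches only when the
text is empty as well.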

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-12 19:48:31 +03:00
Rafael Ávila de Espíndola
99c7f8457d logalloc: Add a migrators_base that is common to debug and release
This simplifies the debug implementation and it now should work with
scylla-gdb.py.

It is not clear what, if anything, is lost by not using random
ids. They were never being reused in the debug implementation anyway.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190618144755.31212-1-espindola@scylladb.com>
2019-08-12 19:44:55 +03:00
Calle Wilund
2b19bfbfbc types: Remove obsolete "FIXME"
inet_addr_type_impl has supported ipv6 for some time now.
Message-Id: <20190812142731.6384-1-calle@scylladb.com>
2019-08-12 17:30:15 +03:00
Calle Wilund
1afc899e37 type_parser: Fix/improve exception messages
Removes long-standing FIXME for message detail
Also simplifies some code, removing duplication.

Message-Id: <20190812134144.2417-1-calle@scylladb.com>
2019-08-12 17:03:43 +03:00
Calle Wilund
fdf2017487 cql3::term: Remove unneeded const_cast
Removed no longer needed FIXME (to_string became const long ago)

Message-Id: <20190812133943.2011-1-calle@scylladb.com>
2019-08-12 17:00:46 +03:00
Amnon Heiman
6a0490c419 api/compaction_manager: indentation 2019-08-12 14:04:40 +03:00
Amnon Heiman
8181601f0e api/compaction_manager: do not hold map on the stack
This patch fixes a bug in which a map is held on the stack and is then
used by a future.

Instead, the map is now wrapped with do_with.

Fixes #4824

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-12 14:04:00 +03:00
Asias He
131acc09cc repair: Adjust parallelism according to memory size (#4696)
After commit 8a0c4d5 (Merge "Repair switch to rpc stream" from Asias),
we increased the row buffer size for repair from 512KiB to 32MiB per
repair instance. We allow repairing 16 ranges (16 repair instances) in
parallel per repair request. So, a node can consume 16 * 32MiB = 512MiB
per user requested repair. In addition, the repair master node can hold
data from all the repair followers, so the memory usage on repair master
can be larger than 512MiB. We need to provide a way to limit the memory
usage.

In this patch, we limit the total memory used by repair to 10% of the
shard memory. The number of ranges that can be repaired in parallel is:

max_repair_ranges_in_parallel = max_repair_memory / max_repair_memory_per_range.

For example, if each shard has 4096MiB of memory, then we will have
max_repair_ranges_in_parallel = (4096MiB * 10%) / 32MiB = 12 (rounded down).
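
The arithmetic can be sketched as below. The 10% budget and the 32MiB
per-range buffer come from this message; the function is a hypothetical
stand-alone helper, and the floor of one range is an assumption added so
that very small shards can still repair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Row buffer size per repair instance, as quoted above: 32MiB.
constexpr std::uint64_t max_repair_memory_per_range = std::uint64_t(32) << 20;

// Hypothetical helper: number of ranges that fit into 10% of shard memory.
// The minimum of 1 is an assumption, not something stated in the message.
constexpr std::uint64_t max_repair_ranges_in_parallel(std::uint64_t shard_memory) {
    return std::max<std::uint64_t>(1, (shard_memory / 10) / max_repair_memory_per_range);
}

// Worked example: 10% of 4096MiB is about 409MiB, which holds 12 buffers.
static_assert(max_repair_ranges_in_parallel(std::uint64_t(4096) << 20) == 12,
              "4096MiB shard should allow 12 parallel ranges");
```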

Fixes #4675
2019-08-12 11:09:27 +03:00
Avi Kivity
e6cde72d2b Merge "Fix cql server admission control to take all leftover work into account" from Gleb
"
Current admission control takes a permit when a cql request starts and
releases it when the reply is sent, but some requests may leave background
work behind after that point (some because there is genuine background
work to do, like completing a write or doing a read repair, and some because
a read/write may be stuck in a queue longer than the request's timeout), so
after Scylla replies with a timeout some resources are still occupied.

The series fixes this by passing the permit down to storage_proxy where
it is held until all background work is completed.

Fixes #4768
"

* 'gleb/admission-v3' of github.com:scylladb/seastar-dev:
  transport: add a metric to follow memory available for service permit.
  storage_proxy: store a permit in a read executor
  storage_proxy: store a permit in a write response handler
  Pass service permit to storage_proxy
  transport: introduce service_permit class and use it instead of semaphore_units
  transport: hold admission a permit until a reply is sent
  transport: remove cql server load balancer
2019-08-12 11:02:37 +03:00
Gleb Natapov
3e27c2198a transport: add a metric to follow memory available for service permit.
Add a metric to follow memory available for service permit. When this
memory is close to zero cql server stops admitting new requests.
2019-08-12 10:20:43 +03:00
Gleb Natapov
7d7b1685aa storage_proxy: store a permit in a read executor
A read executor exists until the read operation completes in its entirety,
so storing a permit there guarantees that it will be freed only after
no background work is left for the request on this server.
2019-08-12 10:20:43 +03:00
Gleb Natapov
d5ced800f0 storage_proxy: store a permit in a write response handler
A write response handler exists until the write operation completes in its
entirety, so storing a permit there guarantees that it will be freed only
after no background work is left for the request on this server.
2019-08-12 10:20:43 +03:00
Gleb Natapov
6a4207f202 Pass service permit to storage_proxy
Current cql transport code acquires a permit before processing a query and
releases it when the query gets a reply, but some queries leave work behind.
If the work is allowed to accumulate without any limit, a server may
eventually run out of memory. To prevent that, the permit system should
account for the background work as well. The patch is a first step in
this direction. It passes a permit down to storage proxy, where it will
later be held by background work.
2019-08-12 10:20:43 +03:00
Raphael S. Carvalho
b436c41128 compaction_manager: Prevent sstable runs from being partially compacted
Manager trims sstables off to allow compaction jobs to proceed in parallel
according to their weights. The problem is that trimming procedure is not
sstable run aware, so it could incorrectly remove only a subset of a sstable
run, leading to partial sstable run compaction.

Partial compaction of a sstable run could lead to inefficiency because the run
structure would be messed up, affecting all amplification factors, and the same
generation could even end up being compacted twice.

This is fixed by making the trim procedure respect the sstable runs.
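
A rough sketch of run-aware trimming, under simplifying assumptions:
sstable_desc and trim_to are hypothetical names, fragments of one run are
assumed to be adjacent in the candidate list, and the weight-based limit is
reduced to a plain count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct sstable_desc {   // hypothetical, simplified descriptor
    int generation;
    int run_id;
};

// Keeps whole runs, in order, while the total stays within max_count.
// A run that does not fit completely is dropped as a whole, never split.
std::vector<sstable_desc> trim_to(const std::vector<sstable_desc>& candidates,
                                  std::size_t max_count) {
    std::vector<sstable_desc> kept;
    std::size_t i = 0;
    while (i < candidates.size()) {
        // Find the end of the run that starts at position i.
        std::size_t j = i;
        while (j < candidates.size() && candidates[j].run_id == candidates[i].run_id) {
            ++j;
        }
        if (kept.size() + (j - i) > max_count) {
            break;
        }
        kept.insert(kept.end(), candidates.begin() + i, candidates.begin() + j);
        i = j;
    }
    return kept;
}
```

Trimming five candidates from runs {10, 10, 11, 11, 12} down to three keeps
only the two fragments of run 10, instead of cutting run 11 in half.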

Fixes #4773.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190730042023.11351-1-raphaelsc@scylladb.com>
2019-08-11 17:20:20 +03:00
Gleb Natapov
ddff7f48cf transport: introduce service_permit class and use it instead of semaphore_units
service_permit is a new class that allows sharing a permit among
different parts of request processing many of which can complete
at different times.
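
The sharing idea can be sketched with reference counting. This is an
illustration only: memory_budget is a hypothetical stand-in for the
admission semaphore, and the class below merely borrows the name
service_permit; it is not the code added by this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the memory budget used for admission control.
struct memory_budget {
    std::size_t available;
};

// Shareable permit: the reserved units return to the budget only when the
// last copy is destroyed, whichever part of the request finishes last.
class service_permit {
    struct units {
        memory_budget& budget;
        std::size_t count;
        units(memory_budget& b, std::size_t n) : budget(b), count(n) {
            assert(n <= b.available);
            budget.available -= n;
        }
        ~units() { budget.available += count; }
    };
    std::shared_ptr<units> _units;
public:
    service_permit(memory_budget& b, std::size_t n)
        : _units(std::make_shared<units>(b, n)) {}
};
```

Copying the permit into a background task keeps the units reserved after
the reply path drops its own copy.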
2019-08-11 16:08:55 +03:00
Gleb Natapov
2daa72b7dc transport: hold admission a permit until a reply is sent
Current code releases the admission permit too soon. If new requests are
admitted faster than the client reads replies back, the reply queue can grow
very large. The patch delays the service permit release until after a reply
is sent.
2019-08-11 16:08:55 +03:00
Gleb Natapov
7e3805ed3d transport: remove cql server load balancer
It is buggy, unused and unnecessarily complicates the code.
2019-08-11 16:08:52 +03:00
Nadav Har'El
f9d6eaf5ff reconcilable_result: switch to chunked_vector
Merged patch series from Avi Kivity:

In rare but valid cases (reconciling many tombstones, paging disabled),
a reconcilable_result can grow large. This triggers large allocation
warnings. Switch to chunked_vector to avoid the large allocation.
In passing, fix chunked_vector's begin()/end() const correctness, and
add the reverse iterator function family which is needed by the conversion.

Fixes #4780.

Tests: unit (dev)

Commit Summary

    utils: chunked_vector: make begin()/end() const correct
    utils::chunked_vector: add rbegin() and related iterators
    reconcilable_result: use chunked_vector to hold partitions
2019-08-11 16:03:13 +03:00
Avi Kivity
ce2b0b2682 Merge "Add listen/rpc "prefer_ipv6" options to DNS lookup #4775" from Calle
"
Add listen/rpc "prefer_ipv6" options to DNS lookup of bind addresses for API/rpc/prometheus etc .

Fixes #4751

Adds using a preferred address family to dns name lookups related to
listen address and rpc address, adhering to the respective "prefer" options.

API, prometheus and broadcast address are all considered to be covered by
the "listen_interface_prefer_ipv6" option.

Note: scylla does not yet support actual interface binding, but these
options should apply equally to address name parameters.

Setting a "prefer_ipv6" option automatically enables ipv6 dns family query.
"

* 'calle/ipv6' of https://github.com/elcallio/scylla:
  init: Use the "prefer_ipv6" options available for rpc/listen address/interface
  inet_address: Add optional "preferred type" to lookup
  config: Add rpc_interface_prefer_ipv6 parameter
  config: Add listen_interface_perfer_ipv6 parameter
  config.cc: Fix enable_ipv6_dns_lookup actual param name
2019-08-11 15:21:45 +03:00
Pekka Enberg
73113c0ea4 utils/fb_utilities.hh: Kill obsolete FIXME and commented out Java code
The FIXME was added in the very first commit ("utils: Convert
utils/FBUtilities.java") that introduced the fb_utilities class as a
stub. However, we have long implemented the parts that we actually use,
so drop the FIXME as obsolete. In addition, drop the remaining
commented-out Java code as unused and also obsolete.

Message-Id: <20190808182758.1155-1-penberg@scylladb.com>
2019-08-11 10:26:36 +03:00
Botond Dénes
fd925f6049 position_in_partition_view: add constructor with bound_weight
This is a low level constructor which allows directly providing a bound
weight to go with the key.
2019-08-09 10:54:27 +03:00
Pekka Enberg
547c072f93 dbuild: Make Maven local repository accessible
The Maven build tool ("mvn"), which is used by scylla-jmx and
scylla-tools-java, stores dependencies in a local repository stored at
$HOME/.m2. Make sure it's accessible to dbuild.

Message-Id: <20190808140216.26141-1-penberg@scylladb.com>
2019-08-08 17:36:13 +03:00
Avi Kivity
8f19b16fe4 Update seastar submodule
* seastar ed608e3c9e...fe2b5b0c6b (2):
  > Merge "handle discarded futures or suppress warning" from Benny
  > output_stream: Add close() blurb
2019-08-08 16:22:38 +03:00
Avi Kivity
4a5ec61438 Update seastar submodule
* seastar a1cf07858b...ed608e3c9e (4):
  > core: Add ability to abort on EBADF and ENOTSOCK
  > Revert "Merge "handle discarded futures or suppress warning" from Benny"
  > Merge "handle discarded futures or suppress warning" from Benny
  > reactor: remove replace variadic future<pollable_fd, socket_address> with future<tuple>
2019-08-08 14:22:29 +03:00
Raphael S. Carvalho
76cde84540 sstables/compaction_manager: Fix logic for filtering out partial sstable runs
ignore_partial_runs() brings confusion because i__p__r() equal to true
doesn't mean filter out partial runs from compaction. It actually means
not caring about compaction of a partial run.

The logic was wrong because any compaction strategy that chooses not to ignore
partial sstable run[1] would have any fragment composing it incorrectly
becoming a candidate for compaction.
This problem could make compaction include only a subset of fragments composing
the partial run or even make the same fragment be compacted twice due to
parallel compaction.

[1]: partial sstable run is a sstable that is still being generated by
compaction and as a result cannot be selected as candidate whatsoever.

The fix makes sure that a partial sstable run has none of its fragments
selected for compaction. It also renames i__p__r.

Fixes #4729.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190807022814.12567-1-raphaelsc@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
7d4bf10d87 docs/building-packages.md: Document how to build Scylla packages
This documents the steps needed to build Scylla's Linux packages with
the relocatable package infrastructure we use today.

Message-Id: <20190807134017.4275-1-penberg@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
79cece9f33 toolchain: Fix default command for dbuild Docker image
Running "dbuild" without a build command fails as follows:

  $ ./tools/toolchain/dbuild
  Error: This command has to be run under the root user.

Israel Fruchter discovered that the default command of our Docker image is this:

  "Cmd": [
    "bash",
    "-c",
    "dnf -y install python3-cassandra-driver && dnf clean all"
   ]

Let's make "/bin/bash" the default command instead, which will make
"dbuild" with no build command to return to the host shell.

Message-Id: <20190807133955.4202-1-penberg@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
76cdec222f build_reloc.sh: Remove "--with" passed to "configure.py"
The build_reloc.sh script passes "--with=scylla" and "--with=iotune" to
the configure.py script. This is redundant as the
"scylla-package.tar.gz" target of ninja already limits itself to them.

Removing the "--with" options allows building unit tests after a
relocatable package has been built without having to rebuild anything.

Message-Id: <20190807130505.30089-1-penberg@scylladb.com>
2019-08-07 16:28:00 +03:00
Avi Kivity
e548bdb2e8 thrift, transport: switch to new seastar accept() API (#4814)
Seastar switched accept() to return a single struct instead of a variadic future,
adjust the code to the new API to avoid deprecation warnings.
2019-08-07 15:23:26 +02:00
Pekka Enberg
f68fffd99a reloc/build_reloc.sh: Make build mode configurable
Add a '--mode <mode>' command line option to the 'build_reloc.sh' script
so that we can create relocatable packages for debug builds.

The '--mode' command line option defaults to 'release' so existing users
are unaffected.

Message-Id: <20190807120759.32634-1-penberg@scylladb.com>
2019-08-07 16:19:37 +03:00
Asias He
fee26b9f6e repair: Fix use after free in do_estimate_partitions_on_local_shard (#4813)
We need to keep the sstables object alive during the operation of
do_for_each.

Notes: No need to backport to 3.1.

Fixes #4811
2019-08-07 15:19:21 +02:00
Asias He
49a73aa2fc streaming: Move stream_mutation_fragments_cmd to a new file (#4812)
Avoid including the lengthy stream_session.hh in messaging_service.

More importantly, fix the build because currently messaging_service.cc
and messaging_service.hh do not include stream_mutation_fragments_cmd.
I am not sure why it builds on my machine. Spotted this when backporting
the "streaming: Send error code from the sender to receiver" to 3.0
branch.

Refs: #4789
2019-08-07 14:59:46 +02:00
Asias He
288371ce75 streaming: Do not call rpc stream flush in send_mutation_fragments
The stream close() guarantees the data sent will be flushed. No need to
call the stream flush() since the stream is not reused.

Follow up fix for commit bac987e32a (streaming: Send error code from
the sender to receiver).

Refs #4789
2019-08-07 14:31:17 +02:00
Avi Kivity
689fc72bab Update seastar submodule
* seastar d199d27681...a1cf07858b (1):
  > Merge 'Do not return a variadic future form server_socket::accept()' from Avi

Seastar configure.py now has --api-level=1, to keep us on the old variadic future
server_socket::accept() API.
2019-08-06 18:37:27 +03:00
Avi Kivity
97f66c72af Update seastar submodule
* seastar d90834443c...d199d27681 (3):
  > sharded: support for non-cooperative service types
  > shared_future: silence warning about discarded future
  > Fix backtrace suppression message in cpu_stall_detector.

Fixes #4560.
2019-08-06 18:00:48 +03:00
Asias He
bac987e32a streaming: Send error code from the sender to receiver
In case of error on the sender side, the sender does not propagate the
error to the receiver. The sender will close the stream. As a result,
the receiver will get nullopt from the source in
get_next_mutation_fragment and pass mutation_fragment_opt with no value
to the generating_reader. In turn, the generating_reader generates end
of stream. However, the last element that the generating_reader has
generated can be any type of mutation_fragment. This makes the sstable
that consumes the generating_reader violate the mutation_fragment
stream rule.

To fix, we need to propagate the error. However, RPC streaming does not
support propagating errors in the framework. The user has to send an error
code explicitly.
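
A minimal sketch of the receiver side of such a scheme. The command names
and the receive_all helper are hypothetical, and treating a close without
an explicit end marker as a failure is an assumption made for this
illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical command codes sent alongside each streamed message.
enum class stream_cmd { mutation_fragment_data, end_of_stream, error };

enum class receive_result { complete, sender_failed, closed_unexpectedly };

// Consumes messages until the stream closes. Only an explicit end_of_stream
// counts as success; an error code or a bare close is reported to the caller.
receive_result receive_all(const std::vector<stream_cmd>& messages, int& fragments) {
    fragments = 0;
    for (stream_cmd cmd : messages) {
        switch (cmd) {
        case stream_cmd::mutation_fragment_data:
            ++fragments;
            break;
        case stream_cmd::end_of_stream:
            return receive_result::complete;
        case stream_cmd::error:
            return receive_result::sender_failed;
        }
    }
    return receive_result::closed_unexpectedly;
}
```

With this shape the consumer can fail the operation instead of ending the
fragment stream in the middle of a partition.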

Fixes: #4789
2019-08-06 16:54:56 +02:00
Piotr Jastrzebski
24f6d90a45 sstables: add test of sstables_mutation_reader for missing partition_end
Reproduces #4783

Issue was fixed by 9b8ac5ecbc

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-08-06 15:11:19 +03:00
Calle Wilund
6c62e5741e init: Use the "prefer_ipv6" options available for rpc/listen address/interface
Fixes #4751

Adds using a preferred address family to dns name lookups related to
listen address and rpc address, adhering to the respective "prefer" options.

API, prometheus and broadcast address are all considered to be covered by
the "listen_interface_prefer_ipv6" option.

Note: scylla does not yet support actual interface binding, but these
options should apply equally to address name parameters.

Setting a "prefer_ipv6" option automatically enables ipv6 dns family query.
2019-08-06 08:32:10 +00:00
Calle Wilund
6c0c1309b3 inet_address: Add optional "preferred type" to lookup
Allows prioritizing an address family in dns lookup, i.e. prefer ipv4/ipv6 if available.
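
A sketch of the selection rule, assuming the lookup has already produced a
list of addresses. The types and the pick function are hypothetical and do
not reflect the inet_address interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class family { any, ipv4, ipv6 };   // `any` means "no preference"

struct resolved_address {   // hypothetical, simplified DNS result
    family fam;
    std::string text;
};

// Returns the first address of the preferred family if there is one,
// otherwise the first address of any family.
// Precondition: `results` is not empty.
const resolved_address& pick(const std::vector<resolved_address>& results,
                             family preferred) {
    assert(!results.empty());
    for (const auto& a : results) {
        if (a.fam == preferred) {
            return a;
        }
    }
    return results.front();
}
```

The preference is only a priority: if no address of the preferred family
was resolved, the lookup still succeeds with what is available.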
2019-08-06 08:32:10 +00:00
Calle Wilund
d3410f0e48 config: Add rpc_interface_prefer_ipv6 parameter
As already existing in scylla.yaml
2019-08-06 08:32:10 +00:00
Calle Wilund
0028cecb8e config: Add listen_interface_perfer_ipv6 parameter
As already existing in scylla.yaml.
https://github.com/apache/cassandra/blob/cassandra-3.11/conf/cassandra.yaml#L622
2019-08-06 08:32:10 +00:00
Calle Wilund
39d18178eb config.cc: Fix enable_ipv6_dns_lookup actual param name
When adding option (and iterating through config refactoring)
the member name and the config param name got out of sync
2019-08-06 08:32:09 +00:00
Calle Wilund
298da3fc4b api/storage_service: Add "sstable_info" command
Assembles information and attributes of sstables in one or more
column families.

v2:
* Use (not really legal) nested "type" in json
* Rename "table" param to "cf" for consistency
* Some comments on data sizes
* Stream result to avoid huge string allocations on final json
2019-08-06 08:14:15 +00:00
Calle Wilund
95a8ff12e7 sstables/compress: Make compressor pointer accessible from compression info 2019-08-06 07:07:44 +00:00
Calle Wilund
d15c63627c sstables.hh: Add attribute description API to file extension 2019-08-06 07:07:44 +00:00
Calle Wilund
4c67d702c2 sstables.hh: Add compression component accessor 2019-08-06 07:07:44 +00:00
Calle Wilund
770f912221 sstables.hh: Make "has_component" public 2019-08-06 07:07:44 +00:00
Avi Kivity
b77c4e68c2 Merge "Add Zstandard compression #4802" from Kamil
"
This adds the option to compress sstables using the Zstandard algorithm
(https://facebook.github.io/zstd/).

To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor'
to the 'compression' argument when creating a table.
You can also specify a 'compression_level' (default is 3). See Zstd documentation for the available
compression levels.

Resolves #2613.

This PR also fixes a bug in sstables/compress.cc, where chunk length in bytes
was passed to the compressor as chunk length in kilobytes. Fortunately,
none of the compressors implemented until now used this parameter.

Example usage (assuming there exists a keyspace 'a'):

    create table a.a (a text primary key, b int) with compression = {'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': 1, 'chunk_length_in_kb': '64'};

Notes:

 1. The code uses an external dependency: https://github.com/facebook/zstd. Since I'm using "experimental" features of the library (using my own allocated memory to store the compression/decompression contexts), according to the library's documentation we need to link it statically (https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L63). I added a git submodule.
 2. The compressor performs some dynamic allocations. Depending on the specified chunk length and/or compression level the allocations might be big and seastar throws warnings. But with reasonable chunk length sizes it should be OK.
 3. It doesn't yet provide an option to train it with dictionaries, but that should be easy to add in another commit.
"

* 'zstd' of https://github.com/kbr-/scylla:
  Configure: rename seastar_pool to submodule_pool, add more submodules to the pool
  Add unit tests for Zstd compression
  Enable tests that use compressed sstable files
  Add ZStandard compression
  Fix the value of the chunk length parameter passed to compressors
2019-08-05 16:29:27 +03:00
Botond Dénes
23cc6d6fb2 make_flat_mutation_reader_from_fragments: reader: silence discarded future warning
The fragment reader calls `fast_forward_to()` from its constructor to
discard fragments that fall outside the query range. Move the
fast-forward code into an internal void-returning method, and call
that from both the constructor and `fast_forward_to()`, to avoid a
warning on a discarded future<>.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190801133942.10744-1-bdenes@scylladb.com>
2019-08-05 16:21:50 +03:00
Kamil Braun
3a0308f76f Configure: rename seastar_pool to submodule_pool, add more submodules to the pool
Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:56 +02:00
Kamil Braun
c3c7c06e10 Add unit tests for Zstd compression
Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:56 +02:00
Kamil Braun
8b58cdab0a Enable tests that use compressed sstable files
The files in tests/sstables/3.x/compressed/ were not used in the tests.
This commit:
- renames the directory to tests/sstables/3.x/lz4/,
- adds analogous directories and files for other compressors,
- adds tests using these files,
- does some minor refactoring.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:56 +02:00
Kamil Braun
f14e6e73bb Add ZStandard compression
This adds the option to compress sstables using the Zstandard algorithm
(https://facebook.github.io/zstd/).
To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor'
to the 'compression' argument when creating a table.
You can also specify a 'compression_level'. See Zstd documentation for the available
compression levels.
Resolves #2613.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:53 +02:00
Kamil Braun
7a61bcb021 Fix the value of the chunk length parameter passed to compressors
This commit also fixes a bug in sstables/compress.cc, where chunk length in bytes
was passed to the compressor as chunk length in kilobytes. Fortunately,
none of the compressors implemented until now used this parameter.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:31:33 +02:00
Avi Kivity
95c0804731 Merge "Catch unclosed partition sstable write #4794" from Tomasz
"
Not emitting partition_end for a partition is incorrect. SStable
writer assumes that it is emitted. If it's not, the sstable will not
be written correctly. The partition index entry for the last partition
will be left partially written, which will result in errors during
reads. Also, statistics and sstable key ranges will not include the
last partition.

It's better to catch this problem at the time of writing, and not
generate bad sstables.

Another way of handling this would be to implicitly generate a
partition_end, but I don't think that we should do this. We cannot
trust the mutation stream when invariants are violated, we don't know
if this was really the last partition which was supposed to be
written. So it's safer to fail the write.

Enabled for both mc and la/ka.

Passing --abort-on-internal-error on the command line will switch to
aborting instead of throwing an exception.

The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
"

* 'catch-unclosed-partition-sstable-write' of https://github.com/tgrabiec/scylla:
  sstables: writer: Validate that partition is closed when the input mutation stream ends
  config, exceptions: Add helper for handling internal errors
  utils: config_file: Introduce named_value::observe()
2019-08-04 15:18:31 +03:00
Asias He
3b39a59135 storage_service: Replicate and advertise tokens early in the boot up process
When a node is restarted, there is a race between gossip starting (other
nodes will mark this node up again and send requests) and the tokens being
replicated to other shards. Here is an example:

- n1, n2
- n2 is down, n1 thinks n2 is down
- n2 starts again, n2 starts gossip service, n1 thinks n2 is up and sends
  reads/writes to n2, but n2 hasn't replicated the token_metadata to all
  the shards.
- n2 complains:
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error
  (sorted_tokens is empty in first_token_index!)

The code path looks like below:

0 storage_service::init_server
1    prepare_to_join()
2          add gossip application state of NET_VERSION, SCHEMA and so on.
3         _gossiper.start_gossiping().get()
4    join_token_ring()
5           _token_metadata.update_normal_tokens(tokens, get_broadcast_address());
6           replicate_to_all_cores().get()
7           storage_service::set_gossip_tokens() which adds the gossip application state of TOKENS and STATUS

The race described above is between line 3 and line 6.

To fix, we can replicate the token_metadata early after it is filled
with the tokens read from system table before gossip starts. So that
when other nodes think this restarting node is up, the tokens are
already replicated to all the shards.

In addition, this patch also fixes the issue that other nodes might see
a node miss the TOKENS and STATUS application state in gossip if that
node failed in the middle of a restarting process, i.e., it is killed
after line 3 and before line 7. As a result we could not replace the
node.

Tests: update_cluster_layout_tests.py
Fixes: #4709
Fixes: #4723
2019-08-04 15:18:31 +03:00
Avi Kivity
aebb9bd755 Merge "tests/mutation_source_test: pass query time to populate" from Botond
"
Although 733c68cb1 made sure to synchronize the query time used for
compaction happening in the mutation_source_test suite and that
happening in the `flat_mutation_assertions` class, there remained
another hidden compaction that potentially could use a different
timestamp and hence produce false positive test failures. This was
hastily fixed by cea3338e3, by just increasing the TTL of cells, thus
avoiding possible differences in compaction output. This mini-series is
the proper fix to this problem. It passes a query time to the populate
function, allowing the users of the mutation source test suite to
forward it to any compaction they might be doing on the data. The quick
fix is reverted in favor of the proper fix.

Refs: #4747
"

* 'mutation_source_tests_proper_ttl_fix/v1' of https://github.com/denesb/scylla:
  Revert "tests/mutation_source_tests: generate_mutation_sets() use larger ttl"
  tests/sstable_mutation_test: test_sstable_conforms_to_mutation_source: use query_time
  tests/mutation_source_test: add populate_fn overload with query_time
2019-08-04 15:18:31 +03:00
Tomasz Grabiec
43c7144133 sstables: writer: Validate that partition is closed when the input mutation stream ends
Not emitting partition_end for a partition is incorrect. Sstable
writer assumes that it is emitted. If it's not, the sstable will not
be written correctly. The partition index entry for the last partition
will be left partially written, which may result in errors during
reads. Also, statistics and sstable key ranges will not include the
last partition.

It's better to catch this problem at the time of writing, and not
generate bad sstables.

Another way of handling this would be to implicitly generate a
partition_end, but I don't think that we should do this. We cannot
trust the mutation stream when invariants are violated, we don't know
if this was really the last partition which was supposed to be
written. So it's safer to fail the write.
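
The check can be sketched as a small state machine. The names below are
hypothetical and the fragment kinds are reduced to three; this is not the
sstable writer code.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

enum class fragment_kind { partition_start, clustering_row, partition_end };

// Hypothetical checker: tracks whether a partition is open and rejects a
// stream that ends inside one, rather than closing it implicitly.
class stream_validator {
    bool _partition_open = false;
public:
    void consume(fragment_kind k) {
        if (k == fragment_kind::partition_start) {
            _partition_open = true;
        } else if (k == fragment_kind::partition_end) {
            _partition_open = false;
        }
    }
    void consume_end_of_stream() const {
        if (_partition_open) {
            throw std::logic_error("mutation stream ended without partition_end");
        }
    }
};

// Convenience wrapper: true if the stream passes the end-of-stream check.
bool stream_is_well_formed(const std::vector<fragment_kind>& fragments) {
    stream_validator v;
    for (fragment_kind k : fragments) {
        v.consume(k);
    }
    try {
        v.consume_end_of_stream();
    } catch (const std::logic_error&) {
        return false;
    }
    return true;
}
```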

Enabled for both mc and la/ka.
2019-08-02 11:13:54 +02:00
Tomasz Grabiec
bf70ee3986 config, exceptions: Add helper for handling internal errors
The handler is intended to be called when internal invariants are
violated and the operation cannot safely continue. The handler either
throws (default) or aborts, depending on configuration option.

Passing --abort-on-internal-error on the command line will switch to
aborting.

The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
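
A minimal sketch of such a handler. The flag and the function name are
hypothetical; the flag stands in for the effect of the
--abort-on-internal-error option.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical process-wide switch set from configuration at startup.
std::atomic<bool> abort_on_internal_error{false};

// Called when an internal invariant is violated and the operation cannot
// safely continue. Never returns normally.
[[noreturn]] void handle_internal_error(const std::string& reason) {
    if (abort_on_internal_error.load()) {
        std::abort();   // produces a core dump for debugging
    }
    throw std::runtime_error(reason);   // default: fail only this operation
}
```

By default the caller sees an exception and the node keeps running; with
the flag set, the same call terminates the process.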
2019-08-02 11:13:54 +02:00
Tomasz Grabiec
61a9cfbfa9 utils: config_file: Introduce named_value::observe() 2019-08-02 11:13:53 +02:00
Avi Kivity
093d2cd7e5 reconcilable_result: use chunked_vector to hold partitions
Usually, a reconcilable_result holds very few partitions (1 is common),
since the page size is limited by 1MB. But if we have paging disabled or
if we are reconciling a range full of tombstones, we may see many more.
This can cause large allocations.

Change to chunked_vector to prevent those large allocations, as they
can be quite expensive.
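
The idea behind a chunked vector can be sketched as follows. This
miniature is hypothetical and far simpler than utils::chunked_vector: it
requires T to be default-constructible and supports only push_back and
indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical miniature: elements live in fixed-size chunks, so growing
// the container never needs one contiguous allocation for all elements.
// Only the small array of chunk pointers is contiguous.
template <typename T, std::size_t ChunkElems = 1024>
class mini_chunked_vector {
    std::vector<std::unique_ptr<T[]>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(const T& value) {
        if (_size == _chunks.size() * ChunkElems) {
            // All chunks are full: allocate one more chunk, nothing is moved.
            _chunks.push_back(std::unique_ptr<T[]>(new T[ChunkElems]));
        }
        _chunks[_size / ChunkElems][_size % ChunkElems] = value;
        ++_size;
    }
    T& operator[](std::size_t i) { return _chunks[i / ChunkElems][i % ChunkElems]; }
    const T& operator[](std::size_t i) const { return _chunks[i / ChunkElems][i % ChunkElems]; }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

Each allocation is bounded by the chunk size, whatever the element count.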

Fixes #4780.
2019-08-01 18:49:13 +03:00
Avi Kivity
eaa9a5b0d7 utils::chunked_vector: add rbegin() and related iterators
Needed as an std::vector replacement.
2019-08-01 18:39:47 +03:00
Avi Kivity
df6faae980 utils: chunked_vector: make begin()/end() const correct
begin() of a const vector should return a const_iterator, to avoid
giving the caller the ability to mutate it.

This slipped through since iterator's constructor does a const_cast.
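
The expected overload pair, sketched on a hypothetical fixed-size
container (small_ints is invented for this illustration). The
static_assert expresses the property as a compile-time check.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical container used only to show the const/non-const overloads.
class small_ints {
    int _data[4] = {1, 2, 3, 4};
public:
    using iterator = int*;
    using const_iterator = const int*;
    iterator begin() { return _data; }
    iterator end() { return _data + 4; }
    const_iterator begin() const { return _data; }   // read-only access
    const_iterator end() const { return _data + 4; }
};

// begin() on a const container must not hand out a mutable iterator.
static_assert(std::is_same<decltype(std::declval<const small_ints&>().begin()),
                           small_ints::const_iterator>::value,
              "const begin() should return const_iterator");
```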

Noticed by code inspection.
2019-08-01 18:38:53 +03:00
Botond Dénes
0b748bb8fe Revert "tests/mutation_source_tests: generate_mutation_sets() use larger ttl"
This reverts commit cea3338e38.

The above was a quick fix to allow the tests to pass, there is a proper
fix now.
2019-08-01 13:05:46 +03:00
Botond Dénes
ac91f1f6b8 tests/sstable_mutation_test: test_sstable_conforms_to_mutation_source: use query_time
Use the query_time passed in to the populate function and forward it to
the sstable constructor, so that the compaction happening during sstable
write uses the same query time that any compaction done by the mutation
source test suite does.
2019-08-01 13:04:21 +03:00
Botond Dénes
ce1ed2cb70 tests/mutation_source_test: add populate_fn overload with query_time
So tests that do compaction can pass the query_time they used for it to
clients that do some compaction themselves, making sure all compactions
happen with the same query time, avoiding false positive test failures.
2019-08-01 13:03:03 +03:00
Vlad Zolotarov
15eaf2fd8e dist: scylla_util.py: get_mode_cpuset(): don't let false alarm error messages
Don't let perftune.py print a false alarm error message when we calculate
a compute CPU set for tuning modes.

This may happen when we calculate a CPU set for non-MQ tuning modes on
small systems on which these modes are forbidden because they would
result in a zero CPU set, e.g. sq_split on a system with a single
physical core.

We are going to utilize the --get-cpu-mask-quiet execution mode, newly
introduced to seastar/script/perftune.py by the
"perftune.py: introduce --get-cpu-mask-quiet" series. It returns a zero
CPU set if that's what it turns out to be, instead of exiting with an
error as --get-cpu-mask would do in such a case.

The rest of scylla_util.py logic is going to handle a zero CPU set
returned by get_mode_cpuset() correctly.

Fixes #4211
Fixes #4443

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190731212901.9510-1-vladz@scylladb.com>
2019-08-01 11:14:39 +03:00
Botond Dénes
339be3853d foreign_reader: silence warning about discarded future
And add a comment explaining why this is fine.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190801062234.69081-1-bdenes@scylladb.com>
2019-08-01 10:11:24 +03:00
Avi Kivity
47b0f40d27 Merge "introduce metrics for non-local queries" from Konstantin
"
A fix for #4338 "storage_proxy add a counter for cql requests
that arrived to a non replica"

Such requests should be tracked since forwarding them to a correct
replica can create a lot of network noise and incur a significant performance
penalty.

The current metrics are considered insufficient after introduction
of heat-weighted load balancing.
"

Fixes #4388.

* 'gh-4338' of https://github.com/kostja/scylla:
  metrics: introduce a metric for non-local reads
  metrics: account writes forwarded by a coordinator in an own metric.
2019-08-01 10:09:33 +03:00
Avi Kivity
77686ab889 Merge "Make SSTable cleanup run aware" from Raphael
"
Fixes #4663.
Fixes #4718.
"

* 'make_cleanup_run_aware_v3' of https://github.com/raphaelsc/scylla:
  tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id
  table: Make SSTable cleanup run aware
  compaction: introduce constants for compaction descriptor
  compaction: Make it possible to config the identifier of the output sstable run
  table: do not rely on undefined behavior in cleanup_sstables
2019-07-31 19:10:22 +03:00
Botond Dénes
a41e8f0bcf query::consume_page: move away from variadic future
Require the `consumer` to return 0 or 1 value in its future. Update all
downstream code.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190731140440.57295-1-bdenes@scylladb.com>
2019-07-31 18:49:47 +03:00
Avi Kivity
320fd2be60 Update seastar submodule
* seastar 3f88e9068b...d90834443c (12):
  > Print warning when somaxconn lower than backlog parameter used for listen()
  > Merge "perftune.py: introduce --get-cpu-mask-quiet" from Vlad
  > seastar-json2code: Handle "$ref"-usage for nested object types properly
  > Make future [[nodiscard]]
  > Allow pass listen_options to http_server::listen
  > Handle EPOLLHUP and EPOLLERR from epoll explicitly
  > reactor: fix false positives in the stall detector due to large task queue
  > Merge "Small asan related improvements" from Rafael
  > thread: reduce allocations during context switch
  > thread: remove deprecated thread_scheduling_group and its unit test
  > reactor: make _polls to be non atomic
  > reactor: remove unused _tasks_processed variable
2019-07-31 18:30:10 +03:00
Takuya ASADA
60ec8b2a04 install.sh: install everything when --pkg is not specified
Since commit ac9b115a8f, install.sh requires a single package to be specified with --pkg, and there is no way to select all packages.
Running install.sh without --pkg should select all packages.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190731013245.5857-1-syuu@scylladb.com>
2019-07-31 16:43:57 +03:00
Asias He
5d3e4d7b73 messaging_service: Check if messaging_service is stopped before get_rpc_client
get_rpc_client assumes the messaging_service is not stopped. We should check
is_stopping() before we call get_rpc_client.

We do such check in existing code, e.g., send_message and friends. Do
the same check in the newly introduced
make_sink_and_source_for_stream_mutation_fragments() and friends for row
level repair.

Fixes: #4767
2019-07-31 11:44:57 +03:00
Avi Kivity
74349bdf7e Merge "Partially devirtualize CQL restrictions" from Piotr
"
This series is a batch of first small steps towards devirtualizing CQL restrictions:
 - one artificial parent class in the hierarchy is removed: abstract_restriction
 - the following functions are devirtualized:
    * is_EQ()
    * is_IN()
    * is_slice()
    * is_contains()
    * is_LIKE()
    * is_on_token()
    * is_multi_column()

Future steps can involve the following:
 - introducing a std::variant of restriction targets: it's either a column
   or a vector of columns
 - introducing a std::variant of restriction values: it's one of:
   {term, term_slice, std::vector<term>, abstract_marker}

The steps above will allow devirtualizing most of the remaining virtual functions
in favor of std::visit. They will also reduce the number of subclasses,
e.g. what's currently `token_restriction::IN_with_values` can be just an instance
of `restriction`, knowing that it's on a token, having a target of std::vector<column>
and a value of std::vector<term>.

Tests: unit(dev), dtest: cql_tests, cql_additional_tests
"

* 'refactor_restrictions_2' of https://github.com/psarna/scylla:
  cql3: devirtualize is_on_token()
  cql3: devirtualize is_multi_column()
  cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE
  tests: add enum_set adding case
  cql3: allow adding enum_sets
  cql3: remove abstract_restriction class
2019-07-31 11:44:57 +03:00
Vlad Zolotarov
9df53b8bca configure.py: ignore 'thrift -version' exit code
(At least) on Ubuntu 19 'thrift -version' prints the expected
string but its exit status is non-zero:

$ thrift -version
Thrift version 0.9.1
$ echo $?
1

We don't really care about the exit status but rather about the printed
version string. If there is going to be some problem with the command,
e.g. it's missing, the printed string is not going to be as expected
anyway - let's verify that explicitly by checking the format of the
returned string in that case.
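
The check described above can be sketched roughly as follows. This is a
minimal illustration, not the actual configure.py code: the helper names
parse_thrift_version and thrift_version are hypothetical, and the expected
"Thrift version X.Y.Z" format is assumed from the output shown above.

```python
import re
import subprocess


def parse_thrift_version(output):
    """Return (major, minor, patch) or None if the format is unexpected."""
    match = re.fullmatch(r"Thrift version (\d+)\.(\d+)\.(\d+)", output.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def thrift_version(command=("thrift", "-version")):
    """Run the command and parse its stdout, ignoring the exit status."""
    try:
        # No check=True: a non-zero exit status is deliberately not an error.
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL,
                                universal_newlines=True)
    except OSError:
        # The command is missing or cannot be executed.
        return None
    return parse_thrift_version(result.stdout)
```

A missing command and an unexpected output string both end up as None, so
the caller only has to handle one failure case.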

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190722211729.24225-1-vladz@scylladb.com>
2019-07-31 11:44:57 +03:00
Botond Dénes
cea3338e38 tests/mutation_source_tests: generate_mutation_sets() use larger ttl
Currently all cells generated by this method use a ttl of 1. This
causes test flakiness as tests often compact the input and output
mutations to weed out artificial differences between them. If this
compaction is not done with the exact same query time, then some cells
will be expired in one compaction but not in the other.
733c68cb1 attempted to solve this by passing the same query time to
`flat_reader_assertions::produces_compacted()` as well as
`mutation_partition::compact_for_query()` when compacting the input
mutation. However a hidden compaction spot remained: the ka/la sstable
writer also does some compaction, and for this it uses the time point
passed to the `sstable` constructor, which defaults to
`gc_clock::now()`. This leads to false positive failures in
`sstable_mutation_test.cc`.
At this point I don't know what the original intent was behind this low
`ttl` value. To solve the immediate problem of the tests failing, I
increased it. If it turns out that this `ttl` value has a good reason,
we can do a more involved fix, of making sure all sstables written also
get the same query time as that used for the compaction.

Fixes: #4747

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190731081522.22915-1-bdenes@scylladb.com>
2019-07-31 11:44:57 +03:00
Piotr Sarna
2f65144a20 cql3: devirtualize is_on_token()
Instead of being a virtual function, is_on_token leverages
the existing enum inside the `restriction` class.
2019-07-29 17:18:50 +02:00
Piotr Sarna
68aa42c545 cql3: devirtualize is_multi_column()
Instead of being a virtual function, is_multi_column leverages
an enum.
2019-07-29 17:18:50 +02:00
Piotr Sarna
83fbfe5a4f cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE
Instead of virtual functions, operation for each restriction
is determined by an enum value it stores.
2019-07-29 17:18:49 +02:00
Piotr Sarna
e9798354ae tests: add enum_set adding case 2019-07-29 17:15:51 +02:00
Piotr Sarna
989c31f68b cql3: allow adding enum_sets
Enum set can now be added to another enum set in order to create
a sum of both.
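
As a rough analogy only (the real enum_set is a C++ template, and every
name below is hypothetical), adding one bit-mask based set to another
amounts to a bitwise OR of the two masks:

```python
from enum import IntEnum


class Op(IntEnum):
    EQ = 0
    IN = 1
    SLICE = 2


class EnumSet:
    """A set of enum members stored as one integer bit mask."""

    def __init__(self, *members):
        self.mask = 0
        for member in members:
            self.mask |= 1 << member

    def contains(self, member):
        return bool(self.mask & (1 << member))

    def add(self, other):
        """Merge another EnumSet into this one (set union)."""
        self.mask |= other.mask
```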
2019-07-29 17:15:51 +02:00
Piotr Sarna
5e06801f12 cql3: remove abstract_restriction class
All restrictions inherit from `abstract_restriction` class,
which has only one parent class: `restriction`. To simplify the
inheritance tree, `restriction` and `abstract_restriction`
are merged into one class named `restriction`.
2019-07-29 15:54:39 +02:00
Botond Dénes
733c68cb13 tests: flat_reader_assertions::produces_compacted(): add query_time param
`produces_compacted()` is usually used in tandem with another
compaction done on the expected output (`m` param). This is usually done
so that even though the reader works with an uncompacted stream,
checking the result will not fail due to insignificant
changes to the data, e.g. expired collection cells dropped while merging
two collections. Currently, the two compactions, the one inside
`produces_compacted()` and the one done by the caller, use two separate
calls to `gc_clock::now()` to obtain the query time. This can lead to
off-by-one errors in the two query times and subsequently artificial
differences between the two compacted mutations, ultimately failing the
test due to a false positive.
To prevent this, allow callers to pass in a query time, the same one they
used to compact the input mutation (`m`).
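
The failure mode can be illustrated with a toy model. This is a sketch
under assumptions: compact() is a hypothetical stand-in for the real
compaction, and a cell is assumed to be expired once its expiry is less
than or equal to the query time.

```python
def compact(cells, query_time):
    """Drop cells that have expired at query_time (expiry <= query_time)."""
    return [(name, expiry) for name, expiry in cells if expiry > query_time]


cells = [("a", 100), ("b", 101)]

# Two separate clock reads can land one tick apart and disagree about "b".
mismatch = compact(cells, 100) != compact(cells, 101)

# Sharing one query_time makes both compactions agree.
query_time = 100
expected = compact(cells, query_time)
actual = compact(cells, query_time)
```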

This solves another source of flakiness in unit tests using the mutation
source test suite.

Refs: #4695
Fixes: #4747
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190726144032.3411-1-bdenes@scylladb.com>
2019-07-28 10:59:50 +03:00
Botond Dénes
f215286525 tests/mutation_reader_tests: move away from variadic futures
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190724101005.19126-1-bdenes@scylladb.com>
2019-07-27 13:21:24 +03:00
Botond Dénes
0f30bc0004 mutation_reader: move away from variadic futures
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190724102246.20450-1-bdenes@scylladb.com>
2019-07-27 13:21:24 +03:00
Botond Dénes
6742c77229 scylla-gdb.py: fix scylla_ptr
Broken since b3adabda2.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190726140532.124406-1-bdenes@scylladb.com>
2019-07-27 13:21:24 +03:00
Avi Kivity
b272db368f sstable: index_reader: close index_reader::reader more robustly
If we had an error while reading, then we would have failed to close
the reader, which in turn can cause memory corruption. Make the
closing more robust by using then_wrapped (that doesn't skip on
exception) and log the error for analysis.

Fixes #4761.
2019-07-26 14:26:04 +02:00
Avi Kivity
fcf3195e54 Update seastar submodule
* seastar c1be3c912f...3f88e9068b (3):
  > reactor: improve handling of connect storms
  > json: Make date formatter use RFC8601/RFC3339 format
  > reactor: fix deadlock of stall detector vs dlopen

Fixes #4759.
2019-07-25 18:29:54 +03:00
Takuya ASADA
ac9b115a8f dist/debian: use install.sh on Debian
Currently, install.sh is only used for building .rpm packages. We have a similar build script
under dist/debian, which sometimes becomes inconsistent with install.sh.
Since most of the package build process is the same, we should share install.sh between the
.rpm and .deb package build processes.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190725123207.2326-1-syuu@scylladb.com>
2019-07-25 18:22:42 +03:00
Botond Dénes
6dd8c4da83 test_multishard_combining_reader_non_strictly_monotonic_positions: use the same deletion_time for tombstones
Across all calls to `make_fragments_with_non_monotonic_positions()`, to
prevent off-by-one errors between the separately generated test input
and expected output. This problem was already supposed to be fixed by
5f22771ea8 but for some reason that only
used the same deletion time *inside* a single call, which will still
fall short in some cases.
This should hopefully fix this problem for good.

Refs: #4695
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190724073240.125975-1-bdenes@scylladb.com>
2019-07-25 12:37:34 +02:00
Kamil Braun
148d4649d6 Add option to create a XUnit output file for non-boost tests in test.py. (#4757)
If the user specifies an output file name using "--xunit=<filename>",
test.py will write the test results of non-boost tests to the file in the XUnit XML format.
Every boost test creates its own results file already.
Resolves #4680.
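
A minimal sketch of producing such a file with the Python standard
library is shown below. It is not the code added to test.py: the function
names, the result tuple layout and the suite name are assumptions made
for illustration.

```python
import xml.etree.ElementTree as ET


def build_xunit(results):
    """Build an XUnit tree from (test_name, passed, message) tuples."""
    failures = sum(1 for _, passed, _ in results if not passed)
    suite = ET.Element("testsuite", name="non-boost tests",
                       tests=str(len(results)), failures=str(failures))
    for name, passed, message in results:
        case = ET.SubElement(suite, "testcase", name=name)
        if not passed:
            # A failed test gets a <failure> child carrying the details.
            failure = ET.SubElement(case, "failure", message="test failed")
            failure.text = message
    return ET.ElementTree(suite)


def write_xunit(results, path):
    """Write the results to path as an XML document."""
    build_xunit(results).write(path, encoding="UTF-8", xml_declaration=True)
```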

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-25 12:47:47 +03:00
Vlad Zolotarov
53cf90b075 ec2_snitch: properly build the AWS meta server address
Explicitly pass the port number of the AWS metadata server API
when creating a corresponding socket.

This patch fixes the regression introduced by 4ef940169f.

Fixes #4719

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-07-25 10:50:01 +03:00
Tomasz Grabiec
3af8431a40 Merge "compaction: allow collecting purged data" from Botond
compaction: allow collecting purged data

Allow the compaction initiator to pass an additional consumer that will
consume any data that is purged during the compaction process. This
allows the separate retention of these dead cells and tombstones until
some long-running process like compaction safely finishes. If the
process fails or is interrupted the purged data can be used to prevent
data resurrection.

This patch was developed to serve as the basis for a solution to #4531
but it is not a complete solution in and of itself.

This series is a continuation of the patch: "[PATCH v1 1/3] Introduce
Garbage Collected Consumer to Mutation Compactor" by Raphael S.
Carvalho <raphaelsc@scylladb.com>.

Refs: #4531

* https://github.com/denesb/scylla.git compaction_collect_purged_data/v8:
  Introduce compaction_garbage_collector interface
  collection_type_impl::mutation: compact_and_expire() add collector
    parameter
  row: add garbage_collector
  row_marker: de-inline compact_and_expire()
  row_marker: add garbage_collector
  Introduce Garbage Collected Consumer to Mutation Compactor
  tests: mutation_writer_test.cc/generate_mutations() ->
    random_schema.hh/generate_random_mutations()
  tests/random_schema: generate_random_mutations(): remove `engine`
    parameter
  tests/random_schema: add assert to make_clustering_key()
  tests/random_schema: generate_random_mutations(): allow customizing
    generated data
  tests: random_schema: futurize generate_random_mutations()
  random_schema: generate_random_mutations(): restore indentation
  data_model: extend ttl and expiry support
  tests/random_schema: generate_random_mutations(): generate partition
    tombstone
  random_schema: add ttl and expiry support
  tests/random: add get_bool() overload with random engine param
  random_schema: generate_random_mutations(): ensure partitions are
    unique
  tests: add unit tests for the data stream split in compaction
2019-07-23 17:12:28 +02:00
Avi Kivity
44b5878011 Merge "Fix possible stalls in row level repair" from Asias
"
After switching to the rpc stream interface, we increased the row buffer
size. Code that works on the buffer without yielding can stall the reactor.

This series fixes the issue by futurizing the code or running it in a thread
and yielding.
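
The general idea, shown here as an asyncio analogy rather than the
seastar code in this series, is to give control back to the scheduler
every so often while walking a large buffer. The function name and the
yield_every parameter are hypothetical.

```python
import asyncio


async def total_row_size(rows, yield_every=128):
    """Sum the sizes of all rows, yielding to the event loop periodically."""
    total = 0
    for count, row in enumerate(rows, start=1):
        total += len(row)
        if count % yield_every == 0:
            # Give other tasks a chance to run before continuing.
            await asyncio.sleep(0)
    return total
```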

Fixes: #4642
"

* 'repair_switch_to_rpc_stream_fix_stall' of https://github.com/asias/scylla:
  repair: Enable rpc stream in row level repair
  repair: Wrap with foreign_ptr to avoid cross cpu free
  repair: Futurize get_repair_rows_size and row_buf_size
  repair: Avoid calling get_repair_rows_size in get_sync_boundary
  repair: Futurize row_buf_csum
  repair: Yield inside get_set_diff
  repair: Use get_repair_rows_size helper in get_sync_boundary
  repair: Avoid stall in do_estimate_partitions_on_local_shard
  remove get_row_diff
  repair: Futurize get_row_diff to avoid stall
  repair: Fix possible stall in request_row_hashes
  repair: Allow default construct for repair_row
  repair: Remove apply_rows
  repair: Run get_row_diff_with_rpc_stream in a thread
  repair: Run get_row_diff_and_update_peer_row_hash_sets inside a thread
  repair: Run get_row_diff inside a thread
  repair: Add apply_rows_on_master_in_thread
  repair: Add apply_rows_on_follower
  repair: Futurize working_row_hashes
  repair: Remove get_full_row_hashes helper
2019-07-22 15:54:06 +03:00
Avi Kivity
9e630eb734 Update seastar submodule
* seastar 44a300cd50...c1be3c912f (9):
  > execution_stage: prevent unbounded growth
  > io queues: Add renaming functionality to io priority class
  > scheduling: Add rename functionality to scheduling groups
  > net: Add listen_backlog option for posix stack
  > future: deprecate variadic futures
  > include,tests: add workaround for missing guaranteed copy elision
  > core/dpdk_rte: handle 64+ cores
  > perftune: add a dry-run mode
  > build: support building dpdk on arm64

Fixes #4749.
2019-07-22 15:41:54 +03:00
Avi Kivity
e03c7003f1 toppartitions: fix race between listener removal and reads
Data listener reads are implemented as flat_mutation_readers, which
take a reference to the listener and then execute asynchronously.
The listener can be removed between the time when the reference is
taken and actual execution, resulting in a dangling pointer
dereference.

Fix by using a weak_ptr to avoid writing to a destroyed object. Note that writes
don't need protection because they execute atomically.
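
A rough Python analogy of the fix, using weakref in place of seastar's
weak_ptr (the class and function names are hypothetical): the deferred
read holds only a weak reference and skips the update if the listener is
already gone.

```python
import weakref


class Listener:
    def __init__(self):
        self.reads = 0

    def on_read(self):
        self.reads += 1


def make_deferred_read(listener):
    """Return a callable that notifies the listener only if it still exists."""
    ref = weakref.ref(listener)

    def run():
        target = ref()
        if target is None:
            # The listener was removed before the read executed.
            return False
        target.on_read()
        return True

    return run
```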

Fixes #4661.

Tests: unit (dev)
2019-07-22 13:26:18 +02:00
Avi Kivity
d730969278 Merge "make sure failure to create snapshots won't crash the node" from Glauber
"
Issue #4558 describes a situation in which failure to execute clearsnapshots
will hard crash the node. The problem is that clearsnapshots will internally use
lister::rmdir, which in turn has two in-tree users: clearing snapshots and clearing
temporary directories during sstable creation. The way it is currently
coded, it wraps the io functions in io_check, which means that failures
to remove the directory will crash the database.

We recently saw how benign failures crashed a database during
clearsnapshot: we had snapshot creation running in parallel, adding more
files to the directory, so it wasn't empty by the time of deletion. I
have also very often seen users add files to existing directories by
accident, which is another way to trigger that.

This patch removes the io_check from lister, and moves it to the caller
in which we want to be more strict. We still want to be strict about
the creation of temporary directories, since users shouldn't be touching
that in any way.

Also while working on that, I realized we have no tests for snapshots of
any kind in tree, so let's write some.
"

* 'snapshots' of https://github.com/glommer/scylla:
  tests: add tests for snapshots.
  lister: don't crash the node on failure to remove snapshot
2019-07-22 11:09:23 +03:00
Rafael Ávila de Espíndola
636e2470b1 Always close commitlog files
We were using segment::_closed to decide whether _file was already
closed. Unfortunately they are not exactly the same thing. As far as
I understand it, segments can be closed and reused without actually
closing the file.

Found with a seastar patch that asserts on destroying an open
append_challenged_posix_file_impl.

Fixes #4745.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190721171332.7995-1-espindola@scylladb.com>
2019-07-22 10:08:57 +03:00
Vlad Zolotarov
5632c0776e tests: fix the compilation with fmt v5.3.0
Compilation fails with fmt release 5.3.0 when we print a bytes_view
using "{}" formatter.

The compiler's complaint is: "error: static assertion failed: mismatch
between char-types of context and argument"

Fix this by explicitly using to_hex() converter.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190716221231.22605-3-vladz@scylladb.com>
2019-07-21 16:42:54 +03:00
Nadav Har'El
db8d4a0cc6 Add computed columns
Merged patch series by Piotr Sarna:

This series introduces the concept of "computed" column, which represents
values not provided directly by the user, but computed on the fly -
possibly using other column values. It will be used in the future to
implement map value indexing, collection indexing, etc. Right now the only
use is the token column for secondary indexes - which is a column computed
from the base partition key value.

After this series, another one that depends on it and adds map value
indexing will be pushed.

Tests: unit(dev)

Piotr Sarna (14):
  schema: add computed info to column definition
  schema: add implementation of computing token column
  schema: allow marking columns as computed in schema builder
  service: add computed columns feature
  view: check for computed columns in view
  view: remove unused token_for function
  database: add fixing previous secondary index schemas
  tests: disable computed columns feature in schema change test
  tests: add schema change test regeneration comment
  db: add system_schema.computed_columns
  docs: init system_schema_keyspace.md with column computations
  tests: generate new test case for schema change + computed cols
  index: mark token column as 'computed' when creating mv
  tests: add checking computed columns in SI

 column_computation.hh                         |  63 ++++++++
 db/schema_features.hh                         |   4 +-
 db/schema_tables.hh                           |   4 +
 idl/frozen_schema.idl.hh                      |   1 +
 schema.hh                                     |  40 +++++
 schema_builder.hh                             |   4 +-
 schema_mutations.hh                           |  18 ++-
 service/storage_service.hh                    |   8 +
 view_info.hh                                  |   2 -
 database.cc                                   |   6 +-
 db/schema_tables.cc                           | 146 ++++++++++++++++--
 db/view/view.cc                               |  46 +++---
 index/secondary_index_manager.cc              |   2 +-
 schema.cc                                     |  58 ++++++-
 schema_mutations.cc                           |  14 +-
 service/storage_service.cc                    |   5 +
 tests/schema_change_test.cc                   |  63 ++++++--
 tests/secondary_index_test.cc                 |  28 ++++
 docs/system_schema_keyspace.md                |  40 +++++

 plus about 200 new test sstable files
2019-07-21 13:05:46 +03:00
Piotr Sarna
4d1eaf8478 tests: add checking computed columns in SI
The test case checks if token column generated for global indexing
is indeed only present in global indexes and is marked as a computed
column.
2019-07-19 11:58:42 +02:00
Piotr Sarna
a8f7d64a08 index: mark token column as 'computed' when creating mv
Secondary indexes use a computed token column to preserve proper
query ordering. This column is now marked as 'computed'.
2019-07-19 11:58:42 +02:00
Piotr Sarna
1c0ef5f9e9 tests: generate new test case for schema change + computed cols
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables generated after computed columns
are allowed, in order to make sure that the digest computed
including computed columns does not change spuriously as well.
2019-07-19 11:58:42 +02:00
Piotr Sarna
1e54752167 docs: init system_schema_keyspace.md with column computations
The documentation file for system_schema keyspace is introduced,
and its first entry describes the column_computation table.
2019-07-19 11:58:42 +02:00
Piotr Sarna
c1d5aef735 db: add system_schema.computed_columns
Information on which columns of a table are 'computed' is now kept
in system_schema.computed_columns system table.
2019-07-19 11:58:42 +02:00
Piotr Sarna
589200f5a2 tests: add schema change test regeneration comment
Schema change test might need regenerating every time a system table
is added. In order to save future developer's time on debugging this
test, a short description of that requirement is added.
2019-07-19 11:58:42 +02:00
Piotr Sarna
03ade01db7 tests: disable computed columns feature in schema change test
In order to make sure that old schema digest is not recomputed
and can be verified - computed columns feature is initially disabled
in schema_change_test.
The reason for that is as follows: running CQL test env assumes that
we are running the newest cluster with all features enabled. However,
the mere existence of some features might influence digest calculation.
So, in order for the existing test to work correctly, it should have
exactly the same set of cluster supported features as it had during
its creation. It used to be "all features", but now it's "all features
except computed columns". One can think of that as running a cluster
with some nodes not yet knowing what computed columns are, so they
are not taken into account when computing digests.
Additionally, a separate test case that takes computed column digest
into account will be generated and added in this series.
2019-07-19 11:58:42 +02:00
Piotr Sarna
17c323c096 database: add fixing previous secondary index schemas
If a schema was created before computed columns were implemented,
its token column may not have been marked as computed.
To remedy this, if no computed column is found, the schema
will be recreated.
The code will work correctly even without this patch in order to support
upgrading from legacy versions, but it's still important: it transforms
token columns from the legacy format to new computed format, which will
eventually (after a few release cycles) allow dropping the support for
legacy format altogether.
2019-07-19 11:58:42 +02:00
Piotr Sarna
3c5dd94306 view: remove unused token_for function
The function was only used once in code removed in this series.
2019-07-19 11:58:42 +02:00
Piotr Sarna
6a6871aa0e view: check for computed columns in view
Currently, having a 'computed' column in view update generation
indicates that token value needs to be generated and assigned to it.
2019-07-19 11:58:42 +02:00
Piotr Sarna
a0e02df36a service: add computed columns feature
Computed columns feature should be checked before creating
index schemas the new way - by adding computed column names
to system_schema.computed_columns.
2019-07-19 11:58:42 +02:00
Piotr Sarna
a1100e3737 schema: allow marking columns as computed in schema builder
In order to be able to transform legacy materialized view definitions,
builder is now able to mark an existing column as computed.
2019-07-19 11:58:41 +02:00
Piotr Sarna
65bf6d34fe schema: add implementation of computing token column
Computed column of 'token' type can now have its value computed.
2019-07-19 11:47:48 +02:00
Piotr Sarna
491b7a817f schema: add computed info to column definition
Some columns may represent not user-provided values, but ones computed
from other columns. Currently an example is token column used in secondary
indexes to provide proper ordering. In order to avoid hardcoding special
cases in execution stage, optional additional information for computed
columns is stored in column definition.
2019-07-19 11:47:46 +02:00
Tomasz Grabiec
7604980d63 database: Add missing partition slicing on streaming reader recreation
streaming_reader_lifecycle_policy::create_reader() was ignoring the
partition_slice passed to it and always creating the reader for the
full slice.

That's wrong because create_reader() is called when recreating a
reader after it's evicted. If the reader stopped in the middle of
partition we need to start from that point. Otherwise, fragments in
the mutation stream will appear duplicated or out of order, violating
assumptions of the consumers.
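
A toy model of the bug, with a list standing in for a partition and a
hypothetical honor_slice flag standing in for whether create_reader()
respects the slice it is given:

```python
def read_with_eviction(rows, evict_after, honor_slice):
    """Read all rows, simulating one eviction after evict_after rows."""
    out = []
    reader = iter(rows)
    for _ in range(evict_after):
        out.append(next(reader))
    # The reader is evicted here and has to be recreated.
    start = evict_after if honor_slice else 0
    reader = iter(rows[start:])
    out.extend(reader)
    return out
```

With honor_slice=False the rows read before the eviction are emitted a
second time, which is the kind of duplication described above.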

This was observed to result in repair writing incorrect sstables with
duplicated clustering rows, which results in
malformed_sstable_exception on read from those sstables.

Fixes #4659.

In v2:

  - Added an overload without partition_slice to avoid changing existing users which never slice

Tests:

  - unit (dev)
  - manual (3 node ccm + repair)

Backport: 3.1
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>
2019-07-18 18:35:28 +03:00
Asias He
64a4c0ede2 streaming: Do not open rpc stream connection if ranges are not relevant to a shard
Given a list of ranges to stream, stream_transfer_task will create a
reader with the ranges and create an rpc stream connection on all the shards.

When a user provides ranges to repair with the -st -et options, e.g.,
using scylla-manager, such ranges can belong to only one shard, and repair
will pass such ranges to streaming.

As a result, only one shard will have data to send while the rpc stream
connections are created on all the shards, which can cause the kernel
to run out of ports on some systems.

To mitigate the problem, do not open the connection if the ranges do not
belong to the shard at all.

Refs: #4708
2019-07-18 18:31:21 +03:00
Avi Kivity
51cff8ad23 Merge "Fix storage service for tests" from Botond
"
Fix another source of flakiness in mutation_reader_test. This one is caused by storage_service_for_tests lacking a config::broadcast_to_all_shards() call, triggering an invalid memory access (or SEGFAULT) when run on more than one shard.

Refs: #4695
"

* 'fix_storage_service_for_tests' of https://github.com/denesb/scylla:
  tests: storage_service_for_tests: broadcast config to all shards
  tests: move storage_service_for_tests impl to test_services.cc
2019-07-18 18:27:47 +03:00
Nadav Har'El
997b92a666 migration_manager: allow dropping table and all its views
The function announce_column_family_drop() drops (deletes) a base table
and all the materialized-views used for its secondary indexes, but not
other materialized views - if there are any, the operation refuses to
continue. This is exactly what CQL's "DROP TABLE" needs, because it is
not allowed to drop a table before manually dropping its views.

But there is no inherent reason why we can't support an operation
to delete a table and *all* its views - not just those related to indexes.
This patch adds such an option to announce_column_family_drop().
This option is not used by the existing CQL layer, but can be used
by other code automating operations programmatically without CQL.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190716150559.11806-1-nyh@scylladb.com>
2019-07-18 13:26:25 +02:00
Takuya ASADA
bd7d1b2d38 dist/common/systemd: change stop timeout sec to 900s
Currently scylla-server.service uses DefaultTimeoutStopSec = 90. If Scylla
is not able to shut down cleanly within 90 seconds, we may have data corruption on the node.
Since we already set TimeoutStartSec = 900, we can use TimeoutSec to set both
TimeoutStartSec and TimeoutStopSec to 900.

See #4700

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190717095416.10652-1-syuu@scylladb.com>
2019-07-17 15:37:47 +03:00
Nadav Har'El
759752947b drop_index_statement: fix column_family()
All statement objects which derive from cf_statement, including
drop_index_statement, have a column_family() returning the name of the
column family involved in this statement. For most statements this is
known at the time of construction, because it is part of the statement,
but for "DROP INDEX", the user doesn't specify the table's name - just
the index name. So we need to override column_family() to find the
table name.

The existing implementation assert()ed that we can always find such
a table, but this is not true - for example, in a DROP INDEX with
"IF EXISTS", it is perfectly fine for no such table to exist. In this
case we don't want a crash, and not even an exception - it's fine that
we just return an empty table name.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190716180104.15985-1-nyh@scylladb.com>
2019-07-17 09:44:47 +03:00
Glauber Costa
be26cbd952 tests: add tests for snapshots.
While inspecting the snapshot code, I realized that we don't have any
tests for it. So I decided to add some.

Unfortunately I couldn't come up with a test of clearsnapshot reliably
failing to remove the directory: relying on create snapshot +
clearsnapshot is racy (it does not always happen), and other tricks that can be
used to reproduce this -- like creating a root-owned file inside the
snapshots directory -- are environment-dependent, and a bit ugly for unit
tests. Dtests would probably be a better place for that.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-07-16 13:35:53 -04:00
Glauber Costa
2008d982c3 lister: don't crash the node on failure to remove snapshot
lister::rmdir has two in-tree users: clearing snapshots and clearing
temporary directories during sstable creation. The way it is currently
coded, it wraps the io functions in io_check, which means that failures
to remove the directory will crash the database.

We recently saw how benign failures crashed a database during
clearsnapshot: we had snapshot creation running in parallel, adding more
files to the directory, which wasn't empty by the time of deletion.  I
have also often seen users add files to existing directories by
accident, which is another way to trigger that.

This patch removes the io_check from lister, and moves it to the caller
in which we want to be more strict. We still want to be strict about
the creation of temporary directories, since users shouldn't be touching
that in any way.

Fixes #4558

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-07-16 13:35:36 -04:00
Kamil Braun
4417e78125 Fix timestamp_type_impl::timestamp_from_string.
Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00.
Fixes #4641.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-16 19:16:56 +03:00
Asias He
722ab3bb65 repair: Log repair id in check_failed_ranges
Add the word `id` before the repair id in the log. It makes it easier
to figure out from the log what the number stands for.
2019-07-16 19:10:19 +03:00
Avi Kivity
43690ecbdf Merge "Fix disable_sstable_write synchronization with on_compaction_completion" from Benny
"
disable_sstable_write needs to acquire _sstable_deletion_sem to properly synchronize
with background deletions done by on_compaction_completion to ensure no sstables will
be created or deleted during reshuffle_sstables after
storage_service::load_new_sstables disables sstable writes.

Fixes #4622

Test: unit(dev), nodetool_additional_test.py migration_test.py
"

* 'scylla-4622-fix-disable-sstable-write' of https://github.com/bhalevy/scylla:
  table: document _sstables_lock/_sstable_deletion_sem locking order
  table: disable_sstable_write: acquire _sstable_deletion_sem
  table: uninline enable_sstable_write
  table: reshuffle_sstables: add log message
2019-07-16 19:06:58 +03:00
Amnon Heiman
399d79fc6f init: do not allow replace-address for seeds
If a node is a seed node, it can not be started with
replace-address-first-boot or the replace-address flag.

The issue is that as a seed node it will generate new tokens instead of
replacing the existing one the user expects it to replace when supplying
the flags.

This patch will throw a bad_configuration_error exception
in this case.

Fixes #3889

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-07-16 18:53:19 +03:00
Calle Wilund
dbc3499fd1 server: Fix cql notification inet address serialization
Fixes #4717

Bug in ipv6 support series caused inet_address serialization
to include an additional "size" parameter in the address chunk.

Message-Id: <20190716134254.20708-1-calle@scylladb.com>
2019-07-16 16:51:59 +03:00
Botond Dénes
b40cf1c43d tests: storage_service_for_tests: broadcast config to all shards
Due to recent changes to the config subsystem, configuration has to be
broadcast to all shards if one wishes to use it on them. The
`storage_service_for_tests` has a `sharded<gms::gossiper>` member, which
reads config values on initialization on each shard, causing a crash as
the configuration was initialized only on shard 0. Add a call to
`config::broadcast_to_all_shards()` to ensure all shards have access to
valid config values.
2019-07-16 10:37:17 +03:00
Botond Dénes
fc9f46d7c1 tests: move storage_service_for_tests impl to test_services.cc
Let's make it easier to find.
2019-07-16 10:36:49 +03:00
Raphael S. Carvalho
7180731d43 tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:50 -03:00
Raphael S. Carvalho
332c2ff710 table: Make SSTable cleanup run aware
The cleanup procedure will move any sstable out of its sstable run
because sstables are cleaned up individually and they end up receiving
a new run identifier, meaning a table may potentially end up with a
new sstable run for each of the sstables cleaned.

SStable cleanup needs to be run aware, so that the run structure is
not messed up after the operation is done. Given that only one fragment
or other, composing a sstable run, may need cleanup, it's better to keep
them in their original sstable run.

Fixes #4663.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:47 -03:00
Raphael S. Carvalho
8c97e0e43e compaction: introduce constants for compaction descriptor
Make it easier for users, and also avoid duplicating knowledge
about descriptor defaults across the codebase.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:44 -03:00
Raphael S. Carvalho
a1db29e705 compaction: Make it possible to config the identifier of the output sstable run
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:38 -03:00
Raphael S. Carvalho
0e732ed1cf table: do not rely on undefined behavior in cleanup_sstables
It shouldn't rely on argument evaluation order, which is ub.

Fixes #4718.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:22 -03:00
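The class of bug fixed in 0e732ed1cf above can be illustrated with a minimal sketch. This is not the code from cleanup_sstables; `consume` and `consume_safely` are hypothetical names used only for illustration. The point is that when one argument moves from an object and another argument reads the same object, the result depends on the order in which the arguments are evaluated, which the language does not fix.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical callee: takes ownership of the items and also receives their count.
std::size_t consume(std::vector<int> items, std::size_t expected_count) {
    return items.size() == expected_count ? expected_count : 0;
}

// Fragile pattern (shown only as a comment): the order in which the two
// arguments are evaluated is not fixed by the language, so `v.size()` may be
// read after `v` has already been moved into the first parameter:
//
//     consume(std::move(v), v.size());

// Safe variant: read everything that is needed from `v` before moving it.
std::size_t consume_safely(std::vector<int> v) {
    const std::size_t count = v.size();   // evaluated before the call below
    return consume(std::move(v), count);
}
```

Hoisting the read into a named local makes the ordering explicit and independent of the compiler.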
Paweł Dziepak
060e3f8ac2 mutation_partition: verify row::append_cell() precondition
row::append_cell() has a precondition that the new cell column id needs
to be larger than that of any other already existing cell. If this
precondition is violated the row will end up in an invalid state. This
patch adds an assertion to make sure we fail early in such cases.
2019-07-15 23:25:06 +02:00
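The precondition described in 060e3f8ac2 above can be sketched roughly as follows. `toy_row` is a hypothetical, heavily simplified stand-in and not Scylla's `row` class; it only assumes that cells are kept in ascending column-id order and that appending out of order would corrupt that order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using column_id = std::uint32_t;

// Hypothetical simplified row: cells are stored in ascending column-id order.
class toy_row {
    std::vector<std::pair<column_id, int>> _cells;
public:
    // Precondition: `id` is larger than the id of every cell already present.
    // The assert makes a violation fail early instead of leaving the row in
    // an invalid (unsorted) state.
    void append_cell(column_id id, int value) {
        assert(_cells.empty() || _cells.back().first < id);
        _cells.emplace_back(id, value);
    }

    std::size_t size() const { return _cells.size(); }
    column_id last_id() const { return _cells.back().first; }
};
```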
Botond Dénes
5f22771ea8 tests/mutation_reader_test stabilize test_multishard_combining_reader_non_strictly_monotonic_positions
Currently the
test_multishard_combining_reader_non_strictly_monotonic_positions is
flaky. The test is somewhat unconventional, in that it doesn't use the
same instance of data as the input to the test and as its expected
output, instead it invokes the method which generates this data
(`make_fragments_with_non_monotonic_positions()`) twice, first to
generate the input, and second to generate the expected output. This
means that the test is prone to any deviation in the data generated by
said method. One such deviation, discovered recently, is that the method
doesn't explicitly specify the deletion time of the generated range
tombstones. This results in this deletion time sometimes differing
between the test input and the expected output. Solve by explicitly
passing the same deletion time to all created range tombstones.

Refs: #4695
2019-07-15 23:24:16 +02:00
Tomasz Grabiec
14700c2ac4 Merge "Fix the system.size_estimates table" from Kamil
Fixes a segfault when querying for an empty keyspace.

Also, fixes an infinite loop on smp > 1. Queries to
system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside
multishard_combining_reader::fill_buffer. This happened because
multishard_combining_reader assumes that shards return rows belonging
to separate partitions, which was not the case for
size_estimates_mutation_reader.

Fixes #4689.
2019-07-15 22:09:30 +02:00
Asias He
8774adb9d0 repair: Avoid deadlock in remove_repair_meta
Start n1, n2
Create ks with rf = 2
Run repair on n2
Stop n2 in the middle of repair
n1 will notice n2 is DOWN, gossip handler will remove repair instance
with n2 which calls remove_repair_meta().

Inside remove_repair_meta(), we have:

```
1        return parallel_for_each(*repair_metas, [repair_metas] (auto& rm) {
2            return rm->stop();
3        }).then([repair_metas, from] {
4            rlogger.debug("Removed all repair_meta for single node {}", from);
5        });
```

Since 3.1, we start 16 repair instances in parallel which will create 16
readers. The reader semaphore is 10.

At line 2, it calls

```
6    future<> stop() {
7       auto gate_future = _gate.close();
8       auto writer_future = _repair_writer.wait_for_writer_done();
9       return when_all_succeed(std::move(gate_future), std::move(writer_future));
10    }
```

The gate protects the reader to read data from disk:

```
11 with_gate(_gate, [] {
12   read_rows_from_disk
13        return _repair_reader.read_mutation_fragment() --> calls reader() to read data
14 })
```

So line 7 won't return until all the 16 readers return from the call of
reader().

The problem is, the reader won't release the reader semaphore until the
reader is destroyed!
So, even if 10 out of the 16 readers have finished reading, they won't
release the semaphore. As a result, the stop() hangs forever.

To fix this in the short term, we can delete the reader, i.e. drop the
repair_meta object once it is stopped.

Refs: #4693
2019-07-15 21:51:57 +02:00
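The resource accounting behind the hang described in 8774adb9d0 above can be sketched with a toy model. This is not Seastar or Scylla code: `toy_semaphore`, `toy_reader` and `admit` are hypothetical names, and the model is synchronous. It only assumes what the message states, namely that a reader holds its semaphore unit until it is destroyed, that there are 10 units, and that 16 readers are wanted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy counting semaphore; try_acquire() never blocks.
class toy_semaphore {
    std::size_t _available;
public:
    explicit toy_semaphore(std::size_t units) : _available(units) {}
    bool try_acquire() {
        if (_available == 0) {
            return false;
        }
        --_available;
        return true;
    }
    void release() { ++_available; }
    std::size_t available() const { return _available; }
};

// Toy reader: owns one semaphore unit for its entire lifetime,
// not just while it is actively reading.
class toy_reader {
    toy_semaphore& _sem;
public:
    explicit toy_reader(toy_semaphore& sem) : _sem(sem) {}
    ~toy_reader() { _sem.release(); }
    toy_reader(const toy_reader&) = delete;
    toy_reader& operator=(const toy_reader&) = delete;
};

using reader_list = std::vector<std::unique_ptr<toy_reader>>;

// Tries to start `wanted` readers and returns how many obtained a unit.
std::size_t admit(toy_semaphore& sem, std::size_t wanted, reader_list& out) {
    std::size_t admitted = 0;
    for (std::size_t i = 0; i < wanted; ++i) {
        if (!sem.try_acquire()) {
            break;
        }
        out.push_back(std::make_unique<toy_reader>(sem));
        ++admitted;
    }
    return admitted;
}
```

In this model the first 10 readers are admitted and the remaining 6 cannot start for as long as the first 10 objects are alive, even if they have finished reading. Destroying the finished readers returns their units, which is the short-term fix the message proposes.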
Benny Halevy
0e4567c881 table: document _sstables_lock/_sstable_deletion_sem locking order
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-15 19:20:35 +03:00
Botond Dénes
135c84c29a tests: add unit tests for the data stream split in compaction 2019-07-15 17:38:00 +03:00
Botond Dénes
719ad51bea random_schema: generate_random_mutations(): ensure partitions are unique
Duplicate partitions can appear as a result of the same partition key
generated more than once. For now we simply remove any duplicates. This
means that in some circumstances there will be fewer partitions generated
than requested.
2019-07-15 17:38:00 +03:00
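The deduplication described in 719ad51bea above could look roughly like this. The sketch is not taken from random_schema; `toy_partition` and `remove_duplicate_partitions` are hypothetical names, and sorting by key before removing duplicates is an assumption about one reasonable way to do it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated partition.
struct toy_partition {
    std::string key;
    int payload;
};

// Sorts by key and keeps the first partition generated for each key.
// The result may contain fewer partitions than were generated.
std::vector<toy_partition> remove_duplicate_partitions(std::vector<toy_partition> parts) {
    std::stable_sort(parts.begin(), parts.end(),
            [] (const toy_partition& a, const toy_partition& b) { return a.key < b.key; });
    auto last = std::unique(parts.begin(), parts.end(),
            [] (const toy_partition& a, const toy_partition& b) { return a.key == b.key; });
    parts.erase(last, parts.end());
    return parts;
}
```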
Botond Dénes
eaedbed069 tests/random: add get_bool() overload with random engine param 2019-07-15 17:38:00 +03:00
Botond Dénes
057f9aa655 random_schema: add ttl and expiry support
When generating data, the user can now also generate ttls and
expiry for all generated atoms. This happens in a controlled way, via a
generator functor, very similar to how the timestamps are generated.
This functor is also used by `random_schema` to generate `deletion_time`
for all tombstones, so the user now has full control of when all of the
atoms can be GC'd.
2019-07-15 17:38:00 +03:00
Botond Dénes
76a853e345 tests/random_schema: generate_random_mutations(): generate partition tombstone 2019-07-15 17:38:00 +03:00
Botond Dénes
4d9f3e5705 data_model: extend ttl and expiry support 2019-07-15 17:38:00 +03:00
Botond Dénes
96d3c1efb1 random_schema: generate_random_mutations(): restore indentation 2019-07-15 17:38:00 +03:00
Botond Dénes
b26fe76fc1 tests: random_schema: futurize generate_random_mutations()
To avoid reactor stalls when generating many and/or large partitions.
2019-07-15 17:38:00 +03:00
Botond Dénes
cf135c6257 tests/random_schema: generate_random_mutations(): allow customizing generated data
Allow callers to specify the number of partitions generated, as well as
the number of clustering rows and range tombstones generated per
partition.
2019-07-15 17:38:00 +03:00
Botond Dénes
d2930ffa53 tests/random_schema: add assert to make_clustering_key()
Verify that the schema *does* indeed have clustering columns. Better an
assert than a cryptic "division by 0" exception deeper in the call stack.
2019-07-15 17:38:00 +03:00
Botond Dénes
d90ac6bd7b tests/random_schema: generate_random_mutations(): remove engine parameter
Use an internally created instance of random engine. Passing a readily
seeded engine from the outside is pointless now that we have a mechanism
to seed entire test suites with a command line argument: the internal
engine is seeded from tests::random, so the seed of the test suite
determines the internal seed as well.

Update the sole user of this method (mutation_writer_test.cc) to not
generate local seeds anymore.
2019-07-15 17:38:00 +03:00
Botond Dénes
fd2f53f292 tests: mutation_writer_test.cc/generate_mutations() -> random_schema.hh/generate_random_mutations()
We plan on allowing other tests to use this method. The first step is to
make it available in a header.
2019-07-15 17:38:00 +03:00
Botond Dénes
7a4a609e88 Introduce Garbage Collected Consumer to Mutation Compactor
Introduce consumer in mutation compactor that will only consume
data that is purged away from regular consumer. The goal is to
allow the compaction implementation to do whatever it wants with
the garbage collected data, such as saving it to prevent
data resurrection from ever happening, as described in
issue #4531.
noop_compacted_fragments_consumer is made available for users that
don't need this capability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 17:38:00 +03:00
Botond Dénes
4c2781edaa row_marker: add garbage_collector
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
the row_marker when it has expired and would be discarded.
The collector param is optional and defaults to nullptr.
2019-07-15 17:38:00 +03:00
Botond Dénes
7db2006162 row_marker: de-inline compact_and_expire() 2019-07-15 17:38:00 +03:00
Botond Dénes
4c7a7ffe8f row: add garbage_collector
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
all atoms that are expired and would be discarded. The body of
`compact_and_expire()` was changed so that it checks cells' tombstone
coverage before it checks their expiry, so that cells that are both
covered by a tombstone and also expired are not passed to the collector.
The collector is forwarded to
`collection_type_impl::mutation::compact_and_expire()` as well.
The collector param is optional and defaults to nullptr.
2019-07-15 17:38:00 +03:00
Botond Dénes
307b48794d collection_type_impl::mutation: compact_and_expire() add collector parameter
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
all atoms that are expired and would be discarded. The body of
`compact_and_expire()` was changed so that it checks cells' tombstone
coverage before it checks their expiry, so that cells that are both
covered by a tombstone and also expired are not passed to the collector.
The collector param is optional and defaults to nullptr. To accommodate
the collector, which needs to know the column id, a new `column_id`
parameter was added as well.
2019-07-15 17:37:55 +03:00
Calle Wilund
1ed9a44396 utils::config_file: Propagate broadcast_to_all_shards to dependent files
Fixes #4713

Modifying config files to use sharded storage misses the fact
that extensions are allowed to add non-member config fields to
the main configuration, typically from "extra" config_file
objects.

Unless those "extra" files are broadcast when the main file is broadcast,
the values will not be readable from other shards.

This patch propagates the broadcast to all other config files
whose entries are in the top level object. This ensures we
always keep data up to date on config reload.

Message-Id: <20190715135851.19948-1-calle@scylladb.com>
2019-07-15 17:02:09 +03:00
Nadav Har'El
9cc9facbea configure.py: atomically overwrite build.ninja
configure.py currently takes some time to write build.ninja. If the user
interrupts (e.g., control-C) configure.py, it can leave behind a partial
or even empty build.ninja file. This is most frustrating when the user
didn't explicitly run "configure.py", but rather just ran "ninja" and
ninja decided to run configure.py, and after interrupting it the user
cannot run "ninja" again because build.ninja is gone. Another result of
losing build.ninja is that the user now needs to remember which parameters
to pass to "configure.py", because the old ones stored in build.ninja were lost.

The solution in this patch is simple: We write the new build.ninja contents
into a temporary file, not directly into build.ninja. Then, only when the
entire file has been successfully written, do we rename the temporary file
to its intended name - build.ninja.

Fixes #4706

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190715122129.16033-1-nyh@scylladb.com>
2019-07-15 15:34:48 +03:00
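The write-then-rename technique described in 9cc9facbea above can be sketched as follows. This is not the code from the patch (configure.py is a Python script); it is a C++ illustration of the same idea. `write_file_atomically`, `read_file` and the `.tmp` suffix are hypothetical, and the atomicity of the final step assumes POSIX rename semantics with both files on the same filesystem.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Writes `contents` to a temporary file next to `target` and renames it into
// place only after the whole file has been written. An interrupted run leaves
// the old `target` untouched.
void write_file_atomically(const fs::path& target, const std::string& contents) {
    fs::path tmp = target;
    tmp += ".tmp";   // hypothetical temporary name in the same directory
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) {
            throw std::runtime_error("cannot open " + tmp.string());
        }
        out << contents;
        out.flush();
        if (!out) {
            throw std::runtime_error("cannot write " + tmp.string());
        }
    }   // the stream is closed here, before the rename
    fs::rename(tmp, target);   // replaces an existing target
}

// Helper used only to check the result.
std::string read_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

Keeping the temporary file in the same directory as the target avoids a cross-filesystem rename, which would not be atomic.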
Botond Dénes
5002ebb73f Introduce compaction_garbage_collector interface
This interface can be used to implement a garbage collector that
collects atoms that are purged due to expiry during compaction.
The intended usage is collecting purged atoms for safekeeping until the
compaction process finishes safely, to be dropped only at the end when
the compaction is known to have finished successfully.
2019-07-15 15:30:43 +03:00
Eliran Sinvani
997a146c7f auth: Prevent race between role_manager and password_authenticator
When scylla is started for the first time with PasswordAuthenticator
enabled, it can happen that a record of the default superuser
is created in the table with can_login and is_superuser
set to null. This happens because the module in charge of creating
the row is the role manager, while the module in charge of setting the
default password salted hash value is the password authenticator.
Those two modules are started together. If the
password authenticator finishes its initialization first, then in the
period until the role manager completes its initialization, the row
contains those null columns, and any login attempt in this period
will cause a memory access violation since those columns are not
expected to ever be null. This patch removes the race by starting
the password authenticator and authorizer only after the role manager
has finished its initialization.

Tests:
  1. Unit tests (release)
  2. Auth and cqlsh auth related dtests.

Fixes #4226

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190714124839.8392-1-eliransin@scylladb.com>
2019-07-14 16:19:57 +03:00
Rafael Ávila de Espíndola
67c624d967 Add documentation for large_rows and large_cells
Fixes #4552

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190614151907.20292-1-espindola@scylladb.com>
2019-07-12 19:21:26 +03:00
Amnon Heiman
1c6dec139f API: compaction_manager add get pending tasks by table
The pending tasks by table name API returns an array of pending tasks by
keyspace/table names.

After this patch the following command would work:
curl -X GET 'http://localhost:10000/compaction_manager/metrics/pending_tasks_by_table'

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-07-12 19:21:26 +03:00
Takuya ASADA
842f75d066 reloc: provide libthread_db.so.1 to debug thread on gdb
In scylla-debuginfo package, we have /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug
but we actually do not have libthread_db.so.1 in /opt/scylladb/libreloc
since it does not appear in the ldd output for the scylla binary.

To debug threads, we need to add the library to the relocatable package manually.

Fixes #4673

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190711111058.7454-1-syuu@scylladb.com>
2019-07-12 19:21:26 +03:00
Piotr Sarna
ac7531d8d9 db,hints: decouple in-flight hints limits from resource manager
The resource manager is used to manage common resources between
various hints managers. In-flight hints used to be one of the shared
resources, but this proved to cause starvation, when one manager eats
the whole limit - which may be especially painful if the background
materialized views hints manager starves the regular hints manager,
which can in turn start failing user writes because of admission control.
This patch makes the limit per-manager again,
which effectively reverts the limit to its original behavior.

Fixes #4483
Message-Id: <8498768e8bccbfa238e6a021f51ec0fa0bf3f7f9.1559649491.git.sarna@scylladb.com>
2019-07-12 19:21:26 +03:00
Rafael Ávila de Espíndola
4e7ffb80c0 cql: Fix use of UDT in reversed columns
We were missing calls to underlying_type in a few locations and so the
insert would think the given literal was invalid and the select would
refuse to fetch a UDT field.

Fixes #4672

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190708200516.59841-1-espindola@scylladb.com>
2019-07-12 19:21:26 +03:00
Kamil Braun
60a4867a5b Fix infinite looping when performing a range query on system.size_estimates.
Queries to the system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer.
This happened because multishard_combining_reader assumes that shards return rows belonging
to separate partitions, which was not the case for size_estimates_mutation_reader.
This commit fixes the issue and closes #4689.
2019-07-12 18:09:15 +02:00
Kamil Braun
ba5a02169e Fix segmentation fault when querying system.size_estimates for an empty keyspace. 2019-07-12 18:02:10 +02:00
Kamil Braun
a1665b74a9 Refactor size_estimates_virtual_reader
Move the implementation of size_estimates_mutation_reader
to a separate compilation unit to speed up compilation times
and increase readability.

Refactor tests to use seastar::thread.
2019-07-12 17:53:00 +02:00
Benny Halevy
6dad9baa1c table: disable_sstable_write: acquire _sstable_deletion_sem
`disable_sstable_write` needs to acquire `_sstable_deletion_sem`
to properly synchronize with background deletions done by
`on_compaction_completion` to ensure no sstables will be created
or deleted during `reshuffle_sstables` after
`storage_service::load_new_sstables` disables sstable writes.

Fixes #4622

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-11 12:14:44 +03:00
Benny Halevy
bbbd749f70 table: uninline enable_sstable_write
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-11 12:14:44 +03:00
Benny Halevy
c6bad3f3c2 table: reshuffle_sstables: add log message
To mark the point in time writes are disabled and
scanning of the data directory is beginning.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-11 12:14:44 +03:00
Asias He
aa8d7af4f0 repair: Enable rpc stream in row level repair
Add row_level_diff_detect_algorithm::send_full_set_rpc_stream as a
supported algorithm. If both the repair master and the followers support it,
the master will use the rpc stream interface, otherwise it will use the old
rpc verb interface.
2019-07-11 08:59:48 +08:00
Asias He
38b72b398b repair: Wrap with foreign_ptr to avoid cross cpu free
The moved set_diff and rows will be freed on the target cpu instead of the
source cpu, which will cause a lot of cross-cpu frees.

To fix, wrap them in foreign_ptr.
2019-07-11 08:59:48 +08:00
Asias He
06c84be257 repair: Futurize get_repair_rows_size and row_buf_size
To prevent stalls when the number of rows inside the row buf is large.
2019-07-11 08:36:39 +08:00
Asias He
809c992b30 repair: Avoid calling get_repair_rows_size in get_sync_boundary
Instead of calling get_repair_rows_size() which might stall with large
number of rows, return the size of the rows from read_rows_from_disk.
2019-07-11 08:36:39 +08:00
Asias He
4d41f8e57e repair: Futurize row_buf_csum
To prevent stalls when the number of rows inside the row buf is large.
2019-07-11 08:36:39 +08:00
Asias He
0ef167c9c8 repair: Yield inside get_set_diff
get_set_diff always runs inside a thread, so we can
thread::maybe_yield() to avoid stall.
2019-07-11 08:36:39 +08:00
Asias He
f871d9edd4 repair: Use get_repair_rows_size helper in get_sync_boundary
We have a helper get_repair_rows_size to get the row size in the list.
2019-07-11 08:36:39 +08:00
Asias He
ccbc9fb0ca repair: Avoid stall in do_estimate_partitions_on_local_shard
Do not use boost::accumulate which does not yield. Use do_for_each for
each sstable to avoid stall.
2019-07-11 08:36:39 +08:00
Asias He
b7b5cb33e8 remove get_row_diff 2019-07-11 08:36:39 +08:00
Rafael Ávila de Espíndola
281f3a69f8 mc writer: Fix exception safety when closing _index_writer
This fixes a possible cause of #4614.

From the backtrace in that issue, it looks like a file is being closed
twice. The first point in the backtrace where that seems likely is in
the MC writer.

My first idea was to add a writer::close and make it the responsibility
of the code using the writer to call it. That way we would move work
out of the destructor.

That is a bit hard since the writer is destroyed from
flat_mutation_reader::impl::~consumer_adapter and that would need to
get a close function too.

This patch instead just fixes an exception safety issue. If
_index_writer->close() throws, _index_writer is still valid and
~writer will try to close it again.

If the exception was thrown after _completed.set_value(), that would
explain the assert about _completed.set_value() being called twice.

With this patch the path outside of the destructor now moves the
writer to a local variable before trying to close it.

Fixes #4614
Message-Id: <20190710171747.27337-1-espindola@scylladb.com>
2019-07-10 19:27:19 +02:00
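The exception-safety fix described in 281f3a69f8 above can be sketched with a toy model. This is not the MC writer code; `toy_index_writer`, `toy_writer` and `close_calls_when_finish_throws` are hypothetical names. The sketch only assumes what the message states: the non-destructor path moves the writer into a local variable before closing it, so the member is already empty if `close()` throws and the destructor does not close it a second time.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Toy resource whose close() may throw and must not be called twice.
struct toy_index_writer {
    int* close_calls;   // counts calls so the behaviour can be checked
    bool fail;
    void close() {
        ++*close_calls;
        if (fail) {
            throw std::runtime_error("close failed");
        }
    }
};

class toy_writer {
    std::unique_ptr<toy_index_writer> _index_writer;
public:
    explicit toy_writer(std::unique_ptr<toy_index_writer> w)
        : _index_writer(std::move(w)) {}

    // Normal path: detach the member first, so it is already null
    // if close() throws.
    void finish() {
        auto w = std::move(_index_writer);
        if (w) {
            w->close();
        }
    }

    // The destructor closes only what finish() did not already take over.
    ~toy_writer() {
        if (_index_writer) {
            try {
                _index_writer->close();
            } catch (...) {
                // a destructor must not let exceptions escape
            }
        }
    }
};

// Returns how many times close() ran when finish() throws.
int close_calls_when_finish_throws() {
    int calls = 0;
    try {
        toy_writer w(std::make_unique<toy_index_writer>(toy_index_writer{&calls, true}));
        w.finish();
    } catch (const std::runtime_error&) {
        // expected: close() failed inside finish()
    }
    return calls;
}
```

If `finish()` instead called `_index_writer->close()` directly, the member would still be set after the throw and the destructor would call `close()` a second time.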
Paweł Dziepak
eb7d17e5c5 lsa: make sure align_up_for_asan() doesn't cause reads past end of segment
In debug mode the LSA needs objects to be 8-byte aligned in order to
maximise coverage from the AddressSanitizer.

Usually `close_active()` creates a dummy object that covers the end of
the segment being closed. However, if the last real object ends in the
last eight bytes of the segment then that dummy won't be created because
of the alignment requirements. This broke exit conditions on loops
trying to read all objects in the segment and caused them to attempt to
dereference an address at the end of the segment. This patch fixes that.

Fixes #4653.
2019-07-10 19:19:24 +02:00
Avi Kivity
e32bdb6b90 Merge "Warn user about using SimpleStrategy with Multi DC deployment" from Kamil
"
If the user creates a keyspace with the 'SimpleStrategy' replication class
in a multi-datacenter environment, they will receive a warning in the CQL shell
and in the server logs.

Resolves #4481 and #4651.
"

* 'multidc' of https://github.com/kbr-/scylla:
  Warn user about using SimpleStrategy with Multi DC deployment
  Add warning support to the CQL binary protocol implementation
2019-07-10 16:47:07 +03:00
Avi Kivity
138b28ae43 Merge "Fix command line parsing and add logging." from Kamil
"
Fixes #4203 and #4141.
"

* 'cmdline' of https://github.com/kbr-/scylla:
  Add logging of parsed command line options
  Fix command line argument parsing in main.
2019-07-10 16:40:57 +03:00
Avi Kivity
405fd517b0 Merge "IPv6 support" from Calle
"
Fixes #2027

Modifies inet address type in scylla to use seastar::net::inet_address,
and removes explicit use of ipv4_addr in various network code in favour
of socket_address. Thus capable of resolving and binding to ipv6.

Adds config option to enable/disable ipv6 (default enabled), so
upgrading cluster can continue to work while running mixed version
nodes (since gossip message address serialization becomes different).
"

* 'calle/ipv6' of https://github.com/elcallio/scylla:
  test-serialization: Add small roundtrip test for inet address (v4 + v6)
  inet_address/init: Make ipv6 default enabled
  db::config: Add enable ipv6 switch (default off)
  gms::inet_address: Make serialization ipv6 aware
  Remove usage of inet_address::raw_addr()
  Replace use of "ipv4_addr" with socket_address
  inet_address: Add optional family to lookup
  gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address
  types: Add ipv6_address support
2019-07-10 15:07:56 +03:00
Benny Halevy
b4dc118639 tests: logalloc_test: scale down test_region_groups
Post commit b3adabda2d
(Reduce logalloc differences between debug and release)
logalloc_test's memory footprint has grown, in particular
in test_region_groups, and it triggers the oom killer on
our test automation machines.

This patch scales down this test case so it requires less memory.

Fixes #4669

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-10 12:06:10 +02:00
Pekka Enberg
bb53c109b4 test.py: Add option for repeating test execution
This adds a '--repeat N' command line option to test.py, which can be
used to execute the tests N times. This is useful for finding flakey
tests, for example.

Message-Id: <20190710092115.15960-1-penberg@scylladb.com>
2019-07-10 12:42:39 +03:00
Botond Dénes
ce647fac9f timestamp_based_splitting_writer: fix the handling of partition tombstone
Currently the handling of partition tombstones is broken in multiple
ways:
* The partition-tombstone is lost when the bucket is calculated for its
timestamp (due to a misplaced `std::exchange()`).
* When the `partition_start` fragment (containing the partition
tombstone) is actually written to the bucket we emit another
`partition_start` fragment before it because the bucket has not seen
that partition before and we fail to notice that we are actually writing
the partition header.

This bug was allowed to fly under the radar because the unit test was
accidentally not creating partition tombstones in the generated data
(due to a mistake). It was discovered while working on unit tests for
another test and fixing the data generation function to actually
generate partition tombstones.

This patch fixes both problems in the handling of partition tombstones
but it doesn't yet fix the test. That is deferred until the patch
series which uncovered this bug is merged to avoid merge conflicts.
The other series mentioned here is: [PATCH v6 00/15] compaction: allow
collecting purged data

Fixes: #4683

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190710092427.122623-1-bdenes@scylladb.com>
2019-07-10 12:36:57 +03:00
Pekka Enberg
e6cc90aa98 test: add 'eventually' block to index paging test (#4681)
Without 'eventually', the test is flaky because the index may not yet
be up to date while checking its conditions.

Fixes #4670

Tests: unit(dev)
2019-07-10 11:46:03 +03:00
Kamil Braun
d6736a304a Add metric for failed memtable flushes
Resolves #3316.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-10 11:30:10 +03:00
Amnon Heiman
2fbc5ea852 config_file.hh: get_value return a pointer to the value
The get_value method returns a pointer to the value that is used by the
value_to_json method.

The assumption is that the void pointer points to the actual value.

Fixes #4678

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-07-10 10:40:35 +03:00
Piotr Sarna
ebbe038d19 test: add 'eventually' block to index paging test
Without 'eventually', the test is flaky because the index may not yet
be up to date while checking its conditions.

Fixes #4670
2019-07-09 17:07:16 +02:00
Asias He
39ca044dab repair: Allow repair when a replica is down
Since commit bb56653 (repair: Sync schema from follower nodes before
repair), the behaviour of handling a down node during repair has
changed.  That is, if a repair follower is down, syncing schema with it
will fail and the repair of the range will be skipped. This means
a range can not be repaired unless all the nodes for the replicas are up.

To fix, we filter out the nodes that are down, mark the repair as
partial, and repair with the nodes that are still up.

Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test
Fixes: #4616
Backports: 3.1

Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>
2019-07-09 10:07:36 +03:00
Konstantin Osipov
56f3bda4c7 metrics: introduce a metric for non-local reads
A read which arrived at a non-replica and had to be forwarded to a
replica by the coordinator is accounted in its own metric,
reads_coordinator_outside_replica_set.
Most often such a read is produced by a driver which is unaware of
token distribution on the ring.

If a read was forwarded to another replica due to heat weighted
load balancing or query preference set by the user, it's not accounted
in the metric.

In case of a multi-partition read (a query using IN statement,
e.g. x in (1, 2, 3)), if any of the keys is read from a
non-local node the read is accounted as non-local.
The rationale behind it is that if the user tries to be careful and send
IN queries only to the same vnode, they are rewarded with the counter
staying at zero, while if they send multi-partition IN queries without
any precautions, they will see the metric go up which gives them a
starting point for investigating performance problems.

Closes #4338
2019-07-08 19:23:38 +03:00
Calle Wilund
5dfc356380 test-serialization: Add small roundtrip test for inet address (v4 + v6)
Verify we get back what we put in.
2019-07-08 15:28:21 +00:00
Konstantin Osipov
da1d1b74da metrics: account writes forwarded by a coordinator in an own metric.
Add a metric to account writes which arrived at a non-replica and
had to be forwarded by a coordinator to a replica.

The name of the added metric is 'writes_coordinator_outside_replica_set'.

Do not account forwarded read repair writes, since they are already
accounted by a reads_coordinator_outside_replica_set metric, added in a
subsequent patch.

In scope of #4338.
2019-07-08 18:17:48 +03:00
Calle Wilund
3cfb79e0ff inet_address/init: Make ipv6 default enabled
Makes lookup find any (incl. ipv6 numeric) address.
Init will look at enable_ipv6 and use an explicit ipv4 family lookup if it
is not enabled.
2019-07-08 14:13:10 +00:00
Calle Wilund
1f5e1d22bf db::config: Add enable ipv6 switch (default off)
Off by default to prevent problems during cluster migration when
needing to gossip with non-ipv6 aware nodes.
2019-07-08 14:13:09 +00:00
Calle Wilund
c540e36fe2 gms::inet_address: Make serialization ipv6 aware
Because inet_address was initially hardcoded to
ipv4, its wire format is not very forward compatible.
Since we potentially need to communicate with older version nodes, we
manually define the new serial format for inet_address to be:

ipv4: 4  bytes address
ipv6: 4  bytes marker 0xffffffff (invalid address)
      16 bytes data -> address
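
A minimal sketch of the layout above, for illustration only. It works on
raw address bytes and makes no claim about the byte order or the helper
names used by the real serializer; serialize_address and
deserialize_address are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Marker that precedes an IPv6 address. Read as an IPv4 address it is
// 255.255.255.255, which the message above treats as invalid.
constexpr std::array<uint8_t, 4> ipv6_marker{0xff, 0xff, 0xff, 0xff};

// Encode raw address bytes (4 for IPv4, 16 for IPv6).
inline std::vector<uint8_t> serialize_address(const std::vector<uint8_t>& addr) {
    assert(addr.size() == 4 || addr.size() == 16);
    std::vector<uint8_t> out;
    if (addr.size() == 16) {
        out.insert(out.end(), ipv6_marker.begin(), ipv6_marker.end());
    }
    out.insert(out.end(), addr.begin(), addr.end());
    return out;
}

// Decode: the first 4 bytes are either an IPv4 address or the marker,
// in which case 16 bytes of IPv6 address follow.
inline std::vector<uint8_t> deserialize_address(const std::vector<uint8_t>& in) {
    assert(in.size() >= 4);
    const bool is_v6 = std::equal(ipv6_marker.begin(), ipv6_marker.end(), in.begin());
    if (!is_v6) {
        return std::vector<uint8_t>(in.begin(), in.begin() + 4);
    }
    assert(in.size() >= 20);
    return std::vector<uint8_t>(in.begin() + 4, in.begin() + 20);
}
```

An IPv4 address keeps its original 4-byte encoding, so the IPv4 case is
unchanged on the wire.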
2019-07-08 14:13:09 +00:00
Calle Wilund
e9816efe06 Remove usage of inet_address::raw_addr() 2019-07-08 14:13:09 +00:00
Calle Wilund
4ef940169f Replace use of "ipv4_addr" with socket_address
Allows the various sockets to use ipv6 address binding if so configured.
2019-07-08 14:13:09 +00:00
Calle Wilund
5ba545f493 inet_address: Add optional family to lookup 2019-07-08 14:13:09 +00:00
Calle Wilund
5fd811ec8a gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address
Thusly handle all types net::inet_address can handle. I.e. ipv6.
2019-07-08 14:13:09 +00:00
Calle Wilund
482fd72ca2 types: Add ipv6_address support
As ipv4, just redirect to inet_address.
2019-07-08 14:09:25 +00:00
Asias He
b7abaa04da repair: Futurize get_row_diff to avoid stall
The copy of _working_row_buf and boost::copy_range can stall if the
number of rows is big. Futurize get_row_diff to avoid the stall.
2019-07-08 15:22:16 +08:00
Asias He
a4b24e44a3 repair: Fix possible stall in request_row_hashes
The std::find_if and std::copy can stall if the number of rows is big.
Introduce a helper, move_row_buf_to_working_row_buf, that moves the rows
and yields to avoid the stall.
2019-07-08 15:22:16 +08:00
Asias He
b48dc42e73 repair: Allow default construct for repair_row
All members of repair_row are now optional. Enable the default
constructor so that _row_buf.resize() can work.
2019-07-08 15:22:16 +08:00
Asias He
18fb0714a0 repair: Remove apply_rows
It is not used any more. The user now calls
apply_rows_on_master_in_thread and apply_rows_on_follower instead.
2019-07-08 15:22:16 +08:00
Asias He
882530ce26 repair: Run get_row_diff_with_rpc_stream in a thread
So that we can make get_row_diff_source_op run inside a thread, in turn
it can now call apply_rows_on_master_in_thread which eliminates stall.
2019-07-08 15:22:16 +08:00
Asias He
948b833d74 repair: Run get_row_diff_and_update_peer_row_hash_sets inside a thread
So it can use apply_rows_on_master_in_thread which eliminates stall.
2019-07-08 15:22:16 +08:00
Asias He
7f29d13984 repair: Run get_row_diff inside a thread
So it can use apply_rows_on_master_in_thread which eliminates stall.
2019-07-08 15:22:16 +08:00
Asias He
6b2e3946fb repair: Add apply_rows_on_master_in_thread
Like apply_rows, except it runs inside a thread and runs on master node
only.
2019-07-08 15:22:16 +08:00
Asias He
7c6a29027f repair: Add apply_rows_on_follower
Add a version for apply_rows on follower node only.
2019-07-08 15:22:16 +08:00
Asias He
cc14c6e0c4 repair: Futurize working_row_hashes
To avoid stall when the number of rows is big.
2019-07-08 15:22:16 +08:00
Asias He
f3d2ba6ec7 repair: Remove get_full_row_hashes helper
It is a single wrapper for working_row_hashes and is used only once. Remove it.
2019-07-08 15:22:16 +08:00
Benny Halevy
a0499bbd31 lister::guarantee_type: do not follow symlink
Similar to commit 9785754e0d,
lister::guarantee_type needs to check the entry's type,
not the symlink it may point to.

Fixes #4606

The nodetool_refresh_with_wrong_upload_modes_test dtest creates a broken
symlink and following it fails, as it should, with the default follow_symlink::yes

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190626110734.4558-1-bhalevy@scylladb.com>
2019-07-07 15:29:28 +03:00
Avi Kivity
63edd46562 Merge "Expand big decimal with arithmetic operators" from Piotr
"
This miniseries expands big_decimal interface with convenience operators
(-=, +, -), provides test cases for it and makes one of the constructors
explicit.

Tests: unit(dev)
"

* 'expand_big_decimal_interface' of https://github.com/psarna/scylla:
  utils: make string-based big decimal constructor explicit
  tests: add more operators to big decimal tests
  utils: add operators to big_decimal
2019-07-06 12:26:08 +03:00
Avi Kivity
24caf0824d Merge "Complete the LIKE operator" from Dejan
"
Implement LIKE parsing, intermediate representation, and query processing. Add tests
for this implementation (leaving the LIKE functionality tests in
tests/like_matcher_test.cc).

Refs #4477.
"

* 'finish-like' of https://github.com/dekimir/scylla:
  cql3: Add LIKE operator to CQL grammar
  cql3: Ensure LIKE filtering for partition columns
  cql3: Add LIKE restriction
  cql3: Add LIKE relation
2019-07-06 12:26:08 +03:00
kbr-
8995945052 Implement tuple_type_impl::to_string_impl. (#4645)
Resolves #4633.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-06 12:26:08 +03:00
Avi Kivity
187859ad78 review-checklist: mention that the guidelines are not absolute rules and can be overridden 2019-07-06 12:26:08 +03:00
Kamil Braun
c0915c40eb Warn user about using SimpleStrategy with Multi DC deployment
If the user creates a keyspace with the 'SimpleStrategy' replication class
in a multi-datacenter environment, they will receive a warning in the CQL shell
and in the server logs.
Resolves #4481.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-05 09:25:03 +02:00
Kamil Braun
35dbe9371c Add warning support to the CQL binary protocol implementation
The CQL binary protocol v4 adds support for server-side warnings:
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec
This adds a convenient API to add warnings to messages returned to the user.
Resolves #4651.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-05 09:24:56 +02:00
Kamil Braun
2f0f53ac72 Add logging of parsed command line options
The recognized command line options are now being printed when Scylla is run,
together with the whole command used.
Fixes #4203.
2019-07-05 09:00:28 +02:00
Piotr Sarna
eed2543bcc utils: make string-based big decimal constructor explicit
As a rule of thumb, single-parameter constructors should be explicit
in order to avoid unexpected implicit conversions.
2019-07-04 11:33:00 +02:00
Piotr Sarna
7e722f8dd5 tests: add more operators to big decimal tests 2019-07-04 11:32:57 +02:00
Piotr Sarna
a5e41408ec utils: add operators to big_decimal
For convenience, operators -=, + and - are implemented on top of +=.
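
The "build the other operators on top of +=" pattern, sketched on a toy
type. This is not the big_decimal code; toy_decimal and its same-scale
restriction are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstdint>

// Toy fixed-point value: value == unscaled * 10^-scale.
struct toy_decimal {
    int32_t scale;
    int64_t unscaled;

    // The only operator with real arithmetic in it.
    toy_decimal& operator+=(const toy_decimal& o) {
        assert(scale == o.scale); // toy restriction: no rescaling
        unscaled += o.unscaled;
        return *this;
    }
    // a -= b is a += (-b).
    toy_decimal& operator-=(const toy_decimal& o) {
        return *this += toy_decimal{o.scale, -o.unscaled};
    }
    // Binary operators copy the left operand and reuse the compound forms.
    toy_decimal operator+(const toy_decimal& o) const {
        toy_decimal ret = *this;
        ret += o;
        return ret;
    }
    toy_decimal operator-(const toy_decimal& o) const {
        toy_decimal ret = *this;
        ret -= o;
        return ret;
    }
};
```

Keeping the arithmetic in one place means a fix to += is picked up by
the three derived operators automatically.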
2019-07-04 11:32:53 +02:00
Dejan Mircevski
6727e8f073 cql3: Add LIKE operator to CQL grammar
Extend the grammar with LIKE and add CQL query tests for it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 11:01:13 +02:00
Dejan Mircevski
1c583de8bb cql3: Ensure LIKE filtering for partition columns
Partition columns are implicitly filtered whenever possible, avoiding
expensive post-processing.  But there are exceptions, eg, when
partition key is only partially restricted, or for CONTAINS
expressions.  Here we add LIKE to this list of exceptions.

Also fix compute_bounds() to punt on LIKE restrictions, which cannot
be translated into meaningful bounds.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:59:13 +02:00
Dejan Mircevski
63cec653e5 cql3: Add LIKE restriction
This restriction leverages like_matcher to perform filtering.

Make single_column_relation::new_LIKE_restriction() return this new
restriction.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:58:56 +02:00
Dejan Mircevski
21d7722594 cql3: Add LIKE relation
Add a new type of relation with operator LIKE.  Handle it in
relation::to_restriction by introducing a new virtual method for it.
The temporary implementation of this method returns null; that will be
replaced in a subsequent patch.

Add abstract_type::is_string() to recognize string columns and
disallow LIKE operator on non-string columns.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:54:30 +02:00
Kamil Braun
f155a2d334 Fix command line argument parsing in main.
Command line arguments are parsed twice in Scylla: once in main and once in Seastar's app_template::run.
The first parse is there to check if the "--version" flag is present --- in this case the version is printed
and the program exits.  The second parsing is correct; however, most of the arguments were improperly treated
as positional arguments during the first parsing (e.g., "--network host" would treat "host" as a positional argument).
This happened because the arguments weren't known to the command line parser.
This commit fixes the issue by moving the parsing code to after the point where the arguments are registered.
Resolves #4141.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-03 14:11:34 +02:00
Avi Kivity
8a0c4d508a Merge "Repair switch to rpc stream" from Asias
"
The put_row_diff, get_row_diff and get_full_row_hashes verbs are switched
to use rpc stream instead of rpc verb. They are the verbs that could
send big rpc messages. The rpc stream sink and source are created per
repair follower for each of the above 3 verbs. The sink and source are
shared by multiple requests during the entire repair operation for a
given range, so the rpc stream does not have to be set up for each request.

The row buffer is now increased from 256KiB to 32MiB, giving better
bandwidth on high latency links. The downside of a bigger row buffer is
a reduced chance that all the rows inside a row buffer are identical.
This causes more full hashes to be exchanged. To address this issue, the
plan is to add a better set reconciliation algorithm in addition to the
current one, which sends full hashes.

I compared rebuild using a regular stream plan with repair using rpc
stream. With 2 nodes, 1 smp, 8M rows, delete all data on one of the
nodes before repair or rebuild.

1) repair using seastar rpc verb

Time to complete: 82.17s

2) rebuild using regular streaming which uses seastar rpc stream

Time to complete: 63.87s

3) repair using seastar rpc stream

Time to complete: 68.48s

For 1) and 3), the improvement is 16.6% (repair using rpc verb vs. repair using rpc stream)

For 2) and 3), the difference is 7.2% (repair vs. stream)

The result is promising for the future repair-based bootstrap/replace node operations.

NOTE: We do not actually enable rpc stream in row level repair for now. We
will enable it after we fix the stall issues caused by handling
bigger row buffers.

Fixes #4581
"

* 'repair_switch_to_rpc_stream_v9' of https://github.com/asias/scylla: (45 commits)
  docs: Add RPC stream doc for row level repair
  repair: Mark some of the helper functions static
  repair: Increase max row buf size
  repair: Hook rpc stream version of verbs in row level repair
  repair: Add use_rpc_stream to repair_meta
  repair: Add is_rpc_stream_supported
  repair: Add needs_all_rows flag to put_row_diff
  repair: Optimize get_row_diff
  repair: Register repair_get_full_row_hashes_with_rpc_strea
  repair: Register repair_put_row_diff_with_rpc_stream
  repair: Register repair_get_row_diff_with_rpc_stream
  repair: Add repair_get_full_row_hashes_with_rpc_stream_handler
  repair: Add repair_put_row_diff_with_rpc_stream_handler
  repair: Add repair_get_row_diff_with_rpc_stream_handler
  repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op
  repair: Add repair_put_row_diff_with_rpc_stream_process_op
  repair: Add repair_get_row_diff_with_rpc_stream_process_op
  repair: Add put_row_diff_with_rpc_stream
  repair: Add put_row_diff_sink_op
  repair: Add put_row_diff_source_op
  ...
2019-07-03 10:08:55 +03:00
Asias He
f686f0b9d6 docs: Add RPC stream doc for row level repair
This documents RPC stream usage in row level repair.
2019-07-03 08:09:57 +08:00
Asias He
78ae5af203 repair: Mark some of the helper functions static
They are used only inside repair/row_level.cc. Make them static.
2019-07-03 08:09:57 +08:00
Asias He
e8c13444ba repair: Increase max row buf size
If the cluster supports row level repair with rpc stream interface, we
can use bigger row buf size to have better repair bandwidth in high
latency links.
2019-07-03 08:01:37 +08:00
Asias He
7d08a8d223 repair: Hook rpc stream version of verbs in row level repair
If rpc stream is supported, use the rpc stream version of the
get_row_diff, put_row_diff, get_full_row_hashes.
2019-07-03 08:01:37 +08:00
Asias He
fccaa0324f repair: Add use_rpc_stream to repair_meta
Determine if rpc stream should be used.
2019-07-03 08:01:37 +08:00
Asias He
7bf0c646be repair: Add is_rpc_stream_supported
Given a row_level_diff_detect_algorithm, return whether this algorithm
supports the rpc stream interface.
2019-07-03 08:01:04 +08:00
Asias He
1c92643f02 repair: Add needs_all_rows flag to put_row_diff
So we can avoid copying _working_row_buf in get_row_diff on the master node
if there is only one follower node and all repair rows are needed by
the follower node.
2019-07-03 07:56:22 +08:00
Asias He
6595417567 repair: Optimize get_row_diff
Move _working_row_buf instead of copying it if this is a follower node, or
a master node with only one follower. In these cases,
_working_row_buf will not be used after this function, so we can move
it.
2019-07-03 07:56:22 +08:00
Asias He
c4eb0ee361 repair: Register repair_get_full_row_hashes_with_rpc_strea
Register the get_full_row_hashes rpc stream verb.
2019-07-03 07:56:22 +08:00
Asias He
b56cced5b8 repair: Register repair_put_row_diff_with_rpc_stream
Register the put_row_diff rpc stream verb.
2019-07-03 07:56:22 +08:00
Asias He
67130031b1 repair: Register repair_get_row_diff_with_rpc_stream
Register the get_row_diff rpc stream verb.
2019-07-03 07:56:22 +08:00
Asias He
f255f902bd repair: Add repair_get_full_row_hashes_with_rpc_stream_handler
It is the handler for the get_full_row_hashes rpc stream verb on the
receiving side.
2019-07-03 07:56:17 +08:00
Asias He
e3267ad98c repair: Add repair_put_row_diff_with_rpc_stream_handler
It is the handler for the put_row_diff rpc stream verb on the
receiving side.
2019-07-03 07:55:24 +08:00
Asias He
06ac014261 repair: Add repair_get_row_diff_with_rpc_stream_handler
It is the handler for the get_row_diff rpc stream verb on the receiving
side.
2019-07-03 07:54:43 +08:00
Asias He
5f25969da3 repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op
It is the helper for the get_full_row_hashes rpc stream verb handler.
2019-07-03 07:54:03 +08:00
Asias He
39d5a9446e repair: Add repair_put_row_diff_with_rpc_stream_process_op
It is the helper for the put_row_diff rpc stream verb handler.
2019-07-03 07:53:21 +08:00
Asias He
049e793fe5 repair: Add repair_get_row_diff_with_rpc_stream_process_op
It is the helper for the get_row_diff rpc stream verb handler.
2019-07-03 07:52:12 +08:00
Avi Kivity
fca1ae69ff database: convert _cfg from a pointer to a reference
_cfg cannot be null, so it can be converted to a reference to
indicate this. Follow-up to fe59997efe.
2019-07-02 17:57:50 +02:00
Calle Wilund
f317d7a975 commitlog: Simplify commitlog extension iteration
Fixes #4640

Iterating extensions in commitlog.cc should mimic that in sstables.cc,
i.e. a simple future-chain. Should also use same order for read and
write open, as we should preserve transformation stack order.

Message-Id: <20190702150028.18042-1-calle@scylladb.com>
2019-07-02 18:37:44 +03:00
Takuya ASADA
332a6931c4 dist/redhat: fix install path of scripts
After recent changes, install.sh mistakenly copies dist/common/scripts to
/opt/scylladb/scripts/scripts; it should be /opt/scylladb/scripts.
The same applies to /opt/scylladb/scyllatop.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190702120030.13729-1-syuu@scylladb.com>
2019-07-02 17:29:33 +03:00
Asias He
b1188f299e repair: Add put_row_diff_with_rpc_stream
It is the rpc stream version of put_row_diff. It uses rpc stream instead of
rpc verb to put the repair rows to follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
31b30486a7 repair: Add put_row_diff_sink_op
It is a helper that works on the sink() of the put_row_diff
rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
dbe035649b repair: Add put_row_diff_source_op
It is a helper that works on the source() of the put_row_diff rpc
stream verb.
2019-07-02 21:22:41 +08:00
Asias He
72d3563da1 repair: Add get_row_diff_with_rpc_stream
It is the rpc stream version of get_row_diff. It uses rpc stream instead of
rpc verb to get the repair rows from follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
4cb44baa08 repair: Add get_row_diff_sink_op
It is a helper that works on the sink() of the get_row_diff rpc
stream verb.
2019-07-02 21:22:41 +08:00
Asias He
a1e19514f9 repair: Add get_row_diff_source_op
It is a helper that works on the source() of the
get_row_diff rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
473bd7599c repair: Add get_full_row_hashes_with_rpc_stream
It is the rpc stream version of get_full_row_hashes. It uses rpc stream
instead of rpc verb to get the repair hashes data from follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
1e2a598fe7 repair: Add get_full_row_hashes_sink_op
It is a helper that works on the sink() of the get_full_row_hashes
rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
149c54b000 repair: Add get_full_row_hashes_source_op
It is a helper that works on the source() of the
get_full_row_hashes rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
b3e7299032 repair: Add sink and source object into repair_meta
They will soon be used to sync repair hashes and repair rows between
master and follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
acd40fd529 repair: Add sink_source_for_put_row_diff
Use sink_source_for_repair to define sink_source_for_put_row_diff
with sink = repair_row_on_wire_with_cmd, source = repair_stream_cmd
for REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
4405f7a6ff repair: Add sink_source_for_get_row_diff
Use sink_source_for_repair to define sink_source_for_get_row_diff with
sink = repair_hash_with_cmd, source = repair_row_on_wire_with_cmd for
REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
0bffd07e7e repair: Add sink_source_for_get_full_row_hashes
Use the sink_source_for_repair to define
sink_source_for_get_full_row_hashes with sink = repair_stream_cmd,
source = repair_hash_with_cmd for
REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
8400dafa12 repair: Add sink_source_for_repair helper class
It is used to store the sink and source objects for the rpc stream verbs
used by row level repair.
2019-07-02 21:22:41 +08:00
Asias He
37b3de4ea0 messaging_service: Add REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM support
It is used by row level repair.
2019-07-02 21:18:55 +08:00
Asias He
a7c7ba9765 messaging_service: Add REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM support
It is used by row level repair.
2019-07-02 21:18:55 +08:00
Asias He
dc92bda93b messaging_service: Add REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM support 2019-07-02 21:18:55 +08:00
Asias He
f312c95b74 messaging_service: Add do_make_sink_source helper
It is used by the row level repair rpc stream verbs to make sink and
source object.
2019-07-02 21:18:55 +08:00
Asias He
bc295a00a6 messaging_service: Add rpc stream verb for row level repair
- REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM

Get repair rows from follower nodes

- REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM

Put repair rows to follower nodes

- REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM:

Get full hashes from follower nodes
2019-07-02 21:18:55 +08:00
Asias He
c93113f3a5 idl: Add repair_row_on_wire_with_cmd 2019-07-02 21:18:54 +08:00
Asias He
a90fb24efc idl: Add repair_hash_with_cmd 2019-07-02 21:18:37 +08:00
Asias He
599d40fbe9 idl: Add repair_stream_cmd 2019-07-02 21:18:15 +08:00
Asias He
672c24f6b0 idl: Add send_full_set_rpc_stream for row_level_diff_detect_algorithm 2019-07-02 21:17:36 +08:00
Avi Kivity
c987397e52 transport: reject initial frames with wild body sizes (#4620)
If someone opens a connection to port 9042 and sends some random bytes,
there is a 1 in 64 probability we'll recognize it as a valid frame
(since we only check the version byte, allowing versions 1-4) and we'll
try to read frame.length bytes for the body. If this value is very large,
we'll run out of memory very quickly.

Fix this by checking for reasonable body size (100kB). The initial message
must be a STARTUP, whose body is a [string map] of options, of which just
three are recognized. 100kB is plenty for future expansion.

Note that this does not replace true security on listening ports and
only serves to protect against mistakes, not attacks. An attacker can
easily exhaust server memory by opening many connections and trickle-feeding
them small amounts of data so they appear alive.

We can't use the config item native_transport_max_frame_size_in_mb,
because that can be legitimately large (and the default is atrocious,
256MB).

Fixes #4366.
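
The arithmetic and the check described above can be sketched as follows.
This is an illustration, not the patch itself: the function names are
hypothetical, and 100 * 1024 is only one reading of the "100kB" limit.

```cpp
#include <cassert>
#include <cstdint>

// Assumed reading of "100kB"; the constant in the real patch may differ.
constexpr uint32_t max_initial_body_size = 100 * 1024;

// Only the version byte is checked, and versions 1-4 are allowed.
inline bool version_accepted(uint8_t version) {
    return version >= 1 && version <= 4;
}

// Extra sanity check for the first frame on a connection: the declared
// body length must be small, whatever the configured maximum frame size is.
inline bool accept_initial_frame(uint8_t version, uint32_t body_length) {
    return version_accepted(version) && body_length <= max_initial_body_size;
}

// Count how many of the 256 possible first bytes pass the version check.
inline int accepted_version_bytes() {
    int n = 0;
    for (int b = 0; b < 256; ++b) {
        if (version_accepted(static_cast<uint8_t>(b))) {
            ++n;
        }
    }
    return n;
}

inline void initial_frame_examples() {
    // 4 of 256 byte values pass, hence the 1 in 64 probability.
    assert(256 / accepted_version_bytes() == 64);
    // A random length field of 256MiB is now rejected up front.
    assert(!accept_initial_frame(4, 256u * 1024 * 1024));
}
```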
2019-07-01 19:02:34 +02:00
Tomasz Grabiec
eb496b5eae Merge "Allow changing configuration at runtime" from Avi
This patchset allows changing the configuration at runtime. The user
triggers this by editing the configuration file normally, then
signalling the database with SIGHUP (as is traditional).

The implementation is somewhat complicated due to the need to store
non-atomic mutable state per-shard and to synchronize the values in
all shards. This is somewhat similar to Seastar's sharded<>, but that
cannot be used since the configuration is read before Seastar is
initialized (due to the need to read command-line options).

Tests: unit (dev, debug), manual test with extra prints (dev)

Ref #2689
Fixes #2517.
2019-07-01 15:04:59 +02:00
Avi Kivity
28a514820d Update seastar submodule
* seastar a5b9f77d52...44a300cd50 (1):
  > build: fix dpdk library link order

Should fix the build with dpdk enabled.
2019-07-01 11:56:59 +03:00
Takuya ASADA
02c6db29c8 dist/debian: manage *.pyc as a part of package
Since 828b63f4fb only adds *.pyc to the .rpm
package, we also need to add it to the .deb package.

See #4612

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190629023739.8472-1-syuu@scylladb.com>
2019-06-30 15:54:42 +03:00
Avi Kivity
af2a3859f6 Update seastar submodule
* seastar b629d5ef7a...a5b9f77d52 (6):
  > perftune.py: add comment explaining why we don't log errors when binding NVMe IRQs for all but i3.nonmetal machines
  > sharded: do a two phase shutdown for sharded services
  > chunked_fifo: add iterator
  > perftune.py: fix the i3 metal detection pattern
  > core/memory: remove translation api
  > reactor: file_type: offer option to not follow symbolic links
2019-06-30 11:32:21 +03:00
Avi Kivity
2abe015150 database: allow live update of the compaction_enforce_min_threshold config item
Change the type from bool to updateable_value<bool> throughout the dependency
chain and mark it as live updateable.

In theory we should also observe the value and trigger compaction if it changes,
but I don't think it is worthwhile.
2019-06-28 16:43:25 +03:00
Avi Kivity
c98d1ea942 tests: cql_test_env: prepare config for updateable values
Once we start using updateable_value<>, we must make it refer
to the updateable_value_source<> on the same shard, and to do
that we need to call broadcast_to_all_shards() first (this
creates the per-shard copy).
2019-06-28 16:43:25 +03:00
Avi Kivity
8cffec37aa main: re-read configuration file on SIGHUP
Trap SIGHUP and signal a loop to re-read the configuration file.
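
The general "trap the signal, let a loop do the work" shape, sketched with
plain std::signal. This is an assumption-laden illustration: the actual
patch runs on Seastar and is not reproduced here, and all names below are
hypothetical.

```cpp
#include <cassert>
#include <csignal>

// Set by the handler, cleared by the loop. Only a flag is touched inside
// the signal handler; the file is re-read outside of it.
static volatile std::sig_atomic_t reload_requested = 0;

static void on_sighup(int) {
    reload_requested = 1;
}

static void install_reload_handler() {
    auto previous = std::signal(SIGHUP, on_sighup);
    assert(previous != SIG_ERR);
    (void)previous;
}

// Called from the reload loop: returns true once per pending request, at
// which point the caller would re-read the configuration file.
static bool consume_reload_request() {
    if (!reload_requested) {
        return false;
    }
    reload_requested = 0;
    return true;
}
```

Several SIGHUPs that arrive before the loop runs collapse into a single
reload, which is harmless because the file is read in full each time.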
2019-06-28 16:43:25 +03:00
Avi Kivity
2ee07bb09b main: preserve config::client_encryption_options configuration source
With dynamically updateable configuration, tracking the source of a value
is more important, since we'll accept or reject updates depending on the source.

Fix the source of client_encryption_options, which we RMW, by preserving the original
source.
2019-06-28 16:43:25 +03:00
Avi Kivity
6061a833a3 config: make values updateable
Replace the per-shard value we store with an updateable_value_source, which
allows updating it dynamically and allows users to track changes.

The broadcast_to_all_shards() function is augmented to apply modifications
when called on a live system.
2019-06-28 16:43:25 +03:00
Avi Kivity
f7de01d082 config: store copies of config items per shard
Since some of our values are not atomic (strings) and the administrative
information needed to track references to values is also not atomic, we will
need to store them per-shard. To do that we add a vector of per-shard data
to config_file, where each element is itself a vector of configuration items.

Since we need to operate generically on items (copying them from shard to shard)
we store them in a type-erased form.

Only mutable state is stored per-shard.
2019-06-28 16:43:25 +03:00
Avi Kivity
fb23cd1ff6 Introduce updatable_value
The updateable_value and updateable_value_source classes allow broadcasting
configuration changes across the application. The updateable_value_source class
represents a value that can be updated, and updateable_value tracks its source
and reflects changes. A typical use replaces "uint64_t config_item" with
"updateable_value<uint64_t> config_item", and from now on changes to the source
will be reflected in config_item. For more complicated uses, which must run some
callback when configuration changes, you can also call
config_item.observe(callback) to be actively notified of changes.
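
A single-threaded toy version of the source/value pair, to show the shape
of the interface described above. It is not the implementation from this
commit: toy_value_source and toy_value are hypothetical, and per-shard
copies and lifetime tracking are left out entirely.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy counterpart of updateable_value_source: owns the value and the
// list of callbacks to run when it changes.
template <typename T>
class toy_value_source {
    T _value;
    std::vector<std::function<void(const T&)>> _observers;
public:
    explicit toy_value_source(T v) : _value(std::move(v)) {}
    const T& get() const { return _value; }
    void observe(std::function<void(const T&)> cb) {
        _observers.push_back(std::move(cb));
    }
    void set(T v) {
        if (v == _value) {
            return; // nothing changed, do not fire notifiers
        }
        _value = std::move(v);
        for (auto& cb : _observers) {
            cb(_value);
        }
    }
};

// Toy counterpart of updateable_value: a read-only view of a source.
// The source must outlive the view in this sketch.
template <typename T>
class toy_value {
    toy_value_source<T>* _source;
public:
    explicit toy_value(toy_value_source<T>& s) : _source(&s) {}
    const T& operator()() const { return _source->get(); }
    void observe(std::function<void(const T&)> cb) {
        _source->observe(std::move(cb));
    }
};

inline void toy_value_example() {
    toy_value_source<int> source(1);
    toy_value<int> config_item(source);
    int notifications = 0;
    config_item.observe([&notifications](const int&) { ++notifications; });
    source.set(5);
    assert(config_item() == 5);
    assert(notifications == 1);
}
```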
2019-06-28 16:43:25 +03:00
Avi Kivity
8d7c1c7231 db: seed_provider_type: add operator==()
Dynamically updateable configuration requires checking whether configuration items
changed or not, so we can skip firing notifiers for the common case where nothing
changed.

This patch adds a comparison operator for seed_provider_type, which was missing it.
2019-06-28 16:43:25 +03:00
Avi Kivity
da2a98cde6 config: don't allow assignment to config values
Currently, we allow adjusting configuration via

  cfg.whatever() = 5;

by returning a mutable reference from cfg.whatever(). Soon, however, this operation
will have side effects (updating all references to the config item, and triggering
notifiers). While this can be done with a proxy, it is too tricky.

Switch to an ordinary setter interface:

  cfg.whatever.set(5);

Because boost::program_options no longer gets a reference to the value to be written
to, we have to move the update to a notifier, and the value_ex() function has to
be adjusted to infer whether it was called with a vector type after it is
called, not before.
2019-06-28 16:43:25 +03:00
Avi Kivity
b146fd1356 config: make noncopyable
config_file and db::config are soon not going to be copyable. The reason is that
in order to support live updating, we'll need per-shard copies of each value,
and per-shard tracking of references to values. While these can be copied, it
will be an asynchronous operation and thus cannot be done from a copy constructor.

So to prepare for these changes, replace all copies of db::config by references
and delete config_file's copy constructor.

Some existing references had to be made const in order to adapt the const-ness
of db::config now being propagated (rather than being terminated by a non-const
copy).
2019-06-28 16:43:25 +03:00
Avi Kivity
fe59997efe database: don't copy config object
Copying the config object breaks the link between the original and the copied
object, so updates to config items will not be visible. To allow updates, don't
copy any more, and instead keep a pointer.

The pointer won't work well once config is updateable, since the same object is
shared across multiple shards, but that can be addressed later.
2019-06-28 15:20:39 +03:00
Avi Kivity
339699b627 database: remove default constructor
Currently, database::_cfg is a copy of the global configuration. But this means
that we have multiple master copies of the configuration, which makes updating
the configuration harder. In order to eliminate the copy we have to eliminate the
database default constructor, which creates a config object, so that all
remaining constructors can receive config by reference and retain that reference.
2019-06-28 15:20:39 +03:00
Avi Kivity
70d8127400 gossip_test: pass configuration to database object
We want to eliminate the default database constructor (to be explained in
the next patch), so eliminate its only use in gossip_test, using the regular
constructor instead.
2019-06-28 15:20:39 +03:00
Glauber Costa
d916601ea4 toppartitions: fix typo
toppartitons -> toppartitions

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190627160937.7842-1-glauber@scylladb.com>
2019-06-27 19:13:58 +03:00
Tomasz Grabiec
e071445373 Merge "More precise poisoning in logalloc" from Rafael
With this, unused descriptors and objects should always be poisoned.

 * https://github.com/espindola/scylla/ align-descriptors-so-that-they-are-poisoned-v4:
 Convert macros to inline functions
 More precise poisoning in logalloc
2019-06-27 16:30:40 +02:00
Takuya ASADA
eabb872789 dist/redhat: install /usr/sbin symlinks correctly
In the current scylla.spec, the shell glob pattern "scylla_*setup" is not
expanded correctly, so a symlink literally named "/usr/sbin/scylla_*setup"
was created by mistake. We need to expand the pattern and create a symlink
for each setup script.

Fixes #4605

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190627053530.10406-2-syuu@scylladb.com>
2019-06-27 14:22:40 +03:00
Takuya ASADA
828b63f4fb dist/redhat: manage *.pyc as a part of package
Since we don't install .pyc files in our package, python3 will generate .pyc
files when we launch a setup script for the first time.
We then have unmanaged files under the script directory, which remain when
the Scylla package is upgraded or removed.

We need to compile *.py when we generate the relocatable package, and add
the compiled .pyc files to the .rpm/.deb packages.

Fixes #4612

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190627053530.10406-1-syuu@scylladb.com>
2019-06-27 14:22:39 +03:00
Rafael Ávila de Espíndola
d8dbacc7f6 More precise poisoning in logalloc
This change aligns descriptors and values to 8 bytes so that poisoning
a descriptor or value doesn't interfere with other descriptors and
values.
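
Rounding an offset up to an 8-byte boundary can be sketched as below.
align_up is a generic helper written for this note, not the function used
in logalloc, and the link to 8-byte poisoning granularity is an assumption
about why 8 was chosen.

```cpp
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`, which must be a
// power of two.
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

With every descriptor and value starting on its own 8-byte boundary, no
two of them share an 8-byte unit, so poisoning one cannot cover bytes that
belong to its neighbour.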

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-06-26 13:13:48 -07:00
Rafael Ávila de Espíndola
6a2accb483 Convert macros to inline functions
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-06-26 13:13:48 -07:00
Avi Kivity
dd76943125 Merge "Segregate data when streaming by timestamp for time window compaction strategy" from Botond
"
When writing streamed data into sstables, while using time window
compaction strategy, we have to emit a new sstable for each time window.
Otherwise we can end up with sstables, mixing data from wildly different
windows, ruining the compaction strategy's ability to drop entire
sstables when all data within is expired. This gets worse as these mixed
sstables get compacted together with sstables that used to contain a
single time window.

This series provides a solution to this by segregating the data by its
atoms' time windows. This is done on the new RPC streaming and the
new row-level repair, memtable-flush and compaction, ensuring that the
segregation requirement is respected at all times.

Fixes: #2687
"

* 'segregate-data-into-sstables-by-time-window-streaming/v2.1' of ssh://github.com/denesb/scylla:
  streaming,repair: restore indentation
  repair: pass the data stream through the compaction strategy's interposer consumer
  streaming: pass the data stream through the compaction strategy's interposer consumer
  TWCS: implement add_interposer_consumer()
  compaction_strategy: add add_interposer_consumer()
  Add mutation_source_metadata
  tests: add unit test for timestamp_based_splitting_writer
  Add timestamp_based_splitting_writer
  Introduce mutation_writer namespace
2019-06-26 19:18:52 +03:00
Tomasz Grabiec
3e30a33e31 Merge "Introduce tests::random_schema" from Botond
Most of our tests use overly simplistic schemas (`simple_schema`) or
very specialized ones that focus on exercising a specific area of the
tested code. This is fine in most places as not all code is schema
dependent, however practice has shown that there can be nasty bugs
hiding in dark corners that only appear with a schema that has a
specific combination of types.

This series introduces `tests::random_schema`, a utility class for
generating random schemas and random data for them. An important goal is
to make using random schemas in tests as simple and convenient as
possible, therefore fostering the appearance of tests using random
schemas.

Random schema was developed to help testing code I'm currently working
on, which segregates data by time-windows. As I wasn't confident in my
ability to think of every possible combination of types that can break
my code I came up with random-schema to help me find these corner
cases. So far I consider it a success, it already found bugs in my code
that I'm not sure I would have found if I had relied on specific
schemas. It also found bugs in unrelated areas of the code which proves
my point in the first paragraph.

* https://github.com/denesb/scylla.git random_schema/v5:
  tests/data_model: approximate to the modeled data structures
  data_value: add ascii constructor
  tests/random-utils.hh: add stepped_int_distribution
  tests/random-utils.hh: get_int() add overloads that accept external
    rand engine
  tests/random-utils.hh: add get_real()
  tests: introduce random_schema
2019-06-26 18:10:20 +02:00
Botond Dénes
12b8405720 streaming,repair: restore indentation
Deferred from the previous two patches.
2019-06-26 18:45:36 +03:00
Botond Dénes
e3f4692868 repair: pass the data stream through the compaction strategy's interposer consumer 2019-06-26 18:45:36 +03:00
Botond Dénes
9c2407573c streaming: pass the data stream through the compaction strategy's interposer consumer 2019-06-26 18:45:36 +03:00
Botond Dénes
ee563928df TWCS: implement add_interposer_consumer()
Exploit the interposer customization point to inject a consumer that will
segregate the mutation stream based on the contained atoms' timestamps,
allowing the requirements of TWCS to be maintained every time sstables
are written to disk.
For the implementation, `timestamp_based_splitting_writer` is used,
with a classifier that maps timestamps to windows.
2019-06-26 18:45:36 +03:00
Tomasz Grabiec
2d3e3640df Merge "Collection: use utils::chunked_vector to store the cells" from Botond
This is a band-aid patch that is supposed to fix the immediate problem
of large collections causing large allocations. The proper fix is to
use IMR but that will take time. In the meanwhile alleviate the
pressure on the memory allocator by using a chunked storage collection
(utils::chunked_vector) instead of std::vector. In the linked issue
seastar::chunked_fifo was also proposed as the container to use,
however chunked fifo is not traversable in reverse which disqualifies
it from this role.

Refs: #3602
2019-06-26 15:32:25 +02:00
Botond Dénes
a280dcfe4c compaction_strategy: add add_interposer_consumer()
This will be the customization point for compaction strategies, used to
inject a specific interposer consumer that can manipulate the fragment
stream so that it satisfies the requirements of the compaction strategy.
For now the only candidate for injecting such an interposer is
time-window compaction strategy, which needs to write sstables that
only contain atoms belonging to the same time-window. By default no
interposer is injected.
Also add an accompanying customization point
`adjust_partition_estimate()` which returns the estimated per-sstable
partition-estimate that the interposer will produce.
2019-06-26 15:45:59 +03:00
Botond Dénes
3ce902a4be Add mutation_source_metadata
This struct contains metadata regarding a mutation_source. Currently
it contains the min and max timestamp. This will be used later by
compaction strategies to determine whether a given mutation stream has
to be split or not.
2019-06-26 15:45:59 +03:00
Botond Dénes
25d7cbedc0 tests: add unit test for timestamp_based_splitting_writer 2019-06-26 15:45:59 +03:00
Botond Dénes
df29600eec Add timestamp_based_splitting_writer
This writer implements the core logic of time-window based data
segregation. It splits the fragment stream provided by a reader, such
that each atom (cell) in the stream will be written into a consumer
based on the time-window its timestamp belongs to. The end result is
that each consumer will only see fragments whose atoms all have
timestamps belonging to the same time-window.
When a mutation fragment has atoms belonging to different time-windows,
it is split into as many fragments as needed so each has only atoms
that belong to the same time-window.
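
The splitting rule described above can be sketched roughly as follows. This
is only an illustration under simplifying assumptions: `cell`, `fragment`,
`window_of()` and `split_by_window()` are hypothetical stand-ins, not the
types or functions of the actual `timestamp_based_splitting_writer`, and
timestamps are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not Scylla's real types.
struct cell {
    std::string column;
    int64_t timestamp;
};

using fragment = std::vector<cell>;

// Maps a timestamp to the start of its time window.
// Assumes window_size > 0 and a non-negative timestamp.
inline int64_t window_of(int64_t timestamp, int64_t window_size) {
    assert(window_size > 0);
    return timestamp - (timestamp % window_size);
}

// Splits one fragment into per-window fragments, so that every output
// fragment only holds cells whose timestamps fall into the same window.
inline std::map<int64_t, fragment> split_by_window(const fragment& f, int64_t window_size) {
    std::map<int64_t, fragment> out;
    for (const cell& c : f) {
        out[window_of(c.timestamp, window_size)].push_back(c);
    }
    return out;
}
```

In this sketch each map entry plays the role of one consumer: a fragment
with cells from two windows ends up as two smaller fragments.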
2019-06-26 15:45:59 +03:00
Botond Dénes
2693f1838a Introduce mutation_writer namespace
Currently there is a single mutation_writer: `multishard_writer`,
however in the next patch we are going to add another one. This is the
right moment to move these into a common namespace (and folder), we
have way too much stuff scattered already in the top-level namespace
(and folder).
Also rename `tests/multishard_writer_test.cc` to
`tests/mutation_writer_test.cc`, this test-suite will be the home of all
the different mutation writer's unit test cases.
2019-06-26 15:45:59 +03:00
Avi Kivity
adcc95dddc Merge "sstable: mc: reader: Optimize multi-partition scans for data sets with small partitions" from Tomasz
"
Currently, the parser and the consumer save their state and return the
control to the caller, which then figures out that it needs to enter a
new partition, and that it doesn't need to skip. We do it twice, after
row end, and after row start. All this work could be avoided if the
consumer installed by the reader adjusted its state and pushed the
fragments on the spot. This patch achieves just that.

This results in less CPU overhead.

The ka/la reader is left still stopping after row end.

Brings a 20% improvement in frag/s for a full scan in perf_fast_forward (Haswell, NVMe):

perf_fast_forward -c1 -m1G --run-tests=small-partition-skips:

Before:

   read    skip      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         0.952372            4   1000000    1050009        755    1050765    1046585      976.0    971     124256       1       0        0        0        0        0        0        0  99.7%
After:

   read    skip      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         0.790178            4   1000000    1265538       1150    1266687    1263684      975.0    971     124256       2       0        0        0        0        0        0        0  99.6%

Tests: unit (dev)
"

* 'sstable-optimize-partition-scans' of https://github.com/tgrabiec/scylla:
  sstable: mc: reader: Do not stop parsing across partitions
  sstables: reader: Move some parser state from sstable_mutation_reader to mp_row_consumer_reader
  sstables: reader: Simplify _single_partition_read checking
  sstables: reader: Update stats from on_next_partition()
  sstables: mutation_fragment_filter: Drop unnecessary calls to _walker.out_of_range()
  sstables: ka/la: reader make push_ready_fragments() safe to call many times
  sstables: mc: reader: Move out-of-range check out of push_ready_fragments()
  sstables: reader: Return void from push_ready_fragments()
  sstables: reader: Rename on_end_of_stream() to on_out_of_clustering_range()
  sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end
2019-06-26 13:19:12 +03:00
Avi Kivity
06a9596491 tests: cql_test_env: disable commitlog O_DSYNC
O_DSYNC causes commitlog to pre-allocate each commitlog segment by writing
zeroes into it. In normal operation, this is amortized over the many
times the segment will be reused. In tests, this is wasteful, but under
the default workstation configuration with /tmp using tmpfs, no actual
writes occur.

However on a non-default configuration with /tmp mounted on a real disk,
this causes huge disk I/O and eventually a crash (observed in
schema_change_test). The crash is likely only caused indirectly, as the
extra I/O (exacerbated by many tests running in parallel) causes timeouts.

I reproduced this problem by running 15 copies of schema_change_test in
parallel with /tmp mounted on a real filesystem. Without this change, I
usually observe one or two of the copies crashing, with the change they
complete (and much more quickly, too).
2019-06-26 12:15:53 +02:00
Asias He
f0f0beba2e repair: Move the global tracker object into repair_service
The tracker object was a static object in repair.cc. At the time we initialize
it, we do not know the smp::count, so we have to initialize the _repairs
object when it is used on the fly.

    void init_repair_info() {
        if (_repairs.size() != smp::count) {
            _repairs.resize(smp::count);
        }
    }

This introduces a race if init_repair_info is called on different
threads (shards).

To fix, put the tracker object inside the newly introduced
repair_service object which is created in main.cc.
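
A minimal sketch of the idea, assuming the shard count is known by the time
the owning object is constructed. `tracker`, `add_repair()` and
`repair_service_sketch` here are simplified, hypothetical stand-ins and not
the actual Scylla classes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified tracker: per-shard storage is sized once, in the
// constructor, when the shard count is already known.
class tracker {
    std::vector<std::vector<int>> _repairs; // one slot per shard
public:
    explicit tracker(std::size_t shard_count) : _repairs(shard_count) {}

    // No lazy resize here, so callers on different shards never race on
    // the size of _repairs; each one only touches its own slot.
    void add_repair(std::size_t shard, int repair_id) {
        assert(shard < _repairs.size());
        _repairs[shard].push_back(repair_id);
    }

    std::size_t shard_count() const { return _repairs.size(); }
    std::size_t repairs_on(std::size_t shard) const { return _repairs.at(shard).size(); }
};

// Hypothetical owner, standing in for the repair_service created in main.cc.
struct repair_service_sketch {
    tracker t;
    explicit repair_service_sketch(std::size_t shard_count) : t(shard_count) {}
};
```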

Fixes #4593
Message-Id: <b1adef1c0528354d2f92f8aaddc3c4bee5dc8a0a.1561537841.git.asias@scylladb.com>
2019-06-26 12:53:10 +03:00
Botond Dénes
572a738777 collection: use chunked_vector to store cells
This is a quick fix to the immediate problem of large collections causing
large allocations, triggering stalls or OOM. The proper fix is to
use IMR for storing the cells, but that is a complex change that will
require time, so let's not stall/OOM in the meanwhile.
2019-06-26 11:40:44 +03:00
Botond Dénes
c68ffc330e types: don't copy collection_type_impl::mutation_view
Just because it's a view doesn't mean it's cheap to copy.
2019-06-26 11:39:41 +03:00
Asias He
fb3f0125ee repair: Add default construct for partition_key_and_mutation_fragments
This is useful when we want to add an empty
partition_key_and_mutation_fragments.
2019-06-26 09:12:55 +08:00
Asias He
3fc53a6b72 repair: Add send_full_set_rpc_stream in row_level_diff_detect_algorithm
It is used to negotiate if the master can use the rpc stream interface
to transfer data.
2019-06-26 09:12:55 +08:00
Asias He
6054a56333 repair: Add repair_row_on_wire_with_cmd
It is used to contain both a repair cmd and repair_row_on_wire object.
2019-06-26 09:12:55 +08:00
Asias He
9f36d775dc repair: Add repair_hash_with_cmd
It is a wrapper that contains both a repair cmd and a repair_hash object.
2019-06-26 09:12:55 +08:00
Asias He
6b59279e26 repair: Add repair_stream_cmd
It is used by row level repair to add a small protocol on top of the rpc stream
interface.
2019-06-26 09:12:55 +08:00
Rafael Ávila de Espíndola
94d2194c77 dht: token: Simplify operator<
While this is a strict weak ordering, it is not obvious and duplicates
a bit of logic. This patch simplifies it by using tri_compare.
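
As an illustration of the pattern (not the actual dht::token code), an
operator< can be defined on top of a three-way comparison like this.
`token_sketch` and its `token_kind` values are hypothetical simplifications:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified token: a kind plus a numeric value.
enum class token_kind { before_all_keys = 0, key = 1, after_all_keys = 2 };

struct token_sketch {
    token_kind kind;
    int64_t value; // only meaningful when kind == token_kind::key
};

// Three-way comparison: negative, zero or positive.
inline int tri_compare(const token_sketch& a, const token_sketch& b) {
    if (a.kind != b.kind) {
        return a.kind < b.kind ? -1 : 1;
    }
    if (a.kind != token_kind::key || a.value == b.value) {
        return 0;
    }
    return a.value < b.value ? -1 : 1;
}

// The ordering lives in one place; operator< is just a thin wrapper.
inline bool operator<(const token_sketch& a, const token_sketch& b) {
    return tri_compare(a, b) < 0;
}
```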

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190621204820.37874-1-espindola@scylladb.com>
2019-06-25 19:06:30 +03:00
Tomasz Grabiec
269e65a8db Merge "Sync schema before repair" from Asias
This series makes sure new schema is propagated to repair master and
follower nodes before repair.

Fixes #4575

* dev.git asias/repair_pull_schema_v2:
  migration_manager: Add sync_schema
  repair: Sync schema from follower nodes before repair
2019-06-25 19:05:29 +03:00
Amos Kong
f0cd589a75 dist: suppress the yaml load warning
YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated,
as the default Loader is unsafe. Please read https://msg.pyyaml.org/load
for full details.

Fix it by using the new safe interface - yaml.safe_load()

Signed-off-by: Amos Kong <amos@scylladb.com>
Cc: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <9b68601845117274573474ede0341cc81f80efa6.1561156205.git.amos@scylladb.com>
2019-06-25 19:05:29 +03:00
Avi Kivity
fc629bb14f Merge "cql3: lift infinite bound check" from Benny & Piotr
"
If the database supports infinite bound range deletions,
CQL layer will no longer throw an error indicating that both ranges
need to be specified.

Fixes #432

Update test_range_deletion_scenarios unit test accordingly.
"

* 'cql3-lift-infinite-bound-check' of https://github.com/bhalevy/scylla:
  cql3: lift infinite bound check if it's supported
  service: enable infinite bound range deletions with mc
  database: add flag for infinite bound range deletions
2019-06-25 19:05:29 +03:00
Nadav Har'El
a88c9ca5a5 Merge branch 'add_proper_aggregation_for_paged_indexing_2' of git://github.com/psarna/scylla into next
Piotr Sarna says:

Fixes #4540
This series adds proper handling of aggregation for paged indexed queries.
Before this series, returned results were presented to the user in a per-page
partial manner, while they should have been returned as a single aggregated
value.

Tests: unit(dev)

Piotr Sarna (8):
  cql3: split execute_base_query implementation
  cql3: enable explicit copying of query_options
  cql3: add a query options constructor with explicit page size
  cql3: add proper aggregation to paged indexing
  cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
  tests: add query_options to cquery_nofail
  tests: add indexing + paging + aggregation test case
  tests: add indexing+paging test case for clustering keys
2019-06-25 19:05:29 +03:00
Avi Kivity
7195f75fb2 Update seastar submodule
* seastar ded50bd8a4...b629d5ef7a (9):
  > sharded: no_sharded_instance_exception: fix grammar
  > core,net: output_stream: remove redundant std::move()
  > perftune: make sure that ethtool -K has a chance of succeeding
  > net/dpdk: upgrade to dpdk-19.05
  > perftune.py: Fix a few more places where we use deprecated pyudev.Device ones
  > reactor: provide an uptime function
  > rpc: add sink::flush() to streaming api
  > Use a table to document the various build modes
  > foreign_ptr: Fix compilation error due to unused variable
2019-06-25 19:05:29 +03:00
Avi Kivity
9d21341733 review-checklist.md: add common checks
- code style
 - naming
 - micro-performance
 - concurrency
 - unit-testing
 - templates and type erasure
 - singletons
2019-06-25 19:05:29 +03:00
Piotr Sarna
efa7951ea5 main: stop view builder conditionally
The view builder is started only if it's enabled in config,
via the view_building=true variable. Unfortunately, stopping
the builder was unconditional, which may result in failed
assertions during shutdown. To remedy this, view building
is stopped only if it was previously started.

Fixes #4589
2019-06-25 19:05:29 +03:00
Asias He
bb5665331c repair: Sync schema from follower nodes before repair
Since commit "repair: Use the same schema version for repair master and
followers", repair master and followers use the same schema version
that master decides to use during the whole repair operation. If master
has older version of schema, repair could ignore the data which makes use
of the new schema, e.g., writes to new columns.

To fix, always sync the schema agreement before repair.

The master node pulls schema from followers and applies locally. The
master then uses the "merged" schema. The followers use
get_schema_for_write() to pull the "merged" schema.

Fixes #4575
Backports: 3.1
2019-06-25 17:13:47 +08:00
Asias He
14c1a71860 migration_manager: Add sync_schema
Makes sure this node knows about all schema changes known by
"nodes" that were made prior to this call.

Refs: #4575
Backports: 3.1
2019-06-25 17:13:47 +08:00
Botond Dénes
d00cb4916c tests: introduce random_schema
random_schema is a utility class that provides methods for generating
random schemas as well as generating data (mutations) for them. The aim
is to make using random schemas in tests as simple and convenient as
is using `simple_schema`. For this reason the interface of
`random_schema` follows closely that of `simple_schema` to the extent
that it makes sense. An important difference is that `random_schema`
relies on `data_model` to actually build mutations. So all its
mutation-related operations work with `data_model::mutation_description`
instead of actual `mutation` objects. Once the user has arrived at the
desired mutation description they can generate an actual mutation via
`data_model::mutation_description::build()`.

In addition to the `random_schema` class, the `random_schema.hh` header
exposes the generic utility classes for generating types and values
that it internally uses.

random_schema is fully deterministic. Using the same seed and the same
set of operations is guaranteed to result in generating the same schema
and data.
2019-06-25 12:01:33 +03:00
Botond Dénes
070d72ee23 tests/random-utils.hh: add get_real() 2019-06-25 12:01:33 +03:00
Botond Dénes
2d9f6c3b63 tests/random-utils.hh: get_int() add overloads that accept external rand engine 2019-06-25 12:01:33 +03:00
Botond Dénes
2a7710129e tests/random-utils.hh: add stepped_int_distribution 2019-06-25 12:01:33 +03:00
Botond Dénes
a3f9932a2f data_value: add ascii constructor
To allow a `data_value` with `ascii_type` to be constructed.
2019-06-25 12:01:33 +03:00
Botond Dénes
1bd8b77770 tests/data_model: approximate to the modeled data structures
Make the data modelling structures model their "real" counterparts
more closely, allowing the user greater control on the produced data.
The changes:
* Add timestamp to atomic_value (which is now a struct, not just an
    alias to bytes).
* Add tombstone to collection.
* Add row_tombstone to row.
* Add bound kinds and tombstone to range_tombstone.

Great care was taken to preserve backward compatibility, to avoid
unnecessary changes in existing code.
2019-06-25 12:01:33 +03:00
Piotr Sarna
add40d4e59 cql3: lift infinite bound check if it's supported
If the database supports infinite bound range deletions,
CQL layer will no longer throw an error indicating that both ranges
need to be specified.

[bhalevy] Update test_range_deletion_scenarios unit test accordingly.

Fixes #432

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-24 15:58:34 +03:00
Piotr Sarna
c19fdc4c90 service: enable infinite bound range deletions with mc
As soon as it's agreed that the cluster supports sstables in mc format,
infinite bound range deletions in statements can be safely enabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-24 15:58:28 +03:00
Piotr Sarna
e77ef849af database: add flag for infinite bound range deletions
Database can only support infinite bound range deletions if sstable mc
format is supported. As a first step to implement these checks,
an appropriate flag is added to database.
2019-06-24 15:57:47 +03:00
Piotr Sarna
b668ee2b2d tests: add indexing+paging test case for clustering keys
Indexing a non-prefix part of the clustering key has a separate
code path (see issue #3405), so it deserves a separate test case.
2019-06-24 14:51:17 +02:00
Piotr Sarna
3d9a37f28f tests: add indexing + paging + aggregation test case
Indexed queries used to erroneously return partial per-page results
for aggregation queries. This test case used to reproduce the problem
and now ensures that there would be no regressions.

Refs #4540
2019-06-24 14:06:42 +02:00
Piotr Sarna
60cafcc39c tests: add query_options to cquery_nofail
The cquery_nofail utility is extended, so it can accept custom
query options, just like execute_cql does.
2019-06-24 14:06:41 +02:00
Piotr Sarna
fe18638de3 cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
The constant will be later used in test scenarios.
2019-06-24 13:21:37 +02:00
Piotr Sarna
bb08af7e68 cql3: add proper aggregation to paged indexing
Aggregated and paged filtering needs to aggregate the results
from all pages in order to avoid returning partial per-page
results. It's a little bit more complicated than regular aggregation,
because each paging state needs to be translated between the base
table and the underlying view. The routine keeps fetching pages
from the underlying view, which are then used to fetch base rows,
which go straight to the result set builder.
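
The overall shape of such a loop might look like the sketch below. It leaves
out the translation of paging state between the base table and the view, and
all names (`page`, `fetch_page_fn`, `count_all_pages()`) are hypothetical,
chosen only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical page of results: rows plus an optional state for the next page.
struct page {
    std::vector<int> rows;
    std::optional<std::size_t> next_state; // empty when there are no more pages
};

// Hypothetical page source: given a paging state, returns the matching page.
using fetch_page_fn = std::function<page(std::size_t state)>;

// Counts rows across all pages and returns one aggregated value, instead
// of one partial count per page.
inline std::size_t count_all_pages(const fetch_page_fn& fetch) {
    std::size_t total = 0;
    std::optional<std::size_t> state = 0;
    while (state) {
        page p = fetch(*state);
        total += p.rows.size();
        state = p.next_state;
    }
    return total;
}
```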

Fixes #4540
2019-06-24 13:21:32 +02:00
Piotr Sarna
97d476b90f cql3: add a query options constructor with explicit page size
For internal use, there already exists a query_options constructor
that copies data from another query_options with overwritten paging
state. This commit adds an option to overwrite page size as well.
2019-06-24 13:21:32 +02:00
Piotr Sarna
fa89e220ef cql3: enable explicit copying of query_options 2019-06-24 12:57:04 +02:00
Piotr Sarna
7a8b243ce4 cql3: split execute_base_query implementation
In order to handle aggregation queries correctly, the function that
returns base query results is split into two, so it's possible to
access raw query results, before they're converted into end-user
CQL message.
2019-06-24 12:57:03 +02:00
Benny Halevy
b1e78313fe log_histogram: log_heap_options::bucket_of: avoid calling pow2_rank(0)
pow2_rank is undefined for 0.
bucket_of currently works around that by using a bitmask of 0.
To allow asserting that count_{leading,trailing}_zeros are not
called with 0, we want to avoid it at all call sites.

Fixes #4153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190623162137.2401-1-bhalevy@scylladb.com>
2019-06-23 19:32:51 +03:00
Avi Kivity
779b378785 Merge "Fix partitioned_sstable_set by making it self sufficient" from Raphael & Benny
"
partitioned_sstable_set is not self sufficient because it relies on
compatible_ring_position_view, which in turn relies on lifetime of
sstable object. This leads to use-after-free. Fix this problem by
introducing compatible_ring_position and using it in p__s__s.

Fixes #4572.

Test: unit (dev), compaction dtests (dev)
"

* 'projects/fix_partitioned_sstable_set/v4' of ssh://github.com/bhalevy/scylla:
  tests: Test partitioned sstable set's self-sufficiency
  sstables: Fix partitioned_sstable_set by making it self sufficient
  Introduce compatible_ring_position and compatible_ring_position_or_view
2019-06-23 17:17:18 +03:00
Raphael S. Carvalho
14fa7f6c02 tests: Test partitioned sstable set's self-sufficiency
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:13 +03:00
Raphael S. Carvalho
293557a34e sstables: Fix partitioned_sstable_set by making it self sufficient
Partitioned sstable set is not self sufficient, because it uses compatible_ring_position_view
as key for interval map, which is constructed from a decorated key in sstable object.
If sstable object is destroyed, like when compaction releases it early, partitioned set
potentially no longer works because c__r__p__v would store information that is already freed,
meaning its use implies use-after-free.
Therefore, the problem happens when partitioned set tries to access the interval of its
interval map and uses freed information from c__r__p__v.

Fix is about using the newly introduced compatible_ring_position_or_view which can hold a
ring_position, meaning that partitioned set is no longer dependent on lifetime of sstable
object.

Retire compatible_ring_position_view.hh as it is now unused.

Fixes #4572.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:13 +03:00
Raphael S. Carvalho
9a83561700 Introduce compatible_ring_position and compatible_ring_position_or_view
The motivation for supporting ring position is that containers using
it can be self sufficient. The existing compatible_ring_position_view
could lead to use after free when the ring position data, it was built
from, is gone.

The motivation for compatible_ring_position_or_view is to allow lookup
on containers that don't support different key types using c__r__p,
and also to avoid unnecessary copies.
If the user is provided only with a ring_position_view, c__r__p__or_v
could be built from it and used for lookups.
Converting ring_position_view to ring_position is very bug prone because
there could be information lost in the process.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:12 +03:00
Rafael Ávila de Espíndola
65ac0a831c Add to_string_impl that takes a data_value
Currently to_string takes raw bytes. This means that to print a
data_value it has to first be serialized to be passed to to_string,
which will then deserialize it.

This patch adds a virtual to_string_impl that takes a data_value and
implements a now non-virtual to_string on top of it.

I don't expect this to have a performance impact. It mostly documents
how to access a data_value without converting it to bytes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620183449.64779-3-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
3bd5dd7570 Add a few more tests of data_value::to_string
I found that no tests covered this code while refactoring it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620183449.64779-2-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Nadav Har'El
6e87bca65d storage_proxy: fix race and crash in case of MV and other node shutdown
Recently, in merge commit 2718c90448,
we added the ability to cancel pending view-update requests when we detect
that the target node went down. This is important for view updates because
these have a very long timeout (5 minutes), and we wanted to make this
timeout even longer.

However, the implementation caused a race: Between *creating* the update's
request handler (create_write_response_handler()) and actually starting
the request with this handler (mutate_begin()), there is a preemption point
and we may end up deleting the request handler before starting the request.
So mutate_begin() must gracefully handle the case of a missing request
handler, and not crash with a segmentation fault as it did before this patch.

Eventually the lifetime management of request handlers could be refactored
to avoid this delicate fix (which requires more comments to explain than
code), or even better, it would be more correct to cancel individual writes
when a node goes down, not drop the entire handler (see issue #4523).
However, for now, let's not do such invasive changes and just fix the bug that
we set out to fix.

Fixes #4386.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190620123949.22123-1-nyh@scylladb.com>
2019-06-23 16:03:06 +03:00
Asias He
b99c75429a repair: Avoid searching all the rows in to_repair_rows_on_wire
The repair_rows in row_list are sorted. It is only possible for the
current repair_row to share the same partition key with the last
repair_row inserted into repair_row_on_wire. So, no need to search from
the beginning of the repair_rows_on_wire to avoid quadratic complexity.
To fix, look at the last item in repair_rows_on_wire.
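
A rough sketch of the approach, assuming the input rows are sorted by
partition key. The types and the `group_sorted_rows()` helper are
hypothetical simplifications, not the real repair code:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified types for illustration.
struct repair_row_sketch {
    std::string partition_key;
    std::string fragment;
};

struct partition_with_fragments {
    std::string partition_key;
    std::vector<std::string> fragments;
};

// Groups rows by partition key. Because the rows are sorted, only the last
// output entry can share a key with the current row, so no search is needed.
inline std::vector<partition_with_fragments> group_sorted_rows(const std::vector<repair_row_sketch>& rows) {
    assert(std::is_sorted(rows.begin(), rows.end(),
        [](const repair_row_sketch& a, const repair_row_sketch& b) {
            return a.partition_key < b.partition_key;
        }));
    std::vector<partition_with_fragments> out;
    for (const auto& r : rows) {
        if (out.empty() || out.back().partition_key != r.partition_key) {
            out.push_back({r.partition_key, {}});
        }
        out.back().fragments.push_back(r.fragment);
    }
    return out;
}
```

Each row costs a single comparison against `out.back()`, so the grouping
is linear in the number of rows.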

Fixes #4580
Message-Id: <08a8bfe90d1a6cf16b67c210151245879418c042.1561001271.git.asias@scylladb.com>
2019-06-23 16:03:06 +03:00
Benny Halevy
883cb4318f Merge pull request #4583 from bhalevy/init-and-shutdown-logging
Init and shutdown logging
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
3660caff77 Reduce memory used by all tests
Tests without custom flags were already being run with -m2G. Tests
with custom flags have to manually specify it, but some were missing
it. This could cause tests to fail with std::bad_alloc when two
concurrent tests tried to allocate all the memory.

This patch adds -m2G to all tests that were missing it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620002921.101481-1-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Avi Kivity
9229afe64f Merge "Fix infinite paging for indexed queries" from Piotr
"
Fixes #4569

This series fixes the infinite paging for indexed queries issue.
Before this fix, paging indexes tended to end up in an infinite loop
of returning pages with 0 results, but has_more_pages flag set to true,
which confused the drivers.

Tests: unit(dev)
Branches: 3.0, 3.1
"

* 'fix_infinite_paging_for_indexed_queries' of https://github.com/psarna/scylla:
  tests: add test case for finishing index paging
  cql3: fix infinite paging for indexed queries
2019-06-23 16:03:06 +03:00
Takuya ASADA
2135d2ae7f dist/debian: install capabilities.conf on postinst script
We still have the "{{^jessie}}" tag in the scylla-server systemd unit file to
skip using AmbientCapabilities on Debian 8, but it no longer works
since we moved to a single binary .deb package for all Debian variants:
we must share the same systemd unit file across all Debian variants.

To do so we need to have a separate file in /etc/systemd to define
AmbientCapabilities. Create the file while running the postinst script only
if the distribution is not Debian 8, just like we do in .rpm.

See #3344
See #3486

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190619064224.23035-1-syuu@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
46341bd63f gdb: Print coordinator stats related to memory usage from 'scylla memory'
Example:

 Coordinator:
  fg writes:            150
  bg writes:          39980, 21429280 B
  fg reads:               0
  bg reads:               0
  hints:                  0 B
  view hints:             0 B

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906745-17150-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
f7e79b07d1 lsa: Respect the reclamation step hint from seastar allocator
This will allow us to reduce the amount of segment compaction when
reclaiming on behalf of a large allocation because we'll evict much
more up front.

Tests:
  - unit (dev)

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906584-16770-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
c5184b3dd0 gdb: Print region_impl pointer from scylla lsa
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906684-17019-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Alexys Jacob
98bc9edf6f thrift/: support version 0.11+ after THRIFT-2221
Thrift 0.11 changed to generate c++ code with
std::shared_ptr instead of boost::shared_ptr.

- https://issues.apache.org/jira/browse/THRIFT-2221

This was forcing scylla to stick with older versions
of thrift.

Fixes issue #3097.

thrift: add type aliases to build with old and new versions.

update to using namespace =
2019-06-23 16:03:06 +03:00
Takuya ASADA
e4320d6537 dist/debian: run 'systemctl daemon-reload' automatically on package install/uninstall
Since we cannot use dh --with=systemd (we don't want to
automatically enable systemd units, but manage them with our setup scripts),
we have to do 'systemctl daemon-reload' manually.
(With dh --with=systemd, the systemd helper automatically provides such
scripts)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190618000210.28972-1-syuu@scylladb.com>
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
8c067c26d9 Add support for the sanitize build mode in scylla
Running tests in debug mode takes 25:22.08 on my machine. Using
sanitize instead takes that down to 10:46.39.

The mode is opt in, in that it must be explicitly selected with
"configure.py --mode=sanitize" or "ninja sanitize". It must also be
explicitly passed to test.py.

Unfortunately building with asan, optimizations and debug info is
very slow and there is nothing like -gline-tables-only in gcc.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190617170007.44117-1-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Benny Halevy
1fd91eb616 main: add logging for deferred stopping
Increase visibility of init messages to help diagnose init
and shutdown issues.

Ref #4384

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-20 13:04:36 +03:00
Benny Halevy
cbbe5a519a main: improve init logging
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-20 13:04:36 +03:00
Benny Halevy
e96b1afdbd supervisor::notify log at info level rather than trace
Increase visibility of init messages to help diagnose init
and shutdown issues.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-20 13:04:36 +03:00
Tomasz Grabiec
fa2ed3ecce sstable: mc: reader: Do not stop parsing across partitions
Currently, the parser and the consumer save their state and return
control to the caller, which then figures out that it needs to enter a
new partition, and that it doesn't need to skip. We do it twice, after
row end, and after row start. All this work could be avoided if the
consumer installed by the reader adjusted its state and pushed the
fragments on the spot. This patch achieves just that.

This results in less CPU overhead.

The ka/la reader is left still stopping after row end.

Brings a 20% improvement in frag/s for a full scan in perf_fast_forward (Haswell, NVMe):

 perf_fast_forward -c1 -m1G  --run-tests=small-partition-skips:

Before:

   read    skip      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         0.952372            4   1000000    1050009        755    1050765    1046585      976.0    971     124256       1       0        0        0        0        0        0        0  99.7%

After:

   read    skip      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         0.790178            4   1000000    1265538       1150    1266687    1263684      975.0    971     124256       2       0        0        0        0        0        0        0  99.6%
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
386079472a sstables: reader: Move some parser state from sstable_mutation_reader to mp_row_consumer_reader
This state will be needed by the consumer to handle crossing partition
boundaries on its own.

While at it, document it.
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
92cb07debd sstables: reader: Simplify _single_partition_read checking
The old code was making advance_to_next_partition() behave
incorrectly when _single_partition_read, which was compensated by a
check in read_partition().

Cleaner to exit early.
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
7f4c041ba0 sstables: reader: Update stats from on_next_partition()
After partition_start is emitted directly from the parser's consumer,
read_partition() will not always be called for each produced partition.
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
0964a8fb38 sstables: mutation_fragment_filter: Drop unnecessary calls to _walker.out_of_range()
out_of_range() cannot change to true when the position falls into the
ranges, we only need to check it when it falls outside them.
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
556ccf4373 sstables: ka/la: reader make push_ready_fragments() safe to call many times
Not a bug fix, just makes the implementation more robust against changes.

Before this patch this might have resulted in partition_end being
pushed many times.
2019-06-19 14:29:01 +02:00
Tomasz Grabiec
ef6edff673 sstables: mc: reader: Move out-of-range check out of push_ready_fragments()
Currently, calling push_ready_fragments() with _mf_filter disengaged
or with _mf_filter->out_of_range() causes it to call
_reader->on_out_of_clustering_range(), which emits the partition_end
fragment. It's incorrect to emit this fragment twice, or zero times,
so correctness depends on the fact that push_ready_fragments() is
called exactly once when transitioning between partitions.

This proved to be tricky to ensure, especially after partition_end
starts to be emitted in a different path as well. Ensuring that
push_ready_fragments() is *NOT* called after partition_end is emitted
from consume_partition_end() becomes tricky.

After having to fix this problem many times after unrelated changes to
the flow, I decided that it's better to refactor.

This change moves the call of on_out_of_clustering_range() out of
push_ready_fragments(), making the latter safe to call any number of
times.

The _mf_filter->out_of_range() check is moved to sites which update
the filter.

It's also good because it gets rid of conditionals.
2019-06-19 14:29:01 +02:00
Tomasz Grabiec
552fe21812 sstables: reader: Return void from push_ready_fragments()
The result is ignored, which is fine, so make it official to avoid
confusion.
2019-06-19 14:29:01 +02:00
Tomasz Grabiec
1488b57933 sstables: reader: Rename on_end_of_stream() to on_out_of_clustering_range()
The old name is confusing, because we're not always ending the stream
when we call it.
2019-06-19 14:29:01 +02:00
Tomasz Grabiec
9b8ac5ecbc sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end
Currently, if there is a fragment in _ready and _out_of_range was set
after row end was consumed, push_ready_fragments() would return
without emitting partition_end.

This is problematic once we make consume_row_start() emit
partition_start directly, because we will want to assume that all
fragments for the previous partition are emitted by then. If they're
not, then we'd emit partition_start before partition_end for the
previous partition. The fix is to make sure that
push_ready_fragments() emits everything.
2019-06-19 14:14:38 +02:00
Piotr Sarna
b8cadc928c tests: add test case for finishing index paging
The test case makes sure that paging indexes does not result
in an infinite loop.

Refs #4569
2019-06-19 14:10:13 +02:00
Piotr Sarna
88f3ade16f cql3: fix infinite paging for indexed queries
Indexed queries need to translate between view table paging state
and base table paging state, in order to be able to page the results
correctly. One of the stages of this translation is overwriting
the paging state obtained from the base query, in order to return
view paging state to the user, so it can be used for fetching next
pages. Unfortunately, in the original implementation the paging
state was overwritten only if more pages were available,
while if 'remaining' pages were equal to 0, nothing was done.
This is not enough, because the paging state of the base query
needs to be overwritten unconditionally - otherwise a guard paging state
value of 'remaining == 0' is returned back to the client along with
'has_more_pages = true', which will result in an infinite loop.
This patch correctly overwrites the base paging state unconditionally.

Fixes #4569
2019-06-19 14:10:13 +02:00
Tomasz Grabiec
cd1ff1fe02 Merge "Use same schema version for repair nodes" from Asias
This patch set fixes repair nodes using different schema versions and
optimizes the hashing thanks to the fact that all nodes now use the same
schema version.

Fixes: #4549

* seastar-dev.git asias/repair_use_same_schema.v3:
  repair: Use the same schema version for repair master and followers
  repair: Hash column kind and id instead of column name and type name
2019-06-18 12:42:53 +02:00
Asias He
4285801af9 repair: Hash column kind and id instead of column name and type name
It is guaranteed repair nodes use the same schema. It is faster to hash
column kind and id.

Changing the hashing of mutation fragment causes incompatibility with
mixed clusters. Let's backport to the 3.1 release, which includes row
level repair for the first time and is not released yet.

Refs: #4549
Backports: 3.1
2019-06-18 18:27:21 +08:00
Asias He
3db136f81e repair: Use the same schema version for repair master and followers
Before this patch, repair master and followers use their own schema
version at the point repair starts independently. The schemas can be
different due to schema change. Repair uses the schema to serialize
mutation_fragment and deserialize the mutation_fragment received from
peer nodes. Using different schema versions to serialize and deserialize
causes undefined behaviour.

To fix, we use the schema the repair master decides for all the repair
nodes involved.

On top of this patch, we could do another step to make sure all nodes
have the latest schema. But let's do it in a separate patch.

Fixes: #4549
Backports: 3.1
2019-06-18 18:27:21 +08:00
Rafael Ávila de Espíndola
8672eddff2 Document the best practices for when to use asserts/exceptions/logs
The intention is just to document what is currently done. If someone
wants to propose changes, that can be done after the current practices
have been documented.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190524135109.29436-1-espindola@scylladb.com>
2019-06-18 12:13:01 +03:00
Rafael Ávila de Espíndola
26c0814a88 Add test large collection warning
This was already working, but we were not testing for it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190617181706.66490-1-espindola@scylladb.com>
2019-06-18 10:27:55 +02:00
Nadav Har'El
6aab1a61be Fix deciding whether a query uses indexing
The code that decides whether a query should use indexing was buggy - a partition key index might have influenced the decision even if the whole partition key was passed in the query (which effectively means that indexing it is not necessary).

Fixes #4539

Closes https://github.com/scylladb/scylla/pull/4544

Merged from branch 'fix_deciding_whether_a_query_uses_indexing' of git://github.com/psarna/scylla
  tests: add case for partition key index and filtering
  cql3: fix deciding if a query uses indexing
2019-06-18 01:01:14 +03:00
Takuya ASADA
7320c966bc dist/common/scripts/scylla_setup: don't proceed with empty NIC name
Currently the NIC selection prompt in scylla_setup just proceeds with setup
when the user presses the Enter key without typing a name.
The prompt should ask for the NIC name again until the user enters a correct one.

Fixes #4517
Message-Id: <20190617124925.11559-1-syuu@scylladb.com>
2019-06-17 15:52:29 +03:00
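As an illustration of commit 7320c966bc above, a re-prompting loop might look roughly like the sketch below. The function name `prompt_nic`, the prompt wording, and the injectable `read_line` parameter are assumptions made for this example; they are not taken from scylla_setup itself.

```python
def prompt_nic(valid_nics, read_line=input):
    """Ask for a NIC name until the answer is one of valid_nics.

    read_line is injectable so the loop can be exercised without a terminal.
    """
    while True:
        answer = read_line('Please select a NIC from the list: ').strip()
        if answer in valid_nics:
            return answer
        # A bare Enter (empty string) or an unknown name: ask again
        # instead of continuing setup with a bad value.
        print('Invalid NIC name {!r}, please try again.'.format(answer))
```

Passing a fake `read_line` that returns an empty string first and a valid name later shows that the empty answer no longer falls through.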
Avi Kivity
938b74f47a Merge "Fix gcc9 build" from Paweł
"
These patches fix remaining issues with gcc9 build, that involve a gcc9 bug, a gcc9 false-positive warning, and a stricter warning.

Tests: unit(debug, dev, release).
"

* 'fix-gcc9-build' of https://github.com/pdziepak/scylla:
  dht/ring_position: silence complaints about uninitialised _token_bound
  xx_hasher: disable -Warray-bounds
  api/column_family: work around gcc9 bug in seastar::future<std::any>
2019-06-17 15:23:24 +03:00
Tomasz Grabiec
f798f724c8 frozen_mutation: Guard against unfreezing using wrong schema
Currently, calling unfreeze() using the wrong version of the schema
results in undefined behavior. That can cause hard-to-debug
problems. Better to throw in such cases.

Refs #4549.

Tests:
  - unit (dev)
Message-Id: <1560459022-23786-1-git-send-email-tgrabiec@scylladb.com>
2019-06-17 15:23:24 +03:00
Asias He
f32371727b repair: Avoid copying position in to_repair_rows_list
No need to make a copy because it is not used to construct repair_row
any more since commit 9079790f85 (repair:
Avoid writing row with same partition key and clustering key more than
once). Use mf->position() instead.

Refs: #4510
Backports: 3.1
Message-Id: <7b21edcc3368036b6357b5136314c0edc22ad4d2.1560753672.git.asias@scylladb.com>
2019-06-17 15:23:24 +03:00
Paweł Dziepak
483f66332b dht/ring_position: silence complaints about uninitialised _token_bound
2019-06-17 13:11:20 +01:00
Paweł Dziepak
82b8450922 xx_hasher: disable -Warray-bounds
In release mode gcc9 has a false positive warning about out of bound
access in xxhash implementation:

./xxHash/xxhash.c:799:27: error: array subscript -3 is outside array bounds of ‘long unsigned int [1]’ [-Werror=array-bounds]

This is solved by disabling -Warray-bounds in the xxhash code.
2019-06-17 13:09:54 +01:00
Paweł Dziepak
8a13d96203 api/column_family: work around gcc9 bug in seastar::future<std::any>
There is a gcc9 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415
that makes it impossible to pass std::any through a seastar::future<T>.
Fortunately, there is only one user of seastar::future<std::any> in
Scylla and it is not performance-critical. This patch avoids the gcc9
bug by using seastar::future<std::unique_ptr<std::any>>.
2019-06-17 13:06:28 +01:00
Glauber Costa
91b71a0b1a do not allow multiple snapshot operations at the same time
We saw a node crashing today with nodetool clearsnapshot being called.
After investigation, the reason is that nodetool clearsnapshot was called
at the same time a new snapshot was created with the same tag. nodetool
clearsnapshot can't delete all files in the directory, because new files
had by then been created in that directory, and crashes on I/O error.

There are many problems with allowing those operations to proceed in
parallel. Even if we fix the code not to crash and return an error on
directory non-empty, the moment they do any amount of work in parallel
the result of the operation becomes undefined. Some files in the
snapshot may have been deleted by clear, for example, and a user may
then not be able to properly restore from the backup if this snapshot
was used to generate a backup.

Moreover, although we could lock at the granularity of a keyspace or
column family, I think we should use a big hammer here and lock the
entire snapshot creation/deletion to avoid surprises (for example, if a
user requests creation of a snapshot for all keyspaces, and another
process requests clear of a single keyspace)

Fixes #4554

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190614174438.9002-1-glauber@scylladb.com>
2019-06-16 10:30:13 +03:00
Rafael Ávila de Espíndola
44eb939aa6 Use the sanitizer flags from seastar
In practice, we always want to use the same sanitizer flags with
seastar and scylla. Seastar was already marking its sanitizer flags
public, so what was missing was exporting the link flags via pkgconfig
and dropping the duplicates from scylla.

I am doing this after wasting some time editing the wrong file.

This depends on the seastar patch to export the sanitizer flags in
pkgconfig.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-06-16 09:21:10 +03:00
Takuya ASADA
f582a759ee dist: merge /usr/lib/scylla to /opt/scylladb
We used to use /opt/scylladb just for Scylla build toolchain and
dependency libraries, not for Scylla main package.
But since we merged relocatable package, Scylla main binary and
dependency libraries are all located under /opt/scylladb, only
setup scripts remained on /usr/lib/scylla.
It is strange to keep using both /usr/lib/<app name> and /opt/<app name>;
we should merge them into a single place.

Message-Id: <20190614011038.17827-1-syuu@scylladb.com>
2019-06-14 21:03:36 +03:00
Piotr Jastrzebski
a41c9763a9 sstables: distinguish empty and missing cellpath
Before this patch the mc sstables writer was ignoring
empty cellpaths. This is wrong behaviour because
it is possible to have an empty key in a map. In such a case,
our writer creates a wrong sstable that we can't read back.
This is because a complex cell expects a cellpath for each
simple cell it has. When the writer ignores an empty cellpath
it writes nothing; instead it should write a length
of zero to the file so that we know there's an empty cellpath.

Fixes #4533

Tests: unit(release)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <46242906c691a56a915ca5994b36baf87ee633b7.1560532790.git.piotr@scylladb.com>
2019-06-14 20:36:41 +03:00
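The distinction made by commit a41c9763a9 above can be sketched as follows. This is only an illustration of "missing writes nothing, empty writes a zero length": the 4-byte big-endian length prefix and the name `encode_cell_path` are assumptions chosen for simplicity and do not describe the real mc sstable encoding.

```python
import struct


def encode_cell_path(path):
    """Encode an optional cell path.

    path is None when the cell has no path, bytes otherwise.
    The fixed-width length prefix is a simplification for this sketch.
    """
    if path is None:
        return b''  # a missing path writes nothing at all
    # An empty path still writes its length (0), so a reader can tell
    # "empty map key" apart from "no path".
    return struct.pack('>I', len(path)) + path
```

With this shape, `encode_cell_path(None)` and `encode_cell_path(b'')` produce different byte strings, which is the property the fix restores.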
Asias He
9079790f85 repair: Avoid writing row with same partition key and clustering key more than once
Consider

   master: row(pk=1, ck=1, col=10)
follower1: row(pk=1, ck=1, col=20)
follower2: row(pk=1, ck=1, col=30)

When repair runs, master fetches row(pk=1, ck=1, col=20) and row(pk=1,
ck=1, col=30) from follower1 and follower2.

Then repair master sends row(pk=1, ck=1, col=10) and row(pk=1, ck=1,
col=30) to follower1, follower1 will write the row with the same
pk=1, ck=1 twice, which violates uniqueness constraints.

To fix, we apply the row with the same pk and ck into the previous row.
We only need this on the repair follower because the rows can come from
multiple nodes. On the repair master, we have an sstable writer per
follower, so the rows fed into an sstable writer can come from only a
single node.

Tests: repair_additional_test.py:RepairAdditionalTest.repair_same_row_diff_value_3nodes_test
Fixes: #4510
Message-Id: <cb4fbba1e10fb0018116ffe5649c0870cda34575.1560405722.git.asias@scylladb.com>
2019-06-13 17:19:19 +02:00
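A rough sketch of the idea in commit 9079790f85 above: when consecutive rows share the same partition key and clustering key, fold the later row into the earlier one instead of emitting both. The per-cell `(timestamp, value)` pairs and the "newer timestamp wins" rule are assumptions added so the example has something to merge; the commit itself only says the row is applied into the previous row.

```python
def merge_rows(rows):
    """Merge consecutive rows that share (pk, ck).

    rows: list of (pk, ck, cells) sorted by (pk, ck), where cells maps a
    column name to a hypothetical (timestamp, value) pair.
    """
    merged = []
    for pk, ck, cells in rows:
        if merged and merged[-1][0] == pk and merged[-1][1] == ck:
            target = merged[-1][2]
            for column, cell in cells.items():
                # Keep the cell with the newer timestamp (assumed rule).
                if column not in target or cell[0] > target[column][0]:
                    target[column] = cell
        else:
            merged.append((pk, ck, dict(cells)))
    return merged
```

Each (pk, ck) pair then reaches the writer at most once, whichever node the rows came from.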
Asias He
912ce53fc5 repair: Allow repair_row to initialize partially
On repair follower node, only decorated_key_with_hash and the
mutation_fragment inside repair_row are used in apply_rows() to apply
the rows to disk. Allow repair_row to initialize partially and throw if
the uninitialized member is accessed to be safe.
Message-Id: <b4e5cc050c11b1bafcf997076a3e32f20d059045.1560405722.git.asias@scylladb.com>
2019-06-13 17:18:53 +02:00
Benny Halevy
2fd2713fda conf: update conf/scylla.yaml default large data warning thresholds
They are currently inconsistent with db/config.cc
and missing compaction_large_cell_warning_threshold_mb

Fixes #4551

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613133657.15370-1-bhalevy@scylladb.com>
2019-06-13 16:45:27 +03:00
Benny Halevy
4ad06c7eeb tests/perf: provide random-seed option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613114307.31038-2-bhalevy@scylladb.com>
2019-06-13 14:45:49 +03:00
Benny Halevy
43e4631e6a tests: random-utils: use seastar::testing::local_random_engine
To provide test reproducibility use the seastar local_random_engine.

To reproduce a run, use the --random-seed command line option
with the seed printed accordingly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613114307.31038-1-bhalevy@scylladb.com>
2019-06-13 14:45:48 +03:00
Benny Halevy
fe2d629e20 mutation_reader_test: test_multishard_combining_reader_reading_empty_table: fix non-atomic sharing of shards_touched
It needs to be a std::vector<std::atomic<bool>>
otherwise threads step on each other in shared memory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613112359.21884-1-bhalevy@scylladb.com>
2019-06-13 14:44:43 +03:00
Piotr Sarna
2c2122e057 tests: add a test case for filtering clustering key
The test case makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref #4541
Message-Id: <3612dc1c6c22c59ac9184220a2e7f24e8d18407c.1560410018.git.sarna@scylladb.com>
2019-06-13 10:38:56 +03:00
Piotr Sarna
c4b935780b cql3: fix qualifying clustering key restrictions for filtering
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref #4541
Message-Id: <9656396ee831e29c2b8d3ad4ef90c4a16ab71f4b.1560410018.git.sarna@scylladb.com>
2019-06-13 10:38:47 +03:00
Piotr Sarna
adeea0a022 cql3: fix fetching clustering key columns for filtering
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).

Fixes #4541
Message-Id: <f08ebae5562d570ece2bb7ee6c84e647345dfe48.1560410018.git.sarna@scylladb.com>
2019-06-13 10:38:37 +03:00
Glauber Costa
8a3fe3ac9b debian: correctly relocate python scripts
Relocation of python scripts mentions scylla-server in paths explicitly.
It should use {{product}} instead. The current build is failing when
{{product}} is different than scylla-server

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190613012518.28784-1-glauber@scylladb.com>
2019-06-13 09:39:36 +03:00
Takuya ASADA
b1226fb15a dist/docker/redhat: change user of scylla services to 'scylla'
On branch-3.1 / master, we are getting following error:

ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/data: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/commitlog: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/view_hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)

It seems that owner verification of the data directory fails because
the scylla-server process is running as root but the data directory is owned by
scylla, so we should run services as the scylla user.

Fixes #4536
Message-Id: <20190611113142.23599-1-syuu@scylladb.com>
2019-06-12 20:29:06 +03:00
Takuya ASADA
60d8a99f05 dist/common/scripts/scylla_setup: verify system umask is acceptable for scylla-server
To avoid the 'Bad permissions' error when the user has changed the default umask, we need
to verify that the system umask is acceptable for scylla-server.

Fixes #4157

Message-Id: <20190612130343.6043-1-syuu@scylladb.com>
2019-06-12 20:29:06 +03:00
Avi Kivity
cac812661c Update seastar submodule
* seastar 253d6cb...ded50bd (14):
  > Only export sanitizer flags if used
  > perftune.py: use pyudev.Devices methods instead of deprecated pyudev.Device ones
  > Add a Sanitize build mode
  > Merge "perftune.py : new tuning modes" from Vlad
  > reactor: clarify how submit_to() destroys the function object
  > Export the sanitizer flags via pkgconfig
  > smp: Delete unprocessed work items
  > iotune: fixed finding mountpoint infinite loop
  > net: Fix dereferencing moved object
  > Always enable the exception scalability hack
  > Merge "Simple cleanups in future.hh" from Rafael
  > tests: introduce testing::local_random_engine
  > core/deleter: Fix abort when append() is called twice with a shared deleter
  > rpc stream: do not crash if a stream is used after eos
2019-06-12 20:28:48 +03:00
Asias He
b463d7039c repair: Introduce get_combined_row_hash_response
Currently, REPAIR_GET_COMBINED_ROW_HASH RPC verb returns only the
repair_hash object. In the future, we will use set reconciliation
algorithm to decode the full row hashes in working row buf. It is useful
to return the number of rows inside working row buf in addition to the
combined row hashes to make sure the decode is successful.

It is also better to use a wrapper class for the verb response so we can
extend the return values later more easily with IDL.

Fixes #4526
Message-Id: <93be47920b523f07179ee17e418760015a142990.1559771344.git.asias@scylladb.com>
2019-06-12 13:51:29 +03:00
Takuya ASADA
30414d9c23 dist/ami: install scylla debug symbols by default
On AMI creation, install scylla-debuginfo by default.

closes #4542

Message-Id: <20190612102355.21386-1-syuu@scylladb.com>
2019-06-12 13:49:46 +03:00
Eliran Sinvani
2b44d8ed42 cql: Allow user manipulation queries to use cql keywords for a name
This commit allows the CREATE/DROP/ALTER USER cql queries
to use cql keywords for the user name (for example "empty").

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190612104301.8322-1-eliransin@scylladb.com>
2019-06-12 13:48:10 +03:00
Dejan Mircevski
a52a56bfc0 utils: Add like_matcher
A utility for matching text with LIKE patterns, and a battery of
tests.

Tests: unit(dev,debug)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-06-12 13:14:53 +03:00
Piotr Sarna
7b2de7ac5b tests: add case for partition key index and filtering
The test ensures that partition key index does not influence
filtering decisions for regular columns.

Ref #4539
2019-06-12 11:53:02 +02:00
Rafael Ávila de Espíndola
bf87b7e1df logalloc: Use asan to poison free areas
With this patch, when using asan, we poison segment memory that has
been allocated from the system but should not be accessible to user
code.

Should help with debugging use-after-free bugs.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190607140313.5988-1-espindola@scylladb.com>
2019-06-12 11:46:45 +02:00
Piotr Sarna
adc51e57c1 cql3: fix deciding if a query uses indexing
The code that decides whether a query should use indexing
was buggy - a partition key index might have influenced the decision
even if the whole partition key was passed in the query (which
effectively means that indexing it is not necessary).

Fixes #4539
2019-06-12 11:44:16 +02:00
Raphael S. Carvalho
62aa0ea3fa sstables: fix log of failure on large data entry deletion by fixing use-after-move
Fixes #4532.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190527200828.25339-1-raphaelsc@scylladb.com>
2019-06-12 10:55:46 +03:00
Juliana Oliveira
43f92ae6d5 cql: functions: add min/max/count for boolean type
Explicitly add min/max/count functions and tests for
boolean type.

Tests: unit (release)

Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20190612015215.GA2618@shenzou.localdomain>
2019-06-12 10:11:08 +03:00
Benny Halevy
3ad005ba17 build-ami: fix branch detection failure when not in git tree
Introduced in 513d01d53e

The script is trying to determine the branch to shallow clone
when an rpm is missing and has to be built.
This functionality in the current implementation assumes it is being run inside
a git repository, but that need not be the case if the script is triggered after
local rpms were placed in the local directory.
This happens when putting all necessary rpm files in: dist/ami/files
And then running: dist/ami/build_ami.sh --localrpm
The dist/ami/ and dist/ami/files are the only ones required for this action so
querying the git repository in that situation makes no sense.

Fixes #4535

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190611112455.13862-1-bhalevy@scylladb.com>
2019-06-11 19:08:02 +03:00
Piotr Sarna
1a5e5433bf cql3: make add_restriction helper functions public
In order to allow building statement restrictions manually
instead of providing WHERE clause from CQL layer, helper functions
that add single restrictions are made public.
Message-Id: <31fa23a5e5ef927128f23b9fcb3362a2582d86bb.1560237237.git.sarna@scylladb.com>
2019-06-11 16:01:35 +03:00
Tomasz Grabiec
8c4baab81e Merge "view: ignore duplicated key entries in progress virtual reader" from Piotr S.
Build progress virtual reader uses Scylla-specific
scylla_views_builds_in_progress table in order to represent legacy
views_builds_in_progress rows. The Scylla-specific table contains
additional cpu_id clustering key part, which is trimmed before
returning it to the user. That may cause duplicated clustering row
fragments to be emitted by the reader, which may cause undefined
behaviour in consumers.  The solution is to keep track of previous
clustering keys for each partition and drop fragments that would cause
duplication. That way if any shard is still building a view, its
progress will be returned, and if many shards are still building, the
returned value will indicate the progress of a single arbitrary shard.

Fixes #4524
Tests:
unit(dev) + custom monotonicity checks from tgrabiec@scylladb.com
2019-06-11 13:55:25 +02:00
Piotr Sarna
85a3a4b458 view: ignore duplicated key entries in progress virtual reader
Build progress virtual reader uses Scylla-specific
scylla_views_builds_in_progress table in order to represent
legacy views_builds_in_progress rows. The Scylla-specific table contains
additional cpu_id clustering key part, which is trimmed before returning
it to the user. That may cause duplicated clustering row fragments to be
emitted by the reader, which may cause undefined behaviour in consumers.
The solution is to keep track of previous clustering keys for each
partition and drop fragments that would cause duplication. That way if
any shard is still building a view, its progress will be returned,
and if many shards are still building, the returned value will indicate
the progress of a single arbitrary shard.

Fixes #4524
Tests:
unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>
2019-06-11 13:01:31 +02:00
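The approach described in commit 85a3a4b458 above can be sketched like this: remember the previous key within the stream and skip any row whose key, after cpu_id is trimmed, repeats it. The tuple layout and the name `drop_duplicate_rows` are assumptions for the example, not the reader's actual types.

```python
def drop_duplicate_rows(rows):
    """Trim cpu_id and drop rows that would repeat the previous key.

    rows: iterable of (partition_key, clustering_key, cpu_id, progress),
    sorted by partition key, then clustering key, then cpu_id.
    Yields (partition_key, clustering_key, progress).
    """
    previous = None
    for partition_key, clustering_key, _cpu_id, progress in rows:
        current = (partition_key, clustering_key)
        if current == previous:
            continue  # same key as the row just emitted: drop it
        previous = current
        yield partition_key, clustering_key, progress
```

Because only the first row per key survives, the reported progress is that of one shard, matching the commit's note that the value reflects a single arbitrary shard.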
Nadav Har'El
5ef928a63d coding-style.md: mention "using namespace seastar"
All Scylla code is written with "using namespace seastar", i.e., no
"seastar::" prefix for Seastar symbols. Document this in the coding style.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190610203948.18075-1-nyh@scylladb.com>
2019-06-11 10:39:03 +03:00
Calle Wilund
26702612f3 api.hh: Fix bool parsing in req_param
Fixes #4525

req_param uses boost::lexical_cast to convert text->var.
However, lexical_cast does not handle textual booleans,
thus param=true causes not only wrong values, but
exceptions.

Message-Id: <20190610140511.15478-1-calle@scylladb.com>
2019-06-10 17:11:47 +03:00
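To illustrate the kind of handling commit 26702612f3 above calls for, here is a small sketch of parsing a textual boolean request parameter. It is written in Python rather than the C++ of api.hh, and the accepted spellings ("true"/"yes"/"1" and "false"/"no"/"0") as well as the name `parse_bool_param` are assumptions for the example, not the values the actual fix accepts.

```python
_TRUE = {'true', 'yes', '1'}
_FALSE = {'false', 'no', '0'}


def parse_bool_param(text):
    """Convert a textual request parameter to bool, case-insensitively."""
    value = text.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Anything else is rejected explicitly rather than silently
    # mapped to a default.
    raise ValueError('not a boolean: {!r}'.format(text))
```

The point is that `param=true` becomes a valid input instead of failing the way a purely numeric conversion would.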
Gleb Natapov
9213d56a06 storage_proxy: align background and foreground repair metric names
One is plural, the other is not; make them all plural.

Message-Id: <20190605135940.GI25001@scylladb.com>
2019-06-10 11:34:36 +03:00
Benny Halevy
2017de9387 build-ami: delete extra parenthesis in branch_arg calculation
Fixing a typo

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190610062113.5604-1-bhalevy@scylladb.com>
2019-06-10 11:29:44 +03:00
Avi Kivity
591d2968cc storage_proxy: limit resources consumed in cross-shard operations
Currently, each shard protects itself by not reading from rpc and the native
transport if in-flight requests consume too much memory for that shard. However,
if all shards then forward their requests to some other shard, then that shard
can easily run out of memory since its load can be multiplied by the number of
shards that send it requests.

To protect against this, use the new Seastar smp_service_group infrastructure.
We create three groups: read, write, and write ack (the latter is needed to
avoid ABBA deadlocks if shard A exhausts all its resources sending writes to shard B,
and shard B simultaneously does the same; neither will be able to send
acknowledgements, so if the writes are throttled, they will never be unthrottled
until a timeout occurs).

Range scans are not addressed by this patch since they are handled by
multishard_mutation_query, which has its own complex cross-shard communication
scheme, but it will need a similar solution.

Ref #1105 (missing range scan protection)

Tests: unit (dev)
Message-Id: <20190512142243.17795-1-avi@scylladb.com>
2019-06-07 10:53:23 +02:00
Vlad Zolotarov
20a610f6bc fix_system_distributed_tables.py: declare the 'port' argument as 'int'
If a port value is passed as a string, cluster.connect() fails
with Python3.4.

Let's fix this by explicitly declaring a 'port' argument as 'int'.
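
A minimal sketch of the idea, assuming the script parses its options with argparse. Only the 'port' argument mirrors the commit message; the parser layout, the `--node` option and the default values are illustrative assumptions:

```python
import argparse


def build_parser():
    # Hypothetical parser; only the 'port' argument reflects the commit message.
    parser = argparse.ArgumentParser(description="illustrative option parsing")
    parser.add_argument("--node", default="127.0.0.1", help="contact point (assumed name)")
    # type=int converts the command-line string into an integer,
    # so the driver never receives the port as a string.
    parser.add_argument("--port", type=int, default=9042, help="CQL port")
    return parser


args = build_parser().parse_args(["--port", "9043"])
print(type(args.port).__name__, args.port)
```

With `type=int`, a non-numeric value is rejected by argparse at parse time instead of failing later inside the driver.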

Fixes #4527

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190606133321.28225-1-vladz@scylladb.com>
2019-06-06 20:19:57 +03:00
Benny Halevy
c188f838bc build-ami: use ssh git URLs
Rather than https, for cert-based passwordless access.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190606133648.15877-2-bhalevy@scylladb.com>
2019-06-06 20:02:13 +03:00
Benny Halevy
513d01d53e build-ami: use current git branch for shallow-clone of other repos
We want to use the same branch on the other repos build-ami needs
as the one we're building for.  Automatically find the current branch
using the `git branch` command.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190606133648.15877-1-bhalevy@scylladb.com>
2019-06-06 20:02:13 +03:00
Juliana Oliveira
fd83f61556 Add a warning for partitions with too many rows
This patch adds a warning option for users, covering situations where
the row count of a partition may grow larger than initially designed.
Through the warning, users can become aware of possible data modeling problems.

The threshold is initially set to '100,000'.
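
The check can be pictured roughly as follows. This is a sketch only: the function and constant names are hypothetical, and whether the real comparison is strict is an assumption.

```python
import logging

logger = logging.getLogger("large_partition_sketch")

# Default taken from the commit message; the constant name is hypothetical.
ROWS_COUNT_WARNING_THRESHOLD = 100_000


def maybe_warn_too_many_rows(partition_key, rows_count,
                             threshold=ROWS_COUNT_WARNING_THRESHOLD):
    """Log a warning and return True when a partition exceeds the threshold."""
    # Assumption: the comparison is strict (a count equal to the threshold is fine).
    if rows_count > threshold:
        logger.warning("partition %s has %d rows, above threshold %d",
                       partition_key, rows_count, threshold)
        return True
    return False
```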

Tests: unit (dev)

Message-Id: <20190528075612.GA24671@shenzou.localdomain>
2019-06-06 19:48:57 +03:00
Piotr Sarna
74f6ab7599 db: drop unnecessary double computation when feeding hash
When feeding hash for schema digest, compact_for_schema_digest
is mistakenly called twice, which may result in needless recomputation.

Message-Id: <8f52201cf428a55e7057d8438025275023eb9288.1559826555.git.sarna@scylladb.com>
2019-06-06 16:16:47 +03:00
Rafael Ávila de Espíndola
b3adabda2d Reduce logalloc differences between debug and release
A lot of code in scylla is only reachable if SEASTAR_DEFAULT_ALLOCATOR
is not defined. In particular, refill_emergency_reserve in the default
allocator case is empty, but in the seastar allocator case it compacts
segments.

I am trying to debug a crash that seems to involve memory corruption
around the lsa allocator, and being able to use a debug build for that
would be awesome.

This patch reduces the differences between the two cases by having a
common segment_pool that defers only a few operations to different
segment_store implementations.

Tests: unit (debug, dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190606020937.118205-1-espindola@scylladb.com>
2019-06-06 12:55:56 +03:00
Nadav Har'El
95bab04cf9 docs/metrics.md: "instance" label no longer comes from Scylla
Prometheus needs to remember which "instance" (node) each measurement
came from. But it doesn't actually need Scylla to tell it the instance
name - it knows which node it got each measurement from.

After Seastar commit 79281ef287
which fixed Seastar issue https://github.com/scylladb/seastar/issues/477,
the "instance" label on measurements no longer comes from Scylla but rather
is added by Prometheus. This patch corrects the documentation to explain the
current situation, instead of incorrectly saying that Scylla adds the
"instance" label itself.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190602074629.14336-1-nyh@scylladb.com>
2019-06-06 12:42:30 +03:00
Piotr Sarna
f50f418066 types: isolate deserializing iterator to separate file
In order to be used outside types.cc, listlike deserializing iterator
is moved to a separate header.

Message-Id: <d9416e6a8d170aa4936826b54ca7be4acb4ec8e6.1559745816.git.sarna@scylladb.com>
2019-06-05 17:46:51 +03:00
Pekka Enberg
eb00095bca relocate_python_scripts.py: Fix node-exporter install on Debian variants
The relocatable Python is built from Fedora packages. Unfortunately TLS
certificates are in a different location on Debian variants, which
causes "node_exporter_install" to fail as follows:

  Traceback (most recent call last):
    File "/usr/lib/scylla/libexec/node_exporter_install", line 58, in <module>
      data = curl('https://github.com/prometheus/node_exporter/releases/download/v{version}/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), byte=True)
    File "/usr/lib/scylla/scylla_util.py", line 40, in curl
      with urllib.request.urlopen(req) as res:
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 222, in urlopen
      return opener.open(url, data, timeout)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 525, in open
      response = self._open(req, data)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 543, in _open
      '_open', req)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 503, in _call_chain
      result = func(*args)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1360, in https_open
      context=self._context, check_hostname=self._check_hostname)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1319, in do_open
      raise URLError(err)
  urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>
  Unable to retrieve version information
  node exporter setup failed.

Fix the problem by overriding the SSL_CERT_FILE environment variable to
point to the correct location of the TLS bundle.
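
As a rough illustration of the override (not the actual change to relocate_python_scripts.py), the sketch below prepares an environment in which SSL_CERT_FILE points at a CA bundle. The bundle path is the usual Debian location and is an assumption here, as is the helper name:

```python
import os
import ssl

# Common CA bundle location on Debian variants (an assumption for this sketch).
DEBIAN_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def env_with_ca_bundle(base_env, bundle_path=DEBIAN_CA_BUNDLE):
    """Return a copy of base_env with SSL_CERT_FILE set if the bundle exists."""
    env = dict(base_env)
    if os.path.exists(bundle_path):
        env["SSL_CERT_FILE"] = bundle_path
    return env


# Python reports which environment variable its TLS library consults for the CA file.
print(ssl.get_default_verify_paths().openssl_cafile_env)
```

The returned dictionary could be passed as `env=` to `subprocess.run()` when launching a script under the relocatable interpreter.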

Message-Id: <20190604175434.24534-1-penberg@scylladb.com>
2019-06-04 21:12:21 +03:00
Piotr Sarna
b3396dbb57 types: migrate to_json_string to use bytes view
The to_json_string utility implementation was based on const references
instead of views, which can be a source of unnecessary memory copying.
This patch migrates all to_json_string to use bytes_view and leaves
the const reference version as a thin wrapper.

Message-Id: <2bf9f1951b862f8e8a2211cb4e83852e7ac70c67.1559654014.git.sarna@scylladb.com>
2019-06-04 19:17:46 +03:00
Avi Kivity
06d77aa548 Merge "Introduce queue reader" from Botond
"
Technically queue_reader already exists, however so far it was a
private utility in `multishard_writer.cc`. This mini-series makes it
public and generally useful. The interface is made safer and simpler and
the implementation is improved so it doesn't have two separate buffers.
Also, unit tests are added.

Tests: mutation_reader_test:debug/test_queue_reader, multishard_writer_test:debug
"

* 'queue_reader/v2' of https://github.com/denesb/scylla:
  queue_reader: use the reader's buffer as the queue
  Make queue_reader public
2019-06-04 13:46:15 +03:00
Botond Dénes
2ccd8ee47c queue_reader: use the reader's buffer as the queue
The queue reader currently uses two buffers, a `_queue` that the
producer pushes fragments into and its internal `_buffer` where these
fragments eventually end up being served to the consumer from.
This double buffering is not necessary. Change the reader to allow the
producer to push fragments directly into the internal `_buffer`. This
complicates the code a little bit, as the producer logic of
`seastar::queue` has to be folded into the queue reader. On the other
hand this introduces proper memory consumption management, as well as
reduces the amount of consumed memory and eliminates the possibility of
outside code tampering with the queue. Another big advantage of the
change is that there is now an explicit way to communicate the EOS
condition, no need to push a disengaged `mutation_fragment_opt`.

The producer of the queue reader now pushes the fragments into the
reader via an opaque `queue_reader_handle` object, which has the
producer methods of `seastar::queue`.
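
The shape of this interface can be sketched in a few lines. This is a toy, synchronous model and not the Seastar-based implementation; the class and method names (`push`, `push_end_of_stream`, and so on) are assumptions chosen for illustration:

```python
from collections import deque


class QueueReader:
    """Toy reader: a single bounded buffer, filled through a handle."""

    def __init__(self, max_buffered):
        self._buffer = deque()
        self._max_buffered = max_buffered
        self._end_of_stream = False

    def is_buffer_full(self):
        return len(self._buffer) >= self._max_buffered

    def pop(self):
        """Return the next fragment, or None when nothing is buffered."""
        return self._buffer.popleft() if self._buffer else None

    def is_end_of_stream(self):
        # EOS is reached only once the producer signalled it and the buffer drained.
        return self._end_of_stream and not self._buffer


class QueueReaderHandle:
    """Opaque producer-side object; the only way to feed the reader."""

    def __init__(self, reader):
        self._reader = reader

    def push(self, fragment):
        """Append a fragment; return False if the buffer is full."""
        if self._reader._end_of_stream:
            raise RuntimeError("push after end of stream")
        if self._reader.is_buffer_full():
            return False
        self._reader._buffer.append(fragment)
        return True

    def push_end_of_stream(self):
        """Signal EOS explicitly instead of pushing an empty marker value."""
        self._reader._end_of_stream = True
```

There is only one buffer, the producer cannot reach it except through the handle, and end of stream is a separate call rather than a special value in the data path.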

Existing users of queue readers are refactored to use the new interface.

Since the code is more complex now, unit tests are added as well.
2019-06-04 13:39:26 +03:00
Glauber Costa
cbaea172cd python3: add the cassandra driver to the relocatable package
We have a script in tree that fixes the schema for distributed system
tables, like tracing, should they change their schema. We use it all the
time but unfortunately it is not distributed with the scylla package,
which makes using it harder (we want to do this in the server, but
consistent updates will take a while).

One of the problems with the script today that makes distributing it
harder is that it uses the python3 cassandra driver, that we don't want
to have as a server dependency. But now with the relocatable packages in
place there is no reason not to just add it.

[avi: adjust tools/toolchain/image to point to a new image with
 python3-cassandra-driver]
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190603162447.24215-1-glauber@scylladb.com>
2019-06-03 19:34:55 +03:00
Konstantin Osipov
29c27bfc28 storage_proxy: remove unnecessary lambdas in metrics binding
Remove unnecessary lambdas when binding metrics of the storage proxy.
Message-Id: <20190603133753.1724-1-kostja@scylladb.com>
2019-06-03 16:55:16 +03:00
Botond Dénes
a597e46792 Make queue_reader public
Extract it from `multishard_writer.cc` and move it to
`mutation_reader.{hh,cc}` so other code can start using it too.
2019-06-03 12:08:37 +03:00
Takuya ASADA
25112408a7 dist/debian: support relocatable python3 on Debian variants
Unlike CentOS, Debian variants have a python3 package in the official repository,
so we don't have to use relocatable python3 on these distributions.
However, the official python3 version differs between distributions, so we may
have issues because of that.
Also, our scripts and packaging implementation increasingly presuppose the
existence of relocatable python3, which is causing issues on Debian
variants.

Switching to relocatable python3 on Debian variants avoids these issues and
makes it easier to manage Scylla python3 environments across multiple
distributions.

Fixes #4495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190531112707.20082-1-syuu@scylladb.com>
2019-06-02 14:59:43 +03:00
Raphael S. Carvalho
f360d5a936 sstables: export output operator for sstable run
It wasn't being exported in any header.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190527182246.19007-1-raphaelsc@scylladb.com>
2019-06-02 10:25:51 +03:00
Avi Kivity
7a0c6cd583 Revert "dist/debian: support relocatable python3 on Debian variants"
This reverts commit 4d119cbd6d. It breaks build_deb.sh:

18:39:56 +	seastar/scripts/perftune.py seastar/scripts/seastar-addr2line seastar/scripts/perftune.py
18:39:56 Traceback (most recent call last):
18:39:56   File "./relocate_python_scripts.py", line 116, in <module>
18:39:56     fixup_scripts(archive, args.scripts)
18:39:56   File "./relocate_python_scripts.py", line 104, in fixup_scripts
18:39:56     fixup_script(output, script)
18:39:56   File "./relocate_python_scripts.py", line 79, in fixup_script
18:39:56     orig_stat = os.stat(script)
18:39:56 FileNotFoundError: [Errno 2] No such file or directory: '/data/jenkins/workspace/scylla-master/unified-deb/scylla/build/debian/scylla-package/+'
18:39:56 make[1]: *** [debian/rules:19: override_dh_auto_install] Error 1
2019-05-29 13:58:41 +03:00
Konstantin Osipov
fcd52d6187 Update README.md with more recent build instructions on Ubuntu
Building on Ubuntu 18 or 19 following the current build instructions
doesn't work. Add information about a few pitfalls. Switch README.md
to recommending dbuild and move the details to HACKING.md.

Message-Id: <20190520152738.GA15198@atlas>
2019-05-29 12:26:12 +03:00
Takuya ASADA
4d119cbd6d dist/debian: support relocatable python3 on Debian variants
Unlike CentOS, Debian variants have a python3 package in the official repository,
so we don't have to use relocatable python3 on these distributions.
However, the official python3 version differs between distributions, so we may
have issues because of that.
Also, our scripts and packaging implementation increasingly presuppose the
existence of relocatable python3, which is causing issues on Debian
variants.

Switching to relocatable python3 on Debian variants avoids these issues and
makes it easier to manage Scylla python3 environments across multiple
distributions.

Fixes #4495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190526105138.677-1-syuu@scylladb.com>
2019-05-26 13:56:30 +03:00
Glauber Costa
71c4375a66 scylla_io_setup: adjust values for i3en instances
Apparently we are having some issues running iotune on the i3en instances,
as the values do not always make sense. We believe it is something that XFS
is doing, and running fio directly on the device (no filesystem) provides
more meaningful results (closer to the expected values published by AWS).

For now, let's use fio instead. In this patch I have run fio for our 4
dimensions on each of the three types of disks (large, xlarge, 3xlarge).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190524111454.27956-1-glauber@scylladb.com>
2019-05-24 19:37:58 +03:00
Avi Kivity
53dfaf9121 Update seastar submodule
* seastar 5cb1234b0...253d6cb69 (3):
  > reactor: disable nowait aio again
  > Merge "Restructure `timer` implementations to avoid circular dependencies" from Jesse
  > Fix build command in building-docker.md
2019-05-24 14:33:05 +03:00
Raphael S. Carvalho
cabeb12b4e sstables: add output operator for sstable run
The output will look as follows:

Run = {
	Identifier: 647044fd-d3d4-43c4-b014-b546943ead0d
	Fragments = {
		1471=-9223317893235177836:-7063220874380325121
		1478=5924386327138804918:8070482595977135657
		1472=-7063202587832032132:-4903425074566642766
		1473=-4903298949436784325:-2739716797579745183
		1474=-2739703419744073436:-589328117804966275
		1477=3734534455848060136:5924372906965333873
		1476=1579822226461317527:3734518878340722529
		1475=-589322393539097068:1579813857236466583
		1479=8070499046054048682:9223317594733741806
	}
}

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190524043331.5093-1-raphaelsc@scylladb.com>
2019-05-24 08:36:08 +03:00
Paweł Dziepak
899ebe483a Merge "Fix empty counters handling in MC" from Piotr
"
Before this patchset empty counters were incorrectly persisted for
MC format. No value was written to disk for them. The correct way
is to still write a header that informs the counter is empty.

We also need to make sure that reading wrongly persisted empty
counters works because customers may have sstables with wrongly
persisted empty counters.

Fixes #4363
"

* 'haaawk/4363/v3' of github.com:scylladb/seastar-dev:
  sstables: add test for empty counters
  docs: add CorrectEmptyCounters to sstable-scylla-format
  sstables: Add a feature for empty counters in Scylla.db.
  sstables: Write header for empty counters
  sstables: Remove unused variables in make_counter_cell
  sstables: Handle empty counter value in read path
2019-05-23 13:05:53 +01:00
Piotr Jastrzebski
fdbf4f6f53 sstables: add test for empty counters
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:24 +02:00
Piotr Jastrzebski
e91e1a1dde docs: add CorrectEmptyCounters to sstable-scylla-format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:24 +02:00
Piotr Jastrzebski
a962696e44 sstables: Add a feature for empty counters in Scylla.db.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:24 +02:00
Piotr Jastrzebski
b35030ae7e sstables: Write header for empty counters
When storing an empty counter we should still
write its header that indicates the emptiness.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:08 +02:00
Amnon Heiman
f3b6c5fe2f API: storage_proxy add CAS and View endpoints
Some nodetool commands in 3.0 use the CAS and View metrics.

CAS is not implemented and we don't have all the metrics for View
but we still don't want those nodetool commands to fail.

After this patch the following would work and will return empty:

curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/cas_read/moving_average_histogram'

curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/view_write/moving_average_histogram'

curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/cas_write/moving_average_histogram'

This patch is needed for #4416

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190521141235.20856-1-amnon@scylladb.com>
2019-05-22 14:25:17 +03:00
Avi Kivity
698f52d257 Merge "tests: Replace ad-hoc cql utilities with general ones" from Dejan
"
One local utility function in cql_query_test.cc duplicates an existing
exception_predicate member.  Another can be generalized for wider use
in the future.  This patch accomplishes both, retiring a to-do item.

Tests: unit (dev)
"

* 'use-utils-predicate-in-cql_test' of https://github.com/dekimir/scylla:
  tests/cql: Replace equery() with cquery_nofail()
  tests: Add cquery_nofail() utility
  tests: Drop redundant function
2019-05-22 10:09:12 +03:00
Dejan Mircevski
09acb32d35 tests/cql: Replace equery() with cquery_nofail()
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-21 23:38:09 -04:00
Dejan Mircevski
a9849ecba7 tests: Add cquery_nofail() utility
Most tests await the result of cql_test_env::execute_cql().  Most
would also benefit from reporting errors with top-level location
included.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-21 23:28:14 -04:00
Dejan Mircevski
1d8bfc4173 tests: Drop redundant function
make_predicate_for_exception_message_fragment() is redundant now that
exception_utils has landed.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-21 23:28:14 -04:00
Avi Kivity
d481521a2e Update seastar submodule
* seastar 3f7a5e1...5cb1234 (5):
  > build: Help Seastar to find Boost on Fedora 30
  > Merge 'Reinstate nowait aio support' from Avi
  > Fix documentation link in README.md
  > sharded: add variants to invoke_on() that accept an smp_service_group
  > improve error message on AIO setup failure
2019-05-21 20:15:09 +03:00
Benny Halevy
fae4ca756c cql3: select_statement: provide default initializer for parameters::_bypass_cache
Fixes #4503

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190521143300.22753-1-bhalevy@scylladb.com>
2019-05-21 20:06:40 +03:00
Piotr Jastrzebski
a6484b28a1 sstables: Remove unused variables in make_counter_cell
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-21 12:07:31 +02:00
Piotr Jastrzebski
f711cce024 sstables: Handle empty counter value in read path
Due to a bug in an sstable writer, empty counters
were stored without a header.

The correct way of storing an empty counter is to still write
a header that indicates the emptiness.

The next patch in this series fixes the write path,
but we have to make sure that we handle incorrectly
serialized counters in the read path because there
may exist sstables with counters stored without a header.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-21 12:07:12 +02:00
Takuya ASADA
a55330a10b dist/ami: output scylla version information to AMI tags and description
Users may want to know which package versions are used for the AMI,
so it's good to have them in the AMI tags and description.

To do this, we need to download the .rpm from the specified .repo and
extract the version information from the .rpm.

Fixes #4499

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190520123924.14060-2-syuu@scylladb.com>
2019-05-20 15:46:06 +03:00
Takuya ASADA
abe44c28c5 dist/ami: build scylla-python3 when specified --localrpm
Since we switched to relocatable python3, we need to build it for AMI too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190520123924.14060-1-syuu@scylladb.com>
2019-05-20 15:46:05 +03:00
Konstantin Osipov
25087536bc main: developer-mode configuration option uses dash, not underscore
Message-Id: <20190520115524.101871-1-kostja@scylladb.com>
2019-05-20 15:14:11 +03:00
Calle Wilund
1e37e1d40c commitlog: Add optional use of O_DSYNC mode
Refs #3929

Optionally enables O_DSYNC mode for segment files, and when
enabled ignores actual flushing and just barriers any ongoing
writes.

Iff using O_DSYNC mode, we will not only truncate the file
to max size, but also do an actual initial write of zeros
to it, since XFS (intended target) has observably less good
behaviour on non-physical file blocks. Once written (and maybe
recycled) we should have rather satisfying throughput on writes.
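
A rough sketch of the "truncate, then physically write zeros" step described above, using plain POSIX calls from Python rather than the commitlog's own I/O layer. The function name and the 128 KiB chunk size are assumptions made for illustration:

```python
import os


def preallocate_with_zeros(path, size, chunk=128 * 1024):
    """Create `path`, truncate it to `size` and physically write zeros in chunks."""
    # O_DSYNC is not defined on every platform; fall back to 0 (no flag) there.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        # Set the final file size first; the file offset stays at 0.
        os.ftruncate(fd, size)
        zeros = bytes(chunk)
        written = 0
        while written < size:
            n = min(chunk, size - written)
            # os.write may write fewer bytes than requested; advance by its result.
            written += os.write(fd, zeros[:n])
    finally:
        os.close(fd)
    return written
```

Writing in bounded chunks keeps each individual write small, while the initial pass ensures the blocks are actually allocated on disk before the segment is used.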

Note that the O_DSYNC behaviour is hidden behind a default
disabled option. While user should probably seldom worry about
this, we should add some sort of logic in main/init that unless
specified by user, evaluates the commitlog disk and sets this
to true if it is using XFS and looks ok. This is because using
O_DSYNC on things like EXT4 etc has quite horrible performance.

All above statements about performance and O_DSYNC behaviour
are based on a sampling of benchmark results (modified fsqual)
on a statistically non-significant selection of disks. However,
at least there the observed behaviour is a rather large
difference between ::fallocate:ed disk area vs. actually written
using O_DSYNC on XFS, and O_DSYNC on EXT4.

Note also that measurements on O_DSYNC vs. no O_DSYNC does not
take into account the wall-clock time of doing manual disk flush.
This is intentionally ignored, since in the commitlog case, at
least using periodic mode, flushes are relatively rare.

Message-Id: <20190520120331.10229-1-calle@scylladb.com>
2019-05-20 15:10:48 +03:00
Avi Kivity
d92973ba86 Merge "scylla-gdb.py: scylla_fiber: add fallback mode" from Botond
"
Add a fallback mode that can be used when `scylla ptr` cannot be
used, either because the application is not built with the seastar
allocator, or due to bugs. The fallback mode relies on a more primitive
method for determining how much memory to scan looking for task pointers
inside the task object. This mode, being more primitive, is less prone
to errors, but is more wasteful and less precise.
"

* 'scylla-fiber-fallback-mode/v2' of https://github.com/denesb/scylla:
  scylla-gdb.py: scylla_fiber: add fallback mode
  scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used()
  scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers
  scylla-gdb.py: scylla_fiber: fix misaligned text in docstring
2019-05-19 18:34:55 +03:00
Takuya ASADA
4b08a3f906 reloc/python3: add license files on relocatable python3 package
It's better to have license files on our python3 distribution.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190516094329.13273-1-syuu@scylladb.com>
2019-05-19 18:30:19 +03:00
Jesse Haber-Kucharsky
68353a8265 build: Don't build iotune unconditionally
We compile Seastar unconditionally so that changes to Seastar files are
reflected in Scylla when it's built.

We don't need to unconditionally build `iotune` in the same way.

`iotune` is still listed as a build artifact, so it will be built if
`ninja` is invoked without a particular target.

However, building a specific target (like `ninja build/dev/scylla`) will
not build `iotune`.

Fixes #4165

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <9fb96a281580a8743e04d5dd11398be53960cb58.1558100815.git.jhaberku@scylladb.com>
2019-05-19 18:24:05 +03:00
Avi Kivity
5a276d44af Merge "row_cache: Make invalidate() preemptible" from Tomasz
"
This patchset fixes reactor stalls caused by cache invalidation not being preemptible.
This becomes a problem when there are a lot of partitions in cache inside the invalidated range.

This affects high-level operations like nodetool refresh, table
truncation, repair and streaming.

Fixes #2683

The improvement on stalls was measured using tests/perf_row_cache_update:

  Before:

    Small partitions, no overwrites:
    invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]}
    Small partition with a few rows:
    invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]}
    Large partition, lots of small rows:
    invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]}

  After:

    Small partitions, no overwrites:
    invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]}
    Small partition with a few rows:
    invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]}
    Large partition, lots of small rows:
    invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]}

The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota).

Tests:
  - unit (dev)
"

* tag 'cache-preemptible-invalidation-v2' of github.com:tgrabiec/scylla:
  row_cache: Make invalidate() preemptible
  row_cache: Switch _prev_snapshot_pos to be a ring_position_ext
  dht: Introduce ring_position_ext
  dht: ring_position_view: Take key by const pointer
  tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion
  tests: perf_row_cache_update: Report stalls around invalidation
2019-05-19 10:47:46 +03:00
Takuya ASADA
f625284113 dist/debian: apply product name variable on override_dh_auto_install
To make product name templatization work correctly, we cannot use
"debian/scylla-server" as the package contents directory path;
we need to use a template like "debian/{{product}}-server" instead.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190517121946.18248-1-syuu@scylladb.com>
2019-05-19 10:46:08 +03:00
Gleb Natapov
31bf4cfb5e cache_hitrate_calculator: make cache hitrate calculation preemptable
The calculation is done in a non-preemptable loop over all tables, so if
the number of tables is very large it may take a while, since we also build
a string for the gossiper state. Make the loop preemptable and also make
the string calculation more efficient by preallocating memory for it.
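
The general pattern, a long loop that periodically yields to the scheduler, can be sketched with asyncio as a stand-in for Seastar's preemption checks. The function name, the `table:rate` output format and the `yield_every` parameter are all assumptions; collecting parts and joining once is only a loose Python analogue of preallocating the string:

```python
import asyncio


async def build_hitrate_state(hitrates, yield_every=100):
    """Format 'table:rate' pairs, yielding to the event loop periodically."""
    parts = []
    for i, (table, rate) in enumerate(sorted(hitrates.items()), start=1):
        parts.append(f"{table}:{rate:.6f}")
        if i % yield_every == 0:
            # Preemption point: let other pending tasks run.
            await asyncio.sleep(0)
    # Joining once avoids repeated reallocation of a growing string.
    return ";".join(parts)


state = asyncio.run(build_hitrate_state({"ks.t1": 0.5, "ks.t2": 0.25}, yield_every=1))
print(state)
```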
Message-Id: <20190516132748.6469-3-gleb@scylladb.com>
2019-05-16 15:32:36 +02:00
Gleb Natapov
4517c56a57 cache_hitrate_calculator: do not copy stats map for each cpu
invoke_on_all() copies provided function for each shard it is executed
on, so by moving stats map into the capture we copy it for each shard
too. Avoid it by putting it into the top level object which is already
captured by reference.
Message-Id: <20190516132748.6469-2-gleb@scylladb.com>
2019-05-16 15:32:24 +02:00
Dejan Mircevski
8dcb35913a table: Avoid needless allocation of cell lockers
All `table` instances currently unconditionally allocate a cell locker
for counter cells, though not all need one.  Since the lockers occupy
quite a bit of memory (as reported in #4441), it's wasteful to
allocate them when unneeded.

Fixes #4441.

Tests: unit (dev, debug)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190515190910.87931-1-dejan@scylladb.com>
2019-05-16 11:10:38 +03:00
Avi Kivity
5b2c8847c7 Merge "Pre timestamp based data segregation cleanup" from Botond
"
This series contains loosely related generic cleanup patches that the
timestamp based data segregation series depends on. Most of the patches
have to do with making headers self-sustainable, that is compilable on
their own. This was needed to be able to ensure that the new headers
introduced or touched by that series are self-sustainable too.
This series also introduces `schema_fwd.hh` which contains a forward
declaration of `schema` and `schema_ptr` classes. No effort was made to
find and replace all existing ad-hoc schema forward declarations in the
source tree.
"

* 'pre-timestamp-based-data-segregation-cleanup/v1' of https://github.com/denesb/scylla:
  encoding_stats.hh: add missing include
  sstables/time_window_compaction_strategy.hh: make self-sufficient
  sstables/size_tiered_compaction_strategy.hh: make self-sufficient
  sstables/compaction_strategy_impl.hh: make header self-sufficient
  compaction_strategy.hh: use schema_fwd.hh
  db/extensions.hh: use schema_fwd.hh
  Add schema_fwd.hh
2019-05-15 17:37:06 +03:00
Asias He
51c4f8cc47 repair: Fix use after free in remove_repair_meta for repair_metas
We should capture repair_metas so that it will not be freed until the
parallel_for_each is finished.

Fixes: #4333
Tests: repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test
Message-Id: <237b20a359122a639330f9f78c67568410aef014.1557922403.git.asias@scylladb.com>
2019-05-15 17:22:51 +03:00
Calle Wilund
e7003f1051 sstable: Make all sstable components subject to file extensions
Makes opening all sstable components go through same file open
routine, optionally applying extensions to each (except TOC which
is special).

Also ensures we read Scylla metadata before other non-TOC
components, as we might need this for extensions (hint hint).

Message-Id: <20190513201821.14417-1-calle@scylladb.com>
2019-05-15 17:14:58 +03:00
Botond Dénes
a0010f52c5 scylla-gdb.py: scylla_fiber: add fallback mode
The current implementation of the `scylla fiber` command relies on the
`scylla ptr` command to provide metadata on pointers, more
specifically the boundaries of the region the object they point to
occupies. However, in debug mode, seastar is using the standard allocator
and thus the `scylla ptr` command doesn't work.
To work around this, provide a fallback mode for debug builds. This mode
assumes pointers point to the start of objects and scans a
configurable region of memory. While less exact than the variant relying
on `scylla ptr` it still works reasonably well.
The size of the to-be-scanned memory region can be set using the
`--scanned-region-size` command line argument. This defaults to 512.

Additionally, add a flag (`--force-fallback-mode`) to force using the
fallback mode. This is useful if `scylla ptr` is not working for any
reason.
2019-05-15 15:46:42 +03:00
Botond Dénes
c78d667153 scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used()
Determines whether the application is using the seastar allocator or
not. This is done by attempting to resolve the
`seastar::memory::cpu_mem` symbol. To avoid the expensive symbol lookup
the result is cached. This means that loading a new inferior will
possibly return the wrong value. The cache can be flushed by re-sourcing
the `scylla-gdb.py` script.
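
The caching trade-off described above can be sketched without a debugger by injecting the symbol lookup. Here `lookup_symbol` is a hypothetical stand-in for the real gdb call, and the explicit `flush()` method is an assumption (the commit flushes the cache by re-sourcing the script):

```python
class AllocatorProbe:
    """Caches the result of an expensive symbol lookup."""

    def __init__(self, lookup_symbol):
        # lookup_symbol stands in for a debugger call; it returns None when unresolved.
        self._lookup_symbol = lookup_symbol
        self._cached = None

    def is_seastar_allocator_used(self):
        if self._cached is None:
            # Only the first call pays for the lookup.
            self._cached = self._lookup_symbol("seastar::memory::cpu_mem") is not None
        return self._cached

    def flush(self):
        """Forget the cached answer, e.g. after loading a different inferior."""
        self._cached = None
```

Until `flush()` is called, the probe keeps returning the first answer even if the underlying lookup would now give a different one, which is the stale-value caveat the message mentions.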
2019-05-15 15:44:38 +03:00
Botond Dénes
c3a06da8fb scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers
2019-05-15 15:43:34 +03:00
Botond Dénes
4964671e83 scylla-gdb.py: scylla_fiber: fix misaligned text in docstring
2019-05-15 15:43:29 +03:00
Avi Kivity
8e19121e98 Merge "Implement simple selection alongside aggregation" from Dejan
"
Although CQL allows SELECT statements with both simple and aggregate
selectors, Scylla disallows them.  This patch removes that restriction
and ensures that mixed simple/aggregate selection works as specified
both with and without GROUP BY.

Tests: unit (dev)
"

* 'aggregate-and-simple-select-together' of https://github.com/dekimir/scylla:
  cql: Fix mixed selection with GROUP BY
  cql: Allow mixing of aggregate and simple selectors
2019-05-14 20:03:58 +03:00
Dejan Mircevski
f9b00a4318 cql: Fix mixed selection with GROUP BY
GROUP BY is currently supported by simple_selection, the class used
when all selectors are simple.  But when selectors are mixed, we use
selection_with_processing, which does not yet support GROUP BY.  This
patch fixes that.

It also adapts one testcase in filtering_test to the new behavior of
simple_selector.  The test currently expects the last value seen, but
simple_selector now outputs the first value seen.

(More details: the WHERE clause implicitly selects the columns it
references, and unit tests are forced to provide expected values for
these columns.  The user-visible result is unchanged in the test;
users never see the WHERE column values due to filtering in
cql::transport, outside unit tests.)
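
To make the "first value seen" semantics concrete, here is a small model of a grouped query that mixes a simple selector with count(*). It is a sketch of the described behavior, not of the selection classes themselves; the function name and the row layout are assumptions:

```python
def select_first_and_count(rows, group_key, simple_column):
    """Per group: first value seen for simple_column, plus a row count."""
    groups = {}
    for row in rows:
        key = row[group_key]
        if key not in groups:
            # Simple selector: remember the first value seen in this group.
            groups[key] = [row[simple_column], 0]
        groups[key][1] += 1  # Aggregate selector: count(*)
    return [(key, first, count) for key, (first, count) in groups.items()]


rows = [
    {"p": 1, "v": 10},
    {"p": 1, "v": 11},
    {"p": 2, "v": 20},
]
print(select_first_and_count(rows, "p", "v"))
```

For the group `p = 1` the simple column reports 10 (the first row), not 11 (the last row), alongside a count of 2.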

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-14 12:50:39 -04:00
Dejan Mircevski
06e3b36164 cql: Allow mixing of aggregate and simple selectors
Scylla currently rejects SELECT statements with both simple and
aggregate selectors, but Cassandra allows them.  This patch brings
parity to Scylla.

Fixes #4447.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-14 10:34:02 -04:00
Botond Dénes
fe3b798b51 scylla-gdb.py: scylla fiber: add seastar::smp_message_queue::async_work_item to the whitelist
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4c49fcf5391e027eae68707c9e6ab2f9188c2ea4.1557838171.git.bdenes@scylladb.com>
2019-05-14 17:09:32 +03:00
Avi Kivity
82b91c1511 Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz
"
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.
Refs #4485.
"

* tag 'fix-gc_clock-digest-v2.1' of github.com:tgrabiec/scylla:
  tests: Add test which verifies that schema digest stays the same
  tests: Add sstables for the schema digest test
  schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition
  db/schema_tables: Move feed_hash_for_schema_digest() to .cc file
  hashing: Introduce type-erased interface for the hasher
  hashing: Introduce C++ concept for the hasher
  hashers: Rename hasher to cryptopp_hasher
  gc_clock: Fix hashing to be backwards-compatible
2019-05-14 16:59:50 +03:00
Tomasz Grabiec
285ada5035 Merge "config: remove _make_config_values macro" from Avi
The _make_config_values macro reduces duplication (both the item name
and the types need to be available as C++ identifiers and as runtime
strings), but is hard to work with. The macro is huge and editors
don't handle it well, errors aren't identified at the correct
location, and since the macro doesn't have types, it's hard to
refactor.

This series replaces the macro with ordinary C++ code. Some repetition is
introduced, but IMO the result is easier to maintain than the macro. As a
bonus the bulk of the code is moved away from the header file.

Tests: unit (dev), manual testing of the config REST API

* https://github.com/avikivity/scylla config-no-macro/v2
  config: make the named_value type name available without requiring
    _make_config_values
  config: remove value_status from named_value template parameter list
  config: add named_value::value_as_json()
  api: config: stop using _make_config_values
  config: auto-add named_values into config_file
  config: add allowed_values parameter to named_value constructor
  config: convert _make_config_values to individual named_value member
    declarations and initializers
2019-05-14 16:00:23 +03:00
Avi Kivity
987739898f docs: document SSTable Scylla.db component
Document the format and meaning of the various bits of the Scylla.db component.
Message-Id: <20190513081605.7394-1-avi@scylladb.com>
2019-05-14 16:00:23 +03:00
Avi Kivity
786ce70dfc doc: mention the Slack workspace as a place to get help
Message-Id: <20190514090420.5598-1-avi@scylladb.com>
2019-05-14 16:00:23 +03:00
Botond Dénes
c2ec78358b encoding_stats.hh: add missing include 2019-05-14 13:27:30 +03:00
Botond Dénes
eeacf45b4a sstables/time_window_compaction_strategy.hh: make self-sufficient 2019-05-14 13:27:30 +03:00
Botond Dénes
9953cecc83 sstables/size_tiered_compaction_strategy.hh: make self-sufficient 2019-05-14 13:27:30 +03:00
Botond Dénes
d02c2253a5 sstables/compaction_strategy_impl.hh: make header self-sufficient
Add missing includes and forward declarations. De-inline some methods.
2019-05-14 13:27:30 +03:00
Botond Dénes
20d9d18ab3 compaction_strategy.hh: use schema_fwd.hh 2019-05-14 13:27:30 +03:00
Botond Dénes
690ef09b8f db/extensions.hh: use schema_fwd.hh 2019-05-14 13:27:30 +03:00
Botond Dénes
48bf1d5629 Add schema_fwd.hh 2019-05-14 13:27:30 +03:00
Tomasz Grabiec
6159d5522d tests: Add test which verifies that schema digest stays the same
(cherry picked from commit 8019634dba)
2019-05-14 10:43:06 +02:00
Tomasz Grabiec
815295547d tests: Add sstables for the schema digest test
Generated by running test_schema_digest_does_not_change with
regenerate set to true.

(cherry picked from commit 1f2995c8c5)
2019-05-14 10:43:06 +02:00
Tomasz Grabiec
9de071d214 schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

That's not a problem during normal cluster operation because the
tombstones will expire at all nodes at the same time, and schema
digest, although it changes, will change to the same value on all nodes
at about the same time.

This fix changes digest calculation to not feed any digest for
partitions which are empty after compaction.
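
The effect can be sketched in Python (the patch itself is C++). The partition model, the `compact()` helper, and the use of MD5 are assumptions made only for illustration; they are not taken from the Scylla sources:

```python
import hashlib

def compact(rows):
    # Hypothetical model: a row is a string, and None marks a tombstone.
    # Digest calculation drops all tombstones.
    return [r for r in rows if r is not None]

def schema_digest(partitions, skip_empty):
    """partitions: dict mapping partition key -> list of rows."""
    h = hashlib.md5()
    for key in sorted(partitions):
        rows = compact(partitions[key])
        if skip_empty and not rows:
            continue  # new behavior: feed nothing for an empty partition
        h.update(key.encode())  # old behavior fed the key even when empty
        for r in rows:
            h.update(r.encode())
    return h.hexdigest()

# The same schema before and after an all-tombstone partition is compacted away.
before_gc = {"ks1": ["t1"], "ks2": [None]}
after_gc = {"ks1": ["t1"]}
```

With `skip_empty=True` both inputs give the same digest; with `skip_empty=False` they differ, which is the mismatch described above.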

The digest returned by schema_mutations::digest() is left unchanged by
this patch. It affects the table schema version calculation. It's not
changed because the version is calculated on boot, where we don't yet
know all the cluster features. It's possible to fix this but it's more
complicated, so this patch defers that.

Refs #4485.

2019-05-14 10:43:06 +02:00
Tomasz Grabiec
3a4a903674 db/schema_tables: Move feed_hash_for_schema_digest() to .cc file 2019-05-14 10:43:06 +02:00
Tomasz Grabiec
b0eecdcb8f hashing: Introduce type-erased interface for the hasher
The motivation is to allow hiding the definition of functions
accepting a hasher. For one, this reduces (re)compilation times,
because we can put the definition in .cc
2019-05-14 10:43:06 +02:00
Avi Kivity
1cf72b39a5 Merge "Unbreak the Unbreakable Linux" from Glauber
"
scylla_setup is currently broken for OEL. This happens because the
OS detection code checks for RHEL and Fedora. CentOS returns itself
as RHEL, but OEL does not.
"

* 'unbreakable' of github.com:glommer/scylla:
  scylla_setup: be nicer about unrecognized OS
  scylla_util: recognize OEL as part of the RHEL family
2019-05-13 21:38:21 +03:00
Glauber Costa
3b64727244 scylla_setup: be nicer about unrecognized OS
Right now if the user tries to execute this in an unrecognized OS, the
following will be thrown:

  Traceback (most recent call last):
   File "/usr/lib/scylla/libexec/scylla_setup", line 214, in <module>
     do_verify_package('scylla-enterprise-jmx')
   File "/usr/lib/scylla/libexec/scylla_setup", line 73, in do_verify_package
     if res != 0:
  UnboundLocalError: local variable 'res' referenced before assignment

It would be a lot nicer to exit gracefully and print a message saying what
is going on. This was caught when running on OEL, which the previous patch
fixed. Still, there are other unknown OS out there the users may try to run
on.
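
A minimal Python sketch of the failure mode and of a graceful exit. The function names, the family strings, and the `check_installed()` stand-in are hypothetical and do not reproduce the real scylla_setup code:

```python
import sys

def check_installed(pkg):
    # Stand-in for a real dpkg/rpm query; always reports success here.
    return 0

def verify_package_buggy(pkg, os_family):
    if os_family == "debian":
        res = check_installed(pkg)
    elif os_family == "redhat":
        res = check_installed(pkg)
    # Any other family falls through with `res` never assigned,
    # so the next line raises UnboundLocalError.
    return res == 0

def verify_package(pkg, os_family):
    if os_family not in ("debian", "redhat"):
        # Exit with a readable message instead of a traceback.
        sys.exit(f"Unsupported OS family {os_family!r}; cannot verify {pkg}")
    return check_installed(pkg) == 0
```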

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-05-13 14:31:49 -04:00
Glauber Costa
6c15ae5b36 scylla_util: recognize OEL as part of the RHEL family
Oracle Linux is a RHEL-like distribution and we support it just fine, but our
new incarnation of scylla_setup is failing to recognize it.

os-release for OEL is a bit different. It doesn't have an ID_LIKE string, and
only shows an ID string, which is set to 'ol'. So let's recognize this.
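
One way such a check could look, as a sketch only. The function name and the exact set of accepted IDs are assumptions, not the contents of scylla_util:

```python
def is_redhat_family(os_release):
    """os_release: dict of key/value pairs parsed from /etc/os-release."""
    # ID_LIKE is a space-separated list and may be missing entirely,
    # as it is on OEL, which only reports ID=ol.
    ids = [os_release.get("ID", "")] + os_release.get("ID_LIKE", "").split()
    return any(i in ("rhel", "fedora", "ol") for i in ids)
```

With this shape, a CentOS-style file (`ID=centos`, `ID_LIKE="rhel fedora"`) is matched through `ID_LIKE`, while OEL is matched through `ID` alone.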

Fixes: #4493
Branches: 3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-05-13 14:31:38 -04:00
Tomasz Grabiec
77fb34821b row_cache: Make invalidate() preemptible
This change inserts preemption points between removal of partitions.

The main complication is in maintaining consistency in the face of
concurrent population or eviction. We use the same mechanism which is
used by memtable updates. _prev_snapshot_pos is the ring position
which partitions the ring into the part which is already updated in
cache and the one which is yet to be updated. That position should be
set accordingly on preemption.

In case of invalidation, updating means removing all entries in the
range and marking the range as discontinuous.  When resuming
invalidation of a range we continue from _prev_snapshot_pos as the
lower bound.
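
The resumption scheme can be modelled roughly as follows. This is a toy Python sketch with integer keys and a generator standing in for preemption; the class and its fields are illustrative assumptions, not the row_cache implementation:

```python
import bisect

class RowCacheModel:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.prev_snapshot_pos = None  # None: no invalidation in progress

    def invalidate(self, lo, hi, batch=2):
        """Remove keys in [lo, hi), yielding (a preemption point) per batch."""
        self.prev_snapshot_pos = lo
        while True:
            # Resume from the recorded position as the lower bound.
            start = bisect.bisect_left(self.keys, self.prev_snapshot_pos)
            end = min(start + batch, bisect.bisect_left(self.keys, hi))
            if start >= end:
                break
            removed = self.keys[start:end]
            del self.keys[start:end]
            # Everything below this position is already updated.
            self.prev_snapshot_pos = removed[-1] + 1  # integer keys assumed
            yield removed
        self.prev_snapshot_pos = None
```

Because the lower bound is recomputed from `prev_snapshot_pos` after every yield, keys added or evicted by other work between batches do not invalidate the loop's position.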

This affects high-level operations like nodetool refresh, table
truncation, repair and streaming.

Fixes #2683

The improvement on stalls was measured using tests/perf_row_cache_update:

Before

Small partitions, no overwrites:
invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]}
Small partition with a few rows:
invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]}
Large partition, lots of small rows:
invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]}

After:

Small partitions, no overwrites:
invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]}
Small partition with a few rows:
invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]}
Large partition, lots of small rows:
invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]}

The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota).
2019-05-13 19:32:00 +02:00
Tomasz Grabiec
595e1a540e row_cache: Switch _prev_snapshot_pos to be a ring_position_ext
dht::ring_position cannot represent all ring_position_view instances,
in particular those obtained from
dht::ring_position_view::for_range_start(). To allow using the latter,
switch to views.
2019-05-13 19:30:50 +02:00
Tomasz Grabiec
1530224377 dht: Introduce ring_position_ext
It's an owning version of ring_position_view.

Note that ring_position has a narrower domain than the
ring_position_view for historical reasons, so we cannot use that.
2019-05-13 19:30:50 +02:00
Tomasz Grabiec
b08180c7fa dht: ring_position_view: Take key by const pointer 2019-05-13 19:30:39 +02:00
Tomasz Grabiec
ed697306be tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion 2019-05-13 19:18:20 +02:00
Tomasz Grabiec
b516e5fdbf tests: perf_row_cache_update: Report stalls around invalidation 2019-05-13 10:47:03 +02:00
Avi Kivity
a8b3cb8a28 Update seastar submodule
* seastar f73690e...3f7a5e1 (7):
  > Revert "Make sure all allocations/deallocations are properly byte aligned"
  > http: fix request content for POST requests
  > doc: discourage generic lambdas and unconstrained templates
  > smp: add smp_service_group for smp::submit_to() resource control
  > Revert "smp: add smp_service_group for smp::submit_to() resource control"
  > smp: add smp_service_group for smp::submit_to() resource control
  > Make sure all allocations/deallocations are properly byte aligned
2019-05-12 13:32:41 +03:00
Tomasz Grabiec
fd349a3c65 hashing: Introduce C++ concept for the hasher 2019-05-10 12:54:30 +02:00
Tomasz Grabiec
5c2f5b522d hashers: Rename hasher to cryptopp_hasher
So that we can introduce a truly generic interface named "hasher".
2019-05-10 12:54:08 +02:00
Tomasz Grabiec
b7ece4b884 gc_clock: Fix hashing to be backwards-compatible
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.
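
Why the width of the fed value matters can be shown with a small Python sketch. MD5 and little-endian packing are assumptions chosen for illustration; they are not claimed to match what Scylla's hasher does:

```python
import hashlib
import struct

def digest_i32(count):
    # Feed the tick count as a 4-byte signed integer (old representation).
    return hashlib.md5(struct.pack("<i", count)).hexdigest()

def digest_i64(count):
    # Feed the same count as an 8-byte signed integer (new representation).
    return hashlib.md5(struct.pack("<q", count)).hexdigest()

count = 1557000000  # an arbitrary value that fits in 32 bits
same_value_different_digest = digest_i32(count) != digest_i64(count)
```

The numeric value is identical in both cases; only the number of bytes fed to the hasher differs, and that is enough to change the digest.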

Fixes #4460.

(cherry picked from commit 549d0eb2f3)
2019-05-10 12:48:46 +02:00
Avi Kivity
fdace36fa5 Merge "Fixes for GCC9 build" from Paweł
"
This series contains fixes for GCC9 build, mostly corrections needed
after changes in libstdc++. With this series and a workaround for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415 (not included)
Scylla builds and passes unit tests with GCC9 (tested on Fedora 30,
development mode only).

Tests: unit(dev with gcc8 and gcc9).
"

* tag 'gcc9-fixes/v1' of https://github.com/pdziepak/scylla:
  tests/imr: add missing noexcept
  counters: bytes_view::pointer is not const pointer
  imr/fundamental: use bytes_view::const_pointer for const pointer
2019-05-09 21:51:24 +03:00
Paweł Dziepak
96eec203bd tests/imr: add missing noexcept
The concepts require that serialisers passed to the IMR are noexcept.
GCC9 started verifying that.
2019-05-09 17:38:24 +01:00
Paweł Dziepak
ae9e083b02 counters: bytes_view::pointer is not const pointer
In libstdc++ for gcc9 std::basic_string_view::pointer isn't const any
more. As a result the compiler is complaining about reinterpret_cast
casting away const. The solution is to use std::conditional<> to choose
between const pointer for counter view and non-const pointer for mutable
counter view.
2019-05-09 17:31:35 +01:00
Paweł Dziepak
c19576319f imr/fundamental: use bytes_view::const_pointer for const pointer
In libstdc++ shipped with gcc9 std::basic_string_view::pointer is no
longer constant, which is causing the compiler to complain about
dropping const in reinterpret_cast. The solution is to use
std::basic_string_view::const_pointer.
2019-05-09 17:30:15 +01:00
Paweł Dziepak
49b4aeca4d Merge "hinted handoff: prevent sending attempts" from Vlad
"
Fix the broken logic that is meant to prevent sending hints when node is
in a DOWN NORMAL state.
"

* 'hinted_handoff_stop_sending_to_down_node-v2' of https://github.com/vladzcloudius/scylla:
  hints_manager: rename the state::ep_state_is_not_normal enum value
  hinted handoff: fix the logic that detects that the destination node is in DN state
  hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check
  hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper()
  types.cc: fix the compilation with fmt v5.3.0
2019-05-09 15:18:57 +01:00
Avi Kivity
db536776d9 tools: toolchain: fix dbuild in interactive mode regression
Before ede1d248af, running "tools/toolchain/dbuild -it -- bash" was
a nice way to play in the toolchain environment, for example to start
a debugger. But that commit caused containers to run in detached mode,
which is incompatible with interactive mode.

To restore the old behavior, detect that the user wants interactive mode,
and run the container in non-detached mode instead. Add the --rm flag
so the container is removed after execution (as it was before ede1d248af).
Message-Id: <20190506175942.27361-1-avi@scylladb.com>
2019-05-09 15:01:21 +02:00
Dejan Mircevski
d5f587b83d Narrow down build dependences of duration_test
In 0ea6df, duration_test was made to link against all tests/*.o files.
This isn't necessary, as it only needs tests/exception_utils.o.  This
patch narrows down duration_test's dependences to only
exception_utils.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190508211630.108228-1-dejan@scylladb.com>
2019-05-09 15:01:21 +02:00
Dejan Mircevski
e4ec89473e tests: Cover indexing errors in frozen collections
Add new test cases:
- disallow creating a non-FULL index on frozen collections
- disallow repeated creation of a FULL index on frozen collections
- disallow FULL indexes on non-frozen collections
- disallow referencing frozen-map entries in the WHERE clause

Also add error-message expectations to existing test cases.

Fixes #3654.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190509025806.124499-1-dejan@scylladb.com>
2019-05-09 15:25:11 +03:00
Dejan Mircevski
4eeec4a452 tests: drop util.hh
The file tests/util.hh was somehow committed despite `git mv`ing it to
tests/exception_utils.hh.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190508210203.106295-1-dejan@scylladb.com>
2019-05-09 14:45:33 +03:00
Takuya ASADA
19a973cd05 dist/ami: fix wrong path of SCYLLA-PRODUCT-FILE
The other build_*.sh scripts run inside an extracted relocatable
package, so they find SCYLLA-PRODUCT-FILE at the top of the directory.
build_ami.sh does not run under that condition, so we need to run
SCYLLA-VERSION-GEN first, then refer to build/SCYLLA-PRODUCT-FILE.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190509110621.27468-1-syuu@scylladb.com>
2019-05-09 14:45:31 +03:00
Vlad Zolotarov
f07c341efc hints_manager: rename the state::ep_state_is_not_normal enum value
Rename this state value to better reflect the reality:
state::ep_state_is_not_normal -> state::ep_state_left_the_ring

The manager gets to this state when the destination Node has left the ring.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-05-08 15:46:47 -04:00
Vlad Zolotarov
93ba700458 hinted handoff: fix the logic that detects that the destination node is in DN state
When a node is in the DN state, its gossiper state may be NORMAL, SHUTDOWN
or "" depending on the use case.

In addition to that if node has been removed from the ring its state is
also going to be removed from the gossiper_state map.

Let's consider the above when deciding if node is in the DN state.

Fixes #4461

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-05-08 14:53:01 -04:00
Glauber Costa
a23531ebd5 Support AWS i3en instances
AWS just released their new instances, the i3en instances.  The instance
is already verified to work well with Scylla; the only adjustments that
we need are to advertise that we support it, and pre-fill the disk
information according to the performance numbers obtained by running the
instance.

Fixes #4486
Branches: 3.1

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190508170831.6003-1-glauber@scylladb.com>
2019-05-08 20:09:44 +03:00
Avi Kivity
a86fdeb02b Merge "Implement GROUP BY" from Dejan
"
Cassandra has supported GROUP BY in SELECT statements since 2016
(v3.10), while ScyllaDB currently treats it as a syntax error.  To
achieve parity with Cassandra in this important bit of functionality,
this patch adds full support for GROUP BY, from parsing to validation
to implementation to testing.
"

* 'groupby-implPP' of https://github.com/dekimir/scylla:
  Implement grouping in selection processing
  Propagate GROUP BY indices to result_set_builder
  Process GROUP BY columns into select_statement
  Parse GROUP BY clause, store column identifiers
2019-05-08 18:35:12 +03:00
Dejan Mircevski
d51e4a589d Implement grouping in selection processing
Make result_set_builder obey its _group_by_cell_indices by recognizing
group boundaries and resetting the selectors.

Also make simple_selectors work correctly when grouping.
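
The boundary detection can be sketched in Python as follows. The function is a simplified stand-in that computes only a per-group count; its name and row format are hypothetical and unrelated to the actual result_set_builder code:

```python
def group_and_count(rows, group_by_indices):
    """Count rows per group; rows must arrive ordered by the grouping columns."""
    results = []
    current_key = None
    count = 0
    for row in rows:
        key = tuple(row[i] for i in group_by_indices)
        if count and key != current_key:
            # Group boundary: emit the finished group, then reset the aggregate.
            results.append(current_key + (count,))
            count = 0
        current_key = key
        count += 1
    if count:
        results.append(current_key + (count,))  # flush the last group
    return results
```

For example, `group_and_count([("a", 1), ("a", 2), ("b", 3)], [0])` yields one output row per distinct value of the first column.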

Fixes #2206.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 11:05:36 -04:00
Dejan Mircevski
c3929aee3a Propagate GROUP BY indices to result_set_builder
Ensure that the indices recorded in select_statement are passed to
result_set_builder when one is created for processing the cell values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:10:10 -04:00
Dejan Mircevski
274a77f45e Process GROUP BY columns into select_statement
Validate raw GROUP BY identifiers and translate them into
a select_statement member.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:10:10 -04:00
Dejan Mircevski
e1fb414805 Parse GROUP BY clause, store column identifiers
Extend the grammar file with GROUP BY, collect the column identifiers,
and store them in raw::select_statement.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:09:22 -04:00
Avi Kivity
ab3f044daa Revert "Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz"
This reverts commit dcb263b36b, reversing
changes made to a6759dc6aa. schema_change_test
fails consistently on master with it.
2019-05-08 16:19:38 +03:00
JP-Reddy
56420dc650 scylla_io_setup: TypeError in iotune_args array from scylla_io_setup script
Whenever the iotune_args array uses "--smp", it needs cpudata.smp()
which returns an integer instead of a string. So when iotune_args is
passed to subprocess.check_call(), it actually throws "TypeError:
expected str, bytes or os.PathLike object, not int" but
"%s did not pass validation tests, it may not be on XFS..." is shown as
the exception.

Even though the user inputs correct arguments, it might still throw an
error and confuse the user that he/she has not passed the right
arguments.

One simple fix is to use str(cpudata.smp()) instead of cpudata.smp().
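
As a sketch, every element of an argument list handed to `subprocess` should already be a string. The helper below is hypothetical, and "iotune" stands in for the real binary path:

```python
def build_iotune_args(smp_count):
    # smp_count plays the role of cpudata.smp(), which returns an int.
    # Converting it here keeps the whole argument list made of strings,
    # which is what subprocess.check_call() expects.
    return ["iotune", "--smp", str(smp_count)]
```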

Signed-off-by: JP-Reddy <guthijp.reddy@gmail.com>
Message-Id: <20190406070118.48477-1-guthijp.reddy@gmail.com>
2019-05-07 20:13:54 +03:00
Paweł Dziepak
8a16cbc50d Merge "treewide: adjust for gcc 9" from Avi
"
gcc 9 complains a lot about pessimizing moves, narrowing conversions, and
has tighter deduction rules, plus other nice warnings. Fix problems found
by it, and make some non-problems compile without warnings.
"

* tag 'gcc9/v1' of https://github.com/avikivity/scylla:
  types: fix pessimizing moves
  thrift: fix pessimizing moves
  tests: fix pessimizing moves
  tests: cql_query_test: silence narrowing conversion warning
  test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T>
  table: fix potentially wrong schema when reading from zero sstables
  storage_proxy: fix pessimizing moves
  memtable: fix pessimizing moves
  IDL: silence narrowing conversion in bool serializer
  compaction: fix pessimizing moves
  cache: fix pessimizing moves
  locator: fix pessimizing moves
  database: fix pessimizing moves
  cql: fix pessimizing moves
  cql parser: fix conversion from uninitialized<T> to optional<T> with gcc 9
2019-05-07 12:19:29 +01:00
Avi Kivity
43867fe618 types: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:01:36 +03:00
Avi Kivity
1b760297f5 thrift: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:01:15 +03:00
Avi Kivity
0ff6e48e77 tests: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:00:58 +03:00
Avi Kivity
b60d58d6bd tests: cql_query_test: silence narrowing conversion warning
Make it explicit to gcc 9 that the conversion to bool is intended.
2019-05-07 09:59:44 +03:00
Avi Kivity
5636b621a7 test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T>
gcc 9 is unable to decide whether to call role_name's copy or move
constructor. Help it by casting.
2019-05-07 09:58:21 +03:00
Avi Kivity
add20eb9a6 table: fix potentially wrong schema when reading from zero sstables
We use the schema during creation of the mutation_source rather than
during the query itself. Likely they're the same, and since no rows
are returned from a zero-sstable query, harmless. But gcc 9 complains.

Fix by using the query's schema.
2019-05-07 09:56:30 +03:00
Avi Kivity
985a30a01c storage_proxy: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:56:09 +03:00
Avi Kivity
fd3c493961 memtable: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:55:53 +03:00
Avi Kivity
17c268cd55 IDL: silence narrowing conversion in bool serializer
bool serializers are now aliases to int8_t serializers, but gcc 9
complains about narrowing conversions, due to the path int8_t -> int -> bool.

A bad narrowing conversion here cannot happen in practice, but massage
the code a little to silence it.
2019-05-07 09:28:24 +03:00
Avi Kivity
d7cbd3dc61 compaction: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:28:12 +03:00
Avi Kivity
9c7eb95f78 cache: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:27:50 +03:00
Avi Kivity
c42d59d805 locator: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:27:27 +03:00
Avi Kivity
96a0073929 database: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:26:58 +03:00
Avi Kivity
03e9cdbfb0 cql: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:26:20 +03:00
Avi Kivity
c26ec176dd cql parser: fix conversion from uninitialized<T> to optional<T> with gcc 9
We use uninitialized<T> (wrapping an optional<T>) to adjust to the
parser's way of laying out the code, but this fails with gcc 9
(presumably for the correct reasons) when converting from
uninitialized<T> back to optional<T>. Add a conversion operator
to make it build.
2019-05-07 09:21:22 +03:00
Dejan Mircevski
0ea6df2cd1 tests: Add predicates for checking exception messages
Many tests verify exception messages.  Currently, they do so via
verbose lambdas or inner functions that hide test-failure locations.
This patch adds utilities for quick creation of message-checking tests
and replaces existing ad-hoc methods with these new utilities.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190506210006.124645-1-dejan@scylladb.com>
2019-05-07 07:11:07 +03:00
Avi Kivity
dcb263b36b Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz
"
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.

Branches: 3.1
"

* tag 'fix-gc_clock-digest-v1' of github.com:tgrabiec/scylla:
  tests: Add test which verifies that schema digest stays the same
  tests: Add sstables for the schema digest test
  gc_clock: Fix hashing to be backwards-compatible
2019-05-07 07:04:40 +03:00
Tomasz Grabiec
8019634dba tests: Add test which verifies that schema digest stays the same 2019-05-06 18:43:43 +02:00
Tomasz Grabiec
1f2995c8c5 tests: Add sstables for the schema digest test
Generated by running test_schema_digest_does_not_change with
regenerate set to true.
2019-05-06 18:43:43 +02:00
Tomasz Grabiec
549d0eb2f3 gc_clock: Fix hashing to be backwards-compatible
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.
2019-05-06 18:43:43 +02:00
Avi Kivity
a6759dc6aa Update seastar submodule
* seastar 4cdccae...f73690e (16):
  > sstring: silence technically correct but unhelpful warning in sstring move ctor
  > cmake: add a seastar_supports_flag function
  > future: Fix build with libc++'s non-trivially-constructible  std::tuple<>
  > Revert "Make sure all allocations are properly bytes aligned"
  > Merge "future: simplify future_state management" from Rafael
  > Make sure all allocations are properly bytes aligned
  > util/log: use correct clock type
  > core/reactor: don't assume system_clock::duration is in nanoseconds
  > Merge "Optimize the future_state move constructor" from Rafael
  > rpc: don't use boost/variant.hpp directly
  > core/memory: Omit [[gnu::leaf]] attribute on clang
  > Fix build with std::filesystem
  > Merge "Fix clang build and tests" from Rafael
  > cmake: Move ) out of quotes
  > Merge "Fix some bugs found by (or perhaps in) gcc 9" by Avi
  > Deduplicate Seastar dependencies management in CMake scripts
2019-05-06 19:17:37 +03:00
Gleb Natapov
1d851a3892 messaging: catch an error that sending of CLIENT_ID may return
Avoid a warning about unhandled exception.

Message-Id: <20190506122718.GL21208@scylladb.com>
2019-05-06 18:13:51 +03:00
Glauber Costa
79a5351651 scylla-housekeeping: timeout eventually
scylla-housekeeping always wants to run in the installation to check if
we are running the latest version. This happens regardless of whether or
not we said yes or no to the housekeeping scylla_setup question - as
that question only deals with whether or not we want to do this through
a timer.

It is fine to try to run scylla-housekeeping, as long as we time it out.
The current code doesn't.

The naive solution is to add a timeout parameter to urllib.request.open.
However, that timeout is not respected and in my tests I saw real
timeouts up to four times higher than the timeout we set. For a reasonable 5s
timeout, this means a 20s real timeout which can lead to a very bad user
experience. This seems to be a known problem with this module according
to a quick Google search.

This patch then takes a slightly more complex solution and uses
multiprocess to enforce a well-defined user-visible timeout.
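
A minimal sketch of the general technique, assuming Python's standard `multiprocessing` module and a POSIX host where the "fork" start method is available. The helper name is hypothetical and this is not the scylla-housekeeping code:

```python
import multiprocessing
import time

def run_with_timeout(func, args, timeout_s):
    """Run func(*args) in a child process; return True if it finished in time."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX host
    proc = ctx.Process(target=func, args=args)
    proc.start()
    proc.join(timeout_s)  # wait at most timeout_s seconds
    if proc.is_alive():
        # The deadline passed: stop the child so the caller is not held up.
        proc.terminate()
        proc.join()
        return False
    return True
```

The deadline is enforced by the parent process, so it holds regardless of how the network call inside the child handles its own timeout.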

Fixes #3980

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190506122335.5707-1-glauber@scylladb.com>
2019-05-06 17:37:59 +03:00
Gleb Natapov
b8188e1e2f storage_proxy: avoid copying of a topology and endpoint array in batchlog code
batchlog makes copies of the topology and endpoint array in the batchlog endpoint
choosing code. There is a remark that at least the endpoint copy is
deliberate because Cassandra code has it. We do not have to follow. Our
endpoint calculation code is atomic, so we can use a reference.

Message-Id: <20190506115815.GK21208@scylladb.com>
2019-05-06 17:36:50 +03:00
Raphael S. Carvalho
ef5681486f compaction: do not unconditionally delete a new sstable in interrupted compaction
After incremental compaction, new sstables may have already replaced old
sstables at any point, meaning that a new sstable may be in use by the table and
an old sstable may already be deleted while the compaction itself is UNFINISHED.
Therefore, we should *NEVER* delete a new sstable unconditionally for an
interrupted compaction, or data loss could happen.
To fix it, we'll only delete new sstables that didn't replace anything
in the table, meaning they are unused.
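
The rule can be sketched as a simple filter. The function and its arguments are hypothetical names for illustration, not the compaction code:

```python
def sstables_safe_to_delete(new_sstables, sstables_in_table):
    """Return only the new sstables that the table is not using.

    new_sstables: sstables written by the interrupted compaction.
    sstables_in_table: set of sstables currently referenced by the table.
    """
    return [s for s in new_sstables if s not in sstables_in_table]
```

A new sstable that already replaced its inputs appears in `sstables_in_table` and is therefore kept.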

Found the problem while auditing the code.

Fixes #4479.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190506134723.16639-1-raphaelsc@scylladb.com>
2019-05-06 16:55:36 +03:00
Avi Kivity
1c65ba6e66 Use correct scylla_tables schema for removing version column
Mutations carry their schema, so use that instead of bringing in a global schema,
which may change as features are added.
Message-Id: <20190505132542.6472-1-avi@scylladb.com>
2019-05-06 13:51:08 +02:00
Paweł Dziepak
51e98e0e11 tests/perf_fast_forward: report average number of aio operations
perf_fast_forward is used to detect performance regressions. The two
main metrics used for this are fragments per second and the number of
IO operations. The former is a median of several runs, but the
latter is just the actual number of asynchronous IO operations performed
in the run that happened to be picked as a median frag/s-wise. There's
not always a direct correlation between frag/s and aio, and the latter can
vary, which makes it hard to compare.

In order to make this easier a new metric was introduced: "average aio"
which reports the average number of asynchronous IO operations performed
in a run. This should produce much more stable results and therefore
make the comparison more meaningful.
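
The difference between the two aio figures can be illustrated with a short Python sketch. The data layout and the function are assumptions for illustration (an odd number of runs is assumed), not perf_fast_forward code:

```python
import statistics

def summarize(runs):
    """runs: list of (frags_per_sec, aio_count) tuples, one per run."""
    # The run with the median frag/s (odd number of runs assumed).
    median_run = sorted(runs)[len(runs) // 2]
    return {
        "frag/s": median_run[0],
        # aio of whichever run happened to be the median frag/s-wise
        "aio": median_run[1],
        # aio averaged over all runs, which is less sensitive to that choice
        "average aio": statistics.mean(r[1] for r in runs),
    }
```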
Message-Id: <20190430134401.19238-1-pdziepak@scylladb.com>
2019-05-06 11:47:31 +02:00
Piotr Sarna
cf8d2a5141 Revert "view: cache is_index for view pointer"
This reverts commit dbe8491655.
Caching the value was not done in a correct manner, which resulted
in longevity tests failures.

Fixes #4478

Branches: 3.1

Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>
2019-05-06 11:45:46 +03:00
Benny Halevy
d9136f96f3 commitlog: descriptor: skip leading path from filename
std::regex_match of the leading path may run out of stack
with long paths in debug build.

Use rfind instead to look up the last '/' in the pathname
and skip it if found.
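
The same idea in Python, as a sketch. The helper name and the sample file name are hypothetical; the patch itself is C++:

```python
def strip_leading_path(pathname):
    # str.rfind returns -1 when '/' is absent, so the slice then starts at 0
    # and the name is returned unchanged. No regex, so no deep recursion.
    return pathname[pathname.rfind("/") + 1:]
```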

Fixes #4464

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>
2019-05-05 17:51:56 +03:00
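The rfind-based approach described in d9136f96f3 can be sketched roughly as below. This is an illustration only: the helper name skip_leading_path is hypothetical and is not taken from the commitlog code.

```cpp
#include <string_view>

// Hypothetical helper (not the actual commitlog code): return the part of
// `pathname` that follows the last '/'. If there is no '/', the input is
// already a bare filename and is returned unchanged.
inline std::string_view skip_leading_path(std::string_view pathname) {
    const auto pos = pathname.rfind('/');
    if (pos == std::string_view::npos) {
        return pathname;
    }
    return pathname.substr(pos + 1);
}
```

Unlike a regular-expression match over the whole path, rfind is a simple backward scan, so its stack usage does not grow with the length of the path.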
Benny Halevy
3a2fa82d6e time_window_backlog_tracker: fix use after free
Fixes #4465

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190430094209.13958-1-bhalevy@scylladb.com>
2019-05-05 12:47:51 +03:00
Glauber Costa
47d04e49e8 scylla_setup: respect user's decision not to call housekeeping
The setup script asks the user whether or not housekeeping should
be called, and the first time the script is executed this decision
is respected.

However if the script is invoked again, that decision is not respected.

This is because the check has the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
 }
 if (version_check) { do_version_check() } else { dont_do_it() }

When it should have the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
    if (version_check) { do_version_check() } else { dont_do_it() }
 }

(Thanks python)

This is problematic in systems that are not connected to the internet, since
housekeeping will fail to run and crash the setup script.

Fixes #4462

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502034211.18435-1-glauber@scylladb.com>
2019-05-02 18:46:41 +03:00
Glauber Costa
99c00547ad make scylla_util OS detection robust against empty lines
Newer versions of RHEL ship the os-release file with newlines at the
end, which our script was not prepared to handle. As such, scylla_setup
would fail.

This patch makes our OS detection robust against that.

Fixes #4473

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502152224.31307-1-glauber@scylladb.com>
2019-05-02 18:33:35 +03:00
Paweł Dziepak
cf451f0e62 Merge "gdb: Fixes and improvements to memory analysis" from Tomasz
"
One of the fixes is for incorrect recognition of memory pages as belonging
or not belonging to small allocation pools in some cases.

Also, compensates for https://github.com/scylladb/seastar/issues/608 in "scylla memory",
which improves the accuracy of the small allocation pool report.

Fixes "scylla task_histogram" to not look into pages which do not belong to live
small allocation pool spans.

Fixes #4367
Fixes #4368
"

* tag 'gdb-fix-span-qualification-v2' of github.com:tgrabiec/scylla:
  gdb: Print size of large allocations in 'scylla ptr'
  gdb: Fix 'scylla ptr' for free pages
  gdb: Set is_live and offset for large allocations properly in 'scylla ptr'
  gdb: Fix 'scylla ptr' misqualifying pointers
  gdb: Make 'scylla memory' show unused memory in small pools
  gdb: Fix small pool memory usage reporting in 'scylla memory'
  gdb: Switch 'scylla memory' to use the span_checker to find large spans
  gdb: Switch task_histogram to use the span_checker
  gdb: Introduce span_checker
2019-05-02 14:25:30 +01:00
Gleb Natapov
95c6d19f6c batchlog_manager: fix array out of bound access
endpoint_filter() function assumes that each bucket of
std::unordered_multimap contains elements with the same key only, so
its size can be used to know how many elements with a particular key
are there. But this is not the case: elements with different keys may
share a bucket. Fix it by counting keys in another way.

Fixes #3229

Message-Id: <20190501133127.GE21208@scylladb.com>
2019-05-01 17:30:11 +03:00
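The pitfall described in 95c6d19f6c can be illustrated with the sketch below. The commit does not say how the keys are counted after the fix; using count() is an assumption made here for illustration, and the alias and function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical alias: some key (for example a rack name) mapped to endpoints.
using endpoint_map = std::unordered_multimap<std::string, std::string>;

// Number of elements stored under `key`. count() compares keys, so it never
// includes elements of other keys that happen to hash into the same bucket.
inline std::size_t elements_with_key(const endpoint_map& m, const std::string& key) {
    return m.count(key);
}

// What the mistaken assumption effectively computes: the size of the bucket
// that `key` hashes to, which may also hold elements with other keys.
// Precondition: m.bucket_count() > 0.
inline std::size_t bucket_size_for_key(const endpoint_map& m, const std::string& key) {
    return m.bucket_size(m.bucket(key));
}
```

The bucket size is only an upper bound on the number of elements with a given key, so using it to index into a per-key array can run past the end.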
Nadav Har'El
2710f382de secondary index: expand test of secondary-index and UPDATE requests
The existing unit test test_secondary_index_contains_virtual_columns
reproduced a bug (issue #4144) with indexing of primary-key columns,
but we only actually tested clustering columns. In issue #4471 there
was a question whether we may still have a bug when indexing
*partition-key* columns. This patch adds a test that verifies that
we don't, and this case works well too.

Refs #4144
Refs #4471

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190501113500.25900-1-nyh@scylladb.com>
2019-05-01 12:53:23 +01:00
Nadav Har'El
a45b6e41a0 materialized views and secondary index: sometimes allow dropping base columns
Until this patch, dropping columns from a table was completely forbidden
if this table has any materialized views or secondary indexes. However,
this is excessively harsh, and not compatible with Cassandra which does
allow dropping columns from a base table which has a secondary index on
*other* columns. This incompatibility was raised in the following
Stackoverflow question:
https://stackoverflow.com/questions/55757273/error-while-dropping-column-from-a-table-with-secondary-index-scylladb/55776490

In this patch, we allow dropping a base table column if none of its
materialized views *needs* this column. Columns selected by a view
(as regular or key columns) are needed by it, of course, but when
virtual columns are used (namely, there is a view with same key columns
as the base), *all* columns are needed by the view, so unfortunately none
of the columns may be dropped.

After this patch, when a base-table column cannot be dropped because one
of the materialized views needs it, the error message will look like:

   exceptions::invalid_request_exception: Cannot drop column a from base
   table ks.cf: a materialized view cf_a_idx_index needs this column.

This patch also includes extensive testing for the cases where dropping
columns is now allowed, and not allowed. The secondary-index tests are
especially interesting, because they demonstrate that now usually (when
a non-key column is being indexed) dropping columns will be allowed,
which is what originally bothered the Stackoverflow user.

Fixes #4448.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190429214805.2972-1-nyh@scylladb.com>
2019-04-30 12:13:10 +01:00
Nadav Har'El
92d5f61ba5 cql: support single-value IN restriction wherever EQ restriction is supported
There are several places where IN restrictions are not currently supported,
especially in queries involving a secondary index. However, when the IN
restriction has just a single value, it is nothing more than an equality
restriction and can be converted into one and be supported. So this patch
does exactly this.

Note that Cassandra does this conversion since August 2016, and therefore
supports the special case of single-value IN even where general IN is not
supported. So it's important for Cassandra compatibility that we do this
conversion too.

This patch also includes a test with two queries involving a secondary
index that were previously disallowed because of the "IN" on the primary
key or the indexed column - and are now allowed when the IN restriction
has just a single value. A third query tested is not related to secondary
indexes, but confirms we don't break multi-column single-value IN queries.

Fixes #4455.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190428160317.23328-1-nyh@scylladb.com>
2019-04-30 12:13:06 +01:00
Tomasz Grabiec
1adcb3637e Merge "multishard reader: fix handling of non strictly monotonous positions" from Botond
The shard readers of the multishard reader assumed that the positions in
the data stream are strictly monotonous. This assumption is invalid.
Range tombstones can have positions that they can share with other range
tombstones and/or a clustering row. The effect of this false assumption
was that when the shard reader was evicted such that the last seen
fragment was a range tombstone, when recreated it would skip any unseen
fragments that have the same position as that of the last seen range
tombstone.

Fixes: #4418

Branches: master, 3.0, 2019.1

Tests: unit(dev)

* https://github.com/denesb/scylla.git
multishard_reader_handle_non_strictly_monotonous_positions/v4:
  multishard_combining_reader: shard_reader::remote_reader extract
    fill-buffer logic into do_fill_buffer()
  mutlishard_combining_reader: reorder
    shard_reader::remote_reader::do_fill_buffer() code
  position_in_partition_view: add region() accessor
  multishard_combining_reader: fix handling of non-strictly monotonous
    positions
  flat_mutation_reader: add flat_mutation_reader_from_mutations()
    overload with range and slice
  flat_mutation_reader: add make_flat_mutation_reader_from_fragments()
    overload with range and slice
  tests: add unit test for multishard reader correctly handling
    non-strictly monotonous positions
2019-04-30 12:35:28 +02:00
Tomasz Grabiec
077c639e42 Merge "Simplify the result_set_row API" from Rafael
Currently null and missing values are treated differently. Missing
values throw no_such_column. Null values return nullptr, std::nullopt
or throw null_column_value.

The api is a bit confusing since a function returning a std::optional
either returns std::nullopt or throws depending on why there is no
value.

With this patch series only get_nonnull throws and there is only one
exception type.

* https://github.com/espindola/scylla.git espindola/merge-null-and-missing-v2:
  query-result-set: merge handling of null and missing values
  Remove result_set_row::has
  Return a reference from get_nonnull
2019-04-30 11:06:29 +02:00
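The API shape described in the "Simplify the result_set_row API" series can be sketched as below. This is a simplified model, not the real result_set_row: the class row_model, its int-only cells and the use of std::out_of_range as the single exception type are all assumptions made for illustration. Only the name get_nonnull comes from the commit messages.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Simplified stand-in for a result row (hypothetical, int values only).
class row_model {
    // A null value is stored as std::nullopt; a missing column has no entry.
    std::map<std::string, std::optional<int>> _cells;
public:
    void set(const std::string& name, std::optional<int> v) { _cells[name] = v; }

    // Never throws: null and missing values both yield std::nullopt.
    std::optional<int> get(const std::string& name) const {
        auto it = _cells.find(name);
        if (it == _cells.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // The only throwing accessor, with a single exception type.
    // Returns a reference, so no copy is made.
    const int& get_nonnull(const std::string& name) const {
        auto it = _cells.find(name);
        if (it == _cells.end() || !it->second) {
            throw std::out_of_range("no value for column " + name);
        }
        return *it->second;
    }
};
```

With this shape a caller does a single lookup and checks the optional, instead of first asking whether the column exists and then fetching it.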
Rafael Ávila de Espíndola
63c47117b5 Return a reference from get_nonnull
No reason to copy if we don't have to. Now that get_nonnull doesn't
copy, replace a raw use of get_data_value with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 21:14:11 -07:00
Rafael Ávila de Espíndola
0474458872 Remove result_set_row::has
Now that the various get methods return nullptr or std::nullopt on
missing values, we don't need to do double lookups.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 19:56:26 -07:00
Rafael Ávila de Espíndola
2770b29036 query-result-set: merge handling of null and missing values
Nothing seems to differentiate a missing and a null value. This patch
then merges the two exception types and now the only method that
throws is get_nonnull. The other methods return nullptr or
std::nullopt as appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 19:56:20 -07:00
Avi Kivity
3726a4fbd9 Merge "Fix schema disagreement during rolling upgrade" from Tomasz
"
After 7c87405, schema sync includes system_schema.view_virtual_columns in the
schema digest. Old nodes don't know about this table and will not include it
in the digest calculation. As a result, there will be schema disagreement
until the whole cluster is upgraded.

Also, the order in which tables were hashed changed in 7c87405, which
causes digests to differ in some schemas.

Fixes #4457.
"

* tag 'fix-disagreement-during-upgrade-v2' of github.com:tgrabiec/scylla:
  db/schema_tables: Include view_virtual_columns in the digest only when all nodes do
  storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature
  db/schema_tables: Hash schema tables in the same order as on 3.0
  db/schema_tables: Remove table name caching from all_tables()
  treewide: Propagate schema_features to db::schema::all_tables()
  enum_set: Introduce full()
  service/storage_service: Introduce cluster_schema_features()
  schema: Introduce schema_features
  schema_tables: Propagate storage_service& to merge_schema()
  gms/feature: Introduce a more convenient when_enabled()
  gms/feature: Mark all when_enabled() overloads as const
2019-04-29 14:23:53 +03:00
Avi Kivity
ede1d248af tools: toolchain: improve dbuild signal handing
Currently, we use --sig-proxy to forward signals to the container. However, this
requires the container's co-operation, which usually doesn't exist. For example,

    docker run --sig-proxy fedora:29 bash -c "sleep 5"

Does not respond to ctrl-C.

This is a problem for continuous integration. If a build is aborted, Jenkins will
first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give
up and use SIGKILL. If the graceful termination doesn't work, we end up with an
orphan container running on the node, which can then consume enough memory and CPU
to harm the following jobs.

To fix this, trap signals and handle them by killing the container. Also trap
shell exit, and even kill the container unconditionally, since if Jenkins happens
to kill the "docker wait" process the regular paths will not be taken.

We lose a lot by running the container asynchronously with the dbuild shell
script, so we need to add it back:

 - log display: via the "docker logs" command
 - auto-removal of the container: add a "docker rm -f" command on signal
   or normal exit
Message-Id: <20190424130112.794-1-avi@scylladb.com>
2019-04-29 10:05:21 +02:00
Botond Dénes
aa18bb33b9 tests: add unit test for multishard reader correctly handling non-strictly monotonous positions
2019-04-29 10:24:14 +03:00
Botond Dénes
51e81cf027 flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice
To be able to support this new overload, the reader is made
partition-range aware. It will now correctly only return fragments that
fall into the partition-range it was created with. For completeness'
sake and to be able to test it, also implement
`fast_forward_to(const dht::partition_range)`. Slicing is done by
filtering out non-overlapping fragments from the initial list of
fragments. Also add a unit test that runs it through the mutation_source
test suite.
2019-04-29 10:24:14 +03:00
Tomasz Grabiec
c96ee9882b db/schema_tables: Include view_virtual_columns in the digest only when all nodes do
After 7c87405, schema sync includes system_schema.view_virtual_columns
in the schema digest. Old nodes don't know about this table and will
not include it in the digest calculation. As a result, there will be
schema disagreement until the whole cluster is upgraded.

Fix this by taking the new table into account only when the whole
cluster is upgraded.

The table should not be used for anything before this happens. This is
not currently enforced, but should be.

Fixes #4457.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
a108df09f9 storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature
Needed for determining if all nodes in the cluster are aware of the
new schema table. Only when all nodes are aware of it can we take it
into account when calculating the schema digest, otherwise there would be
permanent schema disagreement during a rolling upgrade.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
73b859005c db/schema_tables: Hash schema tables in the same order as on 3.0
The commit 7c87405 also indirectly changed the order of schema tables
during hash calculation (index table should be taken after all other
tables). This shows up when there is an index created and any of {user
defined type, function, or aggregate}.

Refs #4457.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
394a684a99 db/schema_tables: Remove table name caching from all_tables()
The set of table names will depend on the features and thus will be dynamic.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
3cb7b2d72e treewide: Propagate schema_features to db::schema::all_tables()
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
f33f0d759d enum_set: Introduce full()
2019-04-28 15:50:12 +02:00
Tomasz Grabiec
1d9b88dceb service/storage_service: Introduce cluster_schema_features()
2019-04-28 15:50:12 +02:00
Tomasz Grabiec
0633fcde10 schema: Introduce schema_features
2019-04-28 15:50:12 +02:00
Tomasz Grabiec
6e2c190b5f schema_tables: Propagate storage_service& to merge_schema()
We will need to calculate cluster schema features at the time we
calculate the schema digest.
2019-04-28 12:33:10 +02:00
Tomasz Grabiec
6db002163f gms/feature: Introduce a more convenient when_enabled()
It can be invoked with a lambda without the ceremony of creating a
class deriving from gms::feature::listener.

The returned registration object controls the listener's scope.
2019-04-28 12:33:10 +02:00
Tomasz Grabiec
22c07b9183 gms/feature: Mark all when_enabled() overloads as const
2019-04-28 12:33:10 +02:00
Rafael Ávila de Espíndola
ee9f3388f6 cql_query_test: Fix a use after return
There was nothing keeping the verify lambda alive after the return. It
worked most of the time since the only state kept by the lambda was
a pointer to cql_test_env.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190426203823.15562-1-espindola@scylladb.com>
2019-04-27 08:06:35 +03:00
Avi Kivity
07d06aee43 Update seastar submodule
* seastar e84d2647c...4cdccae53 (4):
  > Merge "future: Move some code out of line" from Rafael
  > tests: socket_test: Add missing virtual and override
  > build: Don't pass -Wno-maybe-uninitialized to clang
  > Merge "expose file_permssions for creating files and dirs in API" from Benny
2019-04-26 22:58:48 +03:00
Tomasz Grabiec
c6274fdef3 keys: Avoid implicit conversion to partition_key in the hasher of partition_key_view
Message-Id: <1556230107-13557-1-git-send-email-tgrabiec@scylladb.com>
2019-04-26 20:02:35 +03:00
Botond Dénes
bc08f8fd07 flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice
To be able to run the mutation-source test suite with this reader. In
the next patch, this reader will be used in testing another reader, so
it is important to make sure it works correctly first.
2019-04-26 12:43:45 +03:00
Botond Dénes
eba310163d multishard_combining_reader: fix handling of non-strictly monotonous positions
The shard readers under a multishard reader are paused after every
operation executed on them. When paused they can be evicted at any time.
When this happens, they will be re-created lazily on the next
operation, with a start position such that they continue reading from
where the evicted reader left off. This start position is determined
from the last fragment seen by the previous reader. When this position
is a clustering position, the reader will be recreated such that it reads
the clustering range (from the half-read partition): (last-ckey, +inf).
This can cause problems if the last fragment seen by the evicted reader
was a range-tombstone. Range tombstones can share the same clustering
position with other range tombstones and potentially one clustering row.
This means that when the reader is recreated, it will start from the
next clustering position, ignoring any unread fragments that share the
same position as the last seen range tombstone.
To fix, ensure that on each fill-buffer call, the buffer contains all
fragments for the last position. To this end, when the last fragment in
the buffer is a range tombstone (with pos x), we continue reading until
we see a fragment with a position y that is greater. This way it is
ensured that we have seen all fragments for pos x and it is safe to
resume the read, starting from after position x.
2019-04-26 11:38:12 +03:00
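The fill-buffer rule described in eba310163d can be sketched with a toy model. Everything here is a simplification made for illustration: positions are plain integers, the fragment struct and fill_buffer signature are hypothetical, and the sketch peeks at the upcoming fragment instead of reading it as the real reader does.

```cpp
#include <cstddef>
#include <vector>

// Toy fragment: an integer stands in for the clustering position.
struct fragment {
    int position;
    bool is_range_tombstone; // range tombstones may share a position
};

// Copies fragments from `stream` (sorted by position, duplicates allowed)
// starting at index `next`, and advances `next`. It stops once at least
// `max_size` fragments are buffered, except that when the last buffered
// fragment is a range tombstone it keeps reading until the upcoming
// fragment has a greater position.
inline std::vector<fragment> fill_buffer(const std::vector<fragment>& stream,
                                         std::size_t& next, std::size_t max_size) {
    std::vector<fragment> buffer;
    while (next < stream.size()) {
        if (!buffer.empty() && buffer.size() >= max_size) {
            const fragment& last = buffer.back();
            const bool may_share_position = last.is_range_tombstone
                    && stream[next].position == last.position;
            if (!may_share_position) {
                break;
            }
        }
        buffer.push_back(stream[next++]);
    }
    return buffer;
}
```

Because the buffer always ends on a complete position, a reader recreated after eviction can resume from the range after the last buffered position without skipping unread fragments.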
Botond Dénes
b30af48c83 position_in_partition_view: add region() accessor
2019-04-26 11:38:12 +03:00
Vlad Zolotarov
274b9d8069 hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check
gossiper::is_alive() has many unneeded checks (e.g. is_me(ep)) that
are irrelevant for the HH use case, so we may safely skip them.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-25 23:16:07 -04:00
Vlad Zolotarov
74b4076ceb hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper()
sender has its own reference to the local gossiper - use it.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-25 23:04:02 -04:00
Vlad Zolotarov
fe82437dea types.cc: fix the compilation with fmt v5.3.0
Compilation fails with fmt release 5.3.0 when we print a bytes_view
using "{}" formatter.

The compiler's complaint is: "error: static assertion failed: mismatch
between char-types of context and argument"

Fix this by explicitly using to_hex() converter.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-25 23:04:02 -04:00
Piotr Sarna
037b517c85 service: initialize system distributed keyspace after schema agreement
In order to avoid schema disagreements during upgrades (which may lead
to deadlocks), system distributed keyspace initialization is moved
right before starting the bootstrapping process, after the schema
agreement checks already succeeded.

Fixes #3976
Message-Id: <932e642659df1d00a2953df988f939a81275774a.1556204185.git.sarna@scylladb.com>
2019-04-25 18:44:08 +02:00
Raphael S. Carvalho
ccb29c6c20 sstables: make partitioned sstable set available to custom compaction strategies
To make it available, we need to make the use of level metadata optional
(it is used to deal with the interval map's fragmentation issue when level 0
falls behind), and also introduce an interface for compaction strategies to
implement make_sstable_set() that instantiates a partitioned sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190424232948.668-1-raphaelsc@scylladb.com>
2019-04-25 12:59:04 +03:00
Botond Dénes
a3f79bfe5e mutlishard_combining_reader: reorder shard_reader::remote_reader::do_fill_buffer() code
Reduce the number of indentations - use early return for the short path.
2019-04-24 10:55:16 +03:00
Botond Dénes
bbd3f0acc3 multishard_combining_reader: shard_reader::remote_reader extract fill-buffer logic into do_fill_buffer()
2019-04-24 10:55:16 +03:00
Avi Kivity
b19792405f main: RAII-ify shutdown
Instead of app_template::run_deprecated() and at_exit() hooks, use
app_template::run() and RAII (via defer()) to stop services. This makes it
easier to add services that do support shutdown correctly.

Ref #2737
Message-Id: <20190420175733.29454-1-avi@scylladb.com>
2019-04-23 16:13:39 +02:00
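The RAII style of shutdown described in b19792405f can be sketched with a minimal scope guard. This is not seastar's defer() implementation: the deferred_action class below is a simplified stand-in written for this example, and the "db" and "api" services are hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Minimal scope guard: runs the stored callable when it goes out of scope.
template <typename Func>
class deferred_action {
    Func _func;
    bool _armed = true;
public:
    explicit deferred_action(Func f) : _func(std::move(f)) {}
    deferred_action(deferred_action&& o) noexcept
        : _func(std::move(o._func)), _armed(o._armed) { o._armed = false; }
    deferred_action(const deferred_action&) = delete;
    deferred_action& operator=(const deferred_action&) = delete;
    ~deferred_action() { if (_armed) { _func(); } }
};

template <typename Func>
deferred_action<Func> defer(Func f) {
    return deferred_action<Func>(std::move(f));
}

// Starts two hypothetical services and records the order of events.
inline std::vector<std::string> run_services() {
    std::vector<std::string> log;
    {
        log.push_back("start db");
        auto stop_db = defer([&log] { log.push_back("stop db"); });
        log.push_back("start api");
        auto stop_api = defer([&log] { log.push_back("stop api"); });
        log.push_back("run");
    } // guards fire here, in reverse order of construction
    return log;
}
```

Each service's stop action sits next to its start, and services are stopped in the reverse of their start order, including on early exit from the scope.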
Avi Kivity
9a6c86e2a7 config: convert _make_config_values to individual named_value member declarations and initializers
While causing some duplication (names are explicitly instead of implicitly
stringified, and names are repeated in the member declaration and initializer),
it is overall more maintainable than the huge macro. It is easier to overload
named_value constructors when you can get error reporting on the line where the error
occurs, for example.
2019-04-23 16:29:03 +03:00
Avi Kivity
4b3c2f6514 config: add allowed_values parameter to named_value constructor
The _make_config_values() macro supplies an optional list of allowed values
for a config item, so support that, even though no one uses it yet.
2019-04-23 16:29:03 +03:00
Avi Kivity
d959fbfc16 config: auto-add named_values into config_file
By passing a config_file into named_value, we remove another call to the
_make_config_values() macro.
2019-04-23 16:29:03 +03:00
Avi Kivity
b663cd1765 api: config: stop using _make_config_values
Now that named_value::value_as_json() exists, make use of it to report the
current value of a configuration variable via the REST API, instead of
_make_config_values().
2019-04-23 16:29:03 +03:00
Avi Kivity
6033b6a079 config: add named_value::value_as_json()
Currently, the REST API does its own conversion of named_value into json.
This requires it to use the _make_config_values macro to perform iteration
of all config items, since it needs to preserve the concrete type of the item
while iterating, so it can select the correct json conversion.

Since we want to remove that macro, we need to provide a different way to
convert a config item to json. So this patch adds a value_as_json().

To hide json_return_type from the rest of the system, we extend config_type
with a conversion function to handle the details. This usually calls
the json_return_type constructor directly, but when it doesn't have default
translation, it interposes a conversion into a type that json recognizes.

I didn't bother maintaining the existing type names, since they're C++
names which don't make sense for the UI.
2019-04-23 16:28:19 +03:00
Avi Kivity
db3f61776f config: remove value_status from named_value template parameter list
The value_status is only needed at run-time, and removing it from the
template parameter list reduces type proliferation (which leads to code
bloat) and simplifies the code.
2019-04-23 16:15:28 +03:00
Avi Kivity
daf5744daa config: make the named_value type name available without requiring _make_config_values
I want to remove the _make_config_values macro, but it is needed now in
api/config.cc to make the type names available. So as a first step, copy the
type names to config_src. Further changes can extract it from there.

Because we want to add more type information in following patches, place the type
name in a new config_type object, instead of allocating a string_view in
config_src.
2019-04-23 16:13:54 +03:00
Tomasz Grabiec
21fbf59fa8 lsa: Fix compact_and_evict() being called with a too low step
compact_and_evict gets memory_to_release in bytes while
reclamation step is in segments.

Broken in f092decd90.

It doesn't make much difference with the current default step of 1
segment since we cannot reclaim less than that, so shouldn't cause
problems in practice.

Message-Id: <1556013920-29676-1-git-send-email-tgrabiec@scylladb.com>
2019-04-23 13:14:43 +03:00
Gleb Natapov
c6b3b9ff13 cache_hitrate_calculator: wait for ongoing calculation to complete during stop
Currently stop returns a ready future immediately. This is not a problem
since the calculation loop holds a shared pointer to the local service, so
it will not be destroyed until the calculation completes, and the global database
object db, which is also used by the calculation, is never destroyed. But the
latter is just a workaround for a shutdown sequence that cannot handle
it and will be changed one day. Make the cache hitrate calculation service
ready for it.

Message-Id: <20190422113538.GR21208@scylladb.com>
2019-04-22 14:44:42 +03:00
Takuya ASADA
64c2aa8f9b reloc/python3: add missing SCYLLA-PRODUCT-FILE to python3 relocatable package
Since 214c74a, we need SCYLLA-PRODUCT-FILE on relocatable package so add
it on python3 package as well.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190422085620.22486-1-syuu@scylladb.com>
2019-04-22 13:56:38 +03:00
Gleb Natapov
306f5b99b5 cache_hitrate_calculator: fix use after free in non_system_filter lambda
The non_system_filter lambda is defined static, which means it is initialized
only once, so the 'this' that it captures will belong to the shard
where the function runs first. During service destruction the function
may run on a different shard and access another shard's service, which
may already be freed.

Fixed #4425

Message-Id: <20190421152139.GN21208@scylladb.com>
2019-04-21 18:22:31 +03:00
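The bug pattern fixed in 306f5b99b5 can be reproduced in miniature as below. The service struct and its member names are hypothetical. Both objects stay alive in this sketch, so it only shows the stale capture, not the use after free itself.

```cpp
// Hypothetical per-shard service with one instance per shard.
struct service {
    int shard_id;

    // Buggy pattern: a function-local static is initialized only once, so
    // the lambda keeps the `this` of whichever instance called first.
    int id_via_static_lambda() {
        static auto get_id = [this] { return shard_id; };
        return get_id();
    }

    // Fixed pattern: a fresh lambda on each call captures the current `this`.
    int id_via_local_lambda() {
        auto get_id = [this] { return shard_id; };
        return get_id();
    }
};
```

If the first instance is destroyed while another one still calls the static lambda, the captured pointer dangles, which matches the use after free described in the commit.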
Amnon Heiman
9ad63efcfe Adding node_exporter to docker
This patch adds node_exporter to the docker image.
It installs it, then creates and runs a service for it.

After this patch node_exporter will run and will be part of the Scylla
Docker image.

Fixes #4300

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190421130643.6837-1-amnon@scylladb.com>
2019-04-21 18:12:58 +03:00
Benny Halevy
0c9aaef673 sstables: make lamdas that std:move mutable
As noticed by Rafael Ávila de Espíndola <espindola@scylladb.com>
regarding commit 5a99023d4a:
Without the lambda being mutable, the second std::move actually doesn't move anything.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190421150422.19304-1-bhalevy@scylladb.com>
2019-04-21 18:11:42 +03:00
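The point made in 0c9aaef673 can be observed with a small counting type. The tracked struct and both function names are hypothetical and exist only for this sketch.

```cpp
#include <utility>

// Counts how it was constructed so the effect of std::move is observable.
struct tracked {
    int copies = 0;
    int moves = 0;
    tracked() = default;
    tracked(const tracked& o) : copies(o.copies + 1), moves(o.moves) {}
    tracked(tracked&& o) noexcept : copies(o.copies), moves(o.moves + 1) {}
};

// Non-mutable lambda: the capture is const inside operator(), so
// std::move(t) yields a const rvalue and the copy constructor is selected.
inline tracked pass_through_const_lambda() {
    tracked t;
    auto f = [t = std::move(t)] { return tracked(std::move(t)); };
    return f();
}

// Mutable lambda: the capture is non-const, so std::move really moves.
inline tracked pass_through_mutable_lambda() {
    tracked t;
    auto f = [t = std::move(t)] () mutable { return tracked(std::move(t)); };
    return f();
}
```

The result of the first function records one copy, while the second records none. For move-only captures the non-mutable form fails to compile, but for copyable ones it silently copies.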
Benny Halevy
5a99023d4a treewide: use lambda for io_check of *touch_directory
To prepare for a seastar change that adds an optional file_permissions
parameter to touch_directory and recursive_touch_directory.
This change messes up the call to io_check since the compiler can't
derive the Func&& argument.  Therefore, use a lambda function instead
to wrap the call to {recursive_,}touch_directory.

Ref #4395

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>
2019-04-21 12:04:39 +03:00
Tomasz Grabiec
f092decd90 lsa: Fix potential bad_alloc even though evictable memory exists
When we start the LSA reclamation it can be that
segment_pool::_free_segments is 0 under some conditions and
segment_pool::_current_emergency_reserve_goal is set to 1. The
reclamation step is 1 segment, and compact_and_evict_locked() frees 1
segment back into the segment_pool. However,
segment_pool::reclaim_segments() doesn't free anything to the standard
allocator because the condition _free_segments >
_current_emergency_reserve_goal is false. As a result,
tracker::impl::reclaim() returns 0 as the amount of released memory,
tracker::reclaim() returns
memory::reclaiming_result::reclaimed_nothing and the seastar allocator
thinks it's a real OOM and throws std::bad_alloc.

The fix is to change compact_and_evict() to make sure that reserves
are met, by releasing more if they're not met at entry.

This change also allows us to drop the variant of allocate_segment()
which accepts the reclamation step as a means to refill reserves
faster. This is now not needed, because compact_and_evict() will look
at the reserve deficit to increase the amount of memory to reclaim.

Fixes #4445

Message-Id: <1555671713-16530-1-git-send-email-tgrabiec@scylladb.com>
2019-04-20 09:17:49 +03:00
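The arithmetic behind f092decd90 can be followed with a toy model. The field names mirror the commit message, but pool_model and both functions are hypothetical simplifications, not the LSA code.

```cpp
#include <cstddef>

// Toy model of the segment pool state named in the commit message.
struct pool_model {
    std::size_t free_segments;
    std::size_t emergency_reserve_goal;
};

// Segments that may be handed back to the standard allocator: only the
// ones above the emergency reserve goal.
inline std::size_t releasable_segments(const pool_model& p) {
    return p.free_segments > p.emergency_reserve_goal
            ? p.free_segments - p.emergency_reserve_goal
            : 0;
}

// Segments that compaction has to free so that `step` segments become
// releasable: the step itself plus whatever the reserve is missing.
inline std::size_t segments_to_compact(const pool_model& p, std::size_t step) {
    const std::size_t deficit = p.emergency_reserve_goal > p.free_segments
            ? p.emergency_reserve_goal - p.free_segments
            : 0;
    return step + deficit;
}
```

With 0 free segments and a reserve goal of 1, freeing only the step of 1 segment leaves nothing releasable, which is the case that ended in std::bad_alloc. Adding the deficit frees 2 segments, of which 1 can be released.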
Avi Kivity
704600f829 Update seastar submodule
* seastar eb03ba5cd...e84d2647c (14):
  > Fix hardcoded python paths in shebang line
  > Disable -Wmaybe-uninitialized everywhere
  > app_template: allow opting out of automatic SIGINT/SIGTERM handling
  > build: Restore DPDK machine inference from cflags
  > http: capture request content for POST requests
  > Merge "Simplify future_state and promise" from Rafael
  > temporary_buffer: fix memleak on fast path
  > perftune.py: allow explicitly giving a CPU mask to be used for binding IRQs
  > perftune.py: fix the sanity check for args.tune
  > perftune.py: identify fast-path hardware queues IRQs of Mellanox NICs
  > memory: malloc_allocator should be always available
  > Merge "Using custom allocator in the posix network stack" from Elazar
  > memory: Tell reclaimers how much should be reclaimed
  > net/ipv4_addr: add std::hash & operator== overloads
2019-04-20 09:16:53 +03:00
Avi Kivity
d485facea2 Revert "tools: toolchain: improve dbuild signal handing"
This reverts commit 6c672e674b. It loses
build logs, and the patch that restores logs causes build failures, so
the whole thing needs to be revisited.
2019-04-19 15:16:42 +03:00
Takuya ASADA
0a874f1897 dist/docker/redhat: prioritize /opt/scylladb/python3/bin on $PATH
To prevent running the entrypoint script with another python3 package, such as
python36 in EPEL, move /opt/scylladb/python3/bin to the top of $PATH.
This won't happen on this container image, but may occur when a user tries to
extend the image.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417165806.12212-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Takuya ASADA
c3dae6673f dist/common/scripts: use out() to run perftune.py
perftune.py executes hwloc-calc, which is now provided as a
relocatable binary placed under /opt/scylladb/bin.
So we need to add the directory to PATH when calling
subprocess.check_output(), but our utility function already does that,
so switch to it.

Fixes #4443

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190418124345.24973-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Benny Halevy
9785754e0d distributed_loader: do not follow symlinks when verifying mode and owner
We allow only regular files and directories, so to detect symlinks
we must not follow them.

Fixes #4375

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190418051627.9298-1-bhalevy@scylladb.com>
2019-04-19 11:47:40 +03:00
Takuya ASADA
214c74a71d dist: merge product name parameter on single place
When we added product name customization, we mistakenly defined the
parameter in each package build script.
The number of scripts is increasing since we recently added the relocatable
python3 package, so we should define it in a single place.

Also we should save the parameter in the relocatable package, just like
the version-release parameters.

So move the definition to SCYLLA-VERSION-GEN, save it to
build/SCYLLA-PRODUCT-FILE then archive it to relocatable package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417163335.10191-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Paweł Dziepak
d47ea66ec6 messaging_service: add lz4_fragmented RPC compressor
Seastar now supports two RPC compression algorithms: the original LZ4 one
and LZ4_FRAGMENTED. The latter uses the lz4 stream interface, which allows it
to process large messages without fully linearising them. Since RPC
requests used by Scylla often contain user-provided data that
could potentially be very large, LZ4_FRAGMENTED is a better choice for
the default compression algorithm.

Message-Id: <20190417144318.27701-1-pdziepak@scylladb.com>
2019-04-18 19:07:14 +03:00
Takuya ASADA
592fec32a0 dist/common/scripts: use /etc/os-release to detect distributions
Since we moved to the relocatable .rpm, Scylla is now able to run on Amazon
Linux 2.
However, is_redhat_variant() in scylla_util.py does not work on Amazon
Linux 2, since it does not have /etc/redhat-release.
So we need to switch to /etc/os-release and use ID_LIKE to detect Red Hat
variants and Debian variants.
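
The detection described above could look roughly like the sketch below. It is not the code from scylla_util.py: the function names `parse_os_release`, `looks_like_redhat` and `looks_like_debian` are hypothetical, and the set of identifiers treated as Red Hat-like is an assumption.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def _ids(info):
    # ID is the distribution itself; ID_LIKE lists related distributions.
    return [info.get("ID", "")] + info.get("ID_LIKE", "").split()


def looks_like_redhat(info):
    # Assumed identifier set; adjust to the distributions actually supported.
    return any(i in ("rhel", "fedora", "centos") for i in _ids(info))


def looks_like_debian(info):
    return any(i in ("debian", "ubuntu") for i in _ids(info))
```

In real use the text would come from reading /etc/os-release; taking it as a string keeps the sketch testable on any host.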

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417115634.9635-1-syuu@scylladb.com>
2019-04-18 19:07:14 +03:00
Takuya ASADA
3cf7cf015a dist/docker/redhat: use relocatable python3 on docker-entrypoint.py
Switch to relocatable python3 instead of EPEL's python3 on docker-entrypoint.py.
Also drop unneeded dependencies, since we switched to the relocatable scylla
image.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417111024.6604-1-syuu@scylladb.com>
2019-04-18 19:07:14 +03:00
Paweł Dziepak
85409c1a16 Merge "Validate elements of collections" from Piotr
"
Previously we weren't validating elements of collections, so it
was possible to add a non-UTF-8 string to a column with type
list<text>.

Tests: unit(release)

Fixes #4009
"

* 'haaawk/4009/v5' of github.com:scylladb/seastar-dev:
  types: Test correct map validation
  types: Test correct in clause validation
  types: Test correct tuple validation
  types: Test correct set validation
  types: Test correct list validation
  types: Add test_tuple_elements_validation
  types: Add test_in_clause_validation
  types: Add test_map_elements_validation
  types: Add test_set_elements_validation
  types: Add test_list_elements_validation
  types: Validate input when tuples
  types: Validate input when parsing a set
  types: Validate input when parsing a map
  types: Validate input when parsing a list
  types: Implement validation for tuple
  types: Implement validation for set
  types: Implement validation for map
  types: Implement validation for list
  types: Add cql_serialization_format parameter to validate
2019-04-18 19:07:14 +03:00
Botond Dénes
6e85d1e8c1 date_type_impl: add notice explaining why it's not used
And why it is still in the code. The note has been copied from Origin.

Refs: #4419
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c7790a898c331a7f58014d82a10cbc9ee7ad3265.1555483620.git.bdenes@scylladb.com>
2019-04-18 19:07:14 +03:00
Piotr Jastrzebski
134b59a425 table_helper: take insert function arguments by value
Previous version wasn't working correctly with r-values.

Fixes #4438

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <5017b04901c47bd826b2e411e603ce01e42a83a5.1555424512.git.piotr@scylladb.com>
2019-04-16 17:34:35 +03:00
Tomasz Grabiec
5dc3f5ea33 Merge "Properly enable MC format on the cluster" from Piotr
1. All nodes in the cluster have to support MC_SSTABLE_FEATURE
2. When a node observes that the whole cluster supports MC_SSTABLE_FEATURE
   then it should start using MC format.
3. Once all shards start to use MC then a node should broadcast that
   unbounded range tombstones are now supported by the cluster.
4. Once the whole cluster supports unbounded range tombstones we can
   start accepting them on CQL level.

tests: unit(release)

Fixes #4205
Fixes #4113

* seastar-dev.git dev/haaawk/enable_mc/v11:
  system_keyspace: Add scylla_local
  system_keyspace: add accessors for SCYLLA_LOCAL
  storage_service: add _sstables_format field
  feature: add when_enabled callbacks
  system_keyspace: add storage_service param to setup
  Add sstable format helper methods
  Register feature listeners in storage_service
  Add service::read_sstables_format
  Use read_sstables_format in main.cc
  Use _sstables_format to determine current format
  Add _unbounded_range_tombstones_feature
  Update supported features on format change
2019-04-16 14:07:05 +02:00
Avi Kivity
6c672e674b tools: toolchain: improve dbuild signal handling
Currently, we use --sig-proxy to forward signals to the container. However, this
requires the container's co-operation, which usually doesn't exist. For example,

    docker run --sig-proxy fedora:29 bash -c "sleep 5"

Does not respond to ctrl-C.

This is a problem for continuous integration. If a build is aborted, Jenkins will
first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give
up and use SIGKILL. If the graceful termination doesn't work, we end up with an
orphan container running on the node, which can then consume enough memory and CPU
to harm the following jobs.

To fix this, trap signals and handle them by killing the container. Also trap
shell exit, and even kill the container unconditionally, since if Jenkins happens
to kill the "docker wait" process the regular paths will not be taken.
Message-Id: <20190415084040.12352-1-avi@scylladb.com>
2019-04-16 14:07:05 +02:00
Tomasz Grabiec
ac0d435c3e Merge "hinted handoff: don't reuse_segments and discard corrupted segments" from Vlad
This series addresses two issues in the hinted handoff that should
complete fixing the infamous #4231.

In particular the second patch removes the requirement to manually
delete hints files after upgrading to 3.0.4.

Tested with manual unit testing.

* https://github.com/vladzcloudius/scylla.git hinted_handoff_drop_broken_segments-v3:
  hinted handoff: disable "reuse_segments"
  commitlog: introduce a segment_error
  hinted handoff: discard corrupted segments
2019-04-16 14:07:05 +02:00
Avi Kivity
643bddbecc Update seastar submodule
* seastar 6f73675...eb03ba5 (11):
  > tests: tests C++14 dialect in continuous integration
  > rpc/compressor/lz4: fix std:variant related compiler errors
  > tests: futures_test: allow project to compile with C++14
  > Merge "io_queue: make io_priority_class namespace global" from Benny
  > future::then_wrapped: use std::terminate instead of abort
  > reactor: make metric about task quota violations less sensitive
  > Merge "Add LZ4_FRAGMENTED compressor for RPC" from Paweł
  > Fix build issues with Clang 7
  > Merge "file_stat follow_symlink option and related fixes" from Benny
  > doc/tutorial.md: reword mention of seastar::thread premption on get()
  > tests: semaphore_test: relax timeouts

Fixes #4272.
2019-04-16 14:34:32 +03:00
Raphael S. Carvalho
52e1125b52 sstables: do not destroy sstable runs after resharding
Resharding wasn't preserving the sstable run structure, which depends
on all fragments sharing the same run identifier. So let's make
resharding run-aware, meaning that a run will be created for each
shard involved.

tests: release mode.

Fixes #4428.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190415193556.16435-1-raphaelsc@scylladb.com>
2019-04-16 10:34:49 +03:00
Tomasz Grabiec
ff66b27754 gdb: heapprof: Coalesce parents in the flamegraph mode
This change drops the hit count from the name of the node, because it
prevents coalescing of nodes which are shared parents for paths with
different counts. This lack of coalescing makes the flamegraph a lot
less useful.
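
A small illustration of why the hit count in the node name defeats coalescing. The `name (hits)` naming and the folded-stack representation are assumptions made for this sketch; they are not taken from the heapprof script itself.

```python
from collections import Counter


def fold(paths, with_counts):
    """Fold (frames, hits) pairs into 'frame;frame' -> total hits.

    With with_counts=True each node name carries the hit count of its
    path, which is the (assumed) behaviour before the change.
    """
    folded = Counter()
    for frames, hits in paths:
        if with_counts:
            names = ["%s (%d)" % (frame, hits) for frame in frames]
        else:
            names = list(frames)
        folded[";".join(names)] += hits
    return folded


def root_nodes(folded):
    """Distinct top-level nodes a flamegraph would draw for these stacks."""
    return {stack.split(";")[0] for stack in folded}
```

For two paths that share the parent `main` but have different counts, the counted variant yields two separate root nodes while the plain variant yields one.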

Message-Id: <1555348576-26382-1-git-send-email-tgrabiec@scylladb.com>
2019-04-15 21:05:08 +03:00
Tomasz Grabiec
3fd82021b1 schema_tables: Serialize schema merges fairly
All schema changes made to the node locally are serialized on a
semaphore which lives on shard 0. For historical reasons, they don't
queue but rather try to take the lock without blocking and retry on
failure with a random delay from the range [0, 100 us]. Contenders
which do not originate on shard 0 will have an extra disadvantage as
each lock attempt will be longer by the across-shard round trip
latency. If there is constant contention on shard 0, contenders
originating from other shards may keep losing the race to take the lock.

Schema merge executed on behalf of a DDL statement may originate on
any shard. Same for the schema merge which is coming from a push
notification. Schema merge executed as part of the background schema
pull will originate on shard 0 only, where the application state
change listeners run. So if there are constant schema pulls, DDL
statements may take a long time to get through.

The fix is to serialize merge requests fairly, by using the blocking
semaphore::wait(), which is fair.

We don't have to back-off any more, since submit_to() no longer has a
global concurrency limit.
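
The difference between retrying a try-lock and queueing on a blocking wait can be sketched with Python's asyncio, used here only as an analogy for the Seastar semaphore. The names `merge` and `run_merges` are hypothetical.

```python
import asyncio


async def merge(sem, name, order):
    # Blocking wait: the caller queues behind earlier waiters instead of
    # retrying with a random delay.
    async with sem:
        order.append(name)
        await asyncio.sleep(0)  # stand-in for the actual merge work


async def run_merges(names):
    """Start one merge per name and return the order in which they ran."""
    sem = asyncio.Semaphore(1)
    order = []
    await asyncio.gather(*(merge(sem, name, order) for name in names))
    return order
```

Because waiters are woken in arrival order, a request that arrives between two background pulls runs between them rather than being starved by repeated retries.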

Fixes #4436.

Message-Id: <1555349915-27703-1-git-send-email-tgrabiec@scylladb.com>
2019-04-15 20:40:38 +03:00
Botond Dénes
c6314e422f tests/mutation_source_test: use a single random seed
Currently, each instantiation of `random_mutation_generator::impl` will
generate a new random seed for itself. Although these are printed,
mapping back all the printed seeds to the exact source location where each
has to be substituted in is non-trivial. This makes reproducing random
test failures very hard. To solve this problem, use
`tests::random::get_int()` to produce the random seed of the
`random_mutation_generator::impl` instances. This way the seeds of all
the mutation generators will be derived from a single "master" seed that
is easily replaced after a test failure, hopefully also leading to
easily reproducible random test failures.
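
The principle of deriving every generator seed from one master seed can be sketched as follows. This is an illustration in Python rather than the C++ test code, and `derive_seeds` is a hypothetical name.

```python
import random


def derive_seeds(master_seed, count):
    """Derive `count` per-generator seeds from a single master seed.

    The same master seed always yields the same derived seeds, so only
    one value has to be recorded to reproduce a failing run.
    """
    master = random.Random(master_seed)
    return [master.getrandbits(32) for _ in range(count)]
```

Re-running with the recorded master seed reproduces every derived seed, which matches the check described in the next paragraph of the message.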

I checked that after substituting in a previously generated master
random seed, all derived seeds were exactly the same.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <0471415938fc27485975ef9213d37d94bff20fd5.1555329062.git.bdenes@scylladb.com>
2019-04-15 17:37:31 +03:00
Avi Kivity
3afbe219cd Merge "UDF/UDA related cleanups and refactoring" from Rafael
"
These are patches I wrote while working on UDF/UDA, but IMHO they are
independent improvements and are ready for review.

Tests: unit (debug) dtest (release)

I checked that all tests in

nosetests -v  user_types_test.py sstabledump_test.py cqlsh_tests/cqlsh_tests.py

now pass.
"

* 'espindola/udf-uda-refactoring-v3' of https://github.com/espindola/scylla:
  Refactor user type merging
  cql_type_parser::raw_builder: Allow building types incrementally
  cql3: delete dead code
  Include missing header
  return a const reference from return_type
  delete unused var
  Add a test on nested user types.
2019-04-15 16:52:13 +03:00
Glauber Costa
c01ed239a3 fix typo in create table statement error message
specifed -> specified

Fixes #4434

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190415125206.2993-1-glauber@scylladb.com>
2019-04-15 16:51:13 +03:00
Benny Halevy
b543ab4c76 sstables: remove_temp_dir: do not return then_wrapped future
f.get_exception makes the future invalid so it must not be returned.
Instead, make_exception_future<> with the exception ptr.

Fixes #4435.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190415111909.30499-1-bhalevy@scylladb.com>
2019-04-15 16:42:49 +03:00
Glauber Costa
b9327f81cf conf: stop telling people to run auto_bootstrap: false
auto_bootstrap: false provides negligible gains for new clusters and
it is extremely dangerous everywhere else. We have seen a couple of
cases in which users, confused by this, added this flag by mistake
and added nodes with it. While they were pleased by the extremely fast
times to add nodes, they were later displeased to find their data
missing.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190414012028.20767-1-glauber@scylladb.com>
2019-04-14 10:42:25 +03:00
Piotr Jastrzebski
2c599122e1 Update supported features on format change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:38:31 +02:00
Piotr Jastrzebski
9c7e3dd470 Add _unbounded_range_tombstones_feature
This requires introduction of storage_service::get_known_features
and using it with check_knows_remote_features.
Otherwise a node joining the existing cluster won't be able to
join because it does not support unbounded range tombstones yet.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:37:12 +02:00
Piotr Jastrzebski
96ad8f7df9 Use _sstables_format to determine current format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:37:12 +02:00
Piotr Jastrzebski
da1eba5bdb Use read_sstables_format in main.cc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:37:12 +02:00
Piotr Jastrzebski
7339e9de30 Add service::read_sstables_format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:37:12 +02:00
Piotr Jastrzebski
9934740c39 Register feature listeners in storage_service
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:36:58 +02:00
Piotr Jastrzebski
7a62235259 Add sstable format helper methods
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
caa6798f2c system_keyspace: add storage_service param to setup
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
460fb260cb feature: add when_enabled callbacks
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
081542cf00 storage_service: add _sstables_format field
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
0211541d84 system_keyspace: add accessors for SCYLLA_LOCAL
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
4c205b733a system_keyspace: Add scylla_local
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Benny Halevy
adf539fb2c tests: sstable_test_env::do_with_async: wait_for_background_jobs
To solve a memory leak seen in
sstable_datafile_test -t test_old_format_non_compound_range_tombstone_is_read

Refs #4376

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190411154621.9716-1-bhalevy@scylladb.com>
2019-04-11 18:50:42 +03:00
Takuya ASADA
4636284856 dist/ami: drop EPEL, convert scylla_install_ami script to python2
We have to run this script in python2, since we dropped EPEL from the
dependencies, and the script is the installer for the rpms, so we cannot use
the relocatable python3 for it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190411151858.2292-1-syuu@scylladb.com>
2019-04-11 18:21:48 +03:00
Glauber Costa
f3a24b6c22 dist: remove curl dependency to simplify dependency list further
Although curl is widely available, there is no reason to depend on it.
There are mainly three users, as indicated by grep:
1) scylla-housekeeping
2) scripts within the AMI
3) docker image

The AMI has its own RPM and it already depends on curl. While we could
get rid of the curl dependency there too, we can do that later. Docker
is its own thing and it only needs it at build time anyway.

For the main scylla repo, this patch changes scylla-housekeeping so as
not to depend on the curl binary and use urllib directly instead. We can
then remove curl from our dependency list.
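
A minimal sketch of fetching a URL with the standard library instead of shelling out to curl. It is not the scylla-housekeeping code; the function name `fetch` and the default timeout are assumptions.

```python
import urllib.request


def fetch(url, timeout=10):
    """Return the body at url decoded as UTF-8 text.

    Uses only the Python standard library, so no external curl binary
    is needed.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Errors surface as exceptions (for example `urllib.error.URLError`) rather than as a process exit status, so callers that previously checked curl's return code would need to catch them instead.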

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190411125642.9754-1-glauber@scylladb.com>
2019-04-11 16:12:36 +03:00
Benny Halevy
8181acd83b test.py: fail if given test name not found
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190411092041.24712-1-bhalevy@scylladb.com>
2019-04-11 12:31:23 +03:00
Tzach Livyatan
f444c949bd Fix the Dockerhub documentation for listen-address
Fix listen-address documentation: it is used for internal communication, not for external clients

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <20190410181409.16078-1-tzach@scylladb.com>
2019-04-11 11:53:40 +03:00
Botond Dénes
f201f8abab types: fix date_type_impl::less() (timestamp cql type)
date_type_impl::less() invokes `compare_unsigned()` to compare the
underlying raw byte values. `compare_unsigned()` is a tri comparator,
however `date_type_impl::less()` implicitly converted the returned
value to bool. In effect, `date_type_impl::less()` would *always* return
`true` when the two compared values were not equal.
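
The bug pattern can be reproduced in a few lines. This is a Python illustration of the C++ issue, not the actual patch: `less_buggy` and `less_fixed` are hypothetical names, and comparing the result against zero is one plausible form of the fix.

```python
def compare_unsigned(a, b):
    """Three-way comparison of two byte strings.

    Returns a negative number, zero, or a positive number.
    """
    return (a > b) - (a < b)


def less_buggy(a, b):
    # Converting the three-way result to bool is true for ANY nonzero
    # value, including "a is greater than b".
    return bool(compare_unsigned(a, b))


def less_fixed(a, b):
    # Only a negative result means "a sorts before b".
    return compare_unsigned(a, b) < 0
```

With the buggy version both `less(a, b)` and `less(b, a)` hold for unequal values, which breaks the strict weak ordering that sorting and ordered containers rely on.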

Found while working on a unit test which employs a randomly generated
schema to test a component.


Fixes #4419.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8a17c81bad586b3772bf3d1d1dae0e3dc3524e2d.1554907100.git.bdenes@scylladb.com>
2019-04-10 21:01:25 +03:00
Botond Dénes
90721468f0 tests/mutation_diff: remove false-positive diff of the partition header
Currently the partition header will always be reported as different when
comparing two mutations. This is because they are prepended with the
"expected: " and "... but got: " texts. This generates unnecessary
noise. Inject a new line between the prefix and the partition-header
proper. This way the partition header will only show up in the diff when
there is an actual difference. The "expected: " and "... but got: "
phrases are still shown as different on the top of the diff but this is
fine as one can immediately see that they are not part of the data and
additionaly they help the reader in determining which part of the diff
is the expected one and which is the actual one.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <29e0f413d248048d7db032224a3fd4180bf1b319.1554909144.git.bdenes@scylladb.com>
2019-04-10 18:05:36 +02:00
Raphael S. Carvalho
8a117c338a compaction: fix use-after-free when calculating backlog after schema change
The problem happens after a schema change because we fail to properly
remove the ongoing compaction, which stopped being tracked, from the list that
is used to calculate the backlog, so a compaction read monitor (which ceases
to exist after compaction ends) may be used after it is freed.

Fixes #4410.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190409024936.23775-1-raphaelsc@scylladb.com>
2019-04-10 15:54:39 +03:00
Vlad Zolotarov
db2ba0df61 hinted handoff: discard corrupted segments
If we discover that a current segment is corrupted there is nothing we
can do about it.

This patch does the following:
1) Drops the corrupted segment and moves to the next one.
2) Logs such events as ERRORs.
3) Introduces a new metric that counts such events.
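
The three steps above can be sketched as a replay loop. This is a Python illustration, not the hinted handoff code: `SegmentError`, `SegmentChecksumError`, `replay_all` and the logger name are hypothetical stand-ins, and the returned count plays the role of the new metric.

```python
import logging

logger = logging.getLogger("hints")


class SegmentError(Exception):
    """Hypothetical common base for every 'this segment is bad' error."""


class SegmentChecksumError(SegmentError):
    """Hypothetical example of one specific kind of corruption."""


def replay_all(segments, replay_one):
    """Replay each segment; drop, log and count those that are corrupted."""
    discarded = 0
    for name in segments:
        try:
            replay_one(name)
        except SegmentError as err:
            # One clause covers all corruption errors via the base class.
            logger.error("discarding corrupted segment %s: %s", name, err)
            discarded += 1
    return discarded
```

Catching the common base class keeps the loop short and lets new corruption error types be added without touching the caller.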

Fixes #4364

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-09 15:54:20 -04:00
Vlad Zolotarov
1cba4a54bb commitlog: introduce a segment_error
Introduce a common base class for all errors that indicate that the current
segment has "issues".

This allows a laconic "catch" clause for all such errors.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-09 15:31:13 -04:00
Vlad Zolotarov
00fe2acb35 hinted handoff: disable "reuse_segments"
Hinted handoff doesn't utilize this feature (which was developed with a
commitlog in mind).
Since it's enabled by default we need to explicitly disable it.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-09 11:13:41 -04:00
Piotr Jastrzebski
dee64c30b3 types: Test correct map validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:23 +02:00
Piotr Jastrzebski
3d94f0aaf0 types: Test correct in clause validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:23 +02:00
Piotr Jastrzebski
36853a7a5c types: Test correct tuple validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
94bdc1c868 types: Test correct set validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
429a8e082a types: Test correct list validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
910d81e03e types: Add test_tuple_elements_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
e2fe9ca5d0 types: Add test_in_clause_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
cd11959a8e types: Add test_map_elements_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
22f541af1d types: Add test_set_elements_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
be405e24e9 types: Add test_list_elements_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
47e242efc5 types: Validate input when tuples
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
c4df3014ac types: Validate input when parsing a set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
8a7b05ae26 types: Validate input when parsing a map
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
16596ec045 types: Validate input when parsing a list
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
8482764003 types: Implement validation for tuple
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
bd2823b623 types: Implement validation for set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
086d8abf89 types: Implement validation for map
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
4a51ee6e34 types: Implement validation for list
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
f5f6367674 types: Add cql_serialization_format parameter to validate
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Takuya ASADA
e3a5ac2945 reloc: run fix_sharedlib() only on application/x-sharedlib and application/x-pie-executable
We need to avoid running fix_sharedlib() on non-ELF files.

Fixes #4415

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190409114941.28276-1-syuu@scylladb.com>
2019-04-09 14:54:54 +03:00
Tomasz Grabiec
1b1f241c94 gdb: Print size of large allocations in 'scylla ptr'
2019-04-09 13:44:15 +02:00
Tomasz Grabiec
cda1781a77 gdb: Fix 'scylla ptr' for free pages
Fixes runtime error which happens because the setter is expected to
take an argument, but our definition doesn't take one. We're not
really expecting the setter to be called with False, so don't use setter
semantics.
2019-04-09 13:44:15 +02:00
Tomasz Grabiec
13efabe74c gdb: Set is_live and offset for large allocations properly in 'scylla ptr'
Before:

  (gdb) scylla ptr 0x601000860003
  thread 1, large, free

After:

  (gdb) scylla ptr 0x601000860003
  thread 1, large, live (0x601000860000 +3)

Omission from e1ea4db7ca.
2019-04-09 13:22:06 +02:00
Tomasz Grabiec
4002d8db7c gdb: Fix 'scylla ptr' misqualifying pointers
It can be that page::pool is != nullptr and page::offset_in_span is 0
for a page which is inside a large allocation span (live or
dead). This may lead to misqualification of a pointer as belonging to
a small allocation pool.

Only the first page of a span contains reliable information. This
patch changes the code to use the span_checker, which knows the real
boundaries of spans and exposes reliable information via the span
object.

Fixes #4368
2019-04-09 13:22:06 +02:00
Tomasz Grabiec
4d3399ee1f gdb: Make 'scylla memory' show unused memory in small pools
Example output:

Small pools:
objsz spansz    usedobj       memory       unused  wst%
    1   4096          0            0            0   0.0
    1   4096          0            0            0   0.0
    1   4096          0            0            0   0.0
    1   4096          0            0            0   0.0
    2   4096          0            0            0   0.0
    2   4096          0            0            0   0.0
    3   4096          0            0            0   0.0
    3   4096          0            0            0   0.0
    4   4096          0            0            0   0.0
    5   4096          0            0            0   0.0
    6   4096          0            0            0   0.0
    7   4096          0            0            0   0.0
    8   4096        241         8192         6264  76.5
   10   4096          0         8192         8192  99.9
   12   4096      35943       454656        23340   1.4
   14   4096          0         8192         8192  99.8
   16   4096       1171        24576         5840  23.8
   20   4096       1007        24576         4436  17.7
   24   4096      59380      1437696        12576   0.5
   28   4096        548        16384         1040   6.2
   32   4096      69433      2314240        92384   0.3
   40   4096      36447      1564672       106792   0.4
   48   4096      34099      1748992       112240   0.4
2019-04-09 13:22:05 +02:00
Tomasz Grabiec
ac7a393be5 gdb: Fix small pool memory usage reporting in 'scylla memory'
Uses span_checker to work around corrupted _pages_in_use.

Refs https://github.com/scylladb/seastar/issues/608

As a bonus, calculates use_count correctly for fallback spans.
2019-04-09 13:22:05 +02:00
Tomasz Grabiec
d0567476e5 gdb: Switch 'scylla memory' to use the span_checker to find large spans
Simplifies code.
2019-04-09 13:22:05 +02:00
Tomasz Grabiec
4b748e601c gdb: Switch task_histogram to use the span_checker
It can be that page::pool is != nullptr and page::offset_in_span is 0
for a page which is inside a large allocation span (live or
dead). This may lead to misqualification of that span as belonging to
a small allocation pool and interpreting its contents as if it
contained small objects.

Only the first page of a span contains reliable information. This
patch changes the code to use the span_checker, which knows the real
boundaries of spans and exposes reliable information via the span
object.

Another problem was that the command scanned dead spans as well.
This is no longer the case after this patch.

I've seen this command report thousands of no longer live sstable
writers and various continuations because of those problems.

Fixes #4367
2019-04-09 13:22:05 +02:00
Tomasz Grabiec
c7215a2f67 gdb: Introduce span_checker
The purpose is to encapsulate iteration and lookup of seastar
allocator memory spans.
2019-04-09 13:22:05 +02:00
Rafael Ávila de Espíndola
89b2c4ddc5 Refactor user type merging
The comparison of tables before and after mutation is now done by a
generic diff_rows function. The same function will be used for user
defined functions and user defined aggregates.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 14:16:40 -07:00
Rafael Ávila de Espíndola
4f1260f3e3 cql_type_parser::raw_builder: Allow building types incrementally
Before this patch raw_builder would always start with an empty list of
user types. This means that every time a type is added to a keyspace,
every type in that keyspace needs to be recreated.

With this patch we pass a keyspace_metadata instead of just the
keyspace name and can construct new user types on top of previous
ones.

This will be used in the followup patch, where only new types are
created.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 14:06:51 -07:00
Rafael Ávila de Espíndola
c037b266b4 cql3: delete dead code
In C++, TOKEN_FUNCTION_NAME is only needed in the .cc file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola
1db0b83711 Include missing header
abstract_function.hh uses function, which is defined in function.hh,
so it should include it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola
4551691b5d return a const reference from return_type
We define data_type as

using data_type = shared_ptr<const abstract_type>;

Since it is a shared_ptr, it cannot be copied into another thread
since that would create a race condition incrementing the reference
counter.

In particular, before this patch it is not legal to call
return_type from another thread.

With this patch read only access from another thread is possible.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola
35f1b1055d delete unused var
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola
b577082c64 Add a test on nested user types.
This would have found a bug in a previous version of this series.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 10:54:33 -07:00
Takuya ASADA
1f009b5e9b dist/redhat/python3: drop SCYLLA-*-FILE files in rpm
Related to #4409. These are more files that are not needed at runtime, so
drop them too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190405074030.3990-1-syuu@scylladb.com>
2019-04-08 11:52:48 +03:00
Rafael Ávila de Espíndola
6191fd7701 Avoid duplicated read_keyspace_mutation calls
There were many calls to read_keyspace_mutation. One in each function
that prepares a mutation for some other schema change.

With this patch they are all moved to a single location.

Tests: unit (dev, debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190328024440.26201-1-espindola@scylladb.com>
2019-04-07 09:26:56 +03:00
Takuya ASADA
d180caea89 dist/redhat/python3: drop dist/ files in rpm
These files are not needed at runtime, so drop them.

Fixes #4409

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190405071445.18678-1-syuu@scylladb.com>
2019-04-07 09:26:56 +03:00
Amos Kong
db9a721d02 scylla_kernel_check: update kb_fs_not_qualified_aio doc link
The doc has been moved to
https://docs.scylladb.com/troubleshooting/error_messages/kb_fs_not_qualified_aio/

Fixes #4398

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <75fdc97d222667f4402cadc7a46e52d6f38a32a8.1554375560.git.amos@scylladb.com>
2019-04-07 09:26:56 +03:00
Glauber Costa
2305cc88f3 relocatable python: Be more permissive with mime type checking
Fedora28 python magic used to return an x-sharedlib mime type for .so files.
Fedora29 changed that to x-pie-executable, so the libraries are no longer
relocated.

Let's be more permissive and relocate everything that starts with application/.
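
The permissive check described here amounts to a prefix test on the mime type. The sketch below is illustrative only: `should_relocate` is a hypothetical name, and the mime string is passed in directly so no particular mime-detection library is assumed.

```python
def should_relocate(mime_type):
    """Return True for any application/* mime type.

    Covers both application/x-sharedlib and application/x-pie-executable
    without listing each one explicitly.
    """
    return mime_type.startswith("application/")
```

The trade-off is that non-ELF files with an application/* type also match, so the relocation step itself must tolerate them.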

Fixes #4396

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190404140929.7119-1-glauber@scylladb.com>
2019-04-07 09:26:56 +03:00
Piotr Jastrzebski
882ea9caf0 tests: Fix use after free in check_multi_schema
Refs #4376

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <7d7b4cf69cea1e4d31058d8f1fd2c01f1dd11c58.1554387442.git.piotr@scylladb.com>
2019-04-07 09:26:56 +03:00
Piotr Jastrzebski
4485868d27 tests: Fix use after free in check_read_indexes
Refs #4376

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <0dc76b2a55bebc49558f30e8d2894973ce817577.1554386770.git.piotr@scylladb.com>
2019-04-07 09:26:56 +03:00
Tomasz Grabiec
a717e11026 Merge "row level repair shutdown fixes" from Asias
This series fixes row level repair shutdown related issues we saw with
dtests, e.g., use after free of the repair meta object and failure to stop a
table during shutdown.

Fixes: #4044
Fixes: #4314
Fixes: #4333
Fixes: #4380

Tests: repair_additional_test.py:RepairAdditionalTest.repair_abort_test
       repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test

* seastar-dev.git asias/repair.fix.shutdown.v1:
  repair: Wait for pending repair_meta operation before removing it
  repair: Check shutdown in row level repair
  repair: Remove repair meta when node is dead
  repair: Remove all row level repair during shutdown
2019-04-05 15:47:25 +03:00
Avi Kivity
e63bc6b1e3 Update seastar submodule
* seastar 63d8607...6f73675 (5):
  > Merge "seastar-addr2line: improve the context of backtraces" from Botond
  > log: fix std::system_error ostream operator to print full error message
  > Revert "threads: yield on get if we had run for too long."
  > core/queue: Document concurrency constraints
  > core/memory: Make small pools use the full span size

Fixes #4407.
Fixes #4316.
2019-04-05 15:47:25 +03:00
Avi Kivity
b1c4c371fa Merge "fix I/O calculation for i3.metal instances" from Glauber
"
Calculation of IO properties is slightly wrong for i3.metal, because we get
the number of disks wrong. The reason for that is our check for ephemeral nvme
disks, which predates the time when root devices were exposed as nvme devices
(nitro and metal instances).
"

toolchain updated with python3-psutil

* 'ec2fixes' of github.com:glommer/scylla:
  scylla_util.py: do not include root disks in ephemeral list
  scylla-python3: include the psutil module
  fix typo in scylla_ec2_check
2019-04-05 15:46:59 +03:00
Asias He
f212dfb887 streaming: Reject stream if the _sys_dist_ks or _view_update_generator are not ready
They are of type db::system_distributed_keyspace and
db::view::view_update_generator.

n1 is in normal status
n2 boots up and _sys_dist_ks or _view_update_generator are not
initialized
n1 runs stream, n2 is the follower.
n2 uses the _sys_dist_ks or _view_update_generator
"Assertion `local_is_initialized()' failed" is observed

Fixes #4360

Message-Id: <4ae13e1640ac8707a9ba0503a2744f6faf89ecf4.1554330030.git.asias@scylladb.com>
2019-04-04 10:48:00 +03:00
Avi Kivity
8abba6f6a6 Merge "Avoid copying data_type" from Rafael
"
With these changes we avoid a std::vector<data_value> copy, which is
nice in itself, but also makes it possible to call get_list from other
shards.
"

* 'espindola/result-set-v3' of https://github.com/espindola/scylla:
  Avoid copying a std::vector in get_list
  query-result-set: add and use a get_ptr method
2019-04-03 21:29:22 +03:00
Asias He
99da196e6f repair: Reject repair if the _sys_dist_ks or _view_update_generator are not ready
They are of type db::system_distributed_keyspace and db::view::view_update_generator.

n1 is in normal status
n2 boots up and _sys_dist_ks or _view_update_generator are not initialized
n1 runs repair, n2 is the follower.
n2 uses the _sys_dist_ks or _view_update_generator
"Assertion `local_is_initialized()' failed" is observed

Fixes #4360

Message-Id: <6616c21078c47137a99ba71baf82594ba709597c.1553742487.git.asias@scylladb.com>
2019-04-03 21:29:22 +03:00
Rafael Ávila de Espíndola
74f956e5a8 Avoid copying a std::vector in get_list
For now this is just an optimization. But it also avoids copying
data_type, which will allow this to be used across shards.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-03 09:20:12 -07:00
Rafael Ávila de Espíndola
c2a8807c35 query-result-set: add and use a get_ptr method
This moves a copy up the call stack and makes it possible to avoid it
completely by passing a reference type to get_nonnull.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-03 09:19:52 -07:00
Tomasz Grabiec
3356a085d2 lsa: Cover more bad_alloc cases with abort
When --abort-on-lsa-bad-alloc is enabled we want to abort whenever
we think we can be out of memory.

We covered failures due to bad_alloc thrown from inside of the
allocation section, but did not cover failures from reservations done
at the beginning of with_reserve(). Fix by moving the trap into
reserve().

Message-Id: <1553258915-27929-1-git-send-email-tgrabiec@scylladb.com>
2019-04-03 16:39:40 +03:00
Glauber Costa
0e9a50ab57 scylla_util.py: do not include root disks in ephemeral list
Nitro instances (and metal ones) put their root device in nvme (as a
protocol; it is still EBS). Our algorithm so far has relied on parsing
the nvme devices to figure out which ones are ephemeral but it will
break for those instances.

Out of our supported instances so far, the i3.metal is the only one
in which this breaks.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-04-03 07:57:00 -04:00
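A minimal sketch of the idea in the commit above, under stated assumptions: the device names, `disk_of`, and `ephemeral_disks` are hypothetical, and the root device is passed in as a plain argument rather than discovered from the running system as the real script would have to do.

```python
import re


def disk_of(device: str) -> str:
    """Map an NVMe partition such as nvme0n1p1 to its parent disk nvme0n1."""
    match = re.fullmatch(r"(nvme\d+n\d+)(p\d+)?", device)
    return match.group(1) if match else device


def ephemeral_disks(nvme_disks, root_device):
    """Return the NVMe disks that do not back the root filesystem."""
    root_disk = disk_of(root_device)
    return [d for d in nvme_disks if disk_of(d) != root_disk]


if __name__ == "__main__":
    # On a nitro or metal instance the root device shows up as NVMe too.
    print(ephemeral_disks(["nvme0n1", "nvme1n1", "nvme2n1"], "nvme0n1p1"))
```

On an instance whose root device is not NVMe, the filter removes nothing, so the older behaviour is preserved.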
Glauber Costa
6d7ac87136 scylla-python3: include the psutil module
Using a new python3 module has never been that easy! So we'll
unapologetically use psutil and don't even worry about whether or not
CentOS supports it (it doesn't)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-04-02 17:24:25 -04:00
Glauber Costa
027eee5f13 fix typo in scylla_ec2_check
enahanced -> enhanced

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-04-02 17:24:00 -04:00
Dejan Mircevski
a66a5d423a query_processor: Add query-count metrics
... with labels for each consistency level.  Fixes
https://github.com/scylladb/scylla/issues/4309 ("add counters breaking
up cql requests based on consistency_level").

Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <1554127055-17705-1-git-send-email-dejan@scylladb.com>
2019-04-02 19:08:25 +03:00
Avi Kivity
be6905da84 Update seastar submodule
* seastar 5572de7...63d8607 (6):
  > test: verify that negative sleep time doesn't cause infinite sleep
  > httpd: Change address handling to use socket_address
  > dns: Change "unspecified" address search type to retrieve first avail
  > Allow when_all and when_all_succeed to take function arguments
  > when_all: abort if memory allocation fails
  > inet_address: Add missing constructor impl.
2019-04-02 16:56:56 +03:00
Asias He
b98d95ebf0 repair: Remove all row level repair during shutdown
We saw a dtest fail to stop a node
(attachment: [2019.1.3.node1.repair.zip](https://github.com/scylladb/scylla/files/2723244/2019.1.3.node1.repair.zip)):

```
ERROR: repair_one_missing_row_test (repair_additional_test.RepairAdditionalTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 2521, in repair_one_missing_row_test
    return RepairAdditionalBase._repair_one_missing_row_test(self)
  File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 1842, in _repair_one_missing_row_test
    self.check_rows_on_node(node2, nr_rows)
  File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 34, in check_rows_on_node
    node.stop(wait_other_notice=True)
  File "/home/asias/src/cloudius-systems/scylla-ccm/ccmlib/scylla_node.py", line 496, in stop
    raise NodeError("Problem stopping node %s" % self.name)
NodeError: Problem stopping node node1
```

The problem is:

1) repair_meta is created
repair_meta -> repair_writer::create_writer() -> t.stream_in_progress()
repair_meta -> repair_reader::repair_reader -> cf.read_in_progress()

2) repair_meta is stored in the _repair_metas map.

3) Repair is shut down, but repair_meta is not removed from the _repair_metas map.

4) The database is shut down, which waits for the utils::phased_barrier.

To fix, we should stop and remove all the repair_meta objects from the _repair_metas map.

Tests: 30 successful runs of the repair_kill_2_test

Fixes: #4044
2019-04-02 19:28:53 +08:00
Asias He
344d0ee37d repair: Remove repair meta when node is dead
Repair follower nodes create a repair meta object when the repair master
node starts a repair. Normally, the repair meta object is removed when
the repair master finishes the repair and sends the verb
REPAIR_ROW_LEVEL_STOP to all the followers. If the repair master is
killed suddenly, no one will remove the repair meta object.

To avoid keeping this repair meta object forever, we should remove
such objects when gossip detects that a node is dead, using the gossip
listener.

Fixes: #4380

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
2019-04-02 19:28:53 +08:00
Asias He
b061157b21 repair: Check shutdown in row level repair
During node shutdown, we should abort the repair as soon as possible.
Check if we are in shutdown in row level repair steps.

Refs: #4044
2019-04-02 19:28:53 +08:00
Asias He
e3e489328e repair: Wait for pending repair_meta operation before removing it
We remove the repair_meta object in remove_repair_meta upon receiving the
stop row level repair rpc verb. It is possible that there is a pending
operation on the repair_meta. To avoid use after free, we should not remove
the repair_meta object until all the pending operations are done.
Use a gate to protect it.

Fixes: #4333
Fixes: #4314
Tests: 50 successful runs of repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test
2019-04-02 19:28:53 +08:00
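The gate pattern mentioned in the commit above can be illustrated with a small asyncio sketch. Everything here is an assumption made for illustration: `Gate`, `GateClosed`, and the `metas` dictionary are hypothetical stand-ins, not the Seastar gate or the repair code itself. The point is only that removal waits until every operation that entered the gate has left it.

```python
import asyncio


class GateClosed(Exception):
    """Raised when an operation tries to enter a gate that is closing."""


class Gate:
    """Counts in-flight operations; close() waits until all have left."""

    def __init__(self):
        self._pending = 0
        self._closed = False
        self._drained = asyncio.Event()

    def enter(self):
        if self._closed:
            raise GateClosed()
        self._pending += 1

    def leave(self):
        self._pending -= 1
        if self._closed and self._pending == 0:
            self._drained.set()

    async def close(self):
        self._closed = True
        if self._pending == 0:
            self._drained.set()
        await self._drained.wait()


async def demo():
    gate = Gate()
    metas = {"repair-1": object()}  # stand-in for the repair meta map
    log = []

    async def operation():
        gate.enter()
        try:
            await asyncio.sleep(0.01)    # simulated pending work
            assert "repair-1" in metas   # the object must still exist
            log.append("operation done")
        finally:
            gate.leave()

    async def remove():
        await gate.close()               # wait for pending operations
        del metas["repair-1"]
        log.append("removed")

    task = asyncio.create_task(operation())
    await asyncio.sleep(0)               # let the operation enter the gate
    await remove()
    await task
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the `await gate.close()` step, `remove()` would delete the entry while the operation was still sleeping, which is the use-after-free shape the commit describes.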
Vlad Zolotarov
0dc0a6025d query_pager::fetch_page: cosmetics: fix code alignment
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190401214030.5570-2-vladz@scylladb.com>
2019-04-02 11:53:10 +03:00
Asias He
70fbe85b3e main: Add shutdown database log
It is useful to know which step we are at during the shutdown process.

Refs: #4044
Message-Id: <f7c94c60d039560bfacd6d473f7d828940cc55b7.1554172140.git.asias@scylladb.com>
2019-04-02 11:49:00 +03:00
Benny Halevy
3749148339 storage_service: fix handling of load_new_sstables exception
ignore_ready_future in load_new_ss_tables broke
migration_test:TestMigration_with_*.migrate_sstable_with_counter_test_expect_fail dtests.

The java.io.NotSerializableException in nodetool was caused by exceptions that
were too long.

This fix prints the problematic file names to the node system log
and includes the cause in the resulting exception so as to provide the user
with information about the nature of the error.

Fixes #4375

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190331154006.12808-1-bhalevy@scylladb.com>
2019-04-02 11:46:19 +03:00
Avi Kivity
988dfd7209 Merge "add relocatable CLI tools required for scylla setup scripts" from Takuya
"
To make the offline installer easier, we need to minimize dependencies as
much as possible.
Python dependencies were already dropped when Glauber added relocatable
python3; now it's time to drop the rest of the command line tools used by
the scylla setup tools.
(Even though the scripts were converted to python3, they still execute some
external commands, so these commands should be distributed with the offline
installer.)

Note that some CLI tools, such as the NTP and RAID ones, haven't been added,
since these tools have daemons, not just CLIs.
To use them in offline mode, users have to install them manually.
But both NTP setup and RAID setup are optional; users can still run Scylla w/o
them.
"

Toolchain updated to docker.io/scylladb/scylla-toolchain:fedora-29-20190401
for changes in install-dependencies.sh; also updates to gnutls 3.6.7 security
release.

* 'reloc_clitools_v5' of https://github.com/syuu1228/scylla:
  reloc: add relocatable CLI tools for scylla setup scripts
  dist/redhat: drop systemd-libs from dependency
  dist/redhat: drop file from dependency since it seems unused
  dist/redhat: drop pciutils from dependency since it only used in DPDK mode
2019-04-01 14:23:04 +03:00
Raphael S. Carvalho
d59f716e1c table: fix wild disk usage stat after sstables are discarded by truncate
Truncate would make disk usage stat go wild because it isn't updated
when sstables are removed in table::discard_sstables(). Let's update
the stat after sstables are removed from the sstable set.

Fixes #3624.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190328154918.25404-1-raphaelsc@scylladb.com>
2019-04-01 13:55:11 +03:00
Duarte Nunes
b2dd8ce065 database: Make exception message more accurate
It's the sstable read queue that's overloaded, not the inactive one
(which can be considered empty when we can't admit newer reads).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190328003533.6162-1-duarte@scylladb.com>
2019-04-01 13:53:50 +03:00
Takuya ASADA
75a7859019 reloc: add relocatable CLI tools for scylla setup scripts
To minimize dependencies of Scylla, add relocatable image of CLI tools
required for scylla setup scripts.
2019-04-01 02:59:01 +09:00
Takuya ASADA
a3c1b9fcf3 dist/redhat: drop systemd-libs from dependency
Since we switched to relocatable package, we don't need distribution
native libraries, so the package is not needed anymore.
2019-04-01 02:58:22 +09:00
Takuya ASADA
a3741b4052 dist/redhat: drop file from dependency since it seems unused
The package is not used in our script anymore, drop it.
2019-04-01 02:57:43 +09:00
Takuya ASADA
7d78515d5b dist/redhat: drop pciutils from dependency since it only used in DPDK mode
Since we don't use DPDK mode by default, and the mode is not officially
supported, drop pciutils from package dependency.
Users who want to use DPDK mode need to install the package
manually.
2019-04-01 02:56:31 +09:00
Avi Kivity
77a0d5c5da Update seastar submodule
* seastar 05efbce...5572de7 (5):
  > posix_file_impl::list_directory: do not ignore symbolic link file type
  > prometheus: yield explicitly after each metric is processed
  > thread: add maybe_yield function
  > metrics: add vector overload of add_group()
  > memory: tone down message for memory allocator
2019-03-31 15:26:21 +03:00
Tomasz Grabiec
4c0584289b tests: cql_test_env: Fix _feature_service not being initialized
We moved from the uninitialized field instead of the constructor parameter.

No known issues.

Message-Id: <1553854544-26719-1-git-send-email-tgrabiec@scylladb.com>
2019-03-31 13:05:35 +03:00
Takuya ASADA
b1bba0c1b0 dist/redhat/python3: product name customization support
Currently the scylla-python3 package name is hardcoded; we need to support
package name renaming just like for other scylla packages.
This is required to release enterprise version.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190329003941.12289-1-syuu@scylladb.com>
2019-03-29 19:22:24 +02:00
Amos Kong
98cb7d145b scylla_setup: don't repeatedly select disks if it's assigned
Currently scylla_setup gets stuck selecting disks in non-interactive mode.

Fixes #4370

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <8fb445708a6ac0d2130f8a8d041b1d8d71f1cf14.1553745961.git.amos@scylladb.com>
2019-03-28 15:21:36 +02:00
Avi Kivity
65dd45d9cf Merge "sstable: validate file ownership and mode." from Benny
"
File must be either owned by the process uid
or have both read and write access to it,
so it could be (hard) linked when sysctl
fs.protected_hardlinks is enabled.

Fixes #3117
"

* 'projects/valid_owner_and_mode/v3-rebased' of https://github.com/bhalevy/scylla:
  storage_service: handle load_new_sstables exception
  init: validate file ownership and mode.
  treewide: use std::filesystem
2019-03-28 14:58:14 +02:00
Benny Halevy
956cb2e61c storage_service: handle load_new_sstables exception
Refs #3117

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-28 14:54:56 +02:00
Benny Halevy
e3f7fe44c0 init: validate file ownership and mode.
Files and directories must be owned by the process uid.
Files must have read access and directories must have
read, write, and execute access.

Refs #3117

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-28 14:40:12 +02:00
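The rule stated in the commit above (owned by the process uid, read access for files, read/write/execute access for directories) can be sketched as a small standalone check. `check_owner_and_mode` is a hypothetical helper for illustration; it is not the function added by the commit, and it assumes a POSIX system.

```python
import os
import stat


def check_owner_and_mode(path: str) -> list:
    """Return a list of problems for path; an empty list means it passed."""
    problems = []
    st = os.stat(path)
    if st.st_uid != os.getuid():
        problems.append(f"{path}: not owned by uid {os.getuid()}")
    if stat.S_ISDIR(st.st_mode):
        needed = os.R_OK | os.W_OK | os.X_OK   # directories: rwx
    else:
        needed = os.R_OK                       # files: read access
    if not os.access(path, needed):
        problems.append(f"{path}: insufficient access")
    return problems


if __name__ == "__main__":
    import sys
    for target in sys.argv[1:]:
        for problem in check_owner_and_mode(target):
            print(problem)
```

Returning a list rather than raising on the first failure lets a caller report every offending path in one pass.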
Benny Halevy
ff4d8b6e85 treewide: use std::filesystem
Rather than {std::experimental,boost,seastar::compat}::filesystem

On Sat, 2019-03-23 at 01:44 +0200, Avi Kivity wrote:
> The intent for seastar::compat was to allow the application to choose
> the C++ dialect and have seastar follow, rather than have seastar choose
> the types and have the application follow (as in your patch).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-28 14:21:10 +02:00
Dejan Mircevski
aa11f5f35e Drop unused #include
v2: fix "From" field in email

Tests: unit/cql_query_test (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <1553099087-11621-1-git-send-email-dejan@scylladb.com>
2019-03-28 01:48:19 +00:00
Duarte Nunes
d8fcdefe4a tests/view_schema_test: Remove debug output
A stray std::cout remained.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-03-27 21:58:10 +00:00
Tomasz Grabiec
2b8bf0dbf8 Merge "db/view: Apply tracked tombstones for new updates" from Duarte
When generating view updates for base mutations when no pre-existing
data exists, we were forgetting to apply the tracked tombstones.

Fixes #4321
Tests: unit(dev)

* https://github.com/duarten/scylla materialized-views/4321/v1.1:
  db/view: Apply tracked tombstones for new updates
  tests/view_schema_test: Add reproducer for #4321
2019-03-27 13:24:28 +01:00
Duarte Nunes
f609848b69 tests/view_schema_test: Add reproducer for #4321
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-03-27 12:01:39 +00:00
Duarte Nunes
ded9221187 db/view: Apply tracked tombstones for new updates
When generating view updates for base mutations when no pre-existing
data exists, we were forgetting to apply the tracked tombstones.

Fixes #4321
Tests: unit(dev)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-03-27 12:01:39 +00:00
Glauber Costa
043d102ab6 commitlog: fix typo in error message
maxiumum -> maximum

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190326191108.7573-1-glauber@scylladb.com>
2019-03-26 21:32:56 +02:00
Avi Kivity
a77762b02a Merge "Optimise vint deserialisation" from Paweł
"

Variable length integers are used extensively by the SSTables mc
format. The current deserialisation routine is quite naive in that
it reads each byte separately. Since those vints usually appear inside
much larger buffers, we optimise for such cases, read 8 bytes at once
and then mask out the unneeded parts (as well as fix their order because
they are big-endian).

Tests: unit(dev).

perf_vint (average time per element when deserializing 1000 vints):

before:
vint.deserialize                            69442000    14.400ns     0.000ns    14.399ns    14.400ns

after:
vint.deserialize                           241502000     4.140ns     0.000ns     4.140ns     4.140ns

perf_fast_forward (data on /tmp):
large-partition-single-key-slice on dataset large-part-ds1:

before:
   range            time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> [0, 1]           0.000278         8792         2       7190        119       7367       1960      3        104       2       0        0        1        1        0        0        1 100.0%
-> [1, 100)         0.000344           96        99     288100       4335     307689     193809      2        108       2       0        0        1        1        0        0        1 100.0%
-> (100, 200]       0.000339        13254       100     295263       2824     301734     222725      2        108       2       0        0        1        1        0        0        1 100.0%

after:
   range            time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> [0, 1]           0.000236        10001         2       8461         59       8718       2261      3        104       2       0        0        1        1        0        0        1 100.0%
-> [1, 100)         0.000285           89        99     347500       2441     355826     215745      2        108       2       0        0        1        1        0        0        1 100.0%
-> (100, 200]       0.000293        14369       100     341302       1512     350123     222049      2        108       2       0        0        1        1        0        0        1 100.0%
"

* tag 'optimise-vint/v2' of https://github.com/pdziepak/scylla:
  sstable: pass full length of buffer to vint deserialiser
  vint: optimise deserialisation routine
  vint: drop deserialize_type structure
  tests/vint: reduce test dependencies
  tests/perf: add performance test for vint serialisation
2019-03-26 16:41:44 +02:00
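The optimisation described in the merge above can be sketched in Python. The encoding used here is an assumption: the count of leading 1 bits in the first byte gives the number of extra bytes, and the payload is big-endian. All function names are hypothetical, and the actual C++ routine will differ in detail. The sketch compares a byte-at-a-time decoder with one that reads the following 8 bytes as a single word and shifts away the part that does not belong to the value.

```python
def leading_ones(byte: int) -> int:
    """Count the leading 1 bits of an 8-bit value."""
    return 8 - (byte ^ 0xFF).bit_length()


def encode_vint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer (assumed encoding, see above)."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value out of range")
    for extra in range(8):
        if value < 1 << (7 * (extra + 1)):
            body = value.to_bytes(extra + 1, "big")
            prefix = (0xFF << (8 - extra)) & 0xFF
            return bytes([body[0] | prefix]) + body[1:]
    return b"\xff" + value.to_bytes(8, "big")


def decode_bytewise(buf: bytes, pos: int = 0):
    """Naive decoder: consume one byte per loop iteration."""
    first = buf[pos]
    extra = leading_ones(first)
    value = first & (0xFF >> extra)
    for i in range(extra):
        value = (value << 8) | buf[pos + 1 + i]
    return value, 1 + extra


def decode_wide(buf: bytes, pos: int = 0):
    """Read 8 bytes at once when the buffer is long enough."""
    if len(buf) - pos < 9:
        return decode_bytewise(buf, pos)      # too close to the end
    first = buf[pos]
    extra = leading_ones(first)
    word = int.from_bytes(buf[pos + 1:pos + 9], "big")
    word >>= 8 * (8 - extra)                  # drop bytes past the vint
    value = ((first & (0xFF >> extra)) << (8 * extra)) | word
    return value, 1 + extra


if __name__ == "__main__":
    for v in (0, 127, 128, 2**32, 2**64 - 1):
        padded = encode_vint(v) + bytes(8)    # vint inside a larger buffer
        print(v, decode_bytewise(padded) == decode_wide(padded))
```

The fallback to the byte-wise path near the end of the buffer is why a wide decoder needs to know the full length of the buffer it is reading from, not just the length of the vint.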
Avi Kivity
4b330b3911 Merge "introduce sstables manager" from Benny
"
This series introduces a rudimentary sstables manager
that will be used for making and deleting sstables, and tracking
thereof.

The motivation for having a sstables manager is detailed in
https://github.com/scylladb/scylla/issues/4149.
The gist of it is that we need a proper way to manage the life
cycle of sstables to solve potential races between compaction
and various consumers of sstables, so they don't get deleted by
compaction while being used.

In addition, we plan to add global statistics methods like returning
the total capacity used by all sstables.

This patchset changes the way class sstable gets the large_data_handler.
Rather than passing it separately for writing the sstable and when deleting
sstables, we provide the large_data_handler when the sstable object is
constructed and then use it when needed.

Refs #4149
"

* 'projects/sstables_manager/v3' of https://github.com/bhalevy/scylla:
  sstables: provide large_data_handler to constructor
  sstables_manager: default_sstable_buffer_size need not be a function
  sstables: introduce sstables_manager
  sstables: move shareable_components def to its own header
  tests: use global nop_lp_handler in test_services
  sstables: compress.hh: add missing include
  sstables: reorder entry_descriptor constructor params
  sstables: entry_descriptor: get rid of unused ctor
  sstables: make load_shared_components a method of sstable
  sstables: remove default params from sstable constructor
  database: add table::make_sstable helper
  distributed_loader: pass column_family to load_sstables_with_open_info
  distributed_loader: no need for forward declaration of load_sstables_with_open_info
  distributed_loader: reshard: use default params for make_sstable
2019-03-26 16:31:40 +02:00
Benny Halevy
223e1af521 sstables: provide large_data_handler to constructor
And use it for writing the sstable and/or when deleting it.

Refs #4198

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:24:19 +02:00
Benny Halevy
c23f658d0e sstables_manager: default_sstable_buffer_size need not be a function
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
eebc3701a5 sstables: introduce sstables_manager
The goal of the sstables manager is to track and manage sstables life-cycle.
There is a sstable manager instance per database and it is passed to each column-family
(and test environment) on construction.
All sstables created, loaded, and deleted pass through the sstables manager.

The manager will make sure consumers of sstables are in sync so that sstables
will not be deleted while in use.

Refs #4149

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
b50c041aa2 sstables: move shareable_components def to its own header
To be used by sstables_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
2cd11208a1 tests: use global nop_lp_handler in test_services
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
0e3f9c25e4 sstables: compress.hh: add missing include
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
33cbfe81f2 sstables: reorder entry_descriptor constructor params
To match make_sstable's in preparation of moving to sstables_manager

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
ac5f9c1eae sstables: entry_descriptor: get rid of unused ctor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
adf8428321 sstables: make load_shared_components a method of sstable
and open code its static part in the caller (distributed_loader)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
ff7b7910f1 sstables: remove default params from sstable constructor
The goal is to construct sstables only via make_sstables
that will be moved to class sstables_manager in a later patch.

Defining the default values in both interfaces is unneeded
and may lead to them going out of sync.

Therefore, have only make_sstables provide the default parameter values.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
3a17053cb8 database: add table::make_sstable helper
In most cases we make a sstable based on the table schema
and soon - large_data_handler.
Encapsulate that in a make_sstable method.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
67f705ae04 distributed_loader: pass column_family to load_sstables_with_open_info
Rather than just its schema.

In preparation for adding table::make_sstable

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
99875ba966 distributed_loader: no need for forward declaration of load_sstables_with_open_info
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
7a8ab1d6f1 distributed_loader: reshard: use default params for make_sstable
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Avi Kivity
5e39b62fcc Merge "configure: Optionally don't compress debug in executables" from Rafael
"
Most of the binaries we link in a debug build are linked with -s, so
the only impact is build/debug/scylla, which grows by 583 MiB when
using --compress-exec-debuginfo=0.

On the other hand, not having to recompress all the debug info from
all the used object files is a pretty big win when debugging an issue.

For example, linking build/debug/scylla goes from

56.01s user 15.86s system 220% cpu 32.592 total

to

27.39s user 19.51s system 991% cpu 4.731 total

Note how the cpu time is "only" 2x better, but given that compressing
debug info is a long serial task, the wall time is 6.8x better.

Tests: unit (debug)
"

* 'espindola/dont-compress-debug-v5' of https://github.com/espindola/scylla:
  configure: Add a --compress-exec-debuginfo option
  configure: Move some flags from cxx_ld_flags to cxxflags
  configure: rename per mode opt to cxx_ld_flags
  configure: remove per mode libs
  configure: remove sanitize_libs and merge sanitize into opt
  configure: split a ld_flags_{mode} out of cxxflags_{mode}
2019-03-26 15:25:07 +02:00
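The ratios quoted in the merge message above can be reproduced from the two timing lines. This is plain arithmetic on the reported numbers; the variable names are only for illustration.

```python
# Timings copied from the two link runs reported in the message above.
before = {"user": 56.01, "system": 15.86, "wall": 32.592}
after = {"user": 27.39, "system": 19.51, "wall": 4.731}

user_speedup = before["user"] / after["user"]
cpu_speedup = (before["user"] + before["system"]) / (after["user"] + after["system"])
wall_speedup = before["wall"] / after["wall"]

if __name__ == "__main__":
    print(f"user time:      {user_speedup:.2f}x")
    print(f"user + system:  {cpu_speedup:.2f}x")
    print(f"wall time:      {wall_speedup:.2f}x")
```

The "2x" figure in the message matches the user-time ratio; counting system time as well gives a smaller CPU ratio, while the wall-time ratio comes out close to the quoted 6.8x.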
Avi Kivity
fad1be0ddc Update seastar submodule
* seastar caa98f8...05efbce (2):
  > fix use after free in rpc server handler
  > rpc: wait for send_negotiation_frame

Fixes #4336.
2019-03-26 14:33:37 +02:00
Gleb Natapov
1abc50ad8a messaging_service: make sure a client is unique for a destination
Function messaging_service::get_rpc_client() is supposed to either return
an existing client or create one and return it. The function is supposed to
be atomic, so after checking that the requested client does not exist it is
safe to assume emplace() will succeed. But we saw bugs that made the
function non-atomic. Let's add an assert that will help catch
such bugs more easily if they happen in the future.

Message-Id: <20190326115741.GX26144@scylladb.com>
2019-03-26 14:19:08 +02:00
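The get-or-create invariant described in the commit above can be illustrated with a small sketch. `ClientRegistry` and its `factory` argument are hypothetical; the real function is C++ and uses emplace() on its own container. The sketch only shows where the assertion sits relative to the lookup and the insert.

```python
class ClientRegistry:
    """Keeps at most one client per destination."""

    def __init__(self, factory):
        self._clients = {}
        self._factory = factory

    def get_client(self, destination):
        existing = self._clients.get(destination)
        if existing is not None:
            return existing
        client = self._factory(destination)
        # If lookup-then-insert really is atomic, nothing can have been
        # inserted for this destination since the lookup above.
        assert destination not in self._clients, "duplicate client"
        self._clients[destination] = client
        return client


if __name__ == "__main__":
    created = []

    def factory(dest):
        created.append(dest)
        return {"dest": dest}

    registry = ClientRegistry(factory)
    first = registry.get_client("10.0.0.1")
    second = registry.get_client("10.0.0.1")
    print(first is second, created)
```

The assertion costs almost nothing on the normal path and turns a silent duplicate into an immediate failure at the point where the invariant was broken.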
Avi Kivity
a696a3daf2 Merge "Fix decimal and varint serialization" from Piotr
"
Fixes #4348

v2 changes:
 * added a unit test

This miniseries fixes decimal/varint serialization - it did not update
output iterator in all cases, which may lead to overwriting decimal data
if any other value follows them directly in the same buffer (e.g. in a tuple).
It also comes with a reproducing unit test covering both decimals and varints.

Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
   json_test.FromJsonInsertTests.complex_data_types_test
   json_test.ToJsonSelectTests.complex_data_types_test
"

* 'fix_varint_serialization_2' of https://github.com/psarna/scylla:
  tests: add test for unpacking decimals
  types: fix varint and decimal serialization
2019-03-26 13:00:19 +02:00
Piotr Sarna
e538163a29 tests: add test for unpacking decimals
Refs #4348
2019-03-26 11:52:44 +01:00
Piotr Sarna
287a02dc05 types: fix varint and decimal serialization
Varint and decimal types serialization did not update the output
iterator after generating a value, which may lead to corrupted
sstables - variable-length integers were properly serialized,
but if anything followed them directly in the buffer (e.g. in a tuple),
their value will be overwritten.

Fixes #4348

Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
       json_test.FromJsonInsertTests.complex_data_types_test
       json_test.ToJsonSelectTests.complex_data_types_test

Note that dtests still do not succeed 100% due to formatting differences
in compared results (e.g. 1.0e+07 vs 1.0E7), but it's no longer a query
correctness issue.
2019-03-26 11:02:43 +01:00
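The class of bug described in the commit above can be shown with a simplified sketch. It is not the Scylla serializer: `write_varint` and `serialize_pair` are hypothetical, and real tuple serialization also carries length prefixes that are omitted here. The write position plays the role of the output iterator, and the `advance` flag switches between the fixed and the buggy behaviour.

```python
def write_varint(buf: bytearray, pos: int, value: int) -> int:
    """Write value as big-endian two's complement at pos.

    Returns the position just past the written bytes, which the caller
    must use for the next write.
    """
    length = value.bit_length() // 8 + 1   # leaves room for the sign bit
    raw = value.to_bytes(length, "big", signed=True)
    buf[pos:pos + length] = raw
    return pos + length


def serialize_pair(first: int, second: int, advance: bool = True) -> bytes:
    """Serialize two values back to back into one buffer."""
    buf = bytearray(16)
    pos = 0
    end = write_varint(buf, pos, first)
    if advance:
        pos = end          # the fix: continue after the bytes just written
    end = write_varint(buf, pos, second)
    return bytes(buf[:end])


if __name__ == "__main__":
    print(serialize_pair(1000, 7).hex())                 # both values kept
    print(serialize_pair(1000, 7, advance=False).hex())  # first overwritten
```

A value serialized on its own looks correct either way, which is why the problem only shows up when something follows it directly in the same buffer.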
Rafael Ávila de Espíndola
ddac002fd4 Make atomic_cell comparison symmetrical
I noticed a test failure with

Mutation inequality is not symmetric for ...

And the difference between the two mutations was that one atomic_cell
was live and the other wasn't.

Looking at the code I found a few cases where the comparison was not
symmetrical. This patch fixes them.

This patch will not fix the test, as it will now fail with a
"Mutations differ" error, but that is probably an independent issue.

Ref #3975.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190325194647.54950-1-espindola@scylladb.com>
2019-03-26 11:14:22 +02:00
Vlad Zolotarov
c798563cb0 scylla_util.py: ignore perftune.py's error messages when calling it in order to get mode's CPU mask
When we call perftune.py in order to get a particular mode's cpu set
(e.g. mode=sq_split) it may fail and print an error message to stderr because
there are too few CPUs for a particular configuration mode (e.g. when
there are only 2 CPUs and the mode is sq_split).

We already treat these situations correctly however we let the
corresponding perftune.py error message get out into the syslog.

This is definitely confusing, stressful and annoying.
Let's not let these messages out.

Fixes #4211

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190325220018.22824-1-vladz@scylladb.com>
2019-03-26 11:08:31 +02:00
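Keeping a helper's error output out of the log, as the commit above describes, can be sketched with the standard subprocess module. The command lines below are stand-ins run through the current Python interpreter; the actual perftune.py invocation and its arguments are not shown in the commit message and are not reproduced here. `get_cpu_mask` is a hypothetical name.

```python
import subprocess
import sys


def get_cpu_mask(command):
    """Run command and return its stdout, or None if it fails.

    stderr is sent to DEVNULL so an expected failure (for example, too
    few CPUs for the requested mode) leaves no message behind.
    """
    result = subprocess.run(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL,
                            universal_newlines=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    ok = [sys.executable, "-c", "print('0x3')"]
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('too few CPUs'); sys.exit(1)"]
    print(get_cpu_mask(ok))
    print(get_cpu_mask(failing))
```

The failure is still visible to the caller through the `None` return value, so it can fall back to another mode without anything being printed.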
Vlad Zolotarov
afa176851b transport: result_message: fix the compilation with fmt v5.3.0
Compilation fails with fmt release 5.3.0 when we print a bytes_view
using "{}" formatter.
The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument"

Resolve this by explicitly using the operator<<() across the whole
operator<<(std::ostream& os, const result_message::rows& msg) function.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190325203628.5902-1-vladz@scylladb.com>
2019-03-26 11:06:18 +02:00
Benny Halevy
af7f2a07f4 table::open_sstable: test has_scylla_component after load
has_scylla_component is always false before loading the sstable.

Also, return exception future rather than throwing.

Hit with the following dtests:
 counter_tests.TestCounters.upgrade_test
 counter_tests.TestCountersOnMultipleNodes.counter_consistency_node_*_test
 resharding_test.ReshardingTest_nodes?_with_*CompactionStrategy.resharding_counter_test
 update_cluster_layout_tests.TestUpdateClusterLayout.increment_decrement_counters_in_threads_nodes_restarted_test

Fixes #4306

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190326084151.18848-1-bhalevy@scylladb.com>
2019-03-26 10:58:52 +02:00
Avi Kivity
f259a4c3b4 Merge "Remove usage of static gossiper object in init.cc and storage_service" from Asias
"
This series removes the usage of the static gossiper object in init.cc
and storage_service.

Follow-up series will remove more in other components. This is part of the
effort to clean up the component dependencies and have a better shutdown
procedure.

Tests: tests/gossip_test, tests/cql_query_test, tests/sstable_mutation_test,  dtests.
"

* tag 'asias/storage_service_gossiper_dep_v5' of github.com:cloudius-systems/seastar-dev:
  storage_service: Do not use the global gms::get_local_gossiper()
  storage_service: Pass gossiper object to storage_service
  gms: Remove i_failure_detector.hh
  gossip: Get rid of the gms::get_local_failure_detector static object
  dht: Do not use failure_detector::is_alive in failure_detector_source_filter
  tests: Fix stop snitch in gossip_test.cc
  gossiper: Do not use value_factory from storage_service object
  gossiper: Use cfg options from _cfg instead of get_local_storage_service
  gossiper: Pass db::config object to gossiper class
  init: Pass gossiper object to init_ms_fd_gossiper
2019-03-26 08:54:46 +02:00
Avi Kivity
1d9699d833 Update seastar submodule
* seastar 33baf62...caa98f8 (8):
  > Merge "Add file_accessible and file_stat methods" from Benny
  > future::then: use std::terminate instead of abort
  > build: Allow cooked dependencies with configure.py
  > tests: Show a test's output when it fails
  > posix_file_impl: Bypass flush() call iff opened with O_DSYNC
  > posix_file_impl: Propagate and keep open_flags
  > open_flags: Add O_DSYNC value
  > build: Forward variables to CMake correctly
2019-03-25 15:45:52 +02:00
Avi Kivity
a7520c0ba9 Merge "Turn cql3_type into a trivial wrapper over data_type" from Rafael
"
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.

To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.

Even before this series cql3_type was a fairly light weight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.

This avoids the reference cycle and is easier to understand IMHO.

The one corner case is varchar. In the old system cql3_type::varchar
and cql3_type::text don't compare equal, but they both map to the same
data_type.

In the new system they would compare equal, so we avoid the confusion
by just removing the cql3_type::varchar variable.

Tests: unit (dev)
"

* 'espindola/merge-cq3-type-and-type-v3' of https://github.com/espindola/scylla:
  Turn cql3_type into a trivial wrapper over data_type
  Delete cql3_type::varchar
  Simplify db::cql_type_parser::parse
  Add a test for the varchar column representation
2019-03-25 15:03:16 +02:00
Tomasz Grabiec
80020118d0 Merge "Fix a couple of bugs related to large entry deletion" from Rafael
The crash observed in issue #4335 happens because
delete_large_data_entries is passed a deleted name.

Normally we don't get a crash, but a garbage name, and so we fail to
delete entries from system.large_*.

Adding a test for the fix found another issue, which the second patch
in this series fixes.

Tests: unit (dev)

Fixes #4335.

* https://github.com/espindola/scylla guthub/fix-use-after-free-v4:
  large_data_handler: Fix a use after destruction
  large_data_handler: Make a variable non static
  Allow large_data_handler to be stopped twice
  Allow table to be stopped twice
  Test that large data entries are deleted
2019-03-25 10:37:36 +01:00
Avi Kivity
8c6306897d Merge "load_new_sstables: validate new_tables before calling row_cache::invalidate" from Benny
"
Validate the to-be-loaded sstables in the open_sstable phase and handle any exceptions before calling cf.get_row_cache().invalidate.

Currently, if an exception is thrown from distributed_loader::open_sstable, cf._sstables_opened_but_not_loaded may be left partially populated.

Fixes #4306

Tests: unit (dev)
	- next-gating dtests (dev)
	- migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test
	  migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test_expect_fail
	  - with bypassing exception in distributed_loader::flush_upload_dir
	    to trigger the exception in table::open_sstable

"

* 'issues/4306/v3' of https://github.com/bhalevy/scylla:
  table: move sstable counters validation from load_sstable to open_sstable
  distributed_loader::load_new_sstables: handle exceptions in open_sstable
2019-03-24 20:30:44 +02:00
Avi Kivity
bd3a836e6c Merge "fixes for relocatable python3 packaging" from Takuya
"
Align the way the relocatable python3 rpm is built with the existing relocatable packages.
"

* 'relocatable-python3-fix-v3' of https://github.com/syuu1228/scylla:
  reloc: allow specify rpmbuild dir
  reloc/python3: archive package version number on build_reloc.sh
  reloc/python3: archive rpm build script in the relocatable package, build rpm using the script
  relloc/python3: fix PyYAML package name
  reloc: rename python3 relocatable package filename to align same style with other packages
  reloc: move relocatable python build scripts to reloc/python3 and dist/redhat/python3
2019-03-24 20:29:56 +02:00
Duarte Nunes
93a1c27b31 service/storage_proxy: Don't consider view hints for MV backpressure
When a view replica becomes unavailable, updates to it are stored as
hints at the paired base replica. This on-disk queue of pending view
updates grows as long as there are view updates and the view replica
remains unavailable. Currently, we take that relative queue size into
account when calculating the delay for new base writes, in the context
of the backpressure algorithm for materialized views.

However, the way we're calculating that on-disk backlog is wrong,
since we calculate it per-device and then feed it to all the hints
managers for that device. This means that normal hints will show up as
backlog for the view hints manager, which in turn introduces delays.
This can make the view backpressure mechanism kick-in even if the
cluster uses no materialized views.

There's yet another way in which considering the view hints backlog is
wrong: a view replica that is unavailable for some period of time can
cause the backlog to grow to a point where all base writes are delayed
by the maximum of 1 second. This turns a single-node failure into
cluster unavailability.

The fix to both issues is to simply not take this on-disk backlog into
account for the backpressure algorithm.

Fixes #4351
Fixes #4352

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190321170418.25953-1-duarte@scylladb.com>
2019-03-24 20:29:56 +02:00
Benny Halevy
32bf0f36ef table: move sstable counters validation from load_sstable to open_sstable
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-24 18:25:09 +02:00
Benny Halevy
564be8b720 distributed_loader::load_new_sstables: handle exceptions in open_sstable
Propagate exception to caller.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-24 18:25:09 +02:00
Takuya ASADA
efb3865840 reloc: allow specify rpmbuild dir
Added the same option, --builddir, to python3/build_rpm.sh to specify
the rpmbuild dir.
2019-03-24 00:34:09 +09:00
Takuya ASADA
dc5cec4194 reloc/python3: archive package version number on build_reloc.sh
Instead of getting python3 version number on build_rpm.sh, archive
version number when generating python3 relocatable package.
2019-03-24 00:27:24 +09:00
Takuya ASADA
4fed4fecf6 reloc/python3: archive rpm build script in the relocatable package, build rpm using the script
We already archive the rpm/deb build script in the relocatable package
and build rpm/deb using that script, so align the python relocatable
package with this too.

Also added SCYLLA-RELOCATABLE-FILE, SCYLLA-RELEASE-FILE and SCYLLA-VERSION-FILE
since these files are required for relocatable package.
2019-03-24 00:27:16 +09:00
Takuya ASADA
b1283b23bb relloc/python3: fix PyYAML package name
On Fedora 29 (Scylla official toolchain uses it),
PyYAML package name is "python3-pyyaml", no uppercase character.
2019-03-24 00:27:02 +09:00
Takuya ASADA
3762c4447a reloc: rename python3 relocatable package filename to align same style with other packages 2019-03-24 00:26:48 +09:00
Takuya ASADA
a515324732 reloc: move relocatable python build scripts to reloc/python3 and dist/redhat/python3
To make it easier to find the build scripts and keep script filenames
simpler, move them to the python3 directory.
2019-03-24 00:25:50 +09:00
Tomasz Grabiec
bc4a614e17 Merge "Add scylla fiber gdb command" from Botond
Debugging continuations is challenging. There is no support from gdb for
finding out from which continuation this continuation was called, nor
what other continuations are attached to it. GDB's `bt` command is of
limited use, at best a handful of continuations will appear in the
backtrace, those that were ready. This series attempts to fill part of
this void and provides a command that answers the latter question: what
continuations are attached to this one?
`scylla fiber` allows for walking a continuation chain, printing each
continuation. It is supposed to be the seastar equivalent of `bt`.
The continuation chain is walked starting from an arbitrary task,
specified by the user. The command will print all continuations attached
to the specified task.
This series also contains some loosely related cleanup of existing
commands and code in `scylla-gdb.py`.

* https://github.com/denesb/scylla.git scylla-fiber-gdb-command/v4:
  scylla-gdb.py: fix static_vector
  scylla-gdb.py: std_unique_ptr: add get() method
  scylla-gdb.py: fix existing documentation
  scylla-gdb.py: fix tasks and task-stats commands
  scylla-gdb.py: resolve(): add cache parameter
  scylla-gdb.py: scylla_ptr: move actual logic into analyze()
  scylla-gdb.py: scylla_ptr: make analyze() usable for outside code
  scylla-gdb.py: scylla_ptr: accept any valid gdb expression as input
  scylla-gdb.py: add scylla fiber command
2019-03-23 10:20:20 +02:00
Asias He
7447c92d63 storage_service: Do not use the global gms::get_local_gossiper()
Use the gossiper object stored in _gossiper member from storage_service.
2019-03-22 09:11:26 +08:00
Asias He
b91452ed4c storage_service: Pass gossiper object to storage_service
Pass the gossiper object to storage_service class in order to avoid the
usage of the static object returned from get_local_gossiper().
2019-03-22 09:11:26 +08:00
Asias He
b2c110699e gms: Remove i_failure_detector.hh
It is not used any more.
2019-03-22 09:08:51 +08:00
Asias He
af579a055b gossip: Get rid of the gms::get_local_failure_detector static object
Store the failure_detector object inside gossiper object.

- The global sharded<failure_detector> object is gone

- No need to initialize sharded<failure_detector> manually which
simplifies the code in tests/cql_test_env.cc and init.cc.
2019-03-22 09:08:51 +08:00
Asias He
2b6a4050c2 dht: Do not use failure_detector::is_alive in failure_detector_source_filter
Switch failure_detector_source_filter to use get_local_gossiper::is_alive
directly since we are going to remove the static
gms::get_local_failure_detector object soon.
Pass the nodes that are down to the filter directly, so that
range_streamer does not depend on gossiper at all.
2019-03-22 08:26:47 +08:00
Asias He
9dbc4af1dd tests: Fix stop snitch in gossip_test.cc
It should stop snitch not failure detector. Fix it up. We are going to
remove the static failure_detector object soon.
2019-03-22 08:26:47 +08:00
Asias He
967794798a gossiper: Do not use value_factory from storage_service object
Avoid using value_factory from storage_service inside gossiper.
2019-03-22 08:26:47 +08:00
Asias He
4a55617c6c gossiper: Use cfg options from _cfg instead of get_local_storage_service
Gossiper has db::config _cfg now, avoid using the
get_local_storage_service() to get config options.
2019-03-22 08:26:44 +08:00
Asias He
ee1227b3ae gossiper: Pass db::config object to gossiper class
Gossiper calls service::get_local_storage_service() to get cfg options.
To avoid cyclic dependency, pass the cfg object to gossiper directly.
2019-03-22 08:25:16 +08:00
Asias He
1652ee512a init: Pass gossiper object to init_ms_fd_gossiper
In order to avoid the usage of the static gossiper object returned from
get_local_gossiper().
2019-03-22 08:25:16 +08:00
Rafael Ávila de Espíndola
51754ab068 Test that large data entries are deleted
This area is hard to test since we only issue deletes during
compaction and we wait for deletes only during shutdown.

That is probably worth it, seeing that two independent bugs would have
been found by this test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 10:48:20 -07:00
Rafael Ávila de Espíndola
bd1593c12a Allow table to be stopped twice
This will be used in a testcase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 10:47:59 -07:00
Rafael Ávila de Espíndola
c8da28a3eb Allow large_data_handler to be stopped twice
This will be used in a testcase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 10:47:23 -07:00
Rafael Ávila de Espíndola
c0b0a6baeb configure: Add a --compress-exec-debuginfo option
The default is the old behavior, but it is now possible to configure
with --compress-exec-debuginfo=0 to get faster links but larger
binaries.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:55:54 -07:00
Rafael Ávila de Espíndola
ab53055640 configure: Move some flags from cxx_ld_flags to cxxflags
They are moved because they are not relevant for linking.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:55:39 -07:00
Rafael Ávila de Espíndola
e11cefab9c configure: rename per mode opt to cxx_ld_flags
It is the same name used in the build.ninja file.

A followup patch will add cxxflags and move compiler only flags there.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:46:58 -07:00
Rafael Ávila de Espíndola
443a85a68c configure: remove per mode libs
It was always empty.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:46:32 -07:00
Rafael Ávila de Espíndola
35c7ec6777 configure: remove sanitize_libs and merge sanitize into opt
These are flags we want to pass to both compilation and linking. There
is nothing special about the fact that they are sanitizer related.

With {sanitize} being passed to the link, we don't need {sanitize_libs}.

We do need to make sure -fno-sanitize=vptr is the last one in the
command line. Before we were implicitly getting it from seastar, but
it is bad practice to get some sanitizer flags from seastar but not
others.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:43:02 -07:00
Duarte Nunes
5752174762 Merge 'Use staging directory for uploaded sstables awaiting view updates' from Piotr
"
This series adds moving sstables uploaded via `nodetool refresh` to
staging/ directory if they require generating view updates from them.
Previous behavior (leaving these sstables in upload/ directory until
view updates are generated) might have caused sstables with
conflicting names to be mistakenly overwritten by the user.

Fixes #4047

Tests: unit (dev)
dtest: backup_restore_tests.py + backup_restore_tests.py modified with
       having materialized view definitions
"

* 'use_staging_directory_for_uploaded_sstables_awaiting_view_updates' of https://github.com/psarna/scylla:
  sstables: simplify requires_view_building
  loader: move uploaded view pending sstables to staging
2019-03-21 12:46:02 -03:00
Gleb Natapov
bb93d990ad messaging_service: keep shared pointer to an rpc connection while opening mutation fragment stream
Current code captures a reference to rpc::client in a continuation, but
there is no guarantee that the reference will be valid when the continuation runs.
Capture shared pointer to rpc::client instead.
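
The idea can be sketched in plain C++, without seastar. client_sketch
and make_open_stream are hypothetical names used only for this
illustration, and std::function stands in for a continuation:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for rpc::client.
struct client_sketch {
    std::string peer;
};

// Returns a deferred callback that keeps the client alive by owning a
// shared_ptr copy, instead of holding a reference that may dangle by
// the time the callback runs.
std::function<std::string()> make_open_stream(std::shared_ptr<client_sketch> c) {
    return [c = std::move(c)] { return "stream to " + c->peer; };
}
```

The callback stays valid even if every other owner drops its pointer
before the callback is invoked.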

Fixes #4350.

Message-Id: <20190314135538.GC21521@scylladb.com>
2019-03-21 12:46:01 -03:00
Tomasz Grabiec
69775c5721 row_cache: Fix abort in cache populating read concurrent with memtable flush
When we're populating a partition range and the population range ends
with a partition key (not a token) which is present in sstables and
there was a concurrent memtable flush, we would abort on the following
assert in cache::autoupdating_underlying_reader:

     utils::phased_barrier::phase_type creation_phase() const {
         assert(_reader);
         return _reader_creation_phase;
     }

That's because autoupdating_underlying_reader::move_to_next_partition()
clears the _reader field when it tries to recreate a reader but it finds
the new range to be empty:

         if (!_reader || _reader_creation_phase != phase) {
            if (_last_key) {
                auto cmp = dht::ring_position_comparator(*_cache._schema);
                auto&& new_range = _range.split_after(*_last_key, cmp);
                if (!new_range) {
                    _reader = {};
                    return make_ready_future<mutation_fragment_opt>();
                }

Fix by not asserting on _reader. creation_phase() will now be
meaningful even after we clear the _reader. The meaning of
creation_phase() is now "the phase in which the reader was last
created or 0", which makes it valid in more cases than before.

If the reader was never created we will return 0, which is smaller
than any phase returned by cache::phase_of(), since cache starts from
phase 1. This shouldn't affect current behavior, since we'd abort() if
called for this case, it just makes the value more appropriate for the
new semantics.
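
A minimal sketch of the new semantics, using simplified stand-in types
(reader, phase_type and underlying_reader_sketch are hypothetical and
are not the actual row_cache classes):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the real reader and phase types.
using phase_type = std::uint64_t;
struct reader {};

class underlying_reader_sketch {
    std::optional<reader> _reader;
    phase_type _reader_creation_phase = 0; // 0 means "never created"
public:
    // Records the phase in which the reader was (re)created.
    void create_reader(phase_type phase) {
        assert(phase >= 1); // the cache starts from phase 1
        _reader.emplace();
        _reader_creation_phase = phase;
    }
    // Models move_to_next_partition() finding an empty new range.
    void clear_reader() { _reader.reset(); }
    bool has_reader() const { return _reader.has_value(); }
    // No assert on _reader: the value stays meaningful after a clear.
    phase_type creation_phase() const { return _reader_creation_phase; }
};
```

With this shape, creation_phase() yields 0 for a reader that was never
created and the last creation phase after the reader is cleared.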

Tests:

  - unit.row_cache_test (debug)

Fixes #4236
Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>
2019-03-21 12:46:00 -03:00
Asias He
c0f744b407 storage_service: Wait for gossip to settle only if do_bind is set
In commit 71bf757b2c, we call
wait_for_gossip_to_settle() which takes some time to complete in
storage_service::prepare_to_join().

tests/cql_query_test calls init_server with do_bind == false, which in
turn calls storage_service::prepare_to_join(). Since there is only one
node in the test, there is no point in waiting for gossip to settle.

To make the cql_query_test fast again, do not call
wait_for_gossip_to_settle if do_bind is false.

Before this patch, cql_query_test takes forever to complete.
After it, the test takes 10s.

Tests: tests/cql_query_test
Message-Id: <3ae509e0a011ae30eef3f383c6a107e194e0e243.1553147332.git.asias@scylladb.com>
2019-03-21 12:46:00 -03:00
Avi Kivity
a9cf07369f Merge "Add local indexes" from Piotr
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as base data.
It addresses the performance issue of having an indexed query
that also specifies a partition key - index will be queried
locally.
"

* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
  tests: add cases for local index prefix optimization
  tests: add create/drop local index test case
  tests: add non-standard names cases to local index tests
  tests: add multi pk case for local index tests
  tests: add test for malformed local index definitions
  tests: add local index paging test
  tests: add local indexing test
  cql3: add CREATE INDEX syntax for local indexes
  cql3: use serialization function to create index target string
  index: add serialization function for index targets
  index: use proper local index target when adding index
  index: add parsing target column name from local index targets
  db: add checking for local index in schema tables
  index: add checking if serialized target implies local index
  index: enable parsing multi-key targets
  index: move target parser code to .cc file
  json: add non-throwing overload for to_json_value
  cql3: add checking for local indexes in has_supporting_index()
  cql3: move finding index restrictions to prepare stage
  cql3: add picking an index by score
  ...
2019-03-21 12:46:00 -03:00
Nadav Har'El
561c640ed1 materialized views: allow view without clustering columns
When a materialized view was created, the verification code artificially
forbade creating a view without a clustering key column. However, there
is no real reason to forbid this. In the trivial case, the original base
table might not have had a clustering key, and the view might want to use
the exact same key. In a more complex case, a view may want to have all the
primary key columns as *partition* key columns, and that should be fine.

The patch also includes a regression test, which failed before this patch,
and succeeds with it (we test that we can create materialized views in both
aforementioned scenarios, and these materialized views work as expected).

Duarte raised the opinion that the "trivial" case of a view table with
a key identical to that of the base should be disallowed. However, this
should be done, if at all (I think it shouldn't), in a follow-up patch,
which will implement the non-triviality requirement consistently (e.g.,
require view primary key to be different from base's, regardless of
the existence or non-existence of clustering columns).

Fixes #4340.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190320122925.10108-1-nyh@scylladb.com>
2019-03-21 12:45:52 -03:00
Glauber Costa
34b640993f storage proxy: add tracepoints about delays
When we are tracing requests, we would like to know everything that
happened to a query that can contribute to it having increased
latencies.

We insert some of those latencies explicitly due to throttling, but we
do not log that into tracing.

In the case of storage proxy, we do have a log message at trace level
but that is rarely used: trace messages are too heavy of a hammer, there
is no way to specify specific queries, etc.

The correct place for that is CQL tracing. This patch moves that message
to CQL tracing. We also add a matching tracepoint assuring us that no
delay happened if that's the case.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190320163350.15075-1-glauber@scylladb.com>
2019-03-21 12:45:52 -03:00
Avi Kivity
eddb98e8c6 Merge "sstables: mc: Write and read static compact tables the same way as Cassandra" from Tomasz
"
Static compact tables are tables with compact storage and no
clustering columns.

Before this patch, Scylla was writing rows of static compact tables as
clustered rows instead of as static rows. That's because in our in-memory
model such tables have regular rows and no static row. In Cassandra's
schema (since 3.x), those tables have columns which are marked as
static and there are no regular columns.

This worked fine as long as Scylla was writing and reading those
sstables. But when importing sstables from Cassandra, our reader was
skipping the static row, since it's not present in our schema, and
returning no rows as a result. Also, Cassandra, and Scylla tools,
would have problems reading those sstables.

Fix this by writing rows for such tables the same way as Cassandra
does. In order to support rolling downgrade, we do that only when all
nodes are upgraded.

Fixes #4139.

Tests:

  - unit (dev)
"

* tag 'static-compact-mc-fix-v3.1' of github.com:tgrabiec/scylla:
  tests: sstables: Test reading of static compact sstable generated by Cassandra
  tests: sstables: Add test for writing and reading of static compact tables
  sstables: mc: Write static compact tables the same way as Cassandra
  sstable: mc: writer: Set _static_row_written inside write_static_row()
  sstables: Add sstable::features()
  sstables: mc: writer: Prepare write_static_row() for working with any column_kind
  storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag
  sstables: mc: writer: Build indexed_columns together with serialization_header
  sstables: mc: writer: De-optimize make_serialization_header()
  sstable: mc: writer: Move attaching of mc-specific components out of generic code
2019-03-21 12:45:51 -03:00
Rafael Ávila de Espíndola
53ab298957 Turn cql3_type into a trivial wrapper over data_type
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.

To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.

Even before this patch cql3_type was a fairly light weight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.

This avoids the reference cycle and is easier to understand IMHO.

Tests: unit (dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 14:10:28 -07:00
Rafael Ávila de Espíndola
c76148b6ce Delete cql3_type::varchar
varchar is just an alias for text. Handle that conversion directly in
the parser and delete the cql3_type::varchar variable.
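
A minimal sketch of resolving the alias at parse time.
canonical_type_name is a hypothetical helper written for this
illustration, not the actual parser code:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a CQL type name to its canonical spelling,
// treating varchar as an alias for text. All other names pass through.
std::string canonical_type_name(const std::string& name) {
    if (name == "varchar") {
        return "text";
    }
    return name;
}
```

Once the alias is resolved this early, nothing downstream needs a
separate varchar value.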

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 14:07:46 -07:00
Rafael Ávila de Espíndola
7f64a6ec4b Simplify db::cql_type_parser::parse
Since its first version, db::cql_type_parser::parse had special cases
for native and user defined types.

Those are not necessary, as the general parser has no problem handling
them.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola
088d59aced Add a test for the varchar column representation
We map varchar to text, and so does cassandra.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola
8d9baf9843 large_data_handler: Make a variable non static
The value computed is not static since
f254664fe6, but unfortunately that was
missed in that commit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 09:31:21 -07:00
Rafael Ávila de Espíndola
e7749e7aee large_data_handler: Fix a use after destruction
The path leading to the issue was:

The sstable name is allocated and passed to maybe_delete_large_data_entries by reference

   auto name = sst->get_filename();
   return large_data_handler.maybe_delete_large_data_entries(*sst->get_schema(), name, sst->data_size());

A future is created with a reference to it

  large_partitions = with_sem([&s, &filename, this] {
     return delete_large_data_entries(s, filename, db::system_keyspace::LARGE_PARTITIONS);
  });

The semaphore blocks.

The filename is destroyed.

delete_large_data_entries is called with a destroyed filename.

The reason this did not reproduce trivially in a debug build was that
the sstable itself was in the stack and the destructed value was read
as an internal value, and so asan had nothing to complain about.

Unfortunately we also had no tests that the entry in
system.large_rows was actually deleted.

This patch passes the name by value. It might create up to 3 copies of
it. If that is too inefficient it can probably be avoided with a
do_with in maybe_delete_large_data_entries.
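
The pass-by-value approach can be sketched without seastar.
deferred_queue and schedule_delete are hypothetical names; the queue
merely stands in for work that is delayed behind the semaphore:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for work queued behind a semaphore.
struct deferred_queue {
    std::vector<std::function<void()>> pending;
    void run_all() {
        for (auto& f : pending) { f(); }
        pending.clear();
    }
};

// Takes the name by value and moves it into the callback, so the
// callback owns its own copy regardless of what the caller does next.
// 'deleted' is captured by reference and must outlive the queue run.
void schedule_delete(deferred_queue& q, std::string filename,
                     std::vector<std::string>& deleted) {
    q.pending.push_back([filename = std::move(filename), &deleted] {
        deleted.push_back(filename);
    });
}
```

The caller's string can be destroyed before run_all() executes and the
callback still sees the intended name.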

Fixes #4335

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 09:30:42 -07:00
Rafael Ávila de Espíndola
c250a26e68 configure: split a ld_flags_{mode} out of cxxflags_{mode}
Flags that we want to pass to gcc during compilation and linking are
in cxx_ld_flags_{mode}.

With this patch, we no longer pass

-I. -I build/{mode}/gen

to the link, which should have no impact.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 08:33:23 -07:00
Piotr Sarna
9695a47e96 sstables: simplify requires_view_building
Since sstables uploaded via upload/ directory are no longer left there
awaiting view updates, the only remaining valid directory is staging/.
2019-03-20 13:47:21 +01:00
Botond Dénes
0c381572fd repair::row_level: pin table for local reads
The repair reader depends on the table object being alive, while it is
reading. However, for local reads, there was no synchronization between
the lifecycle of the repair reader and that of the table. In some cases
this can result in use-after-free. Solve by using the table's existing
mechanism for lifecycle extension: `read_in_progress()`.
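
The pinning idea can be sketched as an RAII token. table_sketch and its
members are hypothetical and far simpler than the real table class; the
only point is that the table counts as busy while a token is alive:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a table that counts in-flight reads.
class table_sketch {
    std::size_t _reads = 0;
public:
    // RAII token: the table is considered busy while any token lives.
    class read_token {
        table_sketch* _t;
    public:
        explicit read_token(table_sketch& t) : _t(&t) { ++_t->_reads; }
        read_token(read_token&& o) noexcept : _t(o._t) { o._t = nullptr; }
        read_token(const read_token&) = delete;
        read_token& operator=(const read_token&) = delete;
        read_token& operator=(read_token&&) = delete;
        ~read_token() { if (_t) { --_t->_reads; } }
    };
    // Named after the mechanism mentioned above; the body is a sketch.
    read_token read_in_progress() { return read_token(*this); }
    bool can_be_dropped() const { return _reads == 0; }
};
```

A reader that stores such a token for its whole lifetime keeps the
table from being torn down underneath it.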

For the non-local reader, when the local node's shard configuration is
different from the remote one's, this problem is already solved, as the
multishard streaming reader already pins table objects on the used
shards. This creates an inconsistency that might be surprising (in a bad
way). One reader takes care of pinning needed resources while the other
one doesn't. I was torn on how to reconcile this, and decided to go
with the simplest solution, explicitly pinning the table for local
reads, that is, preserving the inconsistency. It was suggested that this
inconsistency be remedied by building resource pinning into the local
reader as well [1] but there is opposition to this [2]. Adding a wrapper
reader which does just the resource pinning seems excessive, both in
code and runtime overhead.

Spotted while investigating repair-related crashes which occurred during
interrupted repairs.

Fixes: #4342

[1] https://github.com/scylladb/scylla/issues/4342#issuecomment-474271050
[2] https://github.com/scylladb/scylla/issues/4342#issuecomment-474331657

Tests: none, this is a trivial fix for a not-yet-seen-in-the-wild bug.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8e84ece8343468960d4e161467ecd9bb10870c27.1553072505.git.bdenes@scylladb.com>
2019-03-20 14:45:22 +02:00
Piotr Sarna
986004a959 loader: move uploaded view pending sstables to staging
When loading tables uploaded via `nodetool refresh`, they used to be
left in upload/ directory if view updates would need to be generated
from them. Since view update generation is asynchronous, sstables
left in the directory could erroneously get overwritten if the user
uploaded another batch of sstables and some of the names
collided.
To remedy this, uploaded sstables that need view updates are moved
to staging/ directory with a unique generation number, where they
await view update generation.

Fixes #4047
2019-03-20 13:44:29 +01:00
Juliana Oliveira
8cd6028d0d Dockerfile: remove cgroup volume mount
Mounting /sys/fs/cgroup inside the image causes docker cgroup to not
be mounted internally. Therefore, hosts cannot limit resources on
Scylla. This patch removes the cgroup volume mount, allowing folders
under /sys/fs/cgroup to be created inside docker.

Message-Id: <20190320122053.GA20256@shenzou.localdomain>
2019-03-20 14:30:27 +02:00
Nadav Har'El
7c874057f5 materialized_views: propagate "view virtual columns" between nodes
db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed
to list the same schema tables - the former is the list of their names, and
the latter is the list of their schemas. This code duplication makes it easy
to forget to update one of them, and indeed recently the new
"view_virtual_columns" was added to all_tables() but not to ALL.

What this patch does is to make ALL a function instead of constant vector.
The newly named all_table_names() function uses all_tables() so the list
of schema tables only appears once.

So that nobody worries about the performance impact, all_table_names()
caches the list in a per-thread vector that is only prepared once per thread.
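
A minimal sketch of the pattern: the names are derived from the schema
list and cached once per thread. schema_sketch and the table names
listed are illustrative placeholders, not the real schema definitions:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema; only the name matters here.
struct schema_sketch {
    std::string name;
};

// Single source of truth for the list (contents are illustrative).
std::vector<schema_sketch> all_tables() {
    return {{"keyspaces"}, {"tables"}, {"columns"}, {"view_virtual_columns"}};
}

// Derives the names from all_tables(); built once per thread, then
// every later call on that thread returns the same cached vector.
const std::vector<std::string>& all_table_names() {
    static thread_local const std::vector<std::string> names = [] {
        std::vector<std::string> v;
        for (const auto& s : all_tables()) {
            v.push_back(s.name);
        }
        return v;
    }();
    return names;
}
```

Adding a table to all_tables() is then enough for its name to show up
in all_table_names() as well.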

Because after this patch all_table_names() has the "view_virtual_columns"
that was previously missing, this patch also fixes #4339, which was about
virtual columns in materialized views not being propagated to other nodes.

Unfortunately, to test the fix for #4339 we need a test with multiple
nodes, so we cannot test it here in a unit test, and will instead use
the dtest framework, in a separate patch.

Fixes #4339

Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
2019-03-20 09:14:59 -03:00
Nadav Har'El
ccf731a820 Materialized views: add metric for current flow-control delay
The materialized views flow control mechanism works by adding a certain
delay to each client request, designed to slow down the client to the
rate at which we can complete the background view work. Until now we could observe
this mechanism only indirectly, in whether or not it succeeded to keep the
view backlog bounded; But we had no way to directly observe the delay that
we decided to add. In fact, we had a bug where this delay was constantly
zero, and we didn't even notice :-)

So in this patch we add a new metric,
scylla_storage_proxy_coordinator_last_mv_flow_control_delay

The metric is a floating point number, in units of seconds.

This metric is somewhat peculiar in that it always contains the *last* delay
used for some request - unlike other metrics it doesn't measure the "current"
value of something. Moreover, it can jump wildly because there is no
guarantee that each request's delay will be identical (in particular,
different requests may involve different base replicas which have different
view backlogs, so decide on different delays). In the future we may want
to supplement this metric with some sort of delay histogram. But even
this simple metric is already useful to debug certain scenarios and
understand if the materialized-views flow control is working or not.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190227133630.26328-1-nyh@scylladb.com>
2019-03-20 09:14:59 -03:00
Tomasz Grabiec
fbeae4ffeb toolchain: Install gdb in the image
Scylla built using the frozen toolchain needs to be debugged
on a system with matching libraries. It's easiest if it's also done on the same image.
Install gdb in the image so that it's always out there when we need it.

Fixes #4329

Message-Id: <1553072393-9145-1-git-send-email-tgrabiec@scylladb.com>
2019-03-20 13:35:26 +02:00
Piotr Sarna
41679de13e tests: add cases for local index prefix optimization
The cases check if incorporating clustering key prefix into
the indexed query works fine (i.e. does not require filtering
and returns proper rows).
2019-03-20 10:51:27 +01:00
Piotr Sarna
56a0e6d992 tests: add create/drop local index test case 2019-03-20 10:51:27 +01:00
Piotr Sarna
3c61c8e18a tests: add non-standard names cases to local index tests
New test cases cover case-sensitive column/table names and names with
non-alphanumeric characters like commas and parentheses.
2019-03-20 10:51:27 +01:00
Piotr Sarna
d664e0e522 tests: add multi pk case for local index tests 2019-03-20 10:51:27 +01:00
Piotr Sarna
3b39029924 tests: add test for malformed local index definitions 2019-03-20 10:51:27 +01:00
Piotr Sarna
4b82011cd3 tests: add local index paging test 2019-03-20 10:51:27 +01:00
Piotr Sarna
8836500fcd tests: add local indexing test
A test case for local indexing is added to the SI suite.
2019-03-20 10:51:27 +01:00
Piotr Sarna
cedec95f8d cql3: add CREATE INDEX syntax for local indexes
In order to create a local index, the syntax used is:
CREATE INDEX t ON ((p1, p2, p3), v);

where (p1, p2, p3) are partition key columns (all of them),
and v is the indexed column.
2019-03-20 10:51:27 +01:00
Piotr Sarna
1fd61c5ac4 cql3: use serialization function to create index target string
Instead of building the string manually, a serialization function
is called to create a string out of index target list.
2019-03-20 10:51:27 +01:00
Piotr Sarna
757419b524 index: add serialization function for index targets
Since target_parser is responsible for deserializing target strings,
the function that serializes them belongs in the same class.
2019-03-20 10:51:26 +01:00
Piotr Sarna
074ed2c8a5 index: use proper local index target when adding index
With global indexes, target column name is always the same as the string
kept in 'options[target]' field. It's not the case for local indexes,
and so a proper extracting function is used to get the value.
2019-03-20 10:20:24 +01:00
Piotr Sarna
2fcae3d0ec index: add parsing target column name from local index targets
When (re)creating a local index, the target string needs to be used
to parse out the actual indexed column:
"(base_pk_part1,base_pk_part2,base_pk_part3),actual_indexed_column".
This column is later used to determine if an index should be applied
to a SELECT statement.
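
A minimal Python sketch of this parsing, assuming the target string has exactly the shape quoted above (a parenthesized partition key list, a comma, then the indexed column). The function names are hypothetical, and names containing commas or parentheses are not handled here.

```python
def is_local_target(target: str) -> bool:
    # Assumption: only local index targets start with a parenthesized
    # list of base partition key columns.
    return target.startswith("(")

def target_column(target: str) -> str:
    """Extract the indexed column name from an index target string."""
    if not is_local_target(target):
        # Global index: the target string is the column name itself.
        return target
    close = target.index(")")          # raises ValueError if ")" is missing
    rest = target[close + 1:]
    if not rest.startswith(","):
        raise ValueError("malformed local index target: " + target)
    column = rest[1:].strip()
    if not column:
        raise ValueError("missing indexed column in target: " + target)
    return column
```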
2019-03-20 10:20:24 +01:00
Piotr Sarna
e0d7807eed db: add checking for local index in schema tables
Based on which targets the index has, it will be either local
or global - local indexes have their full base partition key
embedded in their targets.
2019-03-20 10:20:24 +01:00
Piotr Sarna
de5e5ee1a5 index: add checking if serialized target implies local index
This utility enables checking if the specified target indicates
a local index, even before base table schema is known.
2019-03-20 10:20:24 +01:00
Piotr Sarna
5672edc149 index: enable parsing multi-key targets
Parsing index targets that consist of partition key columns
followed by clustering key columns is enabled.
2019-03-20 10:20:24 +01:00
Piotr Sarna
9782381dd4 index: move target parser code to .cc file
It will be useful later when expanding the implementation.
2019-03-20 10:20:24 +01:00
Piotr Sarna
25264d61ee json: add non-throwing overload for to_json_value
It will be needed later to avoid unnecessary try-catch blocks.
2019-03-20 10:20:24 +01:00
Piotr Sarna
b46ab76d4b cql3: add checking for local indexes in has_supporting_index()
With local indexes it's not sufficient to check if a single
restriction is supported by an index in order to decide
that it can be used, because local indexes can be leveraged
only when full partition key is properly restricted.

(It also serves as a great example why restrictions code
 would greatly benefit from a facelift! :) )
2019-03-20 10:20:24 +01:00
Piotr Sarna
87f6e37caa cql3: move finding index restrictions to prepare stage
Index restrictions that match a given index were recomputed
during execution stage, which is redundant and prone to errors.
Now, used index restrictions are cached in a prepared statement.
2019-03-20 10:20:22 +01:00
Piotr Sarna
9823898b27 cql3: add picking an index by score
Instead of choosing the first index that we find (in column def order),
the index with highest score is picked. Currently local indexes
score higher than global ones if restrictions allow local indexing
to be applied.
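
A rough Python sketch of score-based selection. The concrete score values (0, 1, 2) and the `Index` type are assumptions made for illustration; only the ordering "usable local index beats global index" comes from the description above.

```python
from dataclasses import dataclass

@dataclass
class Index:
    name: str
    column: str      # the indexed column
    is_local: bool

def score(index, restricted_columns, partition_key):
    """Hypothetical scoring: 0 = unusable, 1 = global, 2 = usable local."""
    if index.column not in restricted_columns:
        return 0
    if index.is_local:
        # A local index applies only if the full partition key is restricted.
        full_pk = all(c in restricted_columns for c in partition_key)
        return 2 if full_pk else 0
    return 1

def pick_index(indexes, restricted_columns, partition_key):
    """Return the highest-scoring usable index, or None if none applies."""
    best, best_score = None, 0
    for index in indexes:
        s = score(index, restricted_columns, partition_key)
        if s > best_score:
            best, best_score = index, s
    return best
```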
2019-03-20 10:20:02 +01:00
Piotr Sarna
2f173f7ed8 cql3: add handling paging state for local indexes
When computing paging state for local indexes, the partition
and clustering keys are different than with global ones:
 - partition key is the same as base's
 - clustering key starts with the indexed column
2019-03-20 10:20:02 +01:00
Piotr Sarna
75dd964751 cql3: add handling partition slices for local indexes
For local indexes, a slice will consist of the indexed column
followed by base clustering columns.
2019-03-20 10:20:01 +01:00
Piotr Sarna
b12162c8f5 cql3: add returning correct partition ranges for local indexes
Local indexes always share the partition range with their base.
2019-03-20 09:51:46 +01:00
Piotr Sarna
da8e8f18b3 cql3: make read_posting_list a member function
It already accepts several arguments that can be extracted from 'this',
and more will be added in the future.
New parameters include lambdas prepared during prepare stage
that define how to extract partition/clustering key ranges depending
on which index is used, so keeping it a static function will result
in an unbounded number of parameters with complex types, which will
in turn make the function header almost illegible for a reader.
Hence, read_posting_list becomes a member function with easy access
to any data prepared during prepare stage.
2019-03-20 09:51:46 +01:00
Piotr Sarna
85017c5ad4 cql3: look for indexed column definition only once
There's no need to look for the column definition inside a loop.
2019-03-20 09:51:46 +01:00
Piotr Sarna
8002471c81 cql3: allow index target to keep multiple columns
Instead of having just one column definition, index target is now
a variant of either single column definition or a vector of them.
The vector is expected to be used when part of a target definition
is enclosed in parentheses:
 $ CREATE INDEX ON t((p),v);
or
 $ CREATE INDEX ON t((p1,p2), v);
etc.

This feature will allow providing (possibly composite) base partition key
to CREATE INDEX statement, which will result in creating a local index.
2019-03-20 09:51:46 +01:00
Piotr Sarna
a45022dbc7 docs: document index target serialization
Index target serialization format is extended for the purpose
of local indexing. Both new and old formats are described
in docs.
2019-03-20 09:51:46 +01:00
Piotr Sarna
9c984f9da9 index: fix indentation 2019-03-20 09:51:46 +01:00
Piotr Sarna
3b908b7b5d index: add base partition keys to local index schema
When the index is local, its partition key in underlying materialized
view is the same as base's, and the indexed column is the first
clustering key. This implementation ensures that view and base rows
will reside on the same partition, while querying the indexed column
will be possible by putting it as a first clustering key part.
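
The key layout described here can be illustrated with a small Python sketch. The function name is hypothetical, and the handling of an indexed column that is already a base clustering column is an assumption.

```python
def local_index_view_keys(base_partition_key, base_clustering_key, indexed_column):
    """Return (partition_key, clustering_key) column lists for a local index view.

    The partition key is copied from the base table, so view rows land on the
    same partition as base rows. The indexed column leads the clustering key,
    followed by the base clustering columns.
    """
    partition_key = list(base_partition_key)
    # Assumption: do not repeat the indexed column if it is already a
    # base clustering column.
    clustering_key = [indexed_column] + [
        c for c in base_clustering_key if c != indexed_column
    ]
    return partition_key, clustering_key
```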
2019-03-20 09:51:46 +01:00
Piotr Sarna
90d47ca183 schema: add is_local_index cached value to index metadata
In order to quickly distinguish global indexes from local ones,
a cached boolean value is introduced.
2019-03-20 09:51:46 +01:00
Botond Dénes
ddf795d2f9 configure.py: add check header targets
Our guidelines dictate that each header is self-sufficient, i.e.
after including it into an empty .cc file, the .cc file can be compiled
without having to include any other header file.
Currently we don't have any tool to check that a header is self
sufficient. This patch aims to remedy that by adding a target to check
each header, as well as a target to check all the headers.
For each header a target is generated that does the equivalent of
including the header into an empty .cc file, then compiling the
resulting .cc file. This target is called {header_name}.o, so
given the header `myheader.hh` this will be `build/dev/myheader.hh.o`
(if the dev build-mode is used).
Also a target, `checkheaders` is added which validates all headers in
the project. This currently fails as we have many headers that are not
self-sufficient.
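
A sketch of what such a per-header check amounts to, written in Python. The helper names and the exact compiler flags are assumptions; this is not the code generated by configure.py.

```python
import posixpath

def check_header_target(header, mode="dev"):
    """Name of the object file used as the check target for a header."""
    return posixpath.join("build", mode, header + ".o")

def stub_source(header):
    """Contents of an otherwise empty .cc file that includes only the header."""
    return '#include "{}"\n'.format(header)

def compile_command(header, stub_path, mode="dev", compiler="g++"):
    """Command line that compiles the stub; it fails if the header is not
    self-sufficient. The flags here are illustrative only."""
    return [compiler, "-I.", "-c", stub_path, "-o", check_header_target(header, mode)]
```

For `myheader.hh` in dev mode this yields the target `build/dev/myheader.hh.o`, matching the naming described above.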

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <fdf550dc71203417252f1d8144e7a540eec074a1.1552636812.git.bdenes@scylladb.com>
2019-03-19 17:35:18 +02:00
Botond Dénes
721dd70d93 scylla-gdb.py: add scylla fiber command
The scylla fiber command traverses a continuation chain, given an
arbitrary task pointer.
Example (cropped for brevity):
(gdb) scylla fiber this
 #0  (task*) 0x0000600000550360 0x000000000468ac40 vtable for seastar...
 #1  (task*) 0x0000600000550300 0x00000000046c3778 vtable for seastar...
 #2  (task*) 0x00006000018af600 0x00000000046c37a0 vtable for seastar...
 #3  (task*) 0x00006000005502a0 0x00000000046c37f0 vtable for seastar...
 #4  (task*) 0x0000600001a65e10 0x00000000046c6b10 vtable for seastar...

scylla fiber can be passed any expression that evaluates to a task
pointer. C++ variables, raw addresses and GDB variables (e.g. $1) all
work.

The command works by scanning the task object for pointers. If a pointer
is found it is dereferenced. If successful it checks that the pointer
dereferences to a vtable, the class for which is a known task.
If this succeeds the found task is saved, the scan then recursively
proceeds to scan the newly found task until a task with no further
attached continuations is found.
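
The scan can be modelled outside GDB with a toy memory map. In this Python sketch, `memory` maps an address to the list of words stored there, and the first word of every object is taken to be its vtable pointer; these are simplifying assumptions, and the loop stands in for the recursion described above.

```python
def find_next_task(memory, task_vtables, addr):
    """Scan the object at addr for a pointer to another known task."""
    for word in memory[addr][1:]:       # skip the object's own vtable pointer
        target = memory.get(word)       # "dereference"; None if not readable
        if target and target[0] in task_vtables:
            return word
    return None

def walk_fiber(memory, task_vtables, start):
    """Follow the continuation chain from start until no further task is found."""
    chain = [start]
    seen = {start}
    while True:
        nxt = find_next_task(memory, task_vtables, chain[-1])
        if nxt is None or nxt in seen:  # end of chain, or a cycle
            return chain
        chain.append(nxt)
        seen.add(nxt)
```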
2019-03-19 17:06:41 +02:00
Botond Dénes
697fc5cefe scylla-gdb.py: scylla_ptr: accept any valid gdb expression as input 2019-03-19 17:06:41 +02:00
Botond Dénes
e1ea4db7ca scylla-gdb.py: scylla_ptr: make analyze() usable for outside code
Instead of a formatted message, intended for humans, return a
`pointer_metadata` object, suitable for use by code. The
formatting of the pointer metadata into the human readable message is
now done by the `pointer_metadata.__str__()` method, on the call site.

Also make `analyze()` a class method, so that it can be
called without having to create a `scylla_ptr` command instance,
which could confuse GDB.
2019-03-19 17:06:41 +02:00
Botond Dénes
e77b6d12d1 scylla-gdb.py: scylla_ptr: move actual logic into analyze()
In preparation for this method being made usable for outside code.
2019-03-19 17:06:41 +02:00
Botond Dénes
7d5c0ff666 scylla-gdb.py: resolve(): add cache parameter
Allow callers to prevent the resolved name from being saved. Useful when
one is just probing addresses but doesn't want to flood the cache with
useless symbols.
2019-03-19 17:06:41 +02:00
Botond Dénes
48b96d25b3 scylla-gdb.py: fix tasks and task-stats commands
These two commands have been broken for some time, roughly since the CPU
scheduler was merged. Fix them and move the task queue parsing code into
a common method, which now is used by both commands.
2019-03-19 17:06:41 +02:00
Botond Dénes
87c28df429 scylla-gdb.py: fix existing documentation
Some commands are documented, but not in the python way. Refactor these
commands so they use the standard python way for self documenting. In
addition to being more "python", this makes these documentation strings
discoverable by GDB so they appear in the `help scylla` output.
2019-03-19 17:06:41 +02:00
Botond Dénes
e1dffc3850 scylla-gdb.py: std_unique_ptr: add get() method
Add a `get()` method that retrieves the wrapped pointer without
dereferencing it. All existing methods are refactored to use this new
method to obtain the pointer instead of directly accessing the members.
This way only a single method has to be fixed if the object
implementation changes.
2019-03-19 17:06:41 +02:00
Botond Dénes
c51b11c0ed scylla-gdb.py: fix static_vector
Apparently a new 'dummy' level was added.
2019-03-19 17:06:41 +02:00
Glauber Costa
7119440cbc tests: make sure that commitlog replay works after truncate.
Tomek and I recently had a discussion about whether or not a commitlog
replay would be safe after we dropped or truncated a table that is not
flushed (durable, but auto_snapshots being false).

While we agreed that it would be safe, we both agreed we would feel
better with a unit test covering that.

This patch adds such a test (btw, it passes)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190318223811.6862-1-glauber@scylladb.com>
2019-03-19 11:30:51 +01:00
Avi Kivity
0441b59a70 Update seastar submodule
* seastar 463d24e...33baf62 (3):
  > reactor: improve detection of io_pgetevents()
  > rpc: fix stack use after free in frame reading functions
  > core/thread: enable move-only functions
2019-03-19 11:44:35 +02:00
Takuya ASADA
32cee92d56 dist/debian: don't strip ld.so
On some environments dh_strip fails at libreloc/ld.so, so it's better to
skip it too, just like libprotobuf.so.15.

error message is:
dh_strip -Xlibprotobuf.so.15 --dbg-package=scylla-server-dbg
strip:debian/scylla-server/opt/scylladb/libreloc/ld.so[.gnu.build.attributes]: corrupt GNU build attribute note: bad description size: Bad value
dh_strip: strip --remove-section=.comment --remove-section=.note --strip-unneeded debian/scylla-server/opt/scylladb/libreloc/ld.so returned exit code 1
0

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190319005153.26506-1-syuu@scylladb.com>
2019-03-19 11:06:44 +02:00
Asias He
71bf757b2c gossiper: Enable features only after gossip is settled
n1, n2, n3 in the cluster,

shutdown n1, n2, n3

start n1, n2

start n3: we saw that features were enabled using the system table while n1 and n2 were already up and running in the cluster.

INFO  2019-02-27 09:24:41,023 [shard 0] gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}
INFO  2019-02-27 09:24:41,025 [shard 0] storage_service - Starting up server gossip
INFO  2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.1 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}
INFO  2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.2 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}

The problem is we enable the features too early in the start up process.
We should enable features after gossip is settled.

Fixes #4289
Message-Id: <04f2edb25457806bd9e8450dfdcccc9f466ae832.1551406991.git.asias@scylladb.com>
2019-03-18 18:25:29 +01:00
Dejan Mircevski
c7d05b88a6 Update GCC version check in configure.py
This brings the version check up-to-date with README.md and HACKING.md,
which were updated by commit fa2b03 ("Replace std::experimental types
with C++17 std version.") to say that minimum GCC 8.1.1 is required.

Tests: manually run configure.py with various `--compiler` values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190318130543.24982-1-dejan@scylladb.com>
2019-03-18 15:24:25 +02:00
Tomasz Grabiec
33f15aa1b5 tests: sstables: Test reading of static compact sstable generated by Cassandra 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
c78568daef tests: sstables: Add test for writing and reading of static compact tables 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
47ca280e57 sstables: mc: Write static compact tables the same way as Cassandra
Static compact tables are tables with compact storage and no
clustering columns.

Before this patch, Scylla was writing rows of static compact tables as
clustered rows instead of static rows. That's because in our in-memory
model such tables have regular rows and no static row. In Cassandra's
schema (since 3.x), those tables have columns which are marked as
static and there are no regular columns.

This worked fine as long as Scylla was writing and reading those
sstables. But when importing sstables from Cassandra, our reader was
skipping the static row, since it's not present in the schema, and
returning no rows as a result. Also, Cassandra, and Scylla tools,
would have problems reading those sstables.

Fix this by writing rows for such tables the same way as Cassandra
does. In order to support rolling downgrade, we do that only when all
nodes are upgraded.

Fixes #4139.
2019-03-18 11:18:33 +01:00
Tomasz Grabiec
b0ff68d8d9 sstable: mc: writer: Set _static_row_written inside write_static_row() 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
b68df143a1 sstables: Add sstable::features() 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
cf9721e855 sstables: mc: writer: Prepare write_static_row() for working with any column_kind 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
fefef7b9eb storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag
When enabled on all nodes, sstable writers will start to produce
correct MC-format sstables for compact storage tables by writing rows
into the static row (like C*) rather than into the regular row.

We only do that when all nodes are upgraded to support rolling
downgrade. After all nodes are upgraded, regular rolling downgrade will
not be possible.

Refs #4139
2019-03-18 11:18:33 +01:00
Tomasz Grabiec
52d634025d sstables: mc: writer: Build indexed_columns together with serialization_header
The set of columns in both must match, so it's better to build them
together.  Later the logic for choosing columns will become more
complicated, and this patch will allow for avoiding duplication.
2019-03-18 11:18:33 +01:00
Tomasz Grabiec
701ac53b80 sstables: mc: writer: De-optimize make_serialization_header()
So that it's easier to make it use schema_v3 conditionally in later
patches. It's not on the hot path, so it shouldn't matter that we
don't reserve the vectors.
2019-03-18 11:15:18 +01:00
Tomasz Grabiec
8bb8d67a93 sstable: mc: writer: Move attaching of mc-specific components out of generic code 2019-03-18 11:15:18 +01:00
Tomasz Grabiec
b0e6f17a22 Merge "Fix empty remote common_features in check_knows_remote_features" from Asias
Three nodes in the cluster node1, node2, node3

Shutdown the whole cluster

Start node1

Start node2, node2 sees empty remote common_features.

   gossip - Feature check passed.  Local node 127.0.0.2 features =
   {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS,
   DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS,
   LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT,
   RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3,
   STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH},
   Remote common_features = {}

The problem is that node3 hasn't started yet, so node1 sees node3 as having
empty features. In get_supported_features(), an empty set of common features
is returned if any node is seen with empty features. To fix, we should
fall back to using the features saved in the system table.

Start node3, node3 sees empty remote common_features.

   gossip - Feature check passed. Local node 127.0.0.3 features =
   {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS,
   DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS,
   LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT,
   RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3,
   STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH},
   Remote common_features = {}

The problem is that node3 hasn't yet inserted its own features into the gossip
endpoint_state_map. get_supported_features() returns the common features
of all nodes in endpoint_state_map. To fix, we should fall back to using
the features stored in the system table for such a node in this case.

Fixes #4225
Fixes #4341

* dev asias/fix_check_knows_remote_features.upstream.v4.1:
  gossiper: Remove unused register_feature and unregister_feature
  gossiper: Remove unused wait_for_feature_on_all_node and
    wait_for_feature_on_node
  gossiper: Log feature is enabled only if the feature is not enabled
    previously
  gossiper: Fix empty remote common_features in
    check_knows_remote_features
2019-03-18 10:56:10 +01:00
Asias He
1d59f26c11 gossiper: Fix empty remote common_features in check_knows_remote_features
Three nodes in the cluster node1, node2, node3

Shutdown the whole cluster

Start node1

Start node2, node2 sees empty remote common_features.

   gossip - Feature check passed.  Local node 127.0.0.2 features =
   {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS,
   DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS,
   LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT,
   RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3,
   STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH},
   Remote common_features = {}

The problem is that node3 hasn't started yet, so node1 sees node3 as having
empty features. In get_supported_features(), an empty set of common features
is returned if any node is seen with empty features. To fix, we should
fall back to using the features saved in the system table.

Start node3, node3 sees empty remote common_features.

   gossip - Feature check passed. Local node 127.0.0.3 features =
   {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS,
   DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS,
   LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT,
   RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3,
   STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH},
   Remote common_features = {}

The problem is that node3 hasn't yet inserted its own features into the gossip
endpoint_state_map. get_supported_features() returns the common features
of all nodes in endpoint_state_map. To fix, we should fall back to using
the features stored in the system table for such a node in this case.
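
The fallback can be sketched in Python as follows. The two dictionaries stand in for the gossip endpoint state and the system table, and skipping nodes with no known features at all is an assumption of this sketch, not something stated above.

```python
def get_supported_features(gossip_features, saved_features):
    """Intersect the features of all known nodes.

    gossip_features: node -> set of features seen in gossip (may be empty)
    saved_features:  node -> set of features saved in the system table
    """
    common = None
    for node in sorted(set(gossip_features) | set(saved_features)):
        # Prefer gossip; fall back to the saved features when gossip has none.
        features = gossip_features.get(node) or saved_features.get(node)
        if not features:
            continue  # assumption: nothing known about this node, skip it
        common = set(features) if common is None else common & set(features)
    return common or set()
```

With node3 present in gossip but carrying empty features, the saved copy keeps the intersection from collapsing to the empty set.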

Fixes #4225
2019-03-18 10:56:10 +01:00
Asias He
acb4badbc3 gossiper: Log feature is enabled only if the feature is not enabled previously
We saw the log "Feature FOO is enabled" more than once like below. It is
better to log it only when the feature is not enabled previously.

    gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL
    gossip - Feature CORRECT_COUNTER_ORDER is enabled
    gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled
    gossip - Feature COUNTERS is enabled
    gossip - Feature DIGEST_MULTIPARTITION_READ is enabled
    gossip - Feature INDEXES is enabled
    gossip - Feature LARGE_PARTITIONS is enabled
    gossip - Feature LA_SSTABLE_FORMAT is enabled
    gossip - Feature MATERIALIZED_VIEWS is enabled
    gossip - Feature MC_SSTABLE_FORMAT is enabled
    gossip - Feature RANGE_TOMBSTONES is enabled
    gossip - Feature ROLES is enabled
    gossip - Feature ROW_LEVEL_REPAIR is enabled
    gossip - Feature SCHEMA_TABLES_V3 is enabled
    gossip - Feature STREAM_WITH_RPC_STREAM is enabled
    gossip - Feature TRUNCATION_TABLE is enabled
    gossip - Feature WRITE_FAILURE_REPLY is enabled
    gossip - Feature XXHASH is enabled

    gossip - Feature CORRECT_COUNTER_ORDER is enabled
    gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled
    gossip - Feature COUNTERS is enabled
    gossip - Feature DIGEST_MULTIPARTITION_READ is enabled
    gossip - Feature INDEXES is enabled
    gossip - Feature LARGE_PARTITIONS is enabled
    gossip - Feature LA_SSTABLE_FORMAT is enabled
    gossip - Feature MATERIALIZED_VIEWS is enabled
    gossip - Feature MC_SSTABLE_FORMAT is enabled
    gossip - Feature RANGE_TOMBSTONES is enabled
    gossip - Feature ROLES is enabled
    gossip - Feature ROW_LEVEL_REPAIR is enabled
    gossip - Feature SCHEMA_TABLES_V3 is enabled
    gossip - Feature STREAM_WITH_RPC_STREAM is enabled
    gossip - Feature TRUNCATION_TABLE is enabled
    gossip - Feature WRITE_FAILURE_REPLY is enabled
    gossip - Feature XXHASH is enabled
    gossip - InetAddress 127.0.0.2 is now UP, status = NORMAL
2019-03-18 10:56:10 +01:00
Asias He
f32f08c91e gossiper: Remove unused wait_for_feature_on_all_node and wait_for_feature_on_node
Remove unused check_features helper as well.
2019-03-18 10:56:09 +01:00
Asias He
6dbcb2e0c9 gossiper: Remove unused register_feature and unregister_feature
They are not used any more.
2019-03-18 10:56:09 +01:00
Benny Halevy
ecf88d8e2e compaction: fix sstable_window_size calculation if only unit/size is set
A user who changes the default UNIT from DAYS to HOURS and does not set
the compaction_window_size will end up with a window of 24H instead of 1H.

According to the docs https://docs.scylladb.com/getting-started/compaction/#twcs-options
compaction_window_size should default to a value of 1.
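
The intended defaulting can be sketched in Python. The option names follow the TWCS options referred to above; the function itself and the set of supported units are illustrative assumptions, not the actual C++ code.

```python
from datetime import timedelta

# Assumed set of supported units for this sketch.
UNITS = {
    "MINUTES": timedelta(minutes=1),
    "HOURS": timedelta(hours=1),
    "DAYS": timedelta(days=1),
}

def sstable_window_size(options):
    """Window duration = unit * size, with each option defaulted independently."""
    unit = options.get("compaction_window_unit", "DAYS")
    size = int(options.get("compaction_window_size", 1))
    return UNITS[unit] * size
```

Setting only the unit to HOURS gives a one-hour window, since the size defaults to 1 regardless of the unit.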

Fixes #4310

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190307131318.13998-1-bhalevy@scylladb.com>
2019-03-18 11:19:18 +02:00
Takuya ASADA
02be95365f reloc/build_rpm.sh: don't use '*' for tar xf argument
It works accidentally, but '*' is just expanded by bash to the matching files
in the current directory; it is not correctly recognized by tar.
We need to use the full file name instead.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190312172243.5482-2-syuu@scylladb.com>
2019-03-18 11:09:55 +02:00
Takuya ASADA
5b10b6a0ce reloc/build_reloc.sh: enable DPDK
We get the following link errors when running reloc/build_reloc.sh in dbuild,
so we need to enable DPDK in Seastar:

g++: error: /usr/lib64/librte_cfgfile.so: No such file or directory
g++: error: /usr/lib64/librte_cmdline.so: No such file or directory
g++: error: /usr/lib64/librte_ethdev.so: No such file or directory
g++: error: /usr/lib64/librte_hash.so: No such file or directory
g++: error: /usr/lib64/librte_kvargs.so: No such file or directory
g++: error: /usr/lib64/librte_mbuf.so: No such file or directory
g++: error: /usr/lib64/librte_eal.so: No such file or directory
g++: error: /usr/lib64/librte_mempool.so: No such file or directory
g++: error: /usr/lib64/librte_mempool_ring.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_bnxt.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_e1000.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_ena.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_enic.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_fm10k.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_qede.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_i40e.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_ixgbe.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_nfp.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_ring.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_sfc_efx.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_vmxnet3_uio.so: No such file or directory
g++: error: /usr/lib64/librte_ring.so: No such file or directory

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190312172243.5482-1-syuu@scylladb.com>
2019-03-18 11:09:55 +02:00
Piotr Sarna
2e05d86cf3 service: reduce number of spawned threads when notifying
Commit 9c544df217 introduced running up/down/join/leave notifications
in threaded context, but spawned a thread for every notification,
while it could be done once for all notifiees.

Reported-by: Avi Kivity <avi@scylladb.com>
Message-Id: <34815d5aa11902c4a052cff38f4c45c45ff919d8.1552897848.git.sarna@scylladb.com>
2019-03-18 10:45:47 +02:00
Avi Kivity
64fa2dd1d2 Merge "gdb: Introduce 'scylla sstables'" from Tomasz
"
Finds all sstables on current shard and prints useful information,
like on-disk and in-memory usage.

Example:

  (gdb) scylla sstables
  (sstables::sstable*) 0x60100034d200: local=1 data_file=9551, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x601000348600: local=1 data_file=1229, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x601000348000: local=1 data_file=4785, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x60100034c600: local=1 data_file=298, in_memory=266192 (bf=400, summary=3072, sm=262096)
  ...
  total (shard-local): count=144, data_file=782839677, in_memory=59774408

Because of the way it finds sstables (bag_sstable_set), it doesn't yet support tables using LeveledCompactionStrategy.
"

* 'gdb-scylla-sstables' of github.com:tgrabiec/scylla:
  gdb: Introduce 'scylla sstables'
  gdb: Introduce find_instances()
  gdb: Extract std_unqiue_ptr.get()
  gdb: Add chunked_vector wrapper
  gdb: Add small_vector wrapper
  gdb: Add circular_buffer.size() and circular_buffer.external_memory_footprint()
  gdb: Add wrapper for seastar::lw_shared_ptr
  gdb: Add std_vector.external_memory_footprint()
  gdb: Add wrapper for boost::variant
  gdb: Add wrapper for std::optional
2019-03-17 19:37:44 +02:00
Takuya ASADA
270f9cf9e6 dist/debian: fix installing scyllatop
Since we removed dist/common/bin/scyllatop we are getting a build error
on .deb package build (1bb65a0888).
To fix the error we need to create a symlink for /usr/bin/scyllatop.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190316162105.28855-1-syuu@scylladb.com>
2019-03-17 19:37:44 +02:00
Tomasz Grabiec
05e2c87936 gdb: Introduce 'scylla sstables'
Finds all sstables on current shard and prints useful information,
like on-disk and in-memory usage.

Example:

  (gdb) scylla sstables
  (sstables::sstable*) 0x60100034d200: local=1 data_file=9551, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x601000348600: local=1 data_file=1229, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x601000348000: local=1 data_file=4785, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x60100034c600: local=1 data_file=298, in_memory=266192 (bf=400, summary=3072, sm=262096)
2019-03-15 15:12:48 +01:00
Tomasz Grabiec
929653f51d gdb: Introduce find_instances() 2019-03-15 15:12:48 +01:00
Tomasz Grabiec
fc4952c579 gdb: Extract std_unqiue_ptr.get() 2019-03-15 15:12:48 +01:00
Tomasz Grabiec
e47a5019f2 gdb: Add chunked_vector wrapper 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
a6da71e4da gdb: Add small_vector wrapper 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
0e8589cfdf gdb: Add circular_buffer.size() and circular_buffer.external_memory_footprint() 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
380c6fbdfe gdb: Add wrapper for seastar::lw_shared_ptr 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
93e5e0d644 gdb: Add std_vector.external_memory_footprint() 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
8866b1320a gdb: Add wrapper for boost::variant 2019-03-15 15:12:46 +01:00
Tomasz Grabiec
dd237c32af gdb: Add wrapper for std::optional 2019-03-15 15:12:46 +01:00
Paweł Dziepak
f4f56027bf Merge "Detect partitioner mismatch" from Piotr
"
Refuse to accept SSTables that were created with a partitioner
different from the one used by the Scylla server.

Fixes #4331
"

* 'haaawk/4331/v4' of github.com:scylladb/seastar-dev:
  sstables: Add test for sstable::validate_partitioner
  sstables: Add sstable::validate_partitioner and use it
2019-03-15 11:45:10 +00:00
Piotr Jastrzebski
2b0437a147 sstables: Add test for sstable::validate_partitioner
Make sure the exception is thrown when Scylla
tries to load an SSTable created with an incompatible partitioner.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-03-15 10:47:47 +01:00
Piotr Jastrzebski
4aea97f120 sstables: Add sstable::validate_partitioner and use it
The Scylla server can't read SSTables that were created
with a partitioner different from the one Scylla is using.

We should make sure that Scylla identifies such a mismatch
and refuses to use such SSTables.

We can use the partitioner information stored in the validation metadata
(Statistics.db file) of each SSTable and compare it against
the partitioner used by Scylla.

Fixes #4331

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-03-15 10:14:37 +01:00
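The check described in 4aea97f120 (sstables: Add sstable::validate_partitioner and use it) can be sketched roughly as below. This is a minimal Python illustration, not the actual C++ code: the function signature, the `PartitionerMismatch` exception and the way the names are obtained are assumptions made for the example. Only the idea of comparing the partitioner recorded in the SSTable metadata against the server's partitioner comes from the commit message.

```python
class PartitionerMismatch(Exception):
    """Hypothetical error type raised when an SSTable cannot be used."""


def validate_partitioner(sstable_partitioner: str,
                         server_partitioner: str,
                         sstable_name: str = "<sstable>") -> None:
    # sstable_partitioner stands for the name read from the validation
    # metadata (Statistics.db); how it is read is outside this sketch.
    if sstable_partitioner != server_partitioner:
        raise PartitionerMismatch(
            f"{sstable_name} uses partitioner {sstable_partitioner}, "
            f"but the server uses {server_partitioner}"
        )


def is_loadable(sstable_partitioner: str, server_partitioner: str) -> bool:
    """Convenience wrapper: True when the SSTable may be loaded."""
    try:
        validate_partitioner(sstable_partitioner, server_partitioner)
        return True
    except PartitionerMismatch:
        return False
```

Raising instead of returning a flag mirrors the test commit, which expects an exception when an incompatible SSTable is loaded.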
Rafael Ávila de Espíndola
94c28cfb16 sstable: Wait for future returned by maybe_record_large_cells.
A previous version of the patch that introduced these calls had no
limit on how far behind the large data recording could get, and
maybe_record_large_cells returned null.

The final version switched to a semaphore, but unfortunately these
calls were not updated.

Tests: unit (dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190314195856.66387-1-espindola@scylladb.com>
2019-03-14 21:01:37 +01:00
Paweł Dziepak
349601ac32 sstable: pass full length of buffer to vint deserialiser
The vint deserialiser can be more performant if it is allowed to do an
overread (i.e. read more memory than the value it is deserialising).
In the case of sstable reads those vints are usually going to be in the middle
of a much larger buffer, so let's pass the whole length of the buffer and
enable this optimisation.
2019-03-14 13:37:06 +00:00
Paweł Dziepak
552fc0c6b9 vint: optimise deserialisation routine
At the moment, vint deserialisation uses a naive approach, reading
each byte separately. In practice, vints most often appear
inside larger buffers. That means we can read 8 bytes at a time and then
figure out the unneeded parts and mask them out. This way we avoid a loop and
do fewer memory loads, which are much more expensive than arithmetic
operations (even if they hit the cache).
2019-03-14 13:37:06 +00:00
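The idea in 552fc0c6b9 (vint: optimise deserialisation routine) can be illustrated with a small Python sketch. It assumes a Cassandra-style unsigned vint layout: the number of leading 1 bits in the first byte gives the number of extra bytes, and the value is stored big-endian. The log does not show the actual layout or the C++ routine, so the function names and the encoder below are illustrative only.

```python
def leading_ones(byte: int) -> int:
    """Number of leading 1 bits in an 8-bit value."""
    return 8 - (byte ^ 0xFF).bit_length()


def encode_vint(value: int) -> bytes:
    """Encode an unsigned 64-bit value using the assumed layout."""
    extra = 8
    for n in range(8):
        if value < 1 << (7 + 7 * n):
            extra = n
            break
    first = ((0xFF << (8 - extra)) & 0xFF) | (value >> (8 * extra))
    low = value & ((1 << (8 * extra)) - 1)
    return bytes([first]) + low.to_bytes(extra, "big")


def decode_naive(buf: bytes, pos: int = 0):
    """Byte-at-a-time decoding: one loop iteration per extra byte."""
    first = buf[pos]
    extra = leading_ones(first)
    value = first & (0xFF >> extra)
    for i in range(extra):
        value = (value << 8) | buf[pos + 1 + i]
    return value, extra + 1


def decode_wide(buf: bytes, pos: int = 0):
    """Read 8 bytes at once, then shift away the bytes that are not needed."""
    first = buf[pos]
    extra = leading_ones(first)
    # Python slices never over-read, so pad with zeros to mimic a plain
    # 8-byte load; native code would need the buffer to be long enough.
    word = int.from_bytes(buf[pos + 1:pos + 9].ljust(8, b"\0"), "big")
    payload = word >> (8 * (8 - extra))
    value = ((first & (0xFF >> extra)) << (8 * extra)) | payload
    return value, extra + 1
```

Both decoders return the value together with the encoded length. The wide variant has no loop, which is the point of the optimisation, but it only pays off when the caller can guarantee that reading past the vint is safe.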
Paweł Dziepak
57de2c26b3 vint: drop deserialize_type structure
Deserialisation function returns a structure containing both the value
and its length in the input buffer. In the vast majority of the cases
the caller will already know the length and having this structure will
make it harder for the compiler to emit good code, especially if the
function is not inlined.

In practice I've seen the structure causing register pressure problems
that lead to spilling variables to memory.
2019-03-14 13:37:06 +00:00
Paweł Dziepak
6110278439 tests/vint: reduce test dependencies
The vint serialisation test doesn't need the whole of Scylla, so let's reduce its
dependencies to improve build times.
2019-03-14 13:37:06 +00:00
Paweł Dziepak
54a079cdb5 tests/perf: add performance test for vint serialisation 2019-03-14 13:37:06 +00:00
Piotr Sarna
9c544df217 service: run notifying code in threaded context
In order to allow yielding when handling endpoint lifecycle changes,
notifiers now run in threaded context.
Implementations which used this assumption before are supplemented
with assertions that they indeed run in seastar::async mode.

Fixes #4317
Message-Id: <45bbaf2d25dac314e4f322a91350705fad8b81ed.1552567666.git.sarna@scylladb.com>
2019-03-14 12:56:53 +00:00
Piotr Sarna
a7602bd2f1 database: add global view update stats
Currently view update metrics are only per-table, but per-table metrics
are not always enabled. In order to be able to see the number of
generated view updates in all cases, global stats are added.

Fixes #4221
Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>
2019-03-14 12:04:18 +00:00
Paweł Dziepak
d4d2eb2ed5 Update seastar submodule
* seastar e640314...463d24e (3):
  > Merge 'Handle IOV_MAX limit in posix_file_impl' from Paweł
  > core: remove unneeded 'exceptional future ignored' report
  > tests/perf: support multiple iterations in a single test run
2019-03-13 14:24:58 +00:00
Tomasz Grabiec
2ef9d9c12e Merge "Record large cells to system.large_cells" from Rafael
Issue #4234 asks for a large collection detector. Discussing the issue,
Benny pointed out that it is probably better to have a generic large
cell detector, as it is a natural progression from what we already
warn on (large partitions and large rows).

This patch series implements that. It is on top of
shutdown-order-patches-v7, which is currently on next.

With the changes to use a semaphore this patch series might be getting
a bit big. Let me know if I should split it.

* https://github.com/espindola/scylla espindola/large-cells-on-top-of-shutdown-v5:
  db: refactor large data deletion code
  db: Rename (maybe_)?update_large_partitions
  db: refactor a try_record helper
  large_data_handler: assert it is not used after stop()
  db: don't use _stopped directly
  sstables: delete dead error handling code.
  large_data_handler: Remove const from a few functions
  large_data_handler: propagate a future out of stop()
  large_data_handler: Run large data recording in parallel
  Create a system.large_cells table
  db: Record large cells
  Add a test for large cells
2019-03-13 09:44:57 +01:00
Rafael Ávila de Espíndola
f983570ac8 Add a test for large cells
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
63251b66c1 db: Record large cells
Fixes #4234.

Large cells are now recorded in system.large_cells.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
d17083b483 Create a system.large_cells table
This is analogous to the system.large_rows table, but holds individual
cells, so it also needs the column name.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
8b4ae95168 large_data_handler: Run large data recording in parallel
With these changes the futures returned by large_data_handler will not
normally wait for entries to be written to system.large_rows or
system.large_partitions.

We use a semaphore to bound how far behind system.large_* table updates
can get.

This should avoid delaying sstable writes in the common case, which
is more relevant once we warn of large cells since the default
threshold will be just 1MB.

Note that there is no ordering between the various maybe_record_* and
maybe_delete_large_data_entries requests. This means that we can end
up with a stale entry that is only removed once the TTL expires.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
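The pattern described in 8b4ae95168 (large_data_handler: Run large data recording in parallel) is a bounded background queue: the caller does not wait for the write itself, only for a semaphore permit. A minimal asyncio sketch follows. The class, its method names and the `demo` helper are hypothetical and do not mirror the Seastar code.

```python
import asyncio


class LargeDataRecorder:
    """Hypothetical stand-in for large_data_handler; names are illustrative."""

    def __init__(self, max_in_flight: int):
        self._sem = asyncio.Semaphore(max_in_flight)
        self._tasks = set()
        self._in_flight = 0
        self.peak_in_flight = 0
        self.recorded = []

    async def maybe_record(self, key: str, size: int, threshold: int) -> None:
        if size <= threshold:
            return
        # Blocks only when max_in_flight writes are already pending.
        await self._sem.acquire()
        task = asyncio.create_task(self._write(key, size))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _write(self, key: str, size: int) -> None:
        self._in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
        try:
            await asyncio.sleep(0)  # stands in for the system.large_* write
            self.recorded.append((key, size))
        finally:
            self._in_flight -= 1
            self._sem.release()

    async def stop(self) -> None:
        # Wait for every outstanding write before shutting down.
        await asyncio.gather(*self._tasks)


async def demo(entries, max_in_flight: int, threshold: int):
    recorder = LargeDataRecorder(max_in_flight)
    for key, size in entries:
        await recorder.maybe_record(key, size, threshold)
    await recorder.stop()
    return recorder
```

Because writes complete in the background, nothing orders a record against a later delete of the same entry, which matches the stale-entry caveat in the commit message.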
Rafael Ávila de Espíndola
54b856e5e4 large_data_handler: propagate a future out of stop()
stop() will close a semaphore in a followup patch, so it needs to return a
future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
989ab33507 large_data_handler: Remove const from a few functions
These will use a member semaphore variable in a followup patch, so they
cannot be const.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
0b763ec19b sstables: delete dead error handling code.
maybe_delete_large_data_entries handles exceptions internally, so the
code this patch deletes would never run.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
5fcb3ff2d7 db: don't use _stopped directly
This gives flexibility in how it is implemented.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
a17a936882 large_data_handler: assert it is not used after stop()
This should have been changed in the patch

db: stop the commit log after the tables during shutdown

But unfortunately I missed it then.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
f3089bf3d1 db: refactor a try_record helper
We had almost identical error handling for large_partitions and
large_rows. Refactor in preparation for large_cells.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:02 -07:00
Rafael Ávila de Espíndola
d7f263d334 db: Rename (maybe_)?update_large_partitions
This renames it to record_large_partitions, which matches
record_large_rows. It also changes the signature to be closer to
record_large_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:16:04 -07:00
Rafael Ávila de Espíndola
f254664fe6 db: refactor large data deletion code
The code for deleting entries from system.large_partitions was almost
a duplicate from the code for deleting entries from system.large_rows.

This patch unifies the two, which also improves the error message when
we fail to delete entries from system.large_partitions.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:16:04 -07:00
Asias He
b8158dd65d streaming: Get rid of the keep alive timer in streaming
There is no guarantee that rpc streaming makes progress in some time
period. Remove the keep alive timer in streaming to avoid killing the
session when the rpc streaming is just slow.

The keep alive timer is used to close the session in the following case:

n2 (the rpc streaming sender) streams to n1 (the rpc streaming receiver)
kill -9 n2

We need this because we do not kill the session when gossip thinks a node
is down, since the node being down might only be temporary
and it is a waste to drop the work that has already been done, especially
when the stream session takes a long time.

Since range_streamer does not stream all data in a single stream
session (we stream 10% of the data at a time) and we have retry logic,
I think it is fine to kill a stream session when gossip thinks a node is
down. This patch changes the behaviour to close all stream sessions with a node that
gossip thinks is down.
Message-Id: <bdbb9486a533eee25fcaf4a23a946629ba946537.1551773823.git.asias@scylladb.com>
2019-03-12 12:20:28 +01:00
Duarte Nunes
2718c90448 Merge 'Add canceling long-standing view update requests' from Piotr
"
This series allows canceling view update requests when a node is
discovered DOWN. View updates are sent in the background with a long
timeout (5 minutes), and if we discover that the node is
unavailable, there's no point in waiting that long for the request
to finish. What's more, these requests are waited for on shutdown,
which may result in waiting 5 minutes until Scylla properly shuts down,
which is bad for both users and dtests.

This series implements storage_proxy as a lifecycle subscriber,
so it can react to membership changes. It also keeps track of all
"interruptible" writes per endpoint, so once a node is detected as DOWN,
an artificial timeout can be triggered for all aforementioned write
requests.

Fixes #3826
Fixes #3966
Fixes #4028
"

* 'write_hints_for_view_updates_on_shutdown_4' of https://github.com/psarna/scylla:
  service: remove unused stop_hints_manager
  storage_proxy: add drain_on_shutdown implementation
  main: register storage proxy as lifecycle subscriber
  storage_proxy: add endpoint_lifecycle_subscriber interface
  storage_proxy: register view update handlers for view write type
  storage_proxy: add intrusive list of view write handlers
  storage_proxy: add view_update_write_response_handler
2019-03-08 13:34:46 -03:00
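The mechanism in 2718c90448 (Merge 'Add canceling long-standing view update requests') boils down to tracking interruptible writes per endpoint and timing them out when the endpoint goes DOWN or the proxy drains. The sketch below is a simplified Python model. The class and method names are invented for the example, and it uses a dict of sets where the series uses an intrusive list.

```python
from collections import defaultdict


class ViewWriteHandler:
    """Hypothetical handler for one background view update write."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self.state = "pending"

    def timeout(self) -> None:
        if self.state == "pending":
            self.state = "timed_out"

    def complete(self) -> None:
        if self.state == "pending":
            self.state = "done"


class Proxy:
    """Hypothetical model of the proxy acting as a lifecycle subscriber."""

    def __init__(self):
        self._view_writes = defaultdict(set)

    def register(self, handler: ViewWriteHandler) -> None:
        self._view_writes[handler.endpoint].add(handler)

    def finish(self, handler: ViewWriteHandler) -> None:
        handler.complete()
        self._view_writes[handler.endpoint].discard(handler)

    def on_down(self, endpoint: str) -> None:
        # Trigger an artificial timeout instead of waiting out the long one.
        for handler in self._view_writes.pop(endpoint, set()):
            handler.timeout()

    def drain_on_shutdown(self) -> None:
        for endpoint in list(self._view_writes):
            self.on_down(endpoint)
```

Writes to other endpoints are left untouched by `on_down`, so only requests that can no longer succeed are cut short.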
Piotr Sarna
ae52b3baa7 tests: fix complex timestamp test flakiness
Complex timestamp tests were ported from dtest and contained a potential
race - rows were updated with TTL 1 and then checked if the row exists
in both base and view replicas in an eventually() loop.
During this loop however, TTL of 1 second might have already passed
and the row could have been deleted from base.
This patch changes the mentioned TTL to 30 seconds, making the tests
extremely unlikely to be flaky.
Message-Id: <6b43fe31850babeaa43465eb771c0af45ee6e80d.1552041571.git.sarna@scylladb.com>
2019-03-08 13:34:27 -03:00
Tomasz Grabiec
eb5506275b Merge "Further enhancements to perf_fast_forward" from Paweł
This series contains several improvements to perf_fast_forward that
either address some of the problems seen in the automated runs or help
understanding the results.

The main problem was that the small-partition-slicing test had a preparation
stage disproportionately long compared to the actual testing phase. While
the fragments-per-second result wasn't affected by that, it restricted
the number of iterations of the test that we were able to run, and the
test, whose single iterations are short (and more prone to noise), was
executed only four times. This was solved by sharing the preparation
stage among all iterations, thus enabling the test to be run many times
and improving the stability of the results.

Another improvement is the ability to dump all test results and process
them to produce histograms. This allows us to see what the distribution of
a particular statistic looks like and whether there are any complications.

Refs #4278.

* https://github.com/pdziepak/scylla.git more-perf_fast_forward/v1:
  tests/perf_fast_forward: print number of iterations of each test
  tests/perf_fast_forward: reuse keys in small partition slicing test
  tests/perf_fast_forward: extract json result file writing logic
  tests/perf_fast_forward: add an option to dump all results
  tests/perf_fast_forward: add script for analysing full results
2019-03-07 12:22:13 -03:00
Piotr Sarna
aea4b7ea78 service: remove unused stop_hints_manager
Stopping hints manager now occurs when draining storage proxy
and it shouldn't be executed independently, so it's removed
from external API.
2019-03-07 13:44:06 +01:00
Piotr Sarna
cc806909d7 storage_proxy: add drain_on_shutdown implementation
When storage proxy is shutting down, all interruptible writes
can be timed out in order not to wait for them. Instead, the mechanism
will fall back to storing hints and/or not progressing with view
building.
2019-03-07 13:44:05 +01:00
Piotr Sarna
c61d0ee8aa main: register storage proxy as lifecycle subscriber
In order to be able to act when node joins/leaves, storage proxy
is registered as an endpoint lifecycle subscriber.

Fixes #3826
Fixes #4028
2019-03-07 12:10:40 +01:00
Piotr Sarna
92df1d5a6b storage_proxy: add endpoint_lifecycle_subscriber interface
Storage proxy is able to react to membership changes
in order to cancel long-standing operations for an endpoint.
2019-03-07 12:10:40 +01:00
Piotr Sarna
f9ff97511f storage_proxy: register view update handlers for view write type
View update handlers have a specialized class, so all writes
of type write_type::VIEW are now registered as such.
2019-03-07 12:10:40 +01:00
Piotr Sarna
75ec5fa876 storage_proxy: add intrusive list of view write handlers
In order to be able to iterate over view update write response handlers,
an intrusive list of them is added to storage proxy. This way
iteration can easily yield without invalidating iterators, and all
logic is moved to the slow path.
2019-03-07 12:10:40 +01:00
Piotr Sarna
c2048a0758 storage_proxy: add view_update_write_response_handler
View update write response handler inherits from a regular write
response handler, but it's also possible to link it intrusively
in order to be able to induce timeouts on them later.
2019-03-07 12:10:40 +01:00
Paweł Dziepak
0ba7a3c55a tests/perf_fast_forward: add script for analysing full results
perf_fast_forward with flag --dump-all-results reports the results of
every test iteration that was executed. This patch introduces a python
script that can analyse those results (in json format) and present them
in a more human-friendly way.

For now, the only option is to plot histograms of selected statistics.
2019-03-06 15:48:49 +00:00
Paweł Dziepak
4220b90b22 tests/perf_fast_forward: add an option to dump all results
perf_fast_forward runs each test case multiple times and reports a
summary of those results (median, min, max, and median absolute
deviation).

While very convenient the summary may hide some important information
(e.g. the distribution of the results). This patch adds an option to
report results of every single executed iteration.
2019-03-06 15:48:48 +00:00
Paweł Dziepak
55ed8b2472 tests/perf_fast_forward: extract json result file writing logic
We are about to report, depending on flags, both full results as well as
the results summary written now. Most of the logic is going to be
identical.
2019-03-06 15:48:45 +00:00
Paweł Dziepak
daafde21c5 tests/perf_fast_forward: reuse keys in small partition slicing test 2019-03-06 15:48:42 +00:00
Paweł Dziepak
0eb1e570aa tests/perf_fast_forward: print number of iterations of each test 2019-03-06 15:48:38 +00:00
Avi Kivity
0beeb2f721 Merge "implement upgradesstables + scrub" from Calle
"
Fixes #4245

Breaks up "perform_cleanup" into a parameterized "rewrite_sstables"
and implements upgrade + scrub in terms of this.

Both run as a "regular" compaction, but ignore the normal criteria
for compaction and select obsolete/all tables.
We also ensure all previous compactions are done so we can guarantee
all tables are rewritten post invocation of command.
"

* 'calle/upgrade_sstables' of github.com:scylladb/seastar-dev:
  api::storage_service: Implement "scrub"
  api/storage_service: Implement "upgradesstables"
  api::storage_service: Add keyspace + tables helper
  compaction_manager: Add perform_sstable_scrub
  compaction_manager: Add perform_sstable_upgrade
  compaction_manager: break out rewrite_sstables from cleanup
  table: parameterize cleanup_sstables
2019-03-06 15:47:26 +02:00
Duarte Nunes
a29ec4be76 Merge 'Update system.large_partitions during shutdown' from Rafael
"
Currently any large partitions found during shutdown are not
recorded. The reason is that the database commit log is already off,
so there is nowhere to record it to.

One possible solution is to have an independent system database. With
that, the regular db is shut down first and writes can continue to the
system db.

That is a pretty big change. It would also not allow us to record
large partitions in any system tables.

This patch series instead tries to stop the commit log later. With
that, any large partitions are recorded to the log and moved to an
sstable on the next startup.
"

* 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla:
  db: stop the commit log after the tables during shutdown
  db: stop the compaction manager earlier
  db: Add a stop_database helper
  db: Don't record large partitions in system tables
2019-03-06 10:36:38 -03:00
Calle Wilund
ef1bdebd0a api::storage_service: Implement "scrub" 2019-03-06 13:13:21 +00:00
Calle Wilund
23f4c982ea api/storage_service: Implement "upgradesstables"
Fixes #4245

Implemented as a compaction barrier (forcing previous compactions to
finish) + parameterized "cleanup", with sstable list based on
parameters.
2019-03-06 13:13:21 +00:00
Calle Wilund
3b5588dddd api::storage_service: Add keyspace + tables helper
To avoid repeating code to get keyspace + tables
2019-03-06 13:13:21 +00:00
Calle Wilund
c0bb6a4bef compaction_manager: Add perform_sstable_scrub
Suspiciously similar to an unconditional upgrade
2019-03-06 13:13:21 +00:00
Calle Wilund
7585b8c310 compaction_manager: Add perform_sstable_upgrade
Rewrites obsolete/all sstables via compaction
2019-03-06 13:13:21 +00:00
Tomasz Grabiec
889f31fabe Merge "fix slow truncation under flush pressure" from Glauber
Truncating a table is very slow if the system is under pressure. Because
in that case we mostly just want to get rid of the existing data, it
shouldn't take this long. The problem happens because truncate has to
wait for memtable flushes to end, twice. This is regardless of whether
or not the table being truncated has any data.

1. The first time is when we call truncate itself:

If auto_snapshot is enabled, we will flush the contents of this table
first, and we are expected to be slow. However, even if auto_snapshot is
disabled we will still do it if the table is marked as durable, which is
a bug. We should just not flush in this case.

2. The second time is when we call cf->stop(). Stopping a table will
wait for a flush to finish. At this point, regardless of which path
(durable or non-durable) we took in the previous step, we will have no
more data in the table. However, calling `flush()` still needs to acquire
a flush_permit, which means we will wait for whichever memtable is
flushing at that very moment to end.

If the system is under pressure and a memtable flush takes many
seconds, so will truncate. Even if auto_snapshot is enabled, we
shouldn't have to flush twice. The first flush should already put us in
a state in which the next one is immediate (maybe holding on to the
permit, maybe destroying the memtable_list already at that point,
since no other memtables should be created).

If auto_snapshots are not enabled, the whole thing should just be
instantaneous.

This patchset fixes that by removing the flush need when !auto_snapshot,
and special casing the flush of an empty table.

Fixes #4294

* git@github.com:glommer/scylla.git slowtruncate-v2:
  database: immediately flush tables with no memtables.
  truncate: do not flush memtables if auto_snapshot is false.
2019-03-06 13:54:58 +01:00
Eliran Sinvani
479131259e auth: prevent failure due to race in tables creation
This commit rewrites the table creation logic at startup of the auth
mechanism to be race-proof. This is done by simply ignoring the
already_exists exception, as done in system_distributed_keyspace.
The old creation logic tested for the existence of the column family and
right after that called announce_new_column_family with the newly
created table schema. The problem was that this does not prevent
a race, since the announcement itself is a fiber and the created table
can still be gossiped from another node, causing the announce
function to throw an already_exists exception that in turn crashes
Scylla.
Message-Id: <20190306075016.28131-1-eliransin@scylladb.com>
2019-03-06 13:09:09 +01:00
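The fix in 479131259e (auth: prevent failure due to race in tables creation) replaces a check-then-create sequence with "create and ignore already-exists". A minimal Python sketch of that pattern follows. `Cluster`, `announce_new_table` and `AlreadyExists` are hypothetical stand-ins for the schema announcement path, not the real API.

```python
class AlreadyExists(Exception):
    """Hypothetical counterpart of the already_exists exception."""


class Cluster:
    """Toy schema registry; another node may add tables at any time."""

    def __init__(self):
        self.tables = set()

    def announce_new_table(self, name: str) -> None:
        if name in self.tables:
            raise AlreadyExists(name)
        self.tables.add(name)


def create_table_if_missing(cluster: Cluster, name: str) -> None:
    # No "does it exist?" check beforehand: the answer could be stale by the
    # time the announcement runs. Attempt the creation and treat
    # "already exists" as success.
    try:
        cluster.announce_new_table(name)
    except AlreadyExists:
        pass
```

Calling `create_table_if_missing` twice, or after another node has created the table, leaves the schema in the same state without raising.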
Rafael Ávila de Espíndola
16ed9a2574 db: stop the commit log after the tables during shutdown
This allows for system.large_partitions to be updated if a large
partition is found while writing the last sstables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola
a3e1f14134 db: stop the compaction manager earlier
We want to finish all large data logging in stop_system, so stopping
the compaction manager should be the first thing stop_system does.

The make_ready_future<>() will be removed in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola
765d8535f1 db: Add a stop_database helper
This reduces code duplication. A followup patch will add more code to
stop_database.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:45 -08:00
Rafael Ávila de Espíndola
0b86a99592 db: Don't record large partitions in system tables
This will allow us to delay shutdown of all system tables in a uniform
way.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 17:52:00 -08:00
Tomasz Grabiec
c584f48c32 Merge "transport: sort bound ranges in read reques in order to conform to cql definitions" from Eliran
According to the CQL definitions, if no ORDER BY clause is present,
records should be returned ordered by the clustering keys. Since the
backend returns the ranges according to their order of appearance
in the request, the bounds should be sorted before sending them to the
backend. This kind of sorting is needed in queries that generate more
than one bound to be read. Examples of such queries are:
1. a SELECT query with an IN clause.
2. a SELECT query on a mixed-order tuple of columns (see #2050).
The assumption this commit makes is the correctness of the bounds
list, that is, the bounds are non-overlapping. If this weren't true, multiple
occurrences of the same record could be returned for certain queries.

Tests:
1. Unit tests release
2. All dtest that requires #2050 and #2029

Fixes #2029
2019-03-05 21:07:15 +01:00
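The sorting step described in c584f48c32 can be sketched as follows. This is a simplified Python model in which each bound is an inclusive `(start, end)` pair over a single clustering column. The function name and the `reversed_order` flag are assumptions for the example, and the overlap check only makes the commit's stated precondition explicit.

```python
def sort_bounds(bounds, reversed_order: bool = False):
    """Return bounds in clustering order; raise if any two overlap."""
    ordered = sorted(bounds, key=lambda b: b[0], reverse=reversed_order)
    for prev, cur in zip(ordered, ordered[1:]):
        low, high = (cur, prev) if reversed_order else (prev, cur)
        if low[1] >= high[0]:
            raise ValueError(f"overlapping bounds: {low} and {high}")
    return ordered


def bounds_for_in_clause(values):
    """An IN clause yields one single-value bound per listed value."""
    return [(v, v) for v in values]
```

For example, `ck IN (3, 1, 2)` produces bounds in request order 3, 1, 2. Sorting them before the read makes the backend return rows in clustering order, with the direction flipped for a descending clustering column.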
Avi Kivity
3cfbd682ec Merge "Add JSON support to tuples and UDT" from Piotr
"
Fixes #3708

This series adds JSON serialization and deserialization procedures
to tuples and user defined types.

Tests: unit (dev)
"

* 'add_tuple_and_udt_json_support_2' of https://github.com/psarna/scylla:
  tests: add test cases for JSON and UDT
  types: add JSON support to UDT
  tests: add JSON tuple tests
  types: add JSON support for tuples
2019-03-05 20:06:15 +02:00
Glauber Costa
c2c6c71398 truncate: do not flush memtables if auto_snapshot is false.
Right now we flush memtables if the table is durable (which in practice
it almost always is).

We are truncating, so we don't want the data. We should only flush if
auto_snapshot is true.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-03-05 11:22:48 -05:00
Glauber Costa
ed8261a0fe database: immediately flush tables with no memtables.
If a table has no data, it may still take a long time to flush. This is
because before we even try to flush, we need to acquire a permit and
that can take a while if there is a long running flush already queued.

We can special-case the situation in which there is no data in any of
the memtables owned by the table and return immediately.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-03-05 11:22:48 -05:00
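The special case from ed8261a0fe (database: immediately flush tables with no memtables) can be sketched with asyncio. `Table`, its fields and the `demo` helper are hypothetical. The semaphore stands in for the flush permit, which in the demo is held by some other long-running flush.

```python
import asyncio


class Table:
    """Hypothetical table with a shared flush permit; names are illustrative."""

    def __init__(self, flush_permits: asyncio.Semaphore):
        self._permits = flush_permits
        self.memtables = [[]]  # each memtable is modelled as a list of rows
        self.sstables = []

    async def flush(self) -> bool:
        # Fast path: nothing to write, so do not queue for a permit.
        if not any(self.memtables):
            return False
        async with self._permits:
            for memtable in self.memtables:
                if memtable:
                    self.sstables.append(list(memtable))
                    memtable.clear()
        return True


async def demo():
    permits = asyncio.Semaphore(1)
    await permits.acquire()  # a long-running flush elsewhere holds the permit
    empty = Table(permits)
    flushed_empty = await asyncio.wait_for(empty.flush(), timeout=1)
    permits.release()
    full = Table(permits)
    full.memtables[0].append("row")
    flushed_full = await full.flush()
    return flushed_empty, flushed_full, full.sstables
```

Without the fast path, the empty table's `flush()` in `demo` would wait for the permit and hit the one-second timeout.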
Piotr Sarna
a5c66d5ce1 tests: add test cases for JSON and UDT 2019-03-05 16:25:18 +01:00
Piotr Sarna
ebf0eb92bb types: add JSON support to UDT
User defined types can now be serialized to and deserialized from JSON.

Fixes #3708
2019-03-05 16:08:05 +01:00
Piotr Sarna
c2064d152d tests: add JSON tuple tests 2019-03-05 16:08:05 +01:00
Piotr Sarna
aa0cc8a8a2 types: add JSON support for tuples
Tuples can now be serialized to and deserialized from JSON.

Refs #3708
2019-03-05 16:08:04 +01:00
Piotr Sarna
e9bc2a7912 cql3: fix error message for lack of primary keys in JSON
When any primary key part is missing from an INSERT JSON statement,
a proper error message is now presented to the client.

Tests: unit (dev)
Message-Id: <3aa99703523c45056396a0b6d97091da30206dab.1551797502.git.sarna@scylladb.com>
2019-03-05 16:54:46 +02:00
Avi Kivity
256b7d34e2 Update seastar submodule
* seastar ab54765...e640314 (10):
  > net: enable IP_BIND_ADDRESS_NO_PORT before binding a socket during connection
  > core: show address in error message for posix_listen failures
  > fmt: remove submodule
  > tests: fix loopback socket close() to not fail when the peer's side is already closed
  > Merge "Add suffixes to target names" from Jesse
  > temporary_buffer: improve documentation for alignment param requirements
  > docs: Fix dependencies for split tutorial target
  > deleter: prevent early memory free caused by deleter append.
  > doc/tutorial.md: introduce memory allocation foreign_ptr
  > Fix CLI help message (network & DPDK options)

Toolchain and configure.py updated for fmt submodule removal.
2019-03-05 15:51:38 +02:00
Botond Dénes
817490cda1 tests/multishard_mutation_query_test: fuzzy_test: replace BOOST_WARN_* with logger::debug()
fuzzy_test performs some checks that are expected to fail and whose
failure does not influence the outcome of the test. For this it uses the
`BOOST_WARN_*` family of macros. These will just log a warning when their
predicate fails. This can however confuse someone looking at the logs
trying to determine the cause of a failure. Since these checks are
performed primarily to provide an aid in debugging failures, replace them
with a conditional debug-level log message.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f550a9d9ab1b5b4aeb4f81860cbd3d924fc86898.1551792035.git.bdenes@scylladb.com>
2019-03-05 15:24:53 +02:00
Botond Dénes
0ed0d3297a tests/multishard_mutation_query_test: test_abandoned_read: reduce querier TTL
The `test_abandoned_read` verifies that an abandoned read does a proper
cleanup. One of the things checked is that after the querier TTL
expires, the saved queriers are cleaned-up. This check however had a
very tight timing. The TTL was 2s and the test waited 2s before it did
the check, which is wrapped in an `eventually_true()` (max +1s).
The TTL timer scans the queriers with a period of TTL/2, so a querier
can live for up to 1.5*TTL. This means that the 2s + 1s wait time is just on
the limit and with some bad luck (and a slow machine) the check can fail.
Reduce the TTL in this test to 1s to relax the dependence on timing.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <ed0d45b5a07960b83b391d289cade9b9f60c7785.1551787638.git.bdenes@scylladb.com>
2019-03-05 14:10:04 +02:00
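The timing argument in 0ed0d3297a can be checked with a few lines of arithmetic. The helper names below are made up for the example, and the "after" case assumes the 2s sleep and the 1s `eventually_true()` budget stay unchanged, which the commit message does not state explicitly.

```python
def worst_case_querier_lifetime(ttl: float) -> float:
    """The TTL timer scans every ttl/2, so expiry can lag by up to ttl/2."""
    scan_period = ttl / 2
    return ttl + scan_period


def timing_margin(ttl: float, sleep: float, eventually_budget: float) -> float:
    """Slack between the longest wait and the worst-case querier lifetime."""
    return (sleep + eventually_budget) - worst_case_querier_lifetime(ttl)


before = timing_margin(ttl=2.0, sleep=2.0, eventually_budget=1.0)
after = timing_margin(ttl=1.0, sleep=2.0, eventually_budget=1.0)
```

Under these assumptions the margin goes from zero (3s waited against a 3s worst case) to 1.5s.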
Eliran Sinvani
eeb0845be0 unit test: validate order instead of just content in the mixed order token test
This change amends the result generation:
it now returns the expected results vector sorted
in the expected order of appearance in the result set. The
result set is then validated for both content and order.

Tests: unit tests (Release)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2019-03-05 13:51:17 +02:00
Eliran Sinvani
13284d9272 unit test: change IN clause tests to validate with ordering_spec
Whenever a query with an IN clause on clustering keys is executed,
assuming only one partition, the rows are ordered according to the
clustering keys. This commit adds the order validation to the content
validation whenever possible (which means removing the
ignore order part).

Tests: unit tests (Release)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2019-03-05 13:51:17 +02:00
Eliran Sinvani
7df0c873aa transport: sort bound ranges in read reques in order to conform to cql definitions
According to the CQL definitions, if no ORDER BY clause is present,
records should be returned ordered by the clustering keys. Since the
backend returns the ranges according to their order of appearance
in the request, the bounds should be sorted before sending them to the
backend. This kind of sorting is needed in queries that generate more
than one bound to be read. Examples of such queries are:
1. a SELECT query with an IN clause.
2. a SELECT query on a mixed-order tuple of columns (see #2050).
The assumption this commit makes is the correctness of the bounds
list, that is, the bounds are non-overlapping. If this weren't true, multiple
occurrences of the same record could be returned for certain queries.

Tests:
1. Unit tests release
2. All dtests that require #2050 and #2029

Fixes #2029

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2019-03-05 13:51:17 +02:00
Avi Kivity
5993c05a1b Merge "partitioner: Futurize split_range_to_single_shard" from Asias
"
Futurize split_range_to_single_shard to fix reactor stall.

Fixes: #3846
"

* tag 'asias/split_range_to_single_shard/v4' of github.com:scylladb/seastar-dev:
  partitioner: Futurize split_range_to_single_shard
  tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc
2019-03-05 11:25:36 +02:00
Asias He
58fae5f4c1 partitioner: Futurize split_range_to_single_shard
We saw reactor stalls when closing SSTables. The backtrace looks like:

Oct 12 19:00:51 dell-1 scylla[435045]: Backtrace:[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/sylla/scylla/seastar/util/backtrace.hh:56
seastar::backtrace_buffer::append_backtrace() at /home/sylla/scylla/seastar/core/reactor.cc:410
 (inlined by) print_with_backtrace at /home/sylla/scylla/seastar/core/reactor.cc:431
seastar::reactor::block_notifier(int) at /home/sylla/scylla/seastar/core/reactor.cc:749
_L_unlock_13 at funlockfile.c:?
std::experimental::fundamentals_v1::_Optional_base<range_bound<dht::ring_position>, true>::_Optional_base(std::experimental::fundamentals_v1::_Optional_base<range_bound<dht::ring_position>, true>&&) at /opt/scylladb/include/c++/7/experimental/optional:247
 (inlined by) std::experimental::fundamentals_v1::optional<range_bound<dht::ring_position> >::optional(std::experimental::fundamentals_v1::optional<range_bound<dht::ring_position> >&&) at /opt/scylladb/include/c++/7/experimental/optional:493
 (inlined by) wrapping_range<dht::ring_position>::wrapping_range(wrapping_range<dht::ring_position>&&) at /home/sylla/scylla/./range.hh:61
 (inlined by) nonwrapping_range<dht::ring_position>::nonwrapping_range(nonwrapping_range<dht::ring_position>&&) at /home/sylla/scylla/./range.hh:430
 (inlined by) void __gnu_cxx::new_allocator<nonwrapping_range<dht::ring_position> >::construct<nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position> >(nonwrapping_range<dht::ring_position>*, nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/ext/new_allocator.h:136
 (inlined by) void std::allocator_traits<std::allocator<nonwrapping_range<dht::ring_position> > >::construct<nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position> >(std::allocator<nonwrapping_range<dht::ring_position> >&, nonwrapping_range<dht::ring_position>*, nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/alloc_traits.h:475
 (inlined by) nonwrapping_range<dht::ring_position>& std::deque<nonwrapping_range<dht::ring_position>, std::allocator<nonwrapping_range<dht::ring_position> > >::emplace_back<nonwrapping_range<dht::ring_position> >(nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/deque.tcc:167
 (inlined by) std::deque<nonwrapping_range<dht::ring_position>, std::allocator<nonwrapping_range<dht::ring_position> > >::push_back(nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/stl_deque.h:1558
 (inlined by) dht::split_range_to_single_shard(dht::i_partitioner const&, schema const&, nonwrapping_range<dht::ring_position> const&, unsigned int) at /home/sylla/scylla/dht/i_partitioner.cc:454
dht::split_range_to_single_shard(schema const&, nonwrapping_range<dht::ring_position> const&, unsigned int) at /home/sylla/scylla/dht/i_partitioner.cc:464
create_sharding_metadata at /home/sylla/scylla/sstables/sstables.cc:2075
 (inlined by) sstables::sstable::write_scylla_metadata(seastar::io_priority_class const&, unsigned int, sstables::sstable_enabled_features) at /home/sylla/scylla/sstables/sstables.cc:2435
sstables::sstable_writer_m::consume_end_of_stream() at /home/sylla/scylla/sstables/sstables.cc:3483
sstables::compaction::finish_new_sstable(std::experimental::fundamentals_v1::optional<sstables::sstable_writer>&, seastar::lw_shared_ptr<sstables::sstable>&) at /home/sylla/scylla/sstables/compaction.cc:338
 (inlined by) sstables::regular_compaction::stop_sstable_writer() at /home/sylla/scylla/sstables/compaction.cc:579
 (inlined by) sstables::regular_compaction::finish_sstable_writer() at /home/sylla/scylla/sstables/compaction.cc:585
sstables::compacting_sstable_writer::consume_end_of_stream() at /home/sylla/scylla/sstables/compaction.cc:494
 (inlined by) auto compact_mutation_state<(emit_only_live_rows)0, (compact_for_sstables)1>::consume_end_of_stream<sstables::compacting_sstable_writer>(sstables::compacting_sstable_writer&) at /home/sylla/scylla/./mutation_compactor.hh:292
 (inlined by) compact_mutation<(emit_only_live_rows)0, (compact_for_sstables)1, sstables::compacting_sstable_writer>::consume_end_of_stream() at /home/sylla/scylla/./mutation_compactor.hh:397
 (inlined by) stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >::consume_end_of_stream() at /home/sylla/scylla/./mutation_reader.hh:366
 (inlined by) auto flat_mutation_reader::impl::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)> >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)>, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /home/sylla/scylla/./flat_mutation_reader.hh:288
 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)> >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)>, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /home/sylla/scylla/./flat_mutation_reader.hh:370
 (inlined by) operator() at /home/sylla/scylla/sstables/compaction.cc:757
 (inlined by) apply at /home/sylla/scylla/seastar/core/apply.hh:35
 (inlined by) apply<sstables::compaction::run(std::unique_ptr<sstables::compaction>)::<lambda()> > at /home/sylla/scylla/seastar/core/apply.hh:43
 (inlined by) apply<sstables::compaction::run(std::unique_ptr<sstables::compaction>)::<lambda()> > at /home/sylla/scylla/seastar/core/future.hh:1309
 (inlined by) operator() at /home/sylla/scylla/./seastar/core/thread.hh:315
 (inlined by) _M_invoke at /opt/scylladb/include/c++/7/bits/std_function.h:316
std::function<void ()>::operator()() const at /opt/scylladb/include/c++/7/bits/std_function.h:706
 (inlined by) seastar::thread_context::main() at /home/sylla/scylla/seastar/core/thread.cc:313

The call chain is:

sstable_writer_k_l::consume_end_of_stream and mc::writer::consume_end_of_stream
-> sstable::write_scylla_metadata -> create_sharding_metadata -> dht::split_range_to_single_shard

Since the sstable writer assumes a thread context, we can futurize dht::split_range_to_single_shard.

Fixes: #3846
Tests: dtest + build/dev/tests/partitioner_test
2019-03-05 17:21:27 +08:00
Benny Halevy
1021eb29c9 distributed_loader: fix old format counters exception
table::load_sstable: fix missing arg in old format counters exception

Properly catch and log the exception in load_new_sstables.
Abort when the exception is caught to keep current behavior.

Seen with migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test
without enable_dangerous_direct_import_of_cassandra_counters.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190301091235.2914-1-bhalevy@scylladb.com>
2019-03-04 17:36:09 +01:00
Avi Kivity
026821fb59 Merge "Record large rows in the system.large_rows table" from Rafael
"
This fixes #3988.

We already have a system.large_partitions, but only a warning for
large rows. These patches close the gap by also recording large rows
into a new system.large_rows.
"

* 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla:
  Add a testcase for large rows
  Populate system.large_rows.
  Create a system.large_rows table
  Extract a key_to_str helper
  Don't call record_large_rows if stopped
  Add a delete_large_rows_entries method to large_data_handler
  db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
  Rename maybe_delete_large_partitions_entry
  Rename log_large_row to record_large_rows
  Rename maybe_log_large_row to maybe_record_large_rows
2019-03-04 18:31:10 +02:00
Avi Kivity
da0a25859b Merge "Improvements to commitlog logs" from Paweł
"
This series contains minor improvements to commitlog log messages that
have helped investigating #4231, but are not specific to that bug.
"

* tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla:
  commitlog: use consistent chunk offsets in logs
  commitlog: provide more information in logs
  commitlog: remove unnecessary comment
2019-03-04 14:52:46 +02:00
Paweł Dziepak
00b33de25c commitlog: use consistent chunk offsets in logs
Logs in commitlog writer use offset in the file of the chunk header to
identify chunks. However, the replayer is using offset after the header
for the same purpose. This causes unnecessary confusion suggesting that
the replayer is reading at the wrong position.

This patch changes the replayer so that it reports chunk header offsets.
2019-03-04 12:15:50 +00:00
Paweł Dziepak
813b00a1a6 commitlog: provide more information in logs
This commit adds some more information to the logs, motivated by
experience with investigating #4231.

 * size of each write
 * position of each write
 * log message for final write
2019-03-04 12:15:50 +00:00
Paweł Dziepak
1a657e9c5f commitlog: remove unnecessary comment 2019-03-04 12:15:50 +00:00
Avi Kivity
d95dec22d9 Merge "Fix commitlog chunks overwriting each other" from Paweł
"
This series fixes a problem in the commitlog cycle() function that
confused in-memory and on-disk size of chunks it wrote to disk. The
former was used to decide how much data needs to be actually written,
and the latter was used to compute the offset of the next chunk. If two
chunk writes happened concurrently, the one positioned earlier in
the file could corrupt the header of the next one.

Fixes #4231.

Tests: unit(dev), dtest(commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup,test_commitlog_replay_with_alter_table)
"

* tag 'fix-commitlog-cycle/v1' of https://github.com/pdziepak/scylla:
  commitlog: write the correct buffer size
  utils/fragmented_temporary_buffer_view: add remove suffix
2019-03-04 14:14:32 +02:00
Tomasz Grabiec
58e7ad20eb sstable/compaction: Use correct schema in the writing consumer
Introduced in 2a437ab427.

regular_compaction::select_sstable_writer() creates the sstable writer
when the first partition is consumed from the combined mutation
fragment stream. It gets the schema directly from the table
object. That may be a different schema than the one used by the
readers if there was a concurrent schema alter during that small time
window. As a result, the writing consumer attached to readers will
interpret fragments using the wrong version of the schema.

One effect of this is storing values of some columns under a different
column.

This patch replaces all column_family::schema() accesses with accesses
to the _schema member which is obtained once per compaction and is
the same schema which readers use.

Fixes #4304.

Tests:

  - manual tests with hard-coded schema change injection to reproduce the bug
  - build/dev/scylla boot
  - tests/sstable_mutation_test

Message-Id: <1551698056-23386-1-git-send-email-tgrabiec@scylladb.com>
2019-03-04 13:27:19 +02:00
Paweł Dziepak
434023425d commitlog: write the correct buffer size
Commitlog files contain multiple chunks. Each chunk starts as a single
(possibly fragmented) buffer. The size of that buffer in memory may be
larger than the size in the file.

cycle() was incorrectly using the in-memory size to write the whole
buffer to the file. That sometimes caused data corruption, since a
smaller on-file size was used to compute the offset of the next chunk
and there could be multiple chunk writes happening at the same time.

This patch solves the issue by ensuring that only the actual on-file
size of the chunk is written.
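
The failure mode can be reproduced with a toy model. The sketch below is not the commitlog code: the file is a `bytearray`, chunk sizes are made up, and `write_chunk` and `simulate` are hypothetical names. It only shows how writing the in-memory size clobbers a neighbouring chunk when offsets are derived from the smaller on-file size and writes complete out of order.

```python
def write_chunk(file: bytearray, offset: int, buffer: bytes,
                on_file_size: int, *, fixed: bool) -> None:
    """Write a chunk at `offset`; the buggy variant writes the whole buffer."""
    length = on_file_size if fixed else len(buffer)
    end = offset + length
    if len(file) < end:
        file.extend(b"\0" * (end - len(file)))
    file[offset:end] = buffer[:length]


def simulate(fixed: bool) -> bytes:
    """Two chunks with 4 useful bytes each, held in 8-byte memory buffers."""
    file = bytearray()
    first = b"AAAA" + b"\0" * 4
    second = b"BBBB" + b"\0" * 4
    # Offsets are assigned up front from the on-file sizes.
    first_offset = 0
    second_offset = first_offset + 4
    # The writes are concurrent; here the later chunk happens to land first.
    write_chunk(file, second_offset, second, 4, fixed=fixed)
    write_chunk(file, first_offset, first, 4, fixed=fixed)
    return bytes(file[:8])
```

With `fixed=False` the padding of the first buffer overwrites the start of the second chunk; with `fixed=True` both chunks survive.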
2019-03-04 10:25:48 +00:00
Paweł Dziepak
ca8d1025c0 utils/fragmented_temporary_buffer_view: add remove suffix
This patch adds fragmented_temporary_buffer_view::remove_suffix(). It is
also necessary to adjust remove_prefix() since now the total size of all
fragments may be larger than the size of the view if both those
operations are performed.
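
A rough model of why remove_prefix() needs adjusting: once a suffix can be removed, the view has to track its own logical size instead of relying on the sum of the fragment sizes. The class below is a simplified sketch with hypothetical names, not the actual fragmented_temporary_buffer_view implementation.

```python
from typing import List


class FragmentedView:
    """Toy view over a list of byte fragments with an explicit logical size."""

    def __init__(self, fragments: List[bytes]) -> None:
        self._fragments = [bytes(f) for f in fragments]
        self._skip = 0  # bytes already removed from the first fragment
        self._size = sum(len(f) for f in self._fragments)

    def size(self) -> int:
        return self._size

    def remove_suffix(self, n: int) -> None:
        if n > self._size:
            raise ValueError("suffix longer than view")
        # Fragments are left untouched; only the logical size shrinks, so the
        # fragments may now hold more bytes than the view exposes.
        self._size -= n

    def remove_prefix(self, n: int) -> None:
        if n > self._size:
            raise ValueError("prefix longer than view")
        self._size -= n
        self._skip += n
        # Drop leading fragments that are now fully consumed.
        while self._fragments and self._skip >= len(self._fragments[0]):
            self._skip -= len(self._fragments[0])
            self._fragments.pop(0)

    def to_bytes(self) -> bytes:
        joined = b"".join(self._fragments)
        return joined[self._skip:self._skip + self._size]
```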
2019-03-04 10:23:45 +00:00
Asias He
3861f538dc tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc
We are going to convert split_range_to_single_shard to return a future.
2019-03-04 09:41:09 +08:00
Avi Kivity
8f71e7ffd4 Merge "auth: Prevent disallowed roles from logging in" from Jesse
"
This series heavily refactors `auth_test` in anticipation of
the last patch, which fixes a bug and which should be backported.

Branches: branch-3.0, branch-2.3
"

Fixes #4284

* 'jhk/check_can_login/v2' of https://github.com/hakuch/scylla:
  auth: Reject logins from disallowed roles
  tests: Restrict the scope of a variable
  tests: Simplify boolean assertions in `auth_test`
  tests: Abstract out repeated assertion checking
  tests: Do not use the `auth` namespace
  tests: Validate authentication correctly
  tests: Ensure test roles are created and dropped
  tests: Use `static` variables in `auth_test`
  tests: Remove non-useful test
2019-03-02 17:13:06 +02:00
Asias He
a949ccee82 repair: Reject combination of -dc and -hosts options
4 nodes in the cluster
n1, n2 in dc1
n3, n4 in dc2

dc1 RF=2, dc2 RF=2.

If we run

    nodetool repair -hosts 127.0.0.1,127.0.0.3 -dc "dc1,dc2" multi

on n1.

The -hosts option will be ignored and only the -dc option
will be used to choose which hosts to repair. In this case, n1 to n4
will be repaired.

If the user wants to select specific hosts to repair, there is no need
to specify the -dc option; the -hosts option alone is enough.

Reject the combination so as not to surprise the user.
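
The check amounts to something like the sketch below. The function name, parameter names and error text are hypothetical and do not reflect the actual repair option parsing code.

```python
from typing import List


def check_repair_options(hosts: List[str], data_centers: List[str]) -> None:
    """Reject a repair request that names both hosts and data centers."""
    if hosts and data_centers:
        raise ValueError(
            "Cannot combine the hosts and data centers options in one repair"
        )
```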

In https://issues.apache.org/jira/browse/CASSANDRA-9876, the same logic
is introduced as well.

Refs #3836
Message-Id: <e95ac1099f98dd53bb9d6534316005ea3577e639.1551406529.git.asias@scylladb.com>
2019-03-02 16:42:29 +02:00
Juliana Oliveira
6322293263 dist/docker: add ssh server
Scylla Manager communicates through SSH, so this patch adds SSH server
to Scylla's docker image in order for it to be configurable by Scylla
Manager.

Message-Id: <20190301161428.GA12148@shenzou.localdomain>
2019-03-01 19:11:35 +02:00
Avi Kivity
41078de096 tools: toolchain: update image for gcc-8.3.1-2.fc29.x86_64
tests: unit (debug, dev, release)
2019-03-01 16:42:18 +02:00
Duarte Nunes
44966d0a66 Merge 'Fix view update generation optimizations' from Piotr
"
This series aims to fix inconsistencies in recent view update generation series (435447998).

First of all, it checks view row marker liveness instead of that of a base row marker
when deciding if optimizations can be applied or not.

Secondly, tests based on creating mutations directly are removed. Instead:
 - dtest case which detected inconsistencies in previous series is ported to be a unit test
 - the above case is also expanded to cover views with regular base column in their key
 - additional test for TTL and timestamps is added and it's based on CQL

Tests: unit (dev)
dtest: materialized_views_test.TestMaterializedViews.test_no_base_column_in_view_pk_complex_timestamp_without_flush

Fixes: #4271
"

* 'fix_virtual_columns_liveness_checks_in_update_optimization_5' of https://github.com/psarna/scylla:
  tests: add view update optimization case for TTL
  database: add view_stats getter
  tests: port complex timestamp view test from dtest
  db,view: fix virtual columns liveness checks
  tests: remove update generating test case
2019-03-01 10:58:39 -03:00
Jesse Haber-Kucharsky
a139afc30c auth: Reject logins from disallowed roles
When the `LOGIN` option for a role is set to `false`, Scylla should not
permit the role to log in.

Fixes #4284

Tests: unit (debug)
2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
320b4a7b99 tests: Restrict the scope of a variable 2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
f8764a12e6 tests: Simplify boolean assertions in auth_test 2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
879217ccaf tests: Abstract out repeated assertion checking 2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
3c8eeb0e86 tests: Do not use the auth namespace 2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
afed9c7bee tests: Validate authentication correctly
There are additional validation steps that the server executes in
addition to simply invoking the authenticator, so we adapt the tests to
also perform that validation.

We also eliminate lots of code duplication.
2019-02-28 15:01:14 -05:00
Jesse Haber-Kucharsky
baefde0f6c tests: Ensure test roles are created and dropped
Since the role manager and authenticator work in tandem, the test cases
should use the wrapper for `auth::service` to create and drop users
instead of just doing it through the authenticator.
2019-02-28 15:00:20 -05:00
Jesse Haber-Kucharsky
fd88d59ad9 tests: Use static variables in auth_test
This way, we avoid copies and alleviate resource-management concerns.
2019-02-28 14:59:38 -05:00
Jesse Haber-Kucharsky
f274982522 tests: Remove non-useful test
Password handling is verified in its own test suite, and this test not
only makes a number of assumptions about implementation details, but
also tries to verify a hashing scheme (bcrypt) which is not supported on
most Linux distributions.
2019-02-28 14:58:27 -05:00
Avi Kivity
7c968f4a9e build: move XXH_PRIVATE_API and SEASTAR_TESTING_MAIN non-mode-specific
These defines are global, so they can be in the mode-agnostic cxxflags
rather than the mode-specific cxxflags_{mode}.
Message-Id: <20190228081247.20116-1-avi@scylladb.com>
2019-02-28 09:51:02 +00:00
Piotr Sarna
032f8e2893 tests: add view update optimization case for TTL
This test case checks whether redundant updates are omitted
and the essential ones are still generated.
2019-02-28 10:47:20 +01:00
Piotr Sarna
67e63d4dd7 database: add view_stats getter
It will be used for testing purposes
2019-02-28 10:47:20 +01:00
Piotr Sarna
09b8d2e9d6 tests: port complex timestamp view test from dtest
This test was useful in discovering corner cases for TTLs of virtual
columns, so it's ported to unit test suite from dtest.

The test is also extended with a mirrored case for base regular column
that *is* included in view pk.
2019-02-28 10:47:20 +01:00
Piotr Sarna
5f85a7a821 db,view: fix virtual columns liveness checks
When looking for optimization paths, columns selected in a view
are checked against multiple conditions - unfortunately virtual
columns were erroneously skipped from that check, which resulted
in ignoring their TTLs. That can lead to overoptimizing
and not including vital liveness info into view rows,
which can then result in rows disappearing too early.
2019-02-28 10:47:19 +01:00
Piotr Sarna
b963543762 tests: remove update generating test case
This test case should have been based on CQL instead of creating
artificial update scenarios. It also contains invalid cases
regarding base and view row marker, so it's removed here
and replaced with CQL-based test in this same series.
2019-02-28 10:40:47 +01:00
Avi Kivity
20eadb2c39 relocatable-package: package and redirect gnutls configuration
gnutls requires a configuration file, and the configuration file must match
the one used by the library. Since we ship our own version of the library with
the relocatable package, we must also ship the configuration file.

Luckily, it is possible to override the location of the configuration file via
an environment variable, so all we need to do is to copy the file to the archive
and provide the environment variable in the thunk that adjusts the library path.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190227110529.14146-1-avi@scylladb.com>
2019-02-28 10:57:32 +02:00
Avi Kivity
4022a919f6 test: allocate at least one logical core per unit test
Currently, we only allocate memory for concurrent unit test runs. This can cause
CPU overcommit when running test.py on machines with a lot of memory but few cores.
This overcommit can cause timeouts in tests that are time-sensitive (bad practice,
but can happen) and makes the desktop sluggish.

Improve by allocating at least one logical core per running test.
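
As a rough illustration of the policy (not the actual test.py code; the function name and the per-test memory figure are assumptions), the number of concurrent tests can be bounded by both resources:

```python
def max_concurrent_tests(total_memory_bytes: int,
                         memory_per_test_bytes: int,
                         logical_cores: int) -> int:
    """Limit concurrency by memory and by logical cores, but run at least one."""
    by_memory = total_memory_bytes // memory_per_test_bytes
    return max(1, min(by_memory, logical_cores))
```

On a machine with 64 GiB of memory and 4 logical cores, a memory-only limit of 2 GiB per test would allow 32 concurrent tests, while the combined limit allows 4.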

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190227132516.22147-1-avi@scylladb.com>
2019-02-28 10:34:33 +02:00
Dan Yasny
6dbb48a12a node_health_check: collect scylla.d contents with node_health_check
We are missing data for CPU conf files and potentially other
information when collecting node data.

Fixes #4094

Message-Id: <20190225204727.20805-5-dyasny@scylladb.com>
2019-02-28 10:23:19 +02:00
Dan Yasny
9055e7a49e node_health_check: Add redhat-release to health check if present
Collect /etc/redhat-release as well as os-release from relevant
hosts. The problem with os-release is that it doesn't contain the
minor version of the EL OS family. Since this is only present in
Red Hat distributions and derivatives, it will not be collected
in Debian derivatives.

Another approach is to use lsb_release -a but it will not provide
anything more useful than os-release on Debian and lsb needs to be
installed on EL derivatives first.

Fixes #4093

Message-Id: <20190225204727.20805-4-dyasny@scylladb.com>
2019-02-28 10:23:12 +02:00
Dan Yasny
2f26390f52 node_health_check: Use clear hostname instead of -i for filenames and report names
hostname -i produces garbled output on new systems with IPv6
enabled; it is better to use the clean hostname for the file
names.

Message-Id: <20190225204727.20805-3-dyasny@scylladb.com>
2019-02-28 10:23:06 +02:00
Dan Yasny
f483c594ee node_health_check: Detect the address for the CQL (port 9042) listener and use it
The script relies on hostname -i for host address, which can be
wrong in some systems. This patch checks for where the defined
CQL_PORT is listening, and uses the correct IP address instead.

Message-Id: <20190225204727.20805-2-dyasny@scylladb.com>
2019-02-28 10:22:58 +02:00
Avi Kivity
632c7c303a Merge "auth: Restructure SASL code" from Jesse
"
This series restructures the SASL code that was previously internal
to the `password_authenticator` so that it can be used in other contexts.
"

* 'jhk/restructure_sasl/v1' of https://github.com/hakuch/scylla:
  auth: Rename SASL challenge class for "PLAIN"
  auth: Make a ctor `explicit`
  auth: Move `sasl_challenge` to its own file
  auth: Decouple SASL code from its parent class
2019-02-28 10:19:41 +02:00
Jesse Haber-Kucharsky
f2d92f81e8 auth: Report a more specific error with bad creds
Without this change, the resulting error message for an invalid password
is "authentication failed".

With this change, we report "Username and/or password are incorrect".

Fixes #4285

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <32d00be8af5075ee10d2c14f85b76843a9adac10.1551306914.git.jhaberku@scylladb.com>
2019-02-28 09:53:57 +02:00
Jesse Haber-Kucharsky
3d883e8cf2 auth: Rename SASL challenge class for "PLAIN" 2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky
0c955b7992 auth: Make a ctor explicit 2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky
dc41f1098b auth: Move sasl_challenge to its own file
This will allow authenticators other than
`password_authenticator` to make use of the PLAIN SASL
authentication code.
2019-02-27 18:36:52 -05:00
Jesse Haber-Kucharsky
2d59fa6be9 auth: Decouple SASL code from its parent class
This way, we can (in the future) use this implementation of the SASL
"PLAIN" mechanism in contexts other than `password_authenticator`.
2019-02-27 18:11:31 -05:00
Avi Kivity
88322086cb Merge "Add fuzzer-type unit test for range scans" from Botond
"
This series adds a fuzzer-type unit test for range scans, which
generates a semi-random dataset and executes semi-random range scans
against it, validating the result.
This test aims to cover a wide range of corner cases with the help of
randomness. Data and queries against it are generated in such a way that
various corner cases and their combinations are likely to be covered.
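
The general shape of such a test can be sketched as below. This is a toy model under stated assumptions, not the multishard_mutation_query_test code: the dataset is a sorted list of integers, `paged_scan` stands in for the code under test, and a bisect-based slice serves as the reference result.

```python
import random
from bisect import bisect_left, bisect_right
from typing import List


def paged_scan(keys: List[int], start: int, end: int, page_size: int) -> List[int]:
    """Scan the inclusive range [start, end] one page at a time."""
    result = []
    pos = bisect_left(keys, start)
    while True:
        page = [k for k in keys[pos:pos + page_size] if k <= end]
        result.extend(page)
        if len(page) < page_size:  # a short page means the range is exhausted
            break
        pos += page_size
    return result


def fuzz_range_scans(seed: int, iterations: int) -> int:
    """Run semi-random scans against a semi-random dataset; return the count."""
    rng = random.Random(seed)  # a fixed seed keeps failures reproducible
    keys = sorted(rng.sample(range(1000), 100))
    for _ in range(iterations):
        # Bounds may fall outside the dataset to exercise the edges.
        start, end = sorted((rng.randrange(-10, 1010), rng.randrange(-10, 1010)))
        page_size = rng.randint(1, 20)
        expected = keys[bisect_left(keys, start):bisect_right(keys, end)]
        assert paged_scan(keys, start, end, page_size) == expected
    return iterations
```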

The infrastructure under range-scans has undergone massive changes in
the last year, growing in complexity and scope. The correctness of range
scans is critical for the correct functioning of any Scylla cluster, and
while the current unit tests served well in detecting any major problems
(mostly while developing), they are too simplistic and can only be
relied on to check the correctness of the basic functionality. This test
aims to extend coverage drastically, testing cases that the author of
the range-scan code or that of the existing unit tests didn't even think
existed, by relying on some randomness.

Fixes: #3954 (deprecates really)
"

* 'more-extensive-range-scan-unit-tests/v2' of https://github.com/denesb/scylla:
  tests/multishard_mutation_query_test: add fuzzy test
  tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan()
  tests/test_table: add advanced `create_test_table()` overload
  tests/test_table: make `create_test_table()` customizable
  query: add trim_clustering_row_ranges_to()
  tests/test_table: add keyspace and table name params
  tests/test_table: s/create_test_cf/create_test_table/
  tests: move create_test_cf() to tests/test_table.{hh,cc}
  tests/multishard_mutation_query_test: drop many partition test
  tests/multishard_mutation_query_test: drop range tombstone test
2019-02-27 17:26:53 +02:00
Avi Kivity
cc2f9841c4 Merge "Simplify -g and -gz checks in configure.py" from Rafael
* 'simplify-g-gz-check-v2' of https://github.com/espindola/scylla:
  Assume -gz is always available
  Assume -g is always available
2019-02-27 17:19:37 +02:00
Duarte Nunes
871790a340 Merge 'Hide virtual columns write time and ttl from the user' from Piotr
"
This miniseries hides virtual columns' writetime and ttl
from the user.

Tests: unit (dev)

Fixes #4288
"

* 'hide_virtual_columns_writetime_and_ttl_2' of https://github.com/psarna/scylla:
  tests: add test for hiding virtual columns from WRITETIME
  cql3: hide virtual columns from WRITETIME() and TTL()
  schema: add column_definition::is_hidden_from_cql
2019-02-27 14:36:08 +00:00
Calle Wilund
93602ecee3 compaction_manager: break out rewrite_sstables from cleanup
This allows additional control over behaviour, such as which tables are
rewritten and whether to actually lock ourselves out as a "cleanup".
2019-02-27 14:25:31 +00:00
Calle Wilund
7fb6bbe68c table: parameterize cleanup_sstables
To allow using the logic for one-sstable-at-a-time compaction (i.e.
rewrite) of sstables without the "normal" cleanup logic and partition
selection.
2019-02-27 14:25:31 +00:00
Piotr Sarna
09eb0429ce tests: add test for hiding virtual columns from WRITETIME
Visibility checks for virtual columns' WRITETIME and TTL
are added.
2019-02-27 15:08:16 +01:00
Piotr Sarna
af39787bf0 cql3: hide virtual columns from WRITETIME() and TTL()
Virtual columns should not be visible to the user,
so they are now hidden not only from directly selecting them,
but also via WRITETIME() and TTL() keywords.

Fixes #4288
2019-02-27 15:08:15 +01:00
Piotr Sarna
b0ab4c28cf schema: add column_definition::is_hidden_from_cql
Right now the only columns hidden from CQL are view virtual columns,
but in case of expanding this set, a helper function is provided.
2019-02-27 15:07:54 +01:00
Avi Kivity
d189e12438 tests: database_test: fix misaligned dma write
test_distributed_loader_with_pending_delete issues a dma write, but violates
the unwritten contract to temporary_buffer::aligned(), which requires that
size be a multiple of alignment. As a result the test fails spuriously.

Instead of playing with the alignment, rewrite that snippet to use the
easier-to-use make_file_output_stream().
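
For reference, the contract mentioned above boils down to the arithmetic below. This is a generic sketch with hypothetical helper names; the patch itself avoids the manual padding by switching to an output stream.

```python
def is_valid_dma_size(size: int, alignment: int) -> bool:
    """A buffer size satisfies the contract only if it is a multiple of the alignment."""
    return size % alignment == 0


def align_up(size: int, alignment: int) -> int:
    """Round `size` up to the next multiple of `alignment`."""
    return (size + alignment - 1) // alignment * alignment
```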

Introduced in 1ba88b709f.
Branches: master.
Message-Id: <20190226181850.3074-1-avi@scylladb.com>
2019-02-27 09:00:31 +01:00
Rafael Ávila de Espíndola
d9e0b47d53 Add a testcase for large rows
Tests: unit (release)

Fixes #3988.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:56:50 -08:00
Rafael Ávila de Espíndola
25f81cf3e3 Populate system.large_rows.
It now records large rows when they are first written to an sstable
and removes them when the sstable is deleted.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:56:42 -08:00
Rafael Ávila de Espíndola
66d8a0cf93 Create a system.large_rows table
This is analogous to the system.large_partitions table, but holds
individual rows, so it also needs the clustering key of the large
rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
da4c0da78a Extract a key_to_str helper
It will be used in more places in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
b7fd03d0fd Don't call record_large_rows if stopped
The implementations of large_data_handler should only be called if
large_data_handler hasn't been stopped yet.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
0c401f56f8 Add a delete_large_rows_entries method to large_data_handler
This will be responsible for removing large rows from
system.large_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
81a21ea425 db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
These functions will record into tables in a followup patch, so they
will need to return a future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
d4c001cba8 Rename maybe_delete_large_partitions_entry
It will also delete large rows, so rename it to
maybe_delete_large_data_entries.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
e9a13aff90 Rename log_large_row to record_large_rows
It will also record into a table in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
6fb7066755 Rename maybe_log_large_row to maybe_record_large_rows
It will also record into a table in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
a586ac209a Assume -gz is always available
It is available since clang 5 and gcc 5.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola
054078b6af Assume -g is always available
From the log it looks like these checks were added in 2014 because of
a broken clang.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola
87106ea5e2 Improve the build mode documentation
With this patch HACKING suggests using just ./configure.py and passing
the mode to ninja. It also expands on the characteristics of each mode
and mentions the dev mode.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190208020444.19145-1-espindola@scylladb.com>
2019-02-26 19:54:50 +02:00
Nadav Har'El
da54d0fc7d Materialized views: fix accidental zeroing of flow-control delay
The materialized-views flow control carefully calculates an amount of
microseconds to delay a client to slow it down to the desired rate -
but then a typo (std::min instead of std::max) causes this delay to
be zeroed, which in effect completely nullifies the flow control
algorithm.
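
The effect of such a typo can be illustrated with a minimal sketch. The
function names and the budget/elapsed parameters are hypothetical and are
not taken from the Scylla source; the sketch only assumes that the delay
is clamped against zero, and shows why std::min discards a positive delay
while std::max keeps it.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helpers: how long to delay a client, given a computed
// throttle budget and the time already spent serving the request.

// Buggy variant: std::min with zero can never return a positive delay.
inline std::chrono::microseconds delay_buggy(std::chrono::microseconds budget,
                                             std::chrono::microseconds elapsed) {
    return std::min(budget - elapsed, std::chrono::microseconds(0));
}

// Fixed variant: std::max with zero keeps a positive delay and only
// clamps a negative value (budget already used up) to zero.
inline std::chrono::microseconds delay_fixed(std::chrono::microseconds budget,
                                             std::chrono::microseconds elapsed) {
    return std::max(budget - elapsed, std::chrono::microseconds(0));
}
```

With a budget of 500us and 100us already elapsed, the buggy variant yields
no delay at all, while the fixed one yields 400us.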

Before this fix, experiments suggested that view flow control was
not having any effect and view backlog not bounded at all. After this
fix, we can see the flow control having its desired effect, and the
view backlog converging.

Fixes #4143.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190226161452.498-1-nyh@scylladb.com>
2019-02-26 18:22:18 +02:00
Tomasz Grabiec
1a63a313c8 Merge "repair: Rename names to be consistent with rpc verb" from Asias

Some of the function names were not updated after we changed the rpc verb
names. Rename them to make them consistent with the rpc verb names.

* seastar-dev.git asias/row_level_repair_rename_consistent_with_rpc_verb/v1:
  repair: Rename request_sync_boundary to get_sync_boundary
  repair: Rename request_full_row_hashes to get_full_row_hashes
  repair: Rename request_combined_row_hash to get_combined_row_hash
  repair: Rename request_row_diff to get_row_diff
  repair: Rename send_row_diff to put_row_diff
  repair: Update function name in docs/row_level_repair.md
2019-02-26 13:01:36 +01:00
Tomasz Grabiec
b06aac4fdb Merge "Fix temporary spurious schema version mismatch when nodes are restarted" from Asias
Fixes: #4148
Fixes: #4258

Tests: resharding_test.py:reshardingtest_nodes4_with_sizetieredcompactionstrategy.resharding_by_smp_increase_test

* seastar-dev.git asias/fix_schema_mismatch_when_nodes_restarts/v1:
  database: Add update_schema_version and announce_schema_version
  storage_service: Add application_state::SCHEMA when gossip starts
2019-02-26 12:55:52 +01:00
Avi Kivity
5f94bc902a transport: add option to disable shard-aware drivers
The shard-aware drivers can cause a huge number of connections to be created
when there are tens of thousands of clients. While normally the shard-aware
drivers are beneficial, in those cases they can consume too much memory.

Provide an option to disable shard awareness from the server (it is likely to
be easier to do this on the server than to reprovision those thousands of
clients).

Tests: manual test with wireshark.
Message-Id: <20190223173331.24424-1-avi@scylladb.com>
2019-02-26 12:44:11 +01:00
Asias He
459836079c storage_service: Add application_state::SCHEMA when gossip starts
In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw:

   4 nodes in the tests

   n1, n2, n3, n4 are started

   n1 is stopped

   n1 is changed to use different shard config

   n1 is restarted ( 2019-01-27 04:56:00,377 )

The backtrace happened on n2 right after n1 restarts:

   INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled
   INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled
   INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled
   WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed)
   INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status =
   Segmentation fault on shard 0.
   Backtrace:
   0x00000000041c0782
   0x00000000040d9a8c
   0x00000000040d9d35
   0x00000000040d9d83
   /lib64/libpthread.so.0+0x00000000000121af
   0x0000000001a8ac0e
   0x00000000040ba39e
   0x00000000040ba561
   0x000000000418c247
   0x0000000004265437
   0x000000000054766e
   /lib64/libc.so.6+0x0000000000020f29
   0x00000000005b17d9

The theory is: migration_manager::maybe_schedule_schema_pull is scheduled
while n1 still has the SCHEMA application_state. When n1 restarts, n2 gets
a new application state from n1 which does not have SCHEMA yet. When
migration_manager::maybe_schedule_schema_pull wakes up from its 60-second
sleep, n1 has a non-empty endpoint_state but an empty application_state
for SCHEMA. We dereference the nullptr application_state and abort.

In commit da80f27f44, we fixed the problem by
checking the pointer before dereference.

To prevent this from happening in the first place, we had better add
application_state::SCHEMA when gossip starts. This way, peer nodes
always see the application_state::SCHEMA when a node restarts.

Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test

Fixes #4148
Fixes #4258
2019-02-26 19:30:22 +08:00
Asias He
75edbe939d database: Add update_schema_version and announce_schema_version
Split the update_schema_version_and_announce() into
update_schema_version() and announce_schema_version(). This is going to
be used in storage_service::prepare_to_join() where we want to first
update the schema version, start gossip, announce the schema version.
2019-02-26 19:10:02 +08:00
Amnon Heiman
b8a838c66c node_exporter_install: Add a force install option
It is sometimes useful to force reinstallation of the node_exporter,
for example during an upgrade or if something is wrong with the current
installation.

This patch adds a --force command line option.

If --force is given to node_exporter_install, it will reinstall
node_exporter to the latest version, regardless of whether it was already
installed.

The symbolic link in /usr/bin/node_exporter will be set to the installed
version, so if there are other installed versions, they will remain.

Examples:
$ sudo ./dist/common/scripts/node_exporter_install
node_exporter already installed, you can use `--force` to force reinstallation

$ sudo ./dist/common/scripts/node_exporter_install --force
node_exporter already installed, reinstalling

Fixes #4201

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190225151120.21919-1-amnon@scylladb.com>
2019-02-25 20:16:58 +02:00
Pekka Enberg
ca288189a9 dist/ami: Support different products for the AMI
Let's add a PRODUCT variable, similar to build_rpm.sh, for example, so
that we can override package names for enterprise AMIs.

Message-Id: <20190225063319.19516-1-penberg@scylladb.com>
2019-02-25 11:17:44 +02:00
Asias He
3e615c3a15 repair: Update function name in docs/row_level_repair.md
The repair rpc request_* functions are renamed to get_*.
The send_row_diff is renamed to put_row_diff.
2019-02-25 15:13:39 +08:00
Asias He
62104902db repair: Rename send_row_diff to put_row_diff
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
6e4ea1b3c4 repair: Rename request_row_diff to get_row_diff
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
5b29fb30ac repair: Rename request_combined_row_hash to get_combined_row_hash
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
6f6c4878d5 repair: Rename request_full_row_hashes to get_full_row_hashes
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
02ddfa393e repair: Rename request_sync_boundary to get_sync_boundary
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Avi Kivity
a0b0db7915 Merge "Fix regression in perf_fast_forward results" from Paweł
"
After adcb3ec20c ("row_cache: read is not
single-partition if inter-partition forwarding is enabled") we have
noticed a regression in the results of some perf_fast_forward tests.
This was caused by those tests not disabling partition-level
fast-forwarding even though it was not needed and the commit in question
fixed an incorrect optimisation in such cases.

However, after solving that issue it has also become apparent that
mutation_reader_merger performs worse when the fast-forwarding is
disabled. This was attributed to logic responsible for dropping readers
as soon as they have reached the end of stream (which cannot be done if
fast-forwarding is enabled). This problem was mitigated with avoiding a
scan of the list and removing readers in small batches.

Fixes #4246.
Fixes #4254.

Tests: unit(dev)
"

* tag 'perf_fast_forward-fix-regression/v1' of https://github.com/pdziepak/scylla:
  mutation_reader_merger: drop unneeded readers in small batches
  mutation_reader_merger: track readers by iterators and not pointers
  tests/perf_fast_forward: disable partition-level fast-forwarding if not needed
2019-02-24 19:24:00 +02:00
Avi Kivity
e3c53ff3ff Update seastar submodule
* seastar 2313dec...ab54765 (10):
  > Fix C++-17-only uses of static_assert() with a single parameter.
  > README.md: fix out-of-date explanation of C++ dialect
  > net: fix tcp load balancer accounting leak while moving socket to other shard
  > Revert "deleter: prevent early memory free caused by deleter append."
  > deleter: prevent early memory free caused by deleter append.
  > Solve seastar.unit.thread failure in debug mode
  > Fix iovec-based read_dma: use make_readv_iocb instead of make_read_iocb
  > build: Fix the required version of `fmt`
  > app_template: fix use after move in app constructor
  > build: Rename CMake variable for private flags

Fixes #4269.
2019-02-24 16:06:23 +02:00
Avi Kivity
a3a7bea12f Merge "Clean up preprocessor definitions" from Jesse
* 'jhk/define_debug/v1' of https://github.com/hakuch/scylla:
  build: Remove the `DEBUG_SHARED_PTR` pp variable
  build: Prefer the Seastar version of a pp variable
2019-02-23 14:04:08 +02:00
Jesse Haber-Kucharsky
f9297895c1 auth: Change the log level for async. retries
The log message is benign, but it has caused some users of Scylla to
think that an error has occurred.

Fixes #3850

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <ba49c38266c0e77c3ed23cfca3c1a082b3060f17.1550777586.git.jhaberku@scylladb.com>
2019-02-23 14:03:16 +02:00
Tomasz Grabiec
3f698701c2 gdb: Drop incorrect throw of StopIteration
It is converted into a RuntimeError by python3:

  https://docs.python.org/3/library/exceptions.html#StopIteration

We should just return.

Message-Id: <20190221144321.18093-1-tgrabiec@scylladb.com>
2019-02-23 14:02:47 +02:00
Nadav Har'El
0eddf19432 main: add INFO log messages at start, initialization end, and end.
Scylla currently prints a welcome message when it starts, with the
Scylla version, but this is not printed to the regular log so in some
cases (e.g., Jenkins runs) we do not see it in the log. So let's add
a regular INFO-level log message with the same information.

Also, Scylla currently doesn't print any specific log message when it
normally completes its shutdown. In some cases, users may end up
wondering whether Scylla hung in the middle of the shutdown, or in
fact exited normally. Refs #4238. So in this patch we add a "shutdown
complete" message as the very last message in a successful shutdown.
We print Scylla's version also in the shutdown message, which may be
useful to see in the logs when shutting down one version of Scylla
and starting a different version.

Finally, we also add a log message when initialization is complete,
which may also be useful to understand whether Scylla hung during
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217140659.19512-1-nyh@scylladb.com>
2019-02-22 16:52:31 +01:00
Tomasz Grabiec
b90cb91468 gdb: Introduce 'scylla cache'
Prints contents of the row cache for each table on current shard.
Message-Id: <20190222144420.19677-1-tgrabiec@scylladb.com>
2019-02-22 14:58:58 +00:00
Paweł Dziepak
b524f96a74 mutation_reader_merger: drop unneeded readers in small batches
It was observed that destroying readers as soon as they are not needed
negatively affects performance of relatively small reads. We don't want
to keep them alive for too long either, since they may own a lot of
memory, but deferring the destruction slightly and removing them in
batches of 4 seems to solve the problem for the small reads.
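
A minimal sketch of deferred, batched destruction follows. The reader
struct, the batched_dropper class and its member names are hypothetical
stand-ins, not the merger's real types; only the batch size of 4 comes
from the description above.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for a reader that may own a lot of memory.
struct reader {
    int id;
};

class batched_dropper {
    static constexpr std::size_t batch_size = 4;
    std::list<reader> _readers;                          // owns the readers
    std::vector<std::list<reader>::iterator> _to_drop;   // finished, not yet destroyed
public:
    using handle = std::list<reader>::iterator;

    handle add(int id) {
        return _readers.insert(_readers.end(), reader{id});
    }

    // Record a finished reader; destroy readers only once a full batch
    // has accumulated, so small reads do not pay for each destruction.
    void mark_finished(handle h) {
        _to_drop.push_back(h);
        if (_to_drop.size() >= batch_size) {
            for (auto it : _to_drop) {
                _readers.erase(it);
            }
            _to_drop.clear();
        }
    }

    std::size_t alive() const { return _readers.size(); }
};
```

In this sketch at most three finished readers are kept alive at any time,
which bounds the extra memory held by the deferral.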
2019-02-22 14:43:38 +00:00
Paweł Dziepak
435e24f509 mutation_reader_merger: track readers by iterators and not pointers
mutation_reader_merger uses a std::list of mutation_reader to keep them
alive while the rest of the logic operates on non-owning pointers.

This means that when it is time to drop some of the readers that are
no longer needed, the merger needs to scan the list looking for them.
That's not ideal.

The solution is to make the logic use iterators to elements in that
list, which allows for O(1) removal of an unneeded reader. Iterators to
list are just pointers to the node and are not invalidated by unrelated
additions and removals.
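
The idea can be sketched as follows. The reader struct and the reader_set
class are hypothetical and much simpler than the real merger; the sketch
only relies on the std::list guarantees mentioned above.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for a mutation reader.
struct reader {
    std::string name;
};

class reader_set {
    std::list<reader> _readers;   // owns the readers
public:
    // The rest of the logic holds iterators instead of raw pointers.
    using handle = std::list<reader>::iterator;

    // Inserting never invalidates handles to other elements.
    handle add(std::string name) {
        return _readers.insert(_readers.end(), reader{std::move(name)});
    }

    // O(1): the iterator already identifies the node, so no scan is needed.
    void drop(handle h) {
        _readers.erase(h);
    }

    std::size_t size() const { return _readers.size(); }
};
```

Dropping one reader leaves handles to the other readers valid, which is
what makes it safe to keep them around in the merging logic.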
2019-02-22 14:33:10 +00:00
Paweł Dziepak
5d5777f85e tests/perf_fast_forward: disable partition-level fast-forwarding if not needed
Several of the test cases in perf_fast_forward do not need
partition-level fast-forwarding. However, since the defaults are used to
construct most of the readers the fast-forwarding is enabled regardless.

This showed an apparent regression in the perf_fast_forward results
after adcb3ec20c ("row_cache: read is not
single-partition if inter-partition forwarding is enabled") which
disabled an optimisation that was invalid when partition-level
fast-forwarding was requested.

This patch ensures that all single-partition reads that do not need
partition-level fast-forwarding keep it disabled.
2019-02-22 14:28:02 +00:00
Avi Kivity
fdefee696e Merge "sstables: mc: writer: Avoid large allocations for keeping promoted index entries" from Tomasz
"
Currently we keep the entries in a circular_buffer, which uses
a contiguous storage. For large partitions with many promoted index
entries this can cause OOM and sstable compaction failure.

A similar problem exists for the offset vector built
in write_promoted_index().

This change solves the problem by serializing promoted index entries
and the offset vector on the fly directly into a bytes_ostream, which
uses fragmented storage.

The serialization of the first entry is deferred, so that
serialization is avoided if there are fewer than 2
entries. A promoted index is not added for such partitions.

There still remains a problem that large-enough promoted index can cause OOM.

Refs #4217

Tests:
  - unit (release)
  - scylla-bench write

Branches: 3.0
"

* tag 'fix-large-alloc-for-promoted-index-v3' of github.com:tgrabiec/scylla:
  sstables: mc: writer: Avoid large allocations for maintaining promoted index
  sstables: mc: writer: Avoid double-serialization of the promoted index
2019-02-22 15:44:51 +02:00
Avi Kivity
177159da75 Merge "delete_atomically recovery" from Benny
"
The delete_atomically function is required to delete a set of sstables
atomically, i.e. either delete all of them or none of them. Deleting only
some sstables in the set might result in data resurrection if
sstable A, holding a tombstone that covers a mutation in sstable B, is
deleted while sstable B remains.

This patchset introduces a log file holding a list of SSTable TOC files
to delete for recovering a partial delete_atomically operation.

A new subdirectory called `pending_delete` is created in the sstables dir,
holding in-flight logs.

The logs are created with a temporary name (using a .tmp suffix)
and renamed to the final .log name once ready.  This indicates
the commit point for the operation.

When populating the column family, all files in the pending_delete
sub-directory are examined.  Temporary log files are just removed,
and committed log files are read, replayed, and deleted.

Fixes #4082

Tests: unit (dev), database_test (debug)
"

* 'projects/delete_atomically_recovery/v5' of https://github.com/bhalevy/scylla:
  tests: database_test: add test_distributed_loader_with_pending_delete
  distributed_loader: replay and cleanup pending_delete log files
  distributed_loader: populated_column_family: separate temp sst dirs cleanup phase
  docs: add sstables-directory-structure.md
  sstables: commit sstables to delete_atomically into a pending_delete log file
  sstables: delete_atomically: delete sstables in a thread
  sstables: component_basename: reuse with sstring component
  sstables: introduce component_basename
  database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions
  sstables: add delete_sstable_and_maybe_large_data_entries
  sstables: call remove_by_toc_name in dtor if marked_for_deletion
2019-02-22 15:37:17 +02:00
Benny Halevy
1ba88b709f tests: database_test: add test_distributed_loader_with_pending_delete
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
043673b236 distributed_loader: replay and cleanup pending_delete log files
Scan the table's pending_delete sub-directory if it exists.
Remove any temporary pending_delete log files to roll back the respective
delete_atomically operation.
Replay completed pending_delete log files to roll forward the respective
delete_atomically operation, and finally delete the log files.
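
The roll-back / roll-forward pass can be sketched with std::filesystem.
This is only an illustration: the function name is hypothetical, the log
is assumed to hold one TOC file name per line, and deleting an sstable is
reduced to removing its TOC file.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical recovery pass over <sstdir>/pending_delete.
inline void replay_pending_delete_sketch(const fs::path& sstdir) {
    const fs::path pending = sstdir / "pending_delete";
    if (!fs::exists(pending)) {
        return;
    }

    // Collect the entries first so the directory is not modified
    // while it is being iterated.
    std::vector<fs::path> logs;
    for (const auto& entry : fs::directory_iterator(pending)) {
        logs.push_back(entry.path());
    }

    for (const auto& log : logs) {
        if (log.extension() == ".tmp") {
            fs::remove(log);   // never committed: roll back
            continue;
        }
        if (log.extension() != ".log") {
            continue;          // unknown file: leave it alone
        }
        // Committed log: roll forward by deleting every listed file.
        std::ifstream in(log);
        std::string toc_name;
        while (std::getline(in, toc_name)) {
            if (!toc_name.empty()) {
                fs::remove(sstdir / toc_name);   // a missing file is not an error
            }
        }
        in.close();
        fs::remove(log);       // finally delete the log itself
    }
}
```

Because removing an already-missing file is harmless here, replaying the
same committed log twice gives the same result as replaying it once.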

Cleanup of temporary sstable directories and pending_delete
sstables is done in a preliminary scan phase when populating the column family
so that we won't attempt to load the to-be-deleted sstables.

Fixes #4082

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
ee3ad75492 distributed_loader: populated_column_family: separate temp sst dirs cleanup phase
In preparation for replaying pending_delete log files,
we would like to first remove any temporary sst dirs
and later handle pending_delete log files, and only
then populate the column family.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
f35e4cbac7 docs: add sstables-directory-structure.md
Refs #4184

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
024d0a6d49 sstables: commit sstables to delete_atomically into a pending_delete log file
To facilitate recovery of a delete_atomically operation that crashed
midway, add a replayable log file holding the committed sstables to delete.

It will be used by populate_column_family to replay the atomic deletion.

1. Write the toc names of sstables to be deleted into a temporary file.
2. Once flushed and closed, rename the temp log file into the final name
   and flush the pending_delete directory.
3. Delete the sstables.
4. Remove the pending_delete log file
   and flush the pending_delete directory.
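
The four steps above can be sketched with std::filesystem. The function
name and the log file names are hypothetical, and the flushes of the file
and of the pending_delete directory are omitted because std::filesystem
offers no portable way to express them.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Sketch only: "example.log" is a made-up name, and deleting an sstable
// is reduced to removing its TOC file.
inline void delete_atomically_sketch(const fs::path& sstdir,
                                     const std::vector<std::string>& toc_names) {
    const fs::path pending = sstdir / "pending_delete";
    fs::create_directories(pending);
    const fs::path tmp_log = pending / "example.log.tmp";
    const fs::path final_log = pending / "example.log";

    // 1. Write the TOC names of the sstables into a temporary file.
    {
        std::ofstream out(tmp_log);
        for (const auto& name : toc_names) {
            out << name << '\n';
        }
    }   // the stream is closed here

    // 2. Rename to the final name: this is the commit point.
    fs::rename(tmp_log, final_log);

    // 3. Delete the sstables.
    for (const auto& name : toc_names) {
        fs::remove(sstdir / name);
    }

    // 4. Remove the log file once every sstable is gone.
    fs::remove(final_log);
}
```

A crash before step 2 leaves only a .tmp file, so nothing was committed;
a crash after step 2 leaves a .log file that lists what still has to be
deleted.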

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:05:37 +02:00
Benny Halevy
70fda0eda0 sstables: delete_atomically: delete sstables in a thread
In preparation for implementing a pending_delete log file.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:05:37 +02:00
Benny Halevy
9ac04850a0 sstables: component_basename: reuse with sstring component
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:05:10 +02:00
Benny Halevy
a2a9750074 sstables: introduce component_basename
component_basename returns just the basename for the component filename
without the leading sstdir path.

To be used for delete_atomically's pending_delete log file.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Benny Halevy
13ffda5c31 database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions
1. We would like to be able to call maybe_delete_large_partitions_entry
from the sstable destructor path in the future so the sstable might go away
while the large data entries are being deleted.

2. We would like the caller to handle any exception on this path,
especially in the preparation part, before calling delete_large_partitions_entry().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Benny Halevy
ae29db8db6 sstables: add delete_sstable_and_maybe_large_data_entries
To be called by delete_atomically,
rather than passing a vector to delete_sstables.

This way, there is no need to build the `sstables_to_delete_atomically` vector.

To be replaced in the future with a sstable method once we
provide the large_data_handler upon construction.

Handle exceptions from remove_by_toc_name or maybe_delete_large_partitions_entry
by merely logging an error.  There is nothing else we can do at this point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Benny Halevy
387f14a874 sstables: call remove_by_toc_name in dtor if marked_for_deletion
No need to call delete_sstables, which works on a list of sstables
(by toc name).

Also, add FIXME comment about not calling
large_data_handler.maybe_delete_large_partitions_entry
on this path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Avi Kivity
34b254381f sstables: checksummed_file_writer: fix dma alignment
checksummed_file_writer does not override allocate_buffer(), so it inherits
data_sink_impl's default allocate_buffer, which does not care about alignment.
The buffer is then passed to the real file_data_sink_impl, and thence to the file
itself, which cannot complete the write since it is not properly aligned.

This doesn't fail in release mode, since the Seastar allocator will supply a
properly aligned buffer even if not asked to do so. The ASAN allocator usually
does supply an aligned buffer, but not always, which causes the test to fail.

Fix by forwarding the allocate_buffer() function to the underlying data_sink.

Fixes #4262.
Branches: branch-3.0
Message-Id: <20190221184115.6695-1-avi@scylladb.com>
2019-02-21 21:26:56 +01:00
Jesse Haber-Kucharsky
b7b50392ed build: Remove the DEBUG_SHARED_PTR pp variable
This definition is exported by Seastar as `SEASTAR_DEBUG_SHARED_PTR` and
no code in Scylla uses this definition either way.
2019-02-21 10:45:09 -05:00
Jesse Haber-Kucharsky
f4883a1aea build: Prefer the Seastar version of a pp variable
Seastar defines `SEASTAR_DEFAULT_ALLOCATOR`, and everywhere else in
Scylla we use this variable too.
2019-02-21 10:41:42 -05:00
Piotr Sarna
c743617236 cql3: unify max value for row limit and per-partition limit
Limits are stored as uint32_t everywhere, but in some places
int32_t was used, which created inconsistencies when comparing
the value to std::numeric_limits<Type>::max().
In order to solve inconsistencies, the types are unified to uint32_t,
and instead of explicitly calling numeric limit max,
an already existing constant value query::max_rows is utilized.
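
The kind of inconsistency described above can be shown with a small
sketch. max_rows_sketch is an assumed stand-in for the existing constant,
and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Assumed stand-in for the existing constant: the largest uint32_t value.
constexpr std::uint32_t max_rows_sketch = std::numeric_limits<std::uint32_t>::max();

// The inconsistency: the same "is this limit unbounded?" check gives a
// different answer depending on which integer type supplies max().
template <typename T>
constexpr bool is_unlimited_via(std::uint32_t limit) {
    return limit == static_cast<std::uint32_t>(std::numeric_limits<T>::max());
}

// The unified check: always compare against the single shared constant.
constexpr bool is_unlimited(std::uint32_t limit) {
    return limit == max_rows_sketch;
}
```

A limit stored as the largest uint32_t value is recognised as unbounded by
the uint32_t check but not by the int32_t one, which is why a single
constant of a single type avoids the mismatch.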

Fixes #4253

Message-Id: <4234712ff61a0391821acaba63455a34844e489b.1550683120.git.sarna@scylladb.com>
2019-02-21 13:56:02 +02:00
Tomasz Grabiec
ecff716f40 query-result-set: Give more context on failure
We've seen schema application failing with marshal_exception
here. That's not enough information to figure out what is the
problem. Knowing which table and column is affected would make
diagnosis much easier in certain cases.

This patch wraps errors in query::deserialization_error with more
information.

Example output:

  query::deserialization_error (failed on column system_schema.tables#bloom_filter_fp_chance \
  (version: c179c1d7-9503-3f66-a5b3-70e72af3392a, id: 0, index: 0, type: org.apache.cassandra.db.marshal.DoubleType):\
  seastar::internal::backtraced<marshal_exception> (marshaling error: read_simple - not enough bytes (expected 8, got 3)
Message-Id: <20190221113219.13018-1-tgrabiec@scylladb.com>
2019-02-21 11:35:27 +00:00
Nadav Har'El
f55bdea364 compaction manager: avoid spurious "asked to stop" message at the end of the log
This patch removes the log message about "compaction_manager - Asked to stop"
at the very end of Scylla runs. This log message is confusing because it
only has the "asked to stop" part, without a subsequent "stopped", and may
lead a user to incorrectly fear that the shutdown hung - when it in fact
finished just fine.

The database object holds a compaction_manager and stop()s it when the
database is stop()ed - and that is the very last thing our shutdown does.
However, much earlier, as the *first* shutdown operation (i.e., the last
at_exit() in main.cc), we already stop() the compaction manager.

The second stop() call does nothing, but unfortunately prints the log
message just before checking if it has anything to stop. So this patch
just moves the log message to after the check.
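
A minimal sketch of the reordering follows. The class, its _stopped flag
and the in-memory log are hypothetical; the exact messages are
assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in; messages are appended to a vector instead of a
// real logger so the behaviour is easy to observe.
class compaction_manager_sketch {
    bool _stopped = false;
    std::vector<std::string>& _log;
public:
    explicit compaction_manager_sketch(std::vector<std::string>& log) : _log(log) {}

    void stop() {
        if (_stopped) {
            return;                        // nothing to stop: stay silent
        }
        _log.push_back("Asked to stop");   // logged only after the check
        _stopped = true;
        _log.push_back("Stopped");
    }
};
```

Calling stop() a second time now leaves the log untouched, so the last
lines of the log no longer end with an unmatched "Asked to stop".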

Fixes #4238.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217142657.19963-1-nyh@scylladb.com>
2019-02-21 12:32:47 +01:00
Rafael Ávila de Espíndola
5a7bff36ca Simplify sstable::filename
No functionality change, but avoids a std::unordered_map.

Tests: unit (dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190221014630.15476-1-espindola@scylladb.com>
2019-02-21 12:40:01 +02:00
Avi Kivity
5520fc37ba Merge " Fix INSERT JSON with null values" from Piotr
"
Fixes #4256

This miniseries fixes a problem with inserting NULL values through
INSERT JSON interface.

Tests: unit (dev)
"

* 'fix_insert_json_with_null' of https://github.com/psarna/scylla:
  tests: add test for INSERT JSON with null values
  cql3: add missing value erasing to json parser
2019-02-21 12:36:09 +02:00
Piotr Sarna
4d211690f9 tests: add test for INSERT JSON with null values
2019-02-21 11:25:14 +01:00
Piotr Sarna
6618191e49 cql3: add missing value erasing to json parser
When inserting a null value through INSERT JSON, the column
was erroneously not removed from the 'not used' list of columns.

Fixes #4256
2019-02-21 11:23:44 +01:00
Tomasz Grabiec
8687666169 schema_tables: Add trace-level logging of schema mutations
Can be useful in diagnosing problems with application of schema
mutations.

do_merge_schema() is called on every change of schema of the local
node.

create_table_from_mutations() is called on schema merge when a table
was altered or created using mutations read from local schema tables
after applying the change, or when loading schema on boot.

Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>
2019-02-21 12:16:38 +02:00
Tomasz Grabiec
f65d1e649d schema_mutations: Make printable
Message-Id: <20190221093929.8929-1-tgrabiec@scylladb.com>
2019-02-21 12:16:32 +02:00
Avi Kivity
9adfd11374 Merge "Avoid including cryptopp headers" from Rafael
"
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

This patch series introduces a single .cc file that has to include
cryptopp headers.
"

* 'avoid-cryptopp-v3' of https://github.com/espindola/scylla:
  Avoid including cryptopp headers
  Delete dead code
2019-02-21 10:31:20 +02:00
Rafael Ávila de Espíndola
fd5ea2df5a Avoid including cryptopp headers
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793

To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.
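
The pimpl arrangement can be sketched in one block. The class name
hasher_sketch is hypothetical, and FNV-1a is used as a stand-in so that
the example does not depend on cryptopp; it is not the hash used by the
real code.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// --- What would live in the header: no third-party includes needed. ---
class hasher_sketch {
    struct impl;                    // defined only in the .cc part
    std::unique_ptr<impl> _impl;
public:
    hasher_sketch();
    ~hasher_sketch();               // out of line, because impl is incomplete here
    void update(const std::string& data);
    std::uint64_t finalize() const;
};

// --- What would live in the single .cc file. ---
// The real file would include the cryptopp headers at this point, so
// their pragma could only affect this one translation unit.
struct hasher_sketch::impl {
    std::uint64_t state = 14695981039346656037ull;   // FNV-1a offset basis
};

hasher_sketch::hasher_sketch() : _impl(std::make_unique<impl>()) {}
hasher_sketch::~hasher_sketch() = default;

void hasher_sketch::update(const std::string& data) {
    for (unsigned char c : data) {
        _impl->state ^= c;
        _impl->state *= 1099511628211ull;            // FNV-1a prime
    }
}

std::uint64_t hasher_sketch::finalize() const {
    return _impl->state;
}
```

Users of the header see only a forward declaration of impl, so whatever
the implementation includes stays out of their translation units.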

While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-20 08:03:46 -08:00
Rafael Ávila de Espíndola
a309f952d2 Delete dead code
This code would have to be refactored by the next patch. Since it is
commented out, just delete it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-20 08:03:46 -08:00
Duarte Nunes
4354479985 Merge 'Minimize generated view updates for unselected column updates' from Piotr
"
This series addresses the issue of redundant view updates,
generated for columns that were not selected for given materialized view.
Cases covered (quote):
* If a base row has a live row marker, then we can avoid generating
  view updates if only unselected columns change;
* If a base row has no live row marker, then we can avoid generating
  view updates if unselected columns are updated, unless they are newly
  created, deleted, or they have a TTL.

Additionally, this series includes caching selected columns and is_index information
to avoid unnecessary CPU cycles spent on recomputing these two.

Fixes #3819
"

* 'send_less_view_updates_if_not_necessary_4' of https://github.com/psarna/scylla:
  tests: add cases for view update generation optimizations
  view: minimize generated view updates for unselected columns
  view: cache is_index for view pointer
  index: make non-pointer overload of is_index function
  index: avoid copying when checking for is_index
2019-02-20 13:24:44 +00:00
Piotr Sarna
563456e3ac tests: add cases for view update generation optimizations
Test cases that cover avoiding generating view updates
when not necessary (e.g. when a column not selected by the view
is modified) are added.
2019-02-20 14:05:29 +01:00
Piotr Sarna
bd52e05ae2 view: minimize generated view updates for unselected columns
In some cases generating view updates for columns that were not
selected in the CREATE VIEW statement is redundant - this is the case
when the update will not influence row liveness in any way.
Currently, these cases are optimized out:
 - row marker is live and only unselected columns were updated;
 - row marker is not live and only unselected columns were updated,
   and in the process nothing was created or deleted and there was
   no TTL involved.
2019-02-20 14:05:27 +01:00
Piotr Sarna
dbe8491655 view: cache is_index for view pointer
It's detrimental to keep querying index manager whether a view
is backing a secondary index every time, so this value is cached
at construct time.
At the same time, this value is not simply passed to view_info
when being created in secondary index manager, in order to
decouple materialized view logic from secondary indexes as much as
possible (the sole existence of is_index() is bad enough).
2019-02-20 12:52:32 +01:00
Piotr Sarna
cb20fc2e4f index: make non-pointer overload of is_index function
Previous interface enforced passing a shared pointer, which
might result in calling unneeded shared_from_this().
2019-02-20 12:52:32 +01:00
Piotr Sarna
94db098d39 index: avoid copying when checking for is_index
Previously is_index implementation used list_indexes() helper function,
which copies data.
2019-02-20 12:52:32 +01:00
Tomasz Grabiec
a8c74bc7ab gdb: Print LSA/Cache/Memtable memory usage from "scylla memory"
Example output:

LSA:
  allocated:     181010432
  used:          177209344
  free:            3801088

Cache:
  total:          97255424
  used:           60700600
  free:           36554824

Memtables:
 total:            83755008
 Regular:
  real dirty:      79429632
  virt dirty:      35168426
 System:
  real dirty:        524288
  virt dirty:        466764
 Streaming:
  real dirty:             0
  virt dirty:             0

Message-Id: <1550598424-23428-1-git-send-email-tgrabiec@scylladb.com>
2019-02-20 12:53:53 +02:00
Tomasz Grabiec
dafe22dd83 lsa: Fix spurious abort with --enable-abort-on-lsa-bad-alloc
allocate_segment() can fail even though we're not out of memory, when
it's invoked inside an allocating section with the cache region
locked. That section may later succeed when retried after memory
reclamation.

We should ignore bad_alloc thrown inside allocating section body and
fail only when the whole section fails.
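
The intended behaviour can be sketched as a retry loop. The function name,
its parameters and the fixed attempt count are hypothetical; the real
allocating section is more involved.

```cpp
#include <functional>
#include <new>

// Hypothetical retry wrapper: `body` may throw std::bad_alloc, and
// `reclaim` frees memory between attempts.
inline bool run_allocating_section_sketch(const std::function<void()>& body,
                                          const std::function<void()>& reclaim,
                                          int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        try {
            body();
            return true;    // the section succeeded
        } catch (const std::bad_alloc&) {
            reclaim();      // a failure inside the body is not fatal yet
        }
    }
    return false;           // only now has the whole section failed
}
```

In this sketch an abort-on-bad-alloc policy would be applied only when the
function returns false, not at each std::bad_alloc thrown by the body.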

Fixes #2924

Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com>
2019-02-20 12:53:49 +02:00
Avi Kivity
84465c23c4 Merge "Add multi-column restrictions filtering" from Piotr
"
Fixes #3574

This series adds missing multi-column restrictions filtering to CQL.
The underlying infrastructure already allows checking multi-column
restrictions in a reasonable way, so this series consists of mostly
adding simple interfaces and parameters.
Also, unit test cases for multi-column restrictions are provided.

Tests: unit (dev)
"

* 'add_multi_column_restrictions_filtering_3' of https://github.com/psarna/scylla:
  tests: add multi-column filtering tests
  cql3: add multi-column restrictions filtering
  cql3: add specified is_satisfied_by to multi-column restriction
  cql3: rewrite raw loop in is_satisfied_by to boost::any_of
  cql3: fix is_satisfied_by for multi-column restrictions
  cql3: add missing include to multi-column restriction
2019-02-19 14:42:14 +02:00
Piotr Sarna
9432937816 tests: add multi-column filtering tests
Refs #3574
2019-02-19 13:24:25 +01:00
Piotr Sarna
4dc0b0672c cql3: add multi-column restrictions filtering
It's now possible to pass multi-column restrictions
to queries that require filtering.

Fixes #3574
2019-02-19 13:24:25 +01:00
Piotr Sarna
3db526ffe2 cql3: add specified is_satisfied_by to multi-column restriction
Multi-column restrictions need only schema, clustering key and query
options in order to decide if they are satisfied, so an overloaded
function that takes reduced number of parameters is added.
2019-02-19 13:24:25 +01:00
Piotr Sarna
16dbc917a4 cql3: rewrite raw loop in is_satisfied_by to boost::any_of 2019-02-19 13:24:12 +01:00
Piotr Sarna
0d675e4419 cql3: fix is_satisfied_by for multi-column restrictions
Multi-column restriction should be satisfied by the value
if any of the ranges contains it, not all of them.
Example: SELECT * FROM t WHERE (a,b) IN ((1,2),(1,3))
will operate on two singular ranges: [(1,2),(1,2)] and [(1,3),(1,3)].
It's sufficient for a value to be inside any of these two in order
to satisfy the restriction.
2019-02-19 13:10:58 +01:00
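The "any of the ranges" semantics from the commit above can be sketched in Python. This is an illustration only: the tuple-based `Range` type and this `is_satisfied_by` signature are assumptions, not the cql3 interface.

```python
from typing import Sequence, Tuple

Bound = Tuple[int, ...]
Range = Tuple[Bound, Bound]  # inclusive (start, end), compared lexicographically


def is_satisfied_by(value: Bound, ranges: Sequence[Range]) -> bool:
    """Return True if the value lies inside at least one of the ranges."""
    return any(start <= value <= end for start, end in ranges)


# (a, b) IN ((1, 2), (1, 3)) becomes two singular ranges.
in_ranges = [((1, 2), (1, 2)), ((1, 3), (1, 3))]
```

With an "all of" check, no value could satisfy both singular ranges at once, which is why the restriction has to use "any of".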
Avi Kivity
934ba7ccb2 Merge "tests: introduce test environment and cleanup sstable tests" from Benny
"
As part of implementing sstables manager and fixing issue related
to updating large_data_handler on all delete paths, we want to funnel
all sstable creations, loading, and deletions through a manager.

The patchset lays out test infrastructure to funnel these operations
through class sstables::test_env.

In the process, it cleans up numerous call sites in the existing
unit tests that evolved over time.

Refs #4198
Refs #4149

Tests: unit (dev)
"

* 'projects/test_env/v3' of https://github.com/bhalevy/scylla:
  tests: introduce sstables::test_env
  tests: perf_sstable: rename test_env
  tests: sstable_datafile_test: use useable_sst
  tests: sstable_test: add write_and_validate_sst helper
  tests: sstable_test: add test_using_reusable_sst helper
  tests: sstable_test: use reusable_sst where possible
  tests: sstable_test: add test_using_working_sst helper
  tests: sstable_3_x_test: make_test_sstable
  tests: run_sstable_resharding_test: use default parameters to make_sstable
  tests: sstables::test::make_test_sstable: reorder params
  tests: test_setup: do_with_test_directory is unused
  tests: move sstable_resharding_strategy_tests to sstable_reharding_test
  tests: move create_token_from_key helpers to test_services
  tests: move column_family_for_tests to test_services
  dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh
2019-02-19 11:26:42 +02:00
Piotr Sarna
4eecb57a0b cql3: add missing include to multi-column restriction 2019-02-19 10:24:31 +01:00
Tomasz Grabiec
9c6f897731 tools/toolchain/README: Add the "Troubleshooting" section
Message-Id: <1550567863-29404-1-git-send-email-tgrabiec@scylladb.com>
2019-02-19 11:21:02 +02:00
Tzach Livyatan
622361bf1a docs/docker-hub.md: Docker Compose cluster example
This adds a simple example of launching a 3-node Scylla cluster with
Docker Compose.

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
[ penberg: minor edits ]
Message-Id: <20190213081003.6401-1-tzach@scylladb.com>
2019-02-19 09:52:20 +02:00
Avi Kivity
e37e095432 build: allow configuring and testing multiple modes
Allow the --mode argument to ./configure.py and ./test.py to be repeated. This
is to allow continuous integration to configure only debug and release, leaving dev
to developers.
Message-Id: <20190214162736.16443-1-avi@scylladb.com>
2019-02-18 15:52:25 +00:00
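A repeatable `--mode` option, as described in the build commit above, can be expressed with `argparse`'s `append` action. The sketch below is not taken from configure.py or test.py: the `parse_modes` helper, the `MODES` list and the fall-back to all modes when the option is absent are assumptions made for illustration.

```python
import argparse

MODES = ["debug", "release", "dev"]  # assumed set of build modes


def parse_modes(argv):
    """Collect every --mode occurrence; fall back to all modes if none given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", action="append", choices=MODES, dest="modes")
    args = parser.parse_args(argv)
    return args.modes or MODES
```

Under these assumptions, a CI job would pass `--mode debug --mode release` and leave `dev` to developers.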
Tomasz Grabiec
08f4a3664e sstables: mc: writer: Avoid large allocations for maintaining promoted index
Currently, we keep the entries in a circular_buffer, which uses
a contiguous storage. For large partitions with many promoted index
entries this can cause OOM and sstable compaction failure.

A similar problem exists for the offset vector built
in write_promoted_index().

This change solves the problem by serializing promoted index entries
and the offset vector on the fly directly into a bytes_ostream, which
uses fragmented storage.

The serialization of the first entry is deferred, so that
serialization is avoided if there will be fewer than 2
entries. Promoted index is not added for such partitions.

There still remains a problem that large-enough promoted index can cause OOM.

Refs #4217
2019-02-18 16:03:07 +01:00
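The deferred serialization of the first promoted index entry (commit 08f4a3664e above) can be sketched as follows. This is a simplified Python model, not the sstable writer: `PromotedIndexWriter` is a hypothetical class, and `io.BytesIO` merely stands in for a fragmented buffer such as `bytes_ostream`.

```python
import io


class PromotedIndexWriter:
    """Serialize entries on the fly, holding back the first one."""

    def __init__(self):
        self._first = None        # deferred until a second entry arrives
        self._out = io.BytesIO()  # stand-in for a fragmented output buffer
        self._count = 0

    def add(self, entry: bytes) -> None:
        self._count += 1
        if self._count == 1:
            self._first = entry   # do not serialize yet
            return
        if self._count == 2:
            self._out.write(self._first)  # now the index is worth writing
            self._first = None
        self._out.write(entry)

    def finish(self) -> bytes:
        """Return the serialized index, or b"" when fewer than 2 entries exist."""
        return self._out.getvalue() if self._count >= 2 else b""
```

A partition with a single entry therefore never pays for serialization, matching the rule that no promoted index is added for such partitions.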
Tomasz Grabiec
4e093bc3a4 sstables: mc: writer: Avoid double-serialization of the promoted index 2019-02-18 16:03:07 +01:00
Duarte Nunes
6e83457b1b Merge 'Add PER PARTITION LIMIT' from Piotr
"
This series introduces PER PARTITION LIMIT to CQL.
Protocol and storage are already capable of applying per-partition limits,
so for nonpaged queries the changes are superficial - a variable is parsed
and passed down.
For paged queries and filtering the situation is a little bit more complicated
due to corner cases: results for one partition can be split over 2 or more pages,
filtering may drop rows, etc. To solve these, another variable is added to paging
state - the number of rows already returned from last served partition.
Note that "last" partition may be stretched over any number of pages, not just the
last one, which is a case especially when considering filtering.
As a result, per-partition-limiting queries are not eligible for page generator
optimization, because they may need to have their results locally filtered
for extraneous rows (e.g. when the next page asks for per-partition limit 5,
but we already received 4 rows from the last partition, so need just 1 more
from last partition key, but 5 from all next ones).

Tests: unit (dev)

Fixes #2202
"

* 'add_per_partition_limit_3' of https://github.com/psarna/scylla:
  tests: remove superficial ignore_order from filtering tests
  tests: add filtering with per partition key limit test
  tests: publish extract_paging_state and count_rows_fetched
  tests: fix order of parameters in with_rows_ignore_order
  cql3,grammar: add PER PARTITION LIMIT
  idl,service: add persistent last partition row count
  cql3: prevent page generator usage for per-partition limit
  cql3: add checking for previous partition count to filtering
  pager: add adjusting per-partition row limit
  cql3: obey per partition limit for filtering
  cql3: clean up unneeded limit variables
  cql3: obey per partition limit for select statement
  cql3: add get_per_partition_limit
  cql3: add per_partition_limit to CQL statement
2019-02-18 14:47:11 +00:00
Amnon Heiman
750b76b1de scylla-housekeeping: Read JSON as UTF-8 string for older Python 3 compatibility
Python 3.6 is the first version in which json.loads() accepts bytes,
which causes the following error on older Python 3 versions:

  Traceback (most recent call last):
    File "/usr/lib/scylla/scylla-housekeeping", line 175, in <module>
      args.func(args)
    File "/usr/lib/scylla/scylla-housekeeping", line 121, in check_version
      raise e
    File "/usr/lib/scylla/scylla-housekeeping", line 116, in check_version
      versions = get_json_from_url(version_url + params)
    File "/usr/lib/scylla/scylla-housekeeping", line 55, in get_json_from_url
      return json.loads(data)
    File "/usr/lib64/python3.4/json/__init__.py", line 312, in loads
      s.__class__.__name__))
  TypeError: the JSON object must be str, not 'bytes'

To support those older Python versions, convert the bytes read to utf8
strings before calling the json.loads().

Fixes #4239
Branches: master, 3.0

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190218112312.24455-1-amnon@scylladb.com>
2019-02-18 14:52:32 +02:00
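The fix described in the scylla-housekeeping commit above amounts to decoding before parsing. A minimal sketch, assuming a hypothetical helper named `parse_json_payload` (the script's own function is `get_json_from_url`, whose body is not shown here):

```python
import json


def parse_json_payload(data):
    """Decode bytes to str before parsing, so Python 3 < 3.6 also works."""
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    return json.loads(data)
```

Passing a `str` straight through keeps the helper usable for callers that have already decoded the response.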
Piotr Sarna
5ad5221ce1 tests: remove superficial ignore_order from filtering tests
Testing filtering with LIMIT used with_rows_ignore_order function,
while it's better to use simpler with_rows.
2019-02-18 11:06:44 +01:00
Piotr Sarna
5f67a501ec tests: add filtering with per partition key limit test 2019-02-18 11:06:44 +01:00
Piotr Sarna
a84e237177 tests: publish extract_paging_state and count_rows_fetched
These local lambda functions will be reused, so they are promoted
to static functions.
2019-02-18 11:06:44 +01:00
Piotr Sarna
824e9dc352 tests: fix order of parameters in with_rows_ignore_order
When reporting a failure, expected rows were mixed up with received
rows. Also, the message assumed it received more rows, but it can
as well be less, so now it reports a "different number" of rows.
2019-02-18 11:06:44 +01:00
Piotr Sarna
3e4f065847 cql3,grammar: add PER PARTITION LIMIT
Select statements now allow passing PER PARTITION LIMIT (?) directive
which will trim results for each partition accordingly.
2019-02-18 11:06:44 +01:00
Piotr Sarna
acf7bedad4 idl,service: add persistent last partition row count
In order to process paged queries with per-partition limits properly,
paging state needs to keep additional information: what was the row
count of last partition returned in previous run.
That's necessary because the end of previous page and the beginning
of current one might consist of rows with the same partition key
and we need to be able to trim the results to the number indicated
by per-partition limit.
2019-02-18 11:06:44 +01:00
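Why the paging state needs the row count of the last partition can be seen in a small Python model. It is a sketch under simplifying assumptions: rows arrive as `(partition_key, row)` pairs grouped by partition, and `trim_page`, `last_pkey` and `last_count` are hypothetical names rather than the fields added by the commit.

```python
def trim_page(rows, per_partition_limit, last_pkey=None, last_count=0):
    """Trim one page of (partition_key, row) pairs to the per-partition limit.

    last_pkey and last_count describe the last partition of the previous page.
    Returns the kept rows plus the updated (last_pkey, last_count) state.
    """
    kept = []
    current, count = last_pkey, last_count
    for pkey, row in rows:
        if pkey != current:
            current, count = pkey, 0  # a new partition starts counting from zero
        if count < per_partition_limit:
            kept.append((pkey, row))
            count += 1
    return kept, current, count
```

For example, with a limit of 5 and 4 rows of partition `a` already returned, the next page keeps only one more row of `a` but up to 5 rows of every following partition.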
Piotr Sarna
3a2b004f02 cql3: prevent page generator usage for per-partition limit
Paged queries that induce per-partition limits cannot use
page generator optimization, as sometimes the results need
to be filtered for extraneous rows on page breaks.
2019-02-18 11:06:44 +01:00
Piotr Sarna
1dadae212a cql3: add checking for previous partition count to filtering
Filtering now needs to take into account per partition limits as well,
and for that it's essential to be able to compare partition keys
and decide which rows should be dropped - if previous page(s) contained
rows with the same partition key, these need to be taken into
consideration too.
2019-02-18 11:06:43 +01:00
Piotr Sarna
82a3883575 pager: add adjusting per-partition row limit
For filtering pagers, per partition limit should be set
to page size every time a query is executed, because some rows
may potentially get dropped from results.
2019-02-18 10:55:52 +01:00
Piotr Sarna
b965c3778f cql3: obey per partition limit for filtering
Filtering queries now take into account the limit of rows
per single partition provided by the user.
2019-02-18 10:29:34 +01:00
Piotr Sarna
b3aa939cde cql3: clean up unneeded limit variables
Some places extracted a `limit` variable to be captured by lambdas,
but they were not used inside them.
2019-02-18 10:29:34 +01:00
Piotr Sarna
cfb6e9c79c cql3: obey per partition limit for select statement
Select statement now takes into account the limit of rows
per single partition provided by the user.
2019-02-18 10:29:34 +01:00
Piotr Sarna
41b466246e cql3: add get_per_partition_limit 2019-02-18 10:29:34 +01:00
Piotr Sarna
93786a9148 cql3: add per_partition_limit to CQL statement
Select statements can now accept per_partition_limit variable.
2019-02-18 10:29:34 +01:00
Gleb Natapov
b01a659014 storage_proxy: remove old Cassandra code
Part of the code is already implemented (counters and hinted-handoff).
Part of the code will probably never be (triggers). And the rest is
the code that estimates number of rows per range to determine query
parallelism, but we implemented exponential growth algorithms instead.

Message-Id: <20190214112226.GE19055@scylladb.com>
2019-02-18 10:34:55 +02:00
Avi Kivity
a1567b0997 Merge "replace get_restricted_ranges() function with generator interface" from Gleb
"
get_restricted_ranges() is inefficient since it calculates all
vnodes that cover the requested key ranges in advance, but callers often
use only the first one.  Replace the function with generator interface
that generates requested number of vnodes on demand.
"

* 'gleb/query_ranges_to_vnodes_generator' of github.com:scylladb/seastar-dev:
  storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator
  storage_proxy: remove old get_restricted_ranges() interface
  cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface
  tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface
  storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface
  storage_proxy: introduce new query_ranges_to_vnode_generator interface
2019-02-18 10:33:54 +02:00
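The benefit of a generator interface over computing all ranges up front can be shown with a short Python sketch. It is an illustration only: `ranges_to_vnodes` is a hypothetical function over integer tokens, and it ignores wrap-around and the real `query_ranges_to_vnodes_generator` API.

```python
from itertools import islice


def ranges_to_vnodes(start, end, ring_tokens):
    """Lazily yield (lo, hi] sub-ranges of (start, end] split at vnode tokens."""
    lo = start
    for token in sorted(ring_tokens):
        if lo < token < end:
            yield (lo, token)  # produced only when the caller asks for it
            lo = token
    yield (lo, end)


# A caller that needs only the first vnode range does not pay for the rest.
first = list(islice(ranges_to_vnodes(0, 100, [10, 50, 200]), 1))
```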
Avi Kivity
497367f9f7 Revert "build: switch debug mode from -O0 to -Og"
This reverts commit e988521b89. It triggers a bug
in gcc variable tracking, and there are reports it significantly slows down
compilation.
2019-02-17 18:32:28 +02:00
Nadav Har'El
05db7d8957 Materialized views: name the "batch_memory_max" constant
Give the constant 1024*1024 introduced in an earlier commit a name,
"batch_memory_max", and move it from view.cc to view_builder.hh.
It now resides next to the pre-existing constant that controlled how
many rows were read in each build step, "batch_size".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217100222.15673-1-nyh@scylladb.com>
2019-02-17 13:28:16 +00:00
Avi Kivity
7b411e30a9 Update seastar submodule
* seastar 11546d4...2313dec (6):
  > Deprecate thread_scheduling_group in favor of scheduling_group
  > Merge "Fixes for Doxygen documentation" from Jesse
  > future: optionally type-erase future::then() and future::then_wrapped
  > build: Allow deprecated declarations internally
  > rpc: fix insertion of server connections into server's container
  > rpc: split BOOST_REQUIRE with long conditions into multiple
2019-02-16 22:27:34 +02:00
Avi Kivity
03531c2443 fragmented_temporary_buffer: fix read_exactly() during premature end-of-stream
read_exactly(), when given a stream that does not contain the amount of data
requested, will loop endlessly, allocating more and more memory as it does, until
it fails with an exception (at which point it will release the memory).

Fix by returning an empty result, like input_stream::read_exactly() (which it
replaces). Add a test case that fails without a fix.

Affected callers are the native transport, commitlog replay, and internal
deserialization.

Fixes #4233.

Branches: master, branch-3.0
Tests: unit(dev)
Message-Id: <20190216150825.14841-1-avi@scylladb.com>
2019-02-16 17:06:19 +00:00
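The corrected behaviour of read_exactly() on a premature end-of-stream can be modelled in a few lines of Python. This is a sketch of the contract stated in the commit (return an empty result instead of looping), using a blocking file-like object; the real code operates on Seastar input streams and fragmented buffers.

```python
import io


def read_exactly(stream, n):
    """Read exactly n bytes; return b"" if the stream ends first."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # premature end-of-stream: stop instead of looping forever
            return b""
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Callers can then treat an empty result for a non-zero request as "stream ended early".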
Takuya ASADA
af988a5360 install-dependencies.sh: show description when 'yum-utils' package is installed on Fedora
When yum-utils is already installed on Fedora, 'yum install dnf-utils' causes
a conflict and fails.
We should show a descriptive message instead of just letting the dnf error
message through.

Fixes #4215

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190215221103.2379-1-syuu@scylladb.com>
2019-02-16 17:16:18 +02:00
Pekka Enberg
f7cf04ac4b tools/toolchain: Clean up DNF cache from Docker image
Make sure we call "dnf clean all" to remove the DNF cache, which reduces
Docker image size as per the following guidelines:

https://github.com/fedora-cloud/Fedora-Dockerfiles/wiki/Guidelines-for-Creating-Dockerfiles

A freshly built image is 250 MB smaller than the one on Docker Hub:

  <none>                                <none>               b8cafc8ff557        16 seconds ago      1.2 GB
  docker.io/scylladb/scylla-toolchain   fedora-29-20190212   d253d45a964c        3 days ago          1.45 GB

Message-Id: <20190215142322.12466-1-penberg@scylladb.com>
2019-02-16 17:12:10 +02:00
Botond Dénes
2125e99531 service/storage_service: fix pre-bootstrap wait for schema agreement
When bootstrapping, a node should wait to reach schema agreement
with its peers, before it can join the ring. This is to ensure it can
immediately accept writes. Failing to reach schema agreement before
joining is not fatal, as the node can pull unknown schemas on writes
on-demand. However, if such a schema contains references to UDFs, the
node will reject writes using it, due to #3760.

To ensure that schema agreement is reached before joining the ring,
`storage_service::join_token_ring()` has two checks. First it checks that
at least one peer was connected previously. For this it compares
`database::get_version()` with `database::empty_version`. The (implied)
assumption is that this will become something other than
`database::empty_version` only after having connected (and pulled
schemas from) at least one peer. This assumption doesn't hold anymore,
as we now set the version earlier in the boot process.
The second check verifies that we have the same schema version as all
known, live peers. This check assumes (since 3e415e2) that we have
already "met" all (or at least some) of our peers and if there is just
one known node (us) it concludes that this is a single-node cluster,
which automatically has schema agreement.
It's easy to see how these two checks will fail. The first fails to
ensure that we have met our peers, and the second wrongfully concludes
that we are a one-node cluster, and hence have schema agreement.

To fix this, modify the first check. Instead of relying on the presence
of a non-empty database version, supposedly implying that we already
talked to our peers, explicitly make sure that we have really talked to
*at least* one other node, before proceeding to the second check, which
will now do the correct thing, actually checking the schema versions.

Fixes: #4196

Branches: 3.0, 2.3

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <40b95b18e09c787e31ba6c5519fb64d68b4ca32e.1550228389.git.bdenes@scylladb.com>
2019-02-15 15:56:46 +01:00
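The two checks described in the commit above, in their fixed order, can be sketched in Python. This is a loose model, not `storage_service::join_token_ring()`: `schema_agreement_reached`, `talked_to_peers` and the dictionary of peer versions are hypothetical, and how a genuine single-node cluster is handled is left out.

```python
def schema_agreement_reached(local_version, live_peer_versions, talked_to_peers):
    """Return True only if we have met a peer and all live peers agree with us.

    live_peer_versions maps a peer name to its schema version.
    """
    # First check: we must really have talked to at least one other node,
    # rather than inferring it from a non-empty local schema version.
    if not talked_to_peers:
        return False
    # Second check: every known live peer has the same schema version.
    return all(v == local_version for v in live_peer_versions.values())
```

The first check is what prevents a node that knows only itself from being mistaken for a one-node cluster.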
Rafael Ávila de Espíndola
9cd14f2602 Don't write to system.large_partition during shutdown
The included testcase used to crash because during database::stop() we
would try to update system.large_partition.

There doesn't seem to be an order we can stop the existing services in
cql_test_env that makes this possible.

This patch then adds another step when shutting down a database: first
stop updating system.large_partition.

This means that during shutdown any memtable flush, compaction or
sstable deletion will not be reflected in system.large_partition. This
is hopefully not too bad since the data in the table is TTLed.

This seems to impact only tests, since main.cc calls _exit directly.

Tests: unit (release,debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190213194851.117692-1-espindola@scylladb.com>
2019-02-15 10:49:10 +01:00
Avi Kivity
e988521b89 build: switch debug mode from -O0 to -Og
-Og is advertised as debug-friendly optimization, both in compile time
and debug experience. It also cuts sstable_mutation_test run time in half:

Changing -O0 to -Og

Before:

real    16m49.441s
user    16m34.641s
sys    0m10.490s

After:

real    8m38.696s
user    8m26.073s
sys    0m10.575s

Message-Id: <20190214205521.19341-1-avi@scylladb.com>
2019-02-15 08:19:48 +02:00
Benny Halevy
c8f239ff2b tests: introduce sstables::test_env
In preparation to adding sstables_manager we want
to establish an environment for testing sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:37:41 +02:00
Benny Halevy
f9546b23b7 tests: perf_sstable: rename test_env
test_env is going to be a class in sstables namespace

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:15 +02:00
Benny Halevy
d6cfc1fae5 tests: sstable_datafile_test: use useable_sst
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
2a6b5a7622 tests: sstable_test: add write_and_validate_sst helper
In preparation for sstables::test_env

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
255f05e6c8 tests: sstable_test: add test_using_reusable_sst helper
In preparation for sstables::test_env

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
e11e29a1fc tests: sstable_test: use reusable_sst where possible
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
9d4989f2e8 tests: sstable_test: add test_using_working_sst helper
In preparation for sstables::test_env

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
55aac22b37 tests: sstable_3_x_test: make_test_sstable
Reused for making sstables for test cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
3bc1b8b9ff tests: run_sstable_resharding_test: use default parameters to make_sstable
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
b0f3f8d766 tests: sstables::test::make_test_sstable: reorder params
In preparation for providing a default large_data_handler in
a test-standard way.

buffer_size parameter reordered and now has a default value
same as make_sstable()'s.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:36 +02:00
Benny Halevy
bcd3f36a8a tests: test_setup: do_with_test_directory is unused
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:32 +02:00
Benny Halevy
b39c7bc4ae tests: move sstable_resharding_strategy_tests to sstable_reharding_test
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:32 +02:00
Benny Halevy
8801a6da1f tests: move create_token_from_key helpers to test_services
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:32 +02:00
Benny Halevy
815fd76c25 tests: move column_family_for_tests to test_services
And unify multiple copies of column_family_test_config().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:10 +02:00
Benny Halevy
b6ad61d2e5 dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh
So it can be used by other tests

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:16:52 +02:00
Nadav Har'El
43c42d608d materialized views: forbid using "virtual" columns in restrictions
For fixing issue #3362 we added in materialized views, in some cases,
"virtual columns" for columns which were not selected into the view.
Although these columns nominally exist in the view's schema, they must
not be visible to the user, and in commit
3f3a76aa8f we prevented a user from being
able to SELECT these columns.

In this patch we also prevent the user from being able to use these
column names (which shouldn't exist in the view) in WHERE restrictions.

Fixes #4216

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212162014.18778-1-nyh@scylladb.com>
2019-02-14 16:08:41 +02:00
Gleb Natapov
0b84b04f97 consistency_level: make it more const correct
Message-Id: <20190214122631.GF19055@scylladb.com>
2019-02-14 14:52:51 +02:00
Nadav Har'El
fec562ec8f Materialized views: limit size of row batching during bulk view building
The bulk materialized-view building processes (when adding a materialized
view to a table with existing data) currently reads the base table in
batches of 128 (view_builder::batch_size) rows. This is clearly better
than reading entire partitions (which may be huge), but still, 128 rows
may grow pretty large when we have rows with large strings or blobs,
and there is no real reason to buffer 128 rows when they are large.

Instead, when the rows we read so far exceed some size threshold (in this
patch, 1MB), we can operate on them immediately instead of waiting for
128.

As a side-effect, this patch also solves another bug: In the worst case, all
the base rows of one batch may be written into one output view partition,
in one mutation. But there is a hard limit on the size of one mutation
(commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the
batch size to exceed this limit. By not batching further after 1MB,
we avoid reaching this limit when individual rows do not reach it but
128 of them did.

Fixes #4213.

This patch also includes a unit test reproducing #4213, and demonstrating
that it is now solved.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190214093424.7172-1-nyh@scylladb.com>
2019-02-14 12:04:40 +02:00
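Closing a batch on either a row count or a byte threshold, as the view-building commit above describes, can be sketched like this. The 128-row and 1MB values come from the commit message; the `batches` helper itself, and measuring row size with `len()`, are assumptions for illustration.

```python
def batches(rows, max_rows=128, max_bytes=1024 * 1024):
    """Yield lists of rows, closing a batch at max_rows rows or max_bytes bytes."""
    batch, size = [], 0
    for row in rows:
        batch.append(row)
        size += len(row)
        if len(batch) >= max_rows or size >= max_bytes:
            yield batch  # process what we have instead of waiting for 128 rows
            batch, size = [], 0
    if batch:
        yield batch  # final, partially filled batch
```

With small rows the batches stay at 128 rows; with large rows they are cut short well before the 32MB mutation size limit mentioned in the commit.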
Calle Wilund
e70286a849 db/extensions: Allow schema extensions to turn themselves off
Fixes #4222

Iff an extension creation callback returns null (not exception)
we treat this as "I'm not needed" and simply ignore it.

Message-Id: <20190213124311.23238-1-calle@scylladb.com>
2019-02-13 14:50:51 +02:00
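The "return null to opt out" convention from the db/extensions commit above can be sketched in Python, with `None` playing the role of null. `create_extensions`, `factories` and `options` are hypothetical names; exceptions raised by a factory still propagate, since only a null result means "I'm not needed".

```python
def create_extensions(factories, options):
    """Build schema extensions, skipping any factory that returns None."""
    extensions = {}
    for name, factory in factories.items():
        ext = factory(options.get(name))
        if ext is None:  # the extension declared itself not needed
            continue
        extensions[name] = ext
    return extensions
```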
Jesse Haber-Kucharsky
74ac1deee1 build: Fix the build on Ubuntu
The way the `pkg-config` executable works on Fedora and Ubuntu is
different, since on Fedora `pkg-config` is provided by the `pkgconf`
project.

In the build directory of Seastar, `seastar.pc` and `seastar-testing.pc`
are generated. `seastar` is a requirement of `seastar-testing`.

When pkg-config is invoked like this:

    pkg-config --libs build/release/seastar-testing.pc

the version of `pkg-config` on Fedora resolves the reference to
`seastar` in `Requires` to the `seastar.pc` in the same directory.

However, the version of `pkg-config` on Ubuntu 18.04 does not:

    Package seastar was not found in the pkg-config search path.
    Perhaps you should add the directory containing `seastar.pc'
    to the PKG_CONFIG_PATH environment variable
    Package 'seastar', required by '/seastar-testing', not found

To address the divergent behavior, we set the `PKG_CONFIG_PATH` variable
to point to the directory containing `seastar.pc`. With this change, I
was able to configure Scylla on both Fedora 29 and Ubuntu 18.04.

Fixes #4218

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <d7164bde2790708425ac6761154d517404818ecd.1550002959.git.jhaberku@scylladb.com>
2019-02-13 13:33:50 +02:00
Avi Kivity
2915baeff4 Merge "Move truncation records to separate table" from Calle
"
Fixes #4083

Instead of sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. Makes query/update easier
and easier on the query memory.

The code also migrates any existing truncation
positions on startup and clears the old data.
"

* 'calle/truncation' of github.com:scylladb/seastar-dev:
  truncation_migration_test: Add rudimentary test
  system_keyspace: Add waitable for trunc. migration
  cql_test_env: Add separate config w. feature disable
  cql_test_env: Add truncation migration to init
  cql_assertions: Add null/non-null tests
  storage_service: Add features disabling for tests
  Add system.truncated documentation in docs
  commitlog_replay: Use dedicated table for truncation
  storage_service: Add "truncation_table" feature
2019-02-13 11:16:30 +02:00
Calle Wilund
2e320a456c truncation_migration_test: Add rudimentary test 2019-02-13 09:08:12 +00:00
Calle Wilund
4e657c0633 system_keyspace: Add waitable for trunc. migration
For tests. Hooray for separation of concern.
2019-02-13 09:08:12 +00:00
Calle Wilund
b253757b17 cql_test_env: Add separate config w. feature disable 2019-02-13 09:08:12 +00:00
Calle Wilund
859a1d8f36 cql_test_env: Add truncation migration to init 2019-02-13 09:08:12 +00:00
Calle Wilund
fbcbe529ad cql_assertions: Add null/non-null tests 2019-02-13 09:08:12 +00:00
Calle Wilund
64e8c6f31d storage_service: Add features disabling for tests 2019-02-13 09:08:12 +00:00
Calle Wilund
7d3867e153 Add system.truncated documentation in docs 2019-02-13 09:08:12 +00:00
Calle Wilund
12ebcf1ec7 commitlog_replay: Use dedicated table for truncation
Fixes #4083

Instead of sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. Makes query/update easier
and easier on the query memory.

The code also migrates any existing truncation
positions on startup and clears the old data.
2019-02-13 09:08:12 +00:00
Calle Wilund
ff5e541335 storage_service: Add "truncation_table" feature 2019-02-13 09:08:12 +00:00
Avi Kivity
a3de5581ce Update seastar submodule
* seastar 428f4ac...11546d4 (9):
  > reactor: Fix an infinite loop caused by the high resolution timer not being monitored
  > build: Add back `SEASTAR_SHUFFLE_TASK_QUEUE`
  > build: Unify dependency versions
  > future-util: optimize parallel_for_each() with single element
  > core/sharded.hh: fix doxygen for "Multicore" group
  > build: switch from travis-ci to circleci
  > perftune.py: fix irqbalance tuning on Ubuntu 18
  > build: Make the use of sanitizers transitive
  > net: ipv6: fix ipv6 detection and tests by binding to loopback
2019-02-12 18:42:07 +02:00
Avi Kivity
c7aa73af51 Merge "Automatically pause shard readers when not used" from Botond
"
Recently, there has been a series of incidents of the multishard
combining reader deadlocking, when the concurrency of reads was
severely restricted and there was no timeout for the read.
Several fixes have been merged (414b14a6b, 21b4b2b9a, ee193f1ab,
170fa382f) but eliminating all occurrences of deadlocks proved to be a
whack-a-mole game. After the last bug report I have decided that instead
of trying to plug new holes as we find them, I'll try to make holes
impossible to appear in the first place. To translate this into the
multishard reader, instead of sprinkling new `reader.pause()` calls all
over the place in the multishard reader to solve the newly found
deadlocks, make the pausing of readers fully automatic on the shard
reader level. Readers are now always kept in a paused state, except when
actually used. This eliminates the entire class of deadlock bugs.

This patch-set also aims at simplifying the multishard reader code, as
well as the code of the existing `lifecycle_policy` implementations.
This effort resulted in:
* mutation_reader.cc: no change in SLOC, although it now also contains
  logic that used to be duplicated in every `lifecycle_policy`
  implementation;
* multishard_mutation_query.cc: 150 SLOC removed;
* database.cc: 30 SLOC removed;
Also the code is now (hopefully) simpler, safer and has a clearer
structure.

Fixes #4050 (main issue)
Fixes #3970
Fixes #3998 (deprecates really)
"

* 'simplify-and-fix-multishard-reader/v3.1' of https://github.com/denesb/scylla:
  query_mutations_on_all_shards(): make states light-weight
  query_mutations_on_all_shards(): get rid of read_context::paused_reader
  query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state
  query_mutations_on_all_shards(): pause looked-up readers
  query_mutation_on_all_shards(): remove unnecessary indirection
  shard_reader: auto pause readers after being used
  reader_concurrency_semaphore::inactive_read_handle: fix handle semantics
  shard_reader: make reader creation sync
  shard_reader: use semaphore directly to pause-resume
  shard_reader: recreate_reader(): fix empty range case
  foreign_reader: rip out the now unused private API
  shard_reader: move away from foreign_reader
  multishard_combining_reader: make shard_reader a shared pointer
  multishard_combining_reader: move the shard reader definition out
  multishard_combining_reader: disentangle shard_reader
2019-02-12 16:22:52 +02:00
Botond Dénes
db106a32c8 query_mutations_on_all_shards(): make states light-weight
Previously the different states a reader can be in were all separate
structs, and were joined together by a variant. When this was designed
this made sense as states were numerous and quite different. By this
point however the number of states has been reduced to 4, with 3 of them
being almost the same. Thus it makes sense to merge these states into
single struct and keep track of the current state with an enum field.
This can theoretically increase the chances of mistakes, but in practice
I expect the opposite, due to the simpler (and shorter) code. Also, all the
important checks that verify that a reader is in the state expected by
the code are all left in place.
A byproduct of this change is that the amount of cross-shard writes is
greatly reduced. Whereas previously the whole state object had to be
rewritten on state change, now a single enum value has to be updated.
Cross-shard reads are reduced as well, to the read of a few foreign
pointers; all state-related data is now kept on the shard where the
associated reader lives.
2019-02-12 16:20:51 +02:00
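The state merge described in db106a32c8 can be pictured with a minimal sketch. The names `reader_state`, `reader_meta` and `transition`, as well as the four state names, are hypothetical and chosen only for illustration; the actual `read_context` code is richer. The sketch shows one struct whose current state is an enum field, with the "is the reader in the expected state" check kept in place.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical state names; the commit only says there are four states.
enum class reader_state : std::uint8_t { inexistent, used, saving, evicted };

// One struct for all states instead of one struct per state joined by a variant.
struct reader_meta {
    reader_state state = reader_state::inexistent;
    int buffered_fragments = 0;  // stands in for data shared by all states
};

// A state change rewrites a single enum value; the rest of the struct is untouched.
inline void transition(reader_meta& rm, reader_state expected, reader_state next) {
    if (rm.state != expected) {
        throw std::logic_error("reader is not in the expected state");
    }
    rm.state = next;
}
```

With a variant of per-state structs, the same transition would replace the whole object, which is the cross-shard write the commit message says is now avoided.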
Botond Dénes
65b2eb0939 query_mutations_on_all_shards(): get rid of read_context::paused_reader 2019-02-12 16:20:51 +02:00
Botond Dénes
ec44a4dbb1 query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state
These two states are now the same, with the artificial distinction that
all readers are promoted to ready_to_save state after the compaction
state and the combined buffer is dismantled. From a practical
perspective this distinction is meaningless so merge the two states into
a single `saving` state.
2019-02-12 16:20:51 +02:00
Botond Dénes
9a1bd24d82 query_mutations_on_all_shards(): pause looked-up readers
At the beginning of each page, all saved readers from the previous pages
(if any) are looked up, so they can be reused. Some of these saved
readers can end up not being used at all for the current page, in which
case they will needlessly sit on their permit for the duration of
filling the page. Avoid this by immediately pausing all looked-up
readers. This also allows a nice unifying of the reader saving logic, as
now *all* readers will be in a paused state when `save_reader()` is
called. Previously, looked-up, but not used readers were an exception to
this, requiring extra logic to handle both cases. This logic can now be
removed.
2019-02-12 16:20:51 +02:00
Botond Dénes
61b9ed7faf query_mutation_on_all_shards(): remove unnecessary indirection 2019-02-12 16:20:51 +02:00
Botond Dénes
9000626647 shard_reader: auto pause readers after being used
Previously it was the responsibility of the layer above (multishard
combining reader) to pause readers, which happened via an explicit
`pause()` call. This proved to be a very bad design as we kept finding
spots where the multishard reader should have paused the reader to avoid
potential deadlocks (due to starved reader concurrency semaphores), but
didn't.

This commit moves the responsibility of pausing the reader into the
shard reader. The reader is now kept in a paused state, except when it
is actually used (a `fill_buffer()` or `fast_forward_to()` call is
executing). This is fully transparent to the layer above.
As a side note, the shard reader now also hides when the reader is
created. This also used to be the responsibility of the multishard
reader, and although it caused no problems so far, it can be considered
a leak of internal details. The shard reader now automatically creates
the remote reader the first time it is used.

The code has been reorganized, such that there is now a clear separation
of responsibilities. The multishard combining reader handles the
combining of the output of the shard readers, as well as issuing
read-aheads. The shard reader handles read-ahead and creating the
remote reader when needed, as well as transferring the results of remote
reads to the "home" shard. The remote reader
(`shard_reader::remote_reader`, new in this patch) handles
pausing-resuming as well as recreating the reader after it was evicted.
Layers don't access each other's internals (like they used to).

After this commit, the reader passed to `destroy_reader()` will always
be in paused state.
2019-02-12 16:20:51 +02:00
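The "paused except while in use" rule from 9000626647 maps naturally onto a scope guard. The following is only a sketch under assumptions: `remote_reader::use()` and `resume_guard` are hypothetical names, and a plain flag stands in for registering with and unregistering from the reader concurrency semaphore.

```cpp
#include <cassert>
#include <stdexcept>

class remote_reader {
    bool _paused = true;  // paused whenever no operation is executing

    // Resumes the reader on construction and pauses it again on destruction.
    struct resume_guard {
        remote_reader& reader;
        explicit resume_guard(remote_reader& r) : reader(r) { reader._paused = false; }
        ~resume_guard() { reader._paused = true; }
    };

public:
    bool paused() const { return _paused; }

    // Runs `op` with the reader resumed; the reader is paused again on every
    // exit path, including when `op` throws.
    template <typename Op>
    auto use(Op&& op) {
        resume_guard guard(*this);
        return op(*this);
    }
};
```

Because the guard lives inside the reader, the layer above never has to remember to pause, which is the class of bug the commit message describes.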
Botond Dénes
ab5d717052 reader_concurrency_semaphore::inactive_read_handle: fix handle semantics
That is:
* make it move only;
* make moved-from handles null handles;
* add (public) default constructor, which constructs a null handle;
2019-02-12 16:20:51 +02:00
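The three handle properties listed in ab5d717052 can be sketched in a few lines. This is an illustration only: the integer `_id` and the `operator bool` are assumptions, not the members of the real `inactive_read_handle`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for reader_concurrency_semaphore::inactive_read_handle.
class inactive_read_handle {
    std::uint64_t _id = 0;  // 0 denotes the null handle

public:
    inactive_read_handle() = default;  // public default constructor: null handle
    explicit inactive_read_handle(std::uint64_t id) : _id(id) {}

    // Move-only: copying is disabled.
    inactive_read_handle(const inactive_read_handle&) = delete;
    inactive_read_handle& operator=(const inactive_read_handle&) = delete;

    // Moving leaves the source as a null handle.
    inactive_read_handle(inactive_read_handle&& o) noexcept
        : _id(std::exchange(o._id, 0)) {}
    inactive_read_handle& operator=(inactive_read_handle&& o) noexcept {
        _id = std::exchange(o._id, 0);
        return *this;
    }

    explicit operator bool() const noexcept { return _id != 0; }
};
```

Using `std::exchange` in both move operations is what guarantees that a moved-from handle tests as null rather than keeping a stale id.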
Botond Dénes
37006135dc shard_reader: make reader creation sync
Reader creation happens through the `reader_lifecycle_policy` interface,
which offers a `create_reader()` method. This method accepts a shard
parameter (among others) and returns a future. Its implementation is
expected to go to the specified shard and then return with the created
reader. The method is expected to be called from the shard where the
shard reader (and consequently the multishard reader) lives. This API,
while reasonable enough, has a serious flaw. It doesn't make batching
possible. For example, if the shard reader issues a call to the remote
shard to fill the remote reader's buffer, but finds that it was evicted
while paused, it has to come back to the local shard just to issue the
recreate call. This makes the code both convoluted and slow.
Change the reader creation API to be synchronous, that is, callable from
the shard where the reader has to be created, allowing for simple call
sites and batching.
This change requires that implementations of the lifecycle policy update
any per-reader data-structure they have from the remote shard. This is
not a problem however, as these data-structures are usually partitioned,
such that they can be accessed safely from a remote shard.
Another, very pleasant, consequence of this change is that now all
methods of the lifecycle interface are sync and thus calls to them
cannot overlap anymore.

This patch also removes the
`test_multishard_combining_reader_destroyed_with_pending_create_reader`
unit test, which is not useful anymore.

For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize
noise.
2019-02-12 16:20:51 +02:00
Botond Dénes
57d1f6589c shard_reader: use semaphore directly to pause-resume
The shard reader relies on the `reader_lifecycle_policy` for pausing and
resuming the remote reader. The lifecycle policy's API was designed to
be as general as possible, allowing for any implementation of
pause/resume. However, in practice, we have a single implementation of
pause/resume: registering/unregistering the reader with the relevant
`reader_concurrency_semaphore`, and we don't expect any new
implementations to appear in the future.
Thus, the generic API of the lifecycle policy is needlessly abstract,
making its implementations needlessly complex. We can instead make this
very concrete and have the lifecycle policy just return the relevant
semaphore, removing the need for every implementor of the lifecycle
policy interface to have a duplicate implementation of the very same
logic.

For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize noise.
2019-02-12 16:20:51 +02:00
Botond Dénes
fae5a2a8c8 shard_reader: recreate_reader(): fix empty range case
If the shard reader is created for a singular range (has a single
partition), and then it is evicted after reaching EOS, when recreated we
would have to create a reader that reads an empty range, since the only
partition the range has was already read. Since it is not possible to
create a reader with an empty range, we just didn't recreate the reader
in this case. This is incorrect however, as the code might still attempt
to read from this reader, if only due to a bug, and would trigger a
crash. The correct fix is to create an empty reader that will
immediately be at EOS.
2019-02-12 16:20:51 +02:00
Botond Dénes
cd807586f6 foreign_reader: rip out the now unused private API
Drop all the glue code, needed in the past so the shard reader can be
implemented on top of foreign reader. As the shard reader moved away
from foreign reader, this glue code is not needed anymore.
2019-02-12 16:20:51 +02:00
Botond Dénes
d80bc3c0a5 shard_reader: move away from foreign_reader
In the past, shard reader wrapped a foreign reader instance, adding
functionality required by the multishard reader on top. This has worked
well to a certain degree, but after the addition of pause-resume of
shard reader, the cooperation with foreign reader became more-and-more a
struggle. It has now gotten to a point, where it feels like shard reader
is fighting foreign reader as much as it reuses it. This manifested
itself in the ever growing amount of glue code, and hacks baked into
foreign reader (which is supposed to be of general use), specific to
the usage in the multishard reader.
It is time we don't force this code-reuse anymore and instead implement
all the required functionality in shard reader directly.
2019-02-12 16:20:51 +02:00
Botond Dénes
da0c01c68b multishard_combining_reader: make shard_reader a shared pointer
Some members of shard reader have to be accessed even after it is
destroyed. This is required by background work that might still be
pending when the reader is destroyed. This was solved by creating a
special `state` struct, which contained all the members of the shard
readers that had to be accessed even after it was destroyed. This state
struct was managed through a shared pointer, that each continuation that
was expected to outlive the reader, held a copy of. This however created
a minefield, where each line of the code had to be carefully audited to
access only fields that will be guaranteed to remain valid.
Fix this mess by making the whole class a shared pointer, with
`enable_shared_from_this`. Now each continuation just has to make sure
to keep `this` alive and code can now access all members freely (well,
almost).
2019-02-12 16:20:51 +02:00
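The lifetime pattern adopted in da0c01c68b can be sketched with the standard library. Seastar continuations are replaced here by a plain vector of `std::function` objects, and the member names are hypothetical; the point is only that background work captures `shared_from_this()` and so keeps the whole object alive.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

class shard_reader : public std::enable_shared_from_this<shard_reader> {
    int _buffered = 0;

public:
    // Queues background work that may run after the owner dropped its reference.
    // Capturing `self` keeps every member valid, not just a hand-picked subset.
    void read_ahead(std::vector<std::function<void()>>& background) {
        background.push_back([self = shared_from_this()] { ++self->_buffered; });
    }

    int buffered() const { return _buffered; }
};
```

Note that `shared_from_this()` is only valid on objects already owned by a `std::shared_ptr`, so such a reader has to be created with `std::make_shared` (or equivalent).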
Botond Dénes
f1c3421eb4 multishard_combining_reader: move the shard reader definition out
Shard reader started its life as a very thin layer above foreign reader,
with just some convenience methods added. As usual, by now it has grown
into a hairy monster, its class definition out-growing even that of the
multishard reader itself. It is time shard reader is moved into the
top-level scope, improving the readability of both classes.
2019-02-12 16:20:51 +02:00
Botond Dénes
7114b59309 multishard_combining_reader: disentangle shard_reader
Currently shard reader has a reference to the owning multishard reader
and it freely accesses its members. This resulted in a mess, where it's
not clear what exactly shard reader depends on. Disentangle this mess,
by making the shard reader self-sufficient, passing all it depends on
into its constructor.
2019-02-12 16:20:51 +02:00
Nadav Har'El
85e5791710 tests/view_schema_test: fix flakiness caused by missing eventually()
All tests that involve writing to a base table and then reading from the
view table must use the eventually() function to account for the fact that
the view update is asynchronous, and may be visible only some time after
writing the base table. Forgetting an eventually() can cause the test
to become flaky and sometimes fail because the expected data is not *yet*
in the view. Botond noticed these failures in practice in two subtests
(test_partition_key_filtering_with_slice and
test_clustering_key_in_restrictions).

This patch fixes both tests, and I also reviewed the entire source file
view_schema_test.cc and found additional places missing an eventually()
(and also places that unnecessarily used eventually() to read from the
base table), and fixed those as well.

Fixes #4212

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212121140.14679-1-nyh@scylladb.com>
2019-02-12 16:10:30 +02:00
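The role of eventually() in 85e5791710 can be illustrated with a simplified, blocking retry helper. This is not the helper used in Scylla's tests: the signature, the `max_attempts` parameter and the doubling delay are assumptions made for the sketch.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Retries `check` until it stops throwing, or rethrows the last failure
// once `max_attempts` attempts have been made.
template <typename Func>
void eventually(Func&& check, int max_attempts = 10) {
    auto delay = std::chrono::milliseconds(1);
    for (int attempt = 1;; ++attempt) {
        try {
            check();
            return;  // the expected data is visible
        } catch (...) {
            if (attempt == max_attempts) {
                throw;  // give up and surface the failure
            }
        }
        std::this_thread::sleep_for(delay);
        delay *= 2;  // back off before looking at the view again
    }
}
```

A read from the view table goes inside the callable, while a read from the base table, which is updated synchronously, does not need the wrapper.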
Paweł Dziepak
eb03cf00f5 sstable: write_components: drop default for encoding stats
There is no value in having a default value for encoding_stats parameter
of write_components(). If anything it weakens the tests by encouraging
not using the real encoding stats which is not what the actual sstable
write path in Scylla does.

This patch removes the default value and makes most of the tests provide
real encoding statistics. The ones that do not are those that have no
easy way of obtaining those (and those stats are not that important for
the test itself) or there is a reason for not using those
(sstable_3_x_test::test_sstable_write_large_row uses row size thresholds
based on size with default-constructed encoding_stats).

Message-Id: <20190212124356.14878-1-pdziepak@scylladb.com>
2019-02-12 16:08:24 +02:00
Calle Wilund
4a52ed7884 commitlog: Accept recycled (not yet re-used) segments in replay
Refs #4085

Changes commitlog descriptor to both accept "Recycled-Commitlog..."
file names, and preserve said name in the descriptor.

This ensures we pick up the not-yet-used recycled segments left
from a crash for replay. The replay in turn will simply ignore
the recycled files, and post actual replay they will be deleted
as needed.

Message-Id: <20190129123311.16050-1-calle@scylladb.com>
2019-02-12 12:23:55 +02:00
Nadav Har'El
93baa334ea create-relocatable-package.py: speed up slow compression
create-relocatable-package.py currently (refs #4194) builds a compressed
tar file, but does so using a painfully slow Python implementation of gzip,
which is a problem considering the huge size (around 2 gigabytes) of Scylla's
executable. On my machine, running it for a release build of Scylla takes a
whopping 6 minutes.

Just replacing the Python compression with a pipe to an external "gzip"
process speeds up the run to just 2 minutes. But gzip is still not optimal,
using only one thread even when on a many-core machine. If we switch to
"pigz", a parallel implementation of "gzip", all cores are used and on
my machine the compression speeds up to just 23 seconds - that's 15
times faster than before this patch.

So this patch has create-relocatable-package.py use an external pigz process.
"pigz" is now required on the build system (if you want to create packages),
so is added to install-dependencies.sh.

[avi: update toolchain]
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212090333.3970-1-nyh@scylladb.com>
2019-02-12 11:19:04 +02:00
Nadav Har'El
1cf1af1502 scylla_setup: fix non-interactive behavior
In commit ec66dd6562, in non-interactive
runs of scylla_setup all options were unintentionally set to "false",
regardless of the options passed on the scylla_setup command line. This
can lead to all sorts of wrong behaviors, and in particular one test
setup assumed it was enabling the Scylla service (which was previously
the default) but after this commit, it no longer did.

This patch restores the previous behavior: Non-interactive invocations
of scylla_setup adhere to the defaults and the command-line options,
rather than blindly choosing "false".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190211214105.32613-1-nyh@scylladb.com>
2019-02-12 10:50:00 +02:00
Gleb Natapov
26e5700819 storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator
Do not precalculate too many ranges in advance; it requires a large
allocation and usually means that a consumer of the interface is going
to do too much work in parallel.

Fixes: #3767
2019-02-12 10:45:25 +02:00
Avi Kivity
da9628c6dc auth: password_authenticator: protect against NULL salted_hash
In case salted_hash was NULL, we'd access uninitialized memory when dereferencing
the optional in get_as<>().

Protect against that by using get_opt() and failing authentication if we see a NULL.

Fixes #4168.

Tests: unit (release)
Branches: 3.0, 2.3
Message-Id: <20190211173820.8053-1-avi@scylladb.com>
2019-02-11 18:54:03 +01:00
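The guard added in da9628c6dc amounts to checking an optional before dereferencing it. In this sketch `row_value`, `hash_matches` and `authenticate` are hypothetical stand-ins, and the hash comparison is a placeholder rather than real password hashing.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Models a nullable column value, as an optional-returning accessor would.
using row_value = std::optional<std::string>;

// Placeholder check; a real implementation would use a proper password hash.
inline bool hash_matches(const std::string& password, const std::string& salted_hash) {
    return salted_hash == "hash(" + password + ")";
}

inline bool authenticate(const std::string& password, const row_value& salted_hash) {
    if (!salted_hash) {
        return false;  // NULL column: fail authentication instead of dereferencing
    }
    return hash_matches(password, *salted_hash);
}
```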
Botond Dénes
c9e00172e9 tests/multishard_mutation_query_test: add fuzzy test
"Fuzzy test" executes semi-random range-scans against semi-random data.
By doing so we hope to achieve a coverage of edge cases that would
be very hard to achieve by "conventional" unit tests.

Fuzzy test generates a table with a population of partitions that are
a combination of all of:
* Size of static row: none, tiny, small and large;
* Number of clustering rows: none, few, several, and lots;
* Size of clustering rows: tiny, small and large;
* Number of range deletions: few, several and lots;
* Number of rows covered by a range deletion: few, several;

As well as a partition with an extremely large static row, an extreme
number of rows and rows of extreme size.

To avoid writing an excess amount of data, the size limit of pages is
reduced to 1KB (from the default 1MB) and the row count limit of pages
is reduced to 1000 (from the default of 10000).

The test then executes range-scans against this population. For each
range scan, a random partition range is generated, that is guaranteed to
contain at least one partition (to avoid executing mostly empty scans),
as well as a random partition-slice (row ranges). The data returned by
the query is then thoroughly validated against the population
description returned by the `create_test_table()` function.

As this test has a large degree of randomness to it, covering a
quasi-infinite input-space, it can (theoretically) fail at any time.
As such I took great care in making such failures deterministically
reproducible, based on a single random seed, which is logged to the
output in case of a failure, together with instructions on how to repeat
the particular run. The test also uses extensive logging to aid
investigations. For logging, seastar's logging mechanism is used, as
`BOOST_TEST_MESSAGE` produces unintelligible output when running with
-c > 1. Log messages are carefully tagged, so that the test produces the
least amount of noise by default, while being very explicit about what's
happening when run with `debug` or especially `trace` log levels.
2019-02-11 17:14:47 +02:00
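The reproducibility property described for the fuzzy test in c9e00172e9 rests on deriving every random choice from one seed. A minimal sketch, assuming a hypothetical helper `generate_row_counts` and an arbitrary 0..1000 range:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Every random decision comes from an engine seeded with `seed`, so logging
// the seed on failure is enough to replay the exact same population.
inline std::vector<int> generate_row_counts(std::uint32_t seed, std::size_t partitions) {
    std::mt19937 engine(seed);
    std::uniform_int_distribution<int> rows(0, 1000);
    std::vector<int> counts;
    counts.reserve(partitions);
    for (std::size_t i = 0; i < partitions; ++i) {
        counts.push_back(rows(engine));
    }
    return counts;
}
```

One caveat: the values produced by `std::uniform_int_distribution` are implementation-defined, so a seed reproduces a run only with the same standard library.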
Botond Dénes
4b2cac6f40 tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan()
The existing `read_all_partitions_with_paged_scan()` implementation was
tailored to the existing, simplistic test cases. Refactor it so that it
can be used in much more complex test cases:
* Allow specifying the page's `max_size`.
* Allow specifying the query range.
* Allow specifying the partition slice's ck ranges.
* Fix minor bugs in the paging logic.

To avoid churn, a backward-compatible overload is added, that retains
the old parameter set.
2019-02-11 17:14:47 +02:00
Botond Dénes
542301fdc9 tests/test_table: add advanced create_test_table() overload
This overload provides a middle ground between the very generic, but
hard-to-use "expert version" and the very restrictive and simplistic
"beginner version". It allows the user to declaratively describe the
to-be-generated population in terms of a bunch of
`std::uniform_int_distribution` objects (e.g. number of rows, size of
rows, etc.).
This allows for generating a random population in a controlled way, with
a minimum amount of boiler-plate code on the user side.
2019-02-11 17:14:47 +02:00
Botond Dénes
7e1c1c2e8c tests/test_table: make create_test_table() customizable
Allow the user to specify the population of the table in a generic and
flexible way. This patch essentially rewrites the `create_test_table()`
implementation from scratch, so that it populates the table using the
partition generator passed in by the user. Backward compatibility is
kept, by providing a `create_test_table()` overload that is identical to
the previous API. This overload is now implemented on top of the generic
overload.
2019-02-11 17:14:47 +02:00
Gleb Natapov
ecc5230de5 storage_proxy: remove old get_restricted_ranges() interface
It is not used any more.
2019-02-11 14:45:43 +02:00
Gleb Natapov
0cd9bbb71d cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface 2019-02-11 14:45:43 +02:00
Gleb Natapov
e6208b1cde tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface 2019-02-11 14:45:43 +02:00
Gleb Natapov
2735a85c8e storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface 2019-02-11 14:45:43 +02:00
Gleb Natapov
692a0bd000 storage_proxy: introduce new query_ranges_to_vnode_generator interface
get_restricted_ranges() function gets query provided key ranges
and divides them on vnode boundaries. It iterates over all ranges and
calculates all vnodes, but all its users are usually interested in only
one vnode since most likely it will be enough to populate a page. If it
is not enough they will ask for more. This patch replaces the function
with a new interface that allows generating vnode ranges
on demand instead of precalculating all of them.
2019-02-11 14:45:43 +02:00
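The on-demand splitting introduced in 692a0bd000 can be sketched with integer tokens. Everything here is a simplification under stated assumptions: ranges are half-open `[first, second)` integer pairs, vnode boundaries are a sorted vector, and the class name and call operator are illustrative rather than the real `query_ranges_to_vnodes_generator` API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using token_range = std::pair<int, int>;  // half-open [first, second)

class ranges_to_vnodes_generator {
    std::vector<token_range> _ranges;  // query ranges still to be split
    std::vector<int> _boundaries;      // vnode boundary tokens, sorted ascending
    std::size_t _i = 0;                // index of the range being split

public:
    ranges_to_vnodes_generator(std::vector<token_range> ranges, std::vector<int> boundaries)
        : _ranges(std::move(ranges)), _boundaries(std::move(boundaries)) {}

    bool empty() const { return _i == _ranges.size(); }

    // Produces at most n vnode-aligned sub-ranges and does no work beyond that.
    std::vector<token_range> operator()(std::size_t n) {
        std::vector<token_range> out;
        while (out.size() < n && !empty()) {
            token_range& r = _ranges[_i];
            int cut = r.second;
            for (int b : _boundaries) {
                if (b > r.first && b < r.second) {
                    cut = b;  // first boundary strictly inside the range
                    break;
                }
            }
            out.emplace_back(r.first, cut);
            if (cut == r.second) {
                ++_i;  // range fully consumed
            } else {
                r.first = cut;  // keep the remainder for a later call
            }
        }
        return out;
    }
};
```

A caller that fills a page from the first vnode never pays for splitting the rest, and capping `n` also bounds the allocation, which is the concern addressed by the follow-up commit 26e5700819.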
Avi Kivity
cb51fcab9d README: improve dbuild instructions
Add a quick start, document more options, and link from the main README.
Message-Id: <20190210154606.21739-1-avi@scylladb.com>
2019-02-11 09:25:25 +01:00
Avi Kivity
2724a66a12 docker: don't send .git during "docker build"
It's huge and useless during "docker build" operations.
Message-Id: <20190208161848.21125-1-avi@scylladb.com>
2019-02-11 09:17:14 +01:00
Glauber Costa
e0bfd1c40a allow Cassandra SSTables with counters to be imported if they are new enough
Right now Cassandra SSTables with counters cannot be imported into
Scylla.  The reason for that is that Cassandra changed their counter
representation in their 2.1 version and kept transparently supporting
both representations.  We do not support their old representation, nor
there is a sane way to figure out by looking at the data which one is in
use.

For safety, we had made the decision long ago to not import any
tables with counters: if a counter was generated in older Cassandra, we
would misrepresent them.

In this patch, I propose we offer a non-default way to import SSTables
with counters: we can gate it with a flag, and trust that the user knows
what they are doing when flipping it (at their own peril). Cassandra 2.1
is by now pretty old. Many users can safely say they've never used
anything older.

While there are tools like sstableloader that can be used to import
those counters, there are often situations in which directly importing
SSTables is either better, faster, or worse: the only option left.  I
argue that having a flag that allows us to import them when we are sure
it is safe is better than having no option at all.

With this patch I was able to successfully import Cassandra tables with
counters that were generated in Cassandra 2.1, reshard and compact their
SSTables, and read the data back to get the same values in Scylla as in
Cassandra.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190210154028.12472-1-glauber@scylladb.com>
2019-02-10 17:50:48 +02:00
Glauber Costa
61ea54eff6 tools: toolchain: dbuild: use host networking
This is convenient to test scylla directly by invoking build/dev/scylla.
This needs to be done under docker because the shared objects scylla
looks for may not exist in the host system.

During quick development we may not want to go through the trouble of
packaging relocatable scylla every time to test changes.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190209021033.8400-1-glauber@scylladb.com>
2019-02-10 12:16:47 +02:00
Duarte Nunes
d2d885fb93 Merge 'Fix misdetection of remote counter shards' from Paweł
"
The code reading counter cells from sstables verifies that there are no
unsupported local or remote shards. The latter are detected by checking
if all shards are present in the counter cell header (only remote shards
do not have entries there). However, the logic responsible for doing
that was incorrectly computing the total number of counter shards in a
cell if the header was larger than a single counter shard. This resulted
in incorrect complaints that remote shards are present.

Fixes #4206

Tests: unit(release)
"

* tag 'counter-header-fix/v1' of https://github.com/pdziepak/scylla:
  tests/sstables: test counter cell header with large number of shards
  sstables/counters: fix remote counter shard detection
2019-02-10 12:16:31 +02:00
Paweł Dziepak
4eeb8eeed5 tests/sstables: test counter cell header with large number of shards
The logic responsible for reading counters from sstables was getting
confused by large headers. The size of the header depends directly on
the number of shards. This test checks that we can handle cells with
a large number of counter shards properly.
2019-02-08 17:06:31 +00:00
Paweł Dziepak
df1ac03154 sstables/counters: fix remote counter shard detection
Each counter cell has a header with an entry for each local and global
shard. The detection of remote shards is done by checking if there are
any counter shards that do not have an entry in the header. This is done
by computing the number of counter shards in a cell and comparing it to
the number of header entries. However, the computation was wrong and
included the size taken by the header itself. As a result, if the header
was as big as or larger than a single counter shard, Scylla incorrectly
complained about remote shards.
2019-02-08 17:04:22 +00:00
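The miscount fixed in df1ac03154 is easy to see with numbers. The sketch below assumes a hypothetical layout of a header followed by fixed-size shards; the 32-byte shard size and the function names are illustrative assumptions, not taken from the sstable code.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t shard_size = 32;  // illustrative size, an assumption

// Buggy variant: divides the whole cell, so header bytes count as shard bytes.
constexpr std::size_t shard_count_buggy(std::size_t cell_size, std::size_t /*header_size*/) {
    return cell_size / shard_size;
}

// Fixed variant: only the bytes after the header hold shards.
constexpr std::size_t shard_count_fixed(std::size_t cell_size, std::size_t header_size) {
    return (cell_size - header_size) / shard_size;
}

// Remote shards are the ones that have no entry in the header.
constexpr bool has_remote_shards(std::size_t shard_count, std::size_t header_entries) {
    return shard_count > header_entries;
}
```

With a 34-byte header describing 16 shards, the cell is 34 + 16 * 32 = 546 bytes. The buggy count is 546 / 32 = 17, one more than the 16 header entries, so a remote shard is reported; the fixed count is 512 / 32 = 16. With a 4-byte header and one shard (36 bytes) both variants give 1, which is why small headers did not trigger the complaint.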
Glauber Costa
8ba6b569b1 relocatable python: make sure all shared objects are relocated
The interpreter as it is right now has a bug: I incorrectly assumed that
all the shared libraries that python dynamically links would be in
lib-dynload. That is not true, and at least some of them are in
site-packages.

With that, we were loading system libraries for some shared objects.
The approach taken to fix this is to just check if we're seeing a shared
library and relocate everything we see: we will end up relocating the
ones in lib64 too, but that not only should be okay, it is probably even
more fool-proof.

While doing that I noticed that I had forgotten to incorporate a piece of
previous feedback from Avi (that we're leaving temporary files behind).
So I'm fixing that as well.

[avi: update toolchain]
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190208115501.7234-1-glauber@scylladb.com>
2019-02-08 18:42:24 +02:00
Glauber Costa
fb742473e2 replace /usr/local as a source of packages in the python relocatable interpreter
I was playing with the python3 interpreter trying to get pip to work,
just to see how far we can go. We don't really need pip, but I figured
it would be a good stress test to make sure that the process is working
and robust.

And it didn't really work, because although pip will correctly install
things into $relocatable_root/local/lib, sys.path will still refer to a
hardcoded /usr/local. While this should not affect Scylla, since we
expect to have all our modules in our path anyway -- and that path is
searched before /usr/local, it is still dangerous to make an absolute
reference like this.

Unfortunately, /usr/local/ is included unconditionally by site.py,
which is executed when the interpreter is started and there is no
environment variable I found to change that (the help string refers to
PYTHONNOUSERSITE, but I found no mention of that in site.py whatsoever).

There is a way to tell site.py not to bother to add user sites, by
passing the -s flag, which this patch does.

Aside from doing that, we also enhance PYTHONPATH to include a reference
to ./local/{lib,lib64}/python<version>/site-packages.

After applying this patch, I was able to build an interpreter containing
only python3-pip and python3-setuptools, and build the relocatable
environment from there.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190206052104.25927-1-glauber@scylladb.com>
2019-02-08 18:41:52 +02:00
Botond Dénes
181bf64858 query: add trim_clustering_row_ranges_to()
This algorithm was already duplicated in two places
(service/pager/query_pagers.cc and mutation_reader.cc). Soon it will be
used in a third place. Instead of triplicating, move it into a function
that everybody can use.
2019-02-08 16:30:17 +02:00
Botond Dénes
bc31d8cbcc tests/test_table: add keyspace and table name params
Allow the keyspace and table names to be customizable by the caller.
2019-02-08 16:30:17 +02:00
Botond Dénes
2d885c6453 tests/test_table: s/create_test_cf/create_test_table/
Also move it to the `test` namespace.
2019-02-08 16:30:17 +02:00
Botond Dénes
c2a6ac307f tests: move create_test_cf() to tests/test_table.{hh,cc}
In the next patches `create_test_cf()` will be made much more powerful
and as such generally useful. Move it into its own files so other tests
can start using it as well.
2019-02-08 16:30:17 +02:00
Botond Dénes
2d3c4f9009 tests/multishard_mutation_query_test: drop many partition test
Soon a much better test will be added that will cover many partitions
as well and much more.
2019-02-08 16:30:17 +02:00
Botond Dénes
ced0e7ecb3 tests/multishard_mutation_query_test: drop range tombstone test
Soon a much better test will be added that will also cover range
tombstones and much more.
2019-02-08 16:30:17 +02:00
Paweł Dziepak
64b1a2caf9 tests: modernise tmpdir
tmpdir is a helper class representing a temporary directory.
Unfortunately, it suffers from some problems such as lack of proper
encapsulation and weak typing. This has caused bugs in the past when the
user code accidentally modified the member variable with the path to the
directory.

This patch modernises tmpdir and updates its users. The path is stored
in a std::filesystem::path and available read-only to the class users.
mkdtemp and boost are replaced by the standard solution.

The users are updated to use path more (when it didn't involve too many
changes to their code) and stop using lw_shared_ptr to store the tmpdir
when it wasn't necessary.

tmpdir intentionally doesn't provide any helpers for getting the path as
a string in order to discourage weak types.

Message-Id: <20190207145727.491-1-pdziepak@scylladb.com>
2019-02-07 20:18:14 +02:00
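The shape of the modernised tmpdir from 64b1a2caf9 can be approximated as follows. This is a sketch, not the class from the tests directory: the naming scheme based on `std::random_device` and the retry loop are assumptions about how a unique directory might be created with the standard library alone.

```cpp
#include <cassert>
#include <filesystem>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

class tmpdir {
    fs::path _path;

public:
    tmpdir() {
        std::random_device rd;
        // create_directory() returns false if the directory already exists,
        // so retry with a new name on the unlikely collision.
        do {
            _path = fs::temp_directory_path() / ("tmpdir-" + std::to_string(rd()));
        } while (!fs::create_directory(_path));
    }

    tmpdir(const tmpdir&) = delete;
    tmpdir& operator=(const tmpdir&) = delete;

    ~tmpdir() {
        std::error_code ec;
        fs::remove_all(_path, ec);  // error_code overload: never throws from the destructor
    }

    // Read-only access; there is deliberately no string accessor.
    const fs::path& path() const noexcept { return _path; }
};
```

Returning a `const fs::path&` is what prevents user code from accidentally modifying the stored path, the bug class mentioned in the commit message.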
Avi Kivity
e2e25720c1 Update seastar submodule
* seastar c3be06d...428f4ac (13):
  > build: make the "dist" test respect the build type
  > Merge 'Add support for docker --cpuset-cpus' from Juliana
  > Merge "Add support for Coroutines TS" from Paweł
  > Merge "Modernize dependency management" from Avi
  > future: propagate broken_promise exception to abandoned continuations
  > net/inet_address: avoid clang Wmissing-braces
  > build: Default to the "Release" type if unspecified
  > rpc: log an exception that may happen while processing an RPC message
  > Add a --split-dwarf option to configure.py
  > build: Fix the `StdFilesystem` module
  > Compress debug info by default
  > Add an option for building with split dwarf
  > Dockerfile: install stow
2019-02-07 20:08:15 +02:00
Paweł Dziepak
de2a447576 utils/extremum_tracking: drop default constructor
Default constructed extremum_tracker has uninitialised _default_value
which basically makes it never correct to do that. Since this class is a
mechanism and not a value it doesn't really need to be a regular type,
so let's drop the default constructor.

Message-Id: <20190207162430.7460-1-pdziepak@scylladb.com>
2019-02-07 18:31:25 +02:00
Tomasz Grabiec
7184289015 Merge "Various fixes and improvements for sstables statistics" from Paweł
This series contains several fixes and improvements as well as new tests
for sstable code dealing with statistics.

 * https://github.com/pdziepak/scylla.git sstable-stats-fixes/v1-rebased:
  sstables: compaction: don't access moved-from vector of sstables
  memtable: move encoding_stats_collector implementation out of header
  sstables: seal_statistics(): pass encoding_stats by constant reference
  sstables/mc/writer: don't assume all schema columns are present
  tests/sstable3: improvements to file compare
  tests: extract mutation data model
  tests/data_model: add support for expiring atomic cells
  tests/data_model: allow specifying timestamp for row markers
  tests/memtable: test column tracking for encoding stats
  sstables: use correct source of statistics in
    get_encoding_stats_for_compaction()
  utils/extremum_tracking: preserve "not-set" status on merge
  sstables/metadata_collector: move the default values to the global
    tracker
  tests/sstables: test for reading serialisation header
  tests/sstables: pass encoding stats to write_components()
  tests/sstable: test merging encoding_stats

Fixes #4202.
2019-02-07 12:35:29 +01:00
Paweł Dziepak
67252de195 tests/sstable: test merging encoding_stats 2019-02-07 10:17:06 +00:00
Paweł Dziepak
e25603fbf7 tests/sstables: pass encoding stats to write_components()
By default write_components() uses a safe default for encoding_stats
which indicates that all columns are present. This may hide some bugs, so
let's pass the real thing in the tests where this may matter.
2019-02-07 10:17:06 +00:00
Paweł Dziepak
d44d5ebf86 tests/sstables: test for reading serialisation header 2019-02-07 10:17:06 +00:00
Paweł Dziepak
ebf667fb9c sstables/metadata_collector: move the default values to the global tracker
column_stats is a per-partition tracker, while metadata_collector is the
global one. The statistics gathered by column_stats are merged into the
metadata_collector. In order to ensure that we get proper default values
in case no value of particular kind (e.g. no TTLs) was seen they need to
be set on the global tracker, not the per-partition one.
2019-02-07 10:16:50 +00:00
Paweł Dziepak
2680022df0 utils/extremum_tracking: preserve "not-set" status on merge
extremum_tracker allows choosing a default value that's going to be used
only if no "real" values were provided. Since it is never compared with
the actual input values it can be anything. For instance, if the minimum
tracker default value is 0 and there was one update with the value 1 the
detected minimum is going to be 1 (the default is ignored).

However, this doesn't work when the trackers are merged since that
process always leaves the destination tracker in the "set" state
regardless of whether any of the merged trackers has ever seen any value.

This is fixed by this patch, by properly preserving _is_set state on
merge.
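
A minimal sketch of the behaviour described above, assuming a simplified
minimum tracker. The class name min_tracker_sketch and its members are
hypothetical and are not the actual extremum_tracker interface:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, simplified stand-in for a minimum tracker.
class min_tracker_sketch {
    int64_t _default_value;
    int64_t _value;
    bool _is_set = false;
public:
    explicit min_tracker_sketch(int64_t default_value)
        : _default_value(default_value), _value(default_value) {}

    void update(int64_t v) {
        // The default is never compared with real input values.
        _value = _is_set ? std::min(_value, v) : v;
        _is_set = true;
    }

    void merge(const min_tracker_sketch& other) {
        // Merging a tracker that never saw a value must not flip
        // this tracker into the "set" state.
        if (other._is_set) {
            update(other._value);
        }
    }

    int64_t get() const { return _is_set ? _value : _default_value; }
    bool is_set() const { return _is_set; }
};
```

With this shape, merging two trackers that never saw a value still
reports the default, and a later update(1) yields 1 rather than the
default 0.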
2019-02-07 10:16:50 +00:00
Paweł Dziepak
84d8ee35d4 sstables: use correct source of statistics in get_encoding_stats_for_compaction()
sstable class is responsible for many more things than it should be. In
particular, it takes care of both writing and reading sstables. The
problem this causes is that it is very easy to confuse the two.

This is what has happened in get_encoding_stats_for_compaction().
Originally, it was using _c_stats as a source of the statistics, which
is used only during the write and per-partition. Needless to say, the
returned encoding_stats were bogus.

The correct source of those statistics is get_stats_metadata().
2019-02-07 10:16:50 +00:00
Paweł Dziepak
e315448d0a tests/memtable: test column tracking for encoding stats 2019-02-07 10:16:50 +00:00
Paweł Dziepak
591d5195a9 tests/data_model: allow specifying timestamp for row markers 2019-02-07 10:16:50 +00:00
Paweł Dziepak
b07cba6a89 tests/data_model: add support for expiring atomic cells 2019-02-07 10:16:50 +00:00
Paweł Dziepak
aab0b7360f tests: extract mutation data model 2019-02-07 10:16:50 +00:00
Paweł Dziepak
fa216be260 tests/sstable3: improvements to file compare
This patch introduces some improvement to file comparison:
 - exception flags are set so that any error triggers an exception,
   which guarantees that errors are not silently ignored
 - std::ios_base::binary flag is passed to open()
 - istreambuf_iterator is used instead of istream_iterator. It is better
   suited for comparing binary data.
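
The three points above can be sketched roughly as follows. This is an
illustration using only the standard library; the function name
files_equal_sketch is hypothetical and is not the helper used in
tests/sstable3:

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Returns true when both files hold identical bytes.
// Throws std::ios_base::failure if either file cannot be opened.
inline bool files_equal_sketch(const std::string& path_a, const std::string& path_b) {
    std::ifstream a, b;
    // Make open/read errors throw instead of being silently ignored.
    a.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    b.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    a.open(path_a, std::ios_base::in | std::ios_base::binary);
    b.open(path_b, std::ios_base::in | std::ios_base::binary);
    // istreambuf_iterator reads raw characters from the stream buffer
    // and does not skip whitespace.
    return std::equal(std::istreambuf_iterator<char>(a), std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(b), std::istreambuf_iterator<char>());
}
```

An istream_iterator<char> reads through operator>>, which skips
whitespace by default, so two files differing only in whitespace bytes
could compare equal with it.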
2019-02-07 10:16:50 +00:00
Paweł Dziepak
bc61471132 sstables/mc/writer: don't assume all schema columns are present
The writer constructor prepares lists of present static and regular
columns, those should be used for any further checks.
2019-02-07 10:16:50 +00:00
Paweł Dziepak
0132bcc035 sstables: seal_statistics(): pass encoding_stats by constant reference 2019-02-07 10:16:50 +00:00
Paweł Dziepak
341f186933 memtable: move encoding_stats_collector implementation out of header 2019-02-07 10:16:50 +00:00
Paweł Dziepak
6d5c1a9813 sstables: compaction: don't access moved-from vector of sstables 2019-02-07 10:16:50 +00:00
Paweł Dziepak
a8a45a243b tests/cql_test_env: don't override tmpdir::path
The interface tmpdir::path isn't properly encapsulated and its users can
modify the path even though they really shouldn't. This can happen
accidentally: in cql_test_env a reference to tmpdir::path was created
and later assigned to in one of the code paths. This caused the tmpdir
destructor to remove the wrong directory at program exit.

This patch solves the problem by avoiding referencing tmpdir::path, a
copy is perfectly acceptable considering that this is tests-only code.

Message-Id: <20190206173046.26801-1-pdziepak@scylladb.com>
2019-02-06 20:55:40 +02:00
Takuya ASADA
96b1cb97ba dist/ami: don't cleanup build dir
rm -rf build/* was meant to start rpm building from a clean state, but it
also deletes the built scylla binaries, so it was not a good idea.

Instead of rm -rf build/*, we can check for file existence in the cloned
directory; if it looks good we can reuse it.
We also need to run git pull on each package repo since it may not
include the latest commit.

Fixes #4189

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190206101755.2056-1-syuu@scylladb.com>
2019-02-06 15:33:09 +02:00
Nadav Har'El
3e7dc7230d build_deb.sh: fix error message
The error message was apparently copied from the RPM script. Fix it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190205162148.20698-1-nyh@scylladb.com>
2019-02-05 18:22:36 +02:00
Avi Kivity
54748ad15b Merge "Allow non-key IN restrictions" from Piotr
"
Fixes #4193
Fixes #3795

This series enables handling IN restrictions for regular columns,
which is needed by both filtering and indexing mechanisms.

Tests: unit (release)
"

* 'allow_non_key_in_restrictions' of https://github.com/psarna/scylla:
  tests: add filtering with IN restriction test
  cql3: remove unused can_have_only_one_value function
  cql3: allow non-key IN restrictions
2019-02-05 17:30:35 +02:00
Piotr Sarna
45db5da51b tests: add filtering with IN restriction test
Test case for filtering regular columns with IN restriction is added.
2019-02-05 16:04:17 +01:00
Piotr Sarna
36609d1376 cql3: remove unused can_have_only_one_value function 2019-02-05 16:04:17 +01:00
Piotr Sarna
c178ed8b16 cql3: allow non-key IN restrictions
Restricting a regular column with IN restriction is a perfectly
valid case for filtering and indexing, so it should be allowed.

Fixes #4193
Fixes #3795
2019-02-05 15:50:17 +01:00
Rafael Ávila de Espíndola
84542dadfa sstables: delete_atomically: don't drop futures
We still allow the delete of rows from system.large_partition to run
in parallel with the sstable deletion, but now we return a future that
waits for both.

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190205001526.68774-1-espindola@scylladb.com>
2019-02-05 16:47:58 +02:00
Calle Wilund
ba6a8ef35b tls: Use a default prio string disabling TLS1.0 forcing min 128bits
Fixes #4010

Unless the user sets this explicitly, we should try to avoid
deprecated protocol versions. While gnutls should do this for
connections initiated thusly, clients such as drivers etc might
use obsolete versions.

Message-Id: <20190107131513.30197-1-calle@scylladb.com>
2019-02-05 15:34:18 +02:00
Avi Kivity
6c71eae63f Merge "API: Stream compaction history records" from Amnon
"
get_compaction_history can return a lot of records which will add up to a
big http reply.

This series makes sure it will not create large allocations when
returning the results.

It adds an api to the query_processor to use paged queries with a
consumer function that returns a future, this way we can use the http
stream after each record.

This implementation will prevent large allocations and stalls.

Fixes #4152
"

* 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev:
  tests/query_processor_test: add query_with_consumer_test
  system_keyspace, api: stream get_compaction_history
  query_processor: query and for_each_cql_result with future
2019-02-05 14:16:36 +02:00
Avi Kivity
ebf179318c Merge "SI: Add virtual columns to underlying MV" from Duarte
"
Virtual columns are MV-specific columns that contribute to the
liveness of view rows. However, we were not adding those columns when
creating an index's underlying MV, causing indexes to miss base rows.

Fixes #4144
Branches: master, branch-3.0
"

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

* 'sec-index/virtual-columns/v1' of https://github.com/duarten/scylla:
  tests/secondary_index_test: Add reproducer for #4144
  index/secondary_index_manager: Add virtual columns to MV
2019-02-05 13:26:45 +02:00
Avi Kivity
367ef8d318 Merge "provide our own, relocatable, python3 interpreter" from Glauber
"

We would like to deploy Scylla in constrained environments where
internet access is not permitted. In those environments it is not
possible to acquire the dependencies of Scylla from external repos and
the packages have to be shipped together with their dependencies.

In older distributions, like CentOS7 there isn't a python3 interpreter
available. And while we can package one from EPEL this tends to break in
practice when installing the software in older patchlevels (for
instance, installing into RHEL7.3 when the latest is RHEL7.5).

The reason for that, as we saw in practice, is that EPEL may
not respect RHEL patchlevels and have the python interpreter depending
on newer versions of some system libraries.

virtualenv can be used to create isolated python environments, but it is
not designed for full isolation and I hit at least two roadblocks in
practice:

1) It doesn't copy the files, linking some instead. There is an
   --always-copy option but it is broken (for years) in some
   distributions.
2) Even when the above works, it still doesn't copy some files, relying
   on the system files instead (one sad example was the subprocess
   module that was just kept in the system and not moved to the
   virtualenv)

This patch solves that problem by creating a python3 environment in a
directory with the modules that Scylla uses, and nothing else. It is
essentially doing what virtualenv should do but doesn't. Once this
environment is assembled the binaries are then made relocatable the same
way the Scylla binary is.

One difference (for now) between the Scylla binary relocation process
and ours is that we steer away from LD_LIBRARY_PATH: the environment
variable is inherited by any child process stemming from the caller,
which means that we are unable to use the subprocess module to call
system binaries like mkfs (which our scripts do a lot). Instead, we rely
on RUNPATH to tell the binary where to search for its libraries.

Once we generate an archive with the python3 interpreter, we then
package it as an rpm with barely any dependencies. The dependencies listed
are:

$ rpm -qpR scylla-relocatable-python3-3.6.7-1.el7.x86_64.rpm
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PartialHardlinkSets) <= 4.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(PayloadIsXz) <= 5.2-1

And the total size of that rpm, with all modules scylla needs is 20MB.

The Scylla rpm now has a much more modest dependency list:

$ rpm -qpR scylla-server-666.development-0.20190121.80b7c7953.el7.x86_64.rpm | sort | uniq
/bin/sh
curl
file
hwloc
kernel >= 3.10.0-514
mdadm
pciutils
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(PayloadIsXz) <= 5.2-1
scylla-conf
scylla-relocatable-python3 <== our python3 package.
systemd-libs
util-linux
xfsprogs

I have tested this end to end by generating RPMs from our master branch,
then installing them in a clean CentOS7.3 installation without even
using yum, just rpm -Uhv <package_list>

Then I called scylla_setup to make sure all python scripts were working
and started Scylla successfully.
"

* 'scylla-python3-v5' of github.com:glommer/scylla:
  Create a relocatable python3 interpreter
  spec file: fix python3 dependency list.
  fixup scripts before installing them to their final location
  automatically relocate python scripts
  make scyllatop relocatable
  use relative paths for installing scylla and iotune binaries
2019-02-05 12:53:34 +02:00
Amnon Heiman
c96c3ce9e8 tests/query_processor_test: add query_with_consumer_test
This patch adds a unit test for querying with a consumer function.

Query with a consumer uses paging. The test covers the scenarios where
the number of rows is below and above the page size; it also tests the
option to stop in the middle of reading.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-02-05 12:35:53 +02:00
Amnon Heiman
6c7742d616 system_keyspace, api: stream get_compaction_history
get_compaction_history can return a big chunk of data.

To prevent large memory allocations, get_compaction_history now reads
each compaction_history record and uses the http stream to send it.

Fixes #4152

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-02-05 11:14:53 +02:00
Amnon Heiman
c0e3b7673d query_processor: query and for_each_cql_result with future
query and for_each_cql_result accept a function that reads a row and
returns a stop_iterator.

This implementation of those functions takes a function that returns a
future stop_iterator, allowing preemption between calls.
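
A rough, synchronous sketch of the consumer contract, assuming a
simplified model in which rows are plain integers. The names
stop_iteration_sketch and for_each_row_paged_sketch are hypothetical;
the real functions take a consumer returning a future, which is left
out here to keep the example standard-library only:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

enum class stop_iteration_sketch { no, yes };

// Feeds rows to the consumer one page at a time and returns the number
// of pages that were fetched before the scan finished or was stopped.
inline std::size_t for_each_row_paged_sketch(
        const std::vector<int>& rows, std::size_t page_size,
        const std::function<stop_iteration_sketch(int)>& consumer) {
    if (page_size == 0) {
        return 0;  // nothing sensible to do with an empty page
    }
    std::size_t pages = 0;
    for (std::size_t start = 0; start < rows.size(); start += page_size) {
        ++pages;
        const std::size_t end = std::min(rows.size(), start + page_size);
        for (std::size_t i = start; i < end; ++i) {
            // In the real code a preemption point would exist here,
            // because the consumer returns a future.
            if (consumer(rows[i]) == stop_iteration_sketch::yes) {
                return pages;
            }
        }
    }
    return pages;
}
```

The consumer decides per row whether to continue, so a caller can stop
in the middle of a page without the remaining pages being fetched.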

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-02-05 11:14:53 +02:00
Glauber Costa
afed2cddae Create a relocatable python3 interpreter
We would like to deploy Scylla in constrained environments where
internet access is not permitted. In those environments it is not
possible to acquire the dependencies of Scylla from external repos and
the packages have to be shipped together with their dependencies.

In older distributions, like CentOS7 there isn't a python3 interpreter
available. And while we can package one from EPEL this tends to break in
practice when installing the software in older patchlevels (for
instance, installing into RHEL7.3 when the latest is RHEL7.5).

The reason for that, as we saw in practice, is that EPEL may
not respect RHEL patchlevels and have the python interpreter depending
on newer versions of some system libraries.

virtualenv can be used to create isolated python environments, but it is
not designed for full isolation and I hit at least two roadblocks in
practice:

1) It doesn't copy the files, linking some instead. There is an
  --always-copy option but it is broken (for years) in some
  distributions.
2) Even when the above works, it still doesn't copy some files, relying
   on the system files instead (one sad example was the subprocess
   module that was just kept in the system and not moved to the
   virtualenv)

This patch solves that problem by creating a python3 environment in a
directory with the modules that Scylla uses, and nothing else. It is
essentially doing what virtualenv should do but doesn't. Once this
environment is assembled the binaries are then made relocatable the same
way the Scylla binary is.

One difference (for now) between the Scylla binary relocation process
and ours is that we steer away from LD_LIBRARY_PATH: the environment
variable is inherited by any child process stemming from the caller,
which means that we are unable to use the subprocess module to call
system binaries like mkfs (which our scripts do a lot). Instead, we rely
on RUNPATH to tell the binary where to search for its libraries.

In terms of the python interpreter, PYTHONPATH does not need to be set
for this to work as the python interpreter will include the lib
directory in its PYTHONPATH. To confirm this, we executed the following
code:

    bin/python3 -c "import sys; print('\n'.join(sys.path))"

with the interpreter unpacked to  both /home/centos/glaubertmp/test/ and
/tmp. It yields respectively:

    /home/centos/glaubertmp/test/lib64/python36.zip
    /home/centos/glaubertmp/test/lib64/python3.6
    /home/centos/glaubertmp/test/lib64/python3.6/lib-dynload
    /home/centos/glaubertmp/test/lib64/python3.6/site-packages

and

    /tmp/python/lib64/python36.zip
    /tmp/python/lib64/python3.6
    /tmp/python/lib64/python3.6/lib-dynload
    /tmp/python/lib64/python3.6/site-packages

This was tested by moving the .tar.gz generated on my Fedora28 laptop to
a CentOS machine without python3 installed. I could then invoke
./scylla_python_env/python3 and use the interpreter to call 'ls' through
the subprocess module.

I have also tested that we can successfully import all the modules we listed
for installation and that we can read a sample yaml file (since PyYAML depends
on the system's libyaml, we know that this works)

Time to build:
real	0m15.935s
user	0m15.198s
sys	0m0.382s

Final archive size (uncompressed): 81MB
Final archive size (compressed)  : 25MB

Signed-off-by: Glauber Costa <glauber@scylladb.com>
--
v3:
- rewrite in python3
- do not use temporary directories, add directly to the archive. Only the python binary
  has to be materialized
- Use --cacheonly for repoquery, and also repoquery --list in a second step to grab the file list
v2:
- do not use yum, resolve dependencies from installed packages instead
- move to scripts as Avi wants this not only for old offline CentOS
2019-02-04 18:02:40 -05:00
Glauber Costa
f757b42ba7 spec file: fix python3 dependency list.
The dependency list as it was did not reflect the fact that scyllatop is
now written in python3.

Some packages, like urwid, should use the python3 version. CentOS
doesn't really have an urwid package for python3, not even in EPEL. So
this officially marks the point in which we can't build packages that
will install in CentOS7 anyway.

Luckily, we will soon be providing our own python3 interpreter. But for
now, as a first step, simplify the dependency list by removing the
CentOS/Fedora conditional and listing the full python3 list

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 18:02:40 -05:00
Glauber Costa
7052028752 fixup scripts before installing them to their final location
Before installing python files to their final location in install.sh,
replace them with a thunk so that they can work with our python3
interpreter.  The way the thunk works, they will also work without our
python3 interpreter so unconditionally fixing them up is always safe.

I opt in this patch for fixing up just at install time to simplify
developer's life, who won't have to worry about this at all.

Note about the rpm .spec file: since we are relying on specific format
for the shebangs, we shouldn't let rpmbuild mess with them. Therefore,
we need to disable a global variable that controls that behavior (by
definition, Fedora rpmbuild will rewrite all shebangs to /usr/bin/python3)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 18:02:40 -05:00
Glauber Costa
3869628429 automatically relocate python scripts
Given a python script at $DIR/script.py, this copies the script to
$DIR/libexec/script.py.bin, fixes its shebang to use /usr/bin/env instead
of an absolute path for the interpreter and replaces the original script
with a thunk that calls into that script.

PYTHONPATH is adjusted so that the original directory containing the script
can also serve as a source of modules, as would be originally intended.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 18:02:39 -05:00
Glauber Costa
1bb65a0888 make scyllatop relocatable
Right now the binary we distribute with scyllatop calls into
/usr/lib/scylla/scyllatop/scyllatop.py unconditionally. Calling that is
all that this binary does.

This poses a problem to our relocatable process, since we don't want
to be referring to absolute paths (And moreover, that is calling python
whereas it should be calling python3)

The scyllatop.py file includes a python3 shebang and is executable.
Therefore, it is best to just create a link to that file and execute it
directly

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 16:12:46 -05:00
Glauber Costa
e890b8af09 use relative paths for installing scylla and iotune binaries
The answer is yes: if we install them in $root/opt, we should link
to $root/opt

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 14:33:51 -05:00
Piotr Jastrzebski
834bec5cc9 Read shard awareness columns as dropped
Without this, a new version of Scylla won't be able to
start with system tables inherited from an older version
that had shard awareness columns.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <cb62f20fc0c98f532c6f4ad5e08b3794951e85bd.1549289050.git.piotr@scylladb.com>
2019-02-04 18:43:11 +02:00
Rafael Ávila de Espíndola
bbd9dfcba7 Add a --split-dwarf option to configure.py
It is off by default as it conflicts with distcc.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190204002706.15540-1-espindola@scylladb.com>
2019-02-04 18:42:16 +02:00
Benny Halevy
a9e1e0233a Add a dev build mode to test.py
Message-Id: <20190204162112.7471-2-espindola@scylladb.com>
2019-02-04 18:38:23 +02:00
Rafael Ávila de Espíndola
6243443591 Add a dev build mode
The build times I got with a clean ccache were:

ninja dev      10806.89s user  678.29s system 2805% cpu  6:49.33 total
ninja release  28906.37s user 1094.53s system 2378% cpu 21:01.27 total
ninja debug    18611.17s user 1405.66s system 2310% cpu 14:26.52 total

With this version -gz is not passed to seastar's configure. It should
probably be seastar's configure responsibility to do that and I will
send a separate patch to do it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190204162112.7471-1-espindola@scylladb.com>
2019-02-04 18:38:22 +02:00
Calle Wilund
9cadbaa96f commitlog_replayer: Bugfix: finding truncation positions uses local var ref
"uuid" was captured by reference in a continuation. This works 99.9% of
the time because the continuation is not actually delayed (and, assuming
we begin the checks with non-truncated (system) column families, it works).
But if the continuation is delayed, the resulting column family map will
be corrupted.
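
The pattern can be illustrated with plain std::function objects standing
in for delayed continuations. This is a sketch only: the function names
are hypothetical, the real code uses Seastar futures, and in the real bug
the referenced local may already be gone (undefined behaviour), whereas
here it is kept alive so the wrong result is observable:

```cpp
#include <functional>
#include <string>
#include <vector>

// Stand-in for continuations that run later.
using deferred_task = std::function<void()>;

// Problematic pattern: every task refers to the same variable, which has
// already changed by the time the task finally runs.
inline std::vector<std::string> collect_by_reference_sketch(const std::vector<std::string>& ids) {
    std::vector<std::string> seen;
    std::vector<deferred_task> tasks;
    std::string uuid;
    for (const auto& id : ids) {
        uuid = id;
        tasks.push_back([&uuid, &seen] { seen.push_back(uuid); });
    }
    for (auto& task : tasks) {
        task();  // runs "later", after the loop is done
    }
    return seen;
}

// Safe pattern: each task owns a copy of the value it needs.
inline std::vector<std::string> collect_by_value_sketch(const std::vector<std::string>& ids) {
    std::vector<std::string> seen;
    std::vector<deferred_task> tasks;
    for (const auto& id : ids) {
        tasks.push_back([uuid = id, &seen] { seen.push_back(uuid); });
    }
    for (auto& task : tasks) {
        task();
    }
    return seen;
}
```

When the tasks run immediately inside the loop, both variants give the
same answer, which is why the problem only shows up once a continuation
is actually delayed.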

Fixes #4187.

Message-Id: <20190204141831.3387-1-calle@scylladb.com>
2019-02-04 16:51:13 +02:00
Rafael Ávila de Espíndola
15a515a39b build: Don't link utils/gz/gen_crc_combine_table with seastar
It doesn't use seastar, so there is no point in linking with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190203214145.43009-1-espindola@scylladb.com>
2019-02-04 15:43:16 +02:00
Botond Dénes
2a67355ded multishard_combining_reader: better shard selection algorithm
The multishard reader has to combine the output of all shards into a
single fragment stream. To do that, each time a `partition_start` is
read it has to check if there is another partition, from another shard,
that has to be emitted before this partition. Currently for this it
uses the partitioner. At every partition start fragment it checks if the
token falls into the current shard sub-range. The shard sub-range is the
continuous range of tokens, where each token belongs to the same shard.
If the partition doesn't belong to the current shard sub-range the
multishard reader assumes the following shard sub-range of the next shard
will have data and moves over to it. This assumption will however only
hold on very dense tables, and will fail miserably on less dense
tables, resulting in the multishard reader effectively iterating over
the shard sub-ranges (4096 in the worst case), only to find data in just
a few of them. This resulted in high user-perceived latency when
scanning a sparse table.

This patch replaces this algorithm with one based on a shard heap. The
shards are now organized into a min-heap, by the next token they have
data for. When a partition start fragment is read from the current
shard, its token is compared to the smallest token in the shard heap. If
smaller, we continue to read from the current shard. Otherwise we move
to the shard with the smallest token. When constructing the reader, or
after fast-forwarding we don't know what first token each reader will
produce. To avoid reading in a partition from each reader, we assume
each reader will produce the first token from the first shard sub-range
that overlaps with the query range. This algorithm performs much better
on sparse tables, while also being slightly better on dense tables.
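
The heap-based selection can be sketched as a k-way merge that stays on
the current shard for as long as its next token is the smallest. This is
a simplified model, not the reader's code: tokens are plain integers,
each shard's data is a pre-sorted vector, and the names
shard_cursor_sketch and merge_shards_sketch are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical cursor over one shard's sorted tokens.
struct shard_cursor_sketch {
    int64_t next_token;
    unsigned shard;
    std::size_t pos;
    bool operator>(const shard_cursor_sketch& o) const { return next_token > o.next_token; }
};

// Merges per-shard sorted token lists into one stream of (shard, token).
inline std::vector<std::pair<unsigned, int64_t>>
merge_shards_sketch(const std::vector<std::vector<int64_t>>& shards) {
    // Min-heap ordered by the next token each shard has data for.
    std::priority_queue<shard_cursor_sketch, std::vector<shard_cursor_sketch>,
                        std::greater<shard_cursor_sketch>> heap;
    for (unsigned s = 0; s < shards.size(); ++s) {
        if (!shards[s].empty()) {
            heap.push({shards[s][0], s, 0});
        }
    }
    std::vector<std::pair<unsigned, int64_t>> out;
    if (heap.empty()) {
        return out;
    }
    shard_cursor_sketch current = heap.top();
    heap.pop();
    for (;;) {
        out.emplace_back(current.shard, current.next_token);
        const auto& tokens = shards[current.shard];
        if (++current.pos < tokens.size()) {
            current.next_token = tokens[current.pos];
            // Keep reading the current shard while its token is smallest.
            if (heap.empty() || current.next_token < heap.top().next_token) {
                continue;
            }
            heap.push(current);
        } else if (heap.empty()) {
            break;  // every shard is exhausted
        }
        // Move to the shard with the smallest next token.
        current = heap.top();
        heap.pop();
    }
    return out;
}
```

Empty shards never enter the heap, so a sparse table costs one heap
operation per shard switch instead of a walk over every sub-range.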

I did only a very rough measurement using CQL tracing. I populated a
table with four rows on a 64 shards machine, then scanned the entire
table.
Time to scan the table (microseconds):
before 27'846
after   5'248

Fixes: #4125

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d559f887b650ab8caa79ad4d45fa2b7adc39462d.1548846019.git.bdenes@scylladb.com>
2019-02-04 14:10:23 +02:00
Piotr Sarna
11e6d88ca7 tests: supplement filtering collections with more cases
Filtering test cases for collections are supplemented with
checking whether CONTAINS works correctly for sets and maps.

Message-Id: <4a684152cdcdb65e1415ba5859699cb324312c2b.1548837150.git.sarna@scylladb.com>
2019-02-03 17:19:30 +02:00
Avi Kivity
468f8c7ee7 Merge "Print a warning if a row is too large" from Rafael
"
This is a first step in fixing #3988.
"

* 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla:
  Rename large_partition_handler
  Print a warning if a row is too large
  Remove defaut parameter value
  Rename _threshold_bytes to _partition_threshold_bytes
  keys: add schema-aware printing for clustering_key_prefix
2019-02-03 13:57:42 +02:00
Nadav Har'El
5a695b8029 Materialized views: fix three error messages
Three error messages were supposed to include a column name, but a "{}"
was missing in the format so the given column name didn't actually appear
in the error message. So this patch adds the missing {}'s.

Fixes #4183.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190203112100.13031-1-nyh@scylladb.com>
2019-02-03 12:23:29 +01:00
Tomasz Grabiec
72dd6f54e3 gdb: Print total amount of memory used by small and large allocations
Message-Id: <1548956406-7601-2-git-send-email-tgrabiec@scylladb.com>
2019-02-01 13:18:16 +00:00
Tomasz Grabiec
f48fa542fc gdb: Extend 'scylla memory' to show memory used by large allocations
Adds new columns to the "Page spans" table named "large [B]" and
"[spans]", which show how much memory is allocated in spans of a given
size. Excludes spans used by small pools.

Useful in determining what is the size of large allocations which
consume the memory.

Example output:

Page spans:
index      size [B]      free [B]     large [B] [spans]
    0          4096          4096          4096       1
    1          8192         32768             0       0
    2         16384         16384             0       0
    3         32768         98304       2785280      85
    4         65536         65536       1900544      29
    5        131072        524288     471597056    3598
...
   31 8796093022208             0             0       0
Large allocations: 484675584 [B]
Message-Id: <1548956406-7601-1-git-send-email-tgrabiec@scylladb.com>
2019-02-01 13:18:01 +00:00
Asias He
28d6d117d2 migration_manager: Fix nullptr dereference in maybe_schedule_schema_pull
Commit 976324bbb8 switched to using
get_application_state_ptr to get a pointer to the application_state. It
may return nullptr, which was dereferenced unconditionally.

In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw:

   4 nodes in the tests

   n1, n2, n3, n4 are started

   n1 is stopped

   n1 is changed to use different shard config

   n1 is restarted ( 2019-01-27 04:56:00,377 )

The backtrace happened on n2 right after n1 restarts:

   0 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled
   1 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled
   2 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled
   3 WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed)
   4 INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status =
   5 Segmentation fault on shard 0.
   6 Backtrace:
   7 0x00000000041c0782
   8 0x00000000040d9a8c
   9 0x00000000040d9d35
   10 0x00000000040d9d83
   11 /lib64/libpthread.so.0+0x00000000000121af
   12 0x0000000001a8ac0e
   13 0x00000000040ba39e
   14 0x00000000040ba561
   15 0x000000000418c247
   16 0x0000000004265437
   17 0x000000000054766e
   18 /lib64/libc.so.6+0x0000000000020f29
   19 0x00000000005b17d9

We do not know when this backtrace happened, but according to logs from n3 and n4:

   INFO 2019-01-27 04:56:22,154 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL
   INFO 2019-01-27 04:56:21,594 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL

We can be sure the backtrace on n2 happened before 04:56:21 - 19 seconds (the
delay before gossip notices a peer is down), so the abort time is around 04:56:0X.
The migration_manager::maybe_schedule_schema_pull that triggers the backtrace
must be scheduled before n1 is restarted, because it dereferences the
application_state pointer after it sleeps 60 seconds, so the time
maybe_schedule_schema_pull is called is around 04:55:0X which is before n1 is
restarted.

So my theory is: migration_manager::maybe_schedule_schema_pull is scheduled
while n1 still has the SCHEMA application_state. When n1 restarts, n2 gets a
new application state from n1 which does not have SCHEMA yet. When
migration_manager::maybe_schedule_schema_pull wakes up from the 60 second
sleep, n1 has a non-empty endpoint_state but an empty application_state for
SCHEMA. We dereference the nullptr application_state and abort.
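
A minimal sketch of the guard, assuming application states are modelled
as a string map. All names here are hypothetical stand-ins rather than
the gossiper API, and skipping the pull when the state is missing is an
assumption about the desired behaviour:

```cpp
#include <map>
#include <string>

// Hypothetical model: application state name -> value.
using application_states_sketch = std::map<std::string, std::string>;

// Returns a pointer to the stored value, or nullptr when the peer has
// not published that state (for example right after a restart).
inline const std::string* find_state_sketch(const application_states_sketch& states,
                                            const std::string& key) {
    auto it = states.find(key);
    return it == states.end() ? nullptr : &it->second;
}

// Decides whether a schema pull is needed, checking the pointer first.
inline bool should_pull_schema_sketch(const application_states_sketch& peer,
                                      const std::string& local_version) {
    const std::string* remote = find_state_sketch(peer, "SCHEMA");
    if (!remote) {
        return false;  // assumption: no SCHEMA state yet, nothing to compare
    }
    return *remote != local_version;
}
```

The important part is only that the pointer is tested after the sleep,
since the peer's state may have been replaced in the meantime.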

Fixes: #4148
Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test
Message-Id: <9ef33277483ae193a49c5f441486ee6e045d766b.1548896554.git.asias@scylladb.com>
2019-02-01 09:01:08 +02:00
Piotr Jastrzebski
ad217bbdc7 Revert "system_keyspace: add sharding information to local table"
This reverts commit bdce561ada.

Those columns are not used and cause problems with tools.

Refs #4112
Message-Id: <c772ebc0ebc001e5bdf229424c6d51dc58cd5d2e.1548945023.git.piotr@scylladb.com>
2019-01-31 19:06:55 +01:00
Avi Kivity
9adf46b50e Update seastar submodule
* seastar 2f35731...c3be06d (1):
  > rpc: support closing streaming when only sink or source was created

Ref #4124.
2019-01-31 12:39:02 +02:00
Nadav Har'El
7b9b7f8ebc docs/metrics.md: document syntax for choosing specific instance/shard
As another useful example of Prometheus syntax, show the syntax of plotting
a graph for one particular node or shard.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reviewed-by: Botond Denes <bdenes@scylladb.com>
Message-Id: <20190129221607.11813-1-nyh@scylladb.com>
2019-01-31 12:37:30 +02:00
Asias He
9d9ecda619 repair: Log keyspace and table name in repair_cf_range
When a repair failed, we saw logs like:

   repair - Checksum of range (8235770168569320790, 8235957818553794560] on
   127.0.0.1 failed: std::bad_alloc (std::bad_alloc)

It is hard to tell which keyspace and table has failed.

To fix, log the keyspace and table name. It is useful to know when debugging.

Fixes #4166
Message-Id: <8424d314125b88bf5378ea02a703b0f82c2daeda.1548818669.git.asias@scylladb.com>
2019-01-31 12:36:46 +02:00
Gleb Natapov
a70374d982 messaging_service: do not forget to close stream when sending it to another side failed
Fixes #4124

Message-Id: <20190131091857.GC3172@scylladb.com>
2019-01-31 12:01:56 +02:00
Piotr Jastrzebski
4b47094f30 Prevent undefined behaviour while writing range tombstones in LA/KA
Stop calling .remove_suffix on empty string_view.

ck_bview can be empty because this function can be
called for a half open range tombstone.

It is impossible to write such range tombstones to LA/KA SSTables
so we should throw a proper exception instead of allowing
undefined behaviour.
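
A minimal sketch of such a check, assuming C++17 std::string_view in
place of the project's own view type. The function name, the exception
type and the number of bytes removed are illustrative assumptions:

```cpp
#include <stdexcept>
#include <string_view>

// Drops the trailing byte of a serialized key view, refusing empty input.
inline std::string_view drop_last_byte_sketch(std::string_view ck_bview) {
    if (ck_bview.empty()) {
        // remove_suffix(n) with n > size() is undefined behaviour,
        // so report the unsupported case explicitly instead.
        throw std::runtime_error("cannot write a half open range tombstone in this format");
    }
    ck_bview.remove_suffix(1);
    return ck_bview;
}
```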

Refs #4113

Tests: unit(release)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c3738916953e4b10812aed95e645c739b4c29462.1548777086.git.piotr@scylladb.com>
2019-01-31 10:58:19 +01:00
Glauber Costa
94ead559f7 move scylla-housekeeping to dist/common/scripts
All of our python scripts are there and they are all installed
automatically into /usr/lib/scylla. By keeping scylla-housekeeping
separately we are just complicating our build process.

This would be just a minor annoyance but this broke the new relocatable
process for python3 that I am trying to put together because I forgot to
add the new location as a source for the scripts.

Therefore, I propose we start being more diligent with this and keep
all scripts together in the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190123191732.32126-2-glauber@scylladb.com>
2019-01-31 11:44:34 +02:00
Jesse Haber-Kucharsky
c37aa258c5 build: Fix incremental builds when Seastar changes
When a file in the `seastar` directory changes, we want to minimize the
amount of Scylla artifacts that are re-built while ensuring that all
changes in Seastar are reflected in Scylla correctly.

For compiling object files, we change Seastar to be an "order only"
dependency so that changes to Seastar don't trigger unnecessary builds.

For linking, we add an "implicit" dependency on Seastar so that Scylla
is re-linked when Seastar changes.

With these changes, modifying a Seastar header file will trigger the
recompilation of the affected Scylla object files, and modifying a
Seastar source file will trigger linking only.
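
The two dependency kinds can be illustrated with a small sketch of how a build-file generator might emit Ninja `build` statements. In Ninja syntax, inputs listed after `|` are implicit dependencies (a change triggers a rebuild of the output) and inputs listed after `||` are order-only dependencies (they must exist first, but a change alone does not trigger a rebuild). The helper `ninja_build`, the rule names and the file names below are hypothetical and are not taken from Scylla's build scripts.

```python
def ninja_build(output, rule, inputs, implicit=(), order_only=()):
    """Format one Ninja build statement (hypothetical helper).

    Paths are assumed to contain no spaces or other characters
    that would need escaping in a Ninja file.
    """
    line = f"build {output}: {rule} {' '.join(inputs)}"
    if implicit:
        # Implicit dependencies: a change re-runs the rule.
        line += " | " + " ".join(implicit)
    if order_only:
        # Order-only dependencies: built first, but a change alone
        # does not re-run the rule.
        line += " || " + " ".join(order_only)
    return line


# Compiling: Seastar is only an ordering constraint.
compile_stmt = ninja_build(
    "build/release/main.o", "cxx", ["main.cc"],
    order_only=["libseastar.a"],
)

# Linking: a changed Seastar library must trigger a re-link.
link_stmt = ninja_build(
    "build/release/scylla", "link", ["build/release/main.o"],
    implicit=["libseastar.a"],
)

print(compile_stmt)
print(link_stmt)
```

Header changes still reach the affected object files through the compiler-generated dependency files, which is why the order-only edge does not hide them.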

Fixes #4171

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <0ab43d79ce0d41348238465d1819d4c937ac6414.1548906335.git.jhaberku@scylladb.com>
2019-01-31 11:00:40 +02:00
Raphael S. Carvalho
930f8caff9 sstables/compaction: Fix segfault when replacing expired sstable in incremental compaction
Fully expired sstable is not added to compacting set, meaning it's not actually
compacted, but it's kept in a list of sstables which incremental compaction
uses to check if any sstable can be replaced.
Incremental compaction was unconditionally removing expired sstable from compacting
set, which led to segfault because end iterator was given.

The fix changes sstable_set::erase() to follow the standard behavior of
erase functions, which work even if the target element is not present.

Fixes #4085.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190130163100.5824-1-raphaelsc@scylladb.com>
2019-01-30 16:32:45 +00:00
Avi Kivity
056b6a4439 Update seastar submodule
* seastar 07e1ed3...2f35731 (1):
  > Merge " Initial seastar ipv6 support" from Calle
2019-01-30 17:41:39 +02:00
Avi Kivity
1224cde871 Merge "Make perf_simple_query produce JSON results" from Paweł
"
This series enhances perf_simple_query error reporting by adding an
option to produce a json file containing the results. The format of
that file is very similar to the results produced by perf_fast_forward
in order to ease integration with any tools that may want to interpret
them.

In addition to that perf_simple_query now prints to the standard output
median, median absolute deviation, minimum and maximum of the partial
results, so that there is no need for external scripts to compute those
values.
"

* tag 'perf_simple_query-json/v1' of https://github.com/pdziepak/scylla:
  perf_simple_query: produce json results
  perf_simple_query: calculate and print statistics
  perf: time_parallel: return results of each iteration
  perf_simple_query: take advantage of threads in main()
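
The summary statistics named in the series description (median, median absolute deviation, minimum and maximum) can be sketched as follows. This is a minimal illustration with made-up sample values; whether perf_simple_query reports the plain, unscaled median absolute deviation is an assumption, and the function name `summarize` is hypothetical.

```python
import statistics


def summarize(samples):
    """Return median, median absolute deviation, min and max of samples."""
    med = statistics.median(samples)
    # Median absolute deviation: the median of the distances to the median.
    mad = statistics.median([abs(x - med) for x in samples])
    return {"median": med, "mad": mad, "min": min(samples), "max": max(samples)}


# Hypothetical per-iteration throughput figures.
partial_results = [100.0, 102.0, 98.0, 110.0, 101.0]
summary = summarize(partial_results)
print(summary)
```

The median and the median absolute deviation are less sensitive to a single outlier iteration (110.0 here) than the mean and the standard deviation would be.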
2019-01-30 17:39:19 +02:00
Paweł Dziepak
6a0ee5dbbf Merge "Simpler fix for the memtable reader's fragment monotonicity violation" from Botond
"
Recently it was discovered that the memtable reader
(partition_snapshot_reader to be more precise) can violate mutation
fragment monotonicity, by re-emitting range tombstones when those overlap
with more than one ck range of the partition slice.
This was fixed by 7049cd9, however after that fix was merged a much
simpler fix was proposed by Tomek, one that doesn't involve nearly as
many changes to the partition snapshot reader and hence poses less risk
of breaking it.
This mini-series reverts the previous fix, then applies the new, simpler
one.

Refs: #4104
"

* 'partition-snapshot-reader-simpler-fix/v2' of https://github.com/denesb/scylla:
  partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges
  Revert "partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges"
2019-01-30 15:24:31 +00:00
Jesse Haber-Kucharsky
b39eac653d Switch to the CMake-ified Seastar
Committer: Avi Kivity <avi@scylladb.com>
Branch: next

Switch to the CMake-ified Seastar

This change allows Scylla to be compiled against the `master` branch of
Seastar.

The necessary changes:

- Add `-Wno-error` to prevent a Seastar warning from terminating the
  build

- The new Seastar build system generates the pkg-config files (for
  example, `seastar.pc`) at configure time, so we don't need to invoke
  Ninja to generate them

- The `-march` argument is no longer inherited from Seastar (correctly),
  so it needs to be provided independently

- Define `SEASTAR_TESTING_MAIN` so that the definition of an entry
  point is included for all unit test compilation units

- Independently link Scylla against Seastar's compiled copy of fmt in
  its build directory

- All test files use the (now public) Seastar testing headers

- Add some missing Seastar headers to source files

[avi: regenerate frozen toolchain, adjust seastar submodule]
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <02141f2e1ecff5cbcd56b32768356c3bf62750c4.1548820547.git.jhaberku@scylladb.com>
2019-01-30 11:17:38 +02:00
Botond Dénes
8d59c36165 partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges
When entering a new ck range (of the partition-slice), the partition
snapshot reader will apply to its range tombstones stream all the
tombstones that are relevant to the new ck range. When the partition has
range tombstones that overlap with multiple ck ranges, these will be
applied to the range tombstone stream when entering any of the ck ranges
they overlap with. This will result in the violation of the monotonicity
of the mutation fragments emitted by the reader, as these range
tombstones will be re-emitted on each ck range, if the ck range has at
least one clustering row they apply to.
For example, given the following partition:
    rt{[1,10]}, cr{1}, cr{2}, cr{3}...

And a partition-slice with the following ck ranges:
    [1,2], [3, 4]

The reader will emit the following fragment stream:
    rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ...

Note how the range tombstone is emitted twice. In addition to violating
the monotonicity guarantee, this can also result in an explosion of the
number of emitted range tombstones.

Fix by trimming range tombstones to the start of the current ck range,
thus ensuring that they will not violate mutation fragment monotonicity
guarantees.
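
The effect of trimming can be shown with a toy model of the example above. This is only a sketch: keys are plain integers, ranges are inclusive pairs, and `read_slice` and `is_monotonic` are hypothetical names that do not correspond to the reader's real data structures.

```python
def read_slice(tombstone, rows, ck_ranges, trim):
    """Emit ("rt", start, end) and ("cr", key, key) fragments per ck range."""
    t_start, t_end = tombstone
    fragments = []
    for lo, hi in ck_ranges:
        in_range = [r for r in rows if lo <= r <= hi]
        # Emit the tombstone if it covers at least one row of this range.
        if any(t_start <= r <= t_end for r in in_range):
            # The fix: never start the tombstone before the current range.
            start = max(t_start, lo) if trim else t_start
            fragments.append(("rt", start, t_end))
        fragments.extend(("cr", r, r) for r in in_range)
    return fragments


def is_monotonic(fragments):
    """True if fragment start positions never go backwards."""
    positions = [f[1] for f in fragments]
    return all(a <= b for a, b in zip(positions, positions[1:]))


partition_rt = (1, 10)
partition_rows = [1, 2, 3]
slice_ranges = [(1, 2), (3, 4)]

naive = read_slice(partition_rt, partition_rows, slice_ranges, trim=False)
trimmed = read_slice(partition_rt, partition_rows, slice_ranges, trim=True)
print(naive, is_monotonic(naive))
print(trimmed, is_monotonic(trimmed))
```

In the untrimmed stream the second tombstone starts at 1 after cr{2} has already been emitted; in the trimmed stream it starts at 3, so positions only move forward.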

Refs: #4104

This is a much simpler fix for the above issue, than the already
committed one (7049cd937A). The latter is reverted by the previous
patch and this patch applies the simpler fix.
2019-01-30 10:01:13 +02:00
Nadav Har'El
9dd3c59c77 docs/metrics.md: explain Prometheus and Grafana
docs/metrics.md so far explained just the REST API for retrieving current
metrics from a single Scylla node. In this patch, I add basic explanations
on how to use the Prometheus and Grafana tools included in the
"scylla-grafana-monitoring" project.

It is true that technically, what is being explained here doesn't come
with the Scylla project and requires the separate scylla-grafana-monitoring
to be installed as well. Nevertheless, most Scylla developers will need this
knowledge eventually and surprisingly it appears it was never documented
anywhere accessible to newbie developers, and I think metrics.md is the
right place to introduce it.

In fact, I myself wasn't aware until today that Prometheus actually had
its own Web UI on port 9090, and that it is probably more useful for
developers than Grafana is.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reviewed-by: Botond Denes <bdenes@scylladb.com>
Message-Id: <20190129114214.17786-1-nyh@scylladb.com>
2019-01-29 15:46:06 +02:00
Duarte Nunes
35c03f41a4 Merge 'Fix multiple contains for one column' from Piotr
"
An error in validating CONTAINS restrictions against collections caused
only the first restriction to be taken into account due to returning
prematurely.
This miniseries provides a fix for that as well as a matching test case.

Tests: unit (release)
Fixes #4161
"

* 'fix_multiple_contains_for_one_column' of https://github.com/psarna/scylla:
  tests: enable CONTAINS tests for filtering
  cql3: remove premature return from is_satisfied_by
  cql3: restore indentation
2019-01-29 11:10:13 +00:00
Piotr Sarna
11aae54cca tests: enable CONTAINS tests for filtering
Tests for filtering with CONTAINS restrictions were not enabled,
so they are now. Also, another case for having two CONTAINS restrictions
for a single column is added.

Refs #4161
2019-01-29 11:47:28 +01:00
Piotr Sarna
9595fec2ec cql3: remove premature return from is_satisfied_by
Function which checked whether a CONTAINS restriction is satisfied
by a collection erroneously returned prematurely after checking
just the first restriction - which works fine for the usual case,
but fails if there are multiple CONTAINS restrictions present
for a column.
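
The shape of this bug can be sketched in a few lines. The functions below are hypothetical stand-ins, not the actual cql3 code: restrictions are modelled as a list of values that the collection must contain.

```python
def is_satisfied_by_buggy(collection, required_values):
    """Returns after the first restriction, ignoring the rest."""
    for value in required_values:
        # Bug: the result of the first check is returned unconditionally.
        return value in collection
    return True


def is_satisfied_by_fixed(collection, required_values):
    """Only returns early on a failed restriction."""
    for value in required_values:
        if value not in collection:
            return False
    return True


# Models two CONTAINS restrictions on one column: CONTAINS 1 and CONTAINS 5.
row_collection = [1, 2]
restrictions = [1, 5]
print(is_satisfied_by_buggy(row_collection, restrictions))
print(is_satisfied_by_fixed(row_collection, restrictions))
```

With a single restriction both versions agree, which is why the problem only shows up when a column carries more than one CONTAINS restriction.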

Fixes #4161
2019-01-29 11:47:28 +01:00
Piotr Sarna
89af01315d cql3: restore indentation 2019-01-29 11:47:28 +01:00
Rafael Ávila de Espíndola
625080b414 Rename large_partition_handler
Now that it also handles large rows, rename it to large_data_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:14 -08:00
Rafael Ávila de Espíndola
1185138a34 Print a warning if a row is too large
Tests: unit (release)

Refs #3988.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:10 -08:00
Rafael Ávila de Espíndola
776d5bb9e2 Remove default parameter value
The value is already passed by cql_table_large_partition_handler, so
the default was just for nop_large_partition_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola
30528fa853 Rename _threshold_bytes to _partition_threshold_bytes
A followup patch will add a threshold for rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola
561285488b keys: add schema-aware printing for clustering_key_prefix
For reporting large rows we have to be able to print clustering keys
in addition to partition keys.

Refs #3988.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 13:01:54 -08:00
Paweł Dziepak
335dca54a5 perf_simple_query: produce json results 2019-01-28 16:36:06 +00:00
Paweł Dziepak
7d21c9c31f perf_simple_query: calculate and print statistics 2019-01-28 16:36:06 +00:00
Paweł Dziepak
eb3d80fa2b perf: time_parallel: return results of each iteration 2019-01-28 16:35:33 +00:00
Pekka Enberg
7bda3abbc6 toolchain/dbuild: Fix permission errors when SELinux is enabled
Use the ":z" suffix to tell Docker to relabel file objects on shared
volumes. Fixes accessing filesystem via dbuild when SELinux is enabled.

Message-Id: <20190128160557.2066-1-penberg@scylladb.com>
2019-01-28 18:16:53 +02:00
Paweł Dziepak
6a1e1e8454 perf_simple_query: take advantage of threads in main() 2019-01-28 13:21:08 +00:00
Paweł Dziepak
11a1f97307 Merge "Fix cleanup of temporary sstable directories" from Benny
"
Cleanup of temporary sstable directories in distributed_loader::populate_column_family
is completely broken and untested. This code path was never executed since
populate_column_family doesn't currently list subdirectories at all.

This patchset fixes this code path and scans subdirectories in populate_column_family.
Also, a unit test is added for testing the cleanup of incomplete (unsealed) sstables.

Fixes: #4129
"

* 'projects/sst-temp-dir-cleanup/v3' of https://github.com/bhalevy/scylla:
  tests: add test_distributed_loader_with_incomplete_sstables
  tests: single_node_cql_env::do_with: use the provided data_file_directories path if available
  tests: single_node_cql_env::_data_dir is not used
  distributed_loader: populate_column_family should scan directories too
  sstables: fix is_temp_dir
  distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir
  distributed_loader: remove temporary sstable directories only on shard 0
  distributed_loader: push future returned by rmdir into futures vector
2019-01-28 12:23:00 +00:00
Duarte Nunes
ea34e242de Merge 'Do not use hints for view building' from Piotr
"
This series prevents view building from falling back to storing hints.
Instead, it will try to send hints to an endpoint as if it has
consistency level ONE, and in case of failure retry the whole
building step. Then, view building will never be marked as finished
prematurely (because of pending hints), which will help avoid
creating inconsistencies when decommissioning a node from the cluster.

Tests:
  unit (release)
  dtest (materialized_views_test.py.*)

Fixes #3857
Fixes #4039
"

* 'do_not_mark_view_as_built_with_hints_7' of https://github.com/psarna/scylla:
  db,view: add updating view_building_paused statistics
  database: add view_building_paused metrics
  table: make populate_views not allow hints
  db,view: add allow_hints parameter to mutate_MV
  storage_proxy: add allow_hints parameter to send_to_endpoint
2019-01-28 10:31:14 +00:00
Piotr Sarna
9a6261ca27 db,view: add updating view_building_paused statistics
Each time view building is paused because of a connection failure,
the view_building_paused metric is bumped.
2019-01-28 09:38:42 +01:00
Piotr Sarna
e30b0663d6 database: add view_building_paused metrics
The metric exposes how many times the view building process was paused,
e.g. because the target node was down or overloaded.
2019-01-28 09:38:42 +01:00
Piotr Sarna
5dec6dc6c6 table: make populate_views not allow hints
View building uses populate_views to generate and send view updates.
This procedure will now not allow hints to be used to acknowledge
the write. Instead, the whole building step will be retried on failure.

Fixes #3857
Fixes #4039
2019-01-28 09:38:42 +01:00
Piotr Sarna
e30cf22956 db,view: add allow_hints parameter to mutate_MV
The mutate_MV function now accepts a parameter that controls whether
hints are allowed when sending mutations to endpoints.
2019-01-28 09:38:42 +01:00
Piotr Sarna
e0fe9ce2c0 storage_proxy: add allow_hints parameter to send_to_endpoint
With hints allowed, send_to_endpoint will leverage consistency level ANY
to send data. Otherwise, it will use the default - cl::ONE.
2019-01-28 09:38:41 +01:00
Rafael Ávila de Espíndola
5332ebd50c Update the description of compaction_large_partition_warning_threshold_mb
Despite the name, this option also controls if a warning is issued
during memtable writes.

Warning during memtable writes is useful but the option name also
exists in cassandra, so probably the best we can do is update the
description.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190125020821.72815-1-espindola@scylladb.com>
2019-01-28 09:09:35 +02:00
Takuya ASADA
5c6c008109 dist/ami: follow build script changes on -jmx/-tools/-ami packages
We need to follow the changes to the rpm package build procedure for the
-jmx/-tools/-ami packages, since it was changed when we merged
the relocatable package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190127204436.13959-1-syuu@scylladb.com>
2019-01-28 09:08:32 +02:00
Takuya ASADA
7db1b45839 reloc: move relocatable libraries from /opt/scylladb/lib to /opt/scylladb/libreloc
For Scylla 3rdparty tools, we add /opt/scylladb/lib to LD_LIBRARY_PATH.
We use the same directory for relocatable binaries, including libc.so.6.
Once both the scylla-env package and the relocatable version of the scylla-server package are installed, the loader tries to load libc from /opt/scylladb/lib, and the entire distribution becomes unusable.

We may be able to use the Obsoletes or Conflicts tag in the .rpm/.deb to avoid
installing the new Scylla package alongside scylla-env, but it's better and safer not to share
the same directory for different purposes.

Fixes #3943

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190128023757.25676-1-syuu@scylladb.com>
2019-01-28 09:04:56 +02:00
Avi Kivity
274f553485 tools: toolchain: run dbuild container with same timezone as host
Make it easier to work interactively by not reporting surprising times.

There are also reports that dtest fails with incorrect timezones, but those
are probably bugs in dtest.
Message-Id: <20190127134754.1428-1-avi@scylladb.com>
2019-01-27 22:48:42 +00:00
Duarte Nunes
aafaf840a2 tests/secondary_index_test: Add reproducer for #4144
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-01-27 22:30:34 +00:00
Duarte Nunes
aa476cd6c9 index/secondary_index_manager: Add virtual columns to MV
Virtual columns are MV-specific columns that contribute to the
liveness of view rows. However, we were not adding those columns when
creating an index's underlying MV, causing indexes to miss base rows.

Fixes #4144
Branches: master, branch-3.0

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-01-27 22:30:12 +00:00
Benny Halevy
36b6a3ebcf tests: add test_distributed_loader_with_incomplete_sstables
Test removal of sstables with temporary TOC file,
with and without temporary sstable directory.

Temporary sstable directories may be empty or still have
leftover components in them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:48:24 +02:00
Benny Halevy
64a23ea3bc tests: single_node_cql_env::do_with: use the provided data_file_directories path if available
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
441809094a tests: single_node_cql_env::_data_dir is not used
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
74ef09a3a2 distributed_loader: populate_column_family should scan directories too
To detect and cleanup leftover temporary sstable directories.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
bd85975277 sstables: fix is_temp_dir
1. fs::canonical requires that the path exists,
   and there is no need for fs::canonical here.
2. fs::path::extension returns the extension with its leading dot.
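
Both points have a close analogue in Python's pathlib, which makes for a compact sketch. The `.sstable` suffix and the paths below are assumptions chosen for illustration; they are not confirmed by this commit message. `PurePosixPath` is used because it never touches the filesystem, so the check works for paths that do not exist.

```python
from pathlib import PurePosixPath

# Assumed suffix of temporary sstable directories (illustrative only).
TEMP_DIR_EXT = ".sstable"


def is_temp_dir(path):
    """Pure name-based check; the path does not have to exist."""
    # Like fs::path::extension(), suffix includes the leading dot,
    # so the comparison must be against ".sstable", not "sstable".
    return PurePosixPath(path).suffix == TEMP_DIR_EXT


print(is_temp_dir("/var/lib/scylla/data/ks/cf/42.sstable"))
print(is_temp_dir("/var/lib/scylla/data/ks/cf/staging"))
```

Comparing against the bare word without the dot would make the check always fail, which is the kind of mistake the second point describes.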

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
c2a5f3b842 distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir
populate_column_family currently lists only regular files, ignoring all directories.
A later patch in this series allows it to also list directories so it can clean up
the temporary sstable directories, yet valid sub-directories, like staging|upload|snapshots,
may still exist and need to be ignored.

Other kinds of handling, like validating recognized sub-directories and halting on
unrecognized sub-directories, are possible, yet out of scope for this patch(set).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
9bd7b2f4e6 distributed_loader: remove temporary sstable directories only on shard 0
Similar to calling remove_sstable_with_temp_toc later on in
populate_column_family(), we need only one thread to do the
cleanup work and the existing convention is that it's shard 0.

Since lister::rmdir is checking remove_file of all entries
(recursively) and the dir itself, doing that concurrently would fail.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
bcfb2e509b distributed_loader: push future returned by rmdir into futures vector 2019-01-27 14:14:32 +02:00
Asias He
ee0bb0aa94 tests: Drop the unsupported random_read mode in perf_sstable
It is not supported. Remove it.
Message-Id: <fe31e090574be96a9620b6902ceb843699d558d0.1548403105.git.asias@scylladb.com>
2019-01-25 14:24:40 +00:00
Avi Kivity
85abb13679 Merge "Fix cross shard cf usage" from Piotr
"
Lambda passed to distribute_reader_and_consume_on_shards shouldn't
capture shard local variables.

Fixes #4108

Tests:
unit(release),
dtest(update_cluster_layout_tests.TestLargeScaleCluster.add_50_nodes_test)
"

* 'haaawk/4108/v2' of github.com:scylladb/seastar-dev:
  Fix cross shard cf usage in repair
  Fix cross shard cf usage in streaming
2019-01-24 19:40:44 +02:00
Avi Kivity
d0f9e00e85 Merge " Support 64-bit gc_clock" (fixes) from Benny
"
Use int64_t in data::cell for expiry / deletion time.

Extend time_overflow unit tests in cql_query_test to use
select statements with and without bypass cache to access deeper
into the system.

Refs #3353
"

* 'projects/gc_clock_64_fixes/v1' of https://github.com/bhalevy/scylla:
  tests: extend time_overflow unit tests
  data::cell: use int64_t for expiry and deletion time
2019-01-24 19:15:12 +02:00
Piotr Jastrzebski
fab1b7a3a2 Fix cross shard cf usage in repair
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 18:13:49 +01:00
Piotr Jastrzebski
1ac7283550 Fix cross shard cf usage in streaming
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 18:13:30 +01:00
Glauber Costa
ec66dd6562 scylla_setup: tell users about the possibility of a non-interactive session
From day one, scylla_setup could be run either interactively or through
command line parameters. Still, one of the requests we get the
most from users is whether we can provide them with a version of
scylla_setup that they can call from their scripts.

This probably happens because once you call a script interactively,
it may not be totally obvious that a different mode is available.
Even when we do tell users about that possibility, the request number
two is then "which flags do I pass?"

The solution I am proposing is to just tell users the answers to those
questions at the end of an interactive session. After this patch, we
print the following message to the console:

  ScyllaDB setup finished.
  scylla_setup accepts command line arguments as well! For easily provisioning in a similar environmen than this, type:

    scylla_setup --no-raid-setup --nic eth0 --no-kernel-check \
                 --no-verify-package --no-enable-service --no-ntp-setup \
                 --no-node-exporter --no-fstrim-setup

  Also, to avoid the time-consuming I/O tuning you can add --no-io-setup and copy the contents of /etc/scylla.d/io*
  Only do that if you are moving the files into machines with the exact same hardware

Notes on the implementation: it is unfortunate for these purposes that
all our options are negated. Most conditionals are branching on true
conditions, so although I could write this:

  args.no_option = not interactive_ask_service(...)
  if not args.no_option:
    ...

I opted in this patch to write:

  option = interactive_ask_service(...)
  args.no_option = not option
  if option:
    ...

There is an extra line and we have to update args separately, but it
makes it harder to get confused in the conditional by the double
negation. Let me know if there are disagreements here.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190124153832.21140-1-glauber@scylladb.com>
2019-01-24 17:41:26 +02:00
Benny Halevy
6efd85ed01 tests: extend time_overflow unit tests
Test also cql select queries with and without bypass cache.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-24 15:55:06 +02:00
Benny Halevy
7373825473 data::cell: use int64_t for expiry and deletion time
Ttl may still use int32_t to reduce footprint
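
A short sketch of why a 32-bit expiry is not enough while a 32-bit TTL still is. It assumes that expiry is counted in seconds since the Unix epoch and that the maximum TTL is 20 years, as in Cassandra; neither detail is stated in this commit message, and the names below are hypothetical.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1

# Last instant representable as a signed 32-bit count of seconds.
last_int32 = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last_int32.isoformat())  # 2038-01-19T03:14:07+00:00


def fits_int32(seconds):
    """True if the value is representable as a signed 32-bit integer."""
    return -2**31 <= seconds <= INT32_MAX


# A write in January 2019 with an assumed maximum TTL of 20 years.
write_time = 1548338106
max_ttl = 20 * 365 * 24 * 60 * 60
expiry = write_time + max_ttl

print(fits_int32(max_ttl))  # the TTL itself fits
print(fits_int32(expiry))   # the absolute expiry time does not
```

The TTL is a bounded duration, while the expiry is an absolute point in time, so only the latter runs past the 32-bit limit.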

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-24 15:55:06 +02:00
Takuya ASADA
597059b4b1 dist/debian: skip stripping libprotobuf.so.15
dh_strip isn't able to strip libprotobuf.so.15, and we actually don't
need to strip dependency libraries, so skip it.

Fixes #4135

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190123202213.2117-4-syuu@scylladb.com>
2019-01-24 15:51:56 +02:00
Takuya ASADA
aefc18e70d dist/debian: install /usr/bin/file for dh_strip
dh_strip requires /usr/bin/file, but it is not installed automatically, so
install it in build_deb.sh.

Fixes #4134

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190123202213.2117-3-syuu@scylladb.com>
2019-01-24 15:51:53 +02:00
Benny Halevy
fbebd0bb1d thrift: validate_column_name: fix exception format string
It's printing uint32_t rather than char*.

Refs #4140

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190124104002.32381-1-bhalevy@scylladb.com>
2019-01-24 12:46:23 +02:00
Avi Kivity
b58b82c9a2 Merge "Cut build dependencies around types.hh" from Piotr
"
I've recently had to work around types.hh/types.cc files and had
a very unpleasant experience with the incremental build on every change
to types.hh. It took ~30 min on my machine, which is almost as much
as the clean build.

I looked around and it turns out that types.hh contains the whole
hierarchy of the types. At the same time, many places access the
types only through abstract_type, which is the root of the
hierarchy.

This patchset extracts user_type_impl, tuple_type_impl,
map_type_impl, set_type_impl, list_type_impl and
collection_type_impl from types.hh and places each of them
in a separate header.

As a result, a change in user_type_impl now causes an
incremental build of ~6 min instead of ~30 min.
A change to tuple_type_impl causes an incremental build of ~7.5 min
instead of ~30 min, and a change to map_type_impl triggers an incremental
build that takes ~20 min instead of ~30 min.

Tests: unit(release)
"

* 'haaawk/types_build_speedup_2/rfc/2' of github.com:scylladb/seastar-dev:
  Stop including types/list.hh in cql3/tuples.hh
  Stop including types/set.hh into cql3/sets.hh
  Move collection_type_impl out of types.hh to types/collection.hh
  Move set_type_impl out of types.hh to types/set.hh
  Move list_type_impl out of types.hh to types/list.hh
  Move map_type_impl out of types.hh to types/map.hh
  Move tuple_type_impl from types.hh to types/tuple.hh
  Decouple database.hh from types/user.hh
  Allow to use shared_ptr with incomplete type other than sstable
  Move user_type_impl out of types.hh to types/user.hh
2019-01-24 11:21:22 +02:00
Piotr Jastrzebski
a3912a35f5 Stop including types/list.hh in cql3/tuples.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:57:19 +01:00
Piotr Jastrzebski
fe8dfc8fdc Stop including types/set.hh into cql3/sets.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:57:19 +01:00
Piotr Jastrzebski
5a5201a50b Move collection_type_impl out of types.hh to types/collection.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
ad016a732b Move set_type_impl out of types.hh to types/set.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
b1e1b66732 Move list_type_impl out of types.hh to types/list.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
147cc031db Move map_type_impl out of types.hh to types/map.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
b6b2fdc5be Move tuple_type_impl from types.hh to types/tuple.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
7666e81b51 Decouple database.hh from types/user.hh
This commit declares shared_ptr<user_types_metadata> in
database.hh where user_types_metadata is an incomplete type so
it requires
"Allow to use shared_ptr with incomplete type other than sstable"
to compile correctly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:55:04 +01:00
Piotr Jastrzebski
316be5c6b5 Allow to use shared_ptr with incomplete type other than sstable
When seastar/core/shared_ptr_incomplete.hh is included in a header
then it causes problems with all declarations of shared_ptr<T> with
incomplete type T that end up in the same compilation unit.

The problem happens when we have a compilation unit that includes
two headers a.hh and b.hh such that a.hh includes
seastar/core/shared_ptr_incomplete.hh and b.hh declares
shared_ptr<T> with incomplete type T. At the same time this
compilation unit does not use the declared shared_ptr<T>, so it should
compile and work, but it does not because shared_ptr_incomplete.hh
is included and it forces instantiation of:

template <typename T>
T*
lw_shared_ptr_accessors<T,
void_t<decltype(lw_shared_ptr_deleter<T>{})>>::to_value(lw_shared_ptr_counter_base*
counter) {
    return static_cast<T*>(counter);
}

for each declared shared_ptr<T> with incomplete type T, even the ones
that are never used.

Following commit "Decouple database.hh from types/user.hh"
moves user_types_metadata type out of database.hh and instead
declares shared_ptr<user_types_metadata> in database.hh where
user_types_metadata is incomplete. Without this commit
the compilation of the following one fails with:

In file included from ./sstables/sstables.hh:34,
                 from ./db/size_estimates_virtual_reader.hh:38,
                 from db/system_keyspace.cc:77:
seastar/include/seastar/core/shared_ptr_incomplete.hh: In
instantiation of ‘static T*
seastar::internal::lw_shared_ptr_accessors<T,
seastar::internal::void_t<decltype
(seastar::lw_shared_ptr_deleter<T>{})>
>::to_value(seastar::lw_shared_ptr_counter_base*) [with T =
user_types_metadata]’:
seastar/include/seastar/core/shared_ptr.hh:243:51:   required from
‘static void seastar::internal::lw_shared_ptr_accessors<T,
seastar::internal::void_t<decltype
(seastar::lw_shared_ptr_deleter<T>{})>
>::dispose(seastar::lw_shared_ptr_counter_base*) [with T =
user_types_metadata]’
seastar/include/seastar/core/shared_ptr.hh:300:31:   required from
‘seastar::lw_shared_ptr<T>::~lw_shared_ptr() [with T =
user_types_metadata]’
./database.hh:1004:7:   required from ‘static void
seastar::internal::lw_shared_ptr_accessors_no_esft<T>::dispose(seastar::lw_shared_ptr_counter_base*)
[with T = keyspace_metadata]’
seastar/include/seastar/core/shared_ptr.hh:300:31:   required from
‘seastar::lw_shared_ptr<T>::~lw_shared_ptr() [with T =
keyspace_metadata]’
./db/size_estimates_virtual_reader.hh:233:67:   required from here
seastar/include/seastar/core/shared_ptr_incomplete.hh:38:12: error:
invalid static_cast from type ‘seastar::lw_shared_ptr_counter_base*’
to type ‘user_types_metadata*’
     return static_cast<T*>(counter);
            ^~~~~~~~~~~~~~~~~~~~~~~~
[131/415] CXX build/release/distributed_loader.o

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:45:25 +01:00
Piotr Jastrzebski
e92b4c3dbc Move user_type_impl out of types.hh to types/user.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:04:04 +01:00
Rafael Ávila de Espíndola
f7d1dc16d4 database: Use nop_large_partition_handler to avoid self-reporting
Currently nop_large_partition_handler is only used in tests, but it
can also be used to avoid self-reporting.

Tests: unit(Release)

I also tested starting scylla with
--compaction-large-partition-warning-threshold-mb=0.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190123205059.39573-1-espindola@scylladb.com>
2019-01-23 21:11:21 +00:00
Avi Kivity
4882f29f82 Merge "Detemplatize primary key restrictions" from Piotr
"
This series is a first small step towards rewriting
CQL restrictions layer. Primary key restrictions used to be
a template that accepts either partition_key or clustering_key,
but the implementation is already based on virtual inheritance,
so in multiple cases these templates need specializations.

Refs #3815
"

* 'detemplatize_primary_key_restrictions_2' of https://github.com/psarna/scylla:
  cql3: alias single_column_primary_key_restrictions
  cql3: remove KeyType template from statement_restrictions
  cql3: remove template from primary_key_restrictions
  cql3: remove forwarding_primary_key_restrictions
2019-01-23 17:43:03 +02:00
Piotr Sarna
9982587bea cql3: alias single_column_primary_key_restrictions
In preparation for detemplatizing this class, it's aliased with
single_column_partition_key restrictions and
single_column_clustering_key_restrictions accordingly.
2019-01-23 17:43:03 +02:00
Piotr Sarna
4663094474 cql3: remove KeyType template from statement_restrictions
The code is unfolded into serving partition and clustering key
cases separately instead of overloading a template.
2019-01-23 17:43:03 +02:00
Piotr Sarna
4bd0cb8dd9 cql3: remove template from primary_key_restrictions
Partition key restrictions and clustering key restrictions
currently require virtual function specializations and have
lots of distinct code, so there's no value in having
primary_key_restrictions<KeyType> template.
2019-01-23 17:43:03 +02:00
Piotr Sarna
bdd8566ea3 cql3: remove forwarding_primary_key_restrictions
I presume this header was created during code translation from C*,
but it's not used or included anywhere.
2019-01-23 17:43:03 +02:00
Avi Kivity
c83ae62aed build: fix libdeflate object file corruption during parallel build
libdeflate's build places some object files in the source directory, which is
shared between the debug and release build. If the same object file (for the two
modes) is written concurrently, or if one mode reads it while the other writes it,
it will be corrupted.

Fix by not building the executables at all. They aren't needed, and we already
placed the libraries' objects in the build directory (which is unshared). We only
need the libraries anyway.

Fixes #4130.
Branches: master, branch-3.0
Message-Id: <20190123145435.19049-1-avi@scylladb.com>
2019-01-23 15:32:17 +00:00
Nadav Har'El
76f1fcc346 cql3: really ensure retrieval of columns for filtering
Commit fd422c954e aimed to fix
issue #3803. In that issue, if a query SELECTed only certain columns but
did filtering (ALLOW FILTERING) over other unselected columns, the filtering
didn't work. The fix involved adding the columns being filtered to the set
of columns we read from disk, so they can be filtered.

But that commit included an optimization: If you have clustering keys
c1 and c2, and the query asks for a specific partition key and c1 < 3 and
c2 > 3, the "c1 < 3" part does NOT need to be filtered because it is already
done as a slice (a contiguous read from disk). The committed code erroneously
concluded that both c1 and c2 don't need to be filtered, which was wrong
(c2 *does* need to be read and filtered).

In this patch, we fix this optimization. Previously, we used the "prefix
length", which in the above example was 2 (both c1 and c2 were filtered)
but we need a new and more elaborate function,
num_prefix_columns_that_need_not_be_filtered(), to determine we can only
skip filtering of 1 (c1) and cannot skip the second.
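
The prefix rule described above can be sketched roughly as follows. This is a
simplified model, not the code from the patch: the `restriction_kind` and
`column_restriction` types and the function name are hypothetical, and the
sketch assumes restrictions arrive sorted by clustering column position.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a single-column clustering restriction.
enum class restriction_kind { equality, slice };

struct column_restriction {
    std::size_t column_position; // 0 for c1, 1 for c2, ...
    restriction_kind kind;
};

// Counts how many leading clustering columns are fully handled by the
// contiguous read and therefore need no filtering after the read.
// `restrictions` must be sorted by column_position.
std::size_t prefix_columns_not_needing_filter(
        const std::vector<column_restriction>& restrictions) {
    std::size_t expected = 0;
    std::size_t count = 0;
    for (const auto& r : restrictions) {
        if (r.column_position != expected) {
            break; // a column was skipped, so the prefix ends here
        }
        ++count;
        if (r.kind == restriction_kind::slice) {
            break; // the first slice is covered, anything after it is not
        }
        ++expected;
    }
    return count;
}
```

For the example in this message (slices on both c1 and c2) the sketch returns
1, so c2 still has to be read and filtered.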

Fixes #4121. This patch also adds a unit test to confirm this.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190123131212.6269-1-nyh@scylladb.com>
2019-01-23 15:24:30 +02:00
Avi Kivity
835ad406de tools: toolchain: update docker build command to include --no-cache
If docker sees the Dockerfile hasn't changed it may reuse an old image, not
caring that context files and dependent images have in fact changed. This can
happen for us if install-dependencies.sh or the base Fedora image changed.

To make sure we always get a correct image, add --no-cache to the build command.
Message-Id: <20190122185042.23131-1-avi@scylladb.com>
2019-01-23 10:47:40 +01:00
Glauber Costa
5d754c1d11 install-dependencies.sh: add packages that will be needed by scylla-python3
Done in a separate step so we can update the toolchain first.

dnf-utils is used to bring us repoquery, which we will use to derive the
list of files in the python packages.
patchelf is needed so we can add a DT_RUNPATH section to the interpreter
binary.
the python modules, as well as the python3 interpreter are taken from
the current RPM spec file.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
[avi: regenerate frozen toolchain image]
Message-Id: <20190123011751.14440-1-glauber@scylladb.com>
2019-01-23 10:53:10 +02:00
Avi Kivity
c1dd04986b Merge "Prepare for the switch to CMake-ified Seastar" from Jesse
"
This series prepares for the integration of the `master` branch of
Seastar back into Scylla.

A number of changes to the existing build are necessary to integrate
Seastar correctly, and these are detailed in the individual change
messages.

I tested with and without DPDK, in release and debug mode.

The actual switch is a separate patch.
"

* 'jhk/seastar_cmake/v4' of https://github.com/hakuch/scylla:
  build: Fix link order for DPDK
  tests: Split out `sstable_datafile_test`
  build: Remove unnecessary inclusion
  tests: Fix use-after-free errors in static vars
  build: Remove Seastar internals
  build: Only use Seastar flags from pkg-config
  build: Query Seastar flags using pkg-config
  build: Change parameters for `pkg_config` function
2019-01-23 10:33:00 +02:00
Duarte Nunes
88c7c1e851 Merge 'hinted handoff: cache cf mappings' from Vlad
"
Cache cf mappings when breaking in the middle of a segment sending so
that the sender has them the next time it wants to send this segment
from where it left off before.

Also add the "discard" metric so that we can track hints that are being
discarded in the send flow.
"

Fixes #4122

* 'hinted_handoff_cache_cf_mappings-v1' of https://github.com/vladzcloudius/scylla:
  hinted handoff: cache column family mappings for segments that were not sent out in full
  hinted handoff: add a "discarded" metric
2019-01-23 00:44:41 +00:00
Jesse Haber-Kucharsky
3d79bd25b2 build: Fix link order for DPDK
Without this change, DPDK libraries will not be linked to Scylla
correctly when we switch to the new pkg-config support in Seastar.
2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky
cfb1492a6e tests: Split out sstable_datafile_test
Each `*_test.cc` file must be compiled separately so that there is only
one definition of `main`.

This change correctly defines an independent `sstable_datafile_test`
from `sstable_datafile_test.cc` and adds that test to the existing
suite.
2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky
02dd7bcc82 build: Remove unnecessary inclusion 2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky
2a62550002 tests: Fix use-after-free errors in static vars
Without these two variables being declared as TLS, executing these two
tests in "debug" mode fails AddressSanitizer's checks.
2019-01-22 18:24:52 -05:00
Jesse Haber-Kucharsky
88cc43d5e0 build: Remove Seastar internals
We don't need to re-specify Seastar internals in Scylla's build, since
everything private to Seastar is managed via pkg-config.

We can eliminate all references to ragel and generated ragel header
files from Seastar.

We can also simplify the dependence on generated Seastar header files by
ensuring that all object files depend on Seastar being built first.
2019-01-22 18:24:38 -05:00
Jesse Haber-Kucharsky
4f44e143be build: Only use Seastar flags from pkg-config
Some Seastar-specific flags were manually specified as Ninja rules, but
we want to rely exclusively on Seastar for its necessary flags.

The pkg-config file generated by the latest version of Seastar is
correct and allows us to do this, but the version generated by Scylla's
current check-out of Seastar does not. Therefore, we have to manually
adjust the pkg-config results temporarily until we update Seastar.
2019-01-22 18:24:38 -05:00
Jesse Haber-Kucharsky
8743cff59b build: Query Seastar flags using pkg-config
Previously, we manually parsed the pkg-config file. We now use
pkg-config itself to get the correct build flags.

This means that we will get the correct behavior for variable expansion,
and fields like `Requires`, `Requires.private`, and `Libs.private`.
Previously, these fields were ignored.
2019-01-22 18:24:38 -05:00
Vlad Zolotarov
34829b8f81 hinted handoff: cache column family mappings for segments that were not sent out in full
We will try to send a particular segment later (in 1s) from the place
where we left off if it wasn't sent out in full before. However, we may miss
some of the column family mappings when we get back to sending this file and
start sending from some entry in the middle of it (where we left off)
if we didn't save the column family mappings we cached while reading this segment
from its beginning.

This happens because commitlog doesn't save column family information
in every entry but rather once for each unique column family (version) per
"cycle" (see commitlog::segment description for more info).

Therefore we have to assume that a particular column family mapping
appears once in the whole segment (worst case). And therefore, when we
decide to resume sending a segment we need to keep the column family
mappings we accumulated so far and drop them only after we are done with
this particular segment (sent it out in full).

Fixes #4122

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-01-22 15:24:22 -05:00
Vlad Zolotarov
4516a8cfc4 hinted handoff: add a "discarded" metric
Account for the number of hints that were discarded in the send path.
This may happen, for instance, due to a schema change or because a hint
is too old.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-01-22 14:11:09 -05:00
Avi Kivity
fa0312d0f2 Merge "Support 64-bit gc_clock" from Benny
"
A 32-bit gc_clock will wrap around on 2038-01-19 03:14:07 UTC.  Such dates are valid deletion
times starting 2018-01-19 with the 20 years long maximum ttl.

This patchset extends gc_clock::duration::rep to int64_t and adds
respective unit tests for the max_ttl cases.
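
The arithmetic behind the overflow can be checked with a small sketch. It is
an illustration only: the constant and function names are hypothetical, and
it assumes the maximum TTL is counted as 20 years of 365 days, so the
threshold it computes lands a few days after the calendar date quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical constants for illustration; not Scylla's actual definitions.
// 2147483647 seconds after the Unix epoch is 2038-01-19 03:14:07 UTC.
constexpr int64_t max_int32_seconds = std::numeric_limits<int32_t>::max();
// Assumption: 20 years of 365 days each, i.e. 630720000 seconds.
constexpr int64_t max_ttl_seconds = 20LL * 365 * 24 * 60 * 60;

// True if data written at `now` (seconds since the Unix epoch) with the
// given TTL expires at a time that a 32-bit signed counter cannot hold.
constexpr bool expiry_overflows_int32(int64_t now, int64_t ttl) {
    return now + ttl > max_int32_seconds;
}

// Narrows a 64-bit expiry time to 32 bits. The caller must check the range
// first; the assert documents that precondition.
int32_t to_int32_expiry(int64_t expiry) {
    assert(expiry >= 0 && expiry <= max_int32_seconds);
    return static_cast<int32_t>(expiry);
}
```

With a 64-bit representation the sum `now + ttl` is simply stored as is, and
narrowing is needed only where an on-disk format still uses 32 bits.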

Fixes #3353

Tests: unit (release)
"

* 'projects/gc_clock_64/v2' of https://github.com/bhalevy/scylla:
  tests: cql_query_test add test_time_overflow
  gc_clock: make 64 bit
  sstables: mc: use int64_t for local_deletion_time and ttl
  sstables: add capped_tombstone_deletion_time stats counter
  sstables: mc: cap partition tombstone local_deletion_time to max
  sstables: add capped_local_deletion_time stats counter
  sstables: mc: metadata collector: cap local_deletion_time at max
  sstables: mc: use proper gc_clock types for local_deletion_time and ttl
  db: get default_time_to_live as int32_t rather than gc_clock::rep
  sstables: safely convert ttl and local_deletion_time to int32_t
  sstables: mc: move liveness_info initialization to members
  sstables: mc: move parsing of liveness_info deltas to data_consume_rows_context_m
  sstables: mc: define expired_liveness_ttl as signed int32_t
  sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time
  sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time
2019-01-22 18:21:55 +02:00
Glauber Costa
54bc0ce70d scylla_setup: make sure it works (again) in interactive mode
Commit 019a2e3a27 marked some arguments as required, which improved
the usability of scylla_setup.

The problem is that when we call scylla_setup in interactive mode,
no argument should be required. After the aforementioned commit
scylla_setup will either complain that the required arguments were
not passed if zero arguments are present, or skip interactive mode
if one of the mandatory ones is present.

This patch fixes that by checking whether or not we were invoked with
no command line arguments and lifting the requirements for mandatory
arguments in that case.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190122003621.11156-1-glauber@scylladb.com>
2019-01-22 16:54:55 +02:00
Benny Halevy
7d0854a1e5 tests: cql_query_test add test_time_overflow
Test 32-bit time overflow scenarios.
Fails without "gc_clock: make 64 bit".

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
93270dd8e0 gc_clock: make 64 bit
Fixes: #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
1ccd72f115 sstables: mc: use int64_t for local_deletion_time and ttl
In preparation for changing gc_clock::duration::rep to int64_t.

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
427d6e6090 sstables: add capped_tombstone_deletion_time stats counter
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
0ec46924bf sstables: mc: cap partition tombstone local_deletion_time to max
The deletion_time struct has an int32_t local_deletion_time that cannot hold long
time values. Cap local_deletion_time to max_local_deletion_time and
log a warning about that.
This corresponds to Cassandra's MAX_DELETION_TIME.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
156f9ffa11 sstables: add capped_local_deletion_time stats counter
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
7609a04565 sstables: mc: metadata collector: cap local_deletion_time at max
max local_deletion_time_tracker in stats is int32_t so just track the limit
of (max int32_t - 1) if time_point is greater than the limit.
This corresponds to Cassandra's MAX_DELETION_TIME.
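
A minimal sketch of such a cap, assuming the time is available as a
non-negative 64-bit count of seconds. The struct and function names are
hypothetical, and the `capped` flag merely stands in for whatever warning or
stats counter the real code updates.

```cpp
#include <cstdint>
#include <limits>

// The limit described above: one less than the largest int32_t value.
constexpr int64_t max_local_deletion_time =
        std::numeric_limits<int32_t>::max() - 1;

// Hypothetical result type: the narrowed value plus a flag telling the
// caller that capping happened.
struct capped_deletion_time {
    int32_t value;
    bool capped;
};

// Narrows a 64-bit deletion time (seconds, assumed non-negative) to
// int32_t, clamping anything above the limit.
constexpr capped_deletion_time cap_local_deletion_time(int64_t seconds) {
    if (seconds > max_local_deletion_time) {
        return {static_cast<int32_t>(max_local_deletion_time), true};
    }
    return {static_cast<int32_t>(seconds), false};
}
```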

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
bd6861989d sstables: mc: use proper gc_clock types for local_deletion_time and ttl
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
9878b36895 db: get default_time_to_live as int32_t rather than gc_clock::rep
Otherwise, value_cast<> throws std::bad_cast exception
when gc_clock::rep is defined as int64_t.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
33314cec3f sstables: safely convert ttl and local_deletion_time to int32_t
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
9a00c5a763 sstables: mc: move liveness_info initialization to members
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
0aba922b6d sstables: mc: move parsing of liveness_info deltas to data_consume_rows_context_m
To be consistent with other calls to parse_* methods there.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
6465a673f5 sstables: mc: define expired_liveness_ttl as signed int32_t
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
c4c2133e3e sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time
mc format only writes delta local_deletion_time of tombstones.
Conventional deletion_time is written only for the partition header.

Restructure the code to pass a tombstone to write_delta_deletion_time
rather than struct deletion_time to prepare for using 64-bit deletion times.

The tombstone uses gc_clock::time_point while struct
deletion_time is limited to int32_t local_deletion_time.

Note that for "live" tombstones we encode <api::missing_timestamp,
no_deletion_time> as was previously evaluated by to_deletion_time().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
820906b794 sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Tomasz Grabiec
dbc1894bd5 lsa: Avoid unnecessary compact_and_evict_locked()
When the reclaim request was satisfied from the pool there's no need
to call compact_and_evict_locked(). This allows us to avoid calling
boost::range::make_heap(), which is a tiny performance difference, as
well as some confusing log messages.

Message-Id: <1548091941-8534-1-git-send-email-tgrabiec@scylladb.com>
2019-01-21 20:19:20 +02:00
Jesse Haber-Kucharsky
72da3283b9 build: Change parameters for pkg_config function
We can invoke pkg-config with multiple options, and we specify the
package name first since this is the "target" of the pkg-config query.

Supporting multiple options is necessary for querying Seastar's
pkg-config file with `--static`, which we anticipate in a future change.
2019-01-21 11:38:25 -05:00
Glauber Costa
ca997b5f60 scylla_setup: warn users on the severity of answering no to IOTune
The system won't work properly if IOTune is not run. While it is fair
to skip this step because it takes a long time (indeed, it is common to provision
io.conf manually to be able to skip it), first-time users don't know
this and can get the impression that this is just a totally optional step.

Except the node won't boot up without it.

As a user nicely put recently in our mailing list:

"...in this case, it would be even simpler to forbid answering "no"
 to this not-so-optional step :)"

We should not forbid saying no to IOTune, but we should warn the user
about the consequences of doing so.

Fixes #4120

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190121144506.17121-1-glauber@scylladb.com>
2019-01-21 16:55:50 +02:00
Botond Dénes
4e89dea9ea database: don't allow access to global semaphores
Recently we had a bug (#4096) due to a component
(`multishard_mutation_query()`) assuming that all reads used the
semaphore obtainable via `database::user_read_concurrency_sem()`.
This problem revealed that it is plain wrong to allow access to the
shard-global semaphores residing in the database object. Instead all
code wishing to access the relevant semaphore for some read, should do
so via the relevant `table` object, thus guaranteeing that it will get
the correct semaphore, configured for that table.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>
2019-01-21 16:29:02 +02:00
Piotr Sarna
5d76a635ca distributed_loader: migrate flush_upload_dir to thread
Flushing upload dir code suffers from overcomplication,
so in order to make it a little bit simpler, it's moved
to threaded context.

Refs #4118

Message-Id: <232cca077bae7116cfa87de9c9b4ba60efc2a01d.1548077720.git.sarna@scylladb.com>
2019-01-21 15:48:17 +02:00
Gleb Natapov
85cb09294e storage_service: do not start thrift and cql servers if a node is isolated due to errors
Scylla starts doing IO much earlier than it starts cql/thrift servers.
The IO may cause an error that will try to stop all servers, but since they
are still not running it will do nothing, and the servers will be started
later. Fix it by checking that the node is not isolated before starting
servers.

Message-Id: <20190110152830.GE3172@scylladb.com>
2019-01-21 13:04:23 +00:00
Tomasz Grabiec
e02baabd62 tests: perf_fast_forward: Introduce --with-compression option
Message-Id: <1547819062-4369-1-git-send-email-tgrabiec@scylladb.com>
2019-01-21 12:18:31 +00:00
Botond Dénes
ff2884f25b Revert "partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges"
A much simpler and more complete fix was found. Let's revert this before
applying the simpler fix.

This reverts commit 7049cd9374.
2019-01-21 13:56:56 +02:00
Botond Dénes
f229dff210 auth/service: unregister migration listener on stop()
Otherwise any event that triggers notification to this listener would
trigger a heap-use-after-free.

Refs: #4107

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b6bbd609371a2312aed7571b05119d59c7d103d7.1548067626.git.bdenes@scylladb.com>
2019-01-21 13:06:59 +02:00
Tomasz Grabiec
d7c701d2d1 Merge "Type-erase gratuitous templates with functions" from Avi
Many areas of the code are splattered with unneeded templates. This patchset replaces
some of them, where the template parameter is a function object, with an std::function
or noncopyable_function (with a preference towards the latter; but it is not always
possible). As the template is compiled for each instantiation (if the function
object is a lambda) while a function is compiled only once, there are significant
savings in compile time and bloat.
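
As a rough illustration of the trade-off, compare a template taking a function
object with an ordinary function taking a type-erased callable. The function
names here are hypothetical, and std::function is used throughout because
noncopyable_function is not part of the standard library.

```cpp
#include <functional>
#include <vector>

// Template version: the body is compiled again for each distinct Func type,
// and every lambda expression has its own type.
template <typename Func>
int sum_mapped_template(const std::vector<int>& values, Func func) {
    int total = 0;
    for (int v : values) {
        total += func(v);
    }
    return total;
}

// Type-erased version: an ordinary function that is compiled once. Callers
// pay for an indirect call through std::function instead.
int sum_mapped(const std::vector<int>& values,
               const std::function<int(int)>& func) {
    int total = 0;
    for (int v : values) {
        total += func(v);
    }
    return total;
}
```

Both versions accept the same lambdas at the call site; only the second keeps
a single copy of the loop in the binary no matter how many callers there are.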

    text    data     bss      dec     hex filename
85160690   42120  284910 85487720 5187068 scylla.before
84824762   42120  284910 85151792 5135030 scylla.after

* https://github.com/avikivity/scylla detemplate/v2:
  api/commitlog: de-template acquire_cl_metric()
  database: de-template do_parse_schema_tables
  database: merge for_all_partitions and for_all_partitions_slow
  hints: de-template scan_for_hints_dirs()
  schema_tables: partially de-template make_map_mutation()
  distributed_loader: de-template
  tests: commitlog_test: de-template
  tests: cql_auth_query_test: de-template
  test: de-template eventually() and eventually_true()
  tests: flush_queue_test: de-template
  hint_test: de-template
  tests: mutation_fragment_test: de-template
  test: mutation_test: de-template
2019-01-21 11:32:22 +01:00
Avi Kivity
826cf90f3f Merge "Restore mutating uploaded sstables to level 0" from Piotr
"
This miniseries fixes the behaviour of distributed loader,
which now unconditionally mutates new sstables found in /upload
dir to LCS level 0 first, and only after that proceeds with
either queueing them for update generation or moving them
to data directory.
"

* 'restore_always_mutating_sstables_level_0' of https://github.com/psarna/scylla:
  distributed_loader: restore indentation
  distributed_loader: restore always mutating to level 0
2019-01-20 20:32:15 +02:00
Benny Halevy
844a2de263 sstables: mc: prevent signed integer overflow
Fix runtime error: signed integer overflow
introduced by 2dc3776407

Delta-encoded values may wrap around if the encoded value is
less than the base value.  This could happen in two places:
In the mc-format serialization header itself, where the base values are implicit
Cassandra epoch time, and in the sstables data files, where the base values
are taken from the encoding_stats (later written to the serialization_header).

In these cases, when the calculation is done using signed integer/long we may see
"runtime error: signed integer overflow" messages in debug mode
(with -fsanitize=undefined / -fsanitize=signed-integer-overflow).

Overflow here is expected and harmless since we guarantee neither that
the base values in the serialization header are greater than
or equal to Cassandra's epoch nor that the delta-encoded values are
always greater than or equal to the respective base values in
the serialization header.

To prevent these warnings, the subtraction/addition should be done with unsigned
(two's complement) arithmetic and the result converted to the signed type.

Note that to keep the code simple where possible, we also rely on implicit
conversion of signed integers to unsigned when one of the added values is unsigned
and the other is signed.
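
A minimal sketch of the technique, with hypothetical helper names that do not
appear in the patch: both operands are converted to an unsigned type before
the subtraction or addition, so wrap-around is well defined, and only the
final result is converted back to the signed type.

```cpp
#include <cstdint>

// Encodes `value` relative to `base`. Unsigned subtraction wraps modulo
// 2^64, so a value below the base is not undefined behaviour.
constexpr uint64_t encode_delta(int64_t value, int64_t base) {
    return static_cast<uint64_t>(value) - static_cast<uint64_t>(base);
}

// Decodes a delta back to the original value. The addition is done in
// unsigned arithmetic and the result is converted to signed afterwards.
// That conversion is modular since C++20 and implementation-defined (two's
// complement on mainstream compilers) before it.
constexpr int64_t decode_delta(uint64_t delta, int64_t base) {
    return static_cast<int64_t>(static_cast<uint64_t>(base) + delta);
}
```

Encoding a value of 5 against a base of 10 yields a very large unsigned delta,
and decoding it against the same base gives 5 back without any signed
overflow along the way.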

Fixes: #4098

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190120142950.15776-1-bhalevy@scylladb.com>
2019-01-20 16:59:46 +02:00
Avi Kivity
1e5c09dbce test: mutation_test: de-template
Replace the with_column_family helper template with an ordinary function, to
reduce code bloat.
2019-01-20 15:55:20 +02:00
Avi Kivity
28db56df13 tests: mutation_fragment_test: de-template
The for_each_target() template is called four times, so making it a normal function
reduces a lot of code generation.
2019-01-20 15:55:20 +02:00
Avi Kivity
401684503d hint_test: de-template
While cl_test is duplicated with commitlog_test, at least deduplicate it internally
by converting it to an ordinary function.
2019-01-20 15:55:20 +02:00
Avi Kivity
208b0f80a4 tests: flush_queue_test: de-template
The internal test_propagation template is instantiated many times. Replace
with an ordinary function to reduce bloat. Call sites adjusted to have a
uniform signature.
2019-01-20 15:55:20 +02:00
Avi Kivity
2f36d30572 test: de-template eventually() and eventually_true()
These templates are not trivial and called many times. De-template them to
reduce code bloat.
2019-01-20 15:55:20 +02:00
Avi Kivity
96a8eacc3c tests: cql_auth_query_test: de-template
Replace the with_user() and verify_unauthorized_then_ok() templates with functions.
2019-01-20 15:55:20 +02:00
Avi Kivity
e0b0e18234 tests: commitlog_test: de-template
The cl_test function is called many times, so its contents are bloat. De-template
it so it is compiled only once.
2019-01-20 15:55:20 +02:00
Avi Kivity
baf9480c8d distributed_loader: de-template
distributed_loader has several large templates that can be converted to normal
function with the help of noncopyable_function<>, reducing code bloat.

One of the lambdas used as an actual argument was adjusted, because the de-templated
callee only accepts functions returning a future, while the original accepted both
functions returning a future and functions returning void (similar to future::then).
2019-01-20 15:55:20 +02:00
Avi Kivity
e0914a080e schema_tables: partially de-template make_map_mutation()
make_map_mutation() is called several times, hopefully with the same Map type
parameter. Replace the Func parameter with a noncopyable_function<>.
2019-01-20 15:55:20 +02:00
Avi Kivity
630f841e5b hints: de-template scan_for_hints_dirs()
This function is called twice, and is not doing anything performance critical,
so replace the template parameter Func with std::function<>.
2019-01-20 15:55:20 +02:00
Avi Kivity
fae4c6c0b6 database: merge for_all_partitions and for_all_partitions_slow
for_all_partitions is only used in the implementation of for_all_partitions_slow,
so merge them and get rid of a template.
2019-01-20 15:55:20 +02:00
Avi Kivity
9858395c3e database: de-template do_parse_schema_tables
This long slow-path function is called four times, so de-templating it is an
easy win. We use std::function instead of noncopyable_function because the
function is copied within the parallel_for_each callback. The original code
uses a move, which is incorrect, but did not fail because moving the lambdas
that were used as the actual arguments is equivalent to a copy.
2019-01-20 15:55:18 +02:00
Tomasz Grabiec
c422bfc2c5 tests: perf_fast_forward: Store results for each dataset in separate sub-directory
Otherwise read test results for subsequent datasets will overwrite each other.

Also, rename population test case to not include dataset name, which
is now redundant.

Message-Id: <1547822942-9690-1-git-send-email-tgrabiec@scylladb.com>
2019-01-20 15:38:46 +02:00
Botond Dénes
7049cd9374 partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges
When entering a new ck range (of the partition-slice), the partition
snapshot reader will apply to its range tombstones stream all the
tombstones that are relevant to the new ck range. When the partition has
range tombstones that overlap with multiple ck ranges, these will be
applied to the range tombstone stream when entering any of the ck ranges
they overlap with. This will result in the violation of the monotonicity
of the mutation fragments emitted by the reader, as these range
tombstones will be re-emitted on each ck range, if the ck range has at
least one clustering row they apply to.
For example, given the following partition:
    rt{[1,10]}, cr{1}, cr{2}, cr{3}...

And a partition-slice with the following ck ranges:
    [1,2], [3, 4]

The reader will emit the following fragment stream:
    rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ...

Note how the range tombstone is emitted twice. In addition to violating
the monotonicity guarantee, this can also result in an explosion of the
number of emitted range tombstones.

Fix by applying only those range tombstones to the range tombstone
stream, that have a position strictly greater than that of the last
emitted clustering row (or range tombstone), when entering a new ck
range.

Fixes: #4104

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <e047af76df75972acb3c32c7ef9bb5d65d804c82.1547916701.git.bdenes@scylladb.com>
2019-01-20 15:38:04 +02:00
Paweł Dziepak
14757d8a83 types: collection_type: drop tombstone if covered by higher-level one
At the moment there are inefficiencies in how
collection_type_impl::mutation::compact_and_expire() handles tombstones.
If there is a higher-level tombstone that covers the collection one
(including cases where there is no collection tombstone) it will be
applied to the collection tombstone and present in the compaction
output. This also means that the collection tombstone is never dropped
if fully covered by a higher-level one.

This patch fixes both those problems. After the compaction the
collection tombstone is either unchanged or removed if covered by a
higher-level one.
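
The intended outcome can be sketched as below. This is a simplified model
rather than the real compact_and_expire() logic: the `tombstone` struct, the
`covers` helper and the function name are hypothetical, and the sketch assumes
tombstones are ordered by timestamp first and deletion time second.

```cpp
#include <cstdint>
#include <optional>
#include <tuple>

// Hypothetical, simplified tombstone.
struct tombstone {
    int64_t timestamp;
    int64_t deletion_time;
};

// Assumption: `higher` covers `lower` when it is at least as recent.
bool covers(const tombstone& higher, const tombstone& lower) {
    return std::tie(higher.timestamp, higher.deletion_time)
        >= std::tie(lower.timestamp, lower.deletion_time);
}

// Returns the collection tombstone to keep after compaction: unchanged,
// or nothing if the higher-level tombstone covers it. The higher-level
// tombstone is never copied into the result.
std::optional<tombstone> compact_collection_tombstone(
        std::optional<tombstone> collection,
        std::optional<tombstone> higher) {
    if (collection && higher && covers(*higher, *collection)) {
        return std::nullopt;
    }
    return collection;
}
```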

Fixes #4092.

Message-Id: <20190118174244.15880-1-pdziepak@scylladb.com>
2019-01-20 15:32:34 +02:00
Avi Kivity
e51ef95868 Update seastar submodule
* seastar af6b797...7d620e1 (1):
  > perftune.py: don't let any exception out when connecting to AWS meta server

Fixes #4102.
2019-01-20 13:59:09 +02:00
Avi Kivity
32e79fc23b api/commitlog: de-template acquire_cl_metric()
Use std::function instead of a template parameter. Likely doesn't gain
anything, because the template was always instantiated with the same type
(the result of std::bind() with the same signatures), but still good practice.

std::function was used instead of noncopyable_function because
sharded::map_reduce0() copies the input function.
2019-01-20 11:58:39 +02:00
Avi Kivity
6e6372e8d2 Revert "Merge "Type-erase gratuitous templates with functions" from Avi"
This reverts commit 31c6a794e9, reversing
changes made to 4537ec7426. It causes bad_function_calls
in some situations:

INFO  2019-01-20 01:41:12,164 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708
INFO  2019-01-20 01:41:13,952 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
INFO  2019-01-20 01:41:13,958 [shard 0] legacy_schema_migrator - Dropping legacy schema tables
INFO  2019-01-20 01:41:14,702 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
ERROR 2019-01-20 01:41:14,999 [shard 0] seastar - Exiting on unhandled exception: std::bad_function_call (bad_function_call)
2019-01-20 11:32:14 +02:00
Paweł Dziepak
e212d37a8a utils/small_vector: fix leak in copy assignment slow path
Fixes #4105.

Message-Id: <20190118153936.5039-1-pdziepak@scylladb.com>
2019-01-18 17:49:46 +02:00
Paweł Dziepak
23cfb29fea Merge "compaction: mc: re-calculate encoding_stats" from Benny
"
Use input sstables stats metadata to re-calculate encoding_stats.

Fixes #3971.
"

* 'projects/compaction-encoding-stats/v3' of https://github.com/bhalevy/scylla:
  compaction: mc: re-calculate encoding_stats based on column stats
  memtable: extract encoding_stats_collector base class to encoding_stats header file
2019-01-18 14:36:17 +00:00
Tomasz Grabiec
7308effb45 tests: flat_mutation_reader_test: Drop unneeded includes
Message-Id: <1547819118-4645-1-git-send-email-tgrabiec@scylladb.com>
2019-01-18 13:58:05 +00:00
Tomasz Grabiec
6461e085fe managed_bytes: Fix compilation on gcc 8.2
The compilation fails on -Warray-bounds, even though the branch is never taken:

    inlined from ‘managed_bytes::managed_bytes(bytes_view)’ at ./utils/managed_bytes.hh:195:22,
    inlined from ‘managed_bytes::managed_bytes(const bytes&)’ at ./utils/managed_bytes.hh:162:77,
    inlined from ‘dht::token dht::bytes_to_token(bytes)’ at dht/random_partitioner.cc:68:57,
    inlined from ‘dht::token dht::random_partitioner::get_token(bytes)’ at dht/random_partitioner.cc:85:39:
/usr/include/c++/8/bits/stl_algobase.h:368:23: error: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ offset 16 from the object at ‘<anonymous>’ is out of the bounds of referenced subobject ‘managed_bytes::small_blob::data’ with type ‘signed char [15]’ at offset 0 [-Werror=array-bounds]
      __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Work around by disabling the diagnostic locally.
Message-Id: <1547205350-30225-1-git-send-email-tgrabiec@scylladb.com>
2019-01-18 13:48:05 +00:00
Tomasz Grabiec
31c6a794e9 Merge "Type-erase gratuitous templates with functions" from Avi
Many areas of the code are splattered with unneeded templates. This patchset replaces
some of them, where the template parameter is a function object, with an std::function
or noncopyable_function (with a preference towards the latter; but it is not always
possible). As the template is compiled for each instantiation (if the function
object is a lambda) while a function is compiled only once, there are significant
savings in compile time and bloat.

    text    data     bss      dec     hex filename
85160690   42120  284910 85487720 5187068 scylla.before
84824762   42120  284910 85151792 5135030 scylla.after

* https://github.com/avikivity/scylla detemplate/v1:
  api/commitlog: de-template acquire_cl_metric()
  database: de-template do_parse_schema_tables
  database: merge for_all_partitions and for_all_partitions_slow
  hints: de-template scan_for_hints_dirs()
  schema_tables: partially de-template make_map_mutation()
  distributed_loader: de-template
  tests: commitlog_test: de-template
  tests: cql_auth_query_test: de-template
  test: de-template eventually() and eventually_true()
  tests: flush_queue_test: de-template
  hint_test: de-template
  tests: mutation_fragment_test: de-template
  test: mutation_test: de-template
2019-01-18 11:42:01 +01:00
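The trade-off behind the de-templating series in 31c6a794e9 can be illustrated with a small sketch. The function names here are hypothetical, and `std::function` stands in for both options named in the merge message (seastar's `noncopyable_function` is not used, so that the example depends only on the standard library).

```cpp
#include <functional>
#include <string>
#include <vector>

// Template form: the body is compiled again for every distinct callable type,
// and each lambda expression has its own type.
template <typename Func>
int count_if_tmpl(const std::vector<std::string>& names, Func pred) {
    int n = 0;
    for (const auto& name : names) {
        if (pred(name)) {
            ++n;
        }
    }
    return n;
}

// Type-erased form: one ordinary function, compiled once. Callers pay for an
// indirect call through std::function instead.
using name_pred = std::function<bool(const std::string&)>;

int count_if_erased(const std::vector<std::string>& names, const name_pred& pred) {
    int n = 0;
    for (const auto& name : names) {
        if (pred(name)) {
            ++n;
        }
    }
    return n;
}
```

Both forms return the same result for the same predicate; only the amount of generated code differs. In the `size` output quoted in the merge message, the text segment shrinks by 85160690 - 84824762 = 335928 bytes.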
Piotr Sarna
3d65eb5d4a distributed_loader: restore indentation
2019-01-18 10:59:37 +01:00
Piotr Sarna
e50e9b5150 distributed_loader: restore always mutating to level 0
When introducing view update generation path for sstables
in /upload directory, mutating these sstables was moved
to regular path only. It was wrong, because sstables that
need view updates generated from them may still need
to be downgraded to LCS level 0, so they won't disrupt
LCS assumptions after being loaded.

Reported-by: Nadav Har'El <nyh@scylladb.com>
2019-01-18 10:35:20 +01:00
Avi Kivity
089931fb56 test: mutation_test: de-template
Replace the with_column_family helper template with an ordinary function, to
reduce code bloat.
2019-01-17 19:06:42 +02:00
Avi Kivity
53a3db9446 tests: mutation_fragment_test: de-template
The for_each_target() template is called four times, so making it a normal function
reduces a lot of code generation.
2019-01-17 19:05:48 +02:00
Avi Kivity
4a21de4592 hint_test: de-template
While cl_test is duplicated with commitlog_test, at least deduplicate it internally
by converting it to an ordinary function.
2019-01-17 19:03:31 +02:00
Avi Kivity
1f02fd3ff6 tests: flush_queue_test: de-template
The internal test_propagation template is instantiated many times. Replace
with an ordinary function to reduce bloat. Call sites adjusted to have a
uniform signature.
2019-01-17 19:02:26 +02:00
Avi Kivity
63077501ed test: de-template eventually() and eventually_true()
These templates are not trivial and called many times. De-template them to
reduce code bloat.
2019-01-17 19:00:55 +02:00
Avi Kivity
a5d3254ed3 tests: cql_auth_query_test: de-template
Replace the with_user() and verify_unauthorized_then_ok() templates with functions.
Some adjustments made to the call site to unify the signatures.
2019-01-17 18:59:30 +02:00
Avi Kivity
8c05debecb tests: commitlog_test: de-template
The cl_test function is called many times, so its contents are bloat. De-template
it so it is compiled only once.
2019-01-17 18:57:35 +02:00
Avi Kivity
b6239134c2 distributed_loader: de-template
distributed_loader has several large templates that can be converted to normal
functions with the help of noncopyable_function<>, reducing code bloat.
2019-01-17 18:56:22 +02:00
Avi Kivity
2407c35cc1 schema_tables: partially de-template make_map_mutation()
make_map_mutation() is called several times, hopefully with the same Map type
parameter. Replace the Func parameter with a noncopyable_function<>.
2019-01-17 18:54:43 +02:00
Avi Kivity
81d004b2c0 hints: de-template scan_for_hints_dirs()
This function is called twice, and is not doing anything performance critical,
so replace the template parameter Func with std::function<>.
2019-01-17 18:51:46 +02:00
Avi Kivity
f61dbc9855 database: merge for_all_partitions and for_all_partitions_slow
for_all_partitions is only used in the implementation of for_all_partitions_slow,
so merge them and get rid of a template.
2019-01-17 18:50:36 +02:00
Avi Kivity
4568a4e4b0 database: de-template do_parse_schema_tables
This long slow-path function is called four times, so de-templating it is an
easy win.
2019-01-17 18:48:57 +02:00
Avi Kivity
08bd28942b api/commitlog: de-template acquire_cl_metric()
Use noncopyable_function instead of a template parameter. Likely doesn't gain
anything, because the template was always instantiated with the same type
(the result of std::bind() with the same signatures), but still good practice.
2019-01-17 18:45:14 +02:00
Botond Dénes
4537ec7426 multishard_mutation_query(): use correct reader concurrency semaphore
The multishard mutation query used the semaphore obtained from
`database::user_read_concurrency_sem()` to pause-resume shard readers.
This presented a problem when `multishard_mutation_query()` was reading
from system tables. In this case the readers themselves would obtain
their permits from the system read concurrency semaphore. Since the
pausing of shard readers used the user read semaphore, pausing failed to
fulfill its objective of alleviating pressure on the semaphore the reads
obtained their permits from. In some cases this led to a deadlock
during system reads.
To ensure the correct semaphore is used for pausing-resuming readers,
obtain the semaphore from the `table` object. To avoid looking up the
table on every pause or resume call, cache the semaphores when readers
are created.

Fixes: #4096

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c784a3cd525ce29642d7216fbe92638fa7884e88.1547729119.git.bdenes@scylladb.com>
2019-01-17 15:19:59 +02:00
Avi Kivity
8e9989685d scyllatop: complete conversion to python3
d2dbbba139 converted scyllatop's interpreter to Python 3, but neglected to do
the actual conversion. This patch does so, by running 2to3 over all files and adding
an additional bytes->string decode step in prometheus.py. Superfluous 2to3 changes
to print() calls were removed.
Message-Id: <20190117124121.7409-1-avi@scylladb.com>
2019-01-17 12:50:25 +00:00
Duarte Nunes
7505815013 Merge 'Fix filtering with LIMIT and paging' from Piotr
"
Before this series the limit was applied per page instead
of globally, which might have resulted in returning too many
rows.

To fix that:
 1. restrictions filter now has a 'remaining' parameter
    in order to stop accepting rows after enough of them
    have already been accepted
 2. pager passes its row limit to restrictions filter,
    so no more rows than necessary will be served to the client
 3. results no longer need to be trimmed on select_statement
    level

Tests: unit (release)
"

* 'fix_filtering_limit_with_paging_3' of https://github.com/psarna/scylla:
  tests: add filtering+limit+paging test case
  tests: allow null paging state in filtering tests
  cql3: fix filtering with LIMIT with regard to paging
2019-01-17 12:50:00 +00:00
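The fix described in 7505815013 keeps a single "remaining" counter for the whole query instead of restarting the limit on every page. A minimal sketch of that idea follows; `limited_filter` and `query_with_limit` are hypothetical names, the "restriction" is simply "row is even", and pages are counted in scanned rows for simplicity.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical filter: accepts even numbers until `remaining` reaches zero.
struct limited_filter {
    std::size_t remaining; // rows still allowed by LIMIT, shared by all pages

    bool accept(int row) {
        if (remaining == 0 || row % 2 != 0) {
            return false;
        }
        --remaining;
        return true;
    }
};

// Scans `rows` page by page, applying one filter object to the whole query.
std::vector<int> query_with_limit(const std::vector<int>& rows,
                                  std::size_t page_size, std::size_t limit) {
    limited_filter filter{limit};
    std::vector<int> out;
    if (page_size == 0) {
        return out;
    }
    for (std::size_t start = 0; start < rows.size() && filter.remaining > 0;
         start += page_size) {
        const std::size_t end = std::min(rows.size(), start + page_size);
        for (std::size_t i = start; i < end; ++i) {
            if (filter.accept(rows[i])) {
                out.push_back(rows[i]);
            }
        }
    }
    return out;
}
```

Because the counter lives in the filter rather than in the per-page loop, the result never exceeds the limit regardless of page size, and no trimming is needed afterwards.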
Piotr Sarna
ed7328613f tests: add filtering+limit+paging test case
A test case that checks that a combination of paging
and a LIMIT clause for filtering queries doesn't return
too many rows.

Refs #4100
2019-01-17 13:25:10 +01:00
Piotr Sarna
7d4f994e98 tests: allow null paging state in filtering tests
Previously the utility to extract paging state asserted
that the state exists, but in future tests it would be useful
to be able to call this function even if it would return null.
2019-01-17 13:25:10 +01:00
Piotr Sarna
87c23372fb cql3: fix filtering with LIMIT with regard to paging
Previously the limit was erroneously applied per page
instead of being accumulated, which might have caused returning
too many rows. As of now, LIMIT is handled properly inside
restrictions filter.

Fixes #4100
2019-01-17 13:25:09 +01:00
Piotr Sarna
02d88de082 db,view: add consuming units in staging table registration
View update generator service can accept sstables even before it starts,
but it should still acknowledge the number of waiters in the semaphore.

Reported-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <fcaa0f2884ebb4d34d1716e9e1cfed0642b4b85d.1547661048.git.sarna@scylladb.com>
2019-01-16 18:05:17 +00:00
Benny Halevy
1d483bc424 compaction: mc: re-calculate encoding_stats based on column stats
When compacting several sstables, get and merge their encoding_stats
for encoding the result.

Introduce sstable::get_encoding_stats_for_compaction to return encoding_stats
based on the sstable's column stats.

Use encoding_stats_collector to keep track of the minimum encoding_stats
values of all input sstables.

Fixes #3971

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-16 17:59:59 +02:00
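Merging encoding stats across input sstables, as described in 1d483bc424, amounts to tracking a running minimum per field. The sketch below uses hypothetical type names and a simplified three-field layout; it is not the actual `encoding_stats_collector` interface.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Simplified stand-in for per-sstable encoding stats (field set is an assumption).
struct stats_sketch {
    std::int64_t min_timestamp = std::numeric_limits<std::int64_t>::max();
    std::int32_t min_local_deletion_time = std::numeric_limits<std::int32_t>::max();
    std::int32_t min_ttl = std::numeric_limits<std::int32_t>::max();
};

// Starts from "no data seen" (all maxima) and lowers each field as inputs arrive.
class stats_collector_sketch {
    stats_sketch _s;
public:
    void update(const stats_sketch& other) {
        _s.min_timestamp = std::min(_s.min_timestamp, other.min_timestamp);
        _s.min_local_deletion_time =
            std::min(_s.min_local_deletion_time, other.min_local_deletion_time);
        _s.min_ttl = std::min(_s.min_ttl, other.min_ttl);
    }
    const stats_sketch& get() const { return _s; }
};
```

Feeding the stats of every input sstable through `update()` yields the baselines for the compaction output.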
Benny Halevy
e2c4d2d60a memtable: extract encoding_stats_collector base class to encoding_stats header file
To be used also by compaction.

Refs #3971

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-16 17:59:58 +02:00
Asias He
4b9e1a9f1d repair: Add row level metrics
Number of rows sent and received
- tx_row_nr
- rx_row_nr

Bytes of rows sent and received
- tx_row_bytes
- rx_row_bytes

Number of row hashes sent and received
- tx_hashes_nr
- rx_hashes_nr

Number of rows read from disk
- row_from_disk_nr

Bytes of rows read from disk
- row_from_disk_bytes

Message-Id: <d1ee6b8ae8370857fe45f88b6c13087ea217d381.1547603905.git.asias@scylladb.com>
2019-01-16 14:04:57 +02:00
Duarte Nunes
04a14b27e4 Merge 'Add handling staging sstables to /upload dir' from Piotr
"
This series adds generating view updates from sstables added through
/upload directory if their tables have accompanying materialized views.
Said sstables are left in /upload directory until updates are generated
from them and are treated just like staging sstables from /staging dir.
If there are no views for a given table, sstables are simply moved
from /upload dir to datadir without any changes.

Tests: unit (release)
"

* 'add_handling_staging_sstables_to_upload_dir_5' of https://github.com/psarna/scylla:
  all: rename view_update_from_staging_generator
  distributed_loader: fix indentation
  service: add generating view updates from uploaded sstables
  init: pass view update generator to storage service
  sstables: treat sstables in upload dir as needing view build
  sstables,table: rename is_staging to requires_view_building
  distributed_loader: use proper directory for opening SSTable
  db,view: make throttling optional for view_update_generator
2019-01-15 18:19:27 +00:00
Duarte Nunes
9b79f0f58b Merge 'Add stream phasing' from Piotr
"
This series addresses the problem mentioned in issue 4032, which is a race
between creating a view and streaming sstables to a node. Before this patch
the following scenario is possible:
 - sstable X arrives from a streaming session
 - we decide that view updates won't be generated from an sstable X
   by the view builder
 - new view is created for the table that owns sstable X
 - view builder doesn't generate updates from sstable X, even though the table
   has accompanying views - which is an inconsistency

This race is fixed by making the view builder wait for all ongoing streams,
just like it does for reads and writes. It's implemented with a phaser.

Tests:
unit (release)
dtest(not merged yet: materialized_views_test.TestMaterializedViews.stream_from_repair_during_build_process_test)
"

* 'add_stream_phasing_2' of https://github.com/psarna/scylla:
  repair: add stream phasing to row level repair
  streaming: add phasing incoming streams
  multishard_writer: add phaser operation parameter
  view: wait for stream sessions to finish before view building
  table: wait for pending streams on table::stop
  database: add pending streams phaser
2019-01-15 18:18:40 +00:00
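The phaser idea used in 9b79f0f58b can be sketched without any futures: operations register in the current phase, and a waiter closes that phase and checks when everything started before the close has finished. This is a synchronous toy model with hypothetical names (`phaser_sketch`, `closed_phase`); the real implementation is asynchronous and is not reproduced here.

```cpp
#include <memory>

class phaser_sketch {
    // One shared token per phase; every running operation holds a copy.
    std::shared_ptr<int> _current = std::make_shared<int>(0);
public:
    using operation = std::shared_ptr<int>;
    using closed_phase = std::weak_ptr<int>;

    // Registers an operation (e.g. an incoming stream) in the current phase.
    operation start() { return _current; }

    // Closes the current phase and opens a new one. The returned handle
    // expires once every operation started before this call is released.
    closed_phase advance() {
        closed_phase old = _current;
        _current = std::make_shared<int>(0);
        return old;
    }
};
```

A view builder in this model would call `advance()` and proceed only once `expired()` is true on the returned handle. Operations started after `advance()` belong to the new phase and do not delay the waiter.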
Piotr Sarna
0eb703dc80 all: rename view_update_from_staging_generator
The new name, view_update_generator, is both more concise
and correct, since we now generate from directories
other than "/staging".
2019-01-15 17:31:47 +01:00
Piotr Sarna
a5d24e40e0 distributed_loader: fix indentation
Bad indentation was introduced in the previous commit.
2019-01-15 17:31:37 +01:00
Piotr Sarna
13c8c84045 service: add generating view updates from uploaded sstables
SSTables loaded into the system via the /upload dir may need to have view
updates generated from them (if their table has accompanying
views).

Fixes #4047
2019-01-15 17:31:37 +01:00
Piotr Sarna
46305861c3 init: pass view update generator to storage service
Storage service needs to access view update generator in order
to register staging sstables from /upload directory.
2019-01-15 17:31:36 +01:00
Piotr Sarna
13f6453350 sstables: treat sstables in upload dir as needing view build
In some cases, sstables put in the upload dir should have view updates
generated from them. In order to avoid moving them across directories
(which then involves handling failure paths), upload dir will also be
treated as a valid directory where staging sstables reside.
Regular sstables that are not needed for view updates will be
immediately moved from upload/ dir as before.
2019-01-15 16:47:01 +01:00
Piotr Sarna
09401e0e71 sstables,table: rename is_staging to requires_view_building
A generalized name will be more fitting once we treat uploaded sstables
as requiring view building too.
2019-01-15 16:47:01 +01:00
Piotr Sarna
76616f6803 distributed_loader: use proper directory for opening SSTable
Previous implementation assumes that each SSTable resides directly
in table::datadir directory, while what should actually be used
is directory path from SSTable descriptor.
This patch prevents a regression when adding staging sstables support
for upload/ dir.
2019-01-15 16:47:01 +01:00
Piotr Sarna
beb4836726 db,view: make throttling optional for view_update_generator
Currently registering new view updates is throttled by a semaphore,
which makes sense during stream sessions in order to avoid overloading
the queue. Still, registration also occurs during initialization,
where it makes little sense to wait on a semaphore, since view update
generator might not have started at all yet.
2019-01-15 16:47:01 +01:00
Paweł Dziepak
635873639b Merge "Encoding stats enhancements" from Benny
"
Clean up various cases related to updating metadata stats and encoding stats,
in preparation for 64-bit gc_clock (#3353).

Fixes #4026
Fixes #4033
Fixes #4035
Fixes #4041

Refs #3353
"

* 'projects/encoding-stats-fixes/v6' of https://github.com/bhalevy/scylla:
  sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES
  sstables: mc: use api::timestamp_type in write_liveness_info
  sstables: mc: sstable_write encoding_stats are const
  mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time
  memtable: don't use encoding_stats epochs as default
  memtable: mc: update min_ttl encoding stats for dead row marker
  memtable: mc: add comment regarding updating encoding stats of collection tombstones
  sstables: metadata_collector: add update tombstone stats
  sstables: assert that delete_time is not live when updating stats
  sstables: move update_deletion_time_stats to metadata collector
  sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram
  sstables: mc: write_liveness_info and write_collection should update tombstone_histogram
  sstables: update_local_deletion_time for row marker deletion_time and expiration
2019-01-15 16:53:36 +02:00
Tomasz Grabiec
32f711ce56 row_cache: Fix crash on memtable flush with LCS
Presence checker is constructed and destroyed in the standard
allocator context, but the presence check was invoked in the LSA
context. If the presence checker allocates and caches some managed
objects, there will be alloc-dealloc mismatch.

That is the case with LeveledCompactionStrategy, which uses
incremental_selector.

Fix by invoking the presence check in the standard allocator context.

Fixes #4063.

Message-Id: <1547547700-16599-1-git-send-email-tgrabiec@scylladb.com>
2019-01-15 16:53:36 +02:00
Piotr Sarna
08a42d47a5 repair: add stream phasing to row level repair
In order to allow other services to wait for incoming streams
to finish, row level repair uses stream phasing when creating
new sstables from incoming data.

Fixes scylladb#4032
2019-01-15 10:28:21 +01:00
Piotr Sarna
7e61f02365 streaming: add phasing incoming streams
Incoming streams are now phased, which can be leveraged later
to wait for all ongoing streams to finish.

Refs #4032
2019-01-15 10:28:15 +01:00
Asias He
1cc7e45f44 database: Make log max_vector_size and internal_count debug level
It is useful for developers but not useful for users. Make it debug
level.

Message-Id: <775ce22d6f8088a44d35601509622a7e73ddeb9b.1547524976.git.asias@scylladb.com>
2019-01-15 11:02:30 +02:00
Piotr Sarna
238003b773 multishard_writer: add phaser operation parameter
Multishard writer can now accept a phaser operation parameter
in order to sustain a phased operation (e.g. a streaming session).
2019-01-15 10:02:22 +01:00
Piotr Sarna
b9203ec4f8 view: wait for stream sessions to finish before view building
During streaming, there's a race between streamed sstables
and view creation, which might result in some tables not being
used to generate view updates, even though they should.
That happens when the decision about view update path for a table
is done before view creation, but after already receiving some sstables
via streaming. These will not be used in view building even though
they should.
Hence, a phaser is used to make the view builder wait for all ongoing
stream sessions for a table to finish before proceeding with build steps.

Refs #4032
2019-01-15 09:36:55 +01:00
Piotr Sarna
d3a8fb378c table: wait for pending streams on table::stop
Stream sessions are now phased, so it's possible to wait for existing
streams to finish gently before stopping a table.
2019-01-15 09:36:55 +01:00
Piotr Sarna
8a5aaf2839 database: add pending streams phaser
This phaser will be used later to wait for all existing stream sessions
to finish before proceeding with view building.
2019-01-15 09:36:55 +01:00
Nadav Har'El
9062750089 scylla_util.py: make view_hints_directory setting optional
It is optional to set "view_hints_directory", so we shouldn't insist that
it is defined in scylla.yaml on upgrade.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190114125225.10794-1-nyh@scylladb.com>
2019-01-14 14:59:20 +02:00
Benny Halevy
238866228f memtable: rename get_stats to get_encoding_stats
For symmetry with similar sstable and compaction methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190113105155.29118-2-bhalevy@scylladb.com>
2019-01-14 14:58:43 +02:00
Avi Kivity
df090a15ff Merge "Add counters for inactive reads" from Botond
"
This mini-series adds counters for the inactive reads registered in the
reader concurrency semaphore.
"

* 'reader-concurrency-semaphore-counters/v6' of https://github.com/denesb/scylla:
  tests/querier_cache: use stats to get the no. of inactive reads
  reader_concurrency_semaphore: add counters for inactive reads
2019-01-14 11:56:43 +02:00
Rafael Ávila de Espíndola
acd6999ba9 Don't use SEASTAR_HAVE_LZ4_COMPRESS_DEFAULT in scylla
The existence of LZ4_compress_default is a property of the lz4
library, not seastar.

With this patch scylla does its own configure check instead of
depending on the one done by seastar.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190114013737.5395-1-espindola@scylladb.com>
2019-01-14 11:51:20 +02:00
Rafael Ávila de Espíndola
684fb607c4 sstable: handle missing index entry
This patch fixes a crash when the index file is corrupted and we get
an empty index entry list.

Tests: unit (release)

Fixes: 2532

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190110202833.29333-1-espindola@scylladb.com>
2019-01-14 10:47:21 +01:00
Avi Kivity
f5ee466a1c Merge "Cleanup UDT and tuple names creation" from Piotr
"
Currently the logic is scattered between types.*, cql3_types.* and
sstables/mc/writer.cc.

This patchset places all the logic in types.* and makes sure we
correctly add "frozen<...>" and "FrozenType(...)" to the names of
tuples and UDTs.

Fixes #4087

Tests: unit(release)
"

* 'haaawk/4087_v1' of github.com:scylladb/seastar-dev:
  Add comment explaining tuple type name creation
  Add "FrozenType(...)" to UDT name only when it's frozen
  Move "FrozenType(...)" addition to UDT name to user_type_impl
  Add "frozen<...>" to tuple CQL name only when it's frozen
  Move "frozen<...>" addition to tuple CQL name to tuple_type_impl
  Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type
  Add "frozen<...>" to UDT CQL name only when it's frozen
  Move "frozen<...>" addition to UDT CQL name to user_type_impl
2019-01-13 15:34:24 +02:00
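The naming rules collected by f5ee466a1c can be summarised in a few lines. The helper names below are hypothetical and the type names passed in are placeholders; the wrapper strings are the ones quoted in the commit messages, and the tuple exception follows the comment added in 96b880f81c.

```cpp
#include <string>

// CQL name: wrap in frozen<...> only when the type is actually frozen.
std::string cql_name(const std::string& name, bool frozen) {
    return frozen ? "frozen<" + name + ">" : name;
}

// Internal UDT name: wrap in FrozenType(...) only when the UDT is frozen.
std::string udt_internal_name(const std::string& name, bool frozen) {
    return frozen ? "org.apache.cassandra.db.marshal.FrozenType(" + name + ")"
                  : name;
}

// Internal tuple name: never wrapped, even when the tuple is frozen,
// to keep format compatibility.
std::string tuple_internal_name(const std::string& name, bool /*frozen*/) {
    return name;
}
```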
Benny Halevy
b243852a70 sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
d9e2aa65fc sstables: mc: use api::timestamp_type in write_liveness_info
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
7ea96aa778 sstables: mc: sstable_write encoding_stats are const
Encoding stats are immutable once statistics are sealed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
5d2d2bf47a mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time
It is actually the local deletion time rather than the ttl

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
2c99eb28d8 memtable: don't use encoding_stats epochs as default
Why default to an artificial minimum when you can do better
with zero effort? Track the actual minima in the memtable instead.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
9b78911379 memtable: mc: update min_ttl encoding stats for dead row marker
Update min ttl with expired_liveness_ttl (although its value of max int32
is not expected to affect the minimum).

Fixes #4041

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
47964d9ddc memtable: mc: add comment regarding updating encoding stats of collection tombstones
When the row flag has_complex_deletion is set, some collection columns may have
deletion tombstones and some may not. We don't strictly need to update the stats
in that case, since doing so will not affect the encoding_stats anyway.

Fixes #4035

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
75ccd29b6a sstables: metadata_collector: add update tombstone stats
Conditionally update timestamp and local_deletion_time stats based on tombstone

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
0ae85a126a sstables: assert that delete_time is not live when updating stats
Be compatible with Cassandra

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
12e6b503c9 sstables: move update_deletion_time_stats to metadata collector
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
2989b986ef sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram
Refs #4026
Refs #4033

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
bcb1fcd402 sstables: mc: write_liveness_info and write_collection should update tombstone_histogram
Fixes #4033

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
0ca4ae658c sstables: update_local_deletion_time for row marker deletion_time and expiration
Fixes #4026

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Tomasz Grabiec
f12a3e2066 sstables: index_reader: Rename _promoted_index_size
Message-Id: <1547219234-21182-2-git-send-email-tgrabiec@scylladb.com>
2019-01-13 11:29:13 +02:00
Tomasz Grabiec
6c5f8e0eda sstables: index_reader: Simplify offset calculations
Now that continuous_data_consumer::position() is meaningful (since
36dd660), we can use our position in the stream to calculate offsets
instead of duplicating state machine in offset calculations.

The value of position() - data.size() always holds the current offset
in the stream.
Message-Id: <1547219234-21182-1-git-send-email-tgrabiec@scylladb.com>
2019-01-13 11:29:12 +02:00
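The invariant stated in 6c5f8e0eda, that `position() - data.size()` is the current offset, can be checked with a toy consumer. This assumes, as the commit message implies, that `position()` reports the stream offset just past the buffer most recently handed to the consumer; `consumer_sketch` and its members are hypothetical.

```cpp
#include <cstddef>

struct consumer_sketch {
    std::size_t fed = 0; // total bytes handed to the consumer so far

    // Called when the next buffer of `n` bytes arrives from the stream.
    void feed(std::size_t n) { fed += n; }

    // Stream offset just past the current buffer.
    std::size_t position() const { return fed; }

    // `unconsumed` plays the role of data.size(): the bytes of the current
    // buffer that have not been parsed yet.
    std::size_t current_offset(std::size_t unconsumed) const {
        return position() - unconsumed;
    }
};
```

With this, the parser does not need to mirror its own state machine just to know where it is in the stream.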
Avi Kivity
0d52bdcbad install-dependencies.sh: unwrap long lines
Put package names one per line. This makes it easier to review changes,
and to backport changes to this file. No content changes.

Message-Id: <20190112091024.21878-1-avi@scylladb.com>
2019-01-12 14:23:27 +02:00
Avi Kivity
391d1e0fe0 table: const correctness for table::get_sstables() and related
Do not allow write access to the sstable list via this accessor. Luckily
there are no violations, and now we enforce it.
Message-Id: <20190111151049.16953-1-avi@scylladb.com>
2019-01-11 17:39:17 +01:00
Rafael Ávila de Espíndola
cd9ce18874 sstable: rename the is_boundary predicate
The new name makes it clear what is on either side of the boundary.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190110221324.33618-1-espindola@scylladb.com>
2019-01-11 14:36:49 +02:00
Piotr Jastrzebski
96b880f81c Add comment explaining tuple type name creation
To keep format compatibility, we never wrap a tuple type name
into "org.apache.cassandra.db.marshal.FrozenType(...)",
even when the tuple is frozen.
This patch adds a comment in tuple_type_impl::make_name that
explains the situation.

For more details see #4087

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:14:26 +01:00
Piotr Jastrzebski
57e655d716 Add "FrozenType(...)" to UDT name only when it's frozen
At the moment Scylla supports only frozen UDTs but
the code should be able to handle non-frozen UDTs as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:08:02 +01:00
Piotr Jastrzebski
fc17bd376b Move "FrozenType(...)" addition to UDT name to user_type_impl
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:07:47 +01:00
Piotr Jastrzebski
1fdfc461b8 Add "frozen<...>" to tuple CQL name only when it's frozen
At the moment Scylla supports only frozen tuples but
the code should be able to handle non-frozen tuples as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
749eee2711 Move "frozen<...>" addition to tuple CQL name to tuple_type_impl
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
7aba17de2c Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
56060573bb Add "frozen<...>" to UDT CQL name only when it's frozen
At the moment Scylla supports only frozen UDTs but
the code should be able to handle non-frozen UDTs as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
a928c103c2 Move "frozen<...>" addition to UDT CQL name to user_type_impl
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:09:00 +01:00
Raphael S. Carvalho
1b7cad3531 database: Fix race condition in sstable snapshot
Race condition takes place when one of the sstables selected by snapshot
is deleted by compaction. Snapshot fails because it tries to link a
sstable that was previously unlinked by compaction's sstable deletion.

Fixes #4051.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190110194048.26051-1-raphaelsc@scylladb.com>
2019-01-11 07:53:14 +02:00
Benny Halevy
2dc3776407 sstables: mc: sign-extend serialization_header min_local_deletion_time_base and min_ttl_base
Refs #4074
Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190110141439.1324-1-bhalevy@scylladb.com>
2019-01-10 16:23:20 +02:00
Gleb Natapov
a29182b447 sstable: fix use after free while applying extensions in sstable::open_file
sstable_file_io_extensions() returns an array of pointers to extensions,
but do_for_each() may defer, and the array would be destroyed by then. The patch
keeps it alive until do_for_each() completes.

Message-Id: <20190110125656.GC3172@scylladb.com>
2019-01-10 15:10:06 +02:00
Avi Kivity
b247ce01c3 table: restore indentation after changes to table::make_sstable_reader
Message-Id: <20190109175804.9352-2-avi@scylladb.com>
2019-01-10 13:00:53 +01:00
Avi Kivity
3d6be2f822 table: reduce duplication in table::make_sstable_reader
make_sstable_reader needs to deal with single-key and scanning reads, and
with restricting and non-restricting (in terms of read concurrency) readers.
Right now it does this combinatorially: there are separate cases for
restricting single-key reads, non-restricting single-key reads, restricting
scans, and non-restricting scans.

This makes further changes more complicated, so separate the two concepts.
The patch splits the code into two stages; the first selects between a single-key
and a scan, and the second selects between a restricting and non-restricting read.

This slightly pessimizes non-restricting reads (a mutation_source is created and
immediately destroyed), but that's not the common case.

Tests: unit(release)
Message-Id: <20190109175804.9352-1-avi@scylladb.com>
2019-01-10 13:00:40 +01:00
Benny Halevy
16dda033a5 sstables: row_marker: initialize _expiry
compare_row_marker_for_merge compares deletion_time also for row markers
that have missing timestamps.  This happened to succeed due to implicit
initialization to 0. However, we prefer the initialization to be explicit
and allow calling row_marker::deletion_time() in all states.

Fixes #4068

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190110102949.17896-1-bhalevy@scylladb.com>
2019-01-10 12:45:07 +01:00
Avi Kivity
4a6aeced59 Merge "Fix UDTs representation in serialization header" from Piotr
"
Tests: unit(release)
"

Fixes #4073.

* commit 'FETCH_HEAD~1':
  Add test for serialization header with UDT
  Fix UDT names in serialization header
2019-01-10 12:57:11 +02:00
Piotr Jastrzebski
d4bc5b64cf Add test for serialization header with UDT
Serialization header stores column types for all
columns in sstable. If any of them is a UDT then it
has to be wrapped into
"org.apache.cassandra.db.marshal.FrozenType(...)".

This patch adds a test case to verify that.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-10 10:59:01 +01:00
Piotr Jastrzebski
3de85aebc9 Fix UDT names in serialization header
Serialization header stores type names of all
columns in a table. Including partition key columns,
clustering key columns, static columns and regular columns.

If one of those types is a user defined type then we need to
wrap its name into
"org.apache.cassandra.db.marshal.FrozenType(...)".

Fixes #4073

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-10 10:58:30 +01:00
Benny Halevy
60323b79d1 sstables: mc: sign-extend delta local_deletion_time and delta ttl
Follow Cassandra's encoding so that values that are less than the
baseline encoding_stats will wrap around in 64 bits rather than 32.

Fixes #4074
Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190109192703.18371-1-bhalevy@scylladb.com>
2019-01-09 21:43:30 +02:00
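The difference that 60323b79d1 addresses shows up when a value is smaller than its baseline. The sketch below compares a delta that wraps in 32 bits with one that is sign-extended to 64 bits; the function names are hypothetical, and the variable-length integer encoding used on disk is left out.

```cpp
#include <cstdint>

// Delta computed in 32 bits, then zero-extended: a negative delta becomes a
// large positive 32-bit number.
std::uint64_t encode_delta_32(std::int32_t value, std::int32_t base) {
    return static_cast<std::uint32_t>(value) - static_cast<std::uint32_t>(base);
}

// Delta computed in 64 bits: a negative delta wraps around modulo 2^64.
std::uint64_t encode_delta_64(std::int32_t value, std::int32_t base) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(value) -
                                      static_cast<std::int64_t>(base));
}

// A reader that adds the delta to the baseline using 64-bit arithmetic.
std::int64_t decode_delta(std::uint64_t delta, std::int32_t base) {
    return static_cast<std::int64_t>(
        static_cast<std::uint64_t>(static_cast<std::int64_t>(base)) + delta);
}
```

For value 5 and baseline 10, the 64-bit delta decodes back to 5, while the 32-bit delta decodes to 4294967301 under a 64-bit reader. For values at or above the baseline both encodings agree.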
Rafael Ávila de Espíndola
26ac2c23ef Change *_row_* names that refer to partitions
This renames some variables and functions to make it clear that they
refer to partitions and not rows.

Old versions of sstablemetadata used to refer to a row histogram, but
current versions now mention a partition histogram instead.

This patch doesn't change the exposed API names.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181229223311.4184-2-espindola@scylladb.com>
2019-01-09 14:53:42 +02:00
Takuya ASADA
f00e9051ea reloc: show error message when relocatable package doesn't exist
Both build_rpm.sh and build_deb.sh fail at the beginning of the script
when the relocatable package does not exist. Detect this case and show a
user-friendly message instead.

Fixes #4071

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190109094353.16690-1-syuu@scylladb.com>
2019-01-09 12:53:08 +02:00
Raphael S. Carvalho
f5301990fc compaction: release reference of cleaned sstable in compaction manager
The compaction manager holds references to all sstables being cleaned until the
very end. That is a problem because the disk space of already-cleaned sstables
cannot be reclaimed while their file descriptors remain open.

Fixes #3735.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181221000941.15024-1-raphaelsc@scylladb.com>
2019-01-08 14:14:01 +02:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Rafael Ávila de Espíndola
51a08c3240 sstable: remove constexpr from run time predicates
We never check these predicates at compile time.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190108010055.92042-1-espindola@scylladb.com>
2019-01-08 12:28:42 +02:00
Piotr Sarna
c5346cdf9b database, table: split table-related code to table.cc
All table:: related code is moved to table.cc source file,
which splits database.cc size in half and thus allows
faster compilation on multiple cores.

Refs #1

Message-Id: <28e67f7793ff2147ffce18df5e0b077e14d3b8bd.1546940360.git.sarna@scylladb.com>
2019-01-08 12:02:42 +02:00
Avi Kivity
8ecb528d5a Update seastar submodule
* seastar 67fd967...af6b797 (1):
  > iotune: Initialize io_rates member variables

Fixes #4064
2019-01-08 12:02:42 +02:00
Avi Kivity
d8adbeda11 tests: mutation_source_test: generate valid utf-8 data
test_fast_forwarding_across_partitions_to_empty_range uses an uninitialized
string to populate an sstable, but this can be invalid utf-8 so that sstable
cannot be sstabledumped.

Make it valid by using make_random_string().

Fixes #4040.
Message-Id: <20190107193240.14409-1-avi@scylladb.com>
2019-01-08 12:02:42 +02:00
Asias He
1de24c8495 repair: Use mf.visit() in fragment_hasher
When a new fragment type is added, the code will fail to compile instead of
producing runtime errors.

Message-Id: <cf10200e4185c779aad15da3a776a5b79f5323af.1546930796.git.asias@scylladb.com>
2019-01-08 12:02:42 +02:00
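The compile-time guarantee described in 1de24c8495 can be sketched with the standard library. This is an analogy only: it uses std::variant and std::visit as stand-ins, not Scylla's mutation_fragment or its visit() member, and every type name below is hypothetical.

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for the kinds of fragment a stream may carry.
struct partition_start_sketch {};
struct static_row_sketch {};
struct clustering_row_sketch {};
struct range_tombstone_sketch {};
struct partition_end_sketch {};

using fragment_sketch = std::variant<partition_start_sketch,
                                     static_row_sketch,
                                     clustering_row_sketch,
                                     range_tombstone_sketch,
                                     partition_end_sketch>;

// One overload per alternative. If a new alternative is added to
// fragment_sketch without a matching overload here, std::visit below
// no longer compiles, so the omission cannot reach run time.
struct fragment_kind_visitor {
    std::string operator()(const partition_start_sketch&) const { return "partition_start"; }
    std::string operator()(const static_row_sketch&) const { return "static_row"; }
    std::string operator()(const clustering_row_sketch&) const { return "clustering_row"; }
    std::string operator()(const range_tombstone_sketch&) const { return "range_tombstone"; }
    std::string operator()(const partition_end_sketch&) const { return "partition_end"; }
};

inline std::string kind_of(const fragment_sketch& f) {
    return std::visit(fragment_kind_visitor{}, f);
}
```

A switch over an enum with a default branch would accept the new kind silently, which is the runtime-error path the commit moves away from.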
Rafael Ávila de Espíndola
67039e942b Remove the only use of with_alignment from scylla
In C++17 there are standard ways of requesting aligned memory, so
seastar doesn't need to provide one.

This patch is in preparation for removing with_alignment from seastar.

Tests: unit (debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190107191019.22295-1-espindola@scylladb.com>
2019-01-07 21:34:47 +02:00
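As a sketch of one of the standard C++17 ways of requesting aligned memory referred to in 67039e942b: an over-aligned type declared with alignas is allocated through the aligned form of operator new, so no library helper is needed. The type and function names are hypothetical, and the commit does not say which standard facility replaced with_alignment in Scylla.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical over-aligned type: 64 bytes is a common cache-line size.
struct alignas(64) aligned_block_sketch {
    unsigned char data[64];
};

// In C++17, new-expressions for over-aligned types use the aligned
// operator new, so std::make_unique returns suitably aligned storage.
inline std::unique_ptr<aligned_block_sketch> make_aligned_block() {
    return std::make_unique<aligned_block_sketch>();
}

// Helper to check the alignment of a pointer at run time.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```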
Rafael Ávila de Espíndola
0d4529a5f1 Change timeout to fix tests in a debug build
The current timeout is way too small for debug builds. Currently
jenkins runs avoid the problem by increasing the timeout by 100x. This
patch increases it by 10x, which seems to be sufficient to run the
tests on most desktop machines.

Message-Id: <20190107191413.22531-1-espindola@scylladb.com>
2019-01-07 21:34:06 +02:00
Avi Kivity
34251f5ea1 tools: toolchain: update image for all-user sudo 2019-01-07 21:22:42 +02:00
Takuya ASADA
3514b185fd tools: toolchain: allow sudo for all users
A non-privileged user may not belong to the "wheel" group; for example, Debian
variants use the "sudo" group instead of "wheel".
To make sudo work in all environments, we should allow sudo for
"ALL" instead of "wheel".

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190107173410.23140-1-syuu@scylladb.com>
2019-01-07 20:47:22 +02:00
Benny Halevy
40410465d7 sstables: mc: expired_liveness_ttl should be max int32_t rather than max uint32_t
Corresponding to Cassandra's EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE;

Fixes #4060

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190107172457.20430-1-bhalevy@scylladb.com>
2019-01-07 18:41:37 +01:00
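The two candidate values in 40410465d7 differ by one bit pattern, as a short sketch shows. The constant name mirrors the commit title but its type and spelling here are assumptions, not the definition used in the sstables code.

```cpp
#include <cstdint>
#include <limits>

// Illustrative constant: the largest signed 32-bit value, which is what
// Java's Integer.MAX_VALUE denotes.
constexpr uint32_t expired_liveness_ttl_sketch = std::numeric_limits<int32_t>::max();

// 0x7FFFFFFF (2147483647), not 0xFFFFFFFF (4294967295).
static_assert(expired_liveness_ttl_sketch == 0x7FFFFFFFu,
              "equals the maximum of a signed 32-bit integer");
static_assert(expired_liveness_ttl_sketch != std::numeric_limits<uint32_t>::max(),
              "differs from the maximum of an unsigned 32-bit integer");
```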
Avi Kivity
20b6d00e56 tools: toolchain: support dbuild from subdirectory or parent directory of scylla.git
When building something other than Scylla (like scylla-tools-java or scylla-jmx)
it is convenient to run it from some other directory. To do that, allow running
dbuild from any directory (so we locate tools/toolchain/image relative to the
dbuild script rather than use a fixed path) and mount the current directory
since it's likely the user will want to access files there.
Message-Id: <20190107165824.25164-1-avi@scylladb.com>
2019-01-07 18:35:51 +01:00
Nadav Har'El
f6e0ce02fa docs/isolation.md: new document
Start a new document with an overview of isolation in Scylla, i.e.,
scheduling groups, I/O priority classes, controllers, etc.

Like all documents in docs/, this is a document for developers (not users!)
who need to understand how isolation between different pieces of Scylla
(e.g., queries, compaction, repair, etc.) works, which scheduling groups
and I/O classes we have and why, etc.

The document is still very partial and includes a lot of TODOs on
places where the explanation needs to be expanded. In particular it
needs an accurate explanation (and not just a name) of what kind of
work is done under each of the groups and classes, and an explanation
of how we set up RPC to use which scheduling groups for the code it
executes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190103183232.21348-1-nyh@scylladb.com>
2019-01-07 17:48:35 +02:00
Botond Dénes
80affca5f7 tests/querier_cache: use stats to get the no. of inactive reads
Now that we added stats for the inactive reads, the tests don't need
the `reader_concurrency_semaphore::inactive_reads()` method, instead
they can rely on the stats to check the number of inactive reads.
2019-01-07 17:06:26 +02:00
Botond Dénes
e56c26205f reader_concurrency_semaphore: add counters for inactive reads
Add counters that give insight into inactive read related events.
Two counters are added:
* permit_based_evictions
* population
2019-01-07 16:45:49 +02:00
Nadav Har'El
da090a5458 materialized views: move hints to top-level directory
While we keep ordinary hints in a directory parallel to the data directory,
we decided to keep the materialized view hints in a subdirectory of the data
directory, named "view_pending_updates". But during boot, we expect all
subdirectories of data/ to be keyspace names, and when we notice this one,
we print a warning:

   WARN: database - Skipping undefined keyspace: view_pending_updates

This spurious warning annoyed users. But moreover, we could have bigger
problems if the user actually tries to create a keyspace with that name.

So in this patch, we move the view hints to a separate top-level directory,
which defaults to /var/lib/scylla/view_hints, but as usual can be configured.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190107142257.16342-1-nyh@scylladb.com>
2019-01-07 16:43:43 +02:00
Takuya ASADA
eddecdd0b5 dist/redhat: drop unused dependencies
wget and yum-builddep are not used anymore, don't install them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190107091148.1590-7-syuu@scylladb.com>
2019-01-07 12:56:18 +00:00
Takuya ASADA
40dc62fa98 dist/debian: don't use sudo to rm debian dir
sudo is not allowed in dbuild with non-root privileges, and the debian dir
should be owned by the current user anyway, so stop using sudo.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190107091148.1590-5-syuu@scylladb.com>
2019-01-07 12:56:18 +00:00
Takuya ASADA
237de20ff9 dist/debian: correct dbuild path
/usr/sbin/debuild is a typo; it should be under /usr/bin.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190107091148.1590-4-syuu@scylladb.com>
2019-01-07 12:56:17 +00:00
Pekka Enberg
2520c8caac Merge 'Improve frozen toolchain for continuous integration' from Avi
"Add features that are useful for continuous integration pipelines (and
also ordinary developers):

 - sudo support, with and without a tty, as our packaging scripts require it
 - install ccache package to allow reducing incremental build times
 - dependencies needed to build scylla-jmx and scylla-tools-java"

* tag 'toolchain-ci/v1' of https://github.com/avikivity/scylla:
  tools: toolchain: update image for ant, maven, ccache, sudo
  tools: toolchain: dbuild: pass-through supplementary groups
  tools: toolchain: defeat PAM
  tools: toolchain: improve sudo support
  tools: toolchain: break long line in dbuild
  tools: toolchain: prepare sudoers file
  tools: toolchain: install ccache
  install-dependencies.sh: add maven and ant
2019-01-07 12:56:17 +00:00
Pekka Enberg
9b27a3035c Merge 'Reduce inclusions of "database.hh"' from Avi
"This patchset reduces inclusions of database.hh, particularly in header
files. It reduces the number of objects depending on database.hh from 166
to 116.

Tests: unit(release), playing a little with tracing"

* tag 'database.hh/v1' of https://github.com/avikivity/scylla:
  streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh
  sstables: writer.hh: add some forward declarations
  table_helper: remove database.hh include
  table_helper: de-inline insert() and setup_keyspace()
  table_helper: de-template setup_keyspace()
  table_helper: simplify template body of table_helper::insert()
  schema_tables: remove #include of database.hh
  cql_type_parser: remove dependency on user_types_metadata
  thrift: add missing include of sleep.hh
  cql3: ks_prop_defs: remove #include "database.hh"
2019-01-07 12:56:17 +00:00
Benny Halevy
b017d87a43 tests: mc: add back missing sstable_3_x_test Statistics.db files
To be able to verify the golden version with sstabledump.
These files were generated by running sstable_3_x_test and keeping its
generated output files.

Refs #4043

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190103112511.23488-2-bhalevy@scylladb.com>
2019-01-07 12:56:16 +00:00
Benny Halevy
517ad58823 tests: mc: delete empty line from write_static_row/mc-1-big-TOC.txt
Refs #4043

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190103112511.23488-1-bhalevy@scylladb.com>
2019-01-07 12:56:16 +00:00
Nadav Har'El
b14616b879 docs/logging.md: improvements
Various small improvements to docs/logging.md:
1. Describe the options to log to stdout or syslog and their defaults.
2. Mention the possibility of using nodetool instead of REST API.
3. Additional small tweaks to formatting.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190106111851.26700-1-nyh@scylladb.com>
2019-01-06 13:20:53 +02:00
Nadav Har'El
232e97ad06 docs/logging.md: new document
Add a new document about logging in Scylla, and how to change the log levels
when starting Scylla and while it is running.

It needs more developer-oriented information (e.g., how to create new logger
subsystems in the code) but I think it's a good start.

Some of the text is based on Glauber's writeup for the Scylla website on
changing log levels at runtime.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190106103606.26032-1-nyh@scylladb.com>
2019-01-06 12:40:14 +02:00
Benny Halevy
2daf81e80f dist: redhat/debian specs: add dependency on 'file' package
Needed by seastar-addr2line

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190101203434.14858-1-bhalevy@scylladb.com>
2019-01-06 12:13:08 +02:00
Avi Kivity
f02c64cadf streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh
This header, which is easily replaced with a forward declaration,
introduces a dependency on database.hh everywhere. Remove it and scatter
includes of database.hh in source files that really need it.
2019-01-05 17:33:25 +02:00
Avi Kivity
ca93b88cfb sstables: writer.hh: add some forward declarations
This makes the header less dependent on previously-included headers.
2019-01-05 17:04:16 +02:00
Avi Kivity
53a21c7787 table_helper: remove database.hh include 2019-01-05 16:39:26 +02:00
Avi Kivity
7534412071 table_helper: de-inline insert() and setup_keyspace()
After previous patches de-templated these functions, we can de-inline them.
This helps reduce compile time and prepares to reduce header dependencies.
2019-01-05 16:28:46 +02:00
Avi Kivity
cfedf4ab0f table_helper: de-template setup_keyspace()
This setup function has no reason to be a template and is easily
converted. We can then later de-inline it to reduce dependencies.
2019-01-05 16:23:10 +02:00
Avi Kivity
659147cd79 table_helper: simplify template body of table_helper::insert()
Move most of the body into a non-template overload to reduce dependencies
in the header (and template bloat). The function is not on any fast path,
and noncopyable_function will likely not even allocate anything.
2019-01-05 16:22:08 +02:00
Avi Kivity
c3ef99f84f schema_tables: remove #include of database.hh
Distribute in source files (and one header - table_helper.hh) that need it.
2019-01-05 15:43:07 +02:00
Avi Kivity
f43f82d1d2 cql_type_parser: remove dependency on user_types_metadata
A default parameter of type T (or lw_shared_ptr<T>) requires that T be
defined. Remove the dependency by redefining the default parameter
as an overload, for T = user_types_metadata.
2019-01-05 15:40:58 +02:00
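The technique in f43f82d1d2, replacing a default argument with an overload so that a header needs only a forward declaration, can be sketched in a single file. All names are hypothetical, and std::shared_ptr stands in for lw_shared_ptr; the real cql_type_parser signatures are not shown in the commit message.

```cpp
#include <memory>
#include <string>

// --- what would live in the header ---
struct user_types_sketch;  // forward declaration only; the type is incomplete here

// A default argument such as "= std::make_shared<user_types_sketch>()" would
// need the complete type at this point. An overload avoids that requirement.
std::string parse_type(const std::string& name, std::shared_ptr<user_types_sketch> types);
std::string parse_type(const std::string& name);

// --- what would live in the source file ---
struct user_types_sketch {
    std::string keyspace = "ks";
};

std::string parse_type(const std::string& name, std::shared_ptr<user_types_sketch> types) {
    return types->keyspace + "." + name;
}

// The overload supplies the former default value where the type is complete.
std::string parse_type(const std::string& name) {
    return parse_type(name, std::make_shared<user_types_sketch>());
}
```

Callers that relied on the default keep compiling unchanged, while files that include only the header no longer pull in the full definition.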
Avi Kivity
4ba1d4d1dc thrift: add missing include of sleep.hh
Currently obtained indirectly through database.hh.
2019-01-05 15:39:30 +02:00
Avi Kivity
d24962e16c cql3: ks_prop_defs: remove #include "database.hh"
Replace with forward declaration to reduce rebuilds.
2019-01-05 14:26:03 +02:00
Jesse Haber-Kucharsky
17a5f7acab build: Link against libatomic
Since Scylla uses functions from the `atomic` header in its own source
code, we need to explicitly link against the stub library that is
provided for hardware architectures that do not have native support for
atomic operations.

Fixes #4053

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <7d62e762130494d73565ce8c031f53aaf866d3aa.1546645041.git.jhaberku@scylladb.com>
2019-01-05 13:38:57 +02:00
Avi Kivity
36e4e9fb54 Update seastar submodule
* seastar 6c8c229...67fd967 (1):
  > perftune.py: tune only active NVMe HW queues on i3 AWS instances
2019-01-04 13:17:29 +02:00
Avi Kivity
b0980ba7c6 compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads
The workload in #3844 has these characteristics:
 - very small data set size (a few gigabytes per shard)
 - large working set size (all the data, enough for high cache miss rate)
 - high overwrite rate (so a compaction results in 12X data reduction)

As a result, the compaction backlog controller assigns very few shares to
compaction (low data set size -> low backlog), so compaction proceeds very slowly.
Meanwhile, we have tons of cache misses, and each cache miss needs to read from a
large number of sstables (since compaction isn't progressing). The end result is
a high read amplification, and in this test, timeouts.

While we could declare that the scenario is very artificial, there are other
real-world scenarios that could trigger it. Consider a 100% write load
(population phase) followed by 100% read. Towards the end of the last compaction,
the backlog will drop more and more until compaction slows to a crawl, and until
it completes, all the data (for that compaction) will have to be read from its
input sstables, resulting in read amplification.

We should probably have read amplification affect the backlog, but for now the
simpler solution is to increase the minimum shares to 50 so that compaction
always makes forward progress. This will result in higher-than-needed compaction
bandwidth in some low write rate scenarios so we will see fluctuations in request
rate (what the controller was designed to avoid), but these fluctuations will be
limited to 5%.

Since the base class backlog_controller has a fixed (0, 0) point, remove it
and add it to derived classes (setting it to (0, 50) for compaction).

Fixes #3844 (or at least improves it).
Message-Id: <20181231162710.29410-1-avi@scylladb.com>
2019-01-04 10:58:43 +01:00
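The effect of moving the first control point from (0, 0) to (0, 50) in b0980ba7c6 can be sketched as piecewise-linear interpolation over (backlog, shares) points. Only the (0, 50) point comes from the commit message; the struct, the function name, and any other control point a caller passes in are assumptions, and the real backlog_controller may compute shares differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical control point: a backlog value mapped to a number of shares.
struct control_point_sketch {
    double backlog;
    double shares;
};

// Linearly interpolate the shares for a backlog. The points must be sorted
// by ascending backlog; inputs outside the range clamp to the end points.
inline double shares_for_backlog(const std::vector<control_point_sketch>& points, double backlog) {
    if (points.empty()) {
        return 0.0;
    }
    if (backlog <= points.front().backlog) {
        return points.front().shares;
    }
    for (std::size_t i = 1; i < points.size(); ++i) {
        if (backlog <= points[i].backlog) {
            const control_point_sketch& lo = points[i - 1];
            const control_point_sketch& hi = points[i];
            const double t = (backlog - lo.backlog) / (hi.backlog - lo.backlog);
            return lo.shares + t * (hi.shares - lo.shares);
        }
    }
    return points.back().shares;
}
```

With a first point of (0, 50), a backlog of zero still maps to 50 shares, so compaction keeps making progress even when the data set is tiny.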
Duarte Nunes
b851cb1a9a distributed_loader: Forbid uploading MV sstables
Instead suggest that the views be re-created.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190103142933.35354-1-duarte@scylladb.com>
2019-01-03 16:31:20 +02:00
Avi Kivity
7d3562a403 tools: toolchain: update image for ant, maven, ccache, sudo 2019-01-03 16:16:47 +02:00
Avi Kivity
344468e20d tools: toolchain: dbuild: pass-through supplementary groups
Useful for ccache.
2019-01-03 16:16:47 +02:00
Avi Kivity
11889f5ea9 tools: toolchain: defeat PAM
Prevent PAM from enforcing security and preventing sudo from working. This is
done by replacing the default configuration (designed for workstations) to
one that uses pam_permit for everything.
2019-01-03 16:16:47 +02:00
Avi Kivity
9c258923d8 tools: toolchain: improve sudo support
Bind-mount /etc/passwd and /etc/group so sudo doesn't complain, and
support sudo without password or tty.
2019-01-03 16:16:47 +02:00
Avi Kivity
05f78df7b9 tools: toolchain: break long line in dbuild 2019-01-03 16:16:47 +02:00
Avi Kivity
f79a300081 tools: toolchain: prepare sudoers file
Don't require a tty or passwords, since they won't be available in
continuous integration environments.
2019-01-03 16:16:47 +02:00
Avi Kivity
25040824cf tools: toolchain: install ccache
Not strictly necessary, but often useful to reduce rebuild times. The user
will need to bind-mount a populated cache.
2019-01-03 16:16:47 +02:00
Avi Kivity
527e3a58ff install-dependencies.sh: add maven and ant
Add tools needed to build scylla-jmx and scylla-tools-java. While
not requirements of this repository, it's nicer if a single setup
can be used to build and run everything.

We also install pystache as it's used by packaging scripts.
2019-01-03 16:16:45 +02:00
Duarte Nunes
3235c13125 utils/fragmented_temporary_buffer: Correctly implement remove_suffix()
The current implementation breaks the invariant that

_size_bytes = reduce(_fragments, &temporary_buffer::size)

In particular, this breaks algorithms that check the individual
segment size.

Correctly implement remove_suffix() by destroying superfluous
temporary_buffer's and by trimming the last one, if needed.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190103133523.34937-1-duarte@scylladb.com>
2019-01-03 13:37:01 +00:00
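The invariant quoted in 3235c13125, and a remove_suffix() that preserves it, can be sketched with plain standard containers. This is not the fragmented_temporary_buffer implementation: std::string stands in for temporary_buffer and the class name is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer made of several fragments.
class fragmented_buffer_sketch {
    std::vector<std::string> _fragments;
    std::size_t _size_bytes = 0;
public:
    void append(std::string fragment) {
        _size_bytes += fragment.size();
        _fragments.push_back(std::move(fragment));
    }

    std::size_t size_bytes() const { return _size_bytes; }
    std::size_t fragment_count() const { return _fragments.size(); }

    // The right-hand side of the invariant: the sum of the fragment sizes.
    std::size_t sum_of_fragment_sizes() const {
        std::size_t total = 0;
        for (const auto& f : _fragments) {
            total += f.size();
        }
        return total;
    }

    // Drop n bytes from the end: destroy whole trailing fragments that are
    // no longer needed and trim the last remaining one if necessary.
    void remove_suffix(std::size_t n) {
        if (n > _size_bytes) {
            throw std::out_of_range("suffix longer than buffer");
        }
        _size_bytes -= n;
        while (n > 0) {
            std::string& last = _fragments.back();
            if (last.size() <= n) {
                n -= last.size();
                _fragments.pop_back();
            } else {
                last.resize(last.size() - n);
                n = 0;
            }
        }
    }
};
```

An implementation that only decremented the byte count would leave the fragments untouched, and any algorithm that walks the individual fragment sizes would then read past the logical end.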
Botond Dénes
021feef513 querier_cache: simplify memory eviction use-after-free fix, add tests
Simplify the fix for memory based eviction, introduced by 918d255 so
there is no need to massage the counters.

Also add a check to `test_memory_based_cache_eviction` which checks for
the bug fixed. While at it also add a check to
`test_time_based_cache_eviction` for the fix to time based eviction
(e5a0ea3).

Tests: tests/querier_cache:debug
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c89e2788a88c2a701a2c39f377328e77ac01e3ef.1546515465.git.bdenes@scylladb.com>
2019-01-03 13:44:08 +02:00
Tomasz Grabiec
1613a623e1 Merge "Fix crash on corrupt sstable" from Rafael
* https://github.com/espindola/scylla espindola/invalid_boundary4:
  sstables: Refactor predicates on bound_kind_m
  Fix crash on corrupt sstable
2019-01-03 12:02:09 +01:00
Duarte Nunes
42d9ca8266 Merge 'Add staging SSTables support to row level repair' from Piotr
"
This series adds staging SSTables support to row level repair.
It was introduced for streaming sessions before, but since row level
repair doesn't leverage sessions at all, it's added separately.

Tests:
unit (release)
dtest (repair_additional_test.py:RepairAdditionalTest,
       excluding repair_abort_test, which fails for me locally on master)
"

* 'add_staging_sstables_generation_to_row_level_repair_2' of https://github.com/psarna/scylla:
  repair: add staging sstables support to row level repair
  main,repair: add params to row level repair init
  streaming,view: move view update checks to separate file
2019-01-03 09:40:13 +00:00
Piotr Sarna
a73d9ccf31 service: mark existing views as built before bootstrap
When a node is bootstrapping, it will receive data from other nodes
via streaming, including materialized views. Regardless whether these
views are built on other nodes or not, building them on newly
bootstrapped nodes has no effect - updates were either already streamed
completely (if view building has finished) or will be propagated
via view building, if the process is still ongoing.
So, marking all views as 'built' for the bootstrapped node prevents it
from spawning superfluous view building processes.

Fixes #4001
Message-Id: <fd53692c38d944122d1b1013fdb0aedf517fa409.1546498861.git.sarna@scylladb.com>
2019-01-03 09:39:33 +00:00
Botond Dénes
e5a0ea390a querier_cache: unregister queriers evicted due to expired TTL
Currently queriers evicted due to their TTL expiring are not
unregistered from the `reader_concurrency_semaphore`. This can cause a
use-after-free when the semaphore tries to evict the same querier at
some later point in time, as the querier entry it has a pointer to is
now invalid.

Fix by unregistering the querier from the semaphore before destroying
the entry.

Refs: #4018
Refs: #4031

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4adfd09f5af8a12d73c29d59407a789324cd3d01.1546504034.git.bdenes@scylladb.com>
2019-01-03 10:29:26 +02:00
Piotr Sarna
bc74ac6f09 repair: add staging sstables support to row level repair
In some cases, sstables created during row level repair
should be enqueued as staging in order to generate
view updates from them.

Fixes #4034
2019-01-03 08:36:45 +01:00
Piotr Sarna
a0003c52cf main,repair: add params to row level repair init
Row level repair needs references to system distributed keyspace
and view update generator in order to enqueue some sstables
as staging.
2019-01-03 08:31:41 +01:00
Piotr Sarna
9d46715613 streaming,view: move view update checks to separate file
Checking if view update path should be used for sstables
is going to be reused in row level repair code,
so relevant functions are moved to a separate header.
2019-01-03 08:31:40 +01:00
Avi Kivity
918d255168 querier_cache: unregister querier from reader_concurrency_semaphore during eviction
In insert_querier(), we may evict older queriers to make room for the new one.
However, we forgot to unregister the evicted queriers from
reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore
eventually wanted to evict something, it saw an inactive_read_handle that was
not connected to a querier_cache::entry, and crashed on use-after-free.

Fix by evicting through the inactive_read_handle associated with the querier
to be evicted. This removes traces of the querier from both
reader_concurrency_semaphore and querier_cache. We also have to massage the
statistics since querier_inactive_read::evict() updates different counters.

Fixes #4018.

Tests: unit(release)
Reviewed-by: Botond Denes <bdenes@scylladb.com>
Message-Id: <20190102175023.26093-1-avi@scylladb.com>
2019-01-03 09:15:07 +02:00
Rafael Ávila de Espíndola
28c014351f Fix crash on corrupt sstable
The check in consume_range_tombstone was too late. Before getting to
it we would fail an assert calling to_bound_kind.

This moves the check earlier and adds a testcase.

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-02 17:52:07 -08:00
Rafael Ávila de Espíndola
3c9178d122 sstables: Refactor predicates on bound_kind_m
This moves the predicate functions to the start of the file, renames
is_in_bound_kind to is_bound_kind for consistency with to_bound_kind
and defines all predicates in a similar fashion.

It also uses the predicates to reduce code duplication.

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-02 17:50:44 -08:00
Avi Kivity
2717bdd301 tools: toolchain: allow adjusting "docker run" command line
It is useful to adjust the command line when running the docker image,
for example to attach a data volume or a ccache directory. Add a mechanism
to do that.
Message-Id: <20181228163306.19439-1-avi@scylladb.com>
2019-01-01 21:44:50 +00:00
Avi Kivity
d19660ec0a Merge "commitlog: Use fragmented buffers for reading entries" from Duarte
"
Instead of allocating a contiguous temporary_buffer when reading
mutations during commitlog (or hint) replay, use fragmented
buffers.

Refs #4020
"

* 'commitlog/fragmented-read/v1' of https://github.com/duarten/scylla:
  db/commitlog: Use fragmented buffers to read entries
  db/commitlog: Implement skip in terms of input buffer skipping
  tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix()
  utils/fragmented_temporary_buffer: Add remove_suffix
  tests/fragmented_temporary_buffer_test: Add unit test for skip()
  utils/fragmented_temporary_buffer: Allow skipping in the input stream
2019-01-01 19:08:34 +02:00
Avi Kivity
6641353854 tracing: remove static class_registry
Static class_registries hinder librarification by requiring linking with
all object files (instead of a library from which objects are linked on
demand) and reduce readability by hiding dependencies and by their
horrible syntax. Hide them behind a non-static, non-template tracing
backend registry.
Message-Id: <20181229121000.7885-1-avi@scylladb.com>
2018-12-31 13:24:54 +00:00
Duarte Nunes
b7517183fa db/commitlog: Use fragmented buffers to read entries
Leverage fragmented_temporary_buffer when reading commit log
entries, avoiding large allocations.

Refs #4020

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
0e50a9bc6d db/commitlog: Implement skip in terms of input buffer skipping
This simplifies the code and allows getting rid of the overload of
advance() taking a temporary_buffer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
8379ac6189 tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
1a88cd7992 utils/fragmented_temporary_buffer: Add remove_suffix
Essentially hide some bytes off the end of the buffer. Needed for
subsequent commit log changes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
50dd8b67b2 tests/fragmented_temporary_buffer_test: Add unit test for skip()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
8eab0a3e01 utils/fragmented_temporary_buffer: Allow skipping in the input stream
Add fragmented_temporary_buffer::istream::skip(), needed for
subsequent commit log changes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Avi Kivity
c180a18dbb Distribute distributed_loader into its own header and source files
distributed_loader is a sizeable fraction of database.cc, so moving it
out reduces compile time and improves readability.
Message-Id: <20181230200926.15074-1-avi@scylladb.com>
2018-12-31 14:27:27 +02:00
Avi Kivity
49958d5836 tools: toolchain: update for lz4 1.8.3
lz4 1.8.3 was released with a fix for data corruption during compression. While
the release notes indicate we aren't vulnerable, be cautious and update anyway.
Message-Id: <20181230144716.7238-1-avi@scylladb.com>
2018-12-31 14:27:27 +02:00
Hagit Segev
141fad9c14 Update README.md
fix a typo
2018-12-31 13:33:04 +02:00
Asias He
d90836a2d3 streaming: Make total_incoming_bytes and total_outgoing_bytes metrics monotonic
Currently, they increase and decrease as the stream sessions are
created and destroyed. Make them monotonically increasing Prometheus
counters for easier monitoring.

Message-Id: <7c07cea25a59a09377292dc8f64ed33ff12eda87.1545959905.git.asias@scylladb.com>
2018-12-30 16:52:17 +02:00
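The distinction drawn in d90836a2d3 between a value that rises and falls with session lifetime and a monotonic counter can be sketched as follows. Only the name total_incoming_bytes comes from the commit; the struct, the other member, and the callbacks are hypothetical and do not reflect the streaming code or the metrics registration API.

```cpp
#include <cstdint>

// Hypothetical per-node streaming statistics.
struct stream_metrics_sketch {
    // Gauge-like value: grows while sessions receive data and shrinks
    // again when a session is destroyed.
    uint64_t active_incoming_bytes = 0;
    // Counter-like value: only ever increases, so a monitoring system can
    // compute rates from it without being confused by session teardown.
    uint64_t total_incoming_bytes = 0;

    void on_bytes_received(uint64_t n) {
        active_incoming_bytes += n;
        total_incoming_bytes += n;
    }

    void on_session_destroyed(uint64_t session_bytes) {
        active_incoming_bytes -= session_bytes;
        // total_incoming_bytes is deliberately left untouched.
    }
};
```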
Pekka Enberg
96172b7bca Merge 'Fixes for the view_update_from_staging_generator' from Duarte
"This series contains a couple of fixes to the
view_update_from_staging_generator, the object responsible for
generating view updates from sstables written through streaming.

Fixes #4021"
* 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla:
  db/view/view_update_from_staging_generator: Break semaphore on stop()
  db/view/view_update_from_staging_generator: Restore formatting
  db/view/view_update_from_staging_generator: Avoid creating more than one fiber
2018-12-29 18:31:40 +02:00
Duarte Nunes
f41d13f38c db/view/view_update_from_staging_generator: Break semaphore on stop()
This avoids having fibers wait on _registration_sem without ever being
notified.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:55:04 +00:00
Duarte Nunes
4974addc5c db/view/view_update_from_staging_generator: Restore formatting
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:55:02 +00:00
Duarte Nunes
201196130d db/view/view_update_from_staging_generator: Avoid creating more than one fiber
If view_update_from_staging_generator::maybe_generate_view_updates()
is called before view_update_from_staging_generator::start(), as can
happen in main.cc, then we can potentially create more than one fiber,
which leads to corrupted state and conflicting operations.

To avoid this, use just one fiber and be explicit about notifying it
that more work is needed, by leveraging a condition-variable.

Fixes #4021

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:52:51 +00:00
Duarte Nunes
66113a2d39 Merge 'Replace query_processor's sharded<database> with plain database' from Avi
"
A sharded<database> is not very useful for accessing data since data is
usually distributed across many nodes, while a sharded<database>
contains only a single node's view. So it is really only used for
accessing replicated metadata, not data. As such only the local shard
is accessed.

Use that to simplify query_processor a little by replacing sharded<database>
with a plain database.

We can probably be more ambitious and make all accesses, data and metadata,
go through storage_proxy, but this is a start.
"

* tag 'qp-unshard-database/v1' of https://github.com/avikivity/scylla:
  query_processor: replace sharded<database> with the local shard
  commitlog_replayer: don't use query_processor
  client_state: change set_keyspace() to accept a single database shard
  legacy_schema_migrator: initialize with database reference
2018-12-29 12:14:19 +00:00
Avi Kivity
0c0cc66ee7 system_keyspace, view: reduce interdependencies
system_keyspace is an implementation detail for most of its users, not
part of the interface, as it's only used to store internal data. Therefore,
including it in a header file causes unneeded dependencies.

This patch removes a dependency between views and system_keyspace.hh
by moving view_name and view_build_progress into a separate header file,
and using forward declarations where possible. This allows us to
remove an inclusion of system_keyspace.hh from a header file (the last
one), so that further changes to system_keyspace.hh will cause fewer
recompilations.
Message-Id: <20181228215736.11493-1-avi@scylladb.com>
2018-12-29 12:12:15 +00:00
Avi Kivity
30745eeb72 query_processor: replace sharded<database> with the local shard
query_processor uses storage_proxy to access data, and the local
database object to access replicated metadata. While it seems strange
that the database object is not used to access data, it is logical
when you consider that a sharded<database> only contains this node's
data, not the cluster data.

Take advantage of this to replace sharded<database> with a single database
shard.
2018-12-29 11:02:15 +02:00
Avi Kivity
f0a709cfc8 commitlog_replayer: don't use query_processor
During normal writes, query processing happens before commitlog, so
logically replaying the commitlog shouldn't need it. And in
fact the dependency on query_processor can be eliminated; all it needs
is the local node's database.
2018-12-29 11:00:29 +02:00
Avi Kivity
7830086317 client_state: change set_keyspace() to accept a single database shard
set_keyspace() only needs one shard (it is checking replicated state,
not sharded data) so arrange for it to receive only that one shard.
2018-12-29 10:58:39 +02:00
Avi Kivity
e4233262cf legacy_schema_migrator: initialize with database reference
Provide legacy_schema_migrator with a sharded<database> so it doesn't need
to use the one from query_processor. We want to replace query_processor's
sharded<database> with just a local database reference in order to simplify
it, and this is standing in the way.
2018-12-29 10:58:22 +02:00
Duarte Nunes
bab7e6877b streaming/stream_session: Only stage sstables for tables with views
When streaming, sstables for which we need to generate view updates
are placed in a special staging directory. However, we only need to do
this for tables that actually have views.

Refs #4021
Message-Id: <20181227215412.5632-1-duarte@scylladb.com>
2018-12-28 18:32:24 +02:00
Avi Kivity
feddf0b021 tools: toolchain: patch boost for use-after-free in Boost.Test XML output
The version of boost in Fedora 29 has a use-after-free bug that is only
exposed when ./test.py is run with the --jenkins flag.  To patch it,
use a fixed version from the copr repository scylladb/toolchain.
Message-Id: <20181228150419.29623-1-avi@scylladb.com>
2018-12-28 16:35:28 +01:00
Tomasz Grabiec
7747f2dde3 Merge "nodetool toppartitions" from Rafi & Avi
Implementation of the nodetool toppartitions query, which samples the most frequent PKs in read/write
operation over a period of time.

Content:
- data_listener classes: mechanism that interfaces with mutation readers in database and table classes,
- toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this
  interfaces with data_listeners and the REST api),
- REST api for toppartitions query.

Uses Top-k structure for handling stream summary statistics (based on implementation in C*, see #2811).
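
A minimal sketch of how such a stream summary can work, assuming a Space-Saving style counter. The class and method names here are hypothetical and this is not the top_k code from the series. It keeps at most `capacity` counters; once full, the smallest counter is evicted and its count becomes the newcomer's error bound, which is one plausible reading of the "capacity" and "error" values reported elsewhere in this series.

```python
class StreamSummary:
    """Space-Saving style top-k counter (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counters = {}  # key -> [count, error]

    def append(self, key):
        if key in self.counters:
            self.counters[key][0] += 1
        elif len(self.counters) < self.capacity:
            self.counters[key] = [1, 0]
        else:
            # Evict the key with the smallest count; the newcomer inherits
            # that count as its possible overestimation (error).
            victim = min(self.counters, key=lambda k: self.counters[k][0])
            min_count = self.counters.pop(victim)[0]
            self.counters[key] = [min_count + 1, min_count]

    def top(self, k):
        # Highest counts first; ties keep insertion order (sorted is stable).
        ranked = sorted(self.counters.items(), key=lambda kv: -kv[1][0])
        return [(key, count, error) for key, (count, error) in ranked[:k]]


summary = StreamSummary(capacity=2)
for pk in ["a", "b", "a", "c", "a"]:
    summary.append(pk)
print(summary.top(2))
```

A larger capacity lowers the error bound at the cost of memory, which is presumably why capacity is kept separate from the list size.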

What's still missing:
- JMX interface to nodetool (interface customization may be required),
- Querying #rows and #bytes (currently, only #partitions is supported).

Fixes #2811

* https://github.com/avikivity/scylla rafie_toppartitions_v7.1:
  top_k: whitespace and minor fixes
  top_k: map template arguments
  top_k: std::list -> chunked_vector
  top_k: support for appending top_k results
  nodetool toppartitions: refactor table::config constructor
  nodetool toppartitions: data listeners
  nodetool toppartitions: add data_listeners to database/table
  nodetool toppartitions: fully_qualified_cf_name
  nodetool toppartitions: Toppartitions query implementation
  nodetool toppartitions: Toppartitions query REST API
  nodetool toppartitions: nodetool-toppartitions script
2018-12-28 16:31:24 +01:00
Rafi Einstein
7677d2ba2c nodetool toppartitions: nodetool-toppartitions script
A Python script mimicking the nodetool toppartitions utility, utilizing Scylla REST API.

Examples:
$ ./nodetool-toppartitions --help
usage: nodetool-toppartitions [-h] [-k LIST_SIZE] [-s CAPACITY]
                              keyspace table duration

Samples database reads and writes and reports the most active partitions in a
specified table

positional arguments:
  keyspace      Name of keyspace
  table         Name of column family
  duration      Query duration in milliseconds

optional arguments:
  -h, --help    show this help message and exit
  -k LIST_SIZE  The number of the top partitions to list (default: 10)
  -s CAPACITY   The capacity of stream summary (default: 256)

$ ./nodetool-toppartitions ks test1 10000
READ
  Partition   Count
  30          2
  20          2
  10          2

WRITE
  Partition   Count
  30          1
  20          1
  10          1

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:48:03 +02:00
Rafi Einstein
197f38d4ee nodetool toppartitions: Toppartitions query REST API
A HTTP GET operation starts the query (with args: ks/cf name and duration in ms).
It executes synchronously; results are returned as JSON:
$ curl -s -X GET http://localhost:10000/column_family/toppartitions/ks:cf1?duration=10000 | jq
{
  "read": [
    {
      "count": "15",
      "error": "0",
      "partition": "4b504d39354f37353131"
    },
    {
      "count": "15",
      "error": "0",
      "partition": "3738313134394d353530"
    }
  ],
  "write": [
    {
      "count": "15",
      "error": "0",
      "partition": "4b504d39354f37353131"
    },
    {
      "count": "15",
      "error": "0",
      "partition": "3738313134394d353530"
    }
  ]
}
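
The partition values in the response look like hex-encoded key bytes, and the counters arrive as strings. A small sketch of decoding such a response on the client side; the `decode_toppartitions` helper is hypothetical and not part of the patch, and the sample below reuses one entry from the output above.

```python
import json


def decode_toppartitions(body):
    """Return {"read": [(key_bytes, count, error), ...], "write": [...]}."""
    doc = json.loads(body)
    result = {}
    for op in ("read", "write"):
        result[op] = [
            (bytes.fromhex(e["partition"]), int(e["count"]), int(e["error"]))
            for e in doc.get(op, [])
        ]
    return result


sample = (
    '{"read": [{"count": "15", "error": "0", '
    '"partition": "4b504d39354f37353131"}], "write": []}'
)
decoded = decode_toppartitions(sample)
print(decoded["read"])
```

Whether the decoded bytes are printable depends on the partition key type. In this sample they happen to be ASCII text.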

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
6b2c21f69b nodetool toppartitions: Toppartitions query implementation
toppartitions_query installs toppartitions_data_listener-s on all database shards, waits for
the designated period, uninstalls the listeners and collects top-k read/write partition keys.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
404f75def5 nodetool toppartitions: fully_qualified_cf_name
Encapsulate keyspace:column_family REST API argument parsing into fully_qualified_cf_name class.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
0bffe5f83e nodetool toppartitions: add data_listeners to database/table
Add data_listeners member to database.
Adds data_listeners* to table::config, to be used by table methods to invoke listeners.
Install on_read() listener in table::make_reader().
Install on_write() listener in database::apply_in_memory().

Tests: Unit (release)
Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
08ba115c16 nodetool toppartitions: data listeners
Mechanism that interfaces with mutation readers in database and table classes, to
allow tracking most frequent partition keys in read and write operation.
Basic design is specified in #2811.

Tracking top #rows and #bytes will be supported in the future.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
038f8c7988 nodetool toppartitions: refactor table::config constructor
Eliminate extra parameters to ctor and deduce them instead from db param.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
eda43b93c9 top_k: support for appending top_k results
Allow appending results of one top_k into another.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:56 +02:00
Rafi Einstein
aeebe8e86b top_k: std::list -> chunked_vector
Replaced std::list with chunked_vector. Because chunked_vector requires
a noexcept move constructor from its value type, change the bad_boy type
in the unit test not to throw in the move constructor.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:07 +02:00
Avi Kivity
8e2f6d0513 Merge "Fix use-after-free when destroying partition_snapshots in the background" from Tomasz
"
partition_snapshots created in the memtable will keep a reference to
the memtable (as region*) and to memtable::_cleaner. As long as the
reader is alive, the memtable will be kept alive by
partition_snapshot_flat_reader::_container_guard. But after that
nothing prevents it from being destroyed. The snapshot can outlive the
read if mutation_cleaner::merge_and_destroy() defers its destruction
for later. When the read ends after memtable was flushed, the snapshot
will be queued in the cache's cleaner, but internally will reference
memtable's region and cleaner. This will result in a use-after-free
when the snapshot resumes destruction.

The fix is to update the snapshot's region and cleaner references at the
time of queueing to point to the cache's region and cleaner.

When memtable is destroyed without being moved to cache there is no
problem because the snapshot would be queued into memtable's cleaner,
which will be drained on destruction from all snapshots.

Introduced in f3da043 (in >= 3.0-rc1)

Fixes #4030.

Tests:

  - mvcc_test (debug)

"

* tag 'fix-snapshot-merging-use-after-free-v1.1' of github.com:tgrabiec/scylla:
  tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed
  tests: mvcc: Introduce mvcc_container::migrate()
  tests: mvcc: Make mvcc_partition move-constructible
  tests: mvcc: Introduce mvcc_container::make_not_evictable()
  tests: mvcc: Allow constructing mvcc_container without a cache_tracker
  mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup
  mvcc: partition_snapshot: Introduce migrate()
  mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner
2018-12-28 12:45:10 +02:00
Tomasz Grabiec
bb1c9cb6f3 tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed 2018-12-28 10:32:39 +01:00
Tomasz Grabiec
4d13dea39a tests: mvcc: Introduce mvcc_container::migrate() 2018-12-28 10:32:39 +01:00
Tomasz Grabiec
676868ed31 tests: mvcc: Make mvcc_partition move-constructible 2018-12-28 10:32:39 +01:00
Tomasz Grabiec
c6798f7872 tests: mvcc: Introduce mvcc_container::make_not_evictable() 2018-12-28 10:32:39 +01:00
Tomasz Grabiec
1fa00656ea tests: mvcc: Allow constructing mvcc_container without a cache_tracker
Some test cases will need many containers to simulate memtable ->
cache transitions, but there can be only one cache_tracker per shard
due to metrics. Allow constructing a container without a cache_tracker
(and thus non-evictable).
2018-12-28 10:32:39 +01:00
Tomasz Grabiec
ac49b1def0 mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup
partition_snapshots created in the memtable will keep a reference to
the memtable (as region*) and to memtable::_cleaner. As long as the
reader is alive the memtable will be kept alive by
partition_snapshot_flat_reader::_container_guard. But after that,
nothing prevents it from being destroyed. The snapshot can outlive the
read if mutation_cleaner::merge_and_destroy() defers its destruction
for later. When the read ends after memtable was flushed, the snapshot
will be queued in the cache's cleaner, but internally will reference
memtable's region and cleaner. This will result in a use-after-free
when the snapshot resumes destruction.

The fix is to update the snapshot's region and cleaner references at the
time of queueing to point to the cache's region and cleaner.

When memtable is destroyed without being moved to cache there is no
problem, because the snapshot would be queued into memtable's cleaner,
which will be drained on destruction from all snapshots.

Introduced in f3da043.

Fixes #4030.
2018-12-27 18:08:50 +01:00
Tomasz Grabiec
20f5d5d1a1 mvcc: partition_snapshot: Introduce migrate()
Snapshots which outlive the memtable will need to have their
_region and _cleaner references updated.

The snapshot can be destroyed after the memtable when it is queued in
the mutation_cleaner.
2018-12-27 18:08:50 +01:00
Tomasz Grabiec
67f9afbd1a mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner 2018-12-27 18:08:50 +01:00
Gleb Natapov
37b4043677 streaming: always read from rpc::source until end-of-stream during mutation sending
rpc::source cannot be abandoned until EOS is reached, but the current code
does not obey this: if an error code is received, it throws an exception that
aborts the reading loop. Fix it by moving the exception throwing out of the
loop.
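
The shape of the fix can be sketched with a plain Python iterable standing in for rpc::source. The name `drain_source` and the status encoding (0 for success) are assumptions made for illustration, not the streaming code: remember the first error, keep reading to end-of-stream, and raise only after the loop.

```python
def drain_source(source):
    """Consume every status code from `source`; raise only after EOS.

    `source` is any iterable of integer status codes, where 0 means success.
    """
    first_error = None
    consumed = 0
    for status in source:
        consumed += 1
        if status != 0 and first_error is None:
            first_error = status  # remember it, but keep reading
    if first_error is not None:
        raise RuntimeError(
            f"peer reported error {first_error} after {consumed} messages"
        )
    return consumed


stream = iter([0, -1, 0, 0])
try:
    drain_source(stream)
except RuntimeError as exc:
    print(exc)
```

Raising inside the loop would leave the remaining messages unread, which is the situation the commit message describes.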

Fixes: #4025

Message-Id: <20181227135051.GC29458@scylladb.com>
2018-12-27 16:50:53 +02:00
Asias He
4d3c463536 storage_service: Stop cql server before gossip
We saw failure in dtest concurrent_schema_changes_test.py:
TestConcurrentSchemaChanges.changes_while_node_down_test test.

======================================================================
ERROR: changes_while_node_down_test (concurrent_schema_changes_test.TestConcurrentSchemaChanges)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 432, in changes_while_node_down_test
    self.make_schema_changes(session, namespace='ns2')
  File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 86, in make_schema_changes
    session.execute('USE ks_%s' % namespace)
  File "cassandra/cluster.py", line 2141, in cassandra.cluster.Session.execute
    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result()
  File "cassandra/cluster.py", line 4033, in cassandra.cluster.ResponseFuture.result
    raise self._final_exception
ConnectionShutdown: Connection to 127.0.0.1 is closed

The test:

   session = self.patient_cql_connection(node2)
   self.prepare_for_changes(session, namespace='ns2')
   node1.stop()
   self.make_schema_changes(session, namespace='ns2') --> ConnectionShutdown exception throws

The problem is that, after receiving the DOWN event, the python
Cassandra driver will call Cluster:on_down which checks if this client
has any connections to the node being shut down. If there are any
connections, the Cluster:on_down handler will exit early, so the session
to the node being shutdown will not be removed.

If we shutdown the cql server first, the connection count will be zero
and the session will be removed.

Fixes: #4013
Message-Id: <7388f679a7b09ada10afe7e783d7868a58aac6ec.1545634941.git.asias@scylladb.com>
2018-12-27 14:13:43 +02:00
Duarte Nunes
2f69ba2844 lwt: Remove Paxos-related Cassandra code
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181227112526.4180-1-duarte@scylladb.com>
2018-12-27 13:30:10 +02:00
Duarte Nunes
66e45469b2 streaming/stream_session: Don't use table reference across defer points
When creating a sstable from which to generate view updates, we held
on to a table reference across defer points. In case there's a
concurrent schema drop, the table object might be destroyed and we
will incur a use-after-free. Solve this by holding on to a shared
pointer and pinning the table object.

Refs #4021

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181227105921.3601-1-duarte@scylladb.com>
2018-12-27 13:05:46 +02:00
Avi Kivity
b349e11aba tools: toolchain: avoid docker-provided /tmp
On at least one system, using the container's /tmp as provided by docker
results in spurious EINVALs during aio:

INFO  2018-12-27 09:54:08,997 [shard 0] gossip - Feature ROW_LEVEL_REPAIR is enabled
unknown location(0): fatal error: in "test_write_many_range_tombstones": storage_io_error: Storage I/O error: 22: Invalid argument
seastar/tests/test-utils.cc(40): last checkpoint

The setup is overlayfs over xfs.

To avoid this problem, pass through the host's /tmp to the container.
Using --tmpfs would be better, but it's not possible to guess a good size
as the amount of temporary space needed depends on build concurrency.
Message-Id: <20181227101345.11794-1-avi@scylladb.com>
2018-12-27 10:17:23 +00:00
Avi Kivity
2c4a732735 tools: toolchain: update baseline Fedora packages
Image fedora-29-20181219 was broken due to the following chain of events:

 - we install gnutls, which currently is at version 3.6.5
 - gnutls 3.6.5 introduced a dependency on nettle 3.4.1
 - the gnutls rpm does not include a version requirement on nettle,
   so an already-installed nettle will not be upgraded when gnutls is
   installed
 - the fedora:29 image which we use as a baseline has nettle installed
 - docker does not pull the latest tag in FROM statements during
   "docker build"
 - my build machine already had a fedora:29 image, with nettle 3.4
   installed (the repository's image has 3.4.1, but docker doesn't
   automatically pull if an image with the required tag exists)

As a result, the image ended up having gnutls 3.6.5 and nettle 3.4, which
are incompatible.

To fix, update all packages after installation to attempt to have a self
consistent package set even if dependencies are not perfect, and regenerate
the image.
Message-Id: <20181226135711.24074-1-avi@scylladb.com>
2018-12-26 14:58:23 +00:00
Avi Kivity
1414837fcc tools: toolchain: improve dbuild for continuous integration environments
The '-t' flag to 'docker run' passes the tty from the caller environment
to the container, which is nice for interactive jobs, but fails if there
is no tty, such as in a continuous integration environment.

Given that, the '-i' flag doesn't make sense either as there isn't any
input to pass.

Remove both, and replace with --sig-proxy=true which allows SIGTERM to
terminate the container instead of leaving it alive. This reduces the
chances of the build stopping but leaving random containers around.
Message-Id: <20181222105837.22547-1-avi@scylladb.com>
2018-12-26 10:50:34 +00:00
Avi Kivity
bfd8ade914 tools: toolchain: update toolchain for gcc-8.2.1-6
gcc was updated with some important fixes; update the toolchain to
include it.
Message-Id: <20181219190548.28675-1-avi@scylladb.com>
2018-12-26 10:21:02 +00:00
Benny Halevy
206483e6af position_in_partition_view: print bound_weight as int
Rather than a non-printable char.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181226091115.18530-1-bhalevy@scylladb.com>
2018-12-26 11:19:30 +02:00
Rafael Ávila de Espíndola
f73c60d8cf sstables: Convert an unreachable throw into an assert in read path
The function pending_collection is only called when
cdef->is_multi_cell() is true, so the throw is dead.

This patch converts it to an assert.
Message-Id: <20181207022119.38387-1-espindola@scylladb.com>
2018-12-26 11:10:19 +02:00
Benny Halevy
52188a20fa HACKING.md: Add details about unit test debug info
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181225133513.20751-1-bhalevy@scylladb.com>
2018-12-25 16:03:24 +02:00
Avi Kivity
c96fc1d585 Merge "Introduce row level repair" from Asias
"
=== How the partition level repair works

- The repair master decides which ranges to work on.
- The repair master splits the ranges into sub ranges which contain around 100
partitions.
- The repair master computes the checksum of the 100 partitions and asks the
related peers to compute the checksum of the 100 partitions.
- If the checksum matches, the data in this sub range is synced.
- If the checksum mismatches, repair master fetches the data from all the peers
and sends back the merged data to peers.

=== Major problems with partition level repair

- A mismatch of a single row in any of the 100 partitions causes 100
partitions to be transferred. A single partition can be very large. Not to
mention the size of 100 partitions.

- Checksum (find the mismatch) and streaming (fix the mismatch) will read the
same data twice

=== Row level repair

Row level checksum and synchronization: detect row level mismatch and transfer
only the mismatch

=== How the row level repair works

- To solve the problem of reading data twice

Read the data only once for both checksum and synchronization between nodes.

We work on a small range which contains only a few megabytes of rows.
We read all the rows within the small range into memory. Find the
mismatch and send the mismatch rows between peers.

We need to find a sync boundary among the nodes which contains only N bytes of
rows.

- To solve the problem of sending unnecessary data.

We need to find the mismatched rows between nodes and only send the delta.
The problem is called set reconciliation problem which is a common problem in
distributed systems.

For example:
Node1 has set1 = {row1, row2, row3}
Node2 has set2 = {      row2, row3}
Node3 has set3 = {row1, row2, row4}

To repair:
Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3.
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2
Node1 sends row3 (set1 + set2 + set3 - set3) to Node3.
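
The example above maps directly onto set operations. A short sketch, using Python sets as stand-ins for each node's rows, that reproduces the fetch and send lists:

```python
set1 = {"row1", "row2", "row3"}  # Node1, the repair master
set2 = {"row2", "row3"}          # Node2
set3 = {"row1", "row2", "row4"}  # Node3

# The master pulls what it lacks from each peer.
fetch_from_node2 = set2 - set1
fetch_from_node3 = set3 - set1

# After fetching, the master holds the union and pushes each peer's gap.
union = set1 | set2 | set3
send_to_node2 = union - set2
send_to_node3 = union - set3

print(sorted(fetch_from_node2), sorted(fetch_from_node3))
print(sorted(send_to_node2), sorted(send_to_node3))
```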

=== How to implement repair with set reconciliation

- Step A: Negotiate sync boundary

class repair_sync_boundary {
    dht::decorated_key pk;
    position_in_partition position;
}

Reads rows from disk into row buffers until the size is larger than N
bytes. Return the repair_sync_boundary of the last mutation_fragment we
read from disk. The smallest repair_sync_boundary of all nodes is
set as the current_sync_boundary.
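
A rough sketch of Step A, assuming rows are already sorted and that a boundary can be modelled as a comparable `(pk, position)` tuple. The real ordering uses decorated keys and position_in_partition; the helper name and the sample sizes are hypothetical.

```python
def local_sync_boundary(rows, max_bytes):
    """rows: sorted list of ((pk, position), size_in_bytes).

    Read rows until the buffered size is larger than max_bytes and return
    the boundary of the last row read, or None if there are no rows.
    """
    buffered = 0
    boundary = None
    for key, size in rows:
        buffered += size
        boundary = key
        if buffered > max_bytes:
            break
    return boundary


node_rows = {
    "node1": [((1, 0), 40), ((1, 1), 40), ((2, 0), 40)],
    "node2": [((1, 0), 40), ((2, 0), 40), ((3, 0), 40)],
}
boundaries = [local_sync_boundary(r, max_bytes=70) for r in node_rows.values()]
# The smallest proposal wins, so every node has read at least up to it.
current_sync_boundary = min(boundaries)
print(boundaries, current_sync_boundary)
```

Taking the minimum means no node is asked about rows beyond what it has already buffered.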

- Step B: Get missing rows from peer nodes so that repair master contains all the rows

Request combined hashes from all nodes between last_sync_boundary and
current_sync_boundary. If the combined hashes from all nodes are identical,
data is synced, goto Step A. If not, request the full hashes from peers.

At this point, the repair master knows exactly what rows are missing. Request the
missing rows from peer nodes.

Now, local node contains all the rows.
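
A sketch of the Step B decision, under two labelled assumptions: the combined hash is modelled as the XOR of per-row hashes, and the row hash is a truncated SHA-256. Both are choices made for illustration, not taken from the patch. For brevity the peer's rows are passed in directly, whereas in the protocol described above only hashes would cross the wire before the missing rows are requested.

```python
import hashlib
from functools import reduce


def row_hash(row):
    # Illustrative 64-bit hash of a row's serialized form.
    return int.from_bytes(hashlib.sha256(row.encode()).digest()[:8], "big")


def combined_hash(rows):
    # XOR is order-independent, so equal sets give equal combined hashes.
    return reduce(lambda a, b: a ^ b, (row_hash(r) for r in rows), 0)


def missing_on_master(master_rows, peer_rows):
    """Rows the master must request from one peer."""
    if combined_hash(master_rows) == combined_hash(peer_rows):
        return set()  # in sync, no full hashes needed
    master_hashes = {row_hash(r) for r in master_rows}
    return {r for r in peer_rows if row_hash(r) not in master_hashes}


print(missing_on_master({"row1", "row2", "row3"}, {"row1", "row2", "row4"}))
```

The cheap combined comparison handles the common in-sync case; the full per-row hashes are only exchanged when it fails.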

- Step C: Send missing rows to the peer nodes

Since local node also knows what peer nodes own, it sends the missing rows to
the peer nodes.

=== How the RPC API looks like

- repair_range_start()

Step A:
- request_sync_boundary()

Step B:
- request_combined_row_hashes()
- request_full_row_hashes()
- request_row_diff()

Step C:
- send_row_diff()

- repair_range_stop()

=== Performance evaluation

We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instance. We
created a keyspace with a replication factor of 3 and inserted 1 billion
rows to each of the 3 nodes. Each node has 241 GiB of data.
We tested 3 cases below.

1) 0% synced: one of the node has zero data. The other two nodes have 1 billion identical rows.

Time to repair:
   old = 87 min
   new = 70 min (rebuild took 50 minutes)
   improvement = 19.54%

2) 100% synced: all of the 3 nodes have 1 billion identical rows.
Time to repair:
   old = 43 min
   new = 24 min
   improvement = 44.18%

3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows.

Time to repair:
   old: 211 min
   new: 44 min
   improvement: 79.15%

Bytes sent on wire for repair:
   old: tx= 162 GiB,  rx = 90 GiB
   new: tx= 1.15 GiB, rx = 0.57 GiB
   improvement: tx = 99.29%, rx = 99.36%
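
The improvement figures follow from (old - new) / old. A quick check, reading 0.57 GiB as the new rx value, which is what the improvement line implies:

```python
def improvement(old, new):
    """Relative reduction from old to new, as a percentage."""
    return (old - new) / old * 100


cases = {
    "0% synced, minutes": (87, 70),
    "100% synced, minutes": (43, 24),
    "99.9% synced, minutes": (211, 44),
    "99.9% synced, tx GiB": (162, 1.15),
    "99.9% synced, rx GiB": (90, 0.57),
}
for name, (old, new) in cases.items():
    print(f"{name}: {improvement(old, new):.3f}%")
```

The results agree with the quoted percentages to within 0.01; two of the quoted values (44.18% and 99.36%) appear to be truncated rather than rounded.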

It is worth noting that row level repair sends and receives exactly the
number of rows needed in theory.

In this test case, repair master needs to receive 2 million rows and
sends 4 million rows. Here are the details: Each node has 1 billion *
0.1% distinct rows, that is 1 million rows. So repair master receives 1
million rows from repair slave 1 and 1 million rows from repair slave 2.
Repair master sends 1 million rows from repair master and 1 million rows
received from repair slave 1 to repair slave 2. Repair master
sends 1 million rows from repair master and 1 million rows received from
repair slave 2 to repair slave 1.

In the result, we saw the rows on wire were as expected.

tx_row_nr  = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000
rx_row_nr  =  500233 + 500235 +  499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000
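
The per-shard counts can be checked against the expected totals derived in the paragraph above (the variable names are illustrative):

```python
tx_per_shard = [1000505, 999619, 1001257, 998619]
rx_per_shard = [500233, 500235, 499559, 499973]

distinct_rows_per_node = 1_000_000_000 // 1000  # 0.1% of 1 billion rows

# The master receives one batch of distinct rows from each of the two peers.
expected_rx = 2 * distinct_rows_per_node
# Each peer is sent the master's distinct rows plus the other peer's.
expected_tx = 2 * (2 * distinct_rows_per_node)

print(sum(tx_per_shard), expected_tx)
print(sum(rx_per_shard), expected_rx)
```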

Fixes: #3033

Tests: dtests/repair_additional_test.py
"

* 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits)
  repair: Enable row level repair
  repair: Add row_level_repair
  repair: Add docs for row level repair
  repair: Add repair_init_messaging_service_handler
  repair: Add repair_meta
  repair: Add repair_writer
  repair: Add repair_reader
  repair: Add repair_row
  repair: Add fragment_hasher
  repair: Add decorated_key_with_hash
  repair: Add get_random_seed
  repair: Add get_common_diff_detect_algorithm
  repair: Add shard_config
  repair: Add suportted_diff_detect_algorithms
  repair: Add repair_stats to repair_info
  repair: Introduce repair_stats
  flat_mutation_reader:  Add make_generating_reader
  storage_service: Introduce ROW_LEVEL_REPAIR feature
  messaging_service: Add RPC verbs for row level repair
  repair: Export the repair logger
  ...
2018-12-25 13:13:00 +02:00
Takuya ASADA
b9a06ae552 dist/offline_installer/redhat: support building RHEL 7 offline installer
We had an issue building the offline installer on RHEL because of repository
differences.
This fix enables building the offline installer on both CentOS and RHEL.

Also it introduces --releasever <ver>, to build offline installer for
specific minor version of CentOS and RHEL.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181212032129.29515-1-syuu@scylladb.com>
2018-12-25 12:50:09 +02:00
Botond Dénes
3ae77a2587 configure.py: generate ${mode}-objects targets
Sometimes one wants to just compile all the source files in the
project, because for example one just moved around code or files and
there is no need to link and run anything, just check that everything
still compiles.
Since linking takes up a considerable amount of time it is worthwhile to
have a specific target that caters for such needs.
This patch introduces a ${mode}-objects target for each mode (e.g.
release-objects) that only runs the compilation step for each source
file but does not link anything.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <eaad329bf22dfaa3deff43344f3e65916e2c8aaf.1545045775.git.bdenes@scylladb.com>
2018-12-25 12:40:20 +02:00
Benny Halevy
f104951928 sstable_test: read_file should open the file read-only
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181218145156.12716-1-bhalevy@scylladb.com>
2018-12-25 12:02:46 +02:00
Rafael Ávila de Espíndola
f8c81d4d89 tests: sstables: mc: add tests with incompatible schemas
In one test the types in the schema don't match the types in the
statistics file. In another a column is missing.

The patch also updates the exceptions to have more human readable
messages.

Tests: unit (release)

Part of issue #3960.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181219233046.74229-1-espindola@scylladb.com>
2018-12-25 11:11:54 +02:00
Yibo Cai (Arm Technology China)
422987ab04 utils: add fast ascii string validation
Validate ascii string by ORing all bytes and check if 7-th bit is 0.
Compared with original std::any_of(), which checks ascii string byte
by byte, this new approach validates input in 8 bytes and two
independent streams. Performance is much higher for normal cases,
though slightly slower when string is very short. See table below.

Speed(MB/s) of ascii string validation
+---------------+-------------+---------+
| String length | std::any_of | u64 x 2 |
+---------------+-------------+---------+
| 9 bytes       | 1691        | 1635    |
+---------------+-------------+---------+
| 31 bytes      | 2923        | 3181    |
+---------------+-------------+---------+
| 129 bytes     | 3377        | 15110   |
+---------------+-------------+---------+
| 1039 bytes    | 3357        | 31815   |
+---------------+-------------+---------+
| 16385 bytes   | 3448        | 47983   |
+---------------+-------------+---------+
| 1048576 bytes | 3394        | 31391   |
+---------------+-------------+---------+
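
The idea can be sketched as follows. This is only an illustration of the technique; the patch itself is C++, the function names are hypothetical, and Python will not reproduce the speedups in the table. Bytes are ORed together eight at a time into two independent accumulators, and the high bit of every byte is tested once at the end.

```python
HIGH_BITS = 0x8080808080808080  # bit 7 of each byte in a 64-bit word


def is_ascii_bytewise(data: bytes) -> bool:
    """Reference check, one byte at a time."""
    return not any(b & 0x80 for b in data)


def is_ascii_wordwise(data: bytes) -> bool:
    """OR 8-byte words into two accumulators, then test the high bits."""
    acc0 = acc1 = 0
    i = 0
    n = len(data)
    while i + 16 <= n:
        acc0 |= int.from_bytes(data[i:i + 8], "little")
        acc1 |= int.from_bytes(data[i + 8:i + 16], "little")
        i += 16
    tail = 0
    for b in data[i:]:  # fewer than 16 bytes left over
        tail |= b
    return ((acc0 | acc1) & HIGH_BITS) == 0 and (tail & 0x80) == 0


print(is_ascii_wordwise(b"plain ascii text, longer than sixteen bytes"))
print(is_ascii_wordwise(b"a" * 20 + "\u00e9".encode("utf-8")))
```

Because nothing is tested until the end, there is no early exit on the first non-ASCII byte.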

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1544669646-31881-1-git-send-email-yibo.cai@arm.com>
2018-12-24 09:58:08 +02:00
Tomasz Grabiec
419c771791 sstables: index_reader: Fix abort when _trust_pi == trust_promoted_index::no
data is not moved-from if _trust_pi == trust_promoted_index::no, which
triggers the assert on data.empty(). We should make it empty
unconditionally.

Message-Id: <1545408731-14333-1-git-send-email-tgrabiec@scylladb.com>
2018-12-23 12:09:21 +02:00
Tomasz Grabiec
07d153c769 sstables: mc: reader: Use enum class instead of variant
variant is an overkill here.

Message-Id: <1545409014-16289-1-git-send-email-tgrabiec@scylladb.com>
2018-12-23 12:04:02 +02:00
Duarte Nunes
e6a8883228 service/storage_proxy: Protect against empty mutation when storing hint
mutation_holder::get_mutation_for() can return nullptr's, so protect
against those when storing a hint.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181221194853.98775-2-duarte@scylladb.com>
2018-12-23 11:14:44 +02:00
Duarte Nunes
6c4a34f378 service/storage_proxy: Protect against empty mutation in mutation_holder
The per_destination_mutation holder can contain empty mutations,
so make sure release_mutation() skips over those.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181221194853.98775-1-duarte@scylladb.com>
2018-12-23 11:14:43 +02:00
Duarte Nunes
5e7d18380d Merge 'Reduce dependencies on config.hh for extensions access' from Avi
"
Some files use db/config.hh just to access extensions. Reduce dependencies
on this global and volatile file by providing another path to access extensions.

Tests: unit(release)
"

* tag 'unconfig-2/v1' of https://github.com/avikivity/scylla:
  hints: reduce dependencies on db/config.hh
  commitlog: reduce dependencies on db/config.hh
  cql3: reduce dependencies on db/config.hh
  database: provide accessor to db::extensions
2018-12-21 20:15:44 +00:00
Avi Kivity
eae030b061 hints: reduce dependencies on db/config.hh
Instead of accessing extensions via config, access it via
database::extensions(). This reduces recompilations when configuration
is extended.
2018-12-21 20:15:44 +00:00
Avi Kivity
cc8312a8b9 commitlog: reduce dependencies on db/config.hh
Instead of accessing extensions via config, access it via
database::extensions(). This reduces recompilations when configuration
is extended.
2018-12-21 20:15:43 +00:00
Avi Kivity
d2dae3af86 cql3: reduce dependencies on db/config.hh
Instead of accessing extensions via config, access it via
database::extensions(). This reduces recompilations when configuration
is extended.
2018-12-21 20:15:43 +00:00
Avi Kivity
74c1afad29 database: provide accessor to db::extensions
Rather than forcing callers to go through get_config(), provide a
direct accessor. This reduces dependencies on config.hh, and will
allow separation of extensions from configuration.
2018-12-21 20:15:43 +00:00
Tomasz Grabiec
d2f96a60f6 sstables: mc: index_reader: Handle CK_SIZE split across buffers properly
We incorrectly fell through to the next state instead of returning
to read more data.

This can manifest in a number of ways, such as an abort or an incorrect read.

Introduced in 917528c

Fixes #4011.

Message-Id: <1545402032-4114-1-git-send-email-tgrabiec@scylladb.com>
2018-12-21 16:34:10 +02:00
Tomasz Grabiec
7afe2bad51 sstables: mc: reader: Avoid unnecessary index reads on fast forwarding
When the next pending fragments are after the start of the new range,
we know there is no need to skip.

Caught by perf_fast_forward --datasets large-part-ds3 \
                            --run-tests=large-partition-slicing

Refs #3984
Message-Id: <1545308006-16389-1-git-send-email-tgrabiec@scylladb.com>
2018-12-20 16:21:07 +00:00
Gleb Natapov
393269d34b streaming: hold to sink while close() is running and call close on error as well
Currently if something throws while streaming in mutation sending loop
sink is not closed. Also when close() is running the code does not hold
onto sink object. close() is async, so sink should be kept alive until
it completes. The patch uses do_with() to hold onto sink while close is
running and run close() on error path too.

Fixes #4004.

Message-Id: <20181220155931.GL3075@scylladb.com>
2018-12-20 18:03:37 +02:00
Rafi Einstein
533e46ac72 top_k: map template arguments
Added Hash and KeyEqual template arguments to enable unordered_map in top_k implementation.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-20 16:41:40 +02:00
Rafi Einstein
75f21954d4 top_k: whitespace and minor fixes
Style and minor logic changes from code review.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-20 16:41:33 +02:00
Tomasz Grabiec
2b55ab8c8e Merge "Add more extensive test for mutation reader fast-forwarding" from Paweł
Mutation readers allow fast-forwarding the ranges from which the data is
being read. The main user of this feature is cache which, when reading
from the underlying reader, may want to skip some data it already has.
Unsurprisingly, this adds more complexity to the implementation of the
readers and more edge cases the developers need to take care of.

While most of the readers were at least to some extent checked in this
area, those tests were usually quite isolated (e.g. one test doing
inter-partition fast-forwarding, another doing intra-partition
fast-forwarding) and as a consequence didn't cover many corner cases.

This patch adds a generic test for fast-forwarding and slicing that
covers more complicated scenarios when those operations are combined.
Needless to say that did uncover some problems, but fortunately none of
them is user-visible.

Fixes #3963.
Fixes #3997.

Tests: unit(release, debug)

* https://github.com/pdziepak/scylla.git test-fast-forwarding/v4.1:
  tests/flat_mutation_reader_assertions: accumulate received tombstones
  tests/flat_mutation_reader_assertions: add more test messages
  tests/flat_mutation_reader_assertions: relax has_monotonic_positions()
    check
  tests/mutation_readers: do not ignore streamed_mutation::forwarding
  Revert "mutation_source_test: add option to skip intra-partition
    fast-forwarding tests"
  memtable: it is not a single partition read if partition
    fast-forwarding is enabled
  sstables: add more tracing in mp_row_consumer_m
  row_cache: use make_forwardable() to implement
    streamed_mutation::forwarding
  row_cache: read is not single-partition if inter-partition forwarding
    is enabled
  row_cache: drop support for streamed_mutation::forwarding::yes
    entirely
  sstables/mp_row_consumer: position_range end bound is exclusive
  mutation_fragment_filter: handle streamed_mutation::forwarding::yes
    properly
  tests/mutation_reader: reduce sleeping time
  tests/memtable: fix partition_range use-after-free
  tests/mutation: fix partition range use-after-free
  flat_mutation_reader_from_mutations: add overload that accepts a slice
    and partition range
  flat_mutation_reader_from_mutations: fix empty range case
  flat_mutation_reader_from_mutations: destroy all remaining mutations
  tests/mutation_source: drop dropped column handling test
  tests/mutation_source: add test for complex fast_forwarding and
    slicing
2018-12-20 15:05:21 +01:00
Paweł Dziepak
3355d16938 tests/mutation_source: add test for complex fast_forwarding and slicing
While we already had tests that verified inter- and intra-partition
fast-forwarding as well as slicing, they had quite limited scope and
didn't combine those operations. The new test is meant to extensively
test these cases.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
26a30375b1 tests/mutation_source: drop dropped column handling test
Schema changes are now covered by for_each_schema_change() function.
Having some additional tests in run_mutation_source_tests() is
problematic when it is used to test intermediate mutation readers
because schema changes may be irrelevant for them, which makes the test
a waste of time (might be a problem in debug mode) and requires those
intermediate readers to use a more complex underlying reader that supports
schema changes (again, problem in a very slow debug mode).
2018-12-20 13:27:25 +00:00
Paweł Dziepak
048ed2e3d3 flat_mutation_reader_from_mutations: destroy all remaining mutations
If the reader is fast-forwarded to another partition range mutation_ may
be left with some partial mutations. Make sure that those are properly
destroyed.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
d50cd31eee flat_mutation_reader_from_mutations: fix empty range case
An iterator shall not be dereferenced until it is verified that it is
dereferencable.
2018-12-20 13:27:25 +00:00
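The rule stated in commit d50cd31eee above (check an iterator before dereferencing it) can be sketched in isolation. This is a minimal illustration using only the standard library, not the code of flat_mutation_reader_from_mutations; the function name `first_at_or_after` and the use of plain integers as keys are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper: returns the first element of a sorted vector that is
// not less than `start`, or std::nullopt when the selected range is empty.
inline std::optional<int> first_at_or_after(const std::vector<int>& sorted, int start) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), start);
    // The iterator is only dereferenced after checking it against end().
    if (it == sorted.end()) {
        return std::nullopt;
    }
    return *it;
}

inline void example_empty_range() {
    assert(!first_at_or_after({}, 0).has_value());          // empty input
    assert(!first_at_or_after({1, 5, 9}, 10).has_value());  // empty range
    assert(first_at_or_after({1, 5, 9}, 6).value() == 9);
}
```

Returning an optional makes the empty case explicit at the call site instead of relying on the caller to know that the range may be empty.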
Paweł Dziepak
93488209de tests/mutation: fix partition range use-after-free 2018-12-20 13:27:25 +00:00
Paweł Dziepak
e91165d929 tests/memtable: fix partition_range use-after-free 2018-12-20 13:27:25 +00:00
Paweł Dziepak
5db8dacd1f tests/mutation_reader: reduce sleeping time
It is in very bad taste to sleep anywhere in the code. The test should be
fixed to explicitly test various orderings between concurrent
operations, but before that happens let's at least reduce how much
those sleeps slow it down by changing them from milliseconds to
microseconds.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
243aade3b2 mutation_fragment_filter: handle streamed_mutation::forwarding::yes properly 2018-12-20 13:27:25 +00:00
Paweł Dziepak
dfa5b3d996 sstables/mp_row_consumer: position_range end bound is exclusive 2018-12-20 13:27:25 +00:00
Paweł Dziepak
df1d438fcd row_cache: drop support for streamed_mutation::forwarding::yes entirely 2018-12-20 13:27:25 +00:00
Paweł Dziepak
adcb3ec20c row_cache: read is not single-partition if inter-partition forwarding is enabled 2018-12-20 13:27:25 +00:00
Paweł Dziepak
7ecee197c4 row_cache: use make_forwardable() to implement streamed_mutation::forwarding
Implementing intra-partition fast-forwarding adds more complexity to
already very-much-not-trivial cache readers and isn't really critical in
any way since it is not used outside of the tests. Let's use the generic
adapter instead of natively implementing it.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
e96a5f96d9 sstables: add more tracing in mp_row_consumer_m 2018-12-20 13:27:25 +00:00
Paweł Dziepak
18825af830 memtable: it is not a single partition read if partition fast-forwarding is enabled
Single-partition reader is less expensive than the one that accepts any
range of partitions, but it doesn't support fast-forwarding to another
partition range properly and therefore cannot be used if that option is
enabled.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
bcb5aed1ef Revert "mutation_source_test: add option to skip intra-partition fast-forwarding tests"
This reverts commit b36733971b. That commit made
run_mutation_reader_tests() support mutation_sources that do not implement
streamed_mutation::forwarding::yes. This is wrong since mutation_sources
are not allowed to ignore or otherwise not support that mode. Moreover,
there is absolutely no reason for them to do so since there is a
make_forwardable() adapter that can make any mutation_reader a
forwardable one (at the cost of performance, but that's not always
important).
2018-12-20 13:27:25 +00:00
Paweł Dziepak
8706750b9b tests/mutation_readers: do not ignore streamed_mutation::forwarding
It is wrong to silently ignore streamed_mutation::forwarding option
which completely changes how the reader is supposed to operate. The best
solution is to use make_forwardable() adapter which changes
non-forwardable reader to a forwardable one.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
edf2c71701 tests/flat_mutation_reader_assertions: relax has_monotonic_positions() check
Since 41ede08a1d "mutation_reader: Allow
range tombstones with same position in the fragment stream" mutation
readers emit fragments in non-decreasing order (as opposed to strictly
increasing), has_monotonic_positions() needs to be updated to allow
that.
2018-12-20 13:27:25 +00:00
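The difference between the two orderings mentioned in commit edf2c71701 above can be shown with a small sketch. It uses plain integers in place of fragment positions, and both function names are hypothetical; the real has_monotonic_positions() check operates on mutation fragments.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Relaxed check: equal neighbouring positions are allowed.
inline bool is_non_decreasing(const std::vector<int>& positions) {
    return std::is_sorted(positions.begin(), positions.end());
}

// Strict check: every position must be greater than the previous one.
inline bool is_strictly_increasing(const std::vector<int>& positions) {
    return std::adjacent_find(positions.begin(), positions.end(),
                              [](int a, int b) { return a >= b; }) == positions.end();
}

inline void example_orderings() {
    // Two fragments share position 2, as two range tombstones might.
    std::vector<int> positions{1, 2, 2, 3};
    assert(is_non_decreasing(positions));
    assert(!is_strictly_increasing(positions));
}
```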
Paweł Dziepak
787d1ba7b2 tests/flat_mutation_reader_assertions: add more test messages 2018-12-20 13:27:25 +00:00
Paweł Dziepak
593fb936c2 tests/flat_mutation_reader_assertions: accumulate received tombstones
The current data model employed by mutation readers doesn't have a unique
representation of range tombstones. This complicates testing by making
multiple ways of emitting range tombstones and rows equally valid.

This patch adds an option to verify mutation readers by checking whether
tombstones they emit properly affect the clustered rows regardless of how
exactly the tombstones are emitted. The interface of
flat_mutation_reader_assertions is extended by adding
may_produce_tombstones() that accepts any number of tombstones and
accumulates them. Then, produces_row_with_key() accepts an additional
argument which is the expected timestamp of the range tombstone that
affects that row.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
e6d26a528f Merge "Optimize slicing sstable readers" from Tomasz
"
Contains several improvements for fast-forwarding and slicing readers. Mainly
for the MC format, but not only:

  - Exiting the parser early when going out of the fast-forwarding window [MC-format-only]
  - Avoiding reading of the head of the partition when slicing
  - Avoiding parsing rows which are going to be skipped [MC-format-only]
"

* 'sstable-mc-optimize-slicing-reads' of github.com:tgrabiec/scylla:
  sstables: mc: reader: Skip ignored rows before parsing them
  sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts
  sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row
  sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows
  sstables: mc: parser: Allow the consumer to skip the whole row
  sstables: continuous_data_consumer: Introduce skip()
  sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state()
  sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row
  sstables: reader: Do not read the head of the partition when index can be used
  sstables: mc: mutation_fragment_filter: Check the fast-forward window first
  sstables: mc: writer: Avoid calling unsigned_vint::serialized_size()
2018-12-20 12:48:22 +00:00
Avi Kivity
b66f59aa3d Merge "materialized views: Apply backpressure from view replicas" from Duarte
"
As the amount of pending view updates increases we know that there’s a
mismatch between the rate at which the base receives writes and the
rate at which the view retires them. We react by applying backpressure
to decrease the rate of incoming base writes, allowing the slow view
replicas to catch up. We want to delay the client’s next writes to a
base replica and we use the base’s backlog of view updates to derive
this delay.

To validate this approach we tested a 3 node Scylla cluster on GCE,
using n1-standard-4 instances with NVMEs. A loader running on a
n1-standard-8 instance ran cassandra-stress with 100 threads. With the
delay function d(x) set to 1s, we see no base write timeouts. With the
delay function as defined in the series, we see that backlogs stabilize
at some (arbitrary) point, as predicted, but this stabilization
co-exists with base write timeouts. However, the system overall behaves
better than the current version, with the 100 view update limit, and
also better than the version without such limit or any backpressure.

More work is necessary to further stabilize the system. Namely, we want
to keep delaying until we see the backlog is decreasing. This will
require us to add more delay beyond the stabilization point, which in
turn should minimize the base write timeouts, and will also minimize the
amount of memory the backlog takes at each base replica.

Design document:
    https://docs.google.com/document/d/1J6GeLBvN8_c3SbLVp8YsOXHcLc9nOLlRY7pC6MH3JWo

Fixes #2538
"

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

* 'materialized-views/backpressure/v2' of https://github.com/duarten/scylla: (32 commits)
  service/storage_proxy: Release mutation as early as possible
  service/storage_proxy: Delay replica writes based on view update backlog
  service/storage_proxy: Get the backlog of a particular base replica
  service/storage_proxy: Add counters for delayed base writes
  main: Start and stop the view_update_backlog_broker
  service: Distribute a node's view update backlog
  service: Advertise view update backlog over gossip
  service/storage_proxy: Send view update backlog from replicas
  service/storage_proxy: Prepare to receive replica view update backlog
  service/storage_proxy: Expose local view update backlog
  tests/view_schema_test: Add simple test for db::view::node_update_backlog
  db/view: Introduce node_update_backlog class
  db/hints: Initialize current backlog
  database: Add counter for current view backlog
  database: Expose current memory view update backlog
  idl: Add db::view::update_backlog
  db/view: Add view_update_backlog
  database: Wait on view update semaphore for view building
  service/storage_proxy: Use near-infinite timeouts for view updates
  database: generate_and_propagate_view_updates no longer needs a timeout
  ...
2018-12-20 12:44:51 +02:00
Asias He
bcba6b4f4d streaming: Futurize estimate_partitions
The loop can take a long time if the number of sstables and/or ranges
are large. To fix, futurize the loop.

Fixes: #4005

Message-Id: <3b05cb84f3f57cc566702142c6365a04b075018e.1545290730.git.asias@scylladb.com>
2018-12-20 12:08:03 +02:00
Amos Kong
385d74db01 redhat/scylla.spec: add python34-setuptools dependency
Commit 00476c3946 switched some scripts to python3, which introduced an
ImportError: No module named 'pkg_resources'.

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <293c05d9315ec6c9da1f32e8cb3d2fdf8d8d3924.1545272049.git.amos@scylladb.com>
2018-12-20 06:32:36 +02:00
Duarte Nunes
2d7c026d6e service/storage_proxy: Release mutation as early as possible
When delaying a base write, there is no need to hold on to the
mutation if all replicas have already replied.

We introduce mutation_holder::release_mutation(), which frees the
mutations that are no longer needed during the rest of the delay.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
756b601560 service/storage_proxy: Delay replica writes based on view update backlog
As the amount of pending view updates increases we know that there’s a
mismatch between the rate at which the base receives writes and the
rate at which the view retires them. We react by applying backpressure
to decrease the rate of incoming base writes, allowing the slow view
replicas to catch up. We want to delay the client’s next writes to a
base replica. We use the base’s backlog of view updates to derive
this delay.

If we achieve CL and the backlogs of all replicas involved were last
seen to be empty, then we wouldn't delay the client's reply. However,
it could be that one of the replicas is actually overloaded, and won't
reply for many new such requests. We'll eventually start applying
backpressure to the client via the background's write queue, but in
the meanwhile we may be dropping view updates. To mitigate this we rely
on the backlog being gossiped periodically.

Fixes #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
997bdf5d98 service/storage_proxy: Get the backlog of a particular base replica
Add a function that returns the view update backlog for a particular
replica.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
819b6f3406 service/storage_proxy: Add counters for delayed base writes
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
6df32bfb0c main: Start and stop the view_update_backlog_broker
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
37dfd22619 service: Distribute a node's view update backlog
This patch introduces the view_update_backlog_broker class, which is
responsible for periodically updating the local gossip state with the
current node's view update backlog. It also registers to updates from
other nodes, and updates the local coordinator's view of their view
update backlogs.

We consider the view update backlog received from a peer through the
mutation_done verb to be always fresh, but we consider the one received
through gossip to be fresh only if it has a higher timestamp than what
we currently have recorded.

This is because a node only updates its gossip state periodically, and
also because a node can transitively receive gossip state about a third
node with outdated information.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
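The freshness rule described in commit 37dfd22619 above can be sketched as a single predicate. This is an illustration only: the type and function names are hypothetical, and accepting a gossiped value when nothing has been recorded yet is an assumption that the commit message does not spell out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical record of a peer's backlog as seen by the coordinator.
struct recorded_backlog {
    double relative_size;
    int64_t timestamp;
};

enum class backlog_source { mutation_done_reply, gossip };

// Decides whether an incoming backlog should replace the recorded one.
inline bool should_replace(const std::optional<recorded_backlog>& current,
                           const recorded_backlog& incoming,
                           backlog_source from) {
    if (from == backlog_source::mutation_done_reply) {
        return true;  // a direct reply from the replica is always fresh
    }
    // Gossip may be delayed or relayed by a third node, so it only wins
    // with a strictly higher timestamp (assumption: or if nothing is recorded).
    return !current || incoming.timestamp > current->timestamp;
}

inline void example_freshness() {
    std::optional<recorded_backlog> current = recorded_backlog{0.4, 100};
    assert(!should_replace(current, recorded_backlog{0.9, 90}, backlog_source::gossip));
    assert(should_replace(current, recorded_backlog{0.9, 90}, backlog_source::mutation_done_reply));
}
```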
Duarte Nunes
8da6a31e75 service: Advertise view update backlog over gossip
This lays the groundwork for brokering a node's view update
backlog across the whole cluster. This is needed for when a
coordinator does not contact a given replica for a long time, and uses
a backlog view that is outdated and causes requests to be
unnecessarily delayed.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
ede5742f9b service/storage_proxy: Send view update backlog from replicas
Change the inter-node protocol so we can propagate the view update
backlog from a base replica to the coordinator through the
mutation_done and mutation_failed verbs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
34b48e1d98 service/storage_proxy: Prepare to receive replica view update backlog
In subsequent patches, replicas will reply to the coordinator with
their view update backlog. Before introducing changes to the
messaging_service, prepare the storage_proxy to receive and store
those backlogs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
776fdd4d1a service/storage_proxy: Expose local view update backlog
The local view update backlog is the max backlog out of the relative
memory backlog size and the relative hints backlog size.

We leverage the db::view::node_update_backlog class so we can send the
max backlog out of the node's shards.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
6662475dd9 tests/view_schema_test: Add simple test for db::view::node_update_backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
2bd76f8fc5 db/view: Introduce node_update_backlog class
This class is an atomic view update backlog representation,
safe to update from multiple shards.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
6afbec4685 db/hints: Initialize current backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
8d6718b6e4 database: Add counter for current view backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
2174eed640 database: Expose current memory view update backlog
Expose the base replica's current memory view update backlog, which is
defined in terms of units consumed from the semaphore.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
d54ac4961d idl: Add db::view::update_backlog
Add db::view::update_backlog to the newly created view.idl.hh.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
12ce517242 db/view: Add view_update_backlog
The view update backlog represents the pending view data that a base replica
maintains. It is the maximum of the memory backlog - how much memory pending
view updates are consuming - and the disk backlog - how much view hints are
consuming. The size of a backlog is relative to its maximum size.

We will use this class to represent a base replica's view update
backlog at the coordinator.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
fc9176e784 database: Wait on view update semaphore for view building
View building sends view updates synchronously, which has natural
backpressure. However, they

1) Contribute to the load on the view replicas, and;
2) Add memory pressure to the base replica.

They should thus count towards the current view update backlog, and
consume units from the view update concurrency semaphore.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
e33e187096 service/storage_proxy: Use near-infinite timeouts for view updates
View updates are sent with a timeout of 5 minutes, unrelated to
any user-defined value and meant as a protection mechanism. During
normal operation we don’t benefit from timing out view writes and
offloading them to the hinted-handoff queue, since they are an
internal, non-real time workload that we already spent resources on.

This value should be increased further, but that change depends on

Refs #2538
Refs #3826

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
86198060e5 database: generate_and_propagate_view_updates no longer needs a timeout
We no longer wait on the semaphore and instead over-subscribe it, so
there's no reason to pass a timeout.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
39eda68094 database: Don't generate view updates when node is overloaded
We arrive at an overloaded state when we fail to acquire semaphore
units in the base replica. This can mean clients are working in
interactive mode, we fail to throttle them and consequently should
start shedding load. We want to avoid impacting base table
availability by running out of memory, so we could offload the memory
queue to disk by writing the view updates as hints without attempting
to send them. However, the disk is also a limited resource and in
extreme cases we won’t be able to write hints. A tension exists
between forgetting the view updates, thereby opening up a window for
inconsistencies between base and view, or failing the base replica
write. The latter can fail the whole user write, or if the
coordinator was able to achieve CL, can instead cause inconsistencies
between base tables (we wouldn't want to store a hint, because if the
base replica is still overloaded, we would redo the whole dance).

Between the devil and the deep blue sea, we chose to forget view
updates. As a further simplification, we don't even write hints,
assuming that if clients can’t be throttled (as we'll attempt to do in
future patches), it will only be a matter of time before view updates
can’t be offloaded. We also start acquiring the semaphore units using
consume(), which is non-blocking, but allows for underflow of the
available semaphore units. This is okay, and we expect not to underflow
by much, as we stop generating new view updates.

Refs #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
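The non-blocking accounting described in commit 39eda68094 above can be illustrated with a plain counter. This is a sketch under stated assumptions, not Seastar's semaphore: the class `unit_budget` and its `overloaded()` predicate are hypothetical, and the sketch ignores waiters and concurrency entirely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical budget of units that can be over-subscribed.
class unit_budget {
    int64_t _available;
public:
    explicit unit_budget(int64_t units) : _available(units) {}

    // Never waits; the counter is allowed to drop below zero.
    void consume(int64_t units) { _available -= units; }

    // Returns units once the corresponding work has completed.
    void signal(int64_t units) { _available += units; }

    int64_t available() const { return _available; }

    // Assumption for this sketch: no units left means "overloaded".
    bool overloaded() const { return _available <= 0; }
};

inline void example_underflow() {
    unit_budget budget(10);
    budget.consume(8);
    assert(!budget.overloaded());
    budget.consume(5);             // over-subscribes by 3 units
    assert(budget.available() == -3);
    assert(budget.overloaded());   // the caller stops generating new work here
    budget.signal(8);
    assert(budget.available() == 5);
}
```

Because new work stops as soon as the budget is exhausted, the deficit stays bounded by the size of the last request that was admitted.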
Duarte Nunes
a3d30ea99a db/view: Propagate acquired semaphore units to mutate_MV()
Propagate acquired semaphore units to mutate_MV() to allow the
semaphore to be incrementally signalled as view updates are processed
by view replicas.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
8c1e6fcee8 db/timeout_clock: Define timeout_semaphore_units
Defines the type of semaphore_units<> associated with
timeout_semaphore.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
11c02c51fe database: Wait for pending view updates to drain before stopping
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
185a4594af database: Restore formatting of table::stop()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
f286d2ec34 database: Wait for pending operations in table::stop()
A table can be stopped while reads and writes are still in flight.
Those operations rely on table state, so we must prevent the table's
destruction until they complete.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
1f1fc36b72 database: Make view update concurrency semaphore memory-based
The semaphore currently limiting the amount of view updates a given
base replica emits aims to control the load that is imposed on the
cluster, to protect view replicas from being overloaded when there
are bursts of traffic (especially for degenerate cases like an index
with low selectivity).

100 is, however, an arbitrary number. It might allow too much load on
the view replicas, and it might also allow too much memory from the
base shard to be consumed. Conversely, it might allow for too few
updates to be queued in case of a burst, or to absorb updates while a
view replica becomes partitioned.

To deal with the load that is inflicted on the cluster, future patches
will ensure that the rate of base writes obeys the rate at which the
slowest view replica can consume the corresponding view updates.

To protect the current shard from using too much memory for this
queue, we will limit it to 10% of the shard's memory. The goal is both
to protect the shard from being overloaded and to allow it to
absorb bursts of writes resulting in large view mutations.

Refs #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
bf4277fd8c service/storage_proxy: Remove unused send_to_endpoint() overloads
The send_to_endpoint() overloads that receive a non-frozen mutation
are no longer used.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
2753cfee88 db/view: Generate view updates as frozen_mutations
Working in terms of frozen_mutations allows us to account more
precisely for the memory that pending view updates consume at the
storage_proxy layer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
715da6fd6b db/view: Reserve vector space in mutate_MV()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
5d011eb61f db/view: Cleanup mutate_MV()
In particular, extract out the logic updating the stats in case of a
failed update.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
7cfcd21bbb database: Make lambda in table::populate_views mutable
This allows an std::move() in its body to work as intended. Also, make
the lambda's argument type explicit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
122737a8ab Merge seastar upstream
* seastar 132e6cd...6c8c229 (3):
  > reactor: disable nowait aio due to a kernel bug
  > core/semaphore: Allow combining semaphore_units()
  > core/shared_ptr: Allow releasing a lw_shared_ptr to a non-const object

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181217153241.67514-2-duarte@scylladb.com>
2018-12-19 12:57:07 +02:00
Duarte Nunes
bf05e59672 seastar: Change the source repository to scylla-seastar
Scylla is at the moment incompatible with the Seastar master branch,
so in order to allow Scylla commits that depend on Seastar patches,
we change the submodule to point to scylla-seastar and use a branch
(master-20181217) to hold these dependent commits.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181217153241.67514-1-duarte@scylladb.com>
2018-12-19 12:57:03 +02:00
Rafael Ávila de Espíndola
ff18c837b7 tests: Add missing include in random-utils.hh
This file uses std::cout and so should include <iostream>.

Found with a patch to seastar that removes some redundant <iostream>
includes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181218183816.34504-1-espindola@scylladb.com>
2018-12-19 10:52:19 +00:00
Avi Kivity
dd51c659f7 config: remove "to be removed before release" notice mc sstable config
The "enable_sstables_mc_format" config item help text wants to remove itself
before release. Since scylla-3.0 did not get enough mc format mileage, we
decided to leave it in, so the notice should be removed.

Fixes #4003.
Message-Id: <20181219082554.23923-1-avi@scylladb.com>
2018-12-19 09:39:29 +00:00
Duarte Nunes
a7456db687 Merge 'Simplify natural endpoint calculation' from Calle
"
Implementation of origin change c000da13563907b99fe220a7c8bde3c1dec74ad5

Modifies network topology calculation, reducing the amount of
maps/sets used by applying the knowledge of how many replicas we
expect/need per dc and sharing endpoint and rack set (since we cannot have
overlaps).

Also includes a transposed origin test to ensure new calculation
matches the old one.

Fixes #2896
"

* 'calle/network_topology' of github.com:scylladb/seastar-dev:
  network_topology_test: Add test to verify new algorithm results equal old
  network_topology_strategy: Simplify calculate_natural_endpoints
  token_metadata: Add "get_location" ip to dc+rack accessor
  sequenced_set: Add "insert" method, following std::set semantics
2018-12-19 09:39:29 +00:00
Rafael Ávila de Espíndola
b93d8d863d Add a test with mismatched timestamps.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181218035931.3554-1-espindola@scylladb.com>
2018-12-18 11:30:56 +01:00
Tomasz Grabiec
37d9ba68bc sstables: mc: reader: Skip ignored rows before parsing them
Currently filtering happens inside consume_row_end() after the whole
row is parsed. It's much faster to skip without parsing.

This patch moves filtering and range tombstones splitting to
consume_row_start().

_stored_row is no longer needed because in case the filter returns
store_and_finish, the consumer exits with retry_later, and the parser
will call consume_row_start() again when resumed.

Tests:

  ./build/release/tests/perf/perf_fast_forward_g \
     --sstable-format=mc \
     --datasets large-part-ds1 \
     --run-tests=large-partition-skips

Before:

read    skip      time (s)     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB)
1       4096      1.085142      1953       1800         32       1803       1720   4990     159604

After:

read    skip      time (s)     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB)
1       4096      0.694560      1953       2812         11       2813       2684   4986     159588
2018-12-18 11:13:52 +01:00
Tomasz Grabiec
e3c3ef2f0e sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts
This way we will later avoid calling clear() for ignored rows.
2018-12-18 11:11:48 +01:00
Tomasz Grabiec
fa126106f8 sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row 2018-12-18 11:11:48 +01:00
Tomasz Grabiec
522a75f761 sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows
mp_row_consumer_m::consume_row_marker_and_tombstone() is called for
both clustering and static rows, but it dereferences and modifies
_in_progress_row, which is only set when inside a clustering row.

Fixes #3999.
2018-12-18 11:11:47 +01:00
Tomasz Grabiec
9498977a34 sstables: mc: parser: Allow the consumer to skip the whole row
The MC format contains row size before the row body, which we can use
to skip the row without parsing its contents, which will be much
faster.
2018-12-18 11:11:47 +01:00
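The idea in commit 9498977a34 above (a size stored before the row body lets the reader jump over the row without parsing it) can be sketched with a toy layout. The layout here is an assumption made for illustration: each row is a single length byte followed by that many body bytes, whereas the real MC format uses variable-length integers and a richer row header. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the body offsets of the rows selected by `wanted(row_index)`.
// Unwanted rows are skipped by adding their size to the position; their
// bodies are never inspected.
template <typename Pred>
std::vector<size_t> wanted_row_offsets(const std::vector<uint8_t>& data, Pred wanted) {
    std::vector<size_t> offsets;
    size_t pos = 0;
    size_t row = 0;
    while (pos < data.size()) {
        size_t body_size = data[pos];   // toy one-byte size prefix
        size_t body_start = pos + 1;
        if (body_start + body_size > data.size()) {
            break;                      // truncated input, stop
        }
        if (wanted(row)) {
            offsets.push_back(body_start);
        }
        pos = body_start + body_size;   // jump straight to the next row
        ++row;
    }
    return offsets;
}

inline void example_skip_rows() {
    // Three rows with body sizes 2, 0 and 3.
    std::vector<uint8_t> data{2, 0xAA, 0xBB, 0, 3, 1, 2, 3};
    auto offsets = wanted_row_offsets(data, [](size_t row) { return row != 1; });
    assert(offsets.size() == 2);
    assert(offsets[0] == 1 && offsets[1] == 5);
}
```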
Tomasz Grabiec
b4c3b78082 sstables: continuous_data_consumer: Introduce skip() 2018-12-18 11:11:47 +01:00
Tomasz Grabiec
36dd660507 sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state()
Will allow state_processor to know its position in the
stream.

Currently position() is meaningless inside process_state() because in
some cases it points to the position after the buffer and in some
cases before it. This patch standardizes on the former. This is more
useful than the latter because process_state() trims from the front of
the buffer as it consumes, so the position inside the stream can be
obtained by subtracting the remaining buffer size from position(),
without introducing any new variables.
2018-12-18 11:11:47 +01:00
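The arithmetic described in commit 36dd660507 above can be written down directly: once position() consistently means "the stream offset just past the current buffer", the offset of the next unconsumed byte is that value minus the bytes still left in the buffer. The struct and function names below are hypothetical and stand in for the state visible inside process_state().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical snapshot of what a state processor can observe.
struct consumer_snapshot {
    uint64_t position;         // stream offset just past the current buffer
    size_t remaining_buffer;   // bytes of the buffer not yet consumed
};

// Stream offset of the next unconsumed byte.
inline uint64_t stream_offset(const consumer_snapshot& s) {
    assert(s.remaining_buffer <= s.position);
    return s.position - s.remaining_buffer;
}
```

With a 4096-byte buffer starting at offset 0, position is 4096; after consuming 4000 bytes, 96 remain and the sketch reports offset 4000, with no extra state to maintain.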
Tomasz Grabiec
e950c8b00a sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row
The size of the bitset is the same for given row kind across the sstable, so we can
allocate it once.

_columns_selector is moved into the row_schema structure, of which we have
one for each row kind, set up in the constructor.
2018-12-18 11:11:47 +01:00
Tomasz Grabiec
fb15759934 sstables: reader: Do not read the head of the partition when index can be used
read_partition() was always called through read_next_partition(), even
if we're at the beginning of the read.  read_next_partition() is
supposed to skip to the next partition. It still works when we're
positioned before a partition, it doesn't advance the consumer, but it
clears _index_in_current_partition, because it (correctly) assumes it
corresponds to the partition we're about to leave, not the one we're
about to enter.

This means that index lookups we did in the read initializer will be
disregarded when reading starts, and we'll always start by reading
partition data from the data file. This is suboptimal for reads which
are slicing a large partition and don't need to read the front of the
partition.

Regression introduced in 4b9a34a854.

The fix is to call read_partition() directly when we're positioned at
the beginning of the partition. For that purpose a new flag was
introduced.

test_no_index_reads_when_rows_fall_into_range_boundaries has to be
relaxed, because it assumed that slicing reads will read the head of
the partition.

Refs #3984
Fixes #3992

Tested using:

 ./build/release/tests/perf/perf_fast_forward_g \
     --sstable-format=mc \
     --datasets large-part-ds1 \
     --run-tests=large-partition-slicing-clustering-keys

Before (focus on aio):

offset  read      time (s)     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
4000000 1         0.001378         1        726          5        736        102      6        200       4       2        0        1        1        0        0        0  65.8%

After:

offset  read      time (s)     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
4000000 1         0.001290         1        775          6        788        716      2        136       2       0        0        1        1        0        0        0  69.1%
2018-12-18 11:11:37 +01:00
Tomasz Grabiec
385a4c23fd sstables: mc: mutation_fragment_filter: Check the fast-forward window first
Otherwise the parser will keep consuming and dropping fragments
needlessly, rather than giving the user a chance to consume
end-of-stream condition, and maybe skip again.

Refs #3984
2018-12-18 11:11:37 +01:00
Tomasz Grabiec
62a1afaac9 sstables: mc: writer: Avoid calling unsigned_vint::serialized_size()
Rather than adding serialized_size() to the body size before
serializing the field, we can serialize the field to _tmp_bufs at the
beginning and have the body size automatically account for it.
2018-12-18 11:11:36 +01:00
Duarte Nunes
1f578be187 Merge 'Fix evictable shard reader related issues' from Botond
"
Recently some additional issues were discovered related to recent
changes to the way inactive readers are evicted and making shard readers
evictable.
One such issue is that the `querier_cache` is not prepared for the
querier to be immediately evicted by the reader concurrency semaphore,
when registered with it as an inactive read (#3987).
The other issue is that the multishard mutation query code was not
fully prepared for evicted shard readers being re-created, or failing
while being re-created (#3991).

This series fixes both of these issues and adds a unit test which covers
the first one. I am working on a unit test which would cover the second
issue, but it's proving to be a difficult one and I don't want to delay
the fixes for these issues any longer as they also affect 3.0.

Fixes: #3987
Fixes: #3991

Tests: unit(release, debug)
"

* 'evictable-reader-related-issues/v2' of https://github.com/denesb/scylla:
  multishard_mutation_query: reset failed readers to inexistent state
  multishard_mutation_query: handle missing readers when dismantling
  multishard_mutation_query: add support for keeping stats for discarded partitions
  multishard_mutation_query: expect evicted reader state when creating reader
  multishard_mutation_query: pretty-print the reader state in log messages
  querier_cache: check that the query wasn't evicted during registering
  reader_concurrency_semaphore: use the correct types in the constructor
  reader_concurrency_semaphore: add consume_resources()
  reader_concurrency_semaphore::inactive_read_handle: add operator bool()
2018-12-17 15:36:23 +00:00
Calle Wilund
e353a8633a network_topology_test: Add test to verify new algorithm results equal old
Transposed from origin unit test.

Creates a semi-random topology of racks, dcs, tokens and replication
factors and verifies endpoint calculation equals old algo.
2018-12-17 13:10:59 +00:00
Calle Wilund
bfc6c89b00 network_topology_strategy: Simplify calculate_natural_endpoints
Fixes #2896 (hopefully)

Implementation of origin change c000da13563907b99fe220a7c8bde3c1dec74ad5

Reduces the amount of maps and sets and general complexity of
endpoint calculation by simply mapping dc:s to expected node
counts, re-using endpoint sets and iterate thusly.

Tested with transposed origin unit test comparing old vs. new
algo results. (Next patch)
2018-12-17 13:10:59 +00:00
Botond Dénes
b4c3aab4a7 multishard_mutation_query: reset failed readers to inexistent state
When attempting to dismantle readers, some of the to-be-dismantled
readers might be in a failed state. The code waiting on the reader to
stop is expecting failures, however it didn't do anything besides
logging the failure and bumping a counter. Code in the lower layers did
not know how to deal with a failed reader and would trigger
`std::bad_variant_access` when trying to process (save or cleanup) it.
To prevent this, reset the state of failed readers to `inexistent_state`
so code in the lower layers doesn't attempt to further process them.
2018-12-17 13:18:08 +02:00
Botond Dénes
9cef043841 multishard_mutation_query: handle missing readers when dismantling
When dismantling the combined buffer and the compaction state we are no
longer guaranteed to have the reader each partition originated from. The
reader might have been evicted and not resumed, or resuming it might
have failed. In any case we can no longer assume the originating reader
of each partition will be present. If a reader no longer exists,
discard the partitions that it emitted.
2018-12-17 13:18:08 +02:00
Botond Dénes
438bef333b multishard_mutation_query: add support for keeping stats for discarded partitions
In the next patches we will add code that will have to discard some of
the dismantled partitions/fragments/bytes. Prepare the
`dismantle_buffer_stats` struct for being able to track the discarded
partitions/fragments/bytes in addition to those that were successfully
dismantled.
2018-12-17 13:18:08 +02:00
Botond Dénes
ce52436af4 multishard_mutation_query: expect evicted reader state when creating reader
Previously readers were created once, so `make_remote_reader()` had a
validation to ensure readers were not attempted at being created more
than once. This validation was done by checking that the reader-state is
either `inexistent` or `successful_lookup`. However with the
introduction of pausing shard readers, it is now possible that a reader
will have to be created and then re-created several times, however this
validation was not updated to expect this.
Update the validation so it also expects the reader-state to be
`evicted`, the state the reader will be if it was evicted while paused.
2018-12-17 13:18:08 +02:00
Botond Dénes
1effb1995b multishard_mutation_query: pretty-print the reader state in log messages 2018-12-17 13:18:08 +02:00
Botond Dénes
5780f2ce7a querier_cache: check that the query wasn't evicted during registering
The reader concurrency semaphore can evict the querier when it is
registered as an inactive read. Make the `querier_cache` aware of this
so that it doesn't continue to process the inserted querier when this
happens.
Also add a unit test for this.
2018-12-17 13:18:08 +02:00
Botond Dénes
e1d8237e6b reader_concurrency_semaphore: use the correct types in the constructor
Previously there was a type mismatch for `count` and `memory`, between
the actual type used to store them in the class (signed) and the type
of the parameters in the constructor (unsigned).
Although negative numbers are completely valid for these members,
initializing them to negative numbers doesn't make sense, which is why the
constructor used unsigned types. This restriction can backfire,
however, when someone intends to give these parameters the maximum
possible value, which, when interpreted as a signed value, will be `-1`.
What's worse, the caller might not even be aware of this unsigned->signed
conversion and be very surprised when they find out.
So to prevent surprises, expose the real type of these members, trusting
the clients to know what they are doing.

Also add a `no_limits` constructor, so clients don't have to make sure
they don't overflow internal types.
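
A small sketch of the pitfall described above. store_as_signed() and
no_limit are hypothetical names for illustration only, not the actual
reader_concurrency_semaphore interface:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical: a parameter taken as unsigned but stored as signed.
// On two's complement targets (and by definition since C++20) the
// maximum unsigned value converts to -1.
inline int64_t store_as_signed(uint64_t requested) {
    return static_cast<int64_t>(requested);
}

// Hypothetical "no limits" value: the largest value the signed member
// can actually hold, so callers need not reason about the conversion.
constexpr int64_t no_limit = std::numeric_limits<int64_t>::max();
```

A caller asking for "as much as possible" through the unsigned parameter
ends up with a negative budget, whereas a dedicated constant expressed in
the stored type stays positive.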
2018-12-17 13:18:08 +02:00
Botond Dénes
dfd649a6b4 reader_concurrency_semaphore: add consume_resources() 2018-12-17 13:18:08 +02:00
Botond Dénes
21b44adbfe reader_concurrency_semaphore::inactive_read_handle: add operator bool() 2018-12-17 13:18:08 +02:00
Amnon Heiman
571755e117 node-exporter.service: Update command line to fix service startup
The upgrade to node_exporter 0.17 commit
09c2b8b48a ("node_exporter_install: switch
to node_exporter 0.17") caused the service to no longer start. Turns out
node_exporter broke backwards compatibility of the command line between
0.15 and 0.16. Fix it up.

While fixing the command line, all the collectors that are enabled by
default were removed.

Fixes #3989

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
[ penberg@scylladb.com: edit commit message ]
Message-Id: <20181213114831.27216-1-amnon@scylladb.com>
2018-12-17 10:22:17 +02:00
Rafael Ávila de Espíndola
4de14e6143 Add tests on broken mc range tombstones.
This tests that we diagnose both two consecutive range starts and two
consecutive range ends.
Message-Id: <20181214212608.95452-1-espindola@scylladb.com>
2018-12-15 13:53:25 +01:00
Avi Kivity
b023e8b45d Merge " Extract MC sstable writer to a separate compilation unit" from Tomasz
"
The motivation is to keep code related to each format separate, to make it
easier to comprehend and reduce incremental compilation times.

Also reduces dependency on sstable writer code by removing writer bits from
sstables.hh.

The ka/la format writers are still left in sstables.cc; they could also be extracted.
"

* 'extract-sstable-writer-code' of github.com:tgrabiec/scylla:
  sstables: Make variadic write() not picked on substitution error
  sstables: Extract MC format writer to mc/writer.cc
  sstables: Extract maybe_add_summary_entry() out of components_writer
  sstables: Publish functions used by writers in writer.hh
  sstables: Move common write functions to writer.hh
  sstables: Extract sstable_writer_impl to a header
  sstables: Do not include writer.hh from sstables.hh
  sstables: mc: Extract bound_kind_m related stuff into mc/types.hh
  sstables: types: Extract sstable_enabled_features::all()
  sstables: Move components_writer to .cc
  tests: sstable_datafile_test: Avoid dependency on components_writer
2018-12-14 15:05:00 +02:00
Duarte Nunes
224821303c Merge 'Reduce the dependency on database.hh' from Botond
"
Working on database.hh or any header that is included in database.hh
(of which there are many) is a major pain as each change involves the
recompilation of half of our compilation units.
Reduce the impact by removing the `#include "database.hh"` directive
from as many header files as possible. Many headers can make do with
just some forward declarations and don't need to include the entire
headers. I also found some headers that included database.hh without
actually needing it.

Results

Before:
    $ touch database.hh
    $ ninja build/release/scylla
    [1/154] CXX build/release/gen/cql3/CqlParser.o

After:
    $ touch database.hh
    $ ninja build/release/scylla
    [1/107] CXX build/release/gen/cql3/CqlParser.o
"

* 'reduce-dependencies-on-database-hh/v2' of https://github.com/denesb/scylla:
  treewide: remove include database.hh from headers where possible
  database_fwd.hh: add keyspace fwd declaration
  service/client_state: de-inline set_keyspace()
  Move cache_temperature into its own header
2018-12-14 12:24:48 +00:00
Piotr Sarna
63bd43e57e cql3: add refusing to create an index on static column
Secondary indexes on static columns are not yet supported,
so creating such index should return an appropriate error.

Fixes #3993
Message-Id: <700b0a71e80da52d2d5250edacc12626b55681fa.1544785127.git.sarna@scylladb.com>
2018-12-14 11:15:28 +00:00
Rafael Ávila de Espíndola
f48d54543f Use read_rows_flat to test broken sstables.
The previous code was using mp_row_consumer_k_l to be as close to the
tested code as possible.

Given that it is testing for an unhandled exception, there is probably
more value in moving it to a higher level, easier to use, API.

This patch changes it to use read_rows_flat().

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181210235016.41133-1-espindola@scylladb.com>
2018-12-14 10:14:28 +01:00
Botond Dénes
1865e5da41 treewide: remove include database.hh from headers where possible
Many headers don't really need to include database.hh, the include can
be replaced by forward declarations and/or including the actually needed
headers directly. Some headers don't need this include at all.

Each header was verified to be compilable on its own after the change,
by including it into an empty `.cc` file and compiling it. `.cc` files
that used to get `database.hh` through headers that no longer include it
were changed to include it themselves.
2018-12-14 08:03:57 +02:00
Botond Dénes
efe2b2c75d database_fwd.hh: add keyspace fwd declaration 2018-12-14 08:03:57 +02:00
Tomasz Grabiec
245a0d953a tests: cql_test_env: Start the compaction manager
Broken in fee4d2e

Not doing this results in compaction requests being ignored.

One effect of this is that perf_fast_forward produces many sstables instead of one.

Refs #3984
Refs #3983

Message-Id: <1544719540-10178-1-git-send-email-tgrabiec@scylladb.com>
2018-12-13 18:58:50 +02:00
Piotr Sarna
6743af5dbd cql3: refuse to create index on COMPACT STORAGE with ck
To follow C* compatibility, creating an index on COMPACT STORAGE
table should be disallowed not only on base primary keys,
but also when the base table contains clustering keys.
Message-Id: <ab40c39730aff2e164d11ee5159ff62b8ec9e8e8.1544698186.git.sarna@scylladb.com>
2018-12-13 13:39:12 +00:00
Duarte Nunes
f8878238ed service/storage_proxy: Embed the expire timer in the response handler
Embedding the expire timer for a write response in the
abstract_write_response_handler simplifies the code as it allows
removing the rh_entry type.

It will also make the timeout easily accessible inside the handler,
for future patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181213111818.39983-1-duarte@scylladb.com>
2018-12-13 14:25:21 +02:00
Tomasz Grabiec
3889b05d7e Merge "Tests and small fixes for composite markers" from Rafael
* https://github.com/espindola/scylla espindola/add-composite-tests:
  Remove newline from exception messages.
  Fix end marker exception message.
  Add tests for broken start and end composite markers.
2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola
51fd880892 Add tests for broken start and end composite markers. 2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola
64439f6477 Fix end marker exception message.
The code tested the end marker, but the exception mentioned the start
marker.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola
cfd07185b7 Remove newline from exception messages.
They are inconsistent with other uses of malformed_sstable_exception
and incompatible with adding " in sstable ..." to the message.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-13 10:29:44 +01:00
Vlad Zolotarov
7da1ac2c2c large_partition_handler: fix the message
We currently detect large partitions - not rows. So this is what we
should be reporting.

Fixes #3986

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181212215506.9879-1-vladz@scylladb.com>
2018-12-13 00:11:27 +00:00
Rafael Ávila de Espíndola
894f07f912 Move default case out of two switches.
These switches are fully covered, having the default label disables
-Wswitch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181212160904.17341-1-espindola@scylladb.com>
2018-12-12 18:20:24 +01:00
Botond Dénes
10336c13fc service/client_state: de-inline set_keyspace() 2018-12-12 18:14:03 +02:00
Botond Dénes
76fe4ebc18 Move cache_temperature into its own header
Some headers need to include database.hh just because of
cache_temperature. Move it into its own header so these includes can be
removed.
2018-12-12 16:03:45 +02:00
Tomasz Grabiec
0a853b8866 sstables: index_reader: Avoid schema copy in advance_to()
Introduced in 7e15e43.

Exposed by perf_fast_forward:

  running: large-partition-skips on dataset large-part-ds1
  Testing scanning large partition with skips.
  Reads whole range interleaving reads with skips according to read-skip pattern:
  read    skip      time (s)     frags     frag/s (...)
  1       0         5.268780   8000000    1518378

  1       1        31.695985   4000000     126199
Message-Id: <1544614272-21970-1-git-send-email-tgrabiec@scylladb.com>
2018-12-12 11:33:46 +00:00
Tomasz Grabiec
ff2ad2f6bb sstables: Make variadic write() not picked on substitution error
If write(v, out, x) doesn't match any overload, the variadic write()
will be picked, with Rest = {}. The compiler will print error messages
about unable to find write(v, out), which totally obscures the
original cause of mismatch.

Make it picked only when there are at least two write() parameters so
that debugging compilation errors is actually possible.
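
A simplified sketch of the idea, assuming a hypothetical sink type and
two hypothetical single-value overloads (the real write() functions take
different parameters). The variadic overload names two values explicitly,
so it cannot be selected when only one value is passed:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical output sink, for illustration only.
struct sink { std::string data; };

// Hypothetical single-value overloads.
inline void write(sink& out, int x) { out.data += "i" + std::to_string(x) + ";"; }
inline void write(sink& out, const std::string& s) { out.data += "s" + s + ";"; }

// Variadic overload: viable only with at least two values, so a
// mismatched single-value call reports the real overload mismatch
// instead of recursing into write(out) with an empty pack.
template <typename First, typename Second, typename... Rest>
void write(sink& out, First&& first, Second&& second, Rest&&... rest) {
    write(out, std::forward<First>(first));
    write(out, std::forward<Second>(second), std::forward<Rest>(rest)...);
}

// Usage: writes three values in order.
inline std::string demo() {
    sink s;
    write(s, 1, std::string("a"), 2);
    return s.data;
}
```

With this shape, a call such as write(s, 3.5f, nullptr) still fails to
compile, but the diagnostic points at the value that has no matching
single-value overload.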
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
a14633c6d0 sstables: Extract MC format writer to mc/writer.cc
This moves all MC-related writing code to mc/writer.cc:

  - m_format_write_helpers.hh is dropped
  - m_format_write_helpers_impl.hh is dropped
  - sstable_writer_m is moved out of sstables.cc

sstable_writer_m is renamed to sstables::mc::writer
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
2636e6b5ab sstables: Extract maybe_add_summary_entry() out of components_writer
So that it can be used from writer implementations, which don't have
access to the definition of the components_writer.
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
577e71478d sstables: Publish functions used by writers in writer.hh 2018-12-12 12:07:31 +01:00
Tomasz Grabiec
faf0ff1843 sstables: Move common write functions to writer.hh
They are common for sstable writers of different formats.

Note that writer.hh is supposed to be included only by writer
implementations, not writer users.
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
3b4ccc85d0 sstables: Extract sstable_writer_impl to a header 2018-12-12 12:07:31 +01:00
Tomasz Grabiec
6e3c9c3e5e sstables: Do not include writer.hh from sstables.hh
It is only needed by writer implementations.
2018-12-12 12:07:05 +01:00
Tomasz Grabiec
bd7e9ad3ab sstables: mc: Extract bound_kind_m related stuff into mc/types.hh 2018-12-12 12:06:46 +01:00
Tomasz Grabiec
a4721b4d50 sstables: types: Extract sstable_enabled_features::all() 2018-12-12 12:06:45 +01:00
Tomasz Grabiec
90074d0b75 sstables: Move components_writer to .cc 2018-12-12 12:06:45 +01:00
Tomasz Grabiec
eff47a59ee tests: sstable_datafile_test: Avoid dependency on components_writer
It's LA format specific and it's going to become private to sstable.cc
2018-12-12 12:06:22 +01:00
Avi Kivity
fa96e07e6b build: pass C compiler configuration in relocatable package build
Just like we allow customizing the C++ compiler, we should allow customizing
the C compiler.

Ref #3978
Message-Id: <20181211172821.30830-1-avi@scylladb.com>
2018-12-12 11:45:13 +01:00
Calle Wilund
707bff563e token_metadata: Add "get_location" ip to dc+rack accessor 2018-12-12 09:32:05 +00:00
Calle Wilund
66472bc52d sequenced_set: Add "insert" method, following std::set semantics 2018-12-12 09:32:05 +00:00
Asias He
b9e0db801d repair: Enable row level repair
Finally, enable new row level repair if the cluster supports it. If not,
fallback to the old partition level repair.

Fixes #3033
2018-12-12 16:49:01 +08:00
Asias He
d372317e99 repair: Add row_level_repair
=== How the partition level repair works

- The repair master decides which ranges to work on.
- The repair master splits the ranges into sub ranges which contain around 100
partitions.
- The repair master computes the checksum of the 100 partitions and asks the
related peers to compute the checksum of the 100 partitions.
- If the checksum matches, the data in this sub range is synced.
- If the checksum mismatches, repair master fetches the data from all the peers
and sends back the merged data to peers.

=== Major problems with partition level repair

- A mismatch of a single row in any of the 100 partitions causes 100
partitions to be transferred. A single partition can be very large. Not to
mention the size of 100 partitions.

- Checksum (find the mismatch) and streaming (fix the mismatch) will read the
same data twice

=== Row level repair

Row level checksum and synchronization: detect row level mismatch and transfer
only the mismatch

=== How the row level repair works

- To solve the problem of reading data twice

Read the data only once for both checksum and synchronization between nodes.

We work on a small range which contains only a few megabytes of rows.
We read all the rows within the small range into memory, find the
mismatch and send the mismatched rows between peers.

We need to find a sync boundary among the nodes which contains only N bytes of
rows.

- To solve the problem of sending unnecessary data.

We need to find the mismatched rows between nodes and only send the delta.
This is called the set reconciliation problem, which is a common problem in
distributed systems.

For example:
Node1 has set1 = {row1, row2, row3}
Node2 has set2 = {      row2, row3}
Node3 has set3 = {row1, row2, row4}

To repair:
Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3.
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2
Node1 sends row3 (set1 + set2 + set3 - set3) to Node3.
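
The example above can be reproduced with plain set operations. This is
only a sketch: rows are modelled as strings, and row_set, difference()
and union_of() are hypothetical helpers, not part of the repair code:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using row_set = std::set<std::string>;

// Rows present in a but not in b (a - b).
inline row_set difference(const row_set& a, const row_set& b) {
    row_set out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// Union of all given sets (set1 + set2 + ...).
inline row_set union_of(const std::vector<row_set>& sets) {
    row_set out;
    for (const auto& s : sets) {
        out.insert(s.begin(), s.end());
    }
    return out;
}
```

Node1 first fetches difference(setN, set1) from each peer, then sends
difference(union_of({set1, set2, set3}), setN) back to each peer N.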

=== How to implement repair with set reconciliation

- Step A: Negotiate sync boundary

class repair_sync_boundary {
    dht::decorated_key pk;
    position_in_partition position;
};

Reads rows from disk into row buffers until the size is larger than N
bytes. Return the repair_sync_boundary of the last mutation_fragment we
read from disk. The smallest repair_sync_boundary of all nodes is
set as the current_sync_boundary.
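
A sketch of the negotiation rule, under simplifying assumptions: the
boundary is modelled here as a hypothetical (token, position) pair of
integers instead of a decorated_key and position_in_partition, and
pick_current_sync_boundary() is an illustrative name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical simplified boundary; the real type holds a decorated_key
// and a position_in_partition.
struct sync_boundary {
    int64_t token;
    uint64_t position;
};

// Order by partition first, then by position inside the partition.
inline bool operator<(const sync_boundary& a, const sync_boundary& b) {
    return std::tie(a.token, a.position) < std::tie(b.token, b.position);
}

// Each node proposes the boundary of the last fragment it buffered;
// the smallest proposal becomes the boundary for this round.
inline sync_boundary pick_current_sync_boundary(const std::vector<sync_boundary>& proposals) {
    assert(!proposals.empty());
    return *std::min_element(proposals.begin(), proposals.end());
}
```

Taking the minimum means every node has already read at least up to the
chosen boundary, so no node has to read further before hashing.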

- Step B: Get missing rows from peer nodes so that repair master contains all the rows

Request combined hashes from all nodes between last_sync_boundary and
current_sync_boundary. If the combined hashes from all nodes are identical,
data is synced, goto Step A. If not, request the full hashes from peers.

At this point, the repair master knows exactly what rows are missing. Request the
missing rows from peer nodes.

Now, local node contains all the rows.

- Step C: Send missing rows to the peer nodes

Since local node also knows what peer nodes own, it sends the missing rows to
the peer nodes.

=== What the RPC API looks like

- repair_range_start()

Step A:
- request_sync_boundary()

Step B:
- request_combined_row_hashes()
- request_full_row_hashes()
- request_row_diff()

Step C:
- send_row_diff()

- repair_range_stop()

=== Performance evaluation

We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instance. We
created a keyspace with a replication factor of 3 and inserted 1 billion
rows to each of the 3 nodes. Each node has 241 GiB of data.
We tested 3 cases below.

1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows.

Time to repair:
   old = 87 min
   new = 70 min (rebuild took 50 minutes)
   improvement = 19.54%

2) 100% synced: all of the 3 nodes have 1 billion identical rows.
Time to repair:
   old = 43 min
   new = 24 min
   improvement = 44.18%

3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows.

Time to repair:
   old: 211 min
   new: 44 min
   improvement: 79.15%

Bytes sent on wire for repair:
   old: tx= 162 GiB,  rx = 90 GiB
   new: tx= 1.15 GiB, rx = 0.57 GiB
   improvement: tx = 99.29%, rx = 99.36%
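
The improvement figures above follow from (old - new) / old. A small
sketch for rechecking them; improvement_percent() is a hypothetical
helper, and the listed values appear to be cut to two decimals rather
than rounded in some cases:

```cpp
#include <cassert>

// Relative improvement, in percent, when a cost drops from old_value
// to new_value (time in minutes or bytes on the wire).
inline double improvement_percent(double old_value, double new_value) {
    assert(old_value > 0.0);
    return (old_value - new_value) / old_value * 100.0;
}
```

For instance, improvement_percent(87, 70) corresponds to the 0% synced
case and improvement_percent(162, 1.15) to the bytes transmitted.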

It is worth noting that row level repair sends and receives exactly the
number of rows needed in theory.

In this test case, repair master needs to receive 2 million rows and
send 4 million rows. Here are the details: Each node has 1 billion *
0.1% distinct rows, that is 1 million rows. So repair master receives 1
million rows from repair slave 1 and 1 million rows from repair slave 2.
Repair master sends 1 million rows from repair master and 1 million rows
received from repair slave 1 to repair slave 2. Repair master sends
1 million rows from repair master and 1 million rows received from
repair slave 2 to repair slave 1.

In the result, we saw the rows on wire were as expected.

tx_row_nr  = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000
rx_row_nr  =  500233 + 500235 +  499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000

Fixes #3033
2018-12-12 16:49:01 +08:00
Asias He
b2b20cd5c0 repair: Add docs for row level repair 2018-12-12 16:49:01 +08:00
Asias He
fab31efae1 repair: Add repair_init_messaging_service_handler
This patch implements all the rpc handlers for row level repair.
2018-12-12 16:49:01 +08:00
Asias He
3c80727d51 repair: Add repair_meta
This patch introduces repair_meta class that is the core class for the
row level repair.

For each range to repair, repair_meta objects are created on both repair
master and repair slaves. It stores the meta data for the row level
repair algorithms, e.g, the current sync boundary, the buffer used to
hold the rows the peers are working on, the reader to read data from
sstable and the writer to write data to sstable.

This patch also implements the RPC verbs for row level repair, for
example, REPAIR_ROW_LEVEL_START/REPAIR_ROW_LEVEL_STOP to start/stop
row level repair for a range, REPAIR_GET_SYNC_BOUNDARY to get the sync
boundary peers want to work on, REPAIR_GET_ROW_DIFF to get missing rows
from repair slaves and REPAIR_PUT_ROW_DIFF to push missing rows to repair
slaves.
2018-12-12 16:49:01 +08:00
Asias He
65099bac85 repair: Add repair_writer
repair_writer uses multishard_writer to apply the mutation_fragments to
sstable. The repair master needs one such writer for each of the repair
slaves. The repair slave needs one writer for the repair master.
2018-12-12 16:49:01 +08:00
Asias He
5b75f64e0e repair: Add repair_reader
repair_reader is used to read data from disk. It is simply a local
flat_mutation_reader reader for the repair master. It is more
complicated for the repair slave.

The repair slaves have to follow what repair master read from disk.

For example,

Assume repair master has 2 shards and repair slave has 3 shards
Repair master on shard 0 asks repair slave on shard 0 to read range [0,100).
Repair master on shard 1 asks repair slave on shard 1 to read range [0,100).

Repair master on shard 0 will only read the data that belongs to shard 0
within range [0,100). Since master and slave have different shard count,
repair slave on shard 0 has to use the multi shard reader to collect
data on all the shards. It can not pass range [0, 100) to the multi
shard reader, otherwise it will read more data than the repair master.
Instead, repair slave uses a sharder using sharding configuration of the
repair master, to generate the sub ranges that belong to shard 0 of repair
master.

If repair master and slave have the same sharding configuration, a simple
local reader is enough for repair slave.
2018-12-12 16:49:01 +08:00
Asias He
27128d132d repair: Add repair_row
repair_row is the in-memory representation of "row" that the row level
repair works on. It represents a mutation_fragment that is read from the
flat_mutation reader. The hash of a repair_row is the combination of the
mutation_fragment hash and partition_key hash.
2018-12-12 16:49:01 +08:00
Asias He
3e7b1d2ef4 repair: Add fragment_hasher
It is used to calculate the hash of a mutation_fragment.
2018-12-12 16:49:01 +08:00
Asias He
e135871e4a repair: Add decorated_key_with_hash
Represents a decorated_key and the hash for it so that we do not need to
calculate more than once if the decorated_key is used more than once.
2018-12-12 16:49:01 +08:00
Asias He
16c1b26937 repair: Add get_random_seed
Get a random uint64_t number as the seed for the repair row hashing.
The seed is passed to xx_hasher.

We add the randomization when hashing rows so that the next time we run
repair the same row produces a different hash value.
2018-12-12 16:49:01 +08:00
Asias He
54888ac52c repair: Add get_common_diff_detect_algorithm
It is used to find the common difference detection algorithms supported
by repair master and repair slaves.

It is up to repair master to choose what algorithm to use.
2018-12-12 16:49:01 +08:00
Asias He
0b294d5829 repair: Add shard_config
It is used to store the shard configuration.
2018-12-12 16:49:01 +08:00
Asias He
a36b0966cf repair: Add suportted_diff_detect_algorithms
It returns a vector of row level repair difference detection algorithms
supported by this node.

We are going to implement the "send_full_set" in the following patches.
2018-12-12 16:49:01 +08:00
Asias He
42f2cd8dc5 repair: Add repair_stats to repair_info
Also add update_statistics() to update current stats.
2018-12-12 16:49:01 +08:00
Asias He
43c04302f3 repair: Introduce repair_stats
It is used by row level repair to track repair statistics.
2018-12-12 16:49:01 +08:00
Asias He
0067d32b47 flat_mutation_reader: Add make_generating_reader
Move generating_reader from stream_session.cc to flat_mutation_reader.cc.
It will be used by repair code soon.

Also introduce a helper make_generating_reader to hide the
implementation of generating_reader.
2018-12-12 16:49:01 +08:00
Asias He
fe4afb1aa3 storage_service: Introduce ROW_LEVEL_REPAIR feature
With this feature enabled, the node supports row level repair.
2018-12-12 16:49:01 +08:00
Asias He
acc9ff8dce messaging_service: Add RPC verbs for row level repair
This patch adds the RPC verbs that are needed by the row level repair.
Those verbs are used in the following patches.

All the verbs for row level repair are sent by the repair master.
Repair master asks repair slaves to create repair meta objects, a.k.a,
repair_meta object, to store the repair meta data needed by row level
repair algorithm. The repair meta object is identified by the IP address
of the repair master and a uint32 number repair_meta_id chosen by repair
master. When repair master restarts or is out of the cluster, repair
slaves will detect it and remove all existing repair_meta for the repair
master. When repair slave restarts, the existing repair_meta on the
slave will be gone.

The sync boundary used in the verbs is the position_in_partition of the
last mutation_fragment. In each repair round, peers work on
(last_sync_boundary, current_sync_boundary]
2018-12-12 16:49:01 +08:00
Asias He
8cfdcf435e repair: Export the repair logger
It will be used by the row level repair soon.
2018-12-12 16:49:01 +08:00
Asias He
e62aeae2db repair: Export repair_info
It will be used by the row level repair soon.
2018-12-12 16:49:01 +08:00
Asias He
6be3b35d52 repair: Export estimate_partitions
It will be used by row level repair soon.
2018-12-12 16:49:01 +08:00
Asias He
48341a2d4d idl: Add decorated_key support
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
1db4e3fd0a idl: Add row_level_diff_detect_algorithm
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
ccc706559f idl: Add get_sync_boundary_response
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
1173d1dd5a idl: Add repair_sync_boundary
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
dc223e9216 idl: Add partition_key_and_mutation_fragments
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
5fbbc63676 idl: Add position_in_partition
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
e9fbc27740 idl: Add bound_weight
It will be used by the row level repair code.
2018-12-12 16:49:01 +08:00
Asias He
3c39462397 idl: Add partition_region
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
e2b9840e24 idl: Add repair_hash
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
1a0bc8acf1 repair: Add struct hash<node_repair_meta_id> for node_repair_meta_id 2018-12-12 16:49:01 +08:00
Asias He
28d090ffda repair: Add struct hash<repair_hash> for repair_hash 2018-12-12 16:49:01 +08:00
Asias He
ce70225b1c repair: Introduce row_level_diff_detect_algorithm
It specifies the algorithm that is used to find the row difference in
repair.
2018-12-12 16:49:01 +08:00
Asias He
e9251df478 repair: Introduce partition_key_and_mutation_fragments
Represent a partition_key and frozen_mutation_fragments within the
partition_key.
2018-12-12 16:49:01 +08:00
Asias He
5d5a1beaec repair: Introduce node_repair_meta_id
It uses an IP address and a repair_meta_id to identify a repair
instance started by the row level repair.
2018-12-12 16:49:01 +08:00
Asias He
edd72e10ac repair: Introduce get_sync_boundary_response
The return value of the REPAIR_GET_SYNC_BOUNDARY verb. It will be used
in the row level repair code soon.
2018-12-12 16:49:01 +08:00
Asias He
95b9a889cf repair: Introduce repair_hash
It represents the hash value of a repair row.
2018-12-12 16:49:01 +08:00
Asias He
3e86b7a646 repair: Introduce repair_sync_boundary
Represent a position of a mutation_fragment read from a flat mutation
reader. Repair nodes negotiate a small sub range identified by two
repair_sync_boundary to work on in each round.
2018-12-12 16:49:01 +08:00
Asias He
063dfcda26 messaging_service: Add constructor for msg_addr
Which takes the ip address and shard id.
2018-12-12 16:49:01 +08:00
Asias He
8cb3ea98d0 xx_hasher: Allow specifying seed
It will be used by row level repair.
2018-12-12 16:49:01 +08:00
Asias He
165d3053b1 position_in_partition: Add get_type, get_bound_weight and get_clustering_key_prefix
Needed by the RPC serialization code.
2018-12-12 16:49:01 +08:00
Asias He
4e55d22a8f position_in_partition: Switch _bound_weight to use enum
The _bound_weight in position_in_partition will be sent on wire in rpc.
Make it enum instead of int.
2018-12-12 16:49:01 +08:00
Asias He
5bc109e1ee position_in_partition: Add bound_weight
It will be used to change _bound_weight to use enum instead of int8_t.
2018-12-12 16:49:01 +08:00
Asias He
05c663b932 position_in_partition: Use std::optional for clustering_key_prefix
The new row level repair code will access clustering_key_prefix and it
uses std::optional everywhere. Convert position_in_partition to use
std::optional.
2018-12-12 16:49:01 +08:00
Asias He
0b31d7059b position_in_partition: Make partition_region uint8_t
It will be sent over rpc. Make the type explicit.
2018-12-12 16:49:01 +08:00
Asias He
dfd206b3a3 serializer: Add std::optional support 2018-12-12 16:49:01 +08:00
Asias He
3eecdc670f serializer: Add std::list support
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
b540df2819 serializer: Add std::unordered_set support
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
1367c8c47e dht: Add make_partitioner
Given the name and shard count and the sharding_ignore_msb_bits, make a
partitioner.

It is used by row level repair.
2018-12-12 16:49:01 +08:00
Asias He
f1a914060b dht: Add constructor for decorated_key which takes token and partition_key
decorated_key(const dht::token& t, const partition_key& k)
2018-12-12 16:49:01 +08:00
Asias He
71c1681f6c storage_service: Notify NEW_NODE only when a node is new node
This is a backport of CASSANDRA-11038.

Before this, a restarted node was reported as a new node with a NEW_NODE
cql notification.

To fix, only send the NEW_NODE notification when the node was not part of
the cluster.

Fixes: #3979
Tests: pushed_notifications_test.py:TestPushedNotifications.restart_node_test
Message-Id: <453d750b98b5af510c4637db25b629f07dd90140.1544583244.git.asias@scylladb.com>
2018-12-12 07:33:49 +02:00
Juliana Oliveira
5eb76c9bc6 compress: add support for Cassandra's compression parameter
This patch adds compatibility for Cassandra's "chunk_size_in_kb", as
well as it keeps Scylla's "chunk_size_kb" compression parameter.

Fixes #3669
Tests: unit (release)

v2: use variable instead of array
v3: fix committed files

Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20181211215840.GA7379@shenzou.localdomain>
2018-12-11 23:33:27 +00:00
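Accepting two spellings of the same option, as the commit above describes, can be sketched as a lookup that tries each key in turn. Everything except the two option names is hypothetical: the options alias, the chunk_size_kb function, and the choice to try the Cassandra spelling first are assumptions made for illustration only.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical representation of the compression options of a table.
using options = std::map<std::string, std::string>;

// Looks up the chunk size under either spelling. The lookup order is an
// arbitrary choice for this sketch, not taken from the patch.
inline std::optional<int> chunk_size_kb(const options& o) {
    for (const char* key : {"chunk_size_in_kb", "chunk_size_kb"}) {
        auto it = o.find(key);
        if (it != o.end()) {
            return std::stoi(it->second);
        }
    }
    return std::nullopt;  // neither spelling was supplied
}
```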
Nadav Har'El
a0379209e6 secondary indexes: fail attempts to create a CUSTOM INDEX
Cassandra supports a "CREATE CUSTOM INDEX" to create a secondary index
with a custom implementation. The only custom implementation that Cassandra
supports is SASI. But Scylla doesn't support this, or any other custom
index implementation. If a CREATE CUSTOM INDEX statement is used, we
shouldn't silently ignore the "CUSTOM" tag, we should generate an error.

This patch also includes a regression test that "CREATE CUSTOM INDEX"
statements with valid syntax fail (before this patch, they succeeded).

Fixes #3977

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181211224545.18349-2-nyh@scylladb.com>
2018-12-11 23:33:02 +00:00
Nadav Har'El
36db4fba23 Fix typo in error message
Interestingly, this typo was copied from the original Cassandra source
code :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181211224545.18349-1-nyh@scylladb.com>
2018-12-11 23:32:58 +00:00
Avi Kivity
5b08e91bdb tools: add SYS_PTRACE capability to dbuild
LeakSanitizer uses ptrace, and docker disables ptrace by default. Add it
back so tests pass.
Message-Id: <20181208112524.19229-1-avi@scylladb.com>
2018-12-11 19:09:12 +00:00
Avi Kivity
34a31a807d build: build libdeflate with user selected C compiler
If the user specified a C compiler, use it to build libdeflate.

Fixes #3978.
Message-Id: <20181211145604.14847-1-avi@scylladb.com>
2018-12-11 14:58:16 +00:00
Duarte Nunes
89ae3fbf11 db/system_distributed_keyspace: Create the schema with min_timestamp
Different nodes can concurrently create the distributed system
keyspace on boot, before the "if not exists" clause can take effect.

However, the resulting schema mutations will be different since
different nodes use different timestamps. This patch forces the
timestamps to be the same across all nodes, so we save some schema
mismatches.

This fixes a bug exposed by ca5dfdf, whereby the initialization of the
distributed system keyspace is done before waiting for schema
agreement. While waiting for schema agreement in
storage_service::join_token_ring(), the node still hasn't joined the
ring and schemas can't be pulled from it, so nodes can deadlock. A
similar situation can happen between a seed node and a non-seed node,
where the seed node progresses to a different "wait for schema
agreement" barrier, but still can't make progress because it can't
pull the schema from the non-seed node still trying to join the ring.

Finally, it is assumed that changes to the schema of the current
distributed system keyspace tables will be protected by a cluster
feature and a subsequent schema synchronization, such that all nodes
will be at a point where schemas can be transferred around.

Fixes #3976

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181211113407.20075-1-duarte@scylladb.com>
2018-12-11 13:35:48 +01:00
Paweł Dziepak
e3f53542c9 Merge "Optimize sstable writing of large partitions" from Tomasz
"
This series contains several optimizations of the MC format sstable writer, mainly:
  - Avoiding output_stream when serializing into memory (e.g. a row)
  - Faster serialization of primitive types when serializing into memory

I measured the improvement in throughput (frag/s) using perf_fast_forward for
datasets with a single large partition with many small rows:

  - 10% for a row with a single cell of 8 bytes
  - 10% for a row with a single cell of 100 bytes
  -  9% for a row with a single cell of 1000 bytes
  - 13% for a row with 6 cells of 100 bytes
"

* tag 'avoid-output-stream-in-sstable-writer-v2' of github.com:tgrabiec/scylla:
  bytes_ostream: Optimize writing of fixed-size types
  sstables: mc: Write temporary data to bytes_ostream rather than file_writer
  sstables: mc: Avoid double-serialization of a range tombstone marker
  sstables: file_writer: Generalize bytes& writer to accept bytes_view
  sstables: Templetize write() functions on the writer
  sstables: Turn m_format_write_helpers.cc into an impl header
  sstables: De-futurize file_writer
  bytes_ostream: Implement clear()
  bytes_ostream: Make initial chunk size configurable
2018-12-11 12:29:24 +00:00
Duarte Nunes
d66bd0100b Merge 'Simplify db::extensions' from Avi
"
Carry out simplifications of db::extensions: less magical types, de-inline
complex functions, and reduce #include dependencies

Tests: unit(release)
"

* tag 'extensions-simplify/v1' of https://github.com/avikivity/scylla:
  extensions: remove unneeded includes
  extensions: deinline extension accessors
  extensions: return concrete types from the extension accessors
  extensions: remove dependency on cql layer
2018-12-10 22:00:51 +00:00
Avi Kivity
b251183359 extensions: remove unneeded includes
<boost/any.hpp> is not used, and "schema.hh" can be replaced with forward
declarations.
2018-12-10 21:34:09 +02:00
Avi Kivity
119a83bf2f extensions: deinline extension accessors
Quite complex code that is not performance sensitive. Move it out of line.
2018-12-10 21:22:56 +02:00
Avi Kivity
e9f5641b64 extensions: return concrete types from the extension accessors
Returning "auto" makes it harder to understand what the function is returning,
and impossible to de-inline.

Return a vector of pointers instead. The caller should iterate immediately, in
any case, and since the previous return value was a range of references to const
unique_ptrs, nothing else could be done with it anyway.
2018-12-10 21:16:45 +02:00
Tomasz Grabiec
f206ef0038 bytes_ostream: Optimize writing of fixed-size types
Inlining write() allows the writing code to be optimized for
fixed-size types. In particular, memcpy() calls and loops will be
eliminated.

Saw 4% improvement in throughput in perf_fast_forward for tiny rows.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
5a35240d47 sstables: mc: Write temporary data to bytes_ostream rather than file_writer
Currently temporary data is serialized into a file_writer, because
that's what write() functions used to expect, which goes through an
output_stream, a data_sink, into an in-memory data sink implementation
which collects the temporary_buffers.

Going through those abstractions is relatively expensive if we don't
write much, because each time we begin to write after a flush() of the
file_writer the output stream has to allocate a new buffer, which
means a large allocation for small amount of data.

We could avoid that and write into bytes_ostream directly, which will
keep its buffer across clear().

write() functions which are used both to write directly into the data
file and to a temporary arena were templatized to accept a Writer to
which both file_writer and bytes_ostream conform.
2018-12-10 20:08:16 +01:00
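The idea of templatizing write() functions on the writer, described in the commit above, can be sketched as follows. The two writer types and write_u32_be are hypothetical stand-ins, not file_writer or bytes_ostream; the only assumption carried over is that both targets expose a compatible write() member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory writer that can be reused across clear().
struct memory_writer {
    std::vector<char> buf;
    void write(const char* p, std::size_t n) { buf.insert(buf.end(), p, p + n); }
    void clear() { buf.clear(); }  // typically retains the allocated capacity
};

// Hypothetical second writer that only counts the bytes written.
struct counting_writer {
    std::size_t bytes = 0;
    void write(const char*, std::size_t n) { bytes += n; }
};

// One serialization routine serves any type with a matching write().
// The size is a compile-time constant, so the call is easy to inline.
template <typename Writer>
void write_u32_be(Writer& out, uint32_t v) {
    const char b[4] = { char(v >> 24), char(v >> 16), char(v >> 8), char(v) };
    out.write(b, sizeof(b));
}
```

The same write_u32_be can then target a temporary in-memory buffer or any other conforming sink without going through a stream abstraction.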
Tomasz Grabiec
c4003b3e79 sstables: mc: Avoid double-serialization of a range tombstone marker 2018-12-10 20:08:16 +01:00
Tomasz Grabiec
9edb9434e5 sstables: file_writer: Generalize bytes& writer to accept bytes_view
Note that bytes is implicitly convertible to bytes_view.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
fad4fba4bc sstables: Templetize write() functions on the writer
Will allow writing to both a file_writer, or an in-memory writer like
a bytes_ostream.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
f4016996d3 sstables: Turn m_format_write_helpers.cc into an impl header
I need to templatize functions defined in it and want to avoid
explicit instantiations.

There is only one compilation unit in which this is used
(sstables.cc). I think in the long term we should move all those
"helpers" into sstables/mc/writer.{cc,hh} together with their only
user, the sstable_writer_m class from sstables.cc.
2018-12-10 20:07:43 +01:00
Tomasz Grabiec
13999a4d09 sstables: De-futurize file_writer 2018-12-10 20:07:43 +01:00
Tomasz Grabiec
a1fb441df8 bytes_ostream: Implement clear() 2018-12-10 20:07:43 +01:00
Tomasz Grabiec
7cf5de3d9c bytes_ostream: Make initial chunk size configurable 2018-12-10 20:07:43 +01:00
Avi Kivity
8e05bcbe71 extensions: remove dependency on cql layer
The extensions class reaches into cql's property_definitions class to grab
a map<sstring, sstring> type. This generates a few unneeded dependencies.

Reduce dependencies by defining the map type ourselves; if cql's property_definitions
changes in an incompatible way, it will have to adapt, rather than the extensions
class.
2018-12-10 20:55:30 +02:00
Tomasz Grabiec
1dd2bf52ca Merge "Add a couple of tests of broken sstables" From Rafael
These are the current uninteresting cases I found when looking at
malformed_sstable_exception. The existing code is working, just not
being tested.

* https://github.com/espindola/scylla.git espindola/espindola/broken-sst:
  Add a broken sstable test.
  Add a test with mismatched schema.
2018-12-10 19:30:58 +01:00
Tomasz Grabiec
538e041f22 Merge "Remove some dependencies on db::config" from Avi
db::config is a global class; changes in any module can cause changes
in db::config. Therefore, it is a cause of needless recompilation.

Remove some of these dependencies by having consumers of db::config
declare an intermediate config struct that contains only
configuration of interest to them, and have their caller fill it out
(in the case of auth, it already followed this scheme and the patchset
only moves the translation function).

In addition, some outright pointless inclusions of db/config.hh are
removed.

The result is somewhat shorter compile times, and fewer needless
recompiles.

* https://github.com/avikivity/scylla unconfig-1/v1:
  config: remove inclusions of db/config.hh from header files
  repair: remove unneeded config.hh inclusion
  batchlog_manager: remove dependency on db::config
  auth: remove permissions_cache dependency on db::config
  auth: remove auth::service dependency on db::config
  auth: remove unneeded db/config.hh includes
2018-12-10 14:53:14 +01:00
Benny Halevy
ef53ddf3ae scylla_io_setup: correct units in low space warning
GiB -> GB

Refs #2676

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181210092503.10344-1-bhalevy@scylladb.com>
2018-12-10 13:58:49 +02:00
Avi Kivity
475b151c97 Merge "Use utils::small_vector more in read path" from Paweł
"
This series optimises the read path by replacing some usages of
std::vector by utils::small_vector. The motivation for this change was
an observation that memory allocation functions are pointed out by the
profiler as the ones where we spent most time and while they have a
large number of callers storage allocation for some vectors was close to
the top. The gains are not huge, since the problem is a lot of things
adding up and not a single slow thing, but we need to start with
something.

Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.

perf_simple_query -c4 --duration 60, medians:

       ./perf_before  ./perf_after  diff
 read      343086.80     360720.53  5.1%

Tests: unit(release, small_vector in debug)
"

* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
  partition_slice: use small_vector for column_ids
  mutation_fragment_merger: use small_vector
  auth: use small_vector in resource
  auth: avoid list-initialisation of vectors
  idl: serialiser: add serialiser for utils::small_vector
  idl: serialiser: deduplicate vector serialisers
  utils: introduce small_vector
  intrusive_set_external_comparator: make iterator nothrow move constructible
  mutation_fragment_merger: value-initialise iterator
2018-12-10 13:50:59 +02:00
Duarte Nunes
a42b2895c2 Merge branch 'gossip: Send node UP event to cql client after cql server is up' from Asias
"
This is a backport of CASSANDRA-8236.

Before this patch, scylla sends the node UP event to cql client when it
sees a new node join the cluster, i.e., when a new node's status
becomes NORMAL. The problem is, at this time, the cql server might not
be ready yet. Once the client receives the UP event, it tries to
connect to the new node's cql port and fails.

To fix, a new application_state::RPC_READY is introduced. A new node sets
RPC_READY to false when it starts gossip in the very beginning and sets
RPC_READY to true when the cql server is ready.

The RPC_READY is a bad name but I think it is better to follow Cassandra.

Nodes with or without this patch are supposed to work together with no
problem.

Refs #3843
"

* 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev:
  storage_service: Use cql_ready facility
  storage_service: Handle application_state::RPC_READY
  storage_service: Add notify_cql_change
  storage_service: Add debug log in notify_joined
  storage_service: Add extra check in notify_joined
  storage_service: Add notify_joined
  storage_service: Add debug log in notify_up
  storage_service: Add extra check in notify_up
  storage_service: Add notify_up
  storage_service: Make notify_left log debug level
  storage_service: Introduce notify_left
  storage_service: Add debug log in notify_down
  storage_service: Introduce notify_down
  storage_service: Add set_cql_ready
  gossip: Add gossiper::is_cql_ready
  gms: Add endpoint_state::is_cql_ready
  gms: Add application_state::RPC_READY
  gms: Introduce cql_ready in versioned_value
2018-12-10 11:37:59 +00:00
Asias He
06dc9b8da0 storage_service: Use cql_ready facility
At this point the cql_ready facility is ready. To use it, advertise the
RPC_READY application state in the following cases:

- When a node boots, set it to false
- When cql server is ready, set it to true
- When cql server is down, set it to false
2018-12-10 19:20:20 +08:00
Asias He
4761b53035 storage_service: Handle application_state::RPC_READY 2018-12-10 19:20:20 +08:00
Asias He
0e64814206 storage_service: Add notify_cql_change
It is called when a RPC_READY gossip application state is received.
2018-12-10 19:20:20 +08:00
Asias He
a1bbd7bcc7 storage_service: Add debug log in notify_joined 2018-12-10 19:20:20 +08:00
Asias He
17d68cb408 storage_service: Add extra check in notify_joined
Do not send node joined event if node is not in NORMAL status which
means the node has joined the cluster officially.
2018-12-10 19:20:20 +08:00
Asias He
9abb15192f storage_service: Add notify_joined
Add a helper for node joined event.
2018-12-10 19:20:20 +08:00
Asias He
60c74431f7 storage_service: Add debug log in notify_up 2018-12-10 19:20:20 +08:00
Asias He
948d2b6c78 storage_service: Add extra check in notify_up
Do not send up event if is_cql_ready is false which means cql server is
not ready yet or node is down.
2018-12-10 19:20:20 +08:00
Asias He
48cd31dc1e storage_service: Add notify_up
Add a helper for node up event.
2018-12-10 19:20:20 +08:00
Asias He
03f9c3e7e5 storage_service: Make notify_left log debug level
Be consistent with other notification log.
2018-12-10 19:20:20 +08:00
Asias He
a5ec25f28b storage_service: Introduce notify_left
Add a helper for node left event.
2018-12-10 19:20:20 +08:00
Asias He
15d7fce902 storage_service: Add debug log in notify_down 2018-12-10 19:20:19 +08:00
Asias He
f18cb0654d storage_service: Introduce notify_down
Add a helper for node down event.
2018-12-10 19:20:19 +08:00
Asias He
2f3130b36f storage_service: Add set_cql_ready
It is used to set the status of the RPC_READY of this node so it can be
advertised by gossip.
2018-12-10 19:20:17 +08:00
Asias He
e07150166a gossip: Add gossiper::is_cql_ready
- A new scylla node always sends application_state::RPC_READY = false when
the node boots and sends application_state::RPC_READY = true when the cql
server is up

- An old scylla node that does not support application_state::RPC_READY
never has application_state::RPC_READY in its endpoint_state, so we can
only assume its cql server is up; we return true here if
application_state::RPC_READY is not present
2018-12-10 19:16:44 +08:00
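The compatibility rule in the commit above (a missing RPC_READY state is treated as "ready") can be sketched with an optional value. This is a simplified model, not the gossiper code: app_states, advertised_rpc_ready, and the "true"/"false" string encoding are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified endpoint state: application-state name -> value.
using app_states = std::map<std::string, std::string>;

// Returns the advertised RPC_READY value, or std::nullopt when the peer
// never advertised it (as an older node would not).
inline std::optional<bool> advertised_rpc_ready(const app_states& s) {
    auto it = s.find("RPC_READY");
    if (it == s.end()) {
        return std::nullopt;
    }
    return it->second == "true";
}

// Absent state defaults to true so that older nodes keep working.
inline bool is_cql_ready(const app_states& s) {
    return advertised_rpc_ready(s).value_or(true);
}
```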
Asias He
2737654c75 gms: Add endpoint_state::is_cql_ready
Return if the endpoint_state has the RPC_READY application_state.
2018-12-10 19:16:44 +08:00
Asias He
67093324ad gms: Add application_state::RPC_READY
It is used to tell peer nodes that the cql server is ready and can
accept clients request.

Follow the same name which Cassandra uses.
2018-12-10 19:16:44 +08:00
Asias He
4ed2ef23e9 gms: Introduce cql_ready in versioned_value 2018-12-10 19:16:43 +08:00
Avi Kivity
7c7da0b462 sstables: fix overflow in clustering key blocks header bit access
_ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too.
Otherwise, a shift in the range 32-63 will produce wrong results.

Fix by using a 64-bit mask.

Found by Fedora 29's ubsan.

Fixes #3973.
Message-Id: <20181209120549.21371-1-avi@scylladb.com>
2018-12-10 11:09:25 +00:00
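The class of bug fixed in the commit above can be shown in isolation. The helper below is a hypothetical illustration, not the sstables code: is_bit_set and its parameters are invented names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tests bit `pos` of a 64-bit header word.
// The literal 1 has type int, so `1 << pos` is not a valid way to build
// the mask once pos reaches the width of int (32 on common platforms).
// Widening the operand first makes every shift in [0, 63] well defined.
inline bool is_bit_set(uint64_t header, unsigned pos) {
    assert(pos < 64);
    const uint64_t mask = uint64_t(1) << pos;
    return (header & mask) != 0;
}
```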
Takuya ASADA
a2d0ebf4d9 dist/offline_installer/redhat: fix missing dependencies
The offline installer with Scylla 3.0 causes a dependency error on CentOS.
Add the missing packages.

Fixes #3969

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181207020711.23055-1-syuu@scylladb.com>
2018-12-10 12:47:10 +02:00
Avi Kivity
904db433d9 Merge "Re-use commitlog segments" from Calle
"
Refs #3929

Enables re-use of commitlog segments.

First, ensures we never succeed playing back a commitlog
segment with name not matching the IDs in the actual
file data, by determining expected id based on file name.
This will also handle partially written re-used files, as
each chunk header's CRC is dependent on the ID, and will
fail once we hit any left-overs.

The second part renames files and puts them into a recycle
list instead of actually deleting them when finished.
Allocating new files will then prioritize this list
before creating a new file.

Note that since consumption and release of segments can
be somewhat unbalanced, this does not really guarantee
we will use recycled files even in all cases when it
might be possible, simply because of timing. It does
however give a good chance of it.

We limit recycled files based on the max disk size
setting, thus we can potentially grow disk size
more than without depending on timing, but not
uncontrolled.

While all this theoretically might improve disk
writes in some cases, it is far from any magic bullet.
No real performance testing has been done yet, only
functional.
"

* 'calle/commitlog-reuse' of github.com:scylladb/seastar-dev:
  commitlog: Recycle used segments instead of delete + new file
  commitlog: Terminate all segments with a zero chunk
  commitlog_replay: Enforce file name based id matching
2018-12-10 11:15:02 +02:00
Calle Wilund
55f10ffc43 commitlog: Recycle used segments instead of delete + new file
Refs #3929

When deleting a segment, IFF we have not yet filled up all reserves,
instead of actually deleting the file, put it on a "recycle" list.
Next segment allocation will instead of creating a new one simply
rename the segment and reuse the file and its allocated space.

We rename the file twice: Once on adding to recycle list, with special
prefix so we don't mix up actual replayable segments and these. Second
when we actually re-use the file (also to ensure consecutive names).

Note that we limit the amount of recyclables, so a really stressed
application which somehow fills up the replenish queue might
cause us to still drop the segments. Could skip this but risk
getting too many files on disk.

Replay should be safe, since all entries are guarded by CRC based
on the file ID (i.e. file name). Thus replaying a recycled segment
will simply cause a CRC error in the main header and be ignored (see
previous patch).

Segments that are fully synced will have terminating zero-header (see
previous patch) so we know when to stop processing a recycled file.
If a file is the result of a mid-write crash, we will generate a CRC
processing error as "normally" in this case, when hitting partially
written block or coming to an old/new chunk boundary.

v2:
* Sync dir on rename
* auto -> const sstring&
* Allow recycling files as long as we're within disk space limits

v3:
* Use special names for files waiting for reuse
2018-12-10 09:09:07 +00:00
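The recycle-list behaviour described in the commit above can be sketched as a bounded queue of file names. This is a deliberately simplified model: segment_recycler is a hypothetical class, the limit is a plain count rather than the disk-space based limit the commit mentions, and the renaming and directory sync steps are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical bounded recycle list for finished segment files.
class segment_recycler {
    std::deque<std::string> _recycled;
    std::size_t _max;
public:
    explicit segment_recycler(std::size_t max) : _max(max) {}

    // Returns true if the file was kept for reuse, false if the list is
    // full and the caller should delete the file instead.
    bool release(std::string file) {
        if (_recycled.size() >= _max) {
            return false;
        }
        _recycled.push_back(std::move(file));
        return true;
    }

    // Returns a recycled file if one is available; otherwise the caller
    // falls back to creating a new file.
    std::optional<std::string> acquire() {
        if (_recycled.empty()) {
            return std::nullopt;
        }
        std::string f = std::move(_recycled.front());
        _recycled.pop_front();
        return f;
    }
};
```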
Calle Wilund
b13b6ef6a0 commitlog: Terminate all segments with a zero chunk
Writes a final chunk header of zero to the file on close, to mark
end-of-segment.
This allows us to gracefully stop replay processing of a segment file
even if it was not zeroed from the beginning (maybe recycled - hint
hint).
2018-12-10 09:09:07 +00:00
Calle Wilund
b35af84599 commitlog_replay: Enforce file name based id matching
When reading the header chunk of a commitlog file, check the stored id
value against the id derived from the file name, and ignore if
mismatched. This is a prerequisite for re-using renamed commitlog files,
as we can then fail-fast should one such be left on disk, instead of
trying to replay it.

We also check said id via the CRC check for each chunk parsed. If we
find a chunk with a mismatched id, we will get a CRC error for the chunk,
and replay will terminate (albeit not gracefully).
2018-12-10 09:09:07 +00:00
Amnon Heiman
09c2b8b48a node_exporter_install: switch to node_exporter 0.17
The newer version of node_exporter comes with important bug fixes. This
is especially important for I3.metal, which is not supported by the older
version of node_exporter.

The dashboards can now support both the new and the old version of
node_exporter.

Fixes #3927

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181210085251.23312-1-amnon@scylladb.com>
2018-12-10 10:54:50 +02:00
Benny Halevy
bcb486b8b9 scylla_io_setup: io_tune should not run when there is less than 10GB of disk space
Fixes #2676

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181209174852.3620-1-bhalevy@scylladb.com>
2018-12-10 10:38:33 +02:00
Yibo Cai (Arm Technology China)
6717816a8d utils/gz: optimize crc_combine for arm64
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1544418903-26290-1-git-send-email-yibo.cai@arm.com>
2018-12-10 10:31:08 +02:00
Avi Kivity
40677fae37 Merge "Compaction strategy aware major compaction" from Raphael
"
Make major compaction aware of compaction strategy, by using an
optimal approach which suits the strategy needs.

Refs #1431.
"

* 'compaction_strategy_aware_major_compaction_v2' of github.com:raphaelsc/scylla:
  tests: add test for compaction-strategy-aware major compaction
  compaction: implement major compaction heuristic for leveled strategy
  compaction: introduce notion of compaction-strategy-aware major compaction
2018-12-10 10:10:22 +02:00
Avi Kivity
d7c7949d43 auth: remove unneeded db/config.hh includes 2018-12-09 20:11:38 +02:00
Avi Kivity
37a681e46d auth: remove auth::service dependency on db::config
auth::service already has its own configuration and a function to create it
from db::config; just move it to the caller. This reduces dependencies on the
global db::config class.
2018-12-09 20:11:38 +02:00
Avi Kivity
77e6b7a155 auth: remove permissions_cache dependency on db::config
permissions_cache already has its own configuration and a function to create it
from db::config; just move it to the caller. This reduces dependencies on the
global db::config class.
2018-12-09 20:11:38 +02:00
Avi Kivity
89be47e291 batchlog_manager: remove dependency on db::config
Extract configuration into a new struct batchlog_manager_config and have the
callers populate it using db::config. This reduces dependencies on global objects.
2018-12-09 20:11:38 +02:00
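The pattern used in the commit above, a module-local config struct filled in by the caller, can be sketched as follows. All names and fields here are hypothetical (global_config, manager_config, manager, write_timeout_ms); they do not reflect the real db::config or batchlog_manager_config contents.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a large, frequently changing global config.
struct global_config {
    int write_timeout_ms = 2000;
    int unrelated_option = 0;
};

// Module-local config: only the settings this module actually needs.
struct manager_config {
    std::chrono::milliseconds write_timeout;
};

// The module depends on manager_config only, never on global_config.
class manager {
    manager_config _cfg;
public:
    explicit manager(manager_config cfg) : _cfg(cfg) {}
    std::chrono::milliseconds write_timeout() const { return _cfg.write_timeout; }
};

// The translation lives with the caller, which already sees the global
// config; in a real code base it would sit in a separate source file.
inline manager_config make_manager_config(const global_config& gc) {
    return manager_config{std::chrono::milliseconds(gc.write_timeout_ms)};
}
```

With this split, a change to the global config header only forces a rebuild of the caller that performs the translation, not of the module itself.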
Avi Kivity
85e9b0d78d repair: remove unneeded config.hh inclusion 2018-12-09 20:11:38 +02:00
Avi Kivity
864f55e745 config: remove inclusions of db/config.hh from header files
Instead, distribute those inclusions to .cc files that require them. This
reduces rebuilds when config.hh changes, and makes it easier to locate files
that need config disaggregation.
2018-12-09 20:11:38 +02:00
Amos Kong
09a3b11c2f scylla_setup: only ask for nic in interactive mode
Currently scylla_setup still asks for the NIC even if it is already assigned on the command line.

Fixes #3908

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <6b867e17a5583c495c771a37d5fa1e8366b1d61b.1542337635.git.amos@scylladb.com>
2018-12-09 15:29:31 +02:00
Gleb Natapov
9fb79bf379 storage_proxy: fix crash during write timeout callback invocation
rh_entry address is captured inside timeout's callback lambda, so the
structure should not be moved after it is created. Change the code to
create rh_entry in-place instead of moving it into the map.

Fixes #3972.

Message-Id: <20181206164043.GN25283@scylladb.com>
2018-12-09 10:33:37 +02:00
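The hazard fixed in the commit above, a callback that captures the address of an object which is later moved, can be sketched in isolation. The entry type, add_entry, and the use of std::unordered_map are assumptions for illustration; this is not the storage_proxy code.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Hypothetical entry whose timeout callback captures its own address.
struct entry {
    int value;
    std::function<int()> on_timeout;

    explicit entry(int v) : value(v) {
        // Captures `this`: the callback is valid only while the object
        // stays at this address.
        on_timeout = [this] { return value; };
    }
    // A moved or copied entry would carry a callback that still points
    // at the old object, so both operations are forbidden.
    entry(const entry&) = delete;
    entry& operator=(const entry&) = delete;
};

// Constructs the entry directly inside the map node instead of moving a
// temporary into it. References to std::unordered_map elements remain
// valid across rehashing.
inline entry& add_entry(std::unordered_map<int, entry>& m, int key, int v) {
    return m.try_emplace(key, v).first->second;
}
```

Deleting the copy operations also suppresses the implicit move operations, so any attempt to move an entry into the container fails at compile time rather than crashing later.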
Vladimir Krivopalov
6a5d8934a6 db: Enable SSTables 'mc' format by default.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <ab4394b98a520b87c986bea2ceef13d015688967.1544227350.git.vladimir@scylladb.com>
2018-12-08 11:07:38 +02:00
Tomasz Grabiec
b78d98a358 tests: perf_fast_forward: Fix result_collector::add() for multi-element results
The results vector should be populated vertically, not horizontally.

Responsible for assertion failure with --cache-enabled:

  void result_collector::add(test_result_vector): Assertion `rs.size() == results.size()' failed.

Introduced in 3fc78a25bf.
Message-Id: <1544105835-24530-2-git-send-email-tgrabiec@scylladb.com>
2018-12-07 12:44:32 +00:00
Tomasz Grabiec
10cde9ae50 tests: perf_fast_forward: Fix live_range not being initialized
Broken in 470552b7ab

Causes test failure when running with --cache-enabled
Message-Id: <1544105835-24530-1-git-send-email-tgrabiec@scylladb.com>
2018-12-07 12:38:01 +00:00
Tomasz Grabiec
bb24d378b2 Merge "Fixes for collecting stats in SST3 + more tests" from Vladimir
This patchset fixes several remaining issues found during thorough
testing of SSTables 3.x statistics and enriches ~30 unit tests with
statistics validation against Cassandra-generated golden copies.

* https://github.com/argenet/scylla/tree/projects/sstables-30/sst3-tests-statistics/v1:
  sstables: Enforce estimated_partitions in generate_summary() to be
    always positive.
  sstables: Don't enforce default max_local_deletion_time value for 'mc'
    files.
  sstables: Update TTL/local deletion stats for non-expiring and live
    liveness_info.
  sstables: Collect statistics when writing RT markers to SSTables 3.x.
  tests: Return sstable_assertions from validate_read() helper.
  tests: Introduce helper for validating stats metadata in SSTables 3.x
    tests.
  tests: Add stats metadata validation to test_write_static_row.
  tests: Add stats metadata validation to
    test_write_composite_partition_key.
  tests: Add stats metadata validation to
    test_write_composite_clustering_key.
  tests: Add stats metadata validation to test_write_wide_partitions.
  tests: Add stats metadata validation to write_ttled_row
  tests: Add stats metadata validation to write_ttled_column
  tests: Add stats metadata validation to write_deleted_column
  tests: Add stats metadata validation to write_deleted_row
  tests: Add stats metadata validation to write_collection_wide_update
  tests: Add stats metadata validation to
    write_collection_incremental_update
  tests: Add stats metadata validation to write_multiple_partitions
  tests: Add stats metadata validation to write_multiple_rows
  tests: Add stats metadata validation to
    write_missing_columns_large_set
  tests: Add stats metadata validation to write_different_types
  tests: Add stats metadata validation to write_empty_clustering_values
  tests: Add stats metadata validation to write_large_clustering_key
  tests: Add stats metadata validation to write_compact_table
  tests: Add stats metadata validation to write_user_defined_type_table
  tests: Add stats metadata validation to write_simple_range_tombstone
  tests: Add stats metadata validation to
    write_adjacent_range_tombstones
  tests: Add stats metadata validation to
    write_non_adjacent_range_tombstones
  tests: Add stats metadata validation to
    write_mixed_rows_and_range_tombstones
  tests: Add stats metadata validation to
    write_adjacent_range_tombstones_with_rows
  tests: Add stats metadata validation to
    write_range_tombstone_same_start_with_row
  tests: Add stats metadata validation to
    write_range_tombstone_same_end_with_row
  tests: Add stats metadata validation to
    write_two_non_adjacent_range_tombstones
  tests: Delete unused (bogus) Statistics.db file from write_ SST3
    tests.
2018-12-07 12:05:55 +01:00
Vladimir Krivopalov
98ae39f920 tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
dcd639b4d5 tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
d07ab3b3ef tests: Add stats metadata validation to write_range_tombstone_same_end_with_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
b856cf837e tests: Add stats metadata validation to write_range_tombstone_same_start_with_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
ba24572fb6 tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
4167c9e51d tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
fd1c9b84c6 tests: Add stats metadata validation to write_non_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
1a6d613654 tests: Add stats metadata validation to write_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
57d2d1a1c6 tests: Add stats metadata validation to write_simple_range_tombstone
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
bc5d5633dc tests: Add stats metadata validation to write_user_defined_type_table
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
d9f2829ca0 tests: Add stats metadata validation to write_compact_table
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
3a1e287c6a tests: Add stats metadata validation to write_large_clustering_key
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
722fc7222a tests: Add stats metadata validation to write_empty_clustering_values
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
1367243b7e tests: Add stats metadata validation to write_different_types
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
12b10c0cca tests: Add stats metadata validation to write_missing_columns_large_set
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
c990c518fc tests: Add stats metadata validation to write_multiple_rows
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
9bb46f7cc6 tests: Add stats metadata validation to write_multiple_partitions
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
99d3cbd2fc tests: Add stats metadata validation to write_collection_incremental_update
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
0118b15c06 tests: Add stats metadata validation to write_collection_wide_update
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
85782ed729 tests: Add stats metadata validation to write_deleted_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
66913adcc6 tests: Add stats metadata validation to write_deleted_column
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
997101f105 tests: Add stats metadata validation to write_ttled_column
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
a018388049 tests: Add stats metadata validation to write_ttled_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
260dfb3492 tests: Add stats metadata validation to test_write_wide_partitions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
349a73c464 tests: Add stats metadata validation to test_write_composite_clustering_key.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
4f14e65d70 tests: Add stats metadata validation to test_write_composite_partition_key.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
a7b85e8009 tests: Add stats metadata validation to test_write_static_row.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
ccb2dec22b tests: Introduce helper for validating stats metadata in SSTables 3.x tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
5f6240cd7d tests: Return sstable_assertions from validate_read() helper.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
cc12449646 sstables: Collect statistics when writing RT markers to SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
2e5c221865 sstables: Update TTL/local deletion stats for non-expiring and live liveness_info.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Rafael Ávila de Espíndola
298873d33b Add a test with mismatched schema.
The sstable in the test is fine, but the schema thinks a static column
is regular.
2018-12-06 15:38:01 -08:00
Rafael Ávila de Espíndola
d392bc4924 Add a broken sstable test.
This sstable has a static column with clustering information.
2018-12-06 15:23:33 -08:00
Raphael S. Carvalho
1ddbbe51e6 tests: add test for compaction-strategy-aware major compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-12-06 18:37:16 -02:00
Raphael S. Carvalho
525ee18560 compaction: implement major compaction heuristic for leveled strategy
Major compaction for leveled strategy will now create a run of
non-overlapping sstables at the highest level. Until now, a single
sstable would be created at level 0 which was very suboptimal because
all data would need to climb up the levels again, making it a very
expensive I/O process.

Refs #1431.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-12-06 18:22:31 -02:00
Raphael S. Carvalho
3d9566e40d compaction: introduce notion of compaction-strategy-aware major compaction
That's only the very first step which introduces the machinery for making
major compaction aware of all strategies. For the time being, the default
implementation is used for all of them, which only suits size-tiered.

Refs #1431.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-12-06 18:22:30 -02:00
Vladimir Krivopalov
d2dfa2e15d sstables: Don't enforce default max_local_deletion_time value for 'mc' files.
Commit cc6c383249 has fixed an issue with
incorrectly tracking max_local_deletion_time and the check in
validate_max_local_deletion_time was called to work around old files.

This fix relaxes conditions for enforcing default max_local_deletion_time
so that they don't apply to SSTables in 'mc' format because the original
problem had been resolved before the 'mc' format was introduced.

This is needed to be able to read correct values from
Cassandra-generated SSTables that don't have a Scylla.db component.
Its presence or absence is used as an indicator of possibly affected
files.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 10:15:07 -08:00
Vladimir Krivopalov
0b1e6427ad sstables: Enforce estimated_partitions in generate_summary() to be always positive.
For tiny index files (< 8 bytes long) it could turn to zero and trigger
an assertion in prepare_summary().

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 10:15:07 -08:00
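The clamp described in the commit above can be sketched as follows. This is a minimal illustration, not the actual generate_summary() code: the helper name `estimate_partitions` and the per-entry divisor are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical estimate: one partition per `bytes_per_entry` bytes of index.
// Integer division yields 0 for tiny index files, so clamp the result to 1.
inline uint64_t estimate_partitions(uint64_t index_size_bytes,
                                    uint64_t bytes_per_entry = 8) {
    assert(bytes_per_entry > 0);
    return std::max<uint64_t>(1, index_size_bytes / bytes_per_entry);
}
```

With the assumed divisor of 8, a 3-byte index gives an estimate of 1 instead of 0, so a later "must be positive" assertion would hold.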
Raphael S. Carvalho
ffb00d2118 storage_service: remove outdated comment on ongoing compaction interrupt
After commit 5e953b5e47, compaction manager will forcefully stop
ongoing compactions instead of waiting for them to finish.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181206142600.21354-1-raphaelsc@scylladb.com>
2018-12-06 15:43:42 +01:00
Tomasz Grabiec
6012a63660 Merge "Fix window during init where waiting for a feature can be ignored" from Avi
storage_service keeps a bunch of "feature" variables, indicating cluster-wide
supported features, and has the ability to wait until the entire cluster supports
a given feature.

The propagation of features depends on gossip, but gossip is initialized after
storage_service, so the current code late-initializes the features. However, that
means that whoever waits on a feature between storage_service initialization and
gossip initialization loses their wait entry. In #3952, we have proof that this
in fact happens.

Fix this by removing the circular dependency. We now store features in a new
service, feature_service, that is started before both gossip and storage_service.
Gossip updates feature_service while storage_service reads from it.

Fixes #3953.

* https://github.com/avikivity/3953/v4.1:
  storage_service: deinline enable_all_features()
  gossiper: keep features registered
  tests/gossip: switch to seastar::thread
  storage_service: deinline init/deinit functions
  gossiper: split feature storage into a new feature_service
  gossiper: maybe enable features after start_gossiping()
  storage_service: fix gap when feature::when_enabled() doesn't work
2018-12-06 15:42:26 +01:00
Avi Kivity
33a0366ed8 storage_service: fix gap when feature::when_enabled() doesn't work
storage_service::register_features() reassigns to feature variables in
storage_service. This means that any call to feature::when_enabled() will be
orphaned when the feature is assigned.

Now that feature lifetimes are not tied to gossip, we can move the feature
initialization to the constructor and eliminate the gap. When gossip is started
it will evaluate application_states and enable features that the cluster agrees on.
2018-12-06 16:31:05 +02:00
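The gap described in the commit above can be reproduced with a small model. This is a sketch under assumptions: `feature_sketch` and `fired_callbacks` are hypothetical names, and the callback list stands in for whatever mechanism feature::when_enabled() really uses.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cluster feature with registered waiters.
class feature_sketch {
    bool _enabled = false;
    std::vector<std::function<void()>> _waiters;
public:
    void when_enabled(std::function<void()> cb) {
        if (_enabled) { cb(); } else { _waiters.push_back(std::move(cb)); }
    }
    void enable() {
        _enabled = true;
        for (auto& cb : _waiters) { cb(); }
        _waiters.clear();
    }
};

// Returns how many callbacks fired. With `reassign` set, the feature object
// is overwritten before it is enabled, so the earlier waiter is orphaned.
inline int fired_callbacks(bool reassign) {
    int fired = 0;
    feature_sketch f;
    f.when_enabled([&fired] { ++fired; });
    if (reassign) {
        f = feature_sketch{};  // models late re-initialisation of the feature
    }
    f.enable();
    return fired;
}
```

Initialising the feature once, before anyone can wait on it, removes the window in which a waiter can be dropped.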
Avi Kivity
587fd9b6c0 gossiper: maybe enable features after start_gossiping()
Since we may now start with features already registered, we need to enable
features immediately after gossip is started. This case happens in a cluster
that already is fully upgraded on startup. Before this series, features were
only added after this point.
2018-12-06 16:31:04 +02:00
Avi Kivity
4e553b692e gossiper: split feature storage into a new feature_service
Feature lifetime is tied to storage_service lifetime, but features are now managed
by gossip. To avoid circular dependency, add a new feature_service service to manage
feature lifetime.

To work around the problem, the current code re-initializes features after
gossip is initialized. This patch does not fix this problem; it only makes it
possible to solve it by untying features from gossip.
2018-12-06 16:31:04 +02:00
Avi Kivity
9b476fc377 storage_service: deinline init/deinit functions
Reduces #include dependencies later on.
2018-12-06 16:31:04 +02:00
Avi Kivity
db72a7e8bd tests/gossip: switch to seastar::thread
Much simpler to manage the long initialization chain.
2018-12-06 16:31:04 +02:00
Avi Kivity
1215512e98 gossiper: keep features registered
Gossiper unregisters enabled features as an optimization. However that makes
decoupling features from gossiper harder. Disable this optimization; since the
number of features is small and normal access is to a single feature at a time,
there is no significant performance or memory loss.
2018-12-06 16:31:04 +02:00
Paweł Dziepak
9024187222 partition_slice: use small_vector for column_ids 2018-12-06 14:21:04 +00:00
Paweł Dziepak
a014367c5b mutation_fragment_merger: use small_vector 2018-12-06 14:21:04 +00:00
Paweł Dziepak
142c4a9d84 auth: use small_vector in resource 2018-12-06 14:21:04 +00:00
Paweł Dziepak
edbcac85cb auth: avoid list-initialisation of vectors
List-initialisation forces often completely unnecessary copies of the
elements.
2018-12-06 14:21:04 +00:00
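The copies mentioned in the commit above come from std::initializer_list exposing its elements as const, so the vector has to copy them. A minimal sketch that counts them; `copy_counter` and both helper functions are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <vector>

int g_copies = 0;  // number of copy constructions observed

struct copy_counter {
    copy_counter() = default;
    copy_counter(const copy_counter&) { ++g_copies; }
    copy_counter(copy_counter&&) noexcept {}
};

// List-initialisation: each element is copied out of the initializer_list.
inline int copies_with_list_init() {
    g_copies = 0;
    std::vector<copy_counter> v{copy_counter{}, copy_counter{}};
    return g_copies;
}

// Constructing the elements in place avoids the copies.
inline int copies_with_emplace() {
    g_copies = 0;
    std::vector<copy_counter> v;
    v.reserve(2);
    v.emplace_back();
    v.emplace_back();
    return g_copies;
}
```

For cheap element types the difference is negligible; it matters when copying an element allocates, as with strings.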
Paweł Dziepak
890a5ba8ac idl: serialiser: add serialiser for utils::small_vector 2018-12-06 14:21:04 +00:00
Paweł Dziepak
abb4953209 idl: serialiser: deduplicate vector serialisers
In Scylla we have three implementations of vector-like structures
std::vector, utils::chunked_vector and utils::small_vector. Which one is
used is largely an implementation detail and all should be serialised
by the IDL infrastructure in exactly the same way. To make sure that
it's indeed the case let's make them share the serialiser
implementation.
2018-12-06 14:21:04 +00:00
Paweł Dziepak
23d19d21bd utils: introduce small_vector
small_vector is a variation of std::vector<> that reserves a configurable
amount of storage internally, without the need for memory allocation.
This can bring measurable gains if the expected number of elements is
small. The drawback is that moving such small_vector is more expensive
and invalidates iterators as well as references which disqualifies it in
some cases.
2018-12-06 14:21:04 +00:00
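The idea behind the commit above can be sketched with a much simpler container. This is not utils::small_vector: `inline_vector_sketch` is a hypothetical name, it is restricted to trivially copyable, default-constructible types, and it spills into a std::vector instead of managing raw storage.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

template <typename T, std::size_t N>
class inline_vector_sketch {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch supports trivially copyable types only");
    std::array<T, N> _inline{};  // lives inside the object, no allocation
    std::vector<T> _heap;        // used only once the size exceeds N
    std::size_t _size = 0;
public:
    void push_back(const T& v) {
        if (_size < N) {
            _inline[_size] = v;
        } else {
            if (_size == N) {
                // First spill: copy the inline elements to the heap once.
                _heap.assign(_inline.begin(), _inline.end());
            }
            _heap.push_back(v);
        }
        ++_size;
    }
    bool uses_inline_storage() const { return _size <= N; }
    std::size_t size() const { return _size; }
    const T& operator[](std::size_t i) const {
        assert(i < _size);
        return uses_inline_storage() ? _inline[i] : _heap[i];
    }
};
```

Because the first N elements live inside the object itself, moving the container has to copy them, and references to them do not follow the move. That is the drawback the commit message notes.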
Avi Kivity
21b4b2b9a1 Merge "Fix deadlocking multishard readers" from Botond
"
Multishard combining readers, running concurrently, with limited
concurrency and no timeout may deadlock, due to inactive shard readers
sitting on permits. To avoid this we have to make sure that all shard
readers belonging to a multishard combining reader, that are not
currently active, can be evicted to free up their permits, ensuring that
all readers can make progress.
Making inactive shard readers evictable is the solution for this
problem, however the original series introducing this solution
(414b14a6bd) did not go all the way and
left some loose ends. These loose ends are tied up by this mini-series.
Namely, two issues remained:
* The last reader to reach EOS was not paused (made evictable).
* Readers created/resumed as part of a read-ahead were not paused
  immediately after finishing the read-ahead.

This series fixes both of these.

Fixes: #3865
Tests: unit(release, debug)
"

* 'fix-multishard-reader-deadlock/v1' of https://github.com/denesb/scylla:
  multishard_combining_reader: pause readers after reading ahead
  multishard_combining_reader: pause *all* EOS'd readers
2018-12-06 16:08:11 +02:00
Botond Dénes
ee193f1ab4 multishard_combining_reader: pause readers after reading ahead
Readers created or resumed just to read ahead should be paused right
after, to avoid consuming all available permits on the shards they
operate on, causing a deadlock.
2018-12-06 13:20:30 +02:00
Avi Kivity
d4f353d3c8 Merge "normalized python3 compatibility, shebang and encoding" from Alexys
"
This series of patches ensures that all the Python code base is python3 compliant
and consistent by applying the following logic:

- python3 classifier on setup.py to explicitly state our python compatibility matrix
- add UTF-8 encoding header
- correct every shebang to the same /usr/bin/env python3
- shebang is only added on scripts meant to be executed on their own (removed otherwise)
- migrate some leftover scripts from python2 to python3 with minimal QA

This work is important to prepare for a more drastic change on Python code styling
using the black formatter and the setting up of automated QA checks on Python code base.
"

* 'python3_everywhere' of https://github.com/numberly/scylla:
  scylla-housekeeping: fix python3 compat and shebang
  dist/ami/files/scylla_install_ami: python3 shebang
  dist/docker/redhat/docker-entrypoint.py: add encoding comment
  fix_system_distributed_tables.py: fix python3 compat and shebang
  gen_segmented_compress_params.py: add encoding comment
  idl-compiler.py: python3 shebang
  scylla-gdb.py: python3 shebang
  configure.py: python3 shebang
  tools/scyllatop/: add / normalize python3 shebang
  scripts/: add / normalize python3 shebang
  dist/common/scripts: add / normalize python3 shebang
  test.py: add encoding comment
  setup.py: add python3 classifiers
2018-12-06 12:16:57 +02:00
Avi Kivity
f073ea5f87 Merge "Fix tombstone histogram when writing SSTables 3.x" from Vladimir
"
This patchset extends a number of existing tests to check SSTables
statistics for 'mc' format and fixes an issue discovered with the help
of one of the tests.

Tests: unit {release}
"

* 'projects/sstables-30/check-stats/v2' of https://github.com/argenet/scylla:
  tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions.
  tests: Run sstable_tombstone_histogram_test for all SSTables versions.
  tests: Run min_max_clustering_key_test on all SSTables versions.
  tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions.
  tests: Run test_sstable_max_local_deletion_time on all SSTables versions.
  tests: Extend test checking tombstones histogram to cover all SSTables versions.
  sstables: Properly track row-level tombstones when writing SSTables 3.x.
  tests: Run min_max_clustering_key_test_2 for all SSTables versions.
  tests: Make reusable_sst() helper accept SSTables version parameter.
2018-12-06 11:44:33 +02:00
Botond Dénes
170fa382fa multishard_combining_reader: pause *all* EOS'd readers
Previously the last shard reader to reach EOS wasn't paused. This is a
mistake and can contribute to causing deadlocks when the number of
concurrently active readers on any shard is limited.
2018-12-06 10:30:43 +02:00
Vladimir Krivopalov
dd769f2b41 tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
a098387e9f tests: Run sstable_tombstone_histogram_test for all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
06a47fc9f9 tests: Run min_max_clustering_key_test on all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
c53afd7bba tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
cfbde5b89c tests: Run test_sstable_max_local_deletion_time on all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
9955710cac tests: Extend test checking tombstones histogram to cover all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 12:36:22 -08:00
Vladimir Krivopalov
cdae62ec29 sstables: Properly track row-level tombstones when writing SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 12:36:22 -08:00
Vladimir Krivopalov
0f3fb32028 tests: Run min_max_clustering_key_test_2 for all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 12:36:22 -08:00
Vladimir Krivopalov
c474b0d851 tests: Make reusable_sst() helper accept SSTables version parameter.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 12:36:22 -08:00
Paweł Dziepak
504c586392 intrusive_set_external_comparator: make iterator nothrow move constructible 2018-12-05 20:07:29 +00:00
Paweł Dziepak
402902ac78 mutation_fragment_merger: value-initialise iterator
ForwardIterators are default constructible, but they have to be
value-initialised to compare equal to other value-initialised instances
of that iterator.
2018-12-05 20:07:29 +00:00
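The distinction in the commit above is between `iterator it;` (default-initialised, possibly indeterminate) and `iterator it{};` (value-initialised). A minimal sketch using std::vector iterators; the function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Two value-initialised forward iterators of the same type compare equal;
// the standard has guaranteed this since C++14.
inline bool value_initialised_iterators_equal() {
    std::vector<int>::iterator a{};  // value-initialised
    std::vector<int>::iterator b{};  // value-initialised
    return a == b;
}
```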
Tomasz Grabiec
2c2d202354 tests: perf_fast_forward: Make output directory configurable
Message-Id: <1544020034-16340-1-git-send-email-tgrabiec@scylladb.com>
2018-12-05 21:51:01 +02:00
Tomasz Grabiec
247347058c tests: perf_fast_forward: Always print to stdout
Otherwise errors cannot be made sense of, since errors are always
reported to stdout. Without test output we don't know what they're
referring to.

This change makes the output always go to stdout, in addition to other
reporters, if any.
Message-Id: <1544020084-16492-1-git-send-email-tgrabiec@scylladb.com>
2018-12-05 21:51:01 +02:00
Yibo Cai (Arm Technology China)
6fadba56cc utils: optimize UTF-8 validation
UTF-8 strings are currently validated by boost::locale::conv::utf_to_utf, which
actually performs a string conversion, more than is necessary. As
observed on Arm servers, UTF-8 validation can become a bottleneck under
heavy loads.

This patch introduces a brand new SIMD implementation supporting both
NEON and SSE, as well as a naive approach to handle short strings.
The naive approach is 3x faster than boost utf_to_utf, whilst SIMD
method outperforms naive approach 3x ~ 5x on Arm and x86. Details at
https://github.com/cyb70289/utf8/.

UTF-8 unit test is added to check various corner cases.

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1543978498-12123-1-git-send-email-yibo.cai@arm.com>
2018-12-05 21:51:01 +02:00
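For reference, a scalar UTF-8 validator of the kind the commit above calls the naive approach can be sketched as follows. This is a generic byte-by-byte check written for illustration, not the code from the patch; the function name `is_valid_utf8` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Validates without converting: rejects stray continuation bytes, truncated
// sequences, overlong encodings, surrogates and code points above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    const auto* p = reinterpret_cast<const uint8_t*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    // True when byte k exists and is a continuation byte (10xxxxxx).
    auto cont = [&](std::size_t k) { return k < n && (p[k] & 0xC0) == 0x80; };
    while (i < n) {
        const uint8_t b = p[i];
        if (b < 0x80) {
            i += 1;                                        // ASCII
        } else if (b >= 0xC2 && b <= 0xDF) {
            if (!cont(i + 1)) return false;
            i += 2;
        } else if (b >= 0xE0 && b <= 0xEF) {
            if (!cont(i + 1) || !cont(i + 2)) return false;
            if (b == 0xE0 && p[i + 1] < 0xA0) return false;  // overlong
            if (b == 0xED && p[i + 1] > 0x9F) return false;  // surrogate
            i += 3;
        } else if (b >= 0xF0 && b <= 0xF4) {
            if (!cont(i + 1) || !cont(i + 2) || !cont(i + 3)) return false;
            if (b == 0xF0 && p[i + 1] < 0x90) return false;  // overlong
            if (b == 0xF4 && p[i + 1] > 0x8F) return false;  // > U+10FFFF
            i += 4;
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }
    }
    return true;
}
```

A SIMD version applies the same range checks to many bytes at once; the scalar form stays useful for short strings, where vector setup costs dominate.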
Tomasz Grabiec
3e70ae1d06 Merge "Improve times to start / stop the nodes" from Glauber
If the compaction manager is started, compactions may start (this is
regardless of whether or not we trigger them). The problem with that is
that they start at a time in which we are flushing the commitlog and the
initialization procedure waits for the commitlog to be fully flushed and
the resulting memtables flushed before we move on.

Because there are no incoming writes, the number of shares given to memtable
flushes decreases as memory used decreases, and that can cause the startup
procedure to take a long time.

We have recently started to bump the shares manually for manual flushes.
While that guarantees that we will not drive the shares to zero, I will
make the argument that we can do better by making sure that those things
are, at this point, running alone: user experience is affected by
startup times and the bump we give to user-triggered operations will
only do so much. Even if we increase the shares a lot flushes will still
be fighting for resources with compactions and startup will take longer
than it could.

By making sure that flushes are at this point running alone we improve the
user experience by making sure the startup is as fast as it can be.

There is a similar problem at the drain level, which is also fixed in this
series.

Fixes #3958

* git@github.com:glommer/scylla.git faster-restart
  compaction_manager: delay initialization of the compaction manager.
  drain: stop compactions early
2018-12-05 21:51:01 +02:00
Asias He
eeeb2da7bb gossip: Fix race in real_mark_alive and shutdown msg
In dtest, we have

   self.check_rows_on_node(node1, 2000)
   self.check_rows_on_node(node2, 2000)

which introduce the following cluster operations:

1) Initially:

- node1 up
- node2 up

2) self.check_rows_on_node(node1, 2000)
- node2 down
- node2 up (A: node2 will call gossiper::real_mark_alive when node2 boots
up to mark node1 up)

3) self.check_rows_on_node(node2, 2000)
- node1 down (B: node1 will send shutdown gossip message to node2, node2
will mark node1 down)
- node1 up (C: when node1 is up, node2 will call
gossiper::real_mark_alive)

Since the order of Operation A and Operation B is not guaranteed, it
is possible that node2 will mark node1 as status=shutdown and also mark node1 as
UP.

In Operation C, node2 will call gossiper::real_mark_alive to mark node1
up, but since node2 might think node1 is already up, node2 will exit
early in gossiper::real_mark_alive and not log "InetAddress 127.0.0.1 is
now UP, status={}"

As a result, dtest does not see node2 report that node1 is up when it boots
node1, and the test fails.

   TimeoutError: 23 Nov 2018 10:44:19 [node2] Missing: ['127.0.0.1.* now UP']

In the log we can see node1 marked as DOWN and UP almost at the same time on node2:

   INFO  2018-11-23 22:31:29,999 [shard 0] gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown
   INFO  2018-11-23 22:31:30,006 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = shutdown

Fixes #3940

Tests: dtest with 20 consecutive successful runs
Message-Id: <996dc325cbcc3f94fc0b7569217aa65464eaaa1c.1543213511.git.asias@scylladb.com>
2018-12-05 21:51:01 +02:00
Tomasz Grabiec
edbef7400b configure.py: Always add a rule for building gen_crc_combine_table
Fixes a build failure when only the scylla binary was selected for
building like this:

  ./configure.py --with scylla

In this case the rule for gen_crc_combine_table was missing, but it is
needed to build crc_combine_table.o

Message-Id: <1544010138-21282-1-git-send-email-tgrabiec@scylladb.com>
2018-12-05 21:51:01 +02:00
Botond Dénes
77dbc7d09a querier: fix evict_one() and evict_all_for_table()
Both of these have the same problem. They remove the to-be-evicted
entries from `_entries` but they don't unregister the `entry` from the
`reader_concurrency_semaphore`. This results in the
`reader_concurrency_semaphore` being left with a dangling pointer to the
entries, which will trigger a segfault when it tries to evict the associated
inactive reads.

Also add a unit test for `evict_all_for_table()` to check that it works
properly (`evict_one()` is only used in tests, so no dedicated test for
it).

Fixes: #3962

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <57001857e3791c6385721b624d33b667ccda2e7d.1544010868.git.bdenes@scylladb.com>
2018-12-05 21:51:01 +02:00
Avi Kivity
0be554c337 storage_service: deinline enable_all_features()
Next commit wants to make it depend on config, which is best done out-of-line.
2018-12-05 17:30:42 +02:00
Asias He
a5d8b66f2c gossip: Make favor newly added node log debug level
It is not very useful for the user to know this.

Message-Id: <6c2dfc522d6974adb97c34fbc1e3a0339d2d530c.1543997137.git.asias@scylladb.com>
2018-12-05 10:45:03 +02:00
Avi Kivity
b0cb69ec25 Merge "Make sstable reader fail on unknown column names in MC format" from Piotr
"
Previously the reader just ignored such columns, but this creates a risk of data loss.

Refs #2598
"

* 'haaawk/2598/v3' of github.com:scylladb/seastar-dev:
  sstables: Add test_sstable_reader_on_unknown_column
  sstables: Exception on sstable's column not present in schema
  sstables: store column name in column_translation::column_info
  sstables: Make test_dropped_column_handling test dropped columns
2018-12-05 10:43:29 +02:00
Takuya ASADA
9388f3d626 reloc: drop --jobs from build_deb.sh/build_rpm.sh scripts
Since we merged the relocatable package, build_deb.sh/build_rpm.sh only do
packaging using prebuilt binaries taken from the relocatable package; they don't
compile anything.

So passing the --jobs option to build_deb.sh/build_rpm.sh is meaningless, and
we can drop it.

Note that we can still specify the --jobs option on reloc/build_reloc.sh, which
runs "ninja-build -jN" to compile Scylla and then generates the relocatable package.

See #3956

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181204205652.25138-1-syuu@scylladb.com>
2018-12-04 21:00:51 +00:00
Glauber Costa
0b7818d2b9 drain: stop compactions early
drain suffers from the same problem that startup suffers from now: memtables
are flushed as part of the drain routine, and because there are no
incoming writes the shares the controller assigns to flushes go down over
time, slowing down the drain.

This patch reorders things so that we stop compactions first, and flush
later. It guarantees that when flushes do happen they will have the full
bandwidth to work with.

There is a comment in the code saying we should stop compactions
forcefully instead of waiting for them to finish. I consider this
orthogonal to this patch therefore I am not touching this. Doing so will
make the drain operation even faster but can be done later. Even when we
do it, having the flushes proceed alone instead of during compactions
will make it faster.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-12-04 13:55:59 -05:00
Glauber Costa
fee4d2eb9b compaction_manager: delay initialization of the compaction manager.
If the compaction manager is started, compactions may start (this is
regardless of whether or not we trigger them). The problem with that is
that they start at a time in which we are flushing the commitlog and the
initialization procedure waits for the commitlog to be fully flushed and
the resulting memtables flushed before we move on.

Because there are no incoming writes, the number of shares given to memtable
flushes decreases as memory used decreases, and that can cause the startup
procedure to take a long time.

We have recently started to bump the shares manually for manual flushes.
While that guarantees that we will not drive the shares to zero, I will
make the argument that we can do better by making sure that those things
are, at this point, running alone: user experience is affected by
startup times and the bump we give to user-triggered operations will
only do so much. Even if we increase the shares a lot flushes will still
be fighting for resources with compactions and startup will take longer
than it could.

By making sure that flushes are at this point running alone we improve the
user experience by making sure the startup is as fast as it can be.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-12-04 13:48:42 -05:00
Tomasz Grabiec
b8c405c019 Merge "Correct the usage of row ttl and add write-read test" from Piotr
Fixes the condition which determines whether a row ttl should be used for a cell
and adds a test that uses each generated mutation to populate mutation source
and then verifies that it can read back the same mutation.

* seastar-dev.git haaawk/sst3/write-read-test/v3:
  Fix use_row_ttl condition
  Add test_all_data_is_read_back
2018-12-04 19:47:28 +01:00
Tomasz Grabiec
9a4c00beb7 utils/gz: Fix compilation on non-x86 archs
gen_crc_combine_table is now executed on every build, so it should not
fail on unsupported archs. The generated file will not contain data,
but this is fine since it should not be used.

Another problem is that u32 and u64 aliases were not visible in the #else
branch in crc_combine.cc
Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com>
2018-12-04 18:17:27 +00:00
Piotr Jastrzebski
fed3b51abe Add test_all_data_is_read_back
This tests that a source after being populated with a mutation
returns exactly the same mutation when read.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-12-04 11:42:08 +01:00
Piotr Sarna
7b0a3fbf8a auth: add abort_source to waiting for schema agreement
When the auth service is requested to stop during bootstrap,
it might not yet have reached schema agreement.
Currently, waiting for this agreement is done in an infinite loop,
without taking abort_source into account.
This patch introduces checking if abort was requested
and breaking the loop in such case, so auth service can terminate.

Tests:
unit (release)
dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test)
Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com>
2018-12-04 10:41:09 +00:00
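The change described in the commit above amounts to checking an abort flag inside the wait loop. A minimal sketch in standard C++, assuming a std::atomic<bool> in place of Seastar's abort_source; `wait_for_agreement`, `wait_result` and the `backoff` hook are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

enum class wait_result { agreed, aborted };

// Polls `has_agreement` until it returns true, or until `abort_requested` is
// set, in which case the loop exits instead of spinning forever.
inline wait_result wait_for_agreement(
        const std::function<bool()>& has_agreement,
        const std::atomic<bool>& abort_requested,
        const std::function<void()>& backoff = [] {}) {
    while (!has_agreement()) {
        if (abort_requested.load()) {
            return wait_result::aborted;
        }
        backoff();  // e.g. sleep between polls; a no-op by default
    }
    return wait_result::agreed;
}
```

The caller can then tell a clean shutdown apart from reaching agreement and skip the work that depends on the schema.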
Piotr Jastrzebski
75b99838fc Fix use_row_ttl condition
Previous condition was wrong and was using row ttl too often.

We also have to change test_dead_row_marker to compare
resulting sstable with sstable generated by Origin not
by sstableupgrade.
This is because sstableupgrade transmits information about deleted row
marker automatically to cells in that row.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-12-04 10:51:36 +01:00
Avi Kivity
c3e664eec2 Merge "Improve corrupt sstable reporting" from Rafael
"
This is a small step in fixing issue #2347. It is mostly tests and
testing infrastructure, but it does include a fix for a case where we
were missing the filename in the malformed_sstable_exception.
"

* 'espindola/sstable-corruption-v2' of https://github.com/espindola/scylla:
  Add a filename to a malformed_sstable_exception.
  Try to read the full sst in broken_sst.
  Convert tests to SEASTAR_THREAD_TEST_CASE.
  Check the exception message.
  Move some tests to broken_sstable_test.cc
2018-12-04 10:32:10 +02:00
Avi Kivity
414b14a6bd Merge "Make inactive shard readers evictable" from Botond
"
This series attempts to solve the regressions recently discovered in
performance of multi-partition range-scans. Namely that they:
* Flood the reader concurrency semaphore's queues, trampling other
  reads.
* Behave very badly when too many of them are running concurrently
  (thrashing).
* May deadlock if enough of them are running without a timeout.

The solution for these problems is to make inactive shard readers
evictable. This should address all three issues listed above, to varying
degrees:
* Shard readers will now not cling onto their permits for the entire
  duration of the scan, which might be a lot of time.
* Will be less affected by infinite concurrency (more than the node can
  handle) as each scan now can make progress by evicting inactive shard
  readers belonging to other scans.
* Will not deadlock at all.

In addition to the above fix, this series also bundles two further
improvements:
* Add a mechanism to `reader_concurrency_semaphore` to be notified of
  newly inserted evictables.
* General cleanups and fixes for `multishard_combining_reader` and
  `foreign_reader`.

I can unbundle these mini series and send them separately, if the
maintainers so prefer, although considering that this series will have to
be backported to 3.0, I think this present form is better.

Fixes: #3835
"

* 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits)
  tests/multishard_mutation_query_test: test stateless query too
  tests/querier_cache: fail resource-based eviction test gracefully
  tests/querier_cache: simplify resource-based eviction test
  tests/mutation_reader_test: add test_multishard_combining_reader_next_partition
  tests/mutation_reader_test: restore indentation
  tests/mutation_reader_test: enrich pause-related multishard reader test
  multishard_combining_reader: use pause-resume API
  query::partition_slice: add clear_ranges() method
  position_in_partition: add region() accessor
  foreign_reader: add pause-resume API
  tests/mutation_reader_test: implement the pause-resume API
  query_mutations_on_all_shards(): implement pause-resume API
  make_multishard_streaming_reader(): implement the pause-resume API
  database: add accessors for user and streaming concurrency semaphores
  reader_lifecycle_policy: extend with a pause-resume API
  query_mutations_on_all_shards(): restore indentation
  query_mutations_on_all_shards(): simplify the state-machine
  multishard_combining_reader: use the reader lifecycle policy
  multishard_combining_reader: add reader lifecycle policy
  multishard_combining_reader: drop unnecessary `reader_promise` member
  ...
2018-12-04 10:22:35 +02:00
Botond Dénes
9de4f3a834 tests/multishard_mutation_query_test: test stateless query too
In `test_read_all`, do a stateless read as well to ensure that path
works correctly.
2018-12-04 08:51:05 +02:00
Botond Dénes
6676ceba7f tests/querier_cache: fail resource-based eviction test gracefully
Currently when this test fails, resources are not released in the
correct order, which results in ASAN complaining about use-after-free
in debug builds. This is due to the BOOST_REQUIRE macro aborting the
test when the predicate fails, not allowing for correct destruction
order to take place.
To avoid this ugly failure, which adds noise and might send a developer
investigating the failure down the wrong path, use the milder
BOOST_CHECK family of test macros. These will allow the test to run
to completion even when the predicate fails, allowing for the correct
destruction of the resources.
2018-12-04 08:51:05 +02:00
Botond Dénes
93e41397f7 tests/querier_cache: simplify resource-based eviction test
Now that we have an accessor for all concurrency semaphores, we don't
need the tricks of creating a dummy keyspace to get them. Use the
accessors instead.
2018-12-04 08:51:05 +02:00
Botond Dénes
dcd2d116a3 tests/mutation_reader_test: add test_multishard_combining_reader_next_partition
Test the interaction of the multishard reader with the foreign reader
w.r.t next_partition(). next_partition() is a special operation, as
its execution is deferred until the next cross-shard operation. Give it
some extra stress-testing.
2018-12-04 08:51:05 +02:00
Botond Dénes
20e994e526 tests/mutation_reader_test: restore indentation
Left over from the previous patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
a577ff97e9 tests/mutation_reader_test: enrich pause-related multishard reader test
Enrich the existing test_multishard_combining_reader_as_mutation_source
test case with delaying the pause/resume and eviction of paused
readers.
2018-12-04 08:51:05 +02:00
Botond Dénes
22b14d593b multishard_combining_reader: use pause-resume API
Refactor the multishard combining reader to make use of the new
pause-resume API to pause inactive shard readers.

Make the pause-resume API mandatory to implement, as by now all existing
clients have adopted it.
2018-12-04 08:51:05 +02:00
Botond Dénes
77b758707c query::partition_slice: add clear_ranges() method
Allows for clearing any custom partition ranges, effectively resetting
them to the default ones. Useful for code that needs to set several
different specific partition ranges, one after the other, but doesn't
want to remember the last key it set a range for to be able to clear the
previous range with `clear_range()`.
2018-12-04 08:51:05 +02:00
Botond Dénes
a594fd39ce position_in_partition: add region() accessor 2018-12-04 08:51:05 +02:00
Botond Dénes
9601d23e0d foreign_reader: add pause-resume API
Allows for pausing the reader and later resuming it. Pausing the reader
waits on the ongoing read ahead (if any), executes any pending
`next_partition()` calls and then detaches the shard reader's buffer.
The paused shard reader is returned to the client.
Resuming the reader consists of getting the previously detached reader
back, or one that has the same position as the old reader had.
This API allows for making the inactive shard readers of the
`multishard_combining_reader` evictable.
The API is private, it's only accessible for classes knowing the full
definition of the `foreign_reader` (which resides in a .cc file).
2018-12-04 08:51:05 +02:00
Botond Dénes
a12fae366d tests/mutation_reader_test: implement the pause-resume API 2018-12-04 08:51:05 +02:00
Botond Dénes
f334d3717f query_mutations_on_all_shards(): implement pause-resume API 2018-12-04 08:51:05 +02:00
Botond Dénes
72ed655ef0 make_multishard_streaming_reader(): implement the pause-resume API 2018-12-04 08:51:05 +02:00
Botond Dénes
bf0d1f4eea database: add accessors for user and streaming concurrency semaphores
These will soon be needed to register inactive user and streaming reads
with the respective semaphores.
2018-12-04 08:51:05 +02:00
Botond Dénes
5f67a065c6 reader_lifecycle_policy: extend with a pause-resume API
This API provides a way for the multishard reader to pause inactive shard
readers and later resume them when they are needed again. This allows
for these paused shard readers to be evicted when the node is under
pressure.
How the readers are made evictable while paused is up to the clients.

Using this API in the `multishard_combining_reader` and implementing it
in the clients will be done in the next patches.

Provide default implementation for the new virtual methods to facilitate
gradual adoption.
2018-12-04 08:51:05 +02:00
Botond Dénes
6f0e0c4ed7 query_mutations_on_all_shards(): restore indentation
The previous patch added half-aligned lines to improve readability of
that patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
aa6083a75b query_mutations_on_all_shards(): simplify the state-machine
The `read_context` which handles creating, saving and looking-up the
shard readers has to deal with its `destroy_reader()` method being called
at any time, even before some other method has finished its work. For
example, it is valid for a reader to be requested to be destroyed even
before the context finishes creating it.
This means that state transitions that take time can be interleaved with
another state transition request. To deal with this the read context
uses `future_` states, states that mark an ongoing state transition.
This allows for state transition requests that arrive in the middle of
another state transition to be attached as a continuation to the ongoing
transition, and to be executed after that finishes. This however
resulted in complex code, that has to handle readers being in all sorts
of different states, when the `save_readers()` method is called.
To avoid all this complexity, exploit the fact that `destroy_reader()`
receives a future<> as its argument, which resolves when all previous
state transitions have finished. Use a gate to wait on all these futures
to resolve. This way we don't need all those transitional states,
instead in `save_readers()` we only need to wait on the gate to close.
Thus the number of states `save_readers()` has to consider drops
drastically.

This has the theoretical drawback of the process of saving the readers
having to wait on each of the readers to stop, but in practice the
process finishes when the last reader is saved anyway, so I don't expect
this to result in any slowdown.
2018-12-04 08:51:05 +02:00
Botond Dénes
007619de4c multishard_combining_reader: use the reader lifecycle policy
Refactor the multishard combining reader and its clients to use the
reader lifecycle policy introduced in the previous patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
0a616c899e multishard_combining_reader: add reader lifecycle policy
Currently `multishard_combining_reader` takes two functors, one for
creating the readers and optionally one for destroying them.
A bag of functors (`std::function`) however makes for a terrible
interface, and as we are about to add some more customization points,
it's time to use something more formal: policy based design, a
well-known design pattern.

As well as merging the job of the two functors into a single policy
class, also widen the area of responsibility of the policy to include
keeping alive any resource the shard readers might need on their home
shard. Implementing a proper reader cleanup is now not optional either.
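
A rough sketch of the idea, not the interface added by this patch: two loose callbacks are folded into one abstract policy class whose cleanup hook cannot be skipped. Every name below (`toy_reader`, `toy_lifecycle_policy`, `logging_policy`, `read_from_all_shards`) is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shard reader.
struct toy_reader {
    unsigned shard;
};

// One interface instead of two separate std::function callbacks.
class toy_lifecycle_policy {
public:
    virtual ~toy_lifecycle_policy() = default;
    virtual std::unique_ptr<toy_reader> create_reader(unsigned shard) = 0;
    // Pure virtual: an implementation cannot omit the cleanup step.
    virtual void destroy_reader(std::unique_ptr<toy_reader> reader) = 0;
};

// Example policy that only records what it was asked to do.
class logging_policy final : public toy_lifecycle_policy {
public:
    std::vector<std::string> log;

    std::unique_ptr<toy_reader> create_reader(unsigned shard) override {
        log.push_back("create " + std::to_string(shard));
        return std::make_unique<toy_reader>(toy_reader{shard});
    }

    void destroy_reader(std::unique_ptr<toy_reader> reader) override {
        log.push_back("destroy " + std::to_string(reader->shard));
    }
};

// Client code depends only on the abstract policy.
inline void read_from_all_shards(toy_lifecycle_policy& policy, unsigned shard_count) {
    for (unsigned shard = 0; shard < shard_count; ++shard) {
        auto reader = policy.create_reader(shard);
        policy.destroy_reader(std::move(reader));
    }
}
```

Adding another customization point then means adding one more virtual method rather than one more constructor argument.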

This patch only adds the `reader_managing_policy` interface,
refactoring the multishard reader to use it will be done in the next patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
301abaca07 multishard_combining_reader: drop unnecessary reader_promise member
The `reader_promise` member of the `shard_reader` was used to
synchronize a foreground request to create the underlying reader with an
ongoing background request with the same goal. This is however
unnecessary. The underlying reader is created in the background only as
part of a read ahead. In this case there is no need for extra
synchronization point, the foreground reader create request can just
wait for the read ahead to finish, for which there already exists a
means. Furthermore, foreground reader create requests are always followed
by a `fill_buffer()` request, so by waiting on the read ahead we ensure
that the following `fill_buffer()` call will not block.
2018-12-04 08:51:05 +02:00
Botond Dénes
a73175fdbc multishard_combining_reader: drop tracking of pending next_partition calls
Shard readers used to track pending `next_partition()` calls that they
couldn't execute, because their underlying reader wasn't created yet.
These pending calls were then executed after the reader was created.
However the only situation where a shard reader can receive a
`next_partition()` call before its underlying reader was created is
when `next_partition()` is called on the multishard reader before a
single fragment is read. In this case we know we are at a partition
boundary and thus this call has no effect, therefore it is safe to
ignore it.
2018-12-04 08:51:05 +02:00
Botond Dénes
ab3e639c3b foreign_reader: use bool for pending_next_partition
Foreign reader doesn't execute `next_partition()` calls straight away,
when this would require interaction with the remote reader. Instead
these calls are "remembered" and executed on the next occasion the
foreign reader has to interact with the remote reader. This was
implemented with a counter that counts the number of pending
`next_partition()` calls.
However when `next_partition()` is called multiple times, without
interleaving calls to `operator()()` or `fast_forward_to()`, only the
first such call has effect. Thus it doesn't make sense to count these
calls, it is enough to just set a flag if there was at least one such
call.
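
A minimal sketch of the flag-based approach, under the assumption that the remote interaction can be modelled by a single `fill_buffer()` call. `toy_remote_reader` and `toy_foreign_reader` are hypothetical names, not the real classes.

```cpp
#include <cassert>

// Hypothetical remote reader that counts how often it is contacted.
struct toy_remote_reader {
    int next_partition_calls = 0;
    int fill_buffer_calls = 0;
};

class toy_foreign_reader {
    toy_remote_reader& _remote;
    bool _pending_next_partition = false; // a flag, not a counter
public:
    explicit toy_foreign_reader(toy_remote_reader& remote) : _remote(remote) {}

    // Cheap: only remembers that at least one call happened.
    void next_partition() { _pending_next_partition = true; }

    // The deferred call is forwarded once, on the next remote interaction.
    void fill_buffer() {
        if (_pending_next_partition) {
            ++_remote.next_partition_calls;
            _pending_next_partition = false;
        }
        ++_remote.fill_buffer_calls;
    }
};
```

Calling `next_partition()` three times in a row followed by one `fill_buffer()` reaches the remote side exactly once, which is the behaviour the counter was also producing.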
2018-12-04 08:51:05 +02:00
Botond Dénes
5a4fd1abab multishard_combining_reader: drop support for streamed_mutation fast-forwarding
It doesn't make sense for the multishard reader anyway, as it's only
used by the row-cache. We are about to introduce the pausing of inactive
shard readers, and it would require complex data structures and code
to maintain support for this feature that is not even used. So drop it.
2018-12-04 08:51:05 +02:00
Botond Dénes
b36733971b mutation_source_test: add option to skip intra-partition fast-forwarding tests
To allow for using this test suite for testing mutation sources that
don't support intra-partition fast-forwarding.
2018-12-04 08:51:05 +02:00
Botond Dénes
37f0117747 reader_concurrency_semaphore: refactor eviction mechanism
As we are about to add multiple sources of evictable readers, we need a
more scalable solution than a single functor being passed that opaquely
evicts a reader when called.
Add a generic way to register and unregister evictable (inactive)
readers to the semaphore. The readers are expected to be registered when
they become evictable and are expected to be unregistered when they
cease to be evictable. The semaphore might evict any reader that is
registered to it, when it sees fit.

This also solves the problem of notifying the semaphore when new readers
become evictable. Previously there was no such mechanism, and the
semaphore would only evict any such new readers when a new permit was
requested from it.
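
The register/unregister scheme can be sketched roughly as below. This is a synchronous toy with no waiters or futures. It assumes each inactive reader holds exactly one unit, and `toy_semaphore` and its methods are hypothetical names rather than the real API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

class toy_semaphore {
    int _available;
    std::uint64_t _next_handle = 0;
    // Inactive readers, oldest first. Assumption: each one holds one unit.
    std::map<std::uint64_t, std::function<void()>> _inactive;
public:
    explicit toy_semaphore(int units) : _available(units) {}

    // Called when a reader becomes evictable. Returns a handle for unregistering.
    std::uint64_t register_inactive(std::function<void()> on_evict) {
        _inactive.emplace(_next_handle, std::move(on_evict));
        return _next_handle++;
    }

    // Called when the reader is resumed. Returns false if it was evicted meanwhile.
    bool unregister_inactive(std::uint64_t handle) {
        return _inactive.erase(handle) == 1;
    }

    // Takes one unit, evicting inactive readers if none is free.
    bool try_acquire() {
        while (_available == 0 && !_inactive.empty()) {
            auto it = _inactive.begin();
            auto on_evict = std::move(it->second);
            _inactive.erase(it);
            on_evict();
            ++_available; // the evicted reader's unit is returned
        }
        if (_available == 0) {
            return false;
        }
        --_available;
        return true;
    }

    void release() { ++_available; }
};
```

A reader that finds `unregister_inactive()` returning false knows it was evicted and has to be recreated at its old position.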
2018-12-04 08:51:00 +02:00
Rafael Ávila de Espíndola
21199a7a5c Add a filename to a malformed_sstable_exception.
It is reasonable for parse() to throw when it finds something wrong
with the format. This seems to be the best spot to add the filename
and rethrow.

Also add a testcase to make sure we keep handling this error
gracefully.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 13:50:23 -08:00
Rafael Ávila de Espíndola
a6e25e4bd0 Try to read the full sst in broken_sst.
With this patch we use data_consume_rows to try to read the entire
sstable. The patch also adds a test with a particular corruption that
would not be found without parsing the file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 13:47:49 -08:00
Rafael Ávila de Espíndola
b1190c58ec Convert tests to SEASTAR_THREAD_TEST_CASE.
This will simplify future changes to broken_sst.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 13:26:06 -08:00
Rafael Ávila de Espíndola
e5c5afffc9 Check the exception message.
This makes the tests a bit more strict by also checking the message
returned by the what() function.

This shows that some of the tests are out of sync with which errors
they check for. I will hopefully fix this in another pass.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 12:31:53 -08:00
Rafael Ávila de Espíndola
f9d81bcd43 Move some tests to broken_sstable_test.cc
sstable_test.cc was already a bit too big and there is potential for
having a lot of tests about broken sstables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 12:16:30 -08:00
Rafael Ávila de Espíndola
cf4dc38259 Simplify state machine loop.
These loops have the structure :

while (true) {
  switch (state) {
  case state1:
  ...
  break;
  case state2:
  if (...) { ... break; } else {... continue; }
  ...
  }
  break;
}

There are a couple of things I find a bit odd about that structure:

* The break refers to the switch, the continue to the loop.
* A while (true) loop always hits a break or a continue.

This patch uses early returns to simplify the logic to

while (true) {
  switch (state) {
  case state1:
  ...
  return
  case state2:
  if (...) { ... return; }
  ...
  }
}

Now there are no breaks or continues.
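
For a concrete, compilable instance of the second shape, here is a small sketch that is not taken from the patch. The parser, its states and the length-prefixed input format are all made up for illustration. Moving on to the following state uses `[[fallthrough]]`, and the last case simply falls out of the switch so the loop re-enters with the new state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

enum class state { length, body };
enum class status { need_more_input, finished };

// Parses records of the form <1-byte length><payload>; a zero length ends the stream.
struct toy_parser {
    state st = state::length;
    std::size_t remaining = 0;
    std::string payload;

    // Consumes bytes from `in`. Every exit from the loop is an explicit return.
    status consume(std::string_view& in) {
        while (true) {
            switch (st) {
            case state::length:
                if (in.empty()) {
                    return status::need_more_input;
                }
                remaining = static_cast<unsigned char>(in.front());
                in.remove_prefix(1);
                if (remaining == 0) {
                    return status::finished;
                }
                st = state::body;
                [[fallthrough]];
            case state::body: {
                const std::size_t n = std::min(remaining, in.size());
                payload.append(in.substr(0, n));
                in.remove_prefix(n);
                remaining -= n;
                if (remaining != 0) {
                    return status::need_more_input;
                }
                st = state::length;
                // Falls out of the switch; the loop handles the next record.
            }
            }
        }
    }
};
```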

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181126171726.84629-1-espindola@scylladb.com>
2018-12-03 20:34:03 +01:00
Avi Kivity
b098b5b987 Merge "Optimize checksum_combine() for CRC32" from Tomek
"
zlib's crc32_combine() is not very efficient. It is faster to re-checksum
the buffer using crc32(). That is still a substantial amount of work which
could be avoided.

This patch introduces a fast implementation of crc32_combine() which
uses a different algorithm than zlib. It also utilizes intrinsics for
carry-less multiplication instruction to perform the computation faster.
The details of the algorithm can be found in code comments.

Performance results using perf_checksum and second buffer of length 64 KiB:

zlib CRC32 combine:   38'851   ns
libdeflate CRC32:      4'797   ns
fast_crc32_combine():     11   ns

So the new implementation is 3500x faster than zlib's, and 417x faster than
re-checksumming the buffer using libdeflate.

Tested on i7-5960X CPU @ 3.00GHz

Performance was also evaluated using sstable writer benchmark:

  perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \
     --value-size=10000 --rows 1000000 --datasets small-part

It yielded 9% improvement in median frag/s (129'055 vs 117'977).

Refs #3874
"

* tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla:
  tests: perf_checksum: Test fast_crc32_combine()
  tests: Rename libdeflate_test to checksum_utils_test
  tests: libdeflate: Add more tests for checksum_combine()
  tests: libdeflate: Check both libdeflate and default checksummers
  sstables: Use fast_crc_combine() in the default checksummer
  utils/gz: Add fast implementation of crc32_combine()
  utils/gz: Add pre-computed polynomials
  utils/gz: Import Barrett reduction implementation from libdeflate
  utils: Extract clmul() from crc.hh
2018-12-03 19:02:01 +02:00
Tomasz Grabiec
aa19f98d18 sstables: Write Statistics.db offset map entries in the same order as Cassandra
Before this patch we were writing offset map entries in unspecified
order, the one returned by std::unordered_map. Cassandra writes them
sorted by metadata_type. Use the same order for improved
compatibility.
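
A minimal sketch of the ordering fix, assuming the offset map is keyed by a numeric metadata type id. The alias `offset_map` and the helper `sorted_entries` are hypothetical names, and the key and offset values used with them are invented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical layout: metadata type id -> offset within the file.
using offset_map = std::unordered_map<std::uint32_t, std::uint32_t>;

// Returns the entries sorted by type id, so the on-disk order is deterministic
// instead of depending on the hash table's iteration order.
inline std::vector<std::pair<std::uint32_t, std::uint32_t>>
sorted_entries(const offset_map& offsets) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> out(offsets.begin(), offsets.end());
    std::sort(out.begin(), out.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    return out;
}
```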

Fixes #3955.

Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com>
2018-12-03 16:40:24 +02:00
Avi Kivity
4dc402b53f Merge "Create sstable in a sub-directory" from Benny
"
Due to an XFS heuristic, if all files are in one (or a few) directories,
then block allocation can become very slow. This is because XFS divides
the disk into a few allocation groups (AGs), and each directory allocates
preferentially from a single AG. That AG can become filled long before
the disk is full.

This patchset works around the problem by:
- creating sstable component files in their own temporary, per-sstable sub-directory,
- moving the files back into the canonical location right after being created, and finally
- removing the temp sub-directory when the sstable is sealed.
- In addition, any temporary sub-directories that might have been left over if scylla
  crashed while creating sstables are looked up and removed when populating the table.

Fixes: #3167

Tests: unit (release)
"

* 'issues/3167/v7' of https://github.com/bhalevy/scylla:
  distributed_loader::populate_column_family: lookup and remove temp sstable directories
  database: directly use std::experimental::filesystem::path for lister::path
  database: use std::experimental::filesystem::path for lister::path
  sstable: use std::experimental::filesystem rather than boost
  sstable::seal_sstable: fixup indentation
  sstable: create sstable component files in a subdirectory
  sstable::new_sstable_component_file: pass component_type rather than filename
  sstable: cleanup filename related functions
  sstable: make write_crc, write_digest, and new_sstable_component_file private methods
2018-12-03 16:26:12 +02:00
Tomasz Grabiec
feefb23232 tests: perf_checksum: Test fast_crc32_combine() 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
dda0f9b6eb tests: Rename libdeflate_test to checksum_utils_test 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
7febdb5a5c tests: libdeflate: Add more tests for checksum_combine() 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
b22ed75416 tests: libdeflate: Check both libdeflate and default checksummers 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
1eb03b6ff1 sstables: Use fast_crc_combine() in the default checksummer 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
1fb792c547 utils/gz: Add fast implementation of crc32_combine()
zlib's crc32_combine() is not very efficient. It is faster to re-checksum
the buffer using crc32(). That is still a substantial amount of work which
could be avoided.

This patch introduces a fast implementation of crc32_combine() which
uses a different algorithm than zlib. It also utilizes intrinsics for
carry-less multiplication instruction to perform the computation faster.
The details of the algorithm can be found in code comments.
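
To make clear what "combining" means here, the sketch below shows the property the function must satisfy: the CRC of a concatenation can be derived from the two CRCs and the length of the second buffer. This is the naive O(len2) formulation, not the fast algorithm from this patch. `toy_crc32` and `toy_crc32_combine` are hypothetical helpers written only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (the variant zlib computes).
inline std::uint32_t toy_crc32(std::string_view data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Naive combine: advance crc1 over len2 zero bytes, then fold in crc2.
// The loop is linear in len2, which is exactly the cost a fast version avoids.
inline std::uint32_t toy_crc32_combine(std::uint32_t crc1, std::uint32_t crc2, std::size_t len2) {
    for (std::size_t i = 0; i < len2 * 8; ++i) {
        crc1 = (crc1 >> 1) ^ ((crc1 & 1u) ? 0xEDB88320u : 0u);
    }
    return crc1 ^ crc2;
}
```

With these helpers, `toy_crc32_combine(toy_crc32("1234"), toy_crc32("56789"), 5)` equals `toy_crc32("123456789")`.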

Performance results using perf_checksum and second buffer of length 64 KiB:

zlib CRC32 combine:   38'851   ns
libdeflate CRC32:      4'797   ns
fast_crc32_combine():     11   ns

So the new implementation is 3500x faster than zlib's, and 417x faster than
re-checksumming the buffer using libdeflate.

Tested on i7-5960X CPU @ 3.00GHz

Performance was also evaluated using sstable writer benchmark:

  perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \
     --value-size=10000 --rows 1000000 --datasets small-part

It yielded 9% improvement in median frag/s (129'055 vs 117'977).
2018-12-03 14:40:35 +01:00
Tomasz Grabiec
cd3d9d357b utils/gz: Add pre-computed polynomials
gen_crc_combine_table.cc will be run during build to produce tables
with precomputed polynomials (4 x 256 x u32). The definitions will
reside in:

  build/<mode>/gen/utils/gz/crc_combine_table.cc

It takes 20ms to generate on my machine.

The purpose of those polynomials will be explained in crc_combine.cc
2018-12-03 14:36:09 +01:00
Tomasz Grabiec
63e0da9e58 utils/gz: Import Barrett reduction implementation from libdeflate 2018-12-03 14:36:09 +01:00
Tomasz Grabiec
bb7d95d6c3 utils: Extract clmul() from crc.hh 2018-12-03 14:36:08 +01:00
Botond Dénes
0cb7c43fb5 reader_concurrency_semaphore: add dedicated .cc file
As we are about to extend the functionality of the reader concurrency
semaphore, adding more method implementations that need to go to a .cc
file, it's time we create a dedicated file, instead of continuing to shove them
into unrelated .cc files (mutation_reader.cc).
2018-12-03 13:37:02 +02:00
Avi Kivity
d6a22c50cb Update libdeflate submodule
* libdeflate 17ec6c9...e7e54ea (1):
  > build: improve out-of-tree build with multiple output trees
2018-12-03 11:18:02 +02:00
Botond Dénes
34c2d67614 reader_concurrency_semaphore: rearrange members
Use standard convention of the rest of the code base. Type definitions
first, then data members and finally member functions.
As we are about to add more members, it's especially important to make
the growing class have a familiar member arrangement.
2018-12-03 08:26:10 +02:00
Benny Halevy
9e7125a9de distributed_loader::populate_column_family: lookup and remove temp sstable directories
These may be left over in case we crash while writing sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
857ff4f59a database: directly use std::experimental::filesystem::path for lister::path
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
585ac6e641 database: use std::experimental::filesystem::path for lister::path
We would like to get rid of boost::filesystem and gradually replace it with
std::experimental::filesystem.

TODO: using namespace fs = std::experimental::filesystem,
use fs::path directly, rather than lister::path

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
0b74927757 sstable: use std::experimental::filesystem rather than boost
Note: Requires linking with -lstdc++fs

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
61d116a1f1 sstable::seal_sstable: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
90118fa9ef sstable: create sstable component files in a subdirectory
When writing the sstable, create a temporary directory
for creating all components so that each sstable's files will be
assigned a different allocation group on XFS.

Files are immediately renamed to their default location after creation.
Temp directory is removed when the sstable is sealed.

Additional work to be introduced in the following patches:
- When populating tables, temp directories need to be looked up and removed.

Fixes #3167

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
23d8afb20d sstable::new_sstable_component_file: pass component_type rather than filename
So we can create the file in the sstable directory and then move into the final location

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
7b170eb0dc sstable: cleanup filename related functions
- use const sstring& params rather than sstring
- returning const sstring is superfluous

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
ad5f1e4fbb sstable: make write_crc, write_digest, and new_sstable_component_file private methods
Prepare for per-sstable sub directory.
Also, these functions get most of their parameters from the sst at hand so they might
as well be first class members.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Avi Kivity
2a0a36d48b tools: update toolchain to fedora-29-20181202
Added: git, sudo, python
Message-Id: <20181202185608.14141-1-avi@scylladb.com>
2018-12-02 19:00:55 +00:00
Benny Halevy
d257e5c123 sstable: remove unused get_sstable_key_range
Since 024c8ef8a1
db: adjust sstable load to use sstable self-reporting of shard ownership

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181202114523.14296-1-bhalevy@scylladb.com>
2018-12-02 18:32:34 +02:00
Avi Kivity
224c4c0b81 tools: add frozen toolchain support
Add a reference to a docker image that contains an "official" toolchain
for building Scylla. In addition, add a script that allows easy usage of
the image, and some documentation.
Message-Id: <20181202120829.21218-1-avi@scylladb.com>
2018-12-02 18:32:34 +02:00
Takuya ASADA
0fdf807f51 install-dependencies.sh: add missing packages to run build in Fedora container
The git, python, and sudo packages are installed by default on a normal
Fedora installation but not in the Docker image, so this script needs to
install them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181201020834.24961-1-syuu@scylladb.com>
2018-12-02 12:51:29 +02:00
Avi Kivity
009cbd3dcb Merge "Fix multiple summary regeneration bugs." from Vladimir
"
This patchset addresses two recently discovered bugs both triggered by
summary regeneration:

Tests: unit {release}

+

Validated with debug build of Scylla (ASAN) that no use-after-free
occurs when re-generating Summary.db.
"

* 'projects/sstables-30/summary-regeneration/v1' of https://github.com/argenet/scylla:
  tests: Add test reading SSTables in 'mc' format with missing summary.
  sstables: When loading, read statistics before summary.
  database: Capture io_priority_class by reference to avoid dangling ref.
2018-12-02 11:56:18 +02:00
Vladimir Krivopalov
d24875b736 tests: Add test reading SSTables in 'mc' format with missing summary.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-30 11:56:56 -08:00
Vladimir Krivopalov
b0e5404071 sstables: When loading, read statistics before summary.
If the summary is missing and we attempt to re-generate it, the
statistics must already have been read to provide the values stored in the
serialization header, which are needed to deserialize clustering prefixes.

Fixes #3947

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-30 11:56:56 -08:00
Vladimir Krivopalov
68458148e7 database: Capture io_priority_class by reference to avoid dangling ref.
The original reference points to a thread-local storage object that is
guaranteed to outlive the continuation, but copying it makes the
subsequent calls point to a local object and introduces a use-after-free
bug.
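
The difference between the two capture modes can be seen in a small sketch. The type `toy_priority_class` and the helper functions are hypothetical stand-ins, and the code only compares addresses, so it does not itself trigger the use-after-free.

```cpp
#include <cassert>

struct toy_priority_class {
    int id;
};

// Long-lived object, comparable to the thread-local instance mentioned above.
inline const toy_priority_class& default_priority() {
    static thread_local toy_priority_class pc{1};
    return pc;
}

// Copy capture: the lambda owns a private copy, so an address taken inside it
// refers to storage that disappears together with the lambda.
inline bool copy_capture_aliases(const toy_priority_class& pc) {
    auto f = [pc] { return &pc; };
    return f() == &pc;
}

// Reference capture: the lambda refers to the caller's long-lived object.
inline bool ref_capture_aliases(const toy_priority_class& pc) {
    auto f = [&pc] { return &pc; };
    return f() == &pc;
}
```

Any code that keeps the address or reference obtained inside the copy-capturing lambda past the lambda's lifetime ends up with a dangling reference.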

Fixes #3948

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-30 10:43:36 -08:00
Piotr Jastrzebski
329303cae7 sstables: Add test_sstable_reader_on_unknown_column
This test checks that sstable reader throws an exception
when sstable contains a column that's not present in the schema.

It also checks that dropped columns do not cause exceptions.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-30 10:29:47 +01:00
Piotr Jastrzebski
5cc3f904ce sstables: Exception on sstable's column not present in schema
Previously such column was ignored but it's better to be explicit
about this situation.

Refs #2598

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-30 08:59:13 +01:00
Piotr Jastrzebski
c0ce94c6f9 sstables: store column name in column_translation::column_info
This will be used for better diagnostics.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-30 08:59:00 +01:00
Duarte Nunes
1afda28cf3 Merge 'Fix filtering with LIMIT' from Piotr
"
This series adds proper handling of filtering queries with LIMIT.
Previously the limit was erroneously applied before filtering,
which leads to truncated results.

To avoid that, paged filtering queries now use an enhanced pager,
which remembers how many rows were dropped and uses that information
to fetch more pages if the limit is not yet reached.

For unpaged filtering queries, paging is done internally, as in the case
of aggregations, to avoid keeping huge results in memory.

Also, previously, all limited queries used a page size computed
as min(page size, limit). That is not good for filtering,
because with LIMIT 1 we would then query for rows one by one.
To avoid that, filtered queries ask for the whole page and the results
are truncated if need be afterwards.
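
The filter-first, truncate-afterwards behaviour can be sketched as follows. This is a simplified in-memory model, not the pager code. `toy_page_result` and `filtered_query` are hypothetical names, and a "page" is just a slice of a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct toy_page_result {
    std::vector<int> rows;         // rows that passed the filter, at most `limit`
    std::size_t dropped_rows = 0;  // rows fetched but rejected by the filter
    std::size_t pages_fetched = 0;
};

inline toy_page_result filtered_query(const std::vector<int>& table,
                                      const std::function<bool(int)>& filter,
                                      std::size_t page_size, std::size_t limit) {
    assert(page_size > 0);
    toy_page_result result;
    std::size_t pos = 0;
    // Keep fetching pages until the limit is satisfied or the data runs out.
    while (result.rows.size() < limit && pos < table.size()) {
        // Always ask for a whole page, however small the limit is.
        const std::size_t end = std::min(pos + page_size, table.size());
        ++result.pages_fetched;
        for (; pos < end; ++pos) {
            if (!filter(table[pos])) {
                ++result.dropped_rows;
            } else if (result.rows.size() < limit) {
                result.rows.push_back(table[pos]); // extra matches are truncated
            }
        }
    }
    return result;
}
```

Applying the limit before the filter would instead stop after the first `limit` fetched rows, even if none of them match.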

Tests: unit (release)
"

* 'fix_filtering_with_limit_2' of https://github.com/psarna/scylla:
  tests: add filtering with LIMIT test
  tests: split filtering tests from cql_query_test
  cql3: add proper handling of filtering with LIMIT
  service/pager: use dropped_rows to adjust how many rows to read
  service/pager: virtualize max_rows_to_fetch function
  cql3: add counting dropped rows in filtering pager
2018-11-29 23:07:40 +00:00
Piotr Jastrzebski
654eeb30ac sstables: Make test_dropped_column_handling test dropped columns
Before it was testing missing columns.
It's better to test dropped columns because they should be ignored
while for missing columns some sources will throw.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-29 16:16:44 +01:00
Avi Kivity
2dba809844 Merge "scylla_io_setup: support multiple devices" from Benny
"
This patchset adds support to scylla_io_setup for
multiple data directories as well as commitlog,
hints, and saved_caches directories.

Refs #2415

Tests: manual testing with scylla-ccm generated scylla.yaml
"

* 'projects/multidev/v3' of https://github.com/bhalevy/scylla:
  scylla_io_setup: assume default directories under /var/lib/scylla
  scylla_io_setup: add support for commitlog, hints, and saved_caches directory
  scylla_io_setup: support multiple data directories
2018-11-29 16:44:33 +02:00
Piotr Sarna
7adbdaba0b tests: add filtering with LIMIT test
Refs #3902
2018-11-29 14:53:30 +01:00
Piotr Sarna
5f97c78875 tests: split filtering tests from cql_query_test
In order to avoid blowing cql_query_test even more out of proportions,
all filtering tests are moved to a separate file.
2018-11-29 14:53:30 +01:00
Piotr Sarna
acf4eadf88 cql3: add proper handling of filtering with LIMIT
Previously, limit was erroneously applied before filtering,
which might have resulted in truncated results.
Now, both paged and unpaged queries are filtered first,
and only after that properly trimmed so only X rows are returned
for LIMIT X.

Fixes #3902
2018-11-29 14:53:30 +01:00
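A minimal Python sketch of the ordering issue described in the commit above, assuming an in-memory list as a stand-in for the replica. The names `fetch_pages`, `filtered_query`, and `limit_before_filter` are hypothetical and are not taken from the Scylla sources; the sketch only contrasts "limit, then filter" with "filter, then limit" and counts dropped rows along the way.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, int]


def fetch_pages(rows: List[Row], page_size: int) -> Iterator[List[Row]]:
    """Stand-in for the replica: yields the table one page at a time."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


def filtered_query(rows: List[Row], predicate: Callable[[Row], bool],
                   limit: int, page_size: int = 4) -> Tuple[List[Row], int]:
    """Filter first, then trim to `limit`; also report how many rows were dropped."""
    result: List[Row] = []
    dropped_rows = 0
    for page in fetch_pages(rows, page_size):
        for row in page:
            if predicate(row):
                result.append(row)
            else:
                dropped_rows += 1
        if len(result) >= limit:
            break  # enough matching rows, no need to fetch more pages
    return result[:limit], dropped_rows


def limit_before_filter(rows: List[Row], predicate: Callable[[Row], bool],
                        limit: int) -> List[Row]:
    """The old, incorrect order: truncate first, filter afterwards."""
    return [row for row in rows[:limit] if predicate(row)]
```

With twelve rows of which every third one matches, `LIMIT 2` applied before filtering yields a single row, while filtering first yields the two rows the user asked for.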
Piotr Sarna
5b052bdae5 service/pager: use dropped_rows to adjust how many rows to read
Filtering pager may drop some rows and as a result return less
than what was fetched from the replica. To properly adjust how
many rows were actually read, dropped_rows variable is introduced.
2018-11-29 14:53:29 +01:00
Piotr Sarna
021caeddf7 service/pager: virtualize max_rows_to_fetch function
Regular pagers use max_rows to figure out how many rows to fetch,
but filtering pager potentially needs the whole page to be fetched
in order to filter the results.
2018-11-29 14:14:37 +01:00
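The idea of virtualizing `max_rows_to_fetch` can be sketched in Python as an overridable method. The class names and the exact formula used by the regular pager (`min(max_rows, page_size)`) are assumptions made for illustration, not a copy of the C++ code.

```python
class QueryPager:
    """Regular pager: never asks for more rows than the query still needs."""

    def __init__(self, max_rows: int) -> None:
        self.max_rows = max_rows  # rows still allowed by LIMIT (assumed meaning)

    def max_rows_to_fetch(self, page_size: int) -> int:
        return min(self.max_rows, page_size)


class FilteringQueryPager(QueryPager):
    """Filtering pager: fetches the whole page, since many rows may be dropped."""

    def max_rows_to_fetch(self, page_size: int) -> int:
        return page_size
```

With `LIMIT 1` and a page size of 100, the regular pager in this sketch asks for one row, while the filtering pager asks for the full page and trims afterwards.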
Benny Halevy
5ec191536e scylla_io_setup: assume default directories under /var/lib/scylla
If a specific directory is not configured in scylla.yaml, scylla assumes
a default location under /var/lib/scylla.

Hard code these locations in scylla_io_setup until we have a better way
to probe scylla about it.

Be permissive and silently ignore the default directories if they do not
exist on disk.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-11-29 15:07:29 +02:00
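A rough Python sketch of the fallback described above. The function name `directories_to_probe`, the `DEFAULT_SUBDIRS` table, and the injectable `exists` parameter are hypothetical; the default subdirectory names are assumed from the directory kinds listed in this series, not read from scylla_io_setup itself.

```python
import os
import posixpath
from typing import Callable, Dict, List

DEFAULT_ROOT = "/var/lib/scylla"

# scylla.yaml key -> assumed default subdirectory under DEFAULT_ROOT
DEFAULT_SUBDIRS: Dict[str, str] = {
    "data_file_directories": "data",
    "commitlog_directory": "commitlog",
    "hints_directory": "hints",
    "saved_caches_directory": "saved_caches",
}


def directories_to_probe(cfg: dict,
                         exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Collect directories to evaluate, falling back to defaults when unset."""
    dirs: List[str] = []
    for key, subdir in DEFAULT_SUBDIRS.items():
        value = cfg.get(key)
        if value:
            # Explicitly configured: accept a single path or a list of paths.
            dirs.extend(value if isinstance(value, list) else [value])
            continue
        default = posixpath.join(DEFAULT_ROOT, subdir)
        if exists(default):
            dirs.append(default)
        # A missing default directory is silently ignored.
    return dirs
```

Passing `exists` as a parameter keeps the sketch testable without touching the real filesystem.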
Piotr Sarna
4f5ee3dfcd cql3: add counting dropped rows in filtering pager
Counter for dropped rows is added to the filtering pager.
This metric can be used later to implement applying LIMIT
to filtering queries properly.
Dropped rows are returned on visitor::accept_partition_end.
2018-11-29 14:06:59 +01:00
Benny Halevy
88b85b363a scylla_io_setup: add support for commitlog, hints, and saved_caches directory
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-11-29 10:09:17 +02:00
Benny Halevy
e4382caa4a scylla_io_setup: support multiple data directories
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-11-29 10:09:17 +02:00
Alexys Jacob
00476c3946 scylla-housekeeping: fix python3 compat and shebang 2018-11-29 00:04:02 +01:00
Alexys Jacob
1cf41760a8 dist/ami/files/scylla_install_ami: python3 shebang 2018-11-29 00:00:41 +01:00
Alexys Jacob
a6447f543c dist/docker/redhat/docker-entrypoint.py: add encoding comment 2018-11-29 00:00:19 +01:00
Alexys Jacob
9f041158df fix_system_distributed_tables.py: fix python3 compat and shebang 2018-11-28 23:59:51 +01:00
Alexys Jacob
887322daa2 gen_segmented_compress_params.py: add encoding comment 2018-11-28 23:59:18 +01:00
Alexys Jacob
14e65e1089 idl-compiler.py: python3 shebang 2018-11-28 23:58:38 +01:00
Alexys Jacob
170120a391 scylla-gdb.py: python3 shebang 2018-11-28 23:58:14 +01:00
Alexys Jacob
3902922113 configure.py: python3 shebang 2018-11-28 23:57:54 +01:00
Alexys Jacob
d2dbbba139 tools/scyllatop/: add / normalize python3 shebang 2018-11-28 23:57:03 +01:00
Alexys Jacob
e321b839c7 scripts/: add / normalize python3 shebang 2018-11-28 23:56:35 +01:00
Alexys Jacob
02656fb00e dist/common/scripts: add / normalize python3 shebang 2018-11-28 23:55:26 +01:00
Alexys Jacob
954da947f8 test.py: add encoding comment 2018-11-28 23:54:41 +01:00
Alexys Jacob
cbd72786dd setup.py: add python3 classifiers 2018-11-28 23:54:03 +01:00
Dan Yasny
019a2e3a27 scylla_setup: Mark required args
Fixes #3945

Message-Id: <20181128220549.3083-1-dyasny@gmail.com>
2018-11-28 22:30:02 +00:00
Avi Kivity
de17150cb2 Update seastar submodule
* seastar 1fbb633...132e6cd (2):
  > scripts: json2code: port to Python 3
  > docker/dev/Dockerfile: add c-ares-devel to docker setup
2018-11-28 19:05:21 +02:00
Duarte Nunes
a589dade07 Merge 'Fix checking for multi-column restrictions in filtering' from Piotr
"
This series fixes #3891 by amending the way restrictions
are checked for filtering. Previous implementation that returned
false from need_filtering() when multi-column restrictions
were present was incorrect.
Now, the error is going to be returned from restrictions filter layer,
and once multi-column support is implemented for filtering, it will
require no further changes.

Tests: unit (release)
"

* 'fix_multi_column_filtering_check_3' of https://github.com/psarna/scylla:
  tests: add multi-column filtering check
  cql3: remove incorrect multi-column check
  cql3: check filtering restrictions only if applicable
  cql3: add pk/ck_restrictions_need_filtering()
2018-11-28 15:36:37 +00:00
Piotr Sarna
ae0ffa6575 tests: add multi-column filtering check
Multi-column restrictions filtering is not supported yet,
so a simple case to ensure that is added.
2018-11-28 13:58:16 +01:00
Piotr Sarna
0013929782 cql3: remove incorrect multi-column check
need_filtering() incorrectly returned false if multi-column restrictions
were present. Instead, these restrictions should be allowed to need
filtering.

Fixes #3891
2018-11-28 13:58:16 +01:00
Piotr Sarna
65f21cc518 cql3: check filtering restrictions only if applicable
Primary key restrictions should be checked only when they need
filtering - otherwise it's superfluous, since they were already
applied on query level.
2018-11-28 13:58:16 +01:00
Piotr Sarna
f59ddcab52 cql3: add pk/ck_restrictions_need_filtering()
These functions return true if partition/clustering key restriction
parts of statement restrictions require filtering.
2018-11-28 13:58:16 +01:00
Duarte Nunes
d09d4bbd91 Merge 'Fix checking if system tables need view updates' from Piotr
"
This miniseries ensures that system tables are not checked
for having view updates, because they never do.
What's more, distributed system table is used in the process,
so it's unsafe to query the table while streaming it.

Tests: unit (release), dtest(update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test)
"

* 'fix_checking_if_system_tables_need_view_updates_3' of https://github.com/psarna/scylla:
  streaming: don't check view building of system tables
  database: add is_internal_keyspace
  streaming: remove unused sstable_is_staging bool class
2018-11-28 10:00:34 +00:00
Piotr Sarna
8e6021dfa1 streaming: don't check view building of system tables
System tables will never need view building, and, what's more,
are actually used in the process of view build checking.
So, checking whether system tables need a view update path
is simplified to returning 'false'.
2018-11-28 09:21:56 +01:00
Piotr Sarna
1336b9ee31 database: add is_internal_keyspace
Similarly to is_system_keyspace, it will allow checking if a keyspace
is created for internal use.
2018-11-28 09:21:56 +01:00
Piotr Sarna
6ad2c39f88 streaming: remove unused sstable_is_staging bool class
sstable_is_staging bool class is not used anywhere in the code anymore,
so it's removed.
2018-11-28 09:21:56 +01:00
Duarte Nunes
9f639edaa2 Merge 'storage_proxy: fix some bugs in early (due to errors) request completion' from Gleb
"
The series fixes #3565 and #3566
"

* 'gleb/write_failure_fixes' of github.com:scylladb/seastar-dev:
  storage_proxy: store hint for CL=ANY if all nodes replied with failure
  storage_proxy: complete write request early if all replicas replied with success or failure
  storage_proxy: check that write failure response comes from recognized replica
  storage_proxy: move code executed on write timeout into separate function
2018-11-27 21:44:01 +00:00
Takuya ASADA
52f030806f install-dependencies.sh: fix dependency issues on Debian variants
Sync Debian variants dependencies with dist/debian/control.mustache
(before merging relocatable), use scylla 3rdparty packages.

Since we use 3rdparty repo on seastar/install-dependencies.sh, drop repo
setup part from this script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031122800.11802-1-syuu@scylladb.com>
2018-11-27 21:44:01 +00:00
Gleb Natapov
17197fb005 storage_proxy: store hint for CL=ANY if all nodes replied with failure
Current code assumes that request failed if all replicas replied with
failure, but this is not true for CL=ANY requests. Take it into account.

Fixes #3565
2018-11-27 15:06:37 +02:00
Gleb Natapov
d1d04eae3c storage_proxy: complete write request early if all replicas replied with success or failure
Currently if write request reaches CL and all replicas replied, but some
replied with failures, the request will wait for timeout to be retired.
Detect this case and retire request immediately instead.

Fixes #3566
2018-11-27 14:49:37 +02:00
Gleb Natapov
76ab3d716b storage_proxy: check that write failure response comes from recognized replica
Before accounting failure response we need to make sure it comes from a
replica that participates in the request.
2018-11-27 14:44:49 +02:00
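The three storage_proxy changes above can be pictured together with a small Python model. This is only a sketch under stated assumptions: `WriteHandler` and all of its fields are hypothetical names, and the real handler also deals with timers, hints per endpoint, and datacenter-aware consistency levels, none of which are modelled here.

```python
from typing import Iterable, Optional, Set


class WriteHandler:
    """Toy model of a write response handler (hypothetical, not Scylla's class)."""

    def __init__(self, replicas: Iterable[str], required_acks: int,
                 cl_any: bool = False) -> None:
        self.targets: Set[str] = set(replicas)
        self.replied: Set[str] = set()
        self.required_acks = required_acks
        self.cl_any = cl_any
        self.acks = 0
        self.failures = 0
        self.hint_stored = False
        self.outcome: Optional[str] = None  # "success" or "failure" once known
        self.retired = False                # True once no timeout wait is needed

    def _accept(self, replica: str) -> bool:
        # Ignore replies from nodes that do not participate in this request,
        # and duplicate replies from the same node.
        if replica not in self.targets or replica in self.replied:
            return False
        self.replied.add(replica)
        return True

    def on_success(self, replica: str) -> None:
        if self._accept(replica):
            self.acks += 1
            self._update()

    def on_failure(self, replica: str) -> None:
        if self._accept(replica):
            self.failures += 1
            self._update()

    def _update(self) -> None:
        if self.outcome is None and self.acks >= self.required_acks:
            self.outcome = "success"
        if self.replied == self.targets:
            if self.outcome is None:
                if self.cl_any:
                    # CL=ANY: a stored hint counts, so the write still succeeds.
                    self.hint_stored = True
                    self.outcome = "success"
                else:
                    self.outcome = "failure"
            # Every replica answered: retire now instead of waiting for timeout.
            self.retired = True
```

In this model a request that reached its consistency level and then received a failure from the last replica is retired immediately, which is the behaviour the second commit describes.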
Rafael Ávila de Espíndola
777ea893e6 Delete data_consume_rows_at_once.
As far as I can tell the old sstable reading code required reading the
data into a contiguous buffer. The function data_consume_rows_at_once
implemented the old behavior and incrementally code was moved away
from it.

Right now the only use is in two tests. The sstables used in those
tests are already used in other tests with data_consume_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181127024319.18732-2-espindola@scylladb.com>
2018-11-27 14:11:50 +02:00
Avi Kivity
1ff6b8fb96 Merge "Don't binary compare compressed sstables in test_write_many_partitions_* tests" from Piotr
"
Compression is not deterministic so instead of binary comparing the sstable files we just read data back
and make sure everything that was written down is still present.

Tests: unit(release)
"

* 'haaawk/binary-compare-of-compressed-sstables/v3' of github.com:scylladb/seastar-dev:
  sstables: Remove compressed parameter from get_write_test_path
  sstables: Remove unused sstable test files
  sstables: Ensure compare_sstables isn't used for compressed files
  sstables: Don't binary compare compressed sstables
  sstables: Remove debug printout from test_write_many_partitions
2018-11-27 14:01:20 +02:00
Duarte Nunes
098dd90bd2 Merge 'Reduce dependencies around consistency_level.hh' from Avi
"
consistency_level.hh is rather heavyweight in both its contents and what it
includes. Reduce the number of inclusion sites and split the file to reduce
dependencies.
"

* tag 'cl-header/v2' of https://github.com/avikivity/scylla:
  consistency_level: simplify validation API
  Split consistency_level.hh header
  database: remove unneeded consistency_level.hh include
  cql: remove unneeded includes of consistency_level.hh
2018-11-27 11:59:34 +00:00
Piotr Jastrzebski
4366302c4c sstables: Extract mp_row_consumer_m::check_schema_mismatch
This method will contain common logic used in multiple places
and reduce code duplication.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <bbda2f4ea4f9325055f096dc549f63b1bb03d3b6.1543311990.git.piotr@scylladb.com>
2018-11-27 12:45:12 +01:00
Avi Kivity
4676e07400 consistency_level: simplify validation API
Remove unused parameters, replace refcounted pointers by references.
2018-11-27 13:41:49 +02:00
Avi Kivity
2c08bff8d5 Split consistency_level.hh header
It has two unrelated users: cql for validation, and storage_proxy for
complicated calculations. Split the simple stuff into a new header to reduce
dependencies.
2018-11-27 13:32:10 +02:00
Avi Kivity
b015f41344 database: remove unneeded consistency_level.hh include 2018-11-27 13:30:56 +02:00
Gleb Natapov
7bc68aa0eb storage_proxy: move code executed on write timeout into separate function
Currently the callback is in a lambda, but we will want to call the code
not only during timer expiration.
2018-11-27 13:23:30 +02:00
Avi Kivity
9201d22c06 cql: remove unneeded includes of consistency_level.hh
Move the includes to .cc to reduce include pollution.
2018-11-27 13:18:33 +02:00
Raphael S. Carvalho
626afa6973 database: conditionally release sstable references from compaction manager
Not all compaction operations submitted through compaction manager set a callback
for releasing references of exhausted sstables in compaction manager itself.
That callback lives in compaction descriptor which is passed to table::compaction().
Let's make the call conditional to avoid bad function call exceptions.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181126235616.10452-1-raphaelsc@scylladb.com>
2018-11-27 12:10:43 +01:00
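The guard described above has a direct analogue in Python, where calling an unset callback would raise `TypeError` much as an empty `std::function` raises a bad function call exception in C++. The names `replace_exhausted` and `release_exhausted` are hypothetical and only illustrate the pattern.

```python
from typing import Callable, List, Optional

ReleaseCallback = Callable[[List[str]], None]


def replace_exhausted(exhausted: List[str],
                      release_exhausted: Optional[ReleaseCallback] = None) -> bool:
    """Invoke the release callback only if the submitter provided one."""
    if release_exhausted is None:
        # Not every compaction operation sets the callback; skip the call.
        return False
    release_exhausted(exhausted)
    return True
```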
Avi Kivity
2eaeb3e4eb Update swagger-ui submodule
Updates to version 2.2.10 with a local change (from Amnon) to support our location.

Fixes #3942.
2018-11-27 13:01:02 +02:00
Tomasz Grabiec
17a8a9d13d gdb: Properly parse unique_ptr in 'scylla lsa'
There's no _M_t._M_head_impl any more in the standard library.

We now have std_unique_ptr wrapper which abstracts this fact away so
use that.

Message-Id: <20181126174837.11542-1-tgrabiec@scylladb.com>
2018-11-27 12:32:41 +02:00
Tomasz Grabiec
eecda72175 gdb: Adjust 'scylla lsa' for removal of emergency reserve
There's no _emergency_reserve any more. Show _free_segments instead.

Message-Id: <20181126174837.11542-2-tgrabiec@scylladb.com>
2018-11-27 12:32:37 +02:00
Avi Kivity
5e759b0c07 Merge "Optimize checksum computation for the MC sstable format" from Tomek
"
One part of the improvement comes from replacing zlib's CRC32 with the one
from libdeflate, which is optimized for modern architecture and utilizes the
PCLMUL instruction.

perf_checksum test was introduced to measure performance of various
checksumming operations.

Results for 514 B (relevant for writing with compression enabled):

    test                                      iterations      median         mad         min         max
    crc_test.perf_deflate_crc32_combine            58414    16.711us     3.483ns    16.708us    16.725us
    crc_test.perf_adler_combine                165788278     6.059ns     0.031ns     6.027ns     7.519ns
    crc_test.perf_zlib_crc32_combine               59546    16.767us    26.191ns    16.741us    16.801us
    ---
    crc_test.perf_deflate_crc32_checksum        12705072    83.267ns     4.580ns    78.687ns    98.964ns
    crc_test.perf_adler_checksum                 3918014   206.701ns    23.469ns   183.231ns   258.859ns
    crc_test.perf_zlib_crc32_checksum            2329682   428.787ns     0.085ns   428.702ns   510.085ns

Results for 64 KB (relevant for writing with compression disabled):

    test                                      iterations      median         mad         min         max
    crc_test.perf_deflate_crc32_combine            25364    38.393us    17.683ns    38.375us    38.545us
    crc_test.perf_adler_combine                169797143     5.842ns     0.009ns     5.833ns     6.901ns
    crc_test.perf_zlib_crc32_combine               26067    38.663us    95.094ns    38.546us    40.523us
    ---
    crc_test.perf_deflate_crc32_checksum          202821     4.937us    14.426ns     4.912us     5.093us
    crc_test.perf_adler_checksum                   44684    22.733us   206.263ns    22.492us    25.258us
    crc_test.perf_zlib_crc32_checksum              18839    53.049us    36.117ns    53.013us    53.274us

The new CRC32 implementation (deflate_crc32) doesn't provide a fast
checksum_combine() yet, it delegates to zlib so it's as slow as the latter.

Because for CRC32 checksum_combine() is several orders of magnitude slower
than checksum(), we avoid calling checksum_combine() completely for this
checksummer. We still do it for adler32, which has combine() which is faster
than checksum().

SStable write performance was evaluated by running:

  perf_fast_forward --populate --data-directory /tmp/perf-mc \
     --rows=10000000 -c1 -m4G --datasets small-part

Below is a summary of the average frag/s for a memtable flush. Each result is
an average of about 20 flushes with stddev of about 4k.

Before:

 [1] MC,lz4: 330'903
 [2] LA,lz4: 450'157
 [3] MC,checksum: 419'716
 [4] LA,checksum: 459'559

After:

 [1'] MC,lz4: 446'917 ([1] + 35%)
 [2'] LA,lz4: 456'046 ([2] + 1.3%)
 [3'] MC,checksum: 462'894 ([3] + 10%)
 [4'] LA,checksum: 467'508 ([4] + 1.7%)

After this series, the performance of the MC format writer is similar to that
of the LA format before the series.

There seems to be a small but consistent improvement for LA too. I'm not sure
why.
"

* tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla:
  tests: perf: Introduce perf_checksum
  tests: Add test for libdeflate CRC32 implementation
  sstables: compress: Use libdeflate for crc32
  sstables: compress: Rename crc32_utils to zlib_crc32_checksummer
  licenses: Add libdeflate license
  Integrate libdeflate with the build system
  Add libdeflate submodule
  sstables: Avoid checksum_combine() for the crc32 checksummer
  sstables: compress: Avoid unnecessary checksum_combine()
  sstables: checksum_utils: Add missing include
2018-11-26 20:10:46 +02:00
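The percentages quoted in the merge message above follow from the before/after frag/s figures. A short Python check, where `improvement_percent` is a hypothetical helper name:

```python
def improvement_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) frag/s pairs copied from the merge message.
RESULTS = {
    "MC,lz4": (330_903, 446_917),
    "LA,lz4": (450_157, 456_046),
    "MC,checksum": (419_716, 462_894),
    "LA,checksum": (459_559, 467_508),
}

for name, (before, after) in RESULTS.items():
    print(f"{name}: {improvement_percent(before, after):.1f}%")
```

The same figures also show that MC,lz4 after the series (446'917) is within about 1% of LA,lz4 before it (450'157), which matches the closing remark of the message.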
Tomasz Grabiec
f1a35b654a tests: perf: Introduce perf_checksum 2018-11-26 18:59:43 +01:00
Tomasz Grabiec
5b6e3fb5ed tests: Add test for libdeflate CRC32 implementation 2018-11-26 18:59:42 +01:00
Tomasz Grabiec
bf0164cdaf sstables: compress: Use libdeflate for crc32
Improves memtable flush performance by 10% in a CPU-bound case.

Unlike the zlib implementation, libdeflate is optimized for modern
CPUs. It utilizes the PCLMUL instruction.
2018-11-26 18:59:42 +01:00
Tomasz Grabiec
0ac1905f4f sstables: compress: Rename crc32_utils to zlib_crc32_checksummer 2018-11-26 18:59:42 +01:00
Tomasz Grabiec
ba141a4852 licenses: Add libdeflate license 2018-11-26 18:59:41 +01:00
Tomasz Grabiec
048d569b45 Integrate libdeflate with the build system 2018-11-26 18:59:09 +01:00
Tomasz Grabiec
f704f7bc19 Add libdeflate submodule 2018-11-26 18:57:51 +01:00
Tomasz Grabiec
743cf43847 sstables: Avoid checksum_combine() for the crc32 checksummer
checksum_combine() is much slower than re-feeding the buffer to
checksum() for the zlib CRC32 checksummer.

Introduce Checksum::prefer_combine() to determine this and select
more optimal behavior for given checksummer.

Improves performance of memtable flush with compression enabled by 30%.
2018-11-26 18:57:33 +01:00
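A Python sketch of the two strategies, using the standard `zlib` module. Python's `zlib` does not expose the combine functions, so `adler32_combine` below is a hand-written version derived from the Adler-32 definition, and `full_checksum` with its `kind` parameter is a hypothetical stand-in for the checksummer selection; none of this is Scylla's code. It illustrates why combining is cheap for Adler-32 (a few modular additions) while for CRC32 the sketch simply re-feeds each buffer with the running value.

```python
import zlib
from typing import List, Sequence

ADLER_BASE = 65521  # largest prime below 2**16, as used by Adler-32


def adler32_combine(adler1: int, adler2: int, len2: int) -> int:
    """Checksum of A+B from adler32(A), adler32(B) and len(B), in O(1)."""
    a1, b1 = adler1 & 0xFFFF, adler1 >> 16
    a2, b2 = adler2 & 0xFFFF, adler2 >> 16
    a = (a1 + a2 - 1) % ADLER_BASE
    b = (b1 + b2 + len2 * (a1 - 1)) % ADLER_BASE
    return (b << 16) | a


def full_checksum(chunks: Sequence[bytes], chunk_checksums: List[int],
                  kind: str) -> int:
    """Checksum of the concatenated chunks, choosing the cheaper strategy."""
    if kind == "adler32":
        # Combining is cheap: reuse the per-chunk checksums.
        total = 1  # Adler-32 of the empty string
        for chunk, checksum in zip(chunks, chunk_checksums):
            total = adler32_combine(total, checksum, len(chunk))
        return total
    # CRC32: re-feed every buffer, carrying the running value forward.
    total = 0
    for chunk in chunks:
        total = zlib.crc32(chunk, total)
    return total
```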
Avi Kivity
b351a9fee7 db/repair_decision.hh: add missing #include
Message-Id: <20181126154948.2453-1-avi@scylladb.com>
2018-11-26 18:49:08 +01:00
Tomasz Grabiec
88cf1c61ba sstables: compress: Avoid unnecessary checksum_combine() 2018-11-26 14:31:38 +01:00
Tomasz Grabiec
8372cf7bcc sstables: checksum_utils: Add missing include 2018-11-26 14:31:38 +01:00
Avi Kivity
c6d700279b class_registry: introduce a non-static variant of class_registry
class_registry's staticness has the usual problem of
static classes (loss of dependency information) and prevents us
from librarifying Scylla since all objects that define a registration
must be linked in.

Take a first step against this staticness by defining a nonstatic
variant. The static class_registry is then redefined in terms of the
nonstatic class. After all uses have been converted, the static
variant can be retired.
Message-Id: <20181126130935.12837-1-avi@scylladb.com>
2018-11-26 13:30:21 +00:00
Paweł Dziepak
62ea153629 Merge "Check for schema mismatch after dropping dead cells" from Piotr
"
Previously we were checking for schema incompatibility between current schema and sstable
serialization header before reading any data. This isn't the best approach because
data in sstable may be already irrelevant due to column drop for example.

This patchset moves the check after actual data is read and verified that it has
a timestamp new enough to classify it as nonobsolete.

Fixes #3924
"

* 'haaawk/3924/v3' of github.com:scylladb/seastar-dev:
  sstables: Enable test_schema_change for MC format
  sstables3: Throw error on schema mismatch only for live cells
  sstables: Pass column_info to consume_*_column
  sstables: Add schema_mismatch to column_info
  sstables: Store column data type in column_info
  sstables: Remove code duplication in column_translation
2018-11-26 13:10:18 +00:00
Avi Kivity
9a46ee69d4 doc: fix BYPASS CACHE documentation
BYPASS CACHE was mistakenly documenting an earlier version of the patch.
Correct it to document the committed version.
Message-Id: <20181126125810.9344-1-avi@scylladb.com>
2018-11-26 13:04:52 +00:00
Piotr Jastrzebski
dec48dd1e2 sstables: Remove compressed parameter from get_write_test_path
This parameter is no longer used.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:46:23 +01:00
Piotr Jastrzebski
92ffccd636 sstables: Remove unused sstable test files
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:15 +01:00
Piotr Jastrzebski
a29c9189cb sstables: Ensure compare_sstables isn't used for compressed files
Binary comparing compressed sstables is wrong because compression
is not deterministic.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:15 +01:00
Piotr Jastrzebski
7e263208f0 sstables: Don't binary compare compressed sstables
This family of test_write_many_partitions_* tests writes
sstables down from memtable using different compressions.
Then it compares the resulting file with a blueprint file
and reads the data back to check everything is there.

Compression is not deterministic so this patch makes the
tests not compare resulting compressed sstable file with blueprint
file and instead only read data back.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:03 +01:00
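The reasoning can be illustrated with Python's `zlib`: two compressed representations of the same data may differ byte for byte (here because of different compression levels, in general also because of library versions), so a test should compare what is read back rather than the compressed bytes. The helper name `same_content` is hypothetical.

```python
import zlib


def same_content(compressed_a: bytes, compressed_b: bytes) -> bool:
    """Compare two compressed blobs by their decompressed contents."""
    return zlib.decompress(compressed_a) == zlib.decompress(compressed_b)


payload = b"partition-key:value;" * 1000
fast = zlib.compress(payload, 1)   # fastest level
best = zlib.compress(payload, 9)   # best compression level

# The compressed bytes may differ, but the data read back is identical.
print(fast == best, same_content(fast, best))
```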
Piotr Jastrzebski
5c86294a56 sstables: Enable test_schema_change for MC format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:25:23 +01:00
Piotr Jastrzebski
4bdb86c712 sstables3: Throw error on schema mismatch only for live cells
Previously we were throwing exception during the creation of
column_translation. This wasn't always correct because sometimes
column for which the mismatch appeared was already dropped and
data present in sstable should be ignored anyway.

Fixes #3924

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:25:10 +01:00
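A simplified Python sketch of deferring the mismatch check until a cell is known to be live. `ColumnInfo` loosely mirrors the `column_info` fields mentioned in this series (name, schema mismatch flag); the `dropped_at` field, the `consume_cell` function, and the exception type are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnInfo:
    name: str
    schema_mismatch: bool             # sstable type differs from current schema
    dropped_at: Optional[int] = None  # hypothetical: timestamp of the column drop


def consume_cell(col: ColumnInfo, timestamp: int, value: bytes) -> Optional[bytes]:
    """Return the value for live cells, None for cells made obsolete by a drop."""
    if col.dropped_at is not None and timestamp <= col.dropped_at:
        return None  # written before the drop: ignore, mismatch is irrelevant
    if col.schema_mismatch:
        # Only now is the mismatch an error: the cell is live.
        raise ValueError(f"schema mismatch for live cell in column {col.name!r}")
    return value
```

Raising at cell-consumption time, rather than when the translation is built, is what lets an sstable with stale data for a dropped column be read without error.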
Piotr Sarna
6ab8235369 main: fix deinitialization order for view update generator
View update generator should be stopped only after
drain_on_shutdown() is performed on storage service.
Message-Id: <4d2bda4c73422a2ebf46d6dcd06c95d960839889.1543230849.git.sarna@scylladb.com>
2018-11-26 11:21:37 +00:00
Duarte Nunes
2a371c2689 Merge 'Allow bypassing cache on a per-query basis' from Avi
"
Some queries are very unlikely to hit cache. Usually this includes
range queries on large tables, but other patterns are possible.

While the database should adapt to the query pattern, sometimes the
user has information the database does not have. By passing this
information along, the user helps the database manage its resources
more optimally.

To do this, this patch introduces a BYPASS CACHE clause to the
SELECT statement. A query thus marked will not attempt to read
from the cache, and instead will read from sstables and memtables
only. This reduces CPU time spent to query and populate the cache,
and will prevent the cache from being flooded with data that is
not likely to be read again soon. The existing cache disabled path
is engaged when the option is selected.

Tests: unit (release), manual metrics verification with ccm with and without the
    BYPASS CACHE clause.

Ref #3770.
"

* tag 'cache-bypass/v2' of https://github.com/avikivity/scylla:
  doc: document SELECT ... BYPASS CACHE
  tests: add test for SELECT ... BYPASS CACHE
  cql: add SELECT ... BYPASS CACHE clause
  db: add query option to bypass cache
2018-11-26 09:59:40 +00:00
Paweł Dziepak
13385778fd Merge "Measure performance of dataset population in perf_fast_forward" from Tomasz
* tag 'perf-ffwd-dataset-population-v2' of github.com:tgrabiec/scylla:
  tests: perf_fast_forward: Measure performance of dataset population
  tests: perf_fast_forward: Record the dataset on which test case was run
  tests: perf_fast_forward: Introduce the concept of a dataset
  tests: perf_fast_forward: Introduce make_compaction_disabling_guard()
  tests: perf_fast_forward: Initialize output manager before population
  tests: perf_fast_forward: Handle empty test parameter set
  tests: perf_fast_forward: Extract json_output_writer::write_common_test_group()
  tests: perf_fast_forward: Factor out access to cfg to a single place per function
  tests: perf_fast_forward: Extract result_collector
  tests: perf_fast_forward: Take writes into account in AIO statistics
  tests: perf_fast_forward: Reorder members
  tests: perf_fast_forward: Add --sstable-format command line option
2018-11-26 09:45:55 +00:00
Avi Kivity
58033ad3a4 doc: document SELECT ... BYPASS CACHE
Add a new cql-extensions.md file and document BYPASS CACHE there.
2018-11-26 11:37:52 +02:00
Avi Kivity
f69401c609 tests: add test for SELECT ... BYPASS CACHE
The test verifies that cache read metrics are not incremented during a cache
bypass read.
2018-11-26 11:37:52 +02:00
Avi Kivity
ecf3f92ec7 cql: add SELECT ... BYPASS CACHE clause
The BYPASS CACHE clause instructs the database not to read from or populate the
cache for this query. The new keywords (BYPASS and CACHE) are not reserved.
2018-11-26 11:37:49 +02:00
Takuya ASADA
7740cd2142 dist/common/systemd/scylla-housekeeping-restart.service.mustache: specify correct repo for Debian variants
We do specify correct repo for both Red Hat/Debian variants on -daily, but
mistakenly don't for -restart, so do same on -restart.

Fixes #3906

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181109224509.27380-1-syuu@scylladb.com>
2018-11-26 11:02:25 +02:00
Rafael Ávila de Espíndola
6746907999 Use fully covered switches in continuous_data_consumer
do_process_buffer had two unreachable default cases and a long
if-else-if chain.

This converts the if-else-if chain to a switch and a helper
function.

This moves the error checking from run time to compile time. If we
were to add a 128 bit integer for example, gcc would complain about it
missing from the switch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181125221451.106067-1-espindola@scylladb.com>
2018-11-25 22:52:11 +00:00
Avi Kivity
b4765af790 Merge "Introduce SSTable-run-based compaction" from Raphael
"
This new compaction approach consists of releasing exhausted fragments[1] of a run[2] as
compaction proceeds, considerably decreasing the space requirement.
These changes will immediately benefit leveled strategy because it already works with
the run concept.

[1] fragment is a sstable composing a run; exhausted means sstable was fully consumed
by compaction procedure.
[2] run is a set of non-overlapping sstables which roughly span the
entire token range.

Note:
Last patch includes an example compaction strategy showing how to work with the interface.

unit tests: all modes passing
dtests: compaction ones passing
"

* 'sstable_run_based_compaction_v10' of github.com:raphaelsc/scylla:
  tests: add example compaction strategy for sstable run based approach
  sstables/compaction: propagate sstable replacement to all compaction of a CF
  sstables: store cf pointer in compaction_info
  tests/sstable_test: add test for compaction replacement of exhausted sstable
  sstables: add sstable's on closed handling
  tests/sstables: add test for sstable run based compaction
  sstables/compaction_manager: prevent partial run from being selected for compaction
  compaction: use same run identifier for sstables generated by same compaction
  sstables: introduce sstable run
  sstables/compaction_manager: release reference to exhausted sstable through callback
  sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor
  database: do not keep reference to sstable in selector when done selecting
  compaction: share sstable set with incremental reader selector
  sstables/compaction: release space earlier of exhausted input sstables
  sstables: make partitioned sstable set's incremental selector resilient to changes in the set
  database: do not store reference to sstable in incremental selector
  tests/sstables: add run identifier correctness test
  sstables: use a random uuid for sstables without run identifier
  sstables: add run identifier to scylla metadata
2018-11-25 17:20:24 +02:00
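The space saving can be estimated with a toy Python model. It assumes each input fragment is consumed in order and produces output of the same size, which is a simplification; `peak_space` is a hypothetical name and the model ignores everything else a real compaction does.

```python
from typing import Sequence


def peak_space(fragment_sizes: Sequence[int], release_early: bool) -> int:
    """Peak disk usage (inputs still held + output written) during compaction."""
    live_inputs = sum(fragment_sizes)
    written = 0
    peak = live_inputs
    for size in fragment_sizes:
        written += size  # output produced from this fragment (assumed same size)
        peak = max(peak, live_inputs + written)
        if release_early:
            live_inputs -= size  # exhausted fragment is released right away
    return peak
```

For four fragments of 10 units each, the model gives a peak of 80 units when inputs are held until the end and 50 units when exhausted fragments are released as compaction proceeds.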
Avi Kivity
b835b93ee6 db: add query option to bypass cache
With the option enabled, we bypass the cache unconditionally and only
read from memtables+sstables. This is useful for analytics queries.
2018-11-25 16:26:08 +02:00
Piotr Jastrzebski
c2561a2796 sstables: Remove debug printout from test_write_many_partitions
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-25 13:29:10 +01:00
Raphael S. Carvalho
3fa70d6b5f tests: add example compaction strategy for sstable run based approach
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 20:16:54 -02:00
Raphael S. Carvalho
2058001f94 sstables/compaction: propagate sstable replacement to all compaction of a CF
This is needed for parallel compaction to work with sstable run based approach.
That's because regular compaction clones a set containing all sstables of its
column family. So compaction A can potentially hold a reference to a compacting
sstable of compaction B, preventing compaction B from releasing its exhausted
sstable.

So all replacements are propagated to all compactions of a given column family,
and compactions in turn, including the one which initiated the propagation,
will do the replacement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:30 -02:00
Raphael S. Carvalho
953fdcc867 sstables: store cf pointer in compaction_info
motivation is that we need a more efficient way to find compactions
that belong to a given column family in compaction list.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:28 -02:00
Raphael S. Carvalho
baf89f0df3 tests/sstable_test: add test for compaction replacement of exhausted sstable
Make sure that compaction is capable of releasing exhausted sstable space
early in the procedure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:26 -02:00
Raphael S. Carvalho
824c20b76d sstables: add sstable's on closed handling
Motivation is that it will be useful for catching regression on compaction
when releasing early exhausted sstables. That's because sstable's space
is only released once it's closed. So this will allow us to write a test
case and possibly use it for entities holding exhausted sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:25 -02:00
Raphael S. Carvalho
0085e8371d tests/sstables: add test for sstable run based compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:23 -02:00
Raphael S. Carvalho
e88d1d54b9 sstables/compaction_manager: prevent partial run from being selected for compaction
Filter out sstable belonging to a partial run being generated by an ongoing
compaction. Otherwise, that could lead to wrong decisions by the compaction
strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:22 -02:00
Raphael S. Carvalho
23884fe9f6 compaction: use same run identifier for sstables generated by same compaction
SSTables composing the same run will share the same run identifier.
Therefore, a new compaction strategy will be able to get all sstables belonging
to the same run from sstable_set, which now keeps track of existing runs.

Same UUID is passed to writers of a given compaction. Otherwise, a new UUID
is picked for every sstable created by compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:20 -02:00
Raphael S. Carvalho
4f68cb34a6 sstables: introduce sstable run
sstable run is a structure that will hold all sstables that have the same
run identifier. All sstables belonging to the same run will not overlap
with one another.
It can be used by compaction strategy to work on runs instead of individual
sstables.

sstable_set structure which holds all sstables for a given column family
will be responsible for providing to its user an interface to work with
runs instead of individual sstables.
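
The idea can be sketched as follows. This is a minimal illustration, not
Scylla's implementation: toy_sstable, toy_sstable_set, select_run() and
is_disjoint() are hypothetical names, and a plain string stands in for the
UUID.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an sstable: a run identifier plus the
// inclusive token range [first, last] that it covers.
struct toy_sstable {
    std::string run_id;   // a UUID in the real system; a string here
    int64_t first;
    int64_t last;
};

// Hypothetical set that indexes sstables by run identifier.
class toy_sstable_set {
    std::map<std::string, std::vector<toy_sstable>> _runs;
public:
    void insert(toy_sstable sst) {
        _runs[sst.run_id].push_back(std::move(sst));
    }
    // All sstables sharing the given run identifier (empty if unknown).
    std::vector<toy_sstable> select_run(const std::string& run_id) const {
        auto it = _runs.find(run_id);
        return it == _runs.end() ? std::vector<toy_sstable>{} : it->second;
    }
    std::size_t run_count() const { return _runs.size(); }
};

// True if no two sstables of the run overlap, which is the invariant
// stated above for sstables belonging to the same run.
inline bool is_disjoint(std::vector<toy_sstable> run) {
    std::sort(run.begin(), run.end(),
              [](const toy_sstable& a, const toy_sstable& b) { return a.first < b.first; });
    for (std::size_t i = 1; i < run.size(); ++i) {
        if (run[i].first <= run[i - 1].last) {
            return false;
        }
    }
    return true;
}
```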

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:18 -02:00
Raphael S. Carvalho
fc92fb955d sstables/compaction_manager: release reference to exhausted sstable through callback
That's important for the reference to sstable to not be kept throughout
the compaction procedure, which would break the goal of releasing
space during compaction.

Manager passes a callback to compaction which calls it whenever
there's sstable replacement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:16 -02:00
Raphael S. Carvalho
3f309ebba9 sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor
Motivation is that we want to release space for exhausted sstable and that
will only happen when all references to it are gone *and* that backlog
tracker takes the early replacement into account.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:13 -02:00
Raphael S. Carvalho
3433de3dc0 database: do not keep reference to sstable in selector when done selecting
When compacting, we'll create all readers at once and will not select
again from incremental selector, meaning the selector will keep all
respective sstables in current_sstables, preventing compaction from
releasing space as it goes on.

The change is about refreshing sstable set's selector such that it
will not hold a reference to an exhausted sstable whatsoever.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:12 -02:00
Raphael S. Carvalho
f6df949c1a compaction: share sstable set with incremental reader selector
By doing that, we'll be able to release exhausted sstable from both
simultaneously.
That's achieved by sharing set containing input sstables with the incremental
reader selector and removing exhausted sstables from shared set when the
time has come.

Step towards reducing disk requirement for compaction by making it delete
sstable whose data is all in a sealed new sstable. For that to happen,
all references must be gone.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:10 -02:00
Raphael S. Carvalho
e5a0b05c15 sstables/compaction: release space earlier of exhausted input sstables
Currently, compaction only replaces input sstables at the end of compaction,
meaning compaction must be finished for all the space of those sstables
to be released.

What we can do instead is to delete some input sstables earlier, under
some conditions:

1) SStable data should be committed to a new, sealed output sstable,
meaning it's exhausted.
2) Exhausted sstable mustn't overlap with a non-exhausted sstable
because a tombstone in the exhausted sstable could have been purged and the
shadowed data in the non-exhausted one could be resurrected if the system
crashes.
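
The two conditions can be sketched as a small predicate. This is only an
illustration under simplifying assumptions: input_sstable and releasable()
are hypothetical names, token ranges are plain integers, and the `exhausted`
flag stands for "all data is already in a sealed output sstable".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of a compaction input: an inclusive token range plus
// a flag meaning "all of its data is in a sealed output sstable".
struct input_sstable {
    int64_t first;
    int64_t last;
    bool exhausted;
};

inline bool overlaps(const input_sstable& a, const input_sstable& b) {
    return a.first <= b.last && b.first <= a.last;
}

// Returns the indexes of inputs that may be released early:
// 1) the input is exhausted, and
// 2) it does not overlap with any input that is not yet exhausted.
inline std::vector<std::size_t> releasable(const std::vector<input_sstable>& inputs) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (!inputs[i].exhausted) {
            continue;
        }
        bool blocked = false;
        for (std::size_t j = 0; j < inputs.size(); ++j) {
            if (!inputs[j].exhausted && overlaps(inputs[i], inputs[j])) {
                blocked = true;
                break;
            }
        }
        if (!blocked) {
            out.push_back(i);
        }
    }
    return out;
}
```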

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:07 -02:00
Raphael S. Carvalho
ace070c8fc sstables: make partitioned sstable set's incremental selector resilient to changes in the set
The motivation is that compaction may remove a sstable from the set while the
incremental selector is alive, and for that to work, we need to invalidate
the iterators stored by the selector. We could have added a method to notify
it, but there will be a case where the one keeping the set cannot forward
the notification to the selector. So it's better for the selector to take
care of itself. Change counter approach is used which allows the selector
to know when to invalidate the iterators.

After invalidation, selector will move the iterator back into its right
place by looking for lower bound for current pos.
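
A minimal sketch of the change counter approach, using hypothetical toy_set
and toy_selector types (not the real partitioned sstable set). For simplicity
the sketch remembers the last returned position and re-seeks with
upper_bound(), whereas the text above describes a lower bound lookup for the
current position.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical set keyed by position; every modification bumps a counter.
class toy_set {
    std::map<int64_t, int> _entries;   // position -> sstable generation
    uint64_t _changes = 0;
public:
    void insert(int64_t pos, int generation) {
        _entries.emplace(pos, generation);
        ++_changes;
    }
    void erase(int64_t pos) {
        if (_entries.erase(pos)) {
            ++_changes;
        }
    }
    uint64_t changes() const { return _changes; }
    const std::map<int64_t, int>& entries() const { return _entries; }
};

// Hypothetical selector: it caches an iterator, and re-seeks it whenever
// the set's change counter differs from the value it last observed.
class toy_selector {
    const toy_set& _set;
    std::map<int64_t, int>::const_iterator _it;
    uint64_t _seen_changes;
    std::optional<int64_t> _last_pos;  // position of the last returned entry
public:
    explicit toy_selector(const toy_set& s)
        : _set(s), _it(s.entries().begin()), _seen_changes(s.changes()) {}

    std::optional<int> next() {
        if (_seen_changes != _set.changes()) {
            // The cached iterator may be invalid; move it back into place.
            _it = _last_pos ? _set.entries().upper_bound(*_last_pos)
                            : _set.entries().begin();
            _seen_changes = _set.changes();
        }
        if (_it == _set.entries().end()) {
            return std::nullopt;
        }
        int generation = _it->second;
        _last_pos = _it->first;
        ++_it;
        return generation;
    }
};
```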

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:05 -02:00
Raphael S. Carvalho
8d11b0bbb4 database: do not store reference to sstable in incremental selector
Use sstable generation instead to keep track of read sstables.
The motivation is that we'll not keep reference to sstables, so allowing
their space on disk to be released as soon as they get exhausted.
Generation is used because it guarantees uniqueness of the sstable.
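
The effect can be illustrated with std::shared_ptr. This is a sketch with
hypothetical names (toy_sstable, read_tracker); the real code uses its own
shared pointer type.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_set>

struct toy_sstable {
    int64_t generation;
};

// Remembers which sstables were already read by generation number only,
// so it never extends the lifetime of the sstable objects themselves.
class read_tracker {
    std::unordered_set<int64_t> _read_generations;
public:
    // Returns true the first time a given sstable is seen.
    bool mark_read(const std::shared_ptr<toy_sstable>& sst) {
        return _read_generations.insert(sst->generation).second;
    }
};
```

Because the tracker holds no pointer, dropping the last external reference
destroys the sstable object even while the tracker is still alive.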

Reviewed-by: Botond Dénes <bdenes@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:04 -02:00
Raphael S. Carvalho
edc87014c1 tests/sstables: add run identifier correctness test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:02 -02:00
Raphael S. Carvalho
a66b1954cc sstables: use a random uuid for sstables without run identifier
Older sstables must have an identifier for them to be associated
with their own run.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:01 -02:00
Raphael S. Carvalho
62025fa52c sstables: add run identifier to scylla metadata
It identifies a run which a particular sstable belongs to.
Existing sstables will have a random uuid associated with it
in memory.

UUID is the correct choice because it allows sstables to be
exported without having conflicts when using identifier generated
by different nodes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:52:44 -02:00
Rafael Ávila de Espíndola
d18bbe9d45 Remove unreachable default cases.
These switches are fully covered. We can be sure they will stay this
way because of -Werror and gcc's -Wswitch warning.

We can also be sure that we never have an invalid enum value since the
state machine values are not read from disk.
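
For illustration, a fully covered switch without a default case looks like
this. The enum and function are hypothetical, not taken from the patch.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical state machine states.
enum class state { header, body, done };

// No default label: with -Wswitch GCC warns when an enumerator is not
// handled, and -Werror turns that warning into a build failure.
inline const char* state_name(state s) {
    switch (s) {
    case state::header: return "header";
    case state::body:   return "body";
    case state::done:   return "done";
    }
    // Only reachable with an invalid enum value, which cannot happen when
    // the values never come from outside the program.
    std::abort();
}
```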

The patch also removes a superfluous ';'.
Message-Id: <20181124020128.111083-1-espindola@scylladb.com>
2018-11-24 09:31:51 +00:00
Piotr Jastrzebski
569508158c sstables: Pass column_info to consume_*_column
This will allow checking for schema mismatches
and better error messages.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-23 21:48:14 +01:00
Piotr Jastrzebski
9ca6877cbd sstables: Add schema_mismatch to column_info
This field is true when there's a mismatch
between column type in serialization header and
current schema.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-23 21:48:14 +01:00
Piotr Jastrzebski
51fa8e0c94 sstables: Store column data type in column_info
This will be used to check schema mismatch and
provide informative error message.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-23 21:48:14 +01:00
Piotr Jastrzebski
99dfb9cc96 sstables: Remove code duplication in column_translation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-23 21:48:14 +01:00
Raphael S. Carvalho
d29482dce8 sstables: deprecate sstable metadata's ancestors
The reason for that is that it's not available in sstable format mc,
so we can no longer rely on it in common code for the currently
supported formats.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>
2018-11-23 19:38:32 +01:00
Tomasz Grabiec
8e93046abc tests: perf_fast_forward: Measure performance of dataset population 2018-11-23 19:22:50 +01:00
Tomasz Grabiec
2c95aa4d8d tests: perf_fast_forward: Record the dataset on which test case was run
Now any given test case can potentially run on many different datasets.
2018-11-23 19:22:12 +01:00
Tomasz Grabiec
470552b7ab tests: perf_fast_forward: Introduce the concept of a dataset
A dataset represents a table with data, populated in certain way, with
certain characteristics of the schema and data.

Before this change, datasets were implicitly defined, with population
hard-coded inside the populate() function.

This change gathers logic related to datasets into classes, in order to:

  - make it easier to define new datasets.

  - be able to measure performance of dataset population in a
    standardized way.

  - being able to express constraints on datasets imposed by different
    test cases.  Test cases are matched with possible datasets based
    on the abstract interface they accept (e.g. clustered_ds,
    multipartition_ds), and which must be implemented by a compatible
    dataset. To facilitate this matching, test function is now wrapped
    into a dataset_acceptor object, with an automatically-generated can_run()
    virtual method, deduced by make_test_fn().

  - be able to select tests to run based on the dataset name.
    Only tests which are compatible with that dataset will be run.
2018-11-23 19:22:09 +01:00
Tomasz Grabiec
2746f78a9f tests: perf_fast_forward: Introduce make_compaction_disabling_guard() 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
b00d360281 tests: perf_fast_forward: Initialize output manager before population 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
25dc481030 tests: perf_fast_forward: Handle empty test parameter set 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
38a1b7e87b tests: perf_fast_forward: Extract json_output_writer::write_common_test_group() 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
a507ca8159 tests: perf_fast_forward: Factor out access to cfg to a single place per function
Preparatory change before making n_rows be determined through a
dataset object.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
3fc78a25bf tests: perf_fast_forward: Extract result_collector
Extracts the result collection and reporting logic out of
run_test_case(). Will be needed in population tests, for which we
don't want the looping logic.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
f4a70283ee tests: perf_fast_forward: Take writes into account in AIO statistics
Relevant for population tests. So far all tests were read tests.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
96f5bd2f46 tests: perf_fast_forward: Reorder members 2018-11-23 19:18:09 +01:00
Tomasz Grabiec
3ac5e8887e tests: perf_fast_forward: Add --sstable-format command line option 2018-11-23 19:18:09 +01:00
Tomasz Grabiec
564b328b2e Merge 'Add tests for schema changes' from Paweł
This series adds a generic test for schema changes that generates
various schema and data before and after an ALTER TABLE operation. It is
then used to check correctness of mutation::upgrade() and sstable
readers and lead to the discovery of #3924 and #3925.

Fixes #3925.

* https://github.com/pdziepak/scylla.git schema-change-test/v3.1
  schema_builder: make member function names less confusing
  converting_mutation_partition_applier: fix collection type changes
  converting_mutation_partition_applier: do not emit empty collections
  sstable: use format() instead of sprint()
  tests/random-utils: make functions and variables inline
  tests: add models for schemas and data
  tests: generate schema changes
  tests/mutation: add test for schema changes
  tests/sstable: add test for schema changes
2018-11-23 15:11:31 +01:00
Paweł Dziepak
09439cd809 tests/sstable: add test for schema changes
for_each_schema_change() is used for testing reading an sstable that was
written with a different schema. Because of #3924, for now the mc format
is not verified this way.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
dc7f9fea5b tests/mutation: add test for schema changes 2018-11-23 12:14:06 +00:00
Paweł Dziepak
35f9f424e9 tests: generate schema changes
This patch adds for_each_schema_change() functions which generates
schemas and data before and after some modification to the schema (e.g.
adding a column, changing its type). It can be used to test schema
upgrades.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
daee4bd3b8 tests: add models for schemas and data
This patch introduces a model of Scylla schemas and data, implemented
using simple standard library primitives. It can be used for testing the
actual schemas, mutation_partitions, etc. used by the schema by
comparing the results of various actions.

The initial use case for this model was to test schema changes, but
there is no reason why in the future it cannot be extended to test other
things as well.
2018-11-23 12:14:06 +00:00
Takuya ASADA
cf0d00b81a dist/ami: fix 'unknown configuration key: "enhanced_networking"' error while building AMI
packer 1.3.2 no longer supports the enhanced_networking directive, so we need
to use the new directives ("sriov_support" and "ena_support") to build with the
new version.
packer provides a tool that fixes configuration files automatically, so the new
scylla.json was generated with the following command:
 ./packer/packer fix scylla.json

Fixes #3938

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181123053719.32451-1-syuu@scylladb.com>
2018-11-23 08:15:47 +02:00
Paweł Dziepak
91793c0a43 bytes_ostream: drop appending_hash specialisation
appending_hash is used for computing hashes that become part of the
binary interface. They cannot change between Scylla versions and the same
data needs to always result in the same hash.

At the moment, appending_hash<bytes_ostream> doesn't fulfil those
requirements since it leaks information how the underlying buffer is
fragmented. Fortunately, it has no users so it doesn't cause any
compatibility issues.

Moreover, bytes_ostream is usually used as an output of some
serialisation routine (e.g. frozen_mutation_fragment or CQL response).
Those serialisation formats do not guarantee that there is a single
representation of a given data and therefore are not fit to be hashed by
appending_hash. Removing appending_hash<bytes_ostream> may help
preventing such incorrect uses.
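
To illustrate why leaking fragmentation is a problem, here is a sketch with a
toy FNV-1a hash. How the removed specialisation actually fed the hasher is
not shown here; hash_leaking_layout() is just one hypothetical way in which
the buffer layout could leak into the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a, used here only as a small deterministic hash.
struct fnv1a {
    uint64_t state = 14695981039346656037ull;
    void update(const char* data, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) {
            state ^= static_cast<unsigned char>(data[i]);
            state *= 1099511628211ull;
        }
    }
};

// The same logical byte string, split into an arbitrary number of pieces.
using fragments = std::vector<std::string>;

// Layout-dependent: feeds each fragment's length before its bytes, so the
// same logical data hashes differently when it is split differently.
inline uint64_t hash_leaking_layout(const fragments& f) {
    fnv1a h;
    for (const auto& frag : f) {
        uint64_t n = frag.size();
        h.update(reinterpret_cast<const char*>(&n), sizeof(n));
        h.update(frag.data(), frag.size());
    }
    return h.state;
}

// Layout-independent: feeds only the logical byte sequence.
inline uint64_t hash_logical_bytes(const fragments& f) {
    fnv1a h;
    for (const auto& frag : f) {
        h.update(frag.data(), frag.size());
    }
    return h.state;
}
```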
Message-Id: <20181122163823.12759-1-pdziepak@scylladb.com>
2018-11-22 23:53:54 +00:00
Tomasz Grabiec
fb38f0e9f8 Update seastar submodule
* seastar b924495...1fbb633 (3):
  > rpc: Reduce code duplication
  > tests: perf: Make do_not_optimize() take the argument by const&
  > doc: Fix import paths in the tutorial
2018-11-22 23:53:54 +00:00
Paweł Dziepak
2a0e929830 tests/random-utils: make functions and variables inline
random-utils.hh is a header which may be included in multiple
translation units so all members should be non-static inline to avoid
any duplication.
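
A sketch of the pattern, assuming C++17 inline variables. The namespace and
names below are hypothetical and do not reproduce random-utils.hh.

```cpp
#include <cassert>
#include <random>

// Contents of a hypothetical header included from several translation units.
namespace toy_random {

// Inline variable: the whole program shares one engine. A `static` variable
// here would silently create one separate copy per translation unit.
inline std::default_random_engine engine{12345};

// Inline function: may be defined in every translation unit that includes
// the header without violating the one-definition rule.
inline int get_int(int lo, int hi) {
    std::uniform_int_distribution<int> dist(lo, hi);
    return dist(engine);
}

} // namespace toy_random
```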
2018-11-22 11:30:31 +00:00
Paweł Dziepak
edb5402a73 sstable: use format() instead of sprint()
The format message was using the new style formatting markers ("{}")
which are understood by format() but not by sprint() (the latter is
basically deprecated).
2018-11-22 11:30:31 +00:00
Paweł Dziepak
1fbe33791d converting_mutation_partition_applier: do not emit empty collections
This patch changes the behaviour of the schema upgrade code so that if
all cells and the tombstones of a collection are removed during the upgrade
the collection is not emitted (as opposed to emitting an empty one).
Both behaviours are valid, but the new one makes it more consistent with
how atomic cells are upgraded and how schema upgrades work for sstable
readers.
2018-11-22 11:30:31 +00:00
Paweł Dziepak
7b12aaa093 converting_mutation_partition_applier: fix collection type changes
ALTER TABLE allows changing the type of a collection to a compatible
one. This includes changes from a fixed-sized type to a variable-sized
one. If that happens the atomic_cells representing collection elements
need to be rewritten so that the value size is included. The logic for
rewriting atomic cells already exists (for those that are not
collection members) and is reused in this patch.

Fixes #3925.
2018-11-22 11:30:31 +00:00
Paweł Dziepak
43e0201ec6 schema_builder: make member function names less confusing
Right now, schema_builder member functions have names that very poorly
convey the actions that are performed for them. This is made even worse
by some overloads which drastically change the semantics. For example:

    schema_builder()
        .with_column("v1", /* ... */)
        .without_column("v1", removal_timestamp);

Creates a column "v1" and adds the information that there was a column
with that name that was removed at 'removal_timestamp'.

    schema_builder()
        .with_column("v1")
        .without_column(utf8_type->decompose("v1"));

This adds column "v1" and then immediately removes it.

In order to clean up this mess the names were changed so that:
 * with_/without_ functions only add information to the schema (e.g.
   info that a column was removed, but without removing a column of that
   name if one exists)
 * functions whose names start with a verb actually perform that action,
   e.g. the new remove_column() removes the column (and adds information
   that it used to exist) as in the second example.
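
The naming convention can be sketched with a toy builder. The class and its
signatures are hypothetical (in particular the timestamp parameter of
remove_column() is an assumption) and only mirror the rules listed above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical builder that follows the naming rules described above.
class toy_schema_builder {
    std::set<std::string> _columns;
    std::map<std::string, int64_t> _dropped;   // name -> removal timestamp
public:
    // with_: only adds information (a live column).
    toy_schema_builder& with_column(std::string name) {
        _columns.insert(std::move(name));
        return *this;
    }
    // without_: only records that a column of this name was dropped;
    // a live column with the same name is left untouched.
    toy_schema_builder& without_column(std::string name, int64_t removal_timestamp) {
        _dropped[std::move(name)] = removal_timestamp;
        return *this;
    }
    // Verb: actually removes the column and records that it used to exist.
    toy_schema_builder& remove_column(const std::string& name, int64_t removal_timestamp) {
        _columns.erase(name);
        _dropped[name] = removal_timestamp;
        return *this;
    }
    bool has_column(const std::string& name) const { return _columns.count(name) != 0; }
    bool was_dropped(const std::string& name) const { return _dropped.count(name) != 0; }
};
```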
2018-11-22 11:30:31 +00:00
Benny Halevy
dcd18e2b62 remove exec permission from top_k source files
This was introduced by 32525f2694

Cc: Rafi Einstein <rafie@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181121163352.13325-1-bhalevy@scylladb.com>
2018-11-21 18:38:50 +02:00
Gleb Natapov
b4a8802edc hints: make hints manager more resilient to unexpected directory content
Currently if hints directory contains unexpected directories Scylla fails to
start with unhandled std::invalid_argument exception. Make the manager
ignore malformed files instead and try to proceed anyway.
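
A sketch of the tolerant approach, assuming that a valid directory name is
the decimal shard number. parse_shard_dir() is a hypothetical helper, not the
function added by the patch.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Translates a directory name into a shard number. Instead of throwing on
// unexpected names, it returns std::nullopt so the caller can skip them.
inline std::optional<unsigned> parse_shard_dir(std::string_view name) {
    unsigned shard = 0;
    const char* end = name.data() + name.size();
    auto [ptr, ec] = std::from_chars(name.data(), end, shard);
    if (ec != std::errc() || ptr != end) {
        return std::nullopt;   // empty, non-numeric or trailing garbage
    }
    return shard;
}
```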
Message-Id: <20181121134618.29936-2-gleb@scylladb.com>
2018-11-21 14:53:03 +00:00
Gleb Natapov
9433d02624 hints: add auxiliary function for scanning high level hints directory
We scan hints directory in two places: to search for files to replay and
to search for directories to remove after resharding. The code that
translates directory name to a shard is duplicated. It is simple now, so
not a big issue, but in case it grows it is better to have it in one place.
Message-Id: <20181121134618.29936-1-gleb@scylladb.com>
2018-11-21 14:53:03 +00:00
Paweł Dziepak
4aa5d83590 Merge "Optimize sstable writing of the MC format" from Tomasz
"
Tested with perf_fast_forward from:

  github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1

Using the following command line:

  build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \
     --data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \
     --datasets small-part

The average reported flush throughput was (stdev for the averages is around 4k):
  - for mc before the series: 367848 frag/s
  - for lc before the series: 463458 frag/s (= mc.before +25%)
  - for mc after the series: 429276 frag/s (= mc.before +16%)
  - for lc after the series: 466495 frag/s (= mc.before +26%)

Refs #3874.
"

* tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla:
  sstables: mc: Avoid serialization of promoted index when empty
  sstables: mc: Avoid double serialization of rows
  tests: sstable 3.x: Do not compare Statistics component
  utils: Introduce memory_data_sink
  schema: Optimize column count getters
  sstables: checksummed_file_data_sink_impl: Bypass output_stream
2018-11-21 13:11:40 +00:00
Tomasz Grabiec
049926bfb8 sstables: mc: Avoid serialization of promoted index when empty
calculate_write_size() adds some overhead, even if we're not going to
write anything.
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
0a9f5b563a sstables: mc: Avoid double serialization of rows
The old code was serializing the row twice. Once to get the size of
its block on disk, which is needed to write the block length, and then
to actually write the block.

This patch avoids this by serializing once into a temporary buffer and
then appending that buffer to the data file writer.

I measured about 10% improvement in memtable flush throughput with
this for the small-part dataset in perf_fast_forward.
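
The technique can be sketched as follows. The row layout is a made-up toy
format (a one-byte key length and a fixed 4-byte little-endian block length),
not the mc format, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct toy_row {
    std::string key;
    std::string value;
};

// Serializes the row body. This is the expensive step we want to run once.
inline std::string serialize_body(const toy_row& r) {
    assert(r.key.size() < 256);   // toy format: one length byte for the key
    std::string body;
    body.push_back(static_cast<char>(r.key.size()));
    body += r.key;
    body += r.value;
    return body;
}

// Serializes once into a temporary buffer, writes the block length taken
// from that buffer, then appends the buffer itself to the data file.
inline void append_row_block(std::string& data_file, const toy_row& r) {
    const std::string body = serialize_body(r);
    const auto len = static_cast<uint32_t>(body.size());
    for (int i = 0; i < 4; ++i) {
        data_file.push_back(static_cast<char>((len >> (8 * i)) & 0xff));
    }
    data_file += body;
}
```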
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
8f686af9af tests: sstable 3.x: Do not compare Statistics component
The Statistics component recorded in the test was generated using a
buggy version of Scylla, and is not correct. Exposed by fixing the bug
in the way statistics are generated.

Rather than comparing binary content, we should have explicit checks
for statistics.
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
143fd6e1c2 utils: Introduce memory_data_sink 2018-11-21 14:04:27 +01:00
Tomasz Grabiec
789fac9884 schema: Optimize column count getters 2018-11-21 14:04:27 +01:00
Tomasz Grabiec
8e8b96c6ed sstables: checksummed_file_data_sink_impl: Bypass output_stream
We can avoid the data copying by switching from this:

  sink -> stream -> sink

to this:

  sink -> sink
2018-11-21 14:04:27 +01:00
Avi Kivity
bb85a21a8f Merge "compress: Restore lz4 as default compressor" from Duarte
"
Enables sstable compression with LZ4 by default, which was the
long-time behavior until a regression turned off compression by
default.

Fixes #3926
"

* 'restore-default-compression/v2' of https://github.com/duarten/scylla:
  tests/cql_query_test: Assert default compression options
  compress: Restore lz4 as default compressor
  tests: Be explicit about absence of compression
2018-11-21 14:20:39 +02:00
Benny Halevy
76b1c184b7 conf: clean up cassandra references in scylla.yaml
Indicate the default scylla directories, rather than Cassandra's.
Provide links to Scylla documentation where possible,
update links to Cassandra documentation otherwise.
Clean up a few typos.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181119141912.28830-1-bhalevy@scylladb.com>
2018-11-21 13:04:24 +02:00
Rafael Ávila de Espíndola
7fa7e9716d Mention scylla-tools-java and scylla-jmx in HACKING.md
I struggled a bit finding out why nodetool was not working, so it
might be a good idea to expand the documentation a bit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181120233358.25859-1-espindola@scylladb.com>
2018-11-21 12:55:17 +02:00
Tomasz Grabiec
349c9f7a69 HACKING.md: Add a link to the slides about core dump debugging tools
Message-Id: <1542793207-1620-1-git-send-email-tgrabiec@scylladb.com>
2018-11-21 11:45:23 +02:00
Michael Munday
53fdde75f6 dht: use little endian byte order explicitly for token hash
This avoids a difference between little and big endian systems. We
now also calculate a full murmur hash for tokens with less than 8
bytes, however in practice the token size is always 8.
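
Reading bytes in an explicit byte order can be done without depending on the
host, as in this sketch. read_le64() is a hypothetical helper, not the
function used by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assembles a 64-bit value from 8 bytes in little endian order. The result
// is the same on little and big endian hosts because it never reinterprets
// memory, it only shifts individual bytes into place.
inline uint64_t read_le64(const unsigned char* p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v |= static_cast<uint64_t>(p[i]) << (8 * i);
    }
    return v;
}
```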

Message-Id: <20181120214733.43800-1-mike.munday@ibm.com>
2018-11-21 11:44:29 +02:00
Michael Munday
360374cfde tests: fix compilation of partitioner_test with boost 1.68 on IBM Z
The boost multiprecision library that I am compiling against seems
to be missing an overload for the cast to a string. The easy
workaround seems to be to call str() directly instead.

This also fixes #3922.

Message-Id: <20181120215709.43939-1-mike.munday@ibm.com>
2018-11-21 11:43:42 +02:00
Duarte Nunes
9464fffc8c tests/cql_query_test: Assert default compression options
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:27 +00:00
Duarte Nunes
36dc9e3280 compress: Restore lz4 as default compressor
Fixes a regression introduced in
74758c87cd, where tables started to be
created without compression by default (before they were created with
lz4 by default).

Fixes #3926

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:27 +00:00
Duarte Nunes
5f64e34fcc tests: Be explicit about absence of compression
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:26 +00:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Takuya ASADA
42baf6a6f7 dist/ami: update packer
Update packer to latest version, 1.3.2.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110441.16284-2-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Takuya ASADA
b9a42e83ad dist/ami: enable AMI build log
To make AMI build errors easier to debug, enable logging.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110441.16284-1-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Takuya ASADA
72411f95cb reloc/build_reloc.sh: find ninja-build after executed install-dependencies.sh
The build environment may not have ninja-build installed before running
install-dependencies.sh, so do it after running the script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110737.17755-1-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Avi Kivity
183c2369f3 Update seastar submodule
* seastar a44cedf...d59fcef (10):
  > dns: Set tcp output stream buffer size to zero explicitly
  > tests: add libc-ares to travis dependencies
  > tests: add dns_test to test suite
  > build: drop bundled c-ares package
  > prometheus: replace the instance label with an optional one
  > build: Refactor C++ dialect detection
  > build: add libatomic to install-depenencies.sh
  > core: use std::underlying_type for open_flags
  > core: introduce open_flags::operator&
  > core: Fix build for `gnu++14`
2018-11-20 21:29:57 +02:00
Tomasz Grabiec
57e25fa0f8 utils: phased_barrier: Make advance_and_await() have strong exception guarantees
Currently, when advance_and_await() fails to allocate the new gate
object, it will throw bad_alloc and leave the phased_barrier object in
an invalid state. Calling advance_and_await() again on it will result
in undefined behavior (typically SIGSEGV) because _gate will be
disengaged.

One place affected by this is table::seal_active_memtable(), which
calls _flush_barrier.advance_and_await(). If this throws, subsequent
flush attempts will SIGSEGV.

This patch rearranges the code so that advance_and_await() has strong
exception guarantees.
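
The rearrangement can be sketched without Seastar. toy_barrier below is a
hypothetical simplification: it leaves out the waiting part and only shows
the ordering that gives the strong guarantee, with a test hook that simulates
the allocation failure.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>

struct toy_gate {
    uint64_t phase;
};

class toy_barrier {
    std::unique_ptr<toy_gate> _gate;
    uint64_t _phase = 0;
public:
    bool fail_next_alloc = false;   // test hook that simulates bad_alloc

    toy_barrier() : _gate(std::make_unique<toy_gate>(toy_gate{0})) {}

    // Strong guarantee: everything that can throw happens before any member
    // is modified, so a failure leaves the barrier exactly as it was.
    void advance() {
        auto new_gate = make_gate(_phase + 1);   // may throw
        _gate = std::move(new_gate);             // cannot throw from here on
        ++_phase;
    }

    uint64_t phase() const { return _phase; }
    bool engaged() const { return static_cast<bool>(_gate); }

private:
    std::unique_ptr<toy_gate> make_gate(uint64_t phase) {
        if (fail_next_alloc) {
            fail_next_alloc = false;
            throw std::bad_alloc();
        }
        return std::make_unique<toy_gate>(toy_gate{phase});
    }
};
```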
Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>
2018-11-20 16:15:12 +00:00
Glauber Costa
9f403334c8 remove monitor if sstable write failed
In (almost) all SSTable write paths, we need to inform the monitor that
the write has failed as well. The monitor will remove the SSTable from
controller's tracking at that point.

Except there is one place where we are not doing that: streaming of big
mutations. Streaming of big mutations is an interesting use case, in
which it is done in 2 parts: if the writing of the SSTable fails right
away, then we do the correct thing.

But the SSTables are not committed at that point and the monitors are
still kept around with the SSTables until a later time, when they are
finally committed. Between those two points in time, it is possible that
the streaming code will detect a failure and manually call
fail_streaming_mutations(), which marks the SSTable for deletion. At
that point we should propagate that information to the monitor as well,
but we don't.

Fixes #3732 (hopefully)
Tests: unit (release)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181114213618.16789-1-glauber@scylladb.com>
2018-11-20 16:15:12 +00:00
Gleb Natapov
d144e6ceac messaging_service: enable port load balancing algorithm for RPC server
In a homogeneous cluster this will reduce the number of internal cross-shard hops
per request since RPC calls will arrive at the correct shard.

Message-Id: <20181118150817.GF2062@scylladb.com>
2018-11-20 16:15:12 +00:00
Michael Munday
b9a2f4a228 dht: fix byte ordered partitioner midpoint calculation
New versions of boost saturate the output of the convert_to method
so we need to mask the part we want to extract.

Updates #3922.

Message-Id: <20181116191441.35000-1-mike.munday@ibm.com>
2018-11-16 21:19:06 +02:00
Glauber Costa
c6811bd877 sstables: correctly parse estimated histograms
In commit a33f0d6, we changed the way we handle arrays during the write
and parse code to avoid reactor stalls. Some potentially big loops were
transformed into futurized loops, and also some calls to vector resizes
were replaced by a reserve + push_back idiom.

The latter broke parsing of the estimated histogram. The reason being
that the vectors that are used here are already initialized internally
by the estimated_histogram object. Therefore, when we push_back, we
don't fill the array all the way from index 0, but end up with a zeroed
beginning and only push back some of the elements we need.

We could revert this array to a resize() call. After all, the reason we
are using reserve + push_back is to avoid calling the constructor member
for each element, but we don't really expect the integer specialization
to do any of that.

However, to avoid confusing future developers who may feel tempted
to convert this as well for the sake of consistency, it is safer to
just make sure these arrays are zeroed.
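
The failure mode is easy to reproduce with a plain std::vector. The two
functions below are hypothetical, and the second one reflects one reading of
the fix, namely emptying the destination before refilling it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Buggy pattern: `buckets` arrives already sized and zero-filled, so
// reserve + push_back appends the parsed values after the zeroes.
inline std::vector<uint64_t> fill_with_push_back(std::vector<uint64_t> buckets,
                                                 const std::vector<uint64_t>& parsed) {
    buckets.reserve(parsed.size());
    for (auto v : parsed) {
        buckets.push_back(v);
    }
    return buckets;
}

// Safe pattern: empty the destination first, then reserve + push_back.
inline std::vector<uint64_t> fill_after_clear(std::vector<uint64_t> buckets,
                                              const std::vector<uint64_t>& parsed) {
    buckets.clear();
    buckets.reserve(parsed.size());
    for (auto v : parsed) {
        buckets.push_back(v);
    }
    return buckets;
}
```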

Fixes #3918

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181116130853.10473-1-glauber@scylladb.com>
2018-11-16 20:52:44 +02:00
Avi Kivity
d708dabab9 doc: add reference to Linux' submitting-patches document
Since our development process is a derivative of Linux, almost everything there
is pertinent.

Message-Id: <20181115184037.5256-1-avi@scylladb.com>
2018-11-16 20:15:40 +02:00
Vladimir Krivopalov
759fbbd5f6 random_mutation_generator: Add row_marker to rows regardless of whether they're deleted.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <f55b91f1349f0e98def6b7ca9755b5ccf4f48a3e.1542308626.git.vladimir@scylladb.com>
2018-11-16 13:17:07 +01:00
Avi Kivity
6548a404b2 Remove patch file committed by mistake 2018-11-15 19:47:55 +02:00
Duarte Nunes
6fbf792777 db/view/view_builder: Don't timeout waiting for view to be built
Remove the timeout argument to
db::view::view_builder::wait_until_built(), a test-only function to
wait until a given materialized view has finished building.

This change is motivated by the fact that some tests running on slow
environments will timeout. Instead of incrementally increasing the
timeout, remove it completely since tests are already run under an
exterior timeout.

Fixes #3920

Tests: unit release(view_build_test, view_schema_test)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181115173902.19048-1-duarte@scylladb.com>
2018-11-15 19:41:43 +02:00
Amnon Heiman
25378916bc API: colummn_family.hh yield in map_reduce_column_families_locally
map_reduce_column_families_locally iterates over all tables (column
families) in a shard.

If the number of tables is big it can cause latency spikes.

This patch replaces the current loop with a do_for_each allowing
preemption inside the loop.

Fixes #3886

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181115154825.23430-1-amnon@scylladb.com>
2018-11-15 18:58:23 +02:00
Nadav Har'El
45f05b06d2 view_complex_test: fix another ttl
In a previous patch I fixed most TTLs in the view_complex_test.cc tests
from low numbers to 100 seconds. I missed one. This one never caused
problems in practice, but for good form, let's fix it too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115160234.26478-1-nyh@scylladb.com>
2018-11-15 18:03:28 +02:00
Nadav Har'El
78ed7d6d0c Materialized Views and Secondary Index: no longer experimental
After this patch, the Materialized Views and Secondary Index features
are considered generally-available and no longer require passing an
explicit "--experimental=on" flag to Scylla.

The "--experimental=on" flag and the db::config::check_experimental()
function are now unused, as we graduated the only two features which used
this flag. However, we leave the support for experimental features in
the code, to make it easier to add new experimental features in the future.
Another reason to leave the command-line parameter behind is so existing
scripts that still use it will not break.

Fixes #3917

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115144456.25518-1-nyh@scylladb.com>
2018-11-15 17:59:27 +02:00
Vladimir Krivopalov
51afb1d8bd tests: Generate deleted rows and shadowable tombstones in random_mutation_generator.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <77e956890264023227e07cc6d295df870d0a5af2.1542295208.git.vladimir@scylladb.com>
2018-11-15 16:26:07 +01:00
Avi Kivity
0216f49bb0 Merge "Add filtering support for CONTAINS" from Piotr
"
This series enables filtering support for CONTAINS restriction.
"

* 'enable_filtering_for_contains_2' of https://github.com/psarna/scylla:
  tests: add CONTAINS test case to filtering tests
  cql3: enable filtering for CONTAINS restriction
  cql3: add is_satisfied_by(bytes_view) for CONTAINS
2018-11-15 16:49:29 +02:00
Nadav Har'El
4108458b8e view_complex_test: increase low ttl which may fail test on busy machine
Several of the tests in tests/view_complex_test.cc set a cell with a
TTL, and then skip time ahead artificially with forward_jump_clocks(),
to go past the TTL time and check the cell disappeared as expected.

The TTLs chosen for these tests were arbitrary numbers - some had 3 seconds,
some 5 seconds, and some 60 seconds. The actual number doesn't matter - it
is completely artificial (we move the clock with forward_jump_clocks() and
never really wait for that amount of time) and could very well be a million
seconds. But *low* numbers, like the 3 seconds, present a problem on extremely
overcommitted test machines. Our eventually() function already allows for
the possibility that things can hang for up to 8 seconds, but with a 3 second
TTL, we can find ourselves with data being expired and the test failing just
after 3 seconds of wall time have passed - while the test intended that the
data will expire only when we explicitly call forward_jump_clocks().

So this patch changes all the TTLs in this test to be the same high number -
100 seconds. This hopefully fixes #3918.
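
The failure mode can be sketched with a toy model. It assumes that the
test clock is the wall clock plus an artificial offset, which is what the
description above implies but is not taken from the Scylla sources. The
names JumpableClock and TtlCell are hypothetical.

```python
import time


class JumpableClock:
    """Wall clock plus an offset that tests can move forward."""

    def __init__(self, wall_clock=time.monotonic):
        self._wall_clock = wall_clock
        self._offset = 0.0

    def now(self):
        return self._wall_clock() + self._offset

    def forward_jump(self, seconds):
        # Analogue of forward_jump_clocks(): no real waiting happens.
        self._offset += seconds


class TtlCell:
    """A value that reads as None once its TTL has elapsed."""

    def __init__(self, value, ttl_seconds, clock):
        self._value = value
        self._clock = clock
        self._expires_at = clock.now() + ttl_seconds

    def get(self):
        if self._clock.now() >= self._expires_at:
            return None
        return self._value
```

In this model, a 4 second stall of the wall clock expires a cell with a
3 second TTL before the test ever jumps the clock, while a cell with a
100 second TTL survives until forward_jump(100) is called.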

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115125607.22647-1-nyh@scylladb.com>
2018-11-15 15:34:08 +02:00
Piotr Jastrzebski
411437f320 Fix format string in mutation_partition::operator<<
fmt does not allow bool values with the :d specifier, and the previous
format string resulted in:

fmt::v5::format_error: invalid type specifier

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <3980a3cdb903263e29689b1c6cd24e3592826fe0.1542284205.git.piotr@scylladb.com>
2018-11-15 12:22:10 +00:00
Yannis Zarkadas
d292d0c78d dist/redhat: extend docker entrypoint with more cmd flags
When running the Docker image, some extra options need to be exposed
to provide extended functionality when starting the container. The flags
added by this commit are:

 - cluster-name: name of the Scylla cluster. cluster_name option in
scylla.yaml.
 - rpc-address: IP address for client connections (CQL). rpc_address
option in scylla.yaml.
 - endpoint-snitch: The snitch used to discover the cluster topology.
endpoint_snitch option in scylla.yaml.
 - replace-address-first-boot: Replace a Scylla node by its IP.
replace_address_first_boot option in scylla.yaml.

Signed-off-by: Yannis Zarkadas <yanniszarkadas@gmail.com>
[ penberg@scylladb.com: fix up merge conflicts ]
Message-Id: <20181108234212.19969-2-yanniszarkadas@gmail.com>
2018-11-15 09:07:52 +02:00
Alexys Jacob
cd9d01cd7e test.py: coding style fixes
test.py:26:1: F401 'signal' imported but unused
test.py:27:1: F401 'shlex' imported but unused
test.py:28:1: F401 'threading' imported but unused
test.py:173:1: E305 expected 2 blank lines after class or function definition,
found 1
test.py:181:34: E241 multiple spaces after ','
test.py:183:34: E241 multiple spaces after ','
test.py:209:24: E222 multiple spaces after operator
test.py:240:5: E301 expected 1 blank line, found 0
test.py:249:23: W504 line break after binary operator
test.py:254:9: E306 expected 1 blank line before a nested definition, found 0
test.py:263:13: F841 local variable 'out' is assigned to but never used
test.py:264:33: E128 continuation line under-indented for visual indent
test.py:265:33: E128 continuation line under-indented for visual indent
test.py:266:33: E128 continuation line under-indented for visual indent
test.py:274:64: F821 undefined name 'e'
test.py:278:53: F821 undefined name 'e'
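
For readers unfamiliar with the F821 and F841 codes above, here is a
small sketch of the usual fix. It is not the code from test.py, and
parse_port is a hypothetical function: an except clause that refers to
'e' must bind it with "as e", and a local that is assigned should be used.

```python
def parse_port(text):
    """Return (port, error); exactly one of the two is None."""
    try:
        port = int(text)
    except ValueError as e:
        # Binding the exception with "as e" is what avoids
        # F821 (undefined name 'e') when the handler uses it.
        return None, "bad port: {}".format(e)
    # Returning 'port' means the local is used, so no F841.
    return port, None
```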

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104115255.22547-1-ultrabug@gentoo.org>
2018-11-14 19:25:14 +02:00
Alexys Jacob
e76a1085d3 scylla-gdb.py: coding style fixes
scylla-gdb.py:1:11: E401 multiple imports on one line
scylla-gdb.py:5:1: F811 redefinition of unused 're' from line 2
scylla-gdb.py:10:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:19:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:24:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:30:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:39:9: E722 do not use bare 'except'
scylla-gdb.py:47:33: E711 comparison to None should be 'if cond is None:'
scylla-gdb.py:63:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:90:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:115:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:139:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:161:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:184:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:204:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:210:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:214:5: E301 expected 1 blank line, found 0
scylla-gdb.py:221:5: E301 expected 1 blank line, found 0
scylla-gdb.py:224:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:252:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:267:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:284:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:300:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:314:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:318:5: E301 expected 1 blank line, found 0
scylla-gdb.py:322:5: E301 expected 1 blank line, found 0
scylla-gdb.py:337:1: E305 expected 2 blank lines after class or function
definition, found 1
scylla-gdb.py:339:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:342:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:345:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:348:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:352:129: E202 whitespace before ')'
scylla-gdb.py:361:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:363:129: E202 whitespace before ')'
scylla-gdb.py:371:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:375:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:378:5: E301 expected 1 blank line, found 0
scylla-gdb.py:383:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:386:5: E301 expected 1 blank line, found 0
scylla-gdb.py:393:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:396:5: E301 expected 1 blank line, found 0
scylla-gdb.py:407:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:410:5: E301 expected 1 blank line, found 0
scylla-gdb.py:412:9: E306 expected 1 blank line before a nested definition,
found 0
scylla-gdb.py:439:26: E703 statement ends with a semicolon
scylla-gdb.py:462:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:500:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:506:5: E722 do not use bare 'except'
scylla-gdb.py:516:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:518:18: E271 multiple spaces after keyword
scylla-gdb.py:522:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:530:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:533:5: E301 expected 1 blank line, found 0
scylla-gdb.py:537:13: E306 expected 1 blank line before a nested definition,
found 0
scylla-gdb.py:547:9: E722 do not use bare 'except'
scylla-gdb.py:550:26: E261 at least two spaces before inline comment
scylla-gdb.py:568:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:571:5: E301 expected 1 blank line, found 0
scylla-gdb.py:577:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:577:39: E226 missing whitespace around arithmetic operator
scylla-gdb.py:583:15: E128 continuation line under-indented for visual indent
scylla-gdb.py:596:19: E128 continuation line under-indented for visual indent
scylla-gdb.py:609:82: E227 missing whitespace around bitwise or shift operator
scylla-gdb.py:609:90: E226 missing whitespace around arithmetic operator
scylla-gdb.py:609:113: E226 missing whitespace around arithmetic operator
scylla-gdb.py:613:1: E303 too many blank lines (3)
scylla-gdb.py:645:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:659:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:671:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:678:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:679:9: E128 continuation line under-indented for visual indent
scylla-gdb.py:680:9: E128 continuation line under-indented for visual indent
scylla-gdb.py:681:9: E128 continuation line under-indented for visual indent
scylla-gdb.py:682:9: E128 continuation line under-indented for visual indent
scylla-gdb.py:708:12: E111 indentation is not a multiple of four
scylla-gdb.py:721:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:723:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:725:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:727:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:729:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:748:33: E261 at least two spaces before inline comment
scylla-gdb.py:770:17: E306 expected 1 blank line before a nested definition,
found 0
scylla-gdb.py:795:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:796:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:797:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:798:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:800:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:807:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:814:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:820:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:823:5: E301 expected 1 blank line, found 0
scylla-gdb.py:845:35: E703 statement ends with a semicolon
scylla-gdb.py:865:91: E703 statement ends with a semicolon
scylla-gdb.py:896:9: F841 local variable 'segment_size' is assigned to but
never used
scylla-gdb.py:904:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:907:5: E301 expected 1 blank line, found 0
scylla-gdb.py:915:73: E128 continuation line under-indented for visual indent
scylla-gdb.py:916:73: E128 continuation line under-indented for visual indent
scylla-gdb.py:917:73: E126 continuation line over-indented for hanging indent
scylla-gdb.py:922:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:925:5: E301 expected 1 blank line, found 0
scylla-gdb.py:933:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:934:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:934:49: E251 unexpected spaces around keyword / parameter equals
scylla-gdb.py:934:51: E251 unexpected spaces around keyword / parameter equals
scylla-gdb.py:934:74: E251 unexpected spaces around keyword / parameter equals
scylla-gdb.py:934:76: E251 unexpected spaces around keyword / parameter equals
scylla-gdb.py:940:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:941:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:949:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:950:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:951:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:952:21: E128 continuation line under-indented for visual indent
scylla-gdb.py:953:21: E128 continuation line under-indented for visual indent
scylla-gdb.py:954:21: E128 continuation line under-indented for visual indent
scylla-gdb.py:955:21: E128 continuation line under-indented for visual indent
scylla-gdb.py:958:1: E305 expected 2 blank lines after class or function
definition, found 1
scylla-gdb.py:958:11: E261 at least two spaces before inline comment
scylla-gdb.py:959:1: E302 expected 2 blank lines, found 0
scylla-gdb.py:971:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:989:5: E301 expected 1 blank line, found 0
scylla-gdb.py:993:5: E301 expected 1 blank line, found 0
scylla-gdb.py:995:5: E301 expected 1 blank line, found 0
scylla-gdb.py:997:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1001:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1005:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1029:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1034:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1037:46: E128 continuation line under-indented for visual indent
scylla-gdb.py:1057:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1060:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1071:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1076:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1084:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1093:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1096:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1101:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1104:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1116:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1119:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1123:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1126:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1132:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1135:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1138:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1141:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1147:15: E241 multiple spaces after ':'
scylla-gdb.py:1148:15: E241 multiple spaces after ':'
scylla-gdb.py:1149:15: E241 multiple spaces after ':'
scylla-gdb.py:1150:15: E241 multiple spaces after ':'
scylla-gdb.py:1151:15: E241 multiple spaces after ':'
scylla-gdb.py:1152:15: E241 multiple spaces after ':'
scylla-gdb.py:1153:15: E241 multiple spaces after ':'
scylla-gdb.py:1154:15: E241 multiple spaces after ':'
scylla-gdb.py:1170:20: E221 multiple spaces before operator
scylla-gdb.py:1191:40: E226 missing whitespace around arithmetic operator
scylla-gdb.py:1191:59: E226 missing whitespace around arithmetic operator
scylla-gdb.py:1225:1: E305 expected 2 blank lines after class or function
definition, found 1
scylla-gdb.py:1227:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1233:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1236:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1240:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1278:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1281:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1284:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1287:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1293:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1296:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1320:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1323:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1355:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1362:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1383:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1386:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1388:9: E306 expected 1 blank line before a nested definition,
found 0
scylla-gdb.py:1397:13: F841 local variable 'selector' is assigned to but never
used
scylla-gdb.py:1446:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1477:5: E301 expected 1 blank line, found 0
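
Two of the codes above, E722 and E711, affect more than whitespace. The
sketch below shows the usual shape of the fix. It is not taken from
scylla-gdb.py, and lookup is a hypothetical function.

```python
def lookup(mapping, key):
    """Return mapping[key], '<unset>' if it is None, None if missing."""
    try:
        value = mapping[key]
    except KeyError:
        # Naming the exception instead of a bare 'except:' (E722)
        # keeps unrelated errors such as KeyboardInterrupt visible.
        return None
    if value is None:
        # 'is None' instead of '== None' (E711) compares identity
        # and cannot be changed by a custom __eq__.
        return "<unset>"
    return value
```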

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104113603.1111-1-ultrabug@gentoo.org>
2018-11-14 19:25:14 +02:00
Alexys Jacob
e58eb6d6ab idl-compiler.py: coding style fixes
idl-compiler.py:22:1: F401 'json' imported but unused
idl-compiler.py:23:1: F401 'sys' imported but unused
idl-compiler.py:24:1: F401 're' imported but unused
idl-compiler.py:25:1: F401 'glob' imported but unused
idl-compiler.py:27:1: F401 'os' imported but unused
idl-compiler.py:54:1: F811 redefinition of unused 'reindent' from line 33
idl-compiler.py:57:1: E302 expected 2 blank lines, found 1
idl-compiler.py:61:1: E302 expected 2 blank lines, found 1
idl-compiler.py:66:1: E302 expected 2 blank lines, found 1
idl-compiler.py:96:1: E302 expected 2 blank lines, found 1
idl-compiler.py:160:1: E302 expected 2 blank lines, found 1
idl-compiler.py:163:1: E302 expected 2 blank lines, found 1
idl-compiler.py:166:1: E302 expected 2 blank lines, found 1
idl-compiler.py:172:1: E302 expected 2 blank lines, found 1
idl-compiler.py:176:1: E302 expected 2 blank lines, found 1
idl-compiler.py:176:47: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:176:49: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:191:24: E203 whitespace before ':'
idl-compiler.py:191:43: E203 whitespace before ':'
idl-compiler.py:191:67: E203 whitespace before ':'
idl-compiler.py:191:84: E202 whitespace before '}'
idl-compiler.py:195:1: E302 expected 2 blank lines, found 1
idl-compiler.py:195:45: E203 whitespace before ','
idl-compiler.py:195:69: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:195:71: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:198:28: E225 missing whitespace around operator
idl-compiler.py:198:40: E225 missing whitespace around operator
idl-compiler.py:198:43: E272 multiple spaces before keyword
idl-compiler.py:212:25: E203 whitespace before ':'
idl-compiler.py:212:45: E203 whitespace before ':'
idl-compiler.py:212:100: E203 whitespace before ':'
idl-compiler.py:218:1: E302 expected 2 blank lines, found 1
idl-compiler.py:225:1: E302 expected 2 blank lines, found 1
idl-compiler.py:226:11: E271 multiple spaces after keyword
idl-compiler.py:228:1: E302 expected 2 blank lines, found 1
idl-compiler.py:235:1: E302 expected 2 blank lines, found 1
idl-compiler.py:238:1: E302 expected 2 blank lines, found 1
idl-compiler.py:241:5: E722 do not use bare 'except'
idl-compiler.py:243:1: E305 expected 2 blank lines after class or function
definition, found 0
idl-compiler.py:245:1: E302 expected 2 blank lines, found 1
idl-compiler.py:250:25: E231 missing whitespace after ','
idl-compiler.py:253:1: E302 expected 2 blank lines, found 1
idl-compiler.py:256:1: E302 expected 2 blank lines, found 1
idl-compiler.py:263:1: E302 expected 2 blank lines, found 1
idl-compiler.py:266:1: E302 expected 2 blank lines, found 1
idl-compiler.py:267:75: E225 missing whitespace around operator
idl-compiler.py:269:1: E302 expected 2 blank lines, found 1
idl-compiler.py:272:1: E302 expected 2 blank lines, found 1
idl-compiler.py:275:1: E302 expected 2 blank lines, found 1
idl-compiler.py:278:1: E305 expected 2 blank lines after class or function
definition, found 1
idl-compiler.py:280:1: E302 expected 2 blank lines, found 1
idl-compiler.py:283:1: E302 expected 2 blank lines, found 1
idl-compiler.py:286:1: E302 expected 2 blank lines, found 1
idl-compiler.py:288:1: E302 expected 2 blank lines, found 0
idl-compiler.py:293:1: E302 expected 2 blank lines, found 1
idl-compiler.py:294:20: E203 whitespace before ':'
idl-compiler.py:294:22: E241 multiple spaces after ':'
idl-compiler.py:294:51: E203 whitespace before ':'
idl-compiler.py:294:55: E202 whitespace before '}'
idl-compiler.py:296:1: E302 expected 2 blank lines, found 1
idl-compiler.py:298:23: E203 whitespace before ':'
idl-compiler.py:300:1: E305 expected 2 blank lines after class or function
definition, found 1
idl-compiler.py:301:1: E302 expected 2 blank lines, found 0
idl-compiler.py:304:1: E302 expected 2 blank lines, found 1
idl-compiler.py:304:45: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:304:47: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:311:67: E202 whitespace before '}'
idl-compiler.py:314:74: E241 multiple spaces after ':'
idl-compiler.py:316:114: E241 multiple spaces after ':'
idl-compiler.py:316:129: E203 whitespace before ':'
idl-compiler.py:326:1: E302 expected 2 blank lines, found 1
idl-compiler.py:328:27: E231 missing whitespace after ','
idl-compiler.py:328:34: E225 missing whitespace around operator
idl-compiler.py:330:1: E302 expected 2 blank lines, found 1
idl-compiler.py:332:5: F841 local variable 'typ' is assigned to but never used
idl-compiler.py:348:63: E202 whitespace before '}'
idl-compiler.py:352:1: E302 expected 2 blank lines, found 1
idl-compiler.py:353:21: E231 missing whitespace after ','
idl-compiler.py:368:30: E203 whitespace before ':'
idl-compiler.py:374:30: E203 whitespace before ':'
idl-compiler.py:411:57: E203 whitespace before ':'
idl-compiler.py:413:1: E302 expected 2 blank lines, found 1
idl-compiler.py:413:64: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:66: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:80: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:82: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:98: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:100: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:415:51: E225 missing whitespace around operator
idl-compiler.py:417:57: E225 missing whitespace around operator
idl-compiler.py:448:1: E302 expected 2 blank lines, found 1
idl-compiler.py:448:60: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:62: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:76: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:78: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:94: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:96: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:451:51: E225 missing whitespace around operator
idl-compiler.py:453:57: E225 missing whitespace around operator
idl-compiler.py:455:30: E231 missing whitespace after ','
idl-compiler.py:477:1: E302 expected 2 blank lines, found 1
idl-compiler.py:477:48: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:477:50: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:477:67: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:477:69: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:484:24: E222 multiple spaces after operator
idl-compiler.py:488:74: E203 whitespace before ':'
idl-compiler.py:498:20: E222 multiple spaces after operator
idl-compiler.py:507:68: E203 whitespace before ':'
idl-compiler.py:507:88: E203 whitespace before ':'
idl-compiler.py:514:87: E231 missing whitespace after ','
idl-compiler.py:520:14: E211 whitespace before '('
idl-compiler.py:521:15: E703 statement ends with a semicolon
idl-compiler.py:523:1: E302 expected 2 blank lines, found 1
idl-compiler.py:540:47: E231 missing whitespace after ':'
idl-compiler.py:542:1: E302 expected 2 blank lines, found 1
idl-compiler.py:542:47: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:542:49: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:542:69: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:542:71: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:547:24: E222 multiple spaces after operator
idl-compiler.py:553:47: E231 missing whitespace after ':'
idl-compiler.py:558:43: E231 missing whitespace after ':'
idl-compiler.py:560:1: E302 expected 2 blank lines, found 1
idl-compiler.py:564:1: E302 expected 2 blank lines, found 1
idl-compiler.py:564:82: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:564:84: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:564:105: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:564:107: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:573:21: E222 multiple spaces after operator
idl-compiler.py:576:25: E222 multiple spaces after operator
idl-compiler.py:577:13: F841 local variable 'sate' is assigned to but never
used
idl-compiler.py:584:66: E203 whitespace before ':'
idl-compiler.py:589:66: E203 whitespace before ':'
idl-compiler.py:589:89: E203 whitespace before ':'
idl-compiler.py:589:113: E203 whitespace before ':'
idl-compiler.py:600:48: E203 whitespace before ':'
idl-compiler.py:600:68: E203 whitespace before ':'
idl-compiler.py:602:1: E302 expected 2 blank lines, found 1
idl-compiler.py:602:1: F811 redefinition of unused 'add_vector_node' from line
330
idl-compiler.py:604:38: E231 missing whitespace after ','
idl-compiler.py:604:59: E202 whitespace before ')'
idl-compiler.py:607:1: E305 expected 2 blank lines after class or function
definition, found 1
idl-compiler.py:609:1: E302 expected 2 blank lines, found 1
idl-compiler.py:615:39: E231 missing whitespace after ','
idl-compiler.py:622:1: E302 expected 2 blank lines, found 1
idl-compiler.py:630:46: E203 whitespace before ':'
idl-compiler.py:637:33: E231 missing whitespace after ':'
idl-compiler.py:640:90: E203 whitespace before ':'
idl-compiler.py:641:13: F841 local variable 'vr' is assigned to but never used
idl-compiler.py:642:1: E305 expected 2 blank lines after class or function
definition, found 0
idl-compiler.py:644:1: E302 expected 2 blank lines, found 1
idl-compiler.py:657:1: E302 expected 2 blank lines, found 1
idl-compiler.py:657:51: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:657:53: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:657:67: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:657:69: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:660:5: E265 block comment should start with '# '
idl-compiler.py:679:16: E272 multiple spaces before keyword
idl-compiler.py:692:56: E271 multiple spaces after keyword
idl-compiler.py:695:5: F841 local variable 'is_param_vector' is assigned to
but never used
idl-compiler.py:699:1: E302 expected 2 blank lines, found 1
idl-compiler.py:699:56: E202 whitespace before ')'
idl-compiler.py:711:1: E302 expected 2 blank lines, found 1
idl-compiler.py:719:26: E201 whitespace after '{'
idl-compiler.py:730:39: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:730:41: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:733:1: E302 expected 2 blank lines, found 1
idl-compiler.py:735:21: E225 missing whitespace around operator
idl-compiler.py:738:1: E302 expected 2 blank lines, found 1
idl-compiler.py:747:1: E305 expected 2 blank lines after class or function
definition, found 1
idl-compiler.py:749:1: E302 expected 2 blank lines, found 1
idl-compiler.py:767:17: E211 whitespace before '('
idl-compiler.py:767:26: E203 whitespace before ':'
idl-compiler.py:770:5: E303 too many blank lines (2)
idl-compiler.py:777:20: E211 whitespace before '('
idl-compiler.py:777:29: E203 whitespace before ':'
idl-compiler.py:783:28: E203 whitespace before ':'
idl-compiler.py:783:44: E203 whitespace before ':'
idl-compiler.py:783:82: E203 whitespace before ':'
idl-compiler.py:786:1: E302 expected 2 blank lines, found 1
idl-compiler.py:794:28: E203 whitespace before ':'
idl-compiler.py:802:33: E203 whitespace before ':'
idl-compiler.py:815:21: E126 continuation line over-indented for hanging
indent
idl-compiler.py:815:28: E203 whitespace before ':'
idl-compiler.py:815:50: E203 whitespace before ':'
idl-compiler.py:817:82: E203 whitespace before ':'
idl-compiler.py:817:104: E203 whitespace before ':'
idl-compiler.py:827:33: E203 whitespace before ':'
idl-compiler.py:827:48: E203 whitespace before ':'
idl-compiler.py:827:68: E203 whitespace before ':'
idl-compiler.py:827:84: E203 whitespace before ':'
idl-compiler.py:827:100: E203 whitespace before ':'
idl-compiler.py:859:24: E203 whitespace before ':'
idl-compiler.py:859:58: E203 whitespace before ':'
idl-compiler.py:859:78: E203 whitespace before ':'
idl-compiler.py:861:1: E302 expected 2 blank lines, found 1
idl-compiler.py:865:1: E302 expected 2 blank lines, found 1
idl-compiler.py:876:1: E302 expected 2 blank lines, found 1
idl-compiler.py:876:71: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:876:73: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:883:21: E222 multiple spaces after operator
idl-compiler.py:884:28: E225 missing whitespace around operator
idl-compiler.py:884:46: E225 missing whitespace around operator
idl-compiler.py:884:49: E272 multiple spaces before keyword
idl-compiler.py:904:86: E203 whitespace before ':'
idl-compiler.py:904:107: E203 whitespace before ':'
idl-compiler.py:906:81: E203 whitespace before ':'
idl-compiler.py:906:106: E203 whitespace before ':'
idl-compiler.py:906:124: E203 whitespace before ':'
idl-compiler.py:906:143: E203 whitespace before ':'
idl-compiler.py:911:49: E203 whitespace before ':'
idl-compiler.py:911:69: E203 whitespace before ':'
idl-compiler.py:911:93: E203 whitespace before ':'
idl-compiler.py:918:85: E203 whitespace before ':'
idl-compiler.py:918:108: E203 whitespace before ':'
idl-compiler.py:918:151: E203 whitespace before ':'
idl-compiler.py:922:62: E203 whitespace before ':'
idl-compiler.py:922:90: E203 whitespace before ':'
idl-compiler.py:925:82: E203 whitespace before ':'
idl-compiler.py:925:110: E203 whitespace before ':'
idl-compiler.py:940:70: E203 whitespace before ':'
idl-compiler.py:940:128: E203 whitespace before ':'
idl-compiler.py:942:110: E203 whitespace before ':'
idl-compiler.py:942:168: E203 whitespace before ':'
idl-compiler.py:948:25: E203 whitespace before ':'
idl-compiler.py:948:75: E203 whitespace before ':'
idl-compiler.py:954:78: E203 whitespace before ':'
idl-compiler.py:954:101: E203 whitespace before ':'
idl-compiler.py:954:144: E203 whitespace before ':'
idl-compiler.py:957:62: E203 whitespace before ':'
idl-compiler.py:957:90: E203 whitespace before ':'
idl-compiler.py:969:13: E271 multiple spaces after keyword
idl-compiler.py:971:13: E271 multiple spaces after keyword
idl-compiler.py:976:1: E302 expected 2 blank lines, found 1
idl-compiler.py:987:1: E302 expected 2 blank lines, found 1
idl-compiler.py:1016:1: E302 expected 2 blank lines, found 1
idl-compiler.py:1023:42: E225 missing whitespace around operator
idl-compiler.py:1024:79: E225 missing whitespace around operator
idl-compiler.py:1027:1: E305 expected 2 blank lines after class or function
definition, found 0

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104112308.19409-1-ultrabug@gentoo.org>
2018-11-14 19:25:13 +02:00
Alexys Jacob
0cf480aad0 gen_segmented_compress_params.py: coding style fixes
gen_segmented_compress_params.py:52:47: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:56:64: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:60:36: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:60:48: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:70:35: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:70:48: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:99:43: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:106:18: E225 missing whitespace around
operator
gen_segmented_compress_params.py:120:5: E303 too many blank lines (2)
gen_segmented_compress_params.py:200:30: E261 at least two spaces before
inline comment
gen_segmented_compress_params.py:200:31: E262 inline comment should start with
'# '
gen_segmented_compress_params.py:218:76: E261 at least two spaces before
inline comment
gen_segmented_compress_params.py:219:59: E703 statement ends with a semicolon
gen_segmented_compress_params.py:219:60: E261 at least two spaces before
inline comment

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104115753.4701-1-ultrabug@gentoo.org>
2018-11-14 19:25:12 +02:00
Alexys Jacob
43a04ad693 fix_system_distributed_tables.py: coding style fixes
fix_system_distributed_tables.py:28:20: E203 whitespace before ':'
fix_system_distributed_tables.py:29:20: E203 whitespace before ':'
fix_system_distributed_tables.py:30:20: E203 whitespace before ':'
fix_system_distributed_tables.py:31:20: E203 whitespace before ':'
fix_system_distributed_tables.py:33:20: E203 whitespace before ':'
fix_system_distributed_tables.py:34:23: E203 whitespace before ':'
fix_system_distributed_tables.py:35:23: E203 whitespace before ':'
fix_system_distributed_tables.py:39:20: E203 whitespace before ':'
fix_system_distributed_tables.py:40:20: E203 whitespace before ':'
fix_system_distributed_tables.py:41:20: E203 whitespace before ':'
fix_system_distributed_tables.py:42:20: E203 whitespace before ':'
fix_system_distributed_tables.py:43:20: E203 whitespace before ':'
fix_system_distributed_tables.py:44:20: E203 whitespace before ':'
fix_system_distributed_tables.py:45:20: E203 whitespace before ':'
fix_system_distributed_tables.py:46:20: E203 whitespace before ':'
fix_system_distributed_tables.py:47:20: E203 whitespace before ':'
fix_system_distributed_tables.py:48:20: E203 whitespace before ':'
fix_system_distributed_tables.py:52:20: E203 whitespace before ':'
fix_system_distributed_tables.py:53:20: E203 whitespace before ':'
fix_system_distributed_tables.py:54:20: E203 whitespace before ':'
fix_system_distributed_tables.py:55:20: E203 whitespace before ':'
fix_system_distributed_tables.py:56:20: E203 whitespace before ':'
fix_system_distributed_tables.py:57:20: E203 whitespace before ':'
fix_system_distributed_tables.py:58:20: E203 whitespace before ':'
fix_system_distributed_tables.py:59:20: E203 whitespace before ':'
fix_system_distributed_tables.py:60:20: E203 whitespace before ':'
fix_system_distributed_tables.py:61:20: E203 whitespace before ':'
fix_system_distributed_tables.py:62:20: E203 whitespace before ':'
fix_system_distributed_tables.py:66:19: E203 whitespace before ':'
fix_system_distributed_tables.py:67:19: E203 whitespace before ':'
fix_system_distributed_tables.py:72:19: E203 whitespace before ':'
fix_system_distributed_tables.py:73:19: E203 whitespace before ':'
fix_system_distributed_tables.py:74:19: E203 whitespace before ':'
fix_system_distributed_tables.py:78:19: E203 whitespace before ':'
fix_system_distributed_tables.py:79:19: E203 whitespace before ':'
fix_system_distributed_tables.py:80:19: E203 whitespace before ':'
fix_system_distributed_tables.py:84:19: E203 whitespace before ':'
fix_system_distributed_tables.py:85:19: E203 whitespace before ':'
fix_system_distributed_tables.py:89:19: E203 whitespace before ':'
fix_system_distributed_tables.py:90:19: E203 whitespace before ':'
fix_system_distributed_tables.py:91:19: E203 whitespace before ':'
fix_system_distributed_tables.py:95:22: E203 whitespace before ':'
fix_system_distributed_tables.py:96:22: E203 whitespace before ':'
fix_system_distributed_tables.py:99:1: E302 expected 2 blank lines, found 0
fix_system_distributed_tables.py:103:72: E201 whitespace after '['
fix_system_distributed_tables.py:103:82: E202 whitespace before ']'
fix_system_distributed_tables.py:105:43: E201 whitespace after '['
fix_system_distributed_tables.py:105:53: E202 whitespace before ']'
fix_system_distributed_tables.py:111:16: E713 test for membership should be
'not in'
fix_system_distributed_tables.py:118:20: E713 test for membership should be
'not in'
fix_system_distributed_tables.py:135:25: E722 do not use bare 'except'
fix_system_distributed_tables.py:138:5: E722 do not use bare 'except'
fix_system_distributed_tables.py:144:1: E305 expected 2 blank lines after
class or function definition, found 0
fix_system_distributed_tables.py:145:47: E251 unexpected spaces around keyword
/ parameter equals
fix_system_distributed_tables.py:145:49: E251 unexpected spaces around keyword
/ parameter equals
fix_system_distributed_tables.py:160:1: W391 blank line at end of file

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104113001.22783-1-ultrabug@gentoo.org>
2018-11-14 19:25:12 +02:00
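
For readers unfamiliar with the pycodestyle codes listed in the commit above, here is a minimal illustration of three of them (E203, E713, E722). All names and values are hypothetical and are not taken from fix_system_distributed_tables.py.

```python
# Hypothetical data; the names are illustrative only.
# E203 would flag `"view_name" : "text"` (whitespace before ':').
columns = {"view_name": "text", "status": "text"}


def missing_columns(required, present):
    # E713: write `name not in present`, not `not name in present`.
    return [name for name in required if name not in present]


def parse_port(text, default=9042):
    # E722: catch a specific exception instead of a bare `except:`.
    try:
        return int(text)
    except ValueError:
        return default
```

A bare `except:` also swallows `KeyboardInterrupt` and `SystemExit`, which is the usual reason for naming the exception type.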
Alexys Jacob
c9e3b739ae dist/docker/redhat/scyllasetup.py: coding style fixes
dist/docker/redhat/scyllasetup.py:6:1: E302 expected 2 blank lines, found 1
dist/docker/redhat/scyllasetup.py:41:21: E128 continuation line under-indented for visual indent
dist/docker/redhat/scyllasetup.py:65:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:65:51: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:67:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:67:45: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:69:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:69:42: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:79:18: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:79:42: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:80:39: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:81:70: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:84:48: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:84:70: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:86:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:86:53: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:86:78: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:89:42: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:89:58: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:92:44: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:92:63: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:95:41: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:95:57: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:98:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:98:42: E202 whitespace before ']'

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104110913.13796-1-ultrabug@gentoo.org>
2018-11-14 19:25:11 +02:00
Alexys Jacob
1585983fc9 dist/docker/redhat: coding style fixes
dist/docker/redhat/docker-entrypoint.py:20:1: E722 do not use bare 'except'
dist/docker/redhat/commandlineparser.py:13:13: E128 continuation line
under-indented for visual indent

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104120134.9598-1-ultrabug@gentoo.org>
2018-11-14 19:25:10 +02:00
Alexys Jacob
c24e0e5599 dist/common/scripts/scylla_util.py: coding style fixes
dist/common/scripts/scylla_util.py:388:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:414:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:418:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:453:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:468:5: E722 do not use bare 'except'
dist/common/scripts/scylla_util.py:472:1: E302 expected 2 blank lines, found 1

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104120832.11273-1-ultrabug@gentoo.org>
2018-11-14 19:25:09 +02:00
Vladimir Krivopalov
2c21fb4897 Use coloured tests results in test.py script output.
With the number of unit tests approaching one hundred, the output of
test.py becomes more challenging to read.
If some test fails, we will only get the details after all the tests
complete, but some tests take way longer than others.

With the coloured status, it is much simpler to immediately locate
failing tests. A developer can cancel the others and repeat the failing
ones.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <63a99a2fb70fdc33fd6eeb8e18fee977a47bd278.1541541184.git.vladimir@scylladb.com>
2018-11-14 19:23:39 +02:00
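
The coloured status described in the commit above can be sketched with ANSI escape sequences. This is an assumption about the general technique, not the code in test.py; the function name, labels, and palette are hypothetical.

```python
import sys

# ANSI escape sequences; the palette and labels are assumptions.
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def status_line(test_name, passed, use_colour=True):
    """Return a one-line status such as 'PASSED some_test'."""
    label = "PASSED" if passed else "FAILED"
    if use_colour:
        colour = GREEN if passed else RED
        label = f"{colour}{label}{RESET}"
    return f"{label} {test_name}"


if __name__ == "__main__":
    # Only colour the output when writing to a terminal.
    colour = sys.stdout.isatty()
    print(status_line("example_test", True, colour))
    print(status_line("another_test", False, colour))
```

Checking `isatty()` keeps escape codes out of log files when the output is redirected.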
Piotr Sarna
b04508041d tests: add CONTAINS test case to filtering tests 2018-11-14 16:08:19 +01:00
Piotr Sarna
0fc7d63842 cql3: enable filtering for CONTAINS restriction
With contains::is_satisfied_by(bytes_view) implemented,
it's possible to enable filtering support for CONTAINS restriction.

Fixes #3573
2018-11-14 14:39:21 +01:00
Piotr Sarna
d8a1693d84 cql3: add is_satisfied_by(bytes_view) for CONTAINS
is_satisfied_by that takes a bytes_view parameter is needed for
filtering, so it's provided for CONTAINS restriction.
2018-11-14 14:39:21 +01:00
Botond Dénes
9e4276669b flat_mutation_reader: document next_partition()
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <01fa57c7473c00e4dc891527a8628026b6dccc01.1542180913.git.bdenes@scylladb.com>
2018-11-14 13:38:38 +00:00
Avi Kivity
447f953a2c Merge "Add DEFAULT UNSET support to JSON" from Piotr
"
This series adds DEFAULT UNSET and DEFAULT NULL keyword support
to INSERT JSON statement, as stated in #3909.

Tests: unit (release)
"

* 'add_json_default_unset_2' of https://github.com/psarna/scylla:
  tests: add DEFAULT UNSET case to JSON cql tests
  tests: split JSON part of cql query test
  cql3: add DEFAULT UNSET to INSERT JSON
2018-11-13 09:14:50 -08:00
Piotr Sarna
fc4ecf9be4 tests: add DEFAULT UNSET case to JSON cql tests
A case covering DEFAULT UNSET/DEFAULT NULL params is added
to json cql query test suite.

Refs #3909
2018-11-13 18:06:15 +01:00
Piotr Sarna
cb6fd6a30d tests: split JSON part of cql query test
JSON part of cql query test is split into another file
to make cql_query_test.cc less huge.
2018-11-13 18:06:15 +01:00
Piotr Sarna
e153e590c1 cql3: add DEFAULT UNSET to INSERT JSON
When inserting a JSON, additional DEFAULT UNSET or DEFAULT NULL
keywords can be appended.
With DEFAULT UNSET, values omitted in JSON will not be changed
at all. With DEFAULT NULL (default), omitted values will be
treated as having a 'null' value.

Fixes #3909
2018-11-13 18:05:55 +01:00
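
The difference between the two keywords can be modelled on a plain dictionary. This is a toy model of the semantics described in the commit above, not Scylla's implementation; `apply_insert_json`, the `UNSET` sentinel, and the column names are hypothetical.

```python
import json

UNSET = object()  # sentinel meaning "leave the stored value alone"


def apply_insert_json(row, columns, payload, default="NULL"):
    """Return a copy of `row` after a modelled INSERT JSON.

    `default` is "NULL" or "UNSET", mirroring the two keywords.
    """
    values = json.loads(payload)
    updated = dict(row)
    for column in columns:
        if column in values:
            new_value = values[column]
        elif default == "UNSET":
            new_value = UNSET  # omitted column: do not touch it
        else:
            new_value = None   # omitted column: treated as null
        if new_value is not UNSET:
            updated[column] = new_value
    return updated


row = {"id": 1, "name": "a", "age": 30}
cols = ["id", "name", "age"]
kept = apply_insert_json(row, cols, '{"id": 1, "name": "b"}', "UNSET")
nulled = apply_insert_json(row, cols, '{"id": 1, "name": "b"}')
```

In this model `kept` retains `age == 30`, while `nulled` has `age` set to `None`.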
Avi Kivity
a089f66755 Merge "ec2_multi_region_snitch: print a proper error message when a Public IP is not available" from Vlad
"
Fix for #3897
"Ec2MultiRegionSnitch: prints a cryptic error when a Public IP is not
available"

Ec2MultiRegionSnitch naturally requires a Public IP to be available and
therefore it's expected to refuse to work without it.

However the error message that is printed today is a total disaster and
has to be fixed ASAP to be something much more human readable.

This series adds a human-readable preamble that lets the user
understand what to do.
"

* 'improve-ec2-multi-region-snitch-error-message-when-pulic-address-is-not-available-v2' of https://github.com/vladzcloudius/scylla:
  locator: ec2_multi_region_snitch::start(): print a human readable error if Public IP may not be retrieved
  locator: ec2_multi_region_snitch::start(): rework on top of seastar::thread
2018-11-13 09:02:55 -08:00
Duarte Nunes
a38f6078fb Merge 'Generating view updates during streaming' from Piotr
During streaming, there are cases when we should invoke the view write
path. In particular, if we're streaming because of repair or if a view
has not yet finished building and we're bootstrapping a new node.

The design constraints are:
1) The streamed writes should be visible to new writes, but the
   sstable should not participate in compaction, or we would lose the
   ability to exclude the streamed writes on a restart;
2) The streamed writes must not be considered when generating view
   updates for them;
3) Resilient to node restarts;
4) Resilient to concurrent stream sessions, possibly streaming mutations for overlapping ranges.

We achieve this by writing the streamed writes to an sstable in a
different folder, call it "staging". We achieve 1) by publishing the
sstable to the column family sstable set, but excluding it from
compactions. We do these steps upon boot, by looking at the staging
directory, thus achieving 3).

Fixes #3275

* 'streaming_view_to_staging_sstables_9' of https://github.com/psarna/scylla: (29 commits)
  tests: add materialized views test
  tests: add view update generator to cql test env
  main: add registering staging sstables read from disk
  database: add a check if loaded sstable is already staging
  database: add get_staging_sstable method
  streaming: stream tables with views through staging sstables
  streaming: add system distributed keyspace ref to streaming
  streaming: add view update generator reference to streaming
  main: add generating missed mv updates from staging sstables
  storage_service: move initializing sys_dist_ks before bootstrap
  db/view: add view_update_from_staging_generator service
  db/view: add view updating consumer
  table: add stream_view_replica_updates
  table: split push_view_replica_updates
  table: add as_mutation_source_excluding
  table: move push_view_replica_updates to table.cc
  database: add populating tables with staging sstables
  database: add creating /staging directory for sstables
  database: add sstable-excluding reader
  table: add move_sstable_from_staging_in_thread function
  ...
2018-11-13 15:16:31 +00:00
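
Design constraints 1) and 2) from the merge message above can be illustrated with a toy model. This is only a sketch under simplifying assumptions (sstables as in-memory lists, no disk or restart handling); the class and method names are hypothetical and do not mirror Scylla's `table` class.

```python
class Table:
    """Toy model of a table's sstable bookkeeping."""

    def __init__(self):
        self.sstables = {}    # name -> list of (key, value) writes
        self.staging = set()  # names of sstables still in staging

    def add_regular_sstable(self, name, writes):
        self.sstables[name] = list(writes)

    def add_streamed_sstable(self, name, writes):
        # Published to the sstable set so reads see it, but marked staging.
        self.sstables[name] = list(writes)
        self.staging.add(name)

    def compaction_candidates(self):
        # Constraint 1: staging sstables do not participate in compaction.
        return sorted(n for n in self.sstables if n not in self.staging)

    def read(self, key, excluding=None):
        # Constraint 2: when generating view updates for a staging sstable,
        # read from every other sstable but not from that one.
        return [
            value
            for name in sorted(self.sstables)
            if name != excluding
            for k, value in self.sstables[name]
            if k == key
        ]

    def move_from_staging(self, name):
        # Called once view updates for `name` have been generated.
        self.staging.discard(name)
```

A streamed sstable is therefore visible to ordinary reads immediately, but only becomes a compaction candidate after `move_from_staging`.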
Piotr Sarna
1724ee55c7 tests: add materialized views test
Right now materialized_views_test.cc contains view updating tests,
but the intention is to move mv-related tests from cql_query_test
here and use it for all future unit testing of MV.
2018-11-13 15:21:55 +01:00
Piotr Sarna
056a78bbc7 tests: add view update generator to cql test env
Keeping view update generator in cql test env enables
generating updates from staging sstables in tests.
2018-11-13 15:04:43 +01:00
Piotr Sarna
16c042039c main: add registering staging sstables read from disk
Staging sstables read from disk are registered to the view update
generator right after initializing non system keyspaces.

Fixes #3275
2018-11-13 15:04:43 +01:00
Piotr Sarna
de43b4f41d database: add a check if loaded sstable is already staging
Staging sstables are loaded before regular ones. If the process
fails midway, an sstable can be linked both in the regular directory
and in staging directory. In such cases, the sstable remains
in staging and will be moved to the regular directory
by view update streamer service.
2018-11-13 15:04:43 +01:00
Piotr Sarna
d7849e6ea4 database: add get_staging_sstable method
This method can be used to check if sstable is staging,
i.e. it shouldn't be compacted and it will not be used
for generating view updates from other staging tables,
and return proper shared_sstable pointer if it is.
2018-11-13 15:04:43 +01:00
Piotr Sarna
32c0fe8df2 streaming: stream tables with views through staging sstables
While streaming to a table with paired views, staging sstables
are used. After the table is written to disk, it's used to generate
all required view updates. It's also resistant to restarts as it's
stored on a hard drive in staging/ directory.

Refs #3275
2018-11-13 15:04:42 +01:00
Piotr Sarna
dc74887ff3 streaming: add system distributed keyspace ref to streaming
Streaming code needs system distributed keyspace to check if streamed
sstables should be staging, so a proper reference is added.
2018-11-13 15:01:53 +01:00
Piotr Sarna
7ef5e1b685 streaming: add view update generator reference to streaming
Streaming code may need view update generator service to generate
and send view updates, so a proper reference is added.
2018-11-13 15:01:53 +01:00
Piotr Sarna
eb0c507a45 main: add generating missed mv updates from staging sstables
If any sstables are found in the staging directory, it means that
they missed generating view updates, so it's performed now.
2018-11-13 15:01:53 +01:00
Piotr Sarna
ca5dfdffc6 storage_service: move initializing sys_dist_ks before bootstrap
Bootstrapping process may need system distributed keyspace
to generate view updates, so initializing sys_dist_ks
is moved before the bootstrapping process is launched.
2018-11-13 15:01:53 +01:00
Piotr Sarna
fc7267c797 db/view: add view_update_from_staging_generator service
A shardable service for generating mv updates after restarts
is added.
2018-11-13 15:01:52 +01:00
Piotr Sarna
ed05d91adc db/view: add view updating consumer
This consumer is used to generate and push view replica updates
from read mutations.
2018-11-13 14:54:39 +01:00
Piotr Sarna
348fa3b092 table: add stream_view_replica_updates
Generating view replica updates during streaming ignores
the staging sstable that is used to generate them.
2018-11-13 14:52:22 +01:00
Piotr Sarna
fed9c59eb8 table: split push_view_replica_updates
push_view_replica_updates is split in order to allow a different
mutation source to be provided.
2018-11-13 14:52:22 +01:00
Piotr Sarna
466d780445 table: add as_mutation_source_excluding
A variant of table::as_mutation_source that allows excluding
a single sstable is added.
2018-11-13 14:52:22 +01:00
Piotr Sarna
c825a17b9d table: move push_view_replica_updates to table.cc 2018-11-13 14:52:22 +01:00
Piotr Sarna
a17fcb8d94 database: add populating tables with staging sstables
After populating tables with regular sstables, the same procedure
is performed for staging sstables.
2018-11-13 14:52:22 +01:00
Piotr Sarna
19bf94fa8f database: add creating /staging directory for sstables
staging directory is now created on boot.
2018-11-13 14:52:22 +01:00
Piotr Sarna
e88b85134c database: add sstable-excluding reader
When generating view updates from a staging sstable, this sstable
should not be used in the process. Hence, a reader that skips a single
sstable is added.
2018-11-13 14:52:22 +01:00
Avi Kivity
a8203ca799 Update seastar submodule
* seastar c02150e...a44cedf (5):
  > build: link against libatomic
  > dns.cc: Include name/address in resolver error messages
  > log: Print full error message for std::system_error
  > tests: test-utils: Add missing include
  > fstream: Introduce make_file_data_sink()

Fixes #3894.
2018-11-13 03:28:16 -08:00
Piotr Sarna
160a6d58d2 table: add move_sstable_from_staging_in_thread function
After materialized view updates are generated, the sstable
should be moved from staging/ to a regular directory.
It's expected to be called from seastar::async thread context.
2018-11-13 11:45:30 +01:00
Piotr Sarna
ff361ca877 sstables: add move_to_new_dir_in_thread function
When moving sstables between directories, this helper function
will create links and update generation and dir accordingly.
It's expected to be called in thread context.
2018-11-13 11:45:30 +01:00
Piotr Sarna
b7977f4790 sstables: add staging directory to regex
datadir/staging directory becomes a valid path for an sstable.
2018-11-13 11:45:30 +01:00
Piotr Sarna
e42d97060f database: provide nonfrozen version of push_view_replica_updates
Now it's also possible to pass a mutation to push to view replicas.
2018-11-13 11:45:30 +01:00
Piotr Sarna
642c3ae0e0 database: add subdir param to make_streaming_sstable_for_write
This function allows specifying a subfolder to put a newly created
sstable in - e.g. staging/ subfolder for streamed base table mutations.
2018-11-13 11:45:30 +01:00
Piotr Sarna
788e03433c table: init table.cc file
This file will be used to move table-related functions to it.
2018-11-13 11:45:30 +01:00
Piotr Sarna
8e053f9efb database: add staging sstables to a map
SSTables that belong to staging/ directory are put in the
_sstables_staging map.
2018-11-13 11:45:30 +01:00
Piotr Sarna
3970808294 sstables: add is_staging() method
This method returns true if the last part of directory structure
is /staging.
2018-11-13 11:45:30 +01:00
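
The check described in the commit above amounts to comparing the final path component. A minimal sketch, assuming POSIX-style paths; the function below is a hypothetical stand-in, and the directory names in the usage are made up.

```python
from pathlib import PurePosixPath


def is_staging(sstable_dir):
    """Return True if the final path component is 'staging'."""
    return PurePosixPath(sstable_dir).name == "staging"


in_staging = is_staging("/data/ks/tbl-1234/staging")
in_regular = is_staging("/data/ks/tbl-1234")
```

Comparing the last component, rather than searching for the substring, avoids false positives for a keyspace or table that merely has "staging" in its name.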
Piotr Sarna
3f34312aa6 database: skip staging sstables in compaction
Staging sstables are not part of the compaction process to ensure
that each sstable can be easily excluded from the view generation process
that depends on the mentioned sstable.
2018-11-13 11:45:30 +01:00
Piotr Sarna
701d88e39f database: add staging sstables map
In order to keep track of staging sstables (used for mv updates),
a map of them is now kept in table class.
2018-11-13 11:45:30 +01:00
Paweł Dziepak
6469a1b451 Merge "Write static rows for all partitions if there are static columns" from Vladimir
"
It appears that in case when there are any static columns in serialization header,
Cassandra would write a (possibly empty) static row to every partition
in the SSTables file.

This patchset aligns Scylla's logic with that of Cassandra.

Note that Cassandra optimizes the case when no partition contains a static
row because it keeps track of updated columns, which Scylla currently does
not do - see #3901 for details.

Fixes #3900.
"

* 'projects/sstables-30/write-all-static-rows/v1' of https://github.com/argenet/scylla:
  tests: Test writing empty static rows for partitions in tables with static columns.
  sstables: Ignore empty static rows on reading.
  sstables: Write empty static rows when there are static columns in the table.
2018-11-09 12:01:25 -08:00
Raphael S. Carvalho
1c5934c934 sstables: fix procedure to get fully expired sstables with MC format
MC format lacks ancestors metadata, so we need to work around it by using
ancestors in metadata collector, which is only available for an sstable
written during this instance. It works fine here because we only want
to know if a recently compacted sstable has an ancestor which wasn't
yet deleted.

Fixes #3852.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <20181102154951.22950-1-raphaelsc@scylladb.com>
2018-11-06 09:28:37 +02:00
Vladimir Krivopalov
69b453fb69 tests: Test writing empty static rows for partitions in tables with static columns.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-05 13:47:30 -08:00
Vladimir Krivopalov
f767dfbb33 sstables: Ignore empty static rows on reading.
Fixes #3900.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-05 13:47:30 -08:00
Vladimir Krivopalov
89051d37e3 sstables: Write empty static rows when there are static columns in the table.
This is consistent with what Cassandra does.

Fixes #3900.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-05 13:28:50 -08:00
Vladimir Krivopalov
2ebab69ce7 mutation_source_test: Use counter and collection columns in static rows.
They are legal and should be covered along with atomic columns.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <a1c0e0f8c0c0f12b68af6df426370511f4e1253b.1541106233.git.vladimir@scylladb.com>

[tgrabiec: fixed the patch title]
2018-11-02 10:33:27 +01:00
Vlad Zolotarov
2636395c65 locator: ec2_multi_region_snitch::start(): print a human readable error if Public IP may not be retrieved
Public IP is required for Ec2MultiRegionSnitch. If it's not available,
a different snitch should be used.

This patch results in a readable error message being printed
instead of just a cryptic message with HTTP response body.

Fixes #3897

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-11-01 11:50:58 -04:00
Vlad Zolotarov
c462af5549 locator: ec2_multi_region_snitch::start(): rework on top of seastar::thread
Rework ec2_multi_region_snitch::start() on top of seastar::async() in
order to simplify the code.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-11-01 10:48:37 -04:00
Paweł Dziepak
1129134a4a Merge "Convert sprint() calls to fmt" from Avi
"
The update to libfmt 5.2.1 brought with it a subtle change - calls to
sprint("%s", 3) now throw a format_error instead of returning "3". To
prevent such hidden (or not so hidden) bugs from lurking, convert all calls
to the modern fmt syntax.

Such conversion has several benefits:
 - prevent the bug from biting us
 - as fmt is being standardized, we can later move to std::format()
 - commonality with the logger format syntax (indeed, we may move the logger
   to use libfmt itself)

During the conversion, some bugs were caught and fixed. These are presented in
individual patches in the patchset.

Most of the conversion was scripted, using https://github.com/avikivity/unsprint.

Some sprint() calls remain, as they were too complex for the script. They
will be converted later.
"

* tag 'fmt-1/v1' of https://github.com/avikivity/scylla:
  toplevel: convert sprint() to format()
  repair: convert sprint() to format()
  tests: convert sprint() to format()
  tracing: convert sprint() to format()
  service: convert sprint() to format()
  exceptions: convert sprint() to format()
  index: convert sprint() to format()
  streaming: convert sprint() to format()
  streaming: progress_info: fix format string
  api: convert sprint() to format()
  dht: convert sprint() to format()
  thrift: convert sprint() to format()
  locator: convert sprint() to format()
  gms: convert sprint() to format()
  db: convert sprint() to format()
  transport: convert sprint() to format()
  utils: convert sprint() to format()
  sstables: convert sprint() to format()
  auth: convert sprint() to format()
  cql3: convert sprint() to format()
  row_cache: fix bad format string syntax
  repair: fix bad format string syntax
  tests: fix bad format string syntax
  dht: fix bad format string syntax
  sstables: fix bad format string syntax
  utils: estimated_histogram: convert generated format strings to fmt
  tests: perf_fast_forward: rename "format" variable
  tests: perf_fast_forward: massage result of sprint() into std::string
  utils: i_filter: rename "format" variable
  system_keyspace: simplify complicated sprint()
  cql: convert Cql.g sprint()s to fmt
  types: get rid of PRId64 formatting
2018-11-01 13:16:17 +00:00
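
The kind of mechanical rewrite this series relies on can be sketched as a pattern translation from printf-style placeholders to fmt-style `{}`. This is an illustration only and is not the logic of the unsprint tool; `printf_to_fmt` is a hypothetical helper that handles only `%s`, `%d`, `%u` (with optional `l`/`ll`) and `%%`, and passes anything else through unchanged.

```python
import re

# Matches '%%' or a simple conversion such as %s, %d, %u, %ld, %llu.
_PRINTF_TOKEN = re.compile(r"%%|%l{0,2}[sdu]")


def _escape_braces(text):
    # Literal braces must be doubled in fmt-style patterns.
    return text.replace("{", "{{").replace("}", "}}")


def printf_to_fmt(pattern):
    """Translate a simple printf-style pattern to fmt-style '{}' syntax."""
    out = []
    pos = 0
    for match in _PRINTF_TOKEN.finditer(pattern):
        out.append(_escape_braces(pattern[pos:match.start()]))
        # '%%' is a literal percent sign; everything else becomes '{}'.
        out.append("%" if match.group() == "%%" else "{}")
        pos = match.end()
    out.append(_escape_braces(pattern[pos:]))
    return "".join(out)


converted = printf_to_fmt("%s: %d%% done")
```

Python's `str.format` uses the same brace-based placeholder style, so `converted.format("repair", 50)` is a quick way to sanity-check a translated pattern.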
Avi Kivity
a71ab365e3 toplevel: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
51ce53738f repair: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
f70ece9f88 tests: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
239ecec043 tracing: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
bb0eb9dae8 service: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
71fc5fb738 exceptions: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
7ae23d8f9b index: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
fd513c42ad streaming: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
8501e2a45d streaming: progress_info: fix format string
We try to escape % as \%, but the correct escape is %%.
2018-11-01 13:16:17 +00:00
Avi Kivity
da17c29bd3 api: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
82818758ca dht: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
7a125c6634 thrift: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
0c33d13165 locator: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
e096fa2fde gms: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
d77e044cde db: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
5f79ff0f54 transport: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
be99101f36 utils: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
455f00e993 sstables: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
eb74fe784d auth: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
cb7ee5c765 cql3: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
8cca3b2879 row_cache: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
6488b017c3 repair: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
bceff1550c tests: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
7ff5569ee8 dht: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
738e713edf sstables: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
3cf434b863 utils: estimated_histogram: convert generated format strings to fmt
Convert printf games to format games.

Note that fmt supports specifying the field width as an argument, but that
is left to a dedicated change.
2018-11-01 13:16:17 +00:00
Avi Kivity
8ca4b7abea tests: perf_fast_forward: rename "format" variable
The format local variable will soon alias with the format function which we
intend to use in the same context. Rename it away to avoid a clash.
2018-11-01 13:16:17 +00:00
Avi Kivity
7908f09148 tests: perf_fast_forward: massage result of sprint() into std::string
sprint() returns std::string(), but the new format() returns an sstring. Usually
an sstring is wanted but in this case an sstring will fail as it is added to
an std::string.

Fix the failure (after sprint->format conversion) by converting to an std::string.
2018-11-01 13:16:17 +00:00
Avi Kivity
7726ce23b7 utils: i_filter: rename "format" variable
The format variable hides the format function, which we'll soon want to use
here. Rename the format variable to unhide the function.
2018-11-01 13:16:17 +00:00
Avi Kivity
04b70a2ff8 system_keyspace: simplify complicated sprint()
update_peer_info() uses two sprint()s where one would do, which confuses
the sprint-to-fmt translator. Simplify the code by using just one call.
2018-11-01 13:16:17 +00:00
Avi Kivity
23e05a045b cql: convert Cql.g sprint()s to fmt
The only sprint() call had an extra complication due to quoting, which can be
removed now.
2018-11-01 13:16:16 +00:00
Avi Kivity
8db8c01fbe types: get rid of PRId64 formatting
It's not needed for our sprint() implementation, and gets in the way of
converting all formatting to fmt.
2018-11-01 13:16:16 +00:00
Avi Kivity
f170e3e589 Merge "dist: use perftune.py for disks tuning" from Vlad
"
Use perftune.py for tuning disks:
   - Distribute/pin disks' IRQs:
      - For NVMe drives: evenly among all present CPUs.
      - For non-NVMe drives: according to chosen tuning mode.
   - For all disks used by scylla:
      - Tune nomerges
      - Tune I/O scheduler.

It's important to tune NIC and disks together in order to keep IRQ
pinning in the same mode.

Disks are detected and tuned based on the current content of
/etc/scylla/scylla.yaml configuration file.
"

Fixes #3831.

* 'use_perftune_for_disks-v3' of https://github.com/vladzcloudius/scylla:
  dist: change the sysconfig parameter name to reflect the new semantics
  scylla_util.py::sysconfig_parser: introduce has_option()
  dist: scylla_setup and scylla_sysconfig_setup: change parameter names to reflect new semantics
  dist: don't distribute posix_net_conf.sh any more
  dist: use perftune.py to tune disks and NIC
2018-11-01 13:13:49 +00:00
Avi Kivity
96173e81e0 Update seastar submodule
* seastar c1e0e5d...c02150e (5):
  > prometheus: pass names as query parameter instead of part of the URL
  > treewide: convert printf() style formatting to fmt
  > print: add fmt_print()
  > build: Remove experimental CMake support
  > Merge "Correct and clean-up `signal_test`" from Jesse
2018-11-01 13:13:48 +00:00
Yibo Cai (Arm Technology China)
79136e895f utils/crc: calculate crc in parallel
It achieves 2.0x speedup on intel E5 and 1.1x to 2.5x speedup on
various arm64 microarchitectures.

The algorithm cuts data into blocks of 1024 bytes and calculates crc
for each block, which is further divided into three subblocks of 336
bytes (42 uint64) each, and 16 remaining bytes (2 uint64).

In each iteration, three independent CRCs are calculated, one uint64
from each subblock. This greatly increases IPC (instructions per cycle).
After subblocks are done, three crc and remaining two uint64 are
combined using carry-less multiplication to reach the final result
for one block of 1024 bytes.
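
The block layout described above can be sketched as follows. This is only an illustration of how a 1024-byte block splits into three 336-byte streams plus a 16-byte tail. It does not compute any CRC, and the helper names and the little-endian word decoding are assumptions, not taken from the patch.

```python
import struct

BLOCK_SIZE = 1024      # bytes per block
SUBBLOCK_SIZE = 336    # 42 uint64 words per subblock
TAIL_SIZE = 16         # 2 uint64 words left over

def split_block(block: bytes):
    """Split one block into three subblocks and the remaining tail."""
    assert len(block) == BLOCK_SIZE
    subblocks = [block[i * SUBBLOCK_SIZE:(i + 1) * SUBBLOCK_SIZE] for i in range(3)]
    tail = block[3 * SUBBLOCK_SIZE:]
    return subblocks, tail

def interleaved_words(block: bytes):
    """Return the per-iteration word triples and the two tail words.

    Each triple holds one uint64 from each subblock, which is the unit of
    work for the three independent CRC streams (little-endian is assumed).
    """
    subblocks, tail = split_block(block)
    words = [struct.unpack("<42Q", s) for s in subblocks]
    return list(zip(*words)), struct.unpack("<2Q", tail)
```

The sizes add up as 3 * 336 + 16 = 1024, so every byte of the block belongs to exactly one stream or to the tail.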

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1541042759-24767-1-git-send-email-yibo.cai@arm.com>
2018-11-01 10:19:32 +02:00
Vlad Zolotarov
84d341a12d dist: change the sysconfig parameter name to reflect the new semantics
We tune NIC and disks together now. Change the sysconfig parameter to
reflect this new semantics.

However if we detect an old parameter name in the scylla-server we would
still update it thereby keeping the support for old installations.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:28:13 -04:00
Vlad Zolotarov
7950062a82 scylla_util.py::sysconfig_parser: introduce has_option()
has_option() returns TRUE if a given configuration option is set.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:27:00 -04:00
Vlad Zolotarov
9a5373254a dist: scylla_setup and scylla_sysconfig_setup: change parameter names to reflect new semantics
Change the name of the corresponding parameter (--setup-nic) to reflect
the fact that we tune not just NIC now but rather NIC and disks together.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:27:00 -04:00
Vlad Zolotarov
c74e1a9368 dist: don't distribute posix_net_conf.sh any more
We don't need it since we use perftune.py directly

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:27:00 -04:00
Vlad Zolotarov
0e47d8bb1d dist: use perftune.py to tune disks and NIC
Tune disks using perftune.py together with NIC.
This is needed because disk(s) and NIC tuning has to be
performed using the same mode (for non-NVMe disks).

We tune disks based on the current content of /etc/scylla/scylla.yaml.

Don't use scylla-blocktune for optimizing disks' performance
any more.

Unite the decision to optimize the NIC and disks tuning.
Optimize or not optimize them both together.

Disable disk tuning for DPDK and "virtio" modes for now.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:27:00 -04:00
Takuya ASADA
5bf9a03d65 dist/debian: skip running dh_strip_nondeterminism
On some Fedora environments, the dh build tries to run
dh_strip_nondeterminism and fails, since Fedora does not provide such a
command.
(see:
http://jenkins.cloudius-systems.com/view/master/job/scylla-master/job/unified-deb/3/console)

To prevent the build error we need to skip it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181030062935.9930-1-syuu@scylladb.com>
2018-10-31 10:23:54 +02:00
Tomasz Grabiec
62c7685b0d Merge "Proper support for static rows in SSTables 3.x" from Vladimir
This patchset addresses two issues with static rows support in SSTables
3.x. ('mc' format):

1. Since collections are allowed in static rows, we need to check for
complex deletion, set corresponding flag and write tombstones, if any.
2. Column indices need to be partitioned for static columns the same way
they are partitioned for regular ones.

 * github.com/argenet/scylla.git projects/sstables-30/columns-proper-order-followup/v1:
  sstables: Partition static columns by atomicity when reading/writing
    SSTables 3.x.
  sstables: Use std::reference_wrapper<> instead of a helper structure.
  sstables: Check for complex deletion when writing static rows.
  tests: Add/fix comments to
    test_write_interleaved_atomic_and_collection_columns.
  tests: Add test covering interleaved atomic and collection cells in
    static row.
2018-10-30 10:36:46 +01:00
Vladimir Krivopalov
d82ac02fad tests: Add test covering interleaved atomic and collection cells in static row.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 15:01:34 -07:00
Vladimir Krivopalov
7bd95399ed tests: Add/fix comments to test_write_interleaved_atomic_and_collection_columns.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 15:00:55 -07:00
Vladimir Krivopalov
6bd738ceb1 sstables: Check for complex deletion when writing static rows.
It is possible to have collections in a static row so we need to check
for collection-wide tombstones like with clustering rows.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 14:59:19 -07:00
Vladimir Krivopalov
6b7003088a sstables: Use std::reference_wrapper<> instead of a helper structure.
No need to store column_id separately as it can be accessed from the
column_definition.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 14:58:08 -07:00
Vladimir Krivopalov
8592b834d1 sstables: Partition static columns by atomicity when reading/writing SSTables 3.x.
Collections are permitted in static rows so same partitioning as for
regular columns is required.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 10:32:02 -07:00
Takuya ASADA
2ac14dcf25 dist/redhat: prevent build error on older Fedora/CentOS
The current scylla.spec fails to build on Fedora 27, since python2-pystache is
a new package name introduced by a rename in Fedora 28.
However, Fedora 28's python2-pystache has the tag "Provides: pystache",
so we can depend on the old package name; this way scylla.spec builds on both
Fedora 27 and 28.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181028175450.31156-1-syuu@scylladb.com>
2018-10-29 11:36:40 +02:00
Yibo Cai (Arm Technology China)
1c48e3fbec utils/crc: leverage arm64 crc extension
It achieves 6.7x to 11x speedup on various arm64 microarchitectures.

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1540781879-15465-1-git-send-email-yibo.cai@arm.com>
2018-10-29 10:50:48 +02:00
Nadav Har'El
b8337f8c9d Materialized views: fix race condition in resharding while view building
When a node reshards (i.e., restarts with a different number of CPUs), and
is in the middle of building a view for a pre-existing table, the view
building needs to find the right token from which to start building on all
shards. We ran the same code on all shards, hoping they would all make
the same decision on which token to continue. But in some cases, one
shard might make the decision, start building, and make progress -
all before a second shard goes to make the decision, which will now
be different.

This resulted, in some rare cases, in the new materialized view missing
a few rows when the build was interrupted with a resharding.

The fix is to add the missing synchronization: All shards should make
the same decision on whether and how to reshard - and only then should
start building the view.

Fixes #3890
Fixes #3452

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181028140549.21200-1-nyh@scylladb.com>
2018-10-28 17:20:10 +00:00
Avi Kivity
75dbff984c Merge "Re-order columns when reading/writing SSTables 3.x" from Vladimir
"
In Cassandra, row columns are stored in a BTree that uses the following
ordering on them:
    - all atomic columns go first, then all multi-cell ones
    - columns of both types (atomic and multi-cell) are
      lexicographically ordered by name regarding each other

Scylla needs to store columns and their respective indices using the
same ordering as well as when reading them back.

Fixes #3853

Tests: unit {release}

+

Checked that the following SSTables are dumped fine using Cassandra's
sstabledump:

cqlsh:sst3> CREATE TABLE atomic_and_collection3 ( pk int, ck int, rc1 text, rc2 list<text>, rc3 text, rc4 list<text>, rc5 text, rc6 list<text>, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''};
cqlsh:sst3> INSERT INTO atomic_and_collection3 (pk, ck, rc1, rc4, rc5) VALUES (0, 0, 'hello', ['beautiful','world'], 'here');
<< flush >>

sstabledump:

[
  {
    "partition" : {
      "key" : [ "0" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 96,
        "clustering" : [ 0 ],
        "liveness_info" : { "tstamp" : "1540599270139464" },
        "cells" : [
          { "name" : "rc1", "value" : "hello" },
          { "name" : "rc5", "value" : "here" },
          { "name" : "rc4", "deletion_info" : { "marked_deleted" : "1540599270139463", "local_delete_time" : "1540599270" } },
          { "name" : "rc4", "path" : [ "45e22cb0-d97d-11e8-9f07-000000000000" ], "value" : "beautiful" },
          { "name" : "rc4", "path" : [ "45e22cb1-d97d-11e8-9f07-000000000000" ], "value" : "world" }
        ]
      }
    ]
  }
]
"

* 'projects/sstables-30/columns-proper-order/v1' of https://github.com/argenet/scylla:
  tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x.
  sstables: Re-order columns (atomic first, then collections) for SSTables 3.x.
  sstables: Use a compound structure for storing information used for reading columns.
2018-10-28 10:56:09 +02:00
Rafi Einstein
32525f2694 Space-Saving Top-k algorithm for handling stream summary statistics
Based on the following implementation ([2]) for the Space-Saving algorithm from [1].
[1] http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf
[2] https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java

The algorithm keeps a map between keys seen and their counts, keeping a bound on the number of tracked keys.
Replacement policy evicts the key with the lowest count while inheriting its count, and recording an estimation
of the error which results from that.
This error estimation can be later used to prove if the distribution we arrived at corresponds to the real top-K,
which we can display alongside the results.
Accuracy depends on the number of tracked keys.
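
A minimal sketch of the replacement policy described above. It is not the code from this patch or from stream-lib: the class and method names are hypothetical, and it finds the minimum by a linear scan, whereas the referenced implementations keep counters in ordered buckets.

```python
class SpaceSaving:
    """Bounded top-k counter: at most `capacity` keys are tracked."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}   # key -> estimated count
        self.errors = {}   # key -> maximum overestimation of that count

    def add(self, key, inc=1):
        if key in self.counts:
            self.counts[key] += inc
        elif len(self.counts) < self.capacity:
            self.counts[key] = inc
            self.errors[key] = 0
        else:
            # Evict the key with the lowest count; the newcomer inherits
            # that count and records it as its error bound.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[key] = floor + inc
            self.errors[key] = floor

    def top(self, k):
        """Return up to k (key, count, error) tuples, highest count first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return [(key, count, self.errors[key]) for key, count in ranked[:k]]
```

A key whose count minus its error still exceeds the count of the next-ranked key is guaranteed to be ranked correctly, which is how the error estimate can be shown alongside the results.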

Introduced as part of 'nodetool toppartition' query implementation.

Refs #2811
Message-Id: <20181027220937.58077-1-rafie@scylladb.com>
2018-10-28 10:10:28 +02:00
Vladimir Krivopalov
f3dc2a4927 tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-26 16:58:34 -07:00
Vladimir Krivopalov
7e56e9fca6 sstables: Re-order columns (atomic first, then collections) for SSTables 3.x.
In Cassandra, row columns are stored in a BTree that uses the following
ordering on them:
    - all atomic columns go first, then all multi-cell ones
    - columns of both types (atomic and multi-cell) are
      lexicographically ordered by name regarding each other

Since schema already has all columns lexicographically sorted by name,
we only need to stably partition them by atomicity for that.
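
The ordering can be illustrated with a stable partition over a name-sorted column list. This is a sketch only: the `Column` type and `order_for_sstable` helper are hypothetical stand-ins, not Scylla's schema classes.

```python
from typing import NamedTuple

class Column(NamedTuple):
    """Hypothetical stand-in for a column definition."""
    name: str
    is_atomic: bool

def order_for_sstable(columns):
    """Stably partition name-sorted columns: atomic first, then multi-cell.

    sorted() is stable and False sorts before True, so the lexicographic
    order inside each group is preserved.
    """
    return sorted(columns, key=lambda c: not c.is_atomic)
```

For columns rc1..rc6 where the even-numbered ones are lists, this yields rc1, rc3, rc5, rc2, rc4, rc6.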

Fixes #3853

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-26 15:58:33 -07:00
Vladimir Krivopalov
210507b867 sstables: Use a compound structure for storing information used for reading columns.
This representation makes it easier to operate with compound structures
instead of separate values that were stored in multiple containers.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-26 11:32:44 -07:00
Tomasz Grabiec
cf2d5c19fb Merge "Properly write static rows missing columns for SSTables 3.x." from Vladimir
Before this fix, write_missing_columns() helper would always deal with
regular columns even when writing static rows.

This would cause errors on reading those files.

Now, the missing columns are written correctly for regular and static
rows alike.

* github.com/argenet/scylla.git projects/sstables-30/fix-writing-static-missing-columns/v1:
  schema: Add helper method returning the count of columns of specified
    kind.
  sstables: Honour the column kind when writing missing columns in 'mc'
    format.
  tests: Add test for a static row with missing columns (SStables 3.x.).
2018-10-26 09:06:01 +02:00
Vladimir Krivopalov
9843343ad8 tests: Add test for a static row with missing columns (SStables 3.x.).
This is a test case for #3892.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-25 17:16:31 -07:00
Vladimir Krivopalov
44043cfd44 sstables: Honour the column kind when writing missing columns in 'mc' format.
Previously, we've been writing the wrong missing columns indices for
static rows because write_missing_columns() explicitly used regular
columns internally.

Now, it takes the proper column kind into account.

Fixes #3892

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-25 17:09:09 -07:00
Vladimir Krivopalov
399f815a89 schema: Add helper method returning the count of columns of specified kind.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-25 17:07:20 -07:00
Tomasz Grabiec
dcac0ac80c tests: sstables: Verify no index reads during scans which dont need it
Reproducer for https://github.com/scylladb/scylla/issues/3868

Message-Id: <1540459849-27612-2-git-send-email-tgrabiec@scylladb.com>
2018-10-25 16:14:45 +03:00
Tomasz Grabiec
46d0c157ae tests: sstables: Extract make_sstable_mutation_source()
Message-Id: <1540459849-27612-1-git-send-email-tgrabiec@scylladb.com>
2018-10-25 16:14:39 +03:00
Tomasz Grabiec
fe0a0bdf1e utils/loading_shared_values: Add missing stat update call in one of the cases
Message-Id: <1540469591-32738-1-git-send-email-tgrabiec@scylladb.com>
2018-10-25 15:15:05 +03:00
Duarte Nunes
e46ef6723b Merge seastar upstream
* seastar d152f2d...c1e0e5d (6):
  > scripts: perftune.py: properly merge parameters from the command line and the configuration file
  > fmt: update to 5.2.1
  > io_queue: only increment statistics when request is admitted
  > Adds `read_first_line.cc` and `read_first_line.hh` to CMake.
  > fstream: remove default extent allocation hint
  > core/semaphore: Change the access of semaphore_units main ctor

Due to a compile-time fight between fmt and boost::multiprecision, a
lexical_cast was added to mediate.

sprint("%s", var) no longer accepts numeric values, so some sprint()s were
converted to format() calls. Since more may be lurking we'll need to remove
all sprint() calls.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-25 12:53:30 +03:00
Benny Halevy
2a57c454f2 update_compaction_history: handle execute_cql exception
Fixes #3774

Tested using view_schema_test with and without injecting an exception in
modification_statement::do_execute for "compaction_history".

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181017105758.9602-3-bhalevy@scylladb.com>
2018-10-24 18:39:53 +03:00
Benny Halevy
44e5c2643b compaction_manager::maybe_stop_on_error: add stop_iteration param
some call sites are stopping in any case, regardless of what
maybe_stop_on_error returns. Reflect that in the log messages.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181017105758.9602-2-bhalevy@scylladb.com>
2018-10-24 18:39:52 +03:00
Avi Kivity
8210f4c982 Merge "Properly writing/reading shadowable deletions with SSTables 3.x." from Vladimir
"
This patchset addresses two problems with shadowable deletions handling
in SSTables 3.x. ('mc' format).

Firstly, we previously did not set a flag indicating the presence of
extended flags byte with HAS_SHADOWABLE_DELETION bitmask on writing.
This would break subsequent reading and cause all types of failures up
to crash.

Secondly, when reading rows with this extended flag set, we need to
preserve that information and create a shadowable_tombstone for the row.

Tests: unit {release}
+

Verified manually with 'hexdump' and using modified 'sstabledump' that
second (shadowable) tombstone is written for MV tables by Scylla.

+
DTest (materialized_views_test.py:TestMaterializedViews.hundred_mv_concurrent_test)
that originally failed due to this issue has successfully passed locally.
"

* 'projects/sstables-30/shadowable-deletion/v4' of https://github.com/argenet/scylla:
  tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x.
  tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x.
  sstables: Support Scylla-specific extension for writing shadowable tombstones.
  sstables: Introduce a feature for shadowable tombstones in Scylla.db.
  memtable: Track regular and shadowable tombstones separately in encoding_stats_collector.
  sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion.
  sstables: Support checking row extension flags for Cassandra shadowable deletion.
2018-10-24 18:20:16 +03:00
Tomasz Grabiec
9e756d3863 sstable_mutation_reader: Do not read partition index when scanning
Even when we're using a full clustering range, need_skip() will return
true when we start a new partition and advance_context() will be
called with position_in_partition::before_all_clustered_rows(). We
should detect that there is no need to skip to that position before
the call to advance_to(*_current_partition_key), which will read the
index page.

Fixes #3868.

Message-Id: <1539881775-8578-1-git-send-email-tgrabiec@scylladb.com>
2018-10-24 15:55:13 +03:00
Avi Kivity
925ef48fce Merge "Use relocatable package to generate .rpm/.deb" from Takuya
"
This patchset adds support generating .rpm/.deb from relocatable
package.
"

* 'reloc_rpmdeb_v5' of https://github.com/syuu1228/scylla:
  configure.py: run create-relocatable-package.py everytime
  configure.py: add SCYLLA-RELEASE-FILE/SCYLLA-VERSION-FILE targets
  configure.py: use {mode} instead of $mode on scylla-package.tar.gz build target
  dist/ami: build relocatable .rpm when --localrpm specified
  dist/debian: use relocatable package to produce .deb
  dist/redhat: use relocatable package to produce .rpm
  install-dependencies.sh: add libsystemd as dependencies
  install.sh: drop hardcoded distribution name, add --target option to specify distribution
  build: add script to build relocatable package
  build: compress relocatable package
  build: add files on relocatable package to support generating .rpm/.deb
2018-10-24 14:44:09 +03:00
Takuya ASADA
59e4900ca7 configure.py: run create-relocatable-package.py everytime
Right now we don't have dependencies for dist/, so ninja is not able to detect
changes under the directory.
To update the relocatable package even when the only change is under dist/, we need
to run create-relocatable-package.py every time.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
6e1617d71c configure.py: add SCYLLA-RELEASE-FILE/SCYLLA-VERSION-FILE targets
Re-generate the scylla version files when they are removed, since these files
are required for the relocatable package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
0cb8a4cb0c configure.py: use {mode} instead of $mode on scylla-package.tar.gz build target
It's better to use {mode} to extract fixed path just like other build targets do.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
929f03533d dist/ami: build relocatable .rpm when --localrpm specified
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
f3c3b9183c dist/debian: use relocatable package to produce .deb
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
8e2dc9e4f4 dist/redhat: use relocatable package to produce .rpm
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
5fa7ed52e3 install-dependencies.sh: add libsystemd as dependencies
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
ce4067ca02 install.sh: drop hardcoded distribution name, add --target option to specify distribution
To allow users to build .rpm for Fedora, we need to support specifying the distribution.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
6319229020 build: add script to build relocatable package
To build relocatable package easier, add build_reloc.sh to build it in
one command.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
a502715b29 build: compress relocatable package
The Debian packaging system requires the source package to be a compressed tar
file, so let's use .gz compression.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
85fed12c07 build: add files on relocatable package to support generating .rpm/.deb
We are missing some files on relocatable package to generate .rpm/.deb,
add them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Paweł Dziepak
637b9a7b3b atomic_cell_or_collection: make operator<< show cell content
After the new in-memory representation of cells was introduced there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This makes debugging more inconvenient
and time-consuming. This patch fixes the problem. Schema is propagated
to the atomic_cell_or_collection printer and the full content of the
cell is printed.

Fixes #3571.

Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
2018-10-24 13:29:51 +03:00
Avi Kivity
a9836ad758 thrift: limit message size
Limit message size according to the configuration, to avoid a huge message from
allocating all of the server's memory.

We also need to limit memory used in aggregate by thrift, but that is left to
another patch.

Fixes #3878.
Message-Id: <20181024081042.13067-1-avi@scylladb.com>
2018-10-24 09:57:58 +01:00
Raphael S. Carvalho
c958294991 tests/sstable_perf: fix compaction mode for a multi shard instance
Compaction mode fails if more than one shard is used because it doesn't
make sure sstables used as input for compaction only contain local keys.
Therefore, the sstable generated by compaction has fewer keys than expected
because non-local keys are purged out.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181022225153.12029-1-raphaelsc@scylladb.com>
2018-10-24 09:58:34 +03:00
Glauber Costa
fc5635100d install seastar-addr2line and seastar-cpumap into scylla packages
It is very useful for investigations in scylla issues, and we have
been moving those scripts manually when needed. Make it officially
part of the scylla package.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181023184400.23187-1-glauber@scylladb.com>
2018-10-24 09:52:17 +03:00
Amnon Heiman
6bcde841bd scyllatop: Nicer error message when fail opening a log file or connecting
scyllatop uses a log file; if opening the file fails, the user should
get a clear message, not an exception trace.

The same is true for connecting to scylla.

After this patch, the following is printed:
$ scyllatop.py -L /usr/lib/scyllatop.log
scyllatop failed opening log file: '/usr/lib/scyllatop.log' With an error: [Errno 13] Permission denied: '/usr/lib/scyllatop.log'
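
The behaviour can be sketched like this. It is not scyllatop's actual code: `open_log` is a hypothetical helper, and only the message format is taken from the output shown above.

```python
def open_log(path):
    """Return (file, None) on success or (None, message) on failure.

    The caller prints the message and exits instead of letting the
    exception propagate as a stack trace.
    """
    try:
        return open(path, "a"), None
    except OSError as exc:
        message = "scyllatop failed opening log file: '{}' With an error: {}".format(path, exc)
        return None, message
```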

Fixes #3860

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181021065525.22749-1-amnon@scylladb.com>
2018-10-24 09:50:45 +03:00
Vlad Zolotarov
4d1bb719a4 config: enable hinted handoff by default
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181019180401.12400-1-vladz@scylladb.com>
2018-10-24 09:47:36 +03:00
Vladimir Krivopalov
ad599d4342 tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
3dcf0acfc2 tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
759d36a26e sstables: Support Scylla-specific extension for writing shadowable tombstones.
The original SSTables 'mc' format, as defined in Cassandra, does not provide
a way to store shadowable deletion in addition to regular row deletion
for materialized views.
It is essential to store it because of known corner-case issues that
otherwise appear.

For this to work, we introduce a Scylla-specific extended flag to be set
in SSTables in 'mc' format that indicates a shadowable tombstone is
written after the regular row tombstone.

This is deemed to be safe because shadowable tombstones are specific to
materialized views and MV tables are not supposed to be imported or
exported.

Note that a shadowable tombstone can be written without a regular
tombstone as well as along with it.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
e168433945 sstables: Introduce a feature for shadowable tombstones in Scylla.db.
This is used to indicate that the SSTables being read may contain a
Scylla-specific HAS_SCYLLA_SHADOWABLE_TOMBSTONE extended flag set.

If the feature is not enabled, we should not honour this flag.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
a95ba2f38a memtable: Track regular and shadowable tombstones separately in encoding_stats_collector.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
b7d48c1ccd sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion.
This flag can be only set in MV tables that are not supposed to be
imported to Scylla.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
8f79f76116 sstables: Support checking row extension flags for Cassandra shadowable deletion.
This flag can be only used in MV tables that are not supposed to be
imported to Scylla.
Since Scylla representation of shadowable tombstones differs from that
of Cassandra, such SSTables are rejected on read and Scylla never sets
this flag on writing.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Avi Kivity
1533487ba8 Merge "hinted handoff: give a sender a low priority" from Vlad
"
Hinted handoff should not overpower regular flows like READs, WRITEs or
background activities like memtable flushes or compactions.

To achieve this, put its sending in the STREAMING CPU scheduling
group and its commitlog object into the STREAMING I/O scheduling group.

Fixes #3817
"

* 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla:
  db::hints::manager: use "streaming" I/O scheduling class for reads
  commitlog::read_log_file(): set a read I/O priority class explicitly
  db::hints::manager: add hints sender to the "streaming" CPU scheduling group
2018-10-23 16:55:05 +00:00
Raphael S. Carvalho
65e8853e8d tests: test that sstable cleanup wont get rid of key which token belongs to node
Commit 1ce52d54 fixed sort order of local ranges, which is needed for cleanup to
work properly because it relies on that to perform a binary search.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181023031322.22763-1-raphaelsc@scylladb.com>
2018-10-23 16:55:05 +00:00
Avi Kivity
d9e0ea6bb0 config: mark range_request_timeout_in_ms and request_timeout_in_ms as Used
This makes them available in scylla --help.

Fixes #3884.
Message-Id: <20181023101150.29856-1-avi@scylladb.com>
2018-10-23 11:52:03 +01:00
Paweł Dziepak
c94d2b6aa6 cql3: restore original timeout behaviour for aggregate queries
Commit 1d34ef38a8 "cql3: make pagers use
time_point instead of duration" has unintentionally altered the timeout
semantics for aggregate queries. Such requests fetch multiple pages before
sending a response to the client. Originally, each of those fetches had
a timeout-duration to finish, after the problematic commit the whole
request needs to complete in a single timeout-duration. This,
unsurprisingly, makes some queries that were successful before fail with
a timeout. This patch restores the original behaviour.
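
The difference between the two timeout semantics can be sketched with a simulated clock. The function name, parameters, and time units are hypothetical and only model the behaviour described above, not the pager code.

```python
def fetch_all_pages(page_costs, timeout, per_page=True):
    """Return True if every page fetch meets its deadline.

    page_costs: simulated time each page fetch takes.
    per_page=True  -> each fetch gets a fresh timeout (original behaviour).
    per_page=False -> one deadline covers the whole request (the regression).
    """
    now = 0.0
    deadline = now + timeout
    for cost in page_costs:
        if per_page:
            deadline = now + timeout  # restart the clock for this fetch
        now += cost
        if now > deadline:
            return False
    return True
```

With three pages costing 4 time units each and a timeout of 5, the per-page variant succeeds while the single-deadline variant times out on the second page.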

Fixes #3877.

Message-Id: <20181022125318.4384-1-pdziepak@scylladb.com>
2018-10-23 12:52:42 +03:00
Takuya ASADA
950dbdb466 dist/common/sysctl.d: add new conf file to set fs.aio-max-nr
We need to raise fs.aio-max-nr to a larger value since Seastar may allocate
more than 65535 AIO events (= kernel default value)

Fixes #3842

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181023030449.15445-1-syuu@scylladb.com>
2018-10-23 11:01:07 +03:00
Tomasz Grabiec
a34e417874 Merge "Stabilise perf_fast_forward results" from Paweł
This series attempts to make fragments per second results reported by
perf_fast_forward more stable. That includes running each test case
multiple times and reporting median, median absolute deviation, maximum
and minimum value. That should allow to relatively easily assess how
repeatable the presented results are. Moreover, since perf_fast_forward
performs I/O operations it is important that they do not introduce any
excessive noise to the results. The location of the data directory
is made configurable so that the user can choose less noisy disk or a
ramdisk.

 * github.com/pdziepak/scylla.git stabilise-perf_fast_forward/v3:
  tests/perf_fast_forward: make fragments/s measurements more stable
  tests/perf_fast_forward: make data directory location configurable
2018-10-22 18:33:25 +02:00
Avi Kivity
d5d831f41b tests: network_topology_strategy_test: remove quadratic complexity
network_topology_strategy test creates a ring with hundreds of tokens (and one
token per node). Then, for each token, it calls get_primary_ranges(), which in
turn walks the token ring. However, because each datacenter occupies a
disjoint token range, this walk practically has to walk the entire ring until
it collects enough endpoints for each datacenter. The whole thing takes 15 minutes.

Speed this up by randomizing the token<->dc relationship. This is more realistic,
and switches the algorithm to be O(token count), and now it completes in less
than a minute (still not great, but better).
Message-Id: <20181022154026.19618-1-avi@scylladb.com>
2018-10-22 17:06:57 +01:00
Paweł Dziepak
63a705dca3 tests/perf_fast_forward: make data directory location configurable
perf_fast_forward populates perf_fast_forward_output with some data and
then runs performance tests that read it. That makes the disk a
significant factor in the final result and may make the results less
repeatable. This patch adds a flag that allows setting the location
of the data directory so that the user can opt for a less noisy disk
or a ramdisk.
2018-10-22 16:52:58 +01:00
Paweł Dziepak
29e872f865 tests/perf_fast_forward: make fragments/s measurements more stable
perf_fast_forward performs various operations, many of which involve
sstable reads, and verifies through metrics that there weren't any
unnecessary IO operations. It also provides fragments per second
measurements for the tests it runs. However, since some of the tests are
very short and involve IO, those values vary a lot, which makes them not
very useful.

This commit attempts to stabilise those results. Each test case is run
multiple times (by default for a second, but at least 3 times) and shows
median, median absolute deviation, maximum and minimum value. This
should allow assessing whether the changes in the results are just noise
or a real regression or improvement.
2018-10-22 16:52:58 +01:00
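The statistics named in 29e872f865 (median and median absolute deviation) can be computed as in the sketch below. The helper names are hypothetical and are not taken from perf_fast_forward.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Median of a sample. Takes a copy so the caller's data is not reordered.
// Precondition: values is non-empty.
double median(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    const auto n = values.size();
    return n % 2 ? values[n / 2] : (values[n / 2 - 1] + values[n / 2]) / 2.0;
}

// Median absolute deviation: the median of |x - median(values)|.
double median_absolute_deviation(const std::vector<double>& values) {
    const double m = median(values);
    std::vector<double> deviations;
    deviations.reserve(values.size());
    for (double v : values) {
        deviations.push_back(std::fabs(v - m));
    }
    return median(std::move(deviations));
}
```

Unlike the mean and standard deviation, both values are barely affected by a single outlier run, which is useful when short IO-bound tests occasionally hit a slow disk.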
Duarte Nunes
f3a5ec0fd9 db/view: Don't copy keyspace name
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181022104527.14555-1-duarte@scylladb.com>
2018-10-22 13:00:00 +02:00
George Kollias
c2343dc841 Make restricting reader fill_buffer more efficient
Currently, restricting_mutation_reader::fill_buffer just reads
lower-layer reader's fragments one by one without doing any further
transformations. This change just swaps the parent-child buffers in a
single step, as suggested in #3604, hence removing any possible
per-fragment overhead.

I couldn't find any test that exercises restricting_mutation_reader as
a mutation source, so I added test_restricted_reader_as_mutation_source
in mutation_reader_test.

Tests: unit (release), though these 4 tests are failing regardless of
my changes (they fail on master for me as well): snitch_reset_test,
sstable_mutation_test, sstable_test, sstable_3_x_test.

Fixes: #3604

Signed-off-by: George Kollias <georgioskollias@gmail.com>
Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>
2018-10-22 11:36:54 +03:00
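The idea behind c2343dc841 can be sketched with plain standard containers. The function names and the use of std::deque are assumptions for illustration; the real reader buffers are a different type.

```cpp
#include <deque>
#include <utility>

// Per-fragment approach: move every element from the child buffer to the
// parent buffer one at a time.
template <typename T>
void fill_one_by_one(std::deque<T>& parent, std::deque<T>& child) {
    while (!child.empty()) {
        parent.push_back(std::move(child.front()));
        child.pop_front();
    }
}

// Swap approach: exchange the two buffers in a single step.
// Assumes the parent buffer is empty, so no fragments are lost or reordered.
template <typename T>
void fill_by_swap(std::deque<T>& parent, std::deque<T>& child) {
    parent.swap(child);
}
```

Both leave the parent holding the child's fragments in the same order, but the swap does not touch individual elements.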
Duarte Nunes
3fe92663d4 Merge 'Fix for a select statement with filtered columns' from Eliran
"
This patchset fixes #3803. When a select statement with filtering
is executed and the column that is needed for the filtering is not
present in the select clause, rows that should have been filtered out
according to this column will still be present in the result set.

Tests:
 1. The testcase from the issue.
 2. Unit tests (release) including the
 newly added test from this patchset.
"

* 'issues/3803/v10' of https://github.com/eliransin/scylla:
  unit test: add test for filtering queries without the filtered column
  cql3 unit test: add assertion for the number of serialized columns
  cql3: ensure retrieval of columns for filtering
  cql3: refactor find_idx to be part of statement restrictions object
  cql3: add prefix size common functionality to all clustering restrictions
  cql3: rename selection metadata manipulation functions
2018-10-21 09:53:37 +01:00
Eliran Sinvani
145f931ae7 unit test: add test for filtering queries without the filtered column
Test the usecase where the column that the filtering operates on
is not a part of the select clause. The expected result is a set
containing the columns of the select clause with the additional
columns for filtering marked as non serializable.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:41:46 +03:00
Eliran Sinvani
86637a1d0d cql3 unit test: add assertion for the number of serialized columns
The result sets that the assertions are performed against
are result sets before serialization to the user and therefore
contain also columns that will not be serialized and sent as
the query's final result. The patch adds an assertion on the
number of columns that will be present in the final serialized
result.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:41:46 +03:00
Eliran Sinvani
fd422c954e cql3: ensure retrieval of columns for filtering
When a query that needs filtering is executed, the columns
that the coordinator is filtering by have to be retrieved. The
columns should be retrieved even if they are not used for
ordering or named in the actual select clause.
If the columns are missing from the result set, then any
filtering that restricts the missing column will not take
place.

Fixes #3803

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:41:46 +03:00
Eliran Sinvani
3e036e2c8c cql3: refactor find_idx to be part of statement restrictions object
find_idx calculates the index that will be used in the statement if
indexes are to be used. In the static form it requires redundant
information (the schema is already contained within the statement
restrictions object). In addition find_idx will need to be used for
filtering in order not to include redundant selectors in the selection
objects. This change refactors find_idx to run under the statement
restrictions object and changes its scope from private to public.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:40:24 +03:00
Eliran Sinvani
4496086bf1 cql3: add prefix size common functionality to all clustering restrictions
Up until now, knowing the prefix size, which is used to determine
whether filtering is needed, was implemented only for single-column
clustering restrictions. The patch adds a function to calculate the
prefix size for all types of clustering key restrictions given the
schema.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:39:57 +03:00
Vlad Zolotarov
a87c11bad2 storage_proxy::query_result_local: create a single tracing span on a replica shard
Every call of a tracing::global_trace_state_ptr object instead of a
tracing::tracing_state_ptr or a call to tracing::global_trace_state_ptr::get()
creates a new tracing session (span) object.

This should never be done unless query handling moves to a different shard.

Fixes #3862

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181018003500.10030-1-vladz@scylladb.com>
2018-10-19 16:47:17 +00:00
Tomasz Grabiec
fc37b80d24 Merge "Correctly handle dropped columns in SSTable 3" from Piotr J.
Previously we were making assumptions about missing columns
(the size of its value, whether it's a collection or a counter) but
they didn't have to be always true. Now we're using column type
from serialization header to use the right values.

Fixes #3859

* seastar-dev.git haaawk/projects/sstables-30/handling-dropped-columns/v4:
  sstables 3: Correctly handle dropped columns in column_translation
  sstables 3: Add test for dropped columns handling
2018-10-19 16:47:17 +00:00
Duarte Nunes
3a53b3cebc Merge 'hinted handoff: add manager::state and split storing and replaying enablement' from Vlad
"
Refs #3828
(Probably fixes it)

We found a few flaws in the way we enable hints replaying.
First of all it was allowed before manager::start() is complete.
Then, since manager::start() is called after messaging_service is
initialized there was a time window when hints are rejected and this
creates an issue for MV.

Both issues above were found in the context of #3828.

This series fixes them both.

Tested {release}:
dtest: materialized_views_test.py:TestMaterializedViews.write_to_hinted_handoff_for_views_test
dtest: hintedhandoff_additional_test.py
"

* 'hinted_handoff_dont_create_hints_until_started-v1' of https://github.com/vladzcloudius/scylla:
  hinted handoff: enable storing hints before starting messaging_service
  db::hints::manager: add a "started" state
  db::hints::manager: introduce a _state
2018-10-19 16:47:16 +00:00
Avi Kivity
1ce52d5432 locator: fix abstract_replication_strategy::get_ranges() and friends violating sort order
get_ranges() is supposed to return ranges in sorted order. However, a35136533d
broke this and returned the range that was supposed to be last in the second
position (e.g. [0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9]). This broke cleanup, which
relied on the sort order to perform a binary search. Other users of the
get_ranges() family did not rely on the sort order.

Fixes #3872.
Message-Id: <20181019113613.1895-1-avi@scylladb.com>
2018-10-19 16:47:12 +00:00
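Why the sort order matters for 1ce52d5432 can be shown with a small sketch of a binary search over ranges. The token_range struct and owns_token() are hypothetical stand-ins, not the types used by cleanup.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a token range: the interval (start, end].
struct token_range {
    int64_t start;
    int64_t end;
};

// Returns true if `token` falls inside one of `ranges`.
// Precondition: ranges do not overlap and are sorted by `end`.
bool owns_token(const std::vector<token_range>& ranges, int64_t token) {
    // Find the first range whose end is not before the token.
    auto it = std::lower_bound(ranges.begin(), ranges.end(), token,
        [](const token_range& r, int64_t t) { return r.end < t; });
    return it != ranges.end() && it->start < token;
}
```

With the ranges sorted as (0, 10], (20, 30], (40, 50], token 25 is found. If the last range is moved to the second position, the search lands on (40, 50] and reports token 25 as not owned, which is the kind of wrong answer that would make cleanup discard keys it should keep.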
Vlad Zolotarov
aca0882a3f hinted handoff: enable storing hints before starting messaging_service
When messaging_service is started we may immediately receive a mutation
from another node (e.g. in the MV update context). If hinted handoff is not
ready to store hints at that point we may fail some of MV updates.

We are going to resolve this by start()ing hints::managers before we
start messaging_service and blocking hints replaying until all relevant
objects are initialized.

Refs #3828

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-18 16:49:58 -04:00
Vlad Zolotarov
cff4186517 db::hints::manager: add a "started" state
Hinting is allowed after "started" before "stopping".
Hints that are attempted to be stored outside this time frame are going to
be dropped.

Refs #3828

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-18 16:41:36 -04:00
Vlad Zolotarov
fb513a4b23 db::hints::manager: introduce a _state
Introduce a multi-bit state field. In this patch it replaces the _stopping
boolean. We are going to add more states in the following patches.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-18 16:41:33 -04:00
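The multi-bit state introduced in fb513a4b23 and the "started"/"stopping" rule from cff4186517 might look roughly like the sketch below. The class and member names are assumptions; the real hints manager may use a different representation.

```cpp
#include <cstdint>

// Hypothetical state bits; each state occupies its own bit so that
// several can be set at once.
enum class state : uint8_t {
    started  = 1 << 0,
    stopping = 1 << 1,
};

class manager_state {
    uint8_t _bits = 0;
public:
    void set(state s) { _bits |= static_cast<uint8_t>(s); }
    bool has(state s) const { return (_bits & static_cast<uint8_t>(s)) != 0; }
    // Hinting is allowed after "started" and before "stopping".
    bool can_hint() const { return has(state::started) && !has(state::stopping); }
};
```

A freshly constructed object rejects hints, accepts them once `started` is set, and rejects them again after `stopping` is set.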
Piotr Jastrzebski
e94254b563 sstables 3: Add test for dropped columns handling
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-18 19:13:58 +02:00
Piotr Jastrzebski
cafb3dc2ae sstables 3: Correctly handle dropped columns in column_translation
Previously we were making assumptions about missing columns
(the size of its value, whether it's a collection or a counter) but
they didn't have to be always true. Now we're using column type
from serialization header to use the right values.

Fixes #3859

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-18 19:13:44 +02:00
Eliran Sinvani
ded3a03356 cql3: rename selection metadata manipulation functions
In the past the addition of non serializable columns was being used
only for post ordering of result sets. The newly added ALLOW FILTERING
feature will need to use these functions for other post-processing operations,
i.e. filtering. The renaming accounts for the new and existing uses of the
functions.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-18 17:52:04 +03:00
Avi Kivity
472afea6cd Update seastar submodule
* seastar 4669469...d152f2d (5):
  > build: don't link with libgcc_s explicitly
  > scheduling: add std::hash<seastar::scheduling_group>
  > prometheus: Allow preemption between each metric
  > Merge "improve memory detection in containers" from Juliana
  > Merge "perf_tests: produce json reports" from Paweł
2018-10-18 14:55:18 +03:00
Duarte Nunes
7610cedc34 Merge "db/hints: Expose current backlog" from Duarte
"
Hints are stored on disk by a hints::manager, ensuring they are
eventually sent. A hints::resource_manager ensures the hints::managers
it tracks don't consume more than their allocated resources by
monitoring disk space and disabling new hints if needed. This series
fixes some bugs related to the backlog calculation, but mainly exposes
the backlog through a hints::manager so upper layers can apply flow
control.

Refs #2538
"

* 'hh-manager-backlog/v3' of https://github.com/duarten/scylla:
  db/hints/manager: Expose current backlog
  db/hints/manager: Move decision about blocking hints to the manager
  db/hints/resource_manager: Correctly account resources in space_watchdog
  db/hints/resource_manager: Replace timer with seastar::thread
  db/hints/resource_manager: Ensure managers are correctly registered
  db/hints/resource_manager: Fix formatting
  db/hints: Disallow moving or copying the managers
2018-10-16 20:35:34 +01:00
Duarte Nunes
624472d16a db/hints/manager: Expose current backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:35:00 +01:00
Duarte Nunes
6dcb7a39d4 db/hints/manager: Move decision about blocking hints to the manager
The space_watchdog enables or disables hints for the managers
associated with a particular device. We encapsulate this decision
inside the hints::managers by introducing the update_backlog()
function.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:35:00 +01:00
Duarte Nunes
207c9c8e38 db/hints/resource_manager: Correctly account resources in space_watchdog
A db::hints::resource_manager manages the resources for one or two
db::hints::managers. Each of these can be using the same or different
devices. The db::hints::space_watchdog periodically checks whether
each manager is within their resource allocation, and if not disables
it.

The watchdog iterates over the managers and accounts for the total
size they are using. This is wrong, since it can account in the same
variable the size consumed by managers using different devices.

We fix this while taking advantage of the fact that on_timer is now
called in the context of a seastar::thread, instead of using future
combinators.

Fixes #3821

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:34:54 +01:00
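The accounting fix in 207c9c8e38 amounts to keeping one total per device rather than a single shared total. A minimal sketch, with a hypothetical manager_usage struct and integer device ids standing in for the real types:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical summary of one hints manager: the device it writes to and
// the number of bytes it currently occupies there.
struct manager_usage {
    int device;
    uint64_t bytes;
};

// Sums the space used per device, so that managers on different devices
// are never accounted in the same variable.
std::unordered_map<int, uint64_t> usage_per_device(const std::vector<manager_usage>& managers) {
    std::unordered_map<int, uint64_t> totals;
    for (const auto& m : managers) {
        totals[m.device] += m.bytes;
    }
    return totals;
}
```

Each device's total can then be compared against that device's own limit.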
Duarte Nunes
25d266bdc1 db/hints/resource_manager: Replace timer with seastar::thread
Will make on_timer() much simpler to allow fixing a bug in subsequent
patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Duarte Nunes
278aa13bb0 db/hints/resource_manager: Ensure managers are correctly registered
Registering a manager for a new device used
std::unordered_map::emplace(), which may not insert the specified
value if one with the same key has already been added. This could
happen if both managers were using the same device and the fiber
deferred in-between adding them.

Found during code reading. Could cause hints to not be disabled for an
overloaded manager.

Fixes #3822

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
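The std::unordered_map::emplace() behaviour behind 278aa13bb0 is easy to reproduce in isolation. The registry type and function names below are hypothetical; only the standard-library behaviour is the point.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

using device_id = int;
using manager_set = std::unordered_set<std::string>;
using registry = std::unordered_map<device_id, manager_set>;

// Problematic: if `dev` is already present, emplace() leaves the map
// untouched and the new manager is silently dropped.
void register_with_emplace(registry& reg, device_id dev, const std::string& manager) {
    reg.emplace(dev, manager_set{manager});
}

// Safer: look up (or default-construct) the set for `dev`, then insert into it.
void register_with_insert(registry& reg, device_id dev, const std::string& manager) {
    reg[dev].insert(manager);
}
```

Registering two managers for the same device with the first function leaves only the first one in the map; the second function keeps both.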
Duarte Nunes
9e3b09cf48 db/hints/resource_manager: Fix formatting
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Duarte Nunes
622ac734da db/hints: Disallow moving or copying the managers
Disable the copy and move ctors and assignment operators for both the
hints::manager and the hints::resource_manager.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Glauber Costa
7edae5421d sstables: print sstable path in case of an exception
Without that, we don't know where to look for the problems

Before:

 compaction failed: sstables::malformed_sstable_exception (Too big ttl: 3163676957)

After:

 compaction_manager - compaction failed: sstables::malformed_sstable_exception (Too big ttl: 4294967295 in sstable /var/lib/scylla/data/system_traces/events-8826e8e9e16a372887533bc1fc713c25/mc-832-big-Data.db)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181016181004.17838-1-glauber@scylladb.com>
2018-10-16 20:31:20 +01:00
Asias He
7f826d3343 streaming: Expose reason for streaming
On receiving a mutation_fragment or a mutation triggered by a streaming
operation, we pass an enum stream_reason to notify the receiver what
the streaming is used for, so the receiver can decide on further operations,
e.g., send view updates, beyond applying the streaming data on disk.

Fixes #3276
Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>
2018-10-15 22:03:28 +01:00
Benny Halevy
7eef527769 handle both special token_kinds in dht::tri_compare
Handle the before_all_keys and after_all_keys token_kind
at the highest layer before calling into the virtual
i_partitioner::tri_compare that is not set up to handle these cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181015165612.29356-1-bhalevy@scylladb.com>
2018-10-15 20:00:54 +03:00
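The layering described in 7eef527769 can be sketched as follows: the special token kinds are resolved first, and only ordinary key tokens reach the partitioner-specific comparison. The token struct and compare_keys() are simplified stand-ins, not the dht types.

```cpp
#include <cstdint>

enum class token_kind { before_all_keys, key, after_all_keys };

// Simplified token: `value` is only meaningful when kind == key.
struct token {
    token_kind kind;
    int64_t value;
};

// Stand-in for the partitioner-specific comparison, which only knows
// how to compare ordinary key tokens.
int compare_keys(const token& a, const token& b) {
    return a.value < b.value ? -1 : (a.value > b.value ? 1 : 0);
}

// Handles both special kinds up front, then delegates for key tokens.
int tri_compare(const token& a, const token& b) {
    if (a.kind != b.kind) {
        return a.kind < b.kind ? -1 : 1;
    }
    if (a.kind != token_kind::key) {
        return 0;  // two special tokens of the same kind are equal
    }
    return compare_keys(a, b);
}
```

This relies on the enumerators being declared in sort order, so comparing the kinds directly gives the right answer whenever they differ.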
Glauber Costa
51906f7144 compactions: log tokens that we decide not to write down to an SSTable
May be important when debugging issues related to cleanups

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181015162643.7834-1-glauber@scylladb.com>
2018-10-15 19:28:00 +03:00
Vladimir Krivopalov
092276b13d sstables: Reset opened range tombstone when moving to another partition.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <f6dc6b0bd88ca44f2ef84c2a8bee43fde82c89cc.1539396572.git.vladimir@scylladb.com>
2018-10-14 11:20:11 +03:00
Vladimir Krivopalov
926b6430fd sstables: Factor out code resetting values for a new partition.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <83a3a4ce6942b036be447bcfeb66142828e75293.1539396572.git.vladimir@scylladb.com>
2018-10-14 11:20:10 +03:00
Glauber Costa
98332de268 api: use longs instead of ints for snapshot sizes
Int types in json will be serialized to int types in C++. They will then
only be able to handle 4GB, and we tend to store more data than that.

Without this patch, listsnapshots is broken in all versions.

Fixes: #3845

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181012155902.7573-1-glauber@scylladb.com>
2018-10-12 21:17:24 +03:00
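The size limit behind 98332de268 can be checked at compile time. This is a minimal sketch with a hypothetical helper; a 32-bit field tops out just below 2 GiB when signed and just below 4 GiB when unsigned, both far smaller than typical snapshot sizes.

```cpp
#include <cstdint>
#include <limits>

// True if `bytes` can be represented in a 32-bit signed integer field.
constexpr bool fits_in_int32(int64_t bytes) {
    return bytes >= std::numeric_limits<int32_t>::min()
        && bytes <= std::numeric_limits<int32_t>::max();
}

constexpr int64_t GiB = int64_t(1) << 30;

static_assert(fits_in_int32(1 * GiB), "1 GiB fits in 32 bits");
static_assert(!fits_in_int32(2 * GiB), "2 GiB overflows a signed 32-bit field");
static_assert(!fits_in_int32(10 * GiB), "a 10 GiB snapshot needs a 64-bit field");
```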
Tomasz Grabiec
b89556512a Merge "Enable sstable_mutation_test with SSTables 3.x." from Vladimir
Introduce uppermost_bound() method instead of upper_bound() in mutation_fragment_filter and clustering_ranges_walker.

For now, this has been only used to produce the final range tombstone
for sliced reads inside consume_partition_end().

Usage of the upper bound of the current range causes problems of two
kinds:
    1. If not all the slicing ranges have been traversed with the
    clustering range walker, which is possible when the last read
    mutation fragment was before some of the ranges and reading was limited
    to a specific range of positions taken from index, the emitted range
    tombstone will not cover the untraversed slices.

    2. At the same time, if all ranges have been walked past, the end
    bound is set to after_all_clustered_rows and the emitted RT may span
    more data than it should.

To avoid both situations, the uppermost bound is used instead, which
refers to the upper bound of the last range in the sequence.

* github.com/scylladb/seastar-dev.git haaawk/projects/sstables-30/enable-mc-with-sstable-mutation-test/v2
  sstables: Use uppermost_bound() instead of upper_bound() in
    mutation_fragment_filter.
  tests: Enable sstable_mutation_test for SSTables 'mc' format.

Rebased by Piotr J.
2018-10-12 15:14:17 +02:00
Vladimir Krivopalov
5b03fe7982 tests: Enable sstable_mutation_test for SSTables 'mc' format.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-12 14:18:15 +02:00
Vladimir Krivopalov
199dc9d5a7 sstables: Use uppermost_bound() instead of upper_bound() in mutation_fragment_filter.
For now, this has been only used to produce the final range tombstone
for sliced reads inside consume_partition_end().

Usage of the upper bound of the current range causes problems of two
kinds:
    1. If not all the slicing ranges have been traversed with the
    clustering range walker, which is possible when the last read
    mutation fragment was before some of the ranges and reading was limited
    to a specific range of positions taken from index, the emitted range
    tombstone will not cover the untraversed slices.

    2. At the same time, if all ranges have been walked past, the end
    bound is set to after_all_clustered_rows and the emitted RT may span
    more data than it should.

To avoid both situations, the uppermost bound is used instead, which
refers to the upper bound of the last range in the sequence.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-12 14:18:15 +02:00
Tomasz Grabiec
193efef950 Merge "Make SST3 pass test_clustering_slices test" from Piotr
* seastar-dev.git haaawk/sst3/test_clustering_slices/v8:
  sstables: Extract on_end_of_stream from consume_partition_end
  sstables: Don't call consume_range_tombstone_end in
    consume_partition_end
  sstables: Change the way fragments are returned from consumer
2018-10-12 14:11:51 +02:00
Piotr Jastrzebski
1a6cef80f0 sstables: Change the way fragments are returned from consumer
Split range tombstone (if present) on every consume_row_end call
and store both range tombstone and row in different fields called
_stored_row and _stored_tombstone instead of using single field
called _stored.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-12 13:51:39 +02:00
Piotr Jastrzebski
3109c94c84 sstables: Don't call consume_range_tombstone_end in consume_partition_end
We don't need to check _opened_range_tombstone and _mf_filter again

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-12 13:51:28 +02:00
Piotr Jastrzebski
7dcea660e8 sstables: Extract on_end_of_stream from consume_partition_end
The new function will be called when the stream of data is finished
while old consume_partition_end will be called when partition
is finished but stream is not done yet.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-12 13:50:52 +02:00
Piotr Jastrzebski
717cb2a9e7 sstables: Adapt test_clustering_slices test for SST3
Readers for SST3 return slightly more precise range tombstones
when the reader is slicing. Namely, SST2 readers return whole
range tombstones that overlap with the slicing range, but SST3
readers trim those range tombstones to the slicing range.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-11 15:47:47 +02:00
Tomasz Grabiec
a7a14e3af2 Merge "Handle dead row markers when writing to SSTables 3.x" from Vladimir
There is a mismatch between row markers used in SSTables 2.x (ka/la) and
liveness_info used by SSTables 3.x (mc) in that a row marker can be
written as a deleted cell but liveness_info cannot.

To handle this, for a dead row marker the corresponding liveness_info is
written as expiring liveness_info with a fake TTL set to 1.
This approach is adapted from the solution for CASSANDRA-13395 that
addressed a similar issue during SSTables upgrades.

* github.com/argenet/scylla.git projects/sstables-30/dead-row-marker/v7:
  sstables: Introduce TTL limitation and special 'expired TTL' value.
  sstables: Write dead row marker as expired liveness info.
  tests: Add test covering dead row marker writing to SSTables 3.x.
2018-10-11 10:58:57 +02:00
Gleb Natapov
ceb361544a stream_session: remove unused capture
'Consumer function' parameter for distribute_reader_and_consume_on_shards()
captures schema_ptr (which is a seastar::shared_ptr), but the function
is later copied on another shard at which point schema_ptr is also copied
and its counter is incremented by the wrong shard. The capture is not
even used, so let's just drop it.

Fixes #3838

Message-Id: <20181011075500.GN14449@scylladb.com>
2018-10-11 11:10:58 +03:00
Botond Dénes
23f3831aaf table::make_streaming_reader(): add forwarding parameter
The single-range overload, when used by
make_multishard_streaming_reader(), has to create a reader that is
forwardable. Otherwise the multishard streaming reader will not produce
any output as it cannot fast-forward its shard readers to the ranges
produced by the generator.

Also add a unit test that is based on the real-life purpose the
multishard streaming reader was designed for - serving partitions
from a shard, according to a sharding configuration that is different
than the local one. This is also the scenario that found the bug in the
first place.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <bf799961bfd535882ede6a54cd6c4b6f92e4e1c1.1539235034.git.bdenes@scylladb.com>
2018-10-11 10:59:18 +03:00
Vlad Zolotarov
5b12ec441d db::hints::manager: use "streaming" I/O scheduling class for reads
Make sure that read I/O in the context of HH sending does not overpower I/O
in the context of queries, memtable flushes or compactions.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-10 15:22:43 -04:00
Vlad Zolotarov
a89188de07 commitlog::read_log_file(): set a read I/O priority class explicitly
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-10 15:22:43 -04:00
Vlad Zolotarov
629972d586 db::hints::manager: add hints sender to the "streaming" CPU scheduling group
Make sure that HH sends do not overpower (CPU-wise) the regular WRITE flow.

Fixes #3817

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-10 15:22:43 -04:00
Vladimir Krivopalov
9a04200b03 tests: Add test covering dead row marker writing to SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-10 11:44:54 -07:00
Vladimir Krivopalov
9c773fa6cf sstables: Write dead row marker as expired liveness info.
This allows distinguishing expired liveness info from yet-to-expire info
and convert it into a dead row marker on read.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-10 11:44:14 -07:00
Vladimir Krivopalov
e71cc5ab20 sstables: Introduce TTL limitation and special 'expired TTL' value.
This allows storing expired liveness info in SSTables 3.x format
without introducing a possible conflict with real TTL values.

As per Cassandra, TTL cannot exceed 20 years so taking the maximum value
as a special value for indicating expired liveness info is safe.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-10 11:44:14 -07:00
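The reasoning in e71cc5ab20 (a reserved value above the largest legal TTL cannot collide with a real TTL) can be sketched as below. The constant names, the 32-bit signed representation, and the choice of the type's maximum as the sentinel are assumptions for illustration; the commit text only states that TTLs are capped at 20 years and that a maximum value is reserved.

```cpp
#include <cstdint>
#include <limits>

// Largest TTL a user may set: 20 years, expressed in seconds.
constexpr int32_t max_ttl_seconds = 20 * 365 * 24 * 60 * 60;

// Assumed sentinel marking "expired liveness info": the largest value the
// TTL field can hold, which no valid user TTL can reach.
constexpr int32_t expired_ttl_sentinel = std::numeric_limits<int32_t>::max();

static_assert(max_ttl_seconds < expired_ttl_sentinel,
              "the sentinel must lie outside the range of valid TTLs");

constexpr bool is_valid_user_ttl(int32_t ttl) {
    return ttl >= 0 && ttl <= max_ttl_seconds;
}

constexpr bool is_expired_marker(int32_t ttl) {
    return ttl == expired_ttl_sentinel;
}
```

Because the two predicates can never both be true for the same value, a reader can tell a dead row marker apart from a row that merely has a long TTL.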
Calle Wilund
3cb50c861d messaging_service: Make rpc streaming sink respect tls connection
Fixes #3787

Message service streaming sink was created using a direct call to
rpc::client::make_sink. This in turn needs a new socket, which it
creates completely ignoring what underlying transport is active for the
client in question.

Fix by retaining the tls credential pointer in the client wrapper, and
using this in a sink method to determine whether to create a new tls
socket, or just go ahead with a plain one.

Message-Id: <20181010003249.30526-1-calle@scylladb.com>
2018-10-10 12:55:28 +03:00
Avi Kivity
1891779e64 Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte
"
This series changes hinted handoff to work with `frozen_mutation`s
instead of naked `mutation`s. Instead of unfreezing a mutation from
the commitlog entry and then freezing it again for sending, now we'll
just keep the read, frozen mutation.

Tests: unit(release)
"

* 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla:
  db/hints/manager: Use frozen_mutation instead of mutation
  db/hints/manager: Use database::find_schema()
  db/commitlog/commitlog_entry: Allow moving the contained mutation
  service/storage_proxy: send_to_endpoint overload accepting frozen_mutation
  service/storage_proxy: Build a shared_mutation from a frozen_mutation
  service/storage_proxy: Lift frozen_mutation_and_schema
  service/storage_proxy: Allow non-const ranges in mutate_prepare()
2018-10-09 17:48:18 +03:00
Piotr Sarna
a93d27960c tests: add secondary index paging unit test case
A simple case for SI paging is added to secondary_index_test suite.
This commit should be followed by more complex testing
and serves as an example on how to extract paging state and use it
across CQL queries.
Message-Id: <b22bdb5da1ef8df399849a66ac6a1f377e6a650a.1539090350.git.sarna@scylladb.com>
2018-10-09 15:05:20 +01:00
Avi Kivity
cfab7a2be6 Update seastar submodule
* seastar ed44af8...4669469 (2):
  > prometheus: Fix histogram text representation
  > reactor: count I/O errors

Fixes #3827.
2018-10-09 16:36:47 +03:00
Gleb Natapov
319ece8180 storage_proxy: do not pass write_stats down to send_to_live_endpoints
write_stats is referenced from write handler which is available in
send_to_live_endpoints already. No need to pass it down.

Message-Id: <20181009133017.GA14449@scylladb.com>
2018-10-09 16:33:53 +03:00
Botond Dénes
d467b518bc multishard_mutation_query(): don't attempt to stop broken readers
Currently, when stopping a reader fails, it simply won't be attempted to
be saved, and it will be left in the `_readers` array as-is. This can
lead to an assertion failure as the reader state will contain futures
that were already waited upon, and that the cleanup code will attempt to
wait on again. To prevent this, when stopping a reader fails, reset it
to nonexistent state, so that the cleanup code doesn't attempt to do
anything with it.

Refs: #3830

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <a1afc1d3d74f196b772e6c218999c57c15ca05be.1539088164.git.bdenes@scylladb.com>
2018-10-09 15:59:50 +03:00
Gleb Natapov
207b57a892 storage_proxy: count number of timed out write attempts after CL is reached
It is useful to have this counter to investigate the reason for read
repairs. Non zero value means that writes were lost after CL is reached
and RR is expected.

Message-Id: <20181009120900.GF22665@scylladb.com>
2018-10-09 15:17:07 +03:00
Piotr Sarna
b3685342a6 service/pager: avoid dereferencing null partition key
The pager::state() function returns a valid paging object even
if the pager itself is exhausted. It may also not contain the partition
key, so using it unconditionally was a bug - now, in case there is no
partition key present, paging state will contain an empty partition key.

Fixes #3829

Message-Id: <28401eb21ab8f12645c0a33d9e92ada9de83e96b.1539074813.git.sarna@scylladb.com>
2018-10-09 12:13:52 +03:00
Botond Dénes
4bb0bbb9e2 database: add make_multishard_streaming_reader()
Creates a streaming reader that reads from all shards. Shard readers are
created with `table::make_streaming_reader()`.
This is needed for the new row-level repair.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4b74c710bed2ef98adf07555a4c841e5b690dd8c.1538470782.git.bdenes@scylladb.com>
2018-10-09 11:07:47 +03:00
Botond Dénes
3eeb6fbd23 table::make_streaming_reader(): add single-range overload
This will be used by the `make_multishard_streaming_reader()` in the
next patch. This method will create a multishard combining reader which
needs its shard readers to take a single range, not a vector of ranges
like the existing overload.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <cc6f2c9a8cf2c42696ff756ed6cb7949b95fe986.1538470782.git.bdenes@scylladb.com>
2018-10-09 11:07:46 +03:00
Botond Dénes
a56871fab7 tests/multishard_mutation_query_test: test range-tombstones spanning multiple pages
Extend the existing range-tombstone test, such that range tombstones
span multiple pages worth of rows.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <583aa826ea12118289b08d483b55b5573d27e1ee.1539002810.git.bdenes@scylladb.com>
2018-10-09 10:18:28 +03:00
Vladimir Krivopalov
e9aba6a9c3 sstables: Add missing 'mc' format into format strings map in sstable::filename().
Fixes #3832.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <269421fb2ac8ab389231cbe9ed501da7e7ff936a.1539048008.git.vladimir@scylladb.com>
2018-10-09 10:07:08 +03:00
Asias He
8edf3defdf range_streamer: Futurize add_ranges
get_all_ranges_with_sources_for and get_all_ranges_with_strict_sources_for
can take a long time to compute, which causes reactor stalls. To fix, run
them in a thread and yield. These functions are used in the slow path, so
it is ok to yield more than needed.

Fixes #3639

Message-Id: <63aa7794906ac020c9d9b2984e1351a8298a249b.1536135617.git.asias@scylladb.com>
2018-10-09 09:46:50 +03:00
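The "run in a thread and yield" idea above can be illustrated with a rough Python analogy. This is only a sketch, not Scylla or Seastar code: `compute_ranges` and `yield_every` are hypothetical names, and `asyncio.sleep(0)` stands in for yielding to the reactor.

```python
import asyncio


async def compute_ranges(tokens, yield_every=2):
    """Hypothetical slow-path computation that yields between batches."""
    result = []
    for i, token in enumerate(tokens, start=1):
        result.append((token, token + 1))  # stand-in for the real per-range work
        if i % yield_every == 0:
            await asyncio.sleep(0)  # hand control back to the event loop
    return result


ranges = asyncio.run(compute_ranges([10, 20, 30]))
```

Yielding more often than strictly needed costs a little throughput, which is acceptable on a slow path.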
Nadav Har'El
b8668dc0f8 materialized views: refuse to filter by non-key column
A materialized view can provide a filter so as to pick up only a subset
of the rows from the base table. Usually, the filter operates on columns
from the base table's primary key. If we use a filter on regular (non-key)
columns, things get hairy, and as issue #3430 showed, wrong: merely updating
this column in the base table may require us to delete, or resurrect, the
view row. But normally we need to do the above when the "new view key column"
was updated, when there is one. We use shadowable tombstones with one
timestamp to do this, so it cannot take into account the two timestamps from
those two columns (the filtered column and the new key column).

So in the current code, filtering by a non-key column does not work correctly.
In this patch we provide two test cases (one involving TTLs, and one involving
only normal updates), which demonstrate vividly that it does *not* work
correctly. With normal updates, trying to resurrect a view row that has
previously disappeared, fails. With TTLs, things are even worse, and the view
row fails to disappear when the filtered column is TTLed.

In Cassandra, the same thing doesn't work correctly either (see
CASSANDRA-13798 and CASSANDRA-13832) so they decided to refuse creating
a materialized view filtering a non-key column. In this patch we also
do this - fail the creation of such an unsupported view. For this reason,
the two tests mentioned above are commented out in a "#if", with, instead,
a trivial test verifying a failure to create such a view.

Note that as explained above, when the filtered column and new view key
column are *different* we have a problem. But when they are the *same* - namely
we filter by a non-key base column which actually *is* a key in the view -
we are actually fine. This patch includes additional test cases verifying
that this case is really fine and provides correct results. Accordingly,
this case is *not* forbidden in the view creation code.

Fixes #3430.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181008185633.24616-1-nyh@scylladb.com>
2018-10-08 20:37:11 +01:00
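The rule described in the message above (a filter is accepted on base primary key columns, or on a non-key base column that is itself a key in the view, and refused otherwise) can be sketched as follows. This is a minimal illustration under assumed names: `check_view_filter` and its parameters are hypothetical and do not reflect the actual view creation code.

```python
def check_view_filter(base_primary_key, view_primary_key, filtered_columns):
    """Refuse filters on columns that are neither a base key nor a view key column."""
    allowed = set(base_primary_key) | set(view_primary_key)
    bad = sorted(c for c in filtered_columns if c not in allowed)
    if bad:
        raise ValueError("cannot filter view on non-key column(s): " + ", ".join(bad))


# Base table key (p, c); the view adds regular column w to its key.
check_view_filter(["p", "c"], ["p", "c", "w"], ["p", "w"])  # accepted

try:
    check_view_filter(["p", "c"], ["p", "c", "w"], ["v"])  # v is a plain regular column
    refused = False
except ValueError:
    refused = True
```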
Avi Kivity
0fa60660b8 Merge "Fix mutation fragments clobbering on fast_forward" from Vladimir
"
This patchset fixes a bug in SSTables 3.x reading when fast-forwarding
is enabled. It is possible that a mutation fragment, row or RT marker,
is read and then stored because it falls outside the current
fast-forwarding range.

If the reader is further fast-forwarded but the
row still falls outside of it, the reader would still continue reading
and get the next fragment, if any, that would clobber the currently
stored one. With this fix, the reader does not attempt to read on
after storing the current fragment.

Tests: unit {release}
"

* 'projects/sstables-30/row-skipped-on-double-ff/v2' of https://github.com/argenet/scylla:
  tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x.
  sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment.
  sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value.
2018-10-08 20:18:42 +03:00
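The clobbering scenario described above can be modelled with a toy reader. The sketch below is an assumption-laden simplification, not the SSTables 3.x reader: fragments are plain integers, a fast-forwarding range is reduced to an exclusive upper bound, and `FragmentReader` is a hypothetical name.

```python
class FragmentReader:
    """Toy reader that keeps at most one fragment read past the current range."""

    def __init__(self, positions, end):
        self._source = iter(positions)
        self._stored = None  # fragment that fell outside the range
        self._end = end      # exclusive upper bound of the current range

    def fast_forward_to(self, end):
        self._end = end

    def read(self):
        out = []
        if self._stored is not None:
            if self._stored >= self._end:
                return out  # still outside the range: do not read on
            out.append(self._stored)
            self._stored = None
        for pos in self._source:
            if pos >= self._end:
                self._stored = pos  # keep it for a later range and stop
                break
            out.append(pos)
        return out


reader = FragmentReader([1, 5, 9], end=3)
first = reader.read()
reader.fast_forward_to(4)
second = reader.read()
reader.fast_forward_to(10)
third = reader.read()
```

Without the early return, the second `read()` would pull the next fragment from the source and overwrite the stored one, losing it.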
Vladimir Krivopalov
07d61683b6 tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-08 09:09:33 -07:00
Botond Dénes
d0eb443913 result_memory_accounter: drop state_for_another_shard()
This is not used since range-scans were refactored (e49a14e30) as part
of making them stateful.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <589f30163e29299e840750457919214a26f0da93.1539005336.git.bdenes@scylladb.com>
2018-10-08 14:29:48 +01:00
Duarte Nunes
48ebe6552c Merge 'Fix issues with endpoint state replication to other shards' from Tomasz
Fixes #3798
Fixes #3694

Tests:

  unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test)

* tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla:
  gms/gossiper: Replicate endpoint states in add_saved_endpoint()
  gms/gossiper: Make reset_endpoint_state_map() have effect on all shards
  gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
  gms/gossiper: Always override states from older generations
2018-10-08 14:19:19 +01:00
Avi Kivity
4b16867bd7 cql: relax writetime/ttl selections of collections
writetime() or ttl() selections of frozen collections can work, as they
are single cells. Relax the check to allow them, and only forbid non-frozen
collections.

Fixes #3825.

Tests: cql_query_test (release).
Message-Id: <20181008123920.27575-1-avi@scylladb.com>
2018-10-08 14:07:01 +01:00
Duarte Nunes
56e36ee14b flat_mutation_reader: Use std::move(range) in move_buffer_content_to()
Instead of open coding it.

Tests: unit(release)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181008104328.13164-1-duarte@scylladb.com>
2018-10-08 13:57:13 +03:00
Avi Kivity
474bb4e44f cql: functions: implement min/max/count for bytes type
Uncomment existing declare() calls and implement tests. Because the
data_value(bytes) constructor is explicit, we add explicit conversion to
data_value in impl_min_function_for<> and impl_max_function_for<>.

Fixes #3824.
Message-Id: <20181008084127.11062-1-avi@scylladb.com>
2018-10-08 10:48:30 +01:00
Takuya ASADA
d89114d1fc dist/debian: install GPG key for cross-building
We found that in some Debian environments the Ubuntu .deb build fails with
a gpg error because the Ubuntu GPG key is missing, so we need to install it
before starting pbuilder.
Likewise, when building on Ubuntu, the Debian GPG key needs to be installed.

Fixes #3823

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181008072246.13305-1-syuu@scylladb.com>
2018-10-08 10:43:25 +03:00
Botond Dénes
b01050e28c HACKING.md: add link to the scylla-dev mailing list
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <9a5d967f791d7a0db584864f68f93bbc68f52372.1538977773.git.bdenes@scylladb.com>
2018-10-08 10:06:50 +03:00
Duarte Nunes
74d809f8be db/hints/manager: Use frozen_mutation instead of mutation
Instead of unfreezing a mutation from the commitlog and then freezing
it again to send, just keep the read frozen mutation.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:57:30 +01:00
Duarte Nunes
6eec9748fc db/hints/manager: Use database::find_schema()
Instead of using find_column_family() and repeatedly asking for
column_family::schema(), use database::find_schema() instead.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:57:30 +01:00
Duarte Nunes
5b3d08defc db/commitlog/commitlog_entry: Allow moving the contained mutation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:57:30 +01:00
Duarte Nunes
3b6d2286e9 service/storage_proxy: send_to_endpoint overload accepting frozen_mutation
Add an overload to send_to_endpoint() which accepts a frozen_mutation.
The motivation is to allow better accounting of pending view updates,
but this change also allows some callers to avoid unfreezing already
frozen mutations.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:37:39 +01:00
Duarte Nunes
c7639f53e0 service/storage_proxy: Build a shared_mutation from a frozen_mutation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:27:29 +01:00
Duarte Nunes
9e14412528 service/storage_proxy: Lift frozen_mutation_and_schema
Lift frozen_mutation_and_schema to frozen_mutation.hh, since other
subsystems using frozen_mutations will likely want to pass it around
together with the schema.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:27:29 +01:00
Duarte Nunes
2c739f36cc service/storage_proxy: Allow non-const ranges in mutate_prepare()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:27:29 +01:00
Avi Kivity
1cc81d1492 Update seastar submodule
* seastar 71e914e...ed44af8 (4):
  > Merge "Add semaphore_units<>::split() function" from Duarte
  > scheduling: introduce destroy_scheduling_group()
  > tls: include "api.hh" for listen_options
  > rpc: connection-level resource isolation
2018-10-07 20:45:49 +03:00
Duarte Nunes
4162bff37a Merge 'cql3: allow adding or dropping multiple columns in ALTER TABLE statement' from Benny
"
This patchset implements ALTER TABLE ADD/DROP for multiple columns.

Fixes: #2907
Fixes: #3691

Tests: schema_change_test
"

* 'projects/cql3/alter-table-multi/v3' of https://github.com/bhalevy/scylla:
  cql3: schema_change_test: add test_multiple_columns_add_and_drop
  cql3: allow adding or dropping multiple columns in ALTER TABLE statement
  cql3: alter_table_statement: extract add/alter/drop per-column code into functions
  cql3: testing for MVs for alter_table_statement::type::drop is not per column
  cql3: schema_change_test: add test_static_column_is_dropped
2018-10-07 17:30:09 +01:00
Benny Halevy
0f350f5d59 cql3: schema_change_test: add test_multiple_columns_add_and_drop
Add a unit test for adding or dropping multiple columns.
See https://github.com/scylladb/scylla/issues/2907

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 19:14:29 +03:00
Benny Halevy
23fecc7e5e cql3: allow adding or dropping multiple columns in ALTER TABLE statement
Fixes #2907
Fixes #3691

See Cassandra reference: https://apache.googlesource.com/cassandra/+/cassandra-3.6/src/antlr/Parser.g
/**
 * ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>;
 * ALTER COLUMN FAMILY <CF> ADD <column> <newtype>; | ALTER COLUMN FAMILY <CF> ADD (<column> <newtype>,<column1> <newtype1>..... <column n> <newtype n>)
 * ALTER COLUMN FAMILY <CF> DROP <column>; | ALTER COLUMN FAMILY <CF> DROP ( <column>,<column1>.....<column n>)
 * ALTER COLUMN FAMILY <CF> WITH <property> = <value>;
 * ALTER COLUMN FAMILY <CF> RENAME <column> TO <column>;
 */
alterTableStatement returns [shared_ptr<alter_table_statement> expr]
    @init {
        alter_table_statement::type type;
        auto props = make_shared<cql3::statements::cf_prop_defs>();
        std::vector<alter_table_statement::column_change> column_changes;
        std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>, shared_ptr<cql3::column_identifier::raw>>> renames;
    }
    : K_ALTER K_COLUMNFAMILY cf=columnFamilyName
          ( K_ALTER id=cident K_TYPE v=comparatorType { type = alter_table_statement::type::alter; column_changes.emplace_back(id, v); }
          | K_ADD                                     { type = alter_table_statement::type::add; }
            (         id1=cident  v1=comparatorType  b1=cfisStatic { column_changes.emplace_back(id1, v1, b1); }
            | '('     id1=cident  v1=comparatorType  b1=cfisStatic { column_changes.emplace_back(id1, v1, b1); }
                 (',' idn=cident  vn=comparatorType  bn=cfisStatic { column_changes.emplace_back(idn, vn, bn); } )* ')'
            )

          | K_DROP  id=cident                         { type = alter_table_statement::type::drop; column_changes.emplace_back(id); }
          | K_WITH  properties[props]                 { type = alter_table_statement::type::opts; }
          | K_RENAME                                  { type = alter_table_statement::type::rename; }
               id1=cident K_TO toId1=cident { renames.emplace_back(id1, toId1); }
               ( K_AND idn=cident K_TO toIdn=cident { renames.emplace_back(idn, toIdn); } )*
          )
    {
        $expr = ::make_shared<alter_table_statement>(std::move(cf), type, std::move(column_changes), std::move(props), std::move(renames));
    }
    ;

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 19:14:26 +03:00
Benny Halevy
3fa6d3d3a8 cql3: alter_table_statement: extract add/alter/drop per-column code into functions
In preparation to supporting ALTER TABLE with multiple columns (#3691)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 18:57:06 +03:00
Alexys Jacob
eebbae066a dist/common/scripts/scylla_setup: fix gentoo linux installed package detection
return code is expected to be 0 when installed package was found

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181002123433.4702-1-ultrabug@gentoo.org>
2018-10-07 16:46:02 +03:00
Alexys Jacob
850d046551 dist/common/scripts/scylla_ntp_setup: fix gentoo linux systemd service name
fix typo as ntpd package systemd service is named ntpd, not sntpd

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181002123802.5576-1-ultrabug@gentoo.org>
2018-10-07 16:46:01 +03:00
Alexys Jacob
54151d2039 dist/common/scripts/scylla_cpuscaling_setup: fix file open mode for writing
gentoo linux part tries to open the configuration file without the
write flag, leading to an exception

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181002123957.6010-1-ultrabug@gentoo.org>
2018-10-07 16:46:00 +03:00
Avi Kivity
700994a4f2 Merge "Add GDB commands for examining gossiper and RPC state" from Tomasz
* 'gdb-gms-netw' of github.com:tgrabiec/scylla:
  gdb: Introduce 'scylla netw' command
  gdb: Introduce 'scylla gms' command
  gdb: Add sharded service wrapper
  gdb: Add unique_ptr wrapper
  gdb: Add list_unordered_set()
  gdb: Make std_vector wrapper indexable
  gdb: Add wrapper for std_map
2018-10-07 16:42:52 +03:00
Vlad Zolotarov
7cbe5f2983 service: priority_manager.hh: add #pragma once
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181005040552.2183-3-vladz@scylladb.com>
2018-10-07 16:04:26 +03:00
Duarte Nunes
30d6ed8f92 service/storage_proxy: Consider target liveness in send_to_endpoint()
So we don't attempt to send mutations to unreachable endpoints and
instead store a hint for them, we now check the endpoint status and
populate dead_endpoints accordingly in
storage_proxy::send_to_endpoint().

Fixes #3820

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181007100640.2182-1-duarte@scylladb.com>
2018-10-07 16:04:26 +03:00
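As a rough illustration of the liveness check described above: targets are split into live endpoints, which get the mutation, and dead ones, which would get a hint instead. The helper name `split_by_liveness` and the `is_alive` callback are hypothetical and only mirror the idea, not the storage_proxy code.

```python
def split_by_liveness(targets, is_alive):
    """Return (live, dead) endpoint lists, preserving input order."""
    live, dead = [], []
    for endpoint in targets:
        (live if is_alive(endpoint) else dead).append(endpoint)
    return live, dead


alive = {"10.0.0.1"}
live, dead = split_by_liveness(["10.0.0.1", "10.0.0.2"], lambda e: e in alive)
```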
Benny Halevy
581b9006d4 cql3: testing for MVs for alter_table_statement::type::drop is not per column
No column can be dropped from a table with materialized views
so the respective exception can ignore and omit the dropped column name.

In preparation for refactoring the respective code, moving the per-column
code to member functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 15:16:32 +03:00
Benny Halevy
8d298064b1 cql3: schema_change_test: add test_static_column_is_dropped
Test dropping of static column defined in CREATE TABLE, and
adding and dropping of a static column using ALTER TABLE.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 14:34:28 +03:00
Duarte Nunes
a69d468101 service/storage_proxy: Fix formatting of send_to_endpoint()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181006204756.32232-1-duarte@scylladb.com>
2018-10-07 11:05:32 +03:00
Vladimir Krivopalov
9db124c6e5 sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment.
Without it, the reader will attempt to read further and may clobber the
stored fragment with the next one read, if any.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-05 19:09:09 -07:00
Vladimir Krivopalov
8e004684e9 sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value.
Instead of blindly proceeding, use whatever the call to maybe_push_*()
has returned.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-05 19:05:03 -07:00
Duarte Nunes
b839f551cf cql3/statements/select_statement: Don't double count unpaged queries
Unpaged queries are those for which the client didn't enable paging,
and we already account for them in
indexed_table_select_statement::do_execute().

Remove the second increment in read_posting_list().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181003121811.11750-1-duarte@scylladb.com>
2018-10-05 17:36:39 +02:00
Nadav Har'El
e4ef7fc40a materialized views: enable two tests in view_schema_test
We had two commented out tests based on Cassandra's MV unit tests, for
the case that the view's filter (the "SELECT" clause used to define the
view) filtered by a non-primary-key column. These tests used to fail
because of problems we had in the filtering code, but they now succeed,
so we can enable them. This patch also adds some comments about what
the tests do, and adds a few more cases to one of the tests.

Refs #3430.

However, note that the success of these tests does not really prove that
the non-PK-column filtering feature works fully correctly and that issue
forbidding it, as explained in
https://issues.apache.org/jira/browse/CASSANDRA-13798. We can probably
fix this feature with our "virtual cells" mechanism, but will need to add
a test to confirm the possible problem and its (probably needed) fix.
We do not add such a test in this patch.

In the meantime, issue #3430 should remain open: we still *allow* users
to create MV with such a filter, and, as the tests in this patch show,
this "mostly" works correctly. We just need to prove and/or fix what happens
with the complex row liveness issues a la issue #3362.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181004213637.32330-1-nyh@scylladb.com>
2018-10-04 22:43:38 +01:00
Tomasz Grabiec
3c7de9fee9 gms/gossiper: Replicate endpoint states in add_saved_endpoint() 2018-10-04 12:54:00 +02:00
Tomasz Grabiec
ddf3a61bcf gms/gossiper: Make reset_endpoint_state_map() have effect on all shards 2018-10-04 12:53:56 +02:00
Tomasz Grabiec
9e3f744603 gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
Lack of this may result in non-zero shards on some nodes still seeing
STATUS as NORMAL for a node which shut down, in some cases.

mark_as_shutdown() is invoked in reaction to an RPC call initiated by
the node which is shutting down. Another way a node can learn about
another node shutting down is via gossiping with a node which knows
this. In that case, the states will be replicated to non-zero
shards. The node which learnt via mark_as_shutdown() may also
eventually propagate this to non-zero shards, e.g. when it gossips
about it with other nodes, and its local version number at the time of
mark_as_shutdown() was smaller than the one used to set the STATE by
the shutting down node.
2018-10-04 12:51:42 +02:00
Tomasz Grabiec
c4ec81e126 gms/gossiper: Always override states from older generations
Application states of each node are versioned per-node with a pair of
generation number (more significant) and value version. Generation
number uniquely identifies the life time of a scylla
process. Generation number changes after restart. Value versions start
from 0 on each restart. When a node gets updates for application
states, it merges them with its view on given node. Value updates with
older versions are ignored.

Gossiper processes updates only on shard 0, and replicates value
updates to other shards. When it sees a value with a new generation,
it correctly forgets all previous values. However, non-zero shards
don't forget values from previous generations. As a result,
replication will fail to override the values on non-zero shards when
generation number changes until their value version exceeds the
version prior to the restart.

This will result in incorrect STATUS for non-seed nodes on non-zero
shards.  When restarting a non-seed node, it will do a shadow gossip
round before setting its STATUS to NORMAL. In the shadow round it will
learn from other nodes about itself, and set its STATUS to shutdown on
all shards with a high value version. Later, when it sets its status
to NORMAL, it will override it only on shard 0, because on other
shards the version of STATUS is higher.

This will cause CQL truncate to skip current node if the coordinator
runs on non-zero shards.

The fix is to override the entries on remote shards in the same way we
do on shard 0. All updates to endpoint states should be already
serialized on shard 0, and remote shards should see them in the same
order.

Introduced in 2d5fb9d

Fixes #3798
Fixes #3694
2018-10-04 12:47:27 +02:00
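The ordering rule described above, where the generation is more significant than the value version, can be sketched as below. This is an illustrative model only: `EndpointState` and `apply_update` are hypothetical names, and the real gossiper keeps far more state per endpoint.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointState:
    """Toy per-shard view of one node: a generation plus versioned values."""
    generation: int = 0
    values: dict = field(default_factory=dict)  # key -> (version, value)


def apply_update(state, generation, key, version, value):
    """Merge one replicated update, ordering by (generation, version)."""
    if generation < state.generation:
        return False  # update from an older process lifetime: ignore
    if generation > state.generation:
        # New lifetime: forget everything learned about the old one.
        state.generation = generation
        state.values.clear()
    current = state.values.get(key)
    if current is not None and current[0] >= version:
        return False  # same generation, stale or duplicate version
    state.values[key] = (version, value)
    return True


shard = EndpointState()
apply_update(shard, generation=1, key="STATUS", version=900, value="shutdown")
# After a restart the versions start low again, but the generation is higher.
applied = apply_update(shard, generation=2, key="STATUS", version=5, value="NORMAL")
```

Comparing versions alone would reject the second update (5 < 900), which is the failure mode on non-zero shards that the message describes.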
Piotr Sarna
a5570cb288 tests: add missing get() calls in threaded context
One test case missed a few get() calls in order to wait
for continuations, which only accidentally worked,
because it was followed by 'eventually()' blocks.
Message-Id: <69c145575ac81154c4b5f500d01c6b045a267088.1536839959.git.sarna@scylladb.com>
2018-10-04 10:55:45 +01:00
Piotr Sarna
8a2abd45fb tests: add collections test for secondary indexing
Test case regarding creating indexes on collection columns
is added to the suite.

Refs #3654
Refs #2962
Message-Id: <1b6844634b6e9a353028545813571647c92fb330.1536839959.git.sarna@scylladb.com>
2018-10-04 10:55:45 +01:00
Piotr Sarna
2d355bdf47 cql3: prevent creation of indexes on non-frozen collections
Until indexes for non-frozen collections are implemented,
creating such indexes should be disallowed to prevent unnecessary
errors on insertions/selections.

Fixes #3653
Refs #2962
Message-Id: <218cf96d5e38340806fb9446b8282d2296ba5f43.1536839959.git.sarna@scylladb.com>
2018-10-04 10:55:45 +01:00
Duarte Nunes
959559d568 cql3/statements/select_statement: Remove outdated comment
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181003193033.13862-1-duarte@scylladb.com>
2018-10-04 09:45:17 +03:00
Eliran Sinvani
20f49566a2 cql3 : add workaround to antlr3 null dereference bug
The Antlr3 exception class has a null dereference bug that crashes
the system when trying to extract the exception message using
ANTLR_Exception<...>::displayRecognitionError(...) function. When
a parsing error occurs the CqlParser throws an exception, which is in
turn processed for some special cases in scylla to generate a custom
message. The default case however, creates the message using
displayRecognitionError, causing the system to crash.
The fix is a simple workaround, making sure the pointer is not null
before the call to the function. A "proper" fix can't be implemented
because the exception class itself is implemented outside scylla
in antlr headers that reside on the host machine's OS.

Manually tested 2 test cases: a typo that caused scylla to crash, and
a cql comment without a newline at the end that also caused scylla to crash.
Ran unit tests (release).

Fixes #3740
Fixes #3764

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <cfc7e0d758d7a855d113bb7c8191b0fd7d2e8921.1538566542.git.eliransin@scylladb.com>
2018-10-03 18:30:06 +03:00
Tomasz Grabiec
9c57abcce7 gossiper: Fix shutdown_announce_in_ms not being respected
shutdown_announce_in_ms specifies a period of time that a node which
is shutting down waits to allow its state to propagate to other nodes.
However, we were setting _enabled to false before waiting, which
will make the current node ignore gossip messages.
Message-Id: <1538576996-26283-1-git-send-email-tgrabiec@scylladb.com>
2018-10-03 15:43:00 +01:00
Tomasz Grabiec
fda8e271e3 gdb: Introduce 'scylla netw' command
Prints information about the state of the messaging service layer.

Example:

(gdb) scylla netw
Dropped messages: {0 <repeats 25 times>}
Outgoing connections:
IP: 127.0.0.2, (netw::messaging_service::rpc_protocol_client_wrapper*) 0x6000051cd220:
  stats: {replied = 0, pending = 0, exception_received = 0, sent_messages = 23, wait_reply = 0, timeout = 0}
  outstanding: 0
Server: resources={_count = 85899345, _ex = {_M_exception_object = 0x0}, _wait_list = {_list = {_front_chunk = 0x0, _back_chunk = 0x0, _nchunks = 0, _free_chunks = 0x0, _nfree_chunks = 0}, _on_expiry = {<No data fields>}, _size = 0}}
Incoming connections:
127.0.0.1:28071:
   {replied = 0, pending = 0, exception_received = 0, sent_messages = 2, wait_reply = 0, timeout = 0}
2018-10-03 15:05:22 +02:00
Tomasz Grabiec
cf07cda08f gdb: Introduce 'scylla gms' command
Prints gossiper state. Example:

(gdb) scylla gms
127.0.0.2: (gms::endpoint_state*) 0x6010050c0550 ({_generation = 1538568389, _version = 2147483647})
  gms::application_state::STATUS: {version=18, value="NORMAL,968364964011550971"}
  gms::application_state::LOAD: {version=267, value="494510"}
  gms::application_state::SCHEMA: {version=13, value="27e48f6a-a668-398a-b2f5-cf4b905450e9"}
  gms::application_state::DC: {version=10, value="datacenter1"}
  gms::application_state::RACK: {version=11, value="rack1"}
  gms::application_state::RELEASE_VERSION: {version=4, value="3.0.8"}
  gms::application_state::RPC_ADDRESS: {version=3, value="127.0.0.2"}
  gms::application_state::NET_VERSION: {version=1, value="0"}
  gms::application_state::HOST_ID: {version=2, value="ee281b83-1acb-4aa3-927c-985a7d9a7c6f"}
127.0.0.1: (gms::endpoint_state*) 0x6010051422b0 ({_generation = 1538557402, _version = 0})
  gms::application_state::STATUS: {version=18, value="NORMAL,9176584852507611499"}
  gms::application_state::LOAD: {version=22521, value="409817"}
  gms::application_state::SCHEMA: {version=13, value="27e48f6a-a668-398a-b2f5-cf4b905450e9"}
  gms::application_state::DC: {version=10, value="datacenter1"}
  gms::application_state::RACK: {version=11, value="rack1"}
  gms::application_state::RELEASE_VERSION: {version=4, value="3.0.8"}
  gms::application_state::RPC_ADDRESS: {version=3, value="127.0.0.1"}
  gms::application_state::NET_VERSION: {version=1, value="0"}
  gms::application_state::HOST_ID: {version=2, value="88ff543f-e9b8-42eb-a876-c0f917078a31"}
2018-10-03 15:05:22 +02:00
Tomasz Grabiec
8c6f8b1773 gdb: Add sharded service wrapper 2018-10-03 15:05:22 +02:00
Tomasz Grabiec
4adfed9dba gdb: Add unique_ptr wrapper 2018-10-03 15:05:22 +02:00
Tomasz Grabiec
e29e302272 gdb: Add list_unordered_set() 2018-10-03 15:05:22 +02:00
Tomasz Grabiec
272bc88699 gdb: Make std_vector wrapper indexable 2018-10-03 15:05:22 +02:00
Tomasz Grabiec
b436759d49 gdb: Add wrapper for std_map 2018-10-03 15:05:22 +02:00
Pekka Enberg
de48966abc cql3: Move as_json_function class to separate file
The as_json_function class is not registered as a function, but we can
still keep it in cql3/functions, as per its namespace, to reduce the size
of select_statement.cc.
Message-Id: <20181002132637.30233-1-penberg@scylladb.com>
2018-10-03 13:30:08 +01:00
Piotr Sarna
4a23297117 cql3: add asking for pk/ck in the base query
Base query partition and clustering keys are used to generate
paging state for an index query, so they always need to be present
when a paged base query is processed.
Message-Id: <f3bf69453a6fd2bc842c8bdbd602d62c91cf9218.1538568953.git.sarna@scylladb.com>
2018-10-03 13:26:51 +01:00
Piotr Sarna
50d3de0693 cql3: add checking for may_need_paging when executing base query
It's not sufficient to check for positive page_size when preparing
a base query for indexed select statement - may_need_paging() should
be called as well.
Message-Id: <d435820019e4082a64ca9807541f0c9ad334e6a8.1538568953.git.sarna@scylladb.com>
2018-10-03 13:26:51 +01:00
Piotr Sarna
11b8831c04 cql3: move base query command creation to a separate function
Message-Id: <6b48b8cbd6312da4a17bfd3c85af628b4215e9f4.1538568953.git.sarna@scylladb.com>
2018-10-03 13:26:51 +01:00
Avi Kivity
7c8143c3c4 Revert "compaction: demote compaction start/end messages to DEBUG level"
This reverts commit b443a9b930. The compaction
history table doesn't have enough information to be a replacement for this
log message yet.
2018-10-03 13:13:37 +03:00
Avi Kivity
b9702222f8 Merge "Handle simple column type schema changes in SST3" from Piotr
"
This patchset enables very simple column type conversions.
It covers only handling variable and fixed size type differences.
The two types still have to be compatible at the bit level for a field to be converted from one to the other.
"

* 'haaawk/sst3/column_type_schema_change/v4' of github.com:scylladb/seastar-dev:
  Fix check_multi_schema to actually check the column type change
  Handle very basic column type conversions in SST3
  Enable check_multi_schema for SST3
2018-10-03 13:12:10 +03:00
Piotr Jastrzebski
3a60eac1d5 Fix check_multi_schema to actually check the column type change
Field 'e' was supposed to be read as blob but the test had a bug
and the read schema was treating that field as int. This patch
changes that and makes the test really check column type change.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-03 10:56:40 +02:00
Piotr Jastrzebski
3cecb61ac1 Handle very basic column type conversions in SST3
After this change very simple schema changes of column type
will work. This change makes sure that variable size and fixed
size types can be converted to each other but only if their bit
representation can be automatically converted between those types.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-03 10:56:40 +02:00
Piotr Jastrzebski
c117a6b3c8 Enable check_multi_schema for SST3
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-03 10:56:39 +02:00
Nadav Har'El
bebe5b5df2 materialized views: add view_updates_pending statistic
We are already maintaining a statistic of the number of pending view updates
sent but not yet completed by view replicas, so let's expose it.
Like all per-table statistics, this one will only be exposed if the
"--enable-keyspace-column-family-metrics" option is on.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-10-02 20:44:58 +01:00
Nadav Har'El
1d5f8d0015 materialized views: update stats.write statistics in all cases
mutate_MV usually calls send_to_endpoint() to push view updates to remote
view replicas. This function gets passed a statistics object,
service::storage_proxy_stats::write_stats and, in particular, updates
its "writes" statistic which counts the number of ongoing writes.

In the case that the paired view replica happens to be the *same* node,
we avoid calling send_to_endpoint() and call mutate_locally() instead.
That function does not take a write_stats object, so the "writes" statistic
doesn't get incremented for the duration of the write. So we should do
this explicitly.

Co-authored-by: Nadav Har'El <nyh@scylladb.com>
Co-authored-by: Duarte Nunes <duarte@scylladb.com>
2018-10-02 20:44:58 +01:00
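The explicit accounting described above, where the "writes" counter is raised for the duration of a local write, might look like this in outline. The names `WriteStats` and `ongoing_write` are hypothetical; the sketch only shows the increment/decrement pairing, not the storage_proxy statistics API.

```python
from contextlib import contextmanager


class WriteStats:
    """Toy stand-in for a statistics object with an ongoing-writes gauge."""

    def __init__(self):
        self.writes = 0


@contextmanager
def ongoing_write(stats):
    """Count a write as ongoing for exactly as long as the block runs."""
    stats.writes += 1
    try:
        yield
    finally:
        stats.writes -= 1  # also runs if the write raises


stats = WriteStats()
with ongoing_write(stats):
    during = stats.writes  # the local write would happen here
after = stats.writes
```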
Duarte Nunes
40a30d4129 db/schema_tables: Diff tables using ID instead of name
Currently we diff schemas based on table/view name, and if the names
match, then we detect altered schemas by comparing the schema
mutations. This fails to detect transitions which involve dropping and
recreating a schema with the same name, if a node receives these
notifications simultaneously (for example, if the node was temporarily
down or partitioned).

Note that because the ID is persisted and created when executing a
create_table_statement, then even if a schema is re-created with the
exact same structure as before, we will still consider it altered
because the mutations will differ.

This also stops schema pulling from working, since it relies on schema
merging.

The solution is to diff schemas using their ID, and not their name.

Keyspaces and user types are also susceptible to this, but in their
case it's fine: these are values with no identity, and are just
metadata. Dropping and recreating a keyspace can be viewed as dropping
all tables from the keyspace, altering it, and eventually adding new
tables to the keyspace.

Note that this solution doesn't apply to tables dropped and created
with the same ID (using the `WITH ID = {}` syntax). For that, we would
need to detect deltas instead of applying changes and then reading the
new state to find differences. However, this solution is enough,
because tables are usually created with ID = {} for very specific,
peculiar reasons. The original motivation meant for the new table to
be treated exactly as the old, so the current behavior is in fact the
desired one.

Tests: unit(release), dtests(schema_test, schema_management_test)

Fixes #3797

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181001230932.47153-2-duarte@scylladb.com>
2018-10-02 20:15:46 +02:00
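To illustrate why diffing by ID catches a drop followed by a re-create under the same name, here is a small sketch. `table_info`, `schema_diff` and `diff_by_id()` are hypothetical and much simpler than the schema-mutation diffing in db/schema_tables; they only show the keying choice.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified table description: a persistent ID plus a name.
struct table_info {
    std::string id;
    std::string name;
};

struct schema_diff {
    std::vector<std::string> dropped;  // IDs present only in the old state
    std::vector<std::string> created;  // IDs present only in the new state
};

// Key both snapshots by ID. A table dropped and re-created under the same
// name gets a new ID, so it appears as one drop plus one create.
schema_diff diff_by_id(const std::vector<table_info>& before,
                       const std::vector<table_info>& after) {
    std::map<std::string, std::string> old_ids, new_ids;
    for (const auto& t : before) { old_ids.emplace(t.id, t.name); }
    for (const auto& t : after) { new_ids.emplace(t.id, t.name); }

    schema_diff d;
    for (const auto& kv : old_ids) {
        if (!new_ids.count(kv.first)) { d.dropped.push_back(kv.first); }
    }
    for (const auto& kv : new_ids) {
        if (!old_ids.count(kv.first)) { d.created.push_back(kv.first); }
    }
    return d;
}
```

A name-keyed diff of the same two snapshots would see a single entry on both sides and could only classify it as altered.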
Duarte Nunes
e404f09a23 db/schema_tables: Drop tables before creating new ones
Doing it in the inverse order doesn't support dropping and creating a
schema with the same name.

Refs #3797

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181001230932.47153-1-duarte@scylladb.com>
2018-10-02 20:15:32 +02:00
Avi Kivity
aaab8a3f46 utils: crc32: mark power crc32 assembly as not requiring an executable stack
The linker uses an opt-in system for non-executable stack: if all object files
opt into a non-executable stack, the binary will have a non-executable stack,
which is very desirable for security. The compiler cooperates by opting into
a non-executable stack whenever possible (always for our code).

However, we also have an assembly file (for fast power crc32 computations).
Since it doesn't opt into a non-executable stack, we get a binary with
executable stack, which Gentoo's build system rightly complains about.

Fix by adding the correct incantation to the file.

Fixes #3799.

Reported-by: Alexys Jacob <ultrabug@gmail.com>
Message-Id: <20181002151251.26383-1-avi@scylladb.com>
2018-10-02 18:48:23 +01:00
Avi Kivity
53a4b8ae86 Update seastar submodule
* seastar 5712816...71e914e (12):
  > Merge "rpc shard to shard connection" from Gleb
  > Merge "Fix memory leaks when stoppping memcached" from Tomasz
  > scripts: perftune.py: prioritize I/O schedulers
  > alien: fix the size of local item[]
  > seastar-addr2line: don't invoke addr2line multiple times
  > reactor: use labels for different io_priority_class:s
  > util/spinlock: fix bad namespacing of <xmmintrin.h>
  > Merge "scripts: perftune.py: support different I/O schedulers" from Vlad
  > timer: Do not require callback to be copyable
  > core/reactor: Fix hang on shutdown with long task quota
  > build: use 'ppa:scylladb/ppa' instead of URL for sourceline
  > net/dns: add net::dns::get_srv_records() helper
2018-10-02 18:48:23 +01:00
Avi Kivity
7322ac105c Merge "sstables_stats" from Benny
"
This patchset adds sstable partition/row read/write/seek statistics.

Tests: dtest sstable_generation_loading_test.py stress_tool_test.py

Fixes: #251
"

* 'projects/sstables-stats/v5' of https://github.com/bhalevy/scylla:
  sstables stats: row reads
  sstables stats: partition seeks
  sstables stats: partition reads
  sstables stats: flat mutation reads
  sstables stats: cell/cell_tombstone writes
  sstables stats: partition/row/tombstone writes
  sstables_stats: writer_impl: move common members to base class
2018-10-02 15:05:10 +03:00
Duarte Nunes
7ba944a243 service/migration_manager: Validate duplicate ID in time
We allow tables to be created with the ID property, mostly for
advanced recovery cases. However, we need to validate that the ID
doesn't match an existing one. We currently do this in
database::add_column_family(), but this is already too late in the
normal workflow: if we allow the schema change to go through, then
it is applied to the system tables and loaded the next time the node
boots, regardless of us throwing from database::add_column_family().

To fix this, we perform this validation when announcing a new table.

Note that the check wasn't removed from database::add_column_family();
it's there since 2015 and there might have been other reasons to add
it that are not related to the ID property.

Refs #2059

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181001230142.46743-1-duarte@scylladb.com>
2018-10-02 13:40:40 +03:00
Calle Wilund
2996b8154f storage_proxy: Add missing re-throw in truncate_blocking
Iff truncation times out, we want to log it, but the exception should
not be swallowed, but re-thrown.

Fixes #3796.

Message-Id: <20181001112325.17809-1-calle@scylladb.com>
2018-10-01 19:07:04 +02:00
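The log-then-re-throw pattern from the entry above can be sketched as follows. `timed_out_error` and `truncate_with_logging()` are hypothetical names, and a vector of strings stands in for the logger; the real truncate_blocking works on futures.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical timeout exception type.
struct timed_out_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs `op`; on timeout, records a log line and re-throws so the caller
// still observes the failure.
template <typename Op>
void truncate_with_logging(Op&& op, std::vector<std::string>& log) {
    try {
        op();
    } catch (const timed_out_error& e) {
        log.push_back(std::string("truncate timed out: ") + e.what());
        throw;  // re-throw the original exception instead of swallowing it
    }
}
```

A bare `throw;` preserves the original exception object and its dynamic type, which `throw e;` would not guarantee.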
Paweł Dziepak
ad4a50dab6 Merge "multi range reader: add support for range generating functor" from Botond
"
This series adds support for range generator functors to multi range
reader. A range generator functor can lazily generate an unknown number
of ranges on-the-fly for the reader to read.
The range generator support was added by refactoring
`flat_multi_range_mutation_reader` to work in terms of a generator
functor. The existing overload taking a `dht::partition_range_vector`
is adapted to the generator interface behind the scenes.
"

* 'multi-range-reader-generator/v9' of https://github.com/denesb/scylla:
  tests/flat_mutation_reader_test: extend multi-range reader tests
  make_flat_multi_range_reader: add documentation
  make_flat_multi_range_reader: add generator overload
  flat_multi_range_reader: refactor to work in terms of generator
  make_flat_multi_range_reader(): better handle the 0 range case
  flat_mutation_reader: add move_buffer_content_to()
  flat_multi_range_mutation_reader: drop fwd_mr ctor parameter
2018-10-01 12:53:31 +01:00
Benny Halevy
bd6533f471 sstables stats: row reads
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:43 +03:00
Benny Halevy
192c1949a3 sstables stats: partition seeks
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:43 +03:00
Benny Halevy
edb3c23125 sstables stats: partition reads
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:43 +03:00
Benny Halevy
e9dffa56c8 sstables stats: flat mutation reads
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:43 +03:00
Benny Halevy
4ccdc1115d sstables stats: cell/cell_tombstone writes
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:41 +03:00
Benny Halevy
2f48f72d5c sstables stats: partition/row/tombstone writes
Introduce per-thread sstables stats infrastructure

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:01:14 +03:00
Benny Halevy
6853c1677d sstables_stats: writer_impl: move common members to base class
To be used by sstable_writer for stats collection.

Note that this patch is factored out so it can be verified with no
other change in functionality.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:01:00 +03:00
Duarte Nunes
e6630c627b Merge 'Add secondary index paging' from Piotr
"
Indexed select statement consists of two queries - the view query
used to extract base keys and the base query that uses those keys
to return base rows.
The main idea of this series is to replace raw proxy.query() call
during the view query to one that uses a pager.
Additionally, paging info from the view query needs to be returned
to the client, in order to be used later for requesting new pages.
"

* 'paging_indexes_7' of https://github.com/psarna/scylla:
  tests: add test for secondary index with paging
  cql3: remove execute(primary_keys) from select statement
  cql3: add incremental base queries to index query
  storage_proxy: make get_restricted_ranges public
  cql3: add base query handling function to indexed statement
  cql3: add generating base key from index keys
  cql3: add paging state generation function
  cql3: move getting index view schema to prepare stage
  pager: make state() defined for exhausted pagers
  cql3: add maybe_set_paging_state function
  cql3: rename set_has_more_pages to set_paging_state
  pager: add setters for partition/clustering keys
  cql3: add paging to read_posting_list
  cql3: add non-const get_result_metadata method
  cql3: make find_index_* functions return paging state
  cql3: make read_posting_list return future<rows>
  cql3: make pagers use time_point instead of duration
2018-10-01 10:42:21 +01:00
Avi Kivity
900ffad979 config: re-add murmur3_ignore_msb_bits to scylla.yaml
Commit d6b0c4dda4 changed the built-in default
murmur3_ignore_msb_bits to 12 (from 0) and removed the scylla.yaml default.

Removal of the scylla.yaml default was a mistake for two reasons:
 - if someone downgrades a cluster, keeping scylla.yaml derived from the
   master branch, they will experience resharding since the built-in default,
   which has changed, will take effect. While that scenario is not supported,
   it already happened and caused much consternation.
 - if, in the future, we wish to change the default, we will cause resharding
   again. Embedding the default in scylla.yaml allows us to change the default
   for new clusters while allowing upgraded clusters to retain older values.

Therefore, this patch restores murmur3_ignore_msb_bits in scylla.yaml. Future
changes to the configuration item should change both scylla.yaml and the
built-in default.

Message-Id: <20180930090053.21136-1-avi@scylladb.com>
2018-10-01 10:01:36 +03:00
Takuya ASADA
0a471c32cb dist/ami/files/scylla_install_ami: enable ssh_deletekeys
For some reason upstream AMI is disabling 'ssh_deletekeys' feature on
cloud-init, but generating SSH host keys should be important for public AMI
images, so enable it again.

See: https://cloudinit.readthedocs.io/en/latest/topics/modules.html?highlight=ssh_deletekeys#ssh

Fixes scylladb/scylla-ami#31

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180927122816.27809-1-syuu@scylladb.com>
2018-09-30 16:29:46 +03:00
Paweł Dziepak
2bcaf4309e utils/reusable_buffer: do not warn about large allocations
Reusable buffers are meant to be used when protocol or third-party
library limitations force us to allocate large contiguous buffers. There
isn't much that can be done about this so there is little point in
warning about that.

Fixes #3788.
Message-Id: <20180928085141.6469-1-pdziepak@scylladb.com>
2018-09-30 11:12:23 +03:00
Asias He
91dae0149d token_metadata: Invalidate cached ring in update_normal_tokens
In commit 4a0b561376, "storage_service:
Get rid of moving operation", we removed remove_from_moving() in
update_normal_tokens(). However, remove_from_moving() calls
invalidate_cached_rings(). We should call invalidate_cached_rings() in
update_normal_tokens(), otherwise we will get wrong token range to
address map in the token_metadata cache.

This issue exists in master only. It is not in any of the releases.

Message-Id: <c03f2ed478cfdb84494f36dce9a8cfc05ed9e0cd.1538288364.git.asias@scylladb.com>
2018-09-30 11:06:46 +03:00
Alexys Jacob
6d6764133b dist/common/scripts: coding style fixes
dist/common/scripts/scylla_blocktune.py:24:10: E401 multiple imports on one line
dist/common/scripts/scylla_blocktune.py:27:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_blocktune.py:35:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_blocktune.py:48:1: E305 expected 2 blank lines after class or function definition, found 1
dist/common/scripts/scylla_blocktune.py:52:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_blocktune.py:59:5: E306 expected 1 blank line before a nested definition, found 0
dist/common/scripts/scylla_blocktune.py:74:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_blocktune.py:81:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_blocktune.py:87:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_config_get.py:26:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_config_get.py:43:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_config_get.py:53:1: E305 expected 2 blank lines after class or function definition, found 1
dist/common/scripts/scylla_util.py:18:22: E401 multiple imports on one line
dist/common/scripts/scylla_util.py:19:22: E401 multiple imports on one line
dist/common/scripts/scylla_util.py:24:1: F401 'string' imported but unused
dist/common/scripts/scylla_util.py:32:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:50:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:61:30: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:75:53: E703 statement ends with a semicolon
dist/common/scripts/scylla_util.py:79:32: E272 multiple spaces before keyword
dist/common/scripts/scylla_util.py:80:25: E703 statement ends with a semicolon
dist/common/scripts/scylla_util.py:85:32: E201 whitespace after '['
dist/common/scripts/scylla_util.py:85:51: E202 whitespace before ']'
dist/common/scripts/scylla_util.py:130:34: E201 whitespace after '['
dist/common/scripts/scylla_util.py:130:65: E202 whitespace before ']'
dist/common/scripts/scylla_util.py:170:1: E266 too many leading '#' for block comment
dist/common/scripts/scylla_util.py:172:11: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:174:10: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:178:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:181:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:184:17: E201 whitespace after '['
dist/common/scripts/scylla_util.py:184:50: E202 whitespace before ']'
dist/common/scripts/scylla_util.py:186:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:193:16: E201 whitespace after '['
dist/common/scripts/scylla_util.py:193:76: E202 whitespace before ']'
dist/common/scripts/scylla_util.py:195:18: E201 whitespace after '{'
dist/common/scripts/scylla_util.py:195:27: E203 whitespace before ':'
dist/common/scripts/scylla_util.py:195:41: E203 whitespace before ':'
dist/common/scripts/scylla_util.py:195:48: E202 whitespace before '}'
dist/common/scripts/scylla_util.py:203:25: E201 whitespace after '['
dist/common/scripts/scylla_util.py:203:54: E202 whitespace before ']'
dist/common/scripts/scylla_util.py:204:76: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:208:27: E703 statement ends with a semicolon
dist/common/scripts/scylla_util.py:217:27: E201 whitespace after '['
dist/common/scripts/scylla_util.py:217:62: E202 whitespace before ']'
dist/common/scripts/scylla_util.py:238:25: E201 whitespace after '['
dist/common/scripts/scylla_util.py:238:87: E202 whitespace before ']'
dist/common/scripts/scylla_util.py:257:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:258:11: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:259:11: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:268:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:277:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:280:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:283:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:286:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:297:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:302:5: E722 do not use bare except'
dist/common/scripts/scylla_util.py:305:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:325:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:329:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:335:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:338:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:341:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:343:81: E231 missing whitespace after ','
dist/common/scripts/scylla_util.py:352:1: E305 expected 2 blank lines after class or function definition, found 1
dist/common/scripts/scylla_util.py:352:21: E231 missing whitespace after ':'
dist/common/scripts/scylla_util.py:352:41: E231 missing whitespace after ':'
dist/common/scripts/scylla_util.py:352:65: E231 missing whitespace after ':'
dist/common/scripts/scylla_util.py:353:1: E302 expected 2 blank lines, found 0
dist/common/scripts/scylla_util.py:358:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:360:22: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:365:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:367:11: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:370:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:373:15: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:374:14: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:375:14: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:376:20: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:385:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:388:9: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:389:9: E225 missing whitespace around operator
dist/common/scripts/scylla_util.py:393:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:396:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:399:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:432:1: E302 expected 2 blank lines, found 1

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180918213707.6069-1-ultrabug@gentoo.org>
2018-09-30 11:00:37 +03:00
Botond Dénes
eba8d68313 tests/flat_mutation_reader_test: extend multi-range reader tests
Add unit tests for the generator version and extend existing ones with
tests for the corner cases (0 and 1 range).
2018-09-28 14:27:55 +03:00
Botond Dénes
bb7447bbe4 make_flat_multi_range_reader: add documentation 2018-09-28 14:27:55 +03:00
Botond Dénes
39bfd5d1df make_flat_multi_range_reader: add generator overload
Allows creating a multi range reader from an arbitrary callable that
returns std::optional<dht::partition_range>. The callable is expected to
return a new range on each call, such that passing each successive range
to `flat_mutation_reader::fast_forward_to` is valid. When exhausted the
callable is expected to return std::nullopt.
2018-09-28 14:27:55 +03:00
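The generator contract described in the entry above (a new range per call, std::nullopt once exhausted) might look roughly like this. `token_range`, `read_all_ranges()` and `make_consecutive_ranges()` are hypothetical; `token_range` is a toy replacement for `dht::partition_range`, and the loop merely collects ranges where a real reader would fast-forward to each one.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for dht::partition_range: a half-open [start, end) interval.
struct token_range {
    int start;
    int end;
};

// Consumer side: keeps asking the generator for ranges until it returns nullopt.
template <typename Generator>
std::vector<token_range> read_all_ranges(Generator gen) {
    std::vector<token_range> visited;
    while (std::optional<token_range> r = gen()) {
        visited.push_back(*r);
    }
    return visited;
}

// Example generator: lazily yields `count` consecutive, non-overlapping ranges,
// so each successive range lies after the previous one.
auto make_consecutive_ranges(int count, int width) {
    return [i = 0, count, width]() mutable -> std::optional<token_range> {
        if (i == count) {
            return std::nullopt;   // exhausted
        }
        token_range r{i * width, (i + 1) * width};
        ++i;
        return r;
    };
}
```

Because ranges are produced on demand, the caller never has to materialize the full list up front.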
Botond Dénes
8c5387890d flat_multi_range_reader: refactor to work in terms of generator
Instead of working with a dht::partition_range_vector directly, work
with an abstract generator that returns a pointer to the next range on
each invocation. When exhausted it returns nullptr. This opens up the
possibility to create multi range readers from a generator functor that
creates ranges lazily. This is indeed what the next patch does.
2018-09-28 14:27:55 +03:00
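As a rough illustration of adapting an existing vector to the pointer-returning generator interface described in the entry above: `key_range`, `range_generator` and `make_vector_generator()` are hypothetical names, not the ones used in flat_multi_range_reader.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a partition range.
struct key_range {
    int start;
    int end;
};

// Generator interface as described: a pointer to the next range on each call,
// nullptr when exhausted.
using range_generator = std::function<const key_range*()>;

// Adapts an existing vector to the generator interface. The vector is captured
// by reference, so it must outlive the returned generator.
range_generator make_vector_generator(const std::vector<key_range>& ranges) {
    return [&ranges, next = std::size_t(0)]() mutable -> const key_range* {
        return next < ranges.size() ? &ranges[next++] : nullptr;
    };
}
```

With this adapter the reader needs only one code path, whether its ranges come from a vector or from a lazy functor.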
Botond Dénes
f3bf2e83dd make_flat_multi_range_reader(): better handle the 0 range case
Previously, when the passed in range of partition ranges contained 0
ranges, an empty reader was returned. This means that the returned
reader was forwardable or not depending on the number of passed in
ranges. This is inconsistent and can lead to nasty surprises.
To solve this problem add `forwardable_empty_mutation_reader`, a
specialized reader that delays creating the underlying reader until
fast_forward_to() is called on it, and thus a range is available.

When `make_flat_multi_range_mutation_reader()` is called with
`mutation_reader::forwarding::no` a simple empty reader is created, like
before.
2018-09-28 14:27:55 +03:00
Botond Dénes
03be9510a7 flat_mutation_reader: add move_buffer_content_to()
`move_buffer_content_to()` makes it possible to implement more efficient
wrapping readers, readers that wrap another flat mutation reader but do
no transformation to the underlying fragment stream.
These readers, when filling their buffers, can simply fill the
underlying reader's buffer, then move its content into their own. When
the reader's own buffer is empty, this is very efficient, as it can be
done by simply swapping the buffers, avoiding the work of moving the
fragments one-by-one.
2018-09-28 14:27:54 +03:00
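The swap optimization described in the entry above can be sketched like this. `fragment_buffer` and `move_buffer_content()` are hypothetical, with strings standing in for mutation fragments; the real `move_buffer_content_to()` is a member of flat_mutation_reader and also has to account for buffer size.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical buffer of fragments; strings stand in for mutation fragments.
using fragment_buffer = std::deque<std::string>;

// Moves everything from `src` to the end of `dst`. When `dst` is empty the two
// containers are simply swapped, avoiding a fragment-by-fragment move.
void move_buffer_content(fragment_buffer& src, fragment_buffer& dst) {
    if (dst.empty()) {
        std::swap(src, dst);   // constant time: exchange the containers
        return;
    }
    for (auto& fragment : src) {
        dst.push_back(std::move(fragment));
    }
    src.clear();
}
```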
Botond Dénes
68b6c83ee8 flat_multi_range_mutation_reader: drop fwd_mr ctor parameter
The factory function creating this reader ensures that the passed-in
ranges vector has more than one range, which effectively makes the
`fwd_mr` constructor parameter have no effect. The underlying reader
will always be created with `mutation_reader::forwarding::yes` as it has
to be able to fast-forward between the ranges.
2018-09-28 14:25:03 +03:00
Duarte Nunes
b8749a61dc tests/aggregate_fcts_test: Fix formatting of create_table()
And drop the template.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223315.28254-1-duarte@scylladb.com>
2018-09-28 09:45:27 +02:00
Duarte Nunes
17578c3579 tests/aggregate_fcts_test: Add test case for wrapped types
Provide a test case which checks a type being wrapped in a
reverse_type plays no role in assignment.

Refs #3789

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223201.28152-2-duarte@scylladb.com>
2018-09-28 07:09:08 +03:00
Duarte Nunes
5e7bb20c8a cql3/selection/selector: Unwrap types when validating assignment
When validating assignment between two types, it's possible one of
them is wrapped in a reverse_type, if it comes, for example, from the
type associated with a clustering column. When checking for weak
assignment the types are correctly unwrapped, but not when checking
for an exact match, which this patch fixes.

Technically, the receiver is never a reversed_type for the current
callers, but this is the morally correct implementation, as the type
being reversed or not plays no role in assignment.

Tests: unit(release)

Fixes #3789

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223201.28152-1-duarte@scylladb.com>
2018-09-28 07:08:19 +03:00
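A toy model of the unwrapping described in the entry above, under the assumption that a reversed type simply wraps an underlying type. `abstract_type`, `make_type()`, `reversed()`, `unwrap()` and `is_exact_match()` here are hypothetical miniatures, not Scylla's type system or its selector code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature type: `underlying` is set only for the reversed wrapper.
struct abstract_type {
    std::string name;
    std::shared_ptr<const abstract_type> underlying;
    bool is_reversed() const { return underlying != nullptr; }
};
using data_type = std::shared_ptr<const abstract_type>;

data_type make_type(std::string name) {
    return std::make_shared<const abstract_type>(abstract_type{std::move(name), nullptr});
}

// Wraps a type the way a descending clustering column would.
data_type reversed(data_type t) {
    return std::make_shared<const abstract_type>(
            abstract_type{"reversed(" + t->name + ")", t});
}

// Strips the reversed wrapper, if any.
const abstract_type& unwrap(const abstract_type& t) {
    return t.is_reversed() ? *t.underlying : t;
}

// Exact match compares the unwrapped types on both sides, so ordering
// direction plays no role in assignment.
bool is_exact_match(const abstract_type& receiver, const abstract_type& value) {
    return unwrap(receiver).name == unwrap(value).name;
}
```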
Piotr Sarna
da3821c598 tests: add test for secondary index with paging
A test case with enough rows to have multiple pages
is added to secondary_index_test suite.
2018-09-27 15:29:28 +02:00
Piotr Sarna
4b4f57747a cql3: remove execute(primary_keys) from select statement
Right now, with specialized execute() that takes primary keys
for indexed_table_select_statement, the original execute()
method implemented in select_statement is not used anywhere,
so it's removed.
2018-09-27 15:29:28 +02:00
Piotr Sarna
9e0b3cad1e cql3: add incremental base queries to index query
Base queries that are part of index queries are allowed to be short,
which can result in wasted work - e.g. when we query all replicas
in parallel, but have to discard most of the result, since the first
one (in token order) resulted in a short read.
Thus, we start by querying 1 range, check if the read is short,
and if not, continue by querying 2x more ranges than before.

Refs #2960
2018-09-27 15:29:28 +02:00
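The doubling strategy from the entry above, as a synchronous sketch. `batch_query` and `query_incrementally()` are hypothetical; the callback stands in for the base-table query and reports whether the read came back short. The real code is asynchronous and works on partition ranges rather than indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback: queries ranges [first, first + count) and returns
// true if the read was short.
using batch_query = std::function<bool(std::size_t first, std::size_t count)>;

// Starts with a single range and doubles the batch size after every read that
// was not short. Returns the batch sizes that were issued.
std::vector<std::size_t> query_incrementally(std::size_t total_ranges,
                                             const batch_query& query_batch) {
    std::vector<std::size_t> batches;
    std::size_t next = 0;
    std::size_t concurrency = 1;
    while (next < total_ranges) {
        std::size_t count = std::min(concurrency, total_ranges - next);
        batches.push_back(count);
        if (query_batch(next, count)) {
            break;              // short read: stop here instead of wasting work
        }
        next += count;
        concurrency *= 2;       // next round queries twice as many ranges
    }
    return batches;
}
```

With ten ranges and no short reads this issues batches of 1, 2, 4 and 3 ranges; a short read in any batch stops the loop immediately.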
Piotr Sarna
c41e0ade6c storage_proxy: make get_restricted_ranges public
This function is useful for splitting ranges in indexed queries.
2018-09-27 15:29:28 +02:00
Piotr Sarna
5b16aeb395 cql3: add base query handling function to indexed statement
Handling a base query during the indexed statement execution
may require updating its paging state.
2018-09-27 15:29:28 +02:00
Piotr Sarna
bce7232555 cql3: add generating base key from index keys
A function that computes base partition/clustering key from index view
primary key is provided.
2018-09-27 15:29:28 +02:00
Piotr Sarna
2f085848d8 cql3: add paging state generation function
For indexed queries, the paging state needs to be updated
based on the results of base query when the read was short.
2018-09-27 15:29:28 +02:00
Piotr Sarna
f21bcbefdf cql3: move getting index view schema to prepare stage
Searching for index view schema for an indexed statement can be done
once in prepare stage, so it's moved to indexed_table_select_statement
prepare method.
2018-09-27 15:29:28 +02:00
Piotr Sarna
b6d90b2869 pager: make state() defined for exhausted pagers
If service::pager is exhausted, state() function used to return
a nullptr instead of a pointer to a valid paging state and the
documented return type in this case was 'unspecified'.
Sometimes a paging state may be needed anyway, even if the pager
is already exhausted - thus, state() return value becomes defined
after this commit. Exhausted pagers will return a valid object
to a state with _remaining field set to 0.
2018-09-27 15:29:28 +02:00
Piotr Sarna
c1be660c3a cql3: add maybe_set_paging_state function
set_paging_state is split into its unconditional variant and a maybe_
one in order to avoid double checks.
2018-09-27 15:29:28 +02:00
Piotr Sarna
744ac3bf7b cql3: rename set_has_more_pages to set_paging_state
This function's primary goal is to set the paging state passed
as a parameter, so its name is changed to match the semantics better.
2018-09-27 15:29:28 +02:00
Glauber Costa
c3f27784de database: guarantee a minimum amount of shares when manual operations are requested.
We have found issues when a flush is requested outside the usual
memtable flush loop and because there is not a lot of data the
controller will not have a high amount of shares.

To prevent this, this patch guarantees some minimum amount of shares
when extraneous operations (nodetool flush, commitlog-driven flush, etc)
are requested.

Another option would be to add shares instead of guaranteeing a minimum.
But in my view the approach I am taking here has two main advantages:

1) It won't cause spikes when those operations are requested
2) It is cumbersome to add shares in the current infrastructure, as just
adding backlog can cause shares to spike. Consider this example:

  Backlog is within the first range of very low backlog (~0.2). Shares
  for this would be around ~20. If we want to add 200 shares, that is
  equivalent to a backlog of 0.8. Once we add those two backlogs
  together, we end up with 1 (max backlog).

Fixes #3761

Tests: unit (release)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180927131904.8826-1-glauber@scylladb.com>
2018-09-27 15:20:31 +02:00
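The difference between guaranteeing a minimum and adding shares comes down to taking a maximum instead of a sum. A minimal sketch, where `effective_shares()` and its parameters are hypothetical and the numbers used with it are illustrative rather than the values chosen by the patch:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: `controller_shares` is what the backlog controller
// computed; the floor applies only while a manual operation (for example a
// nodetool flush) is pending.
float effective_shares(float controller_shares, bool manual_operation_pending,
                       float min_shares) {
    return manual_operation_pending
            ? std::max(controller_shares, min_shares)
            : controller_shares;
}
```

Because the floor is a maximum rather than an addition, a controller that already asks for more than the minimum is left untouched, which is how the spike described in the example above is avoided.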
Piotr Sarna
336cc70438 pager: add setters for partition/clustering keys 2018-09-27 15:18:06 +02:00
Piotr Sarna
7c1e4c2deb cql3: add paging to read_posting_list
Instead of a single query, paging is used in order to query
an index.
2018-09-27 15:18:06 +02:00
Piotr Sarna
b83aa69a2e cql3: add non-const get_result_metadata method 2018-09-27 15:18:06 +02:00
Piotr Sarna
430a49f91a cql3: make find_index_* functions return paging state
In order to implement secondary index paging, intermediary query
functions now also return paging state for the view query.
2018-09-27 15:18:06 +02:00
Piotr Sarna
c3dd1775c8 cql3: make read_posting_list return future<rows>
Instead of returning a coordinator result and making a caller parse it
later, read_posting_list now extracts rows by itself.
This change is later needed when querying is replaced with a pager.
2018-09-27 15:18:06 +02:00
Piotr Sarna
1d34ef38a8 cql3: make pagers use time_point instead of duration
A standard way for passing a timeout parameter is specifying
a time_point, while pagers used to take a duration in order
to compute time points on the fly. This patch adds a timeout
parameter, which is a time_point, to fetch_page().
2018-09-27 15:18:06 +02:00
Tomasz Grabiec
78d9205a50 Merge "Multiple fixes to tests/normalizing_reader" from Vladimir
This patchset addresses multiple errors in normalizing_reader
implementation found during review.

I have decided to not make a clustering key full inside
before_key()/after_key() helpers. The reason is that for this they
would need schema to be passed as another parameter so existing
methods don't suit. OTOH, introducing new members for a class used
for testing purposes only seems like overkill.

* github.com/argenet/scylla.git projects/sstables-30/normalizing_reader_fixes/v1:
  range_tombstone: Add constructor accepting position_in_partition_views
    for range bounds.
  tests: Make sure range tombstone is properly split over rows with
    non-full keys.
  tests: Multiple fixes for draining and clearing range tombstones in
    normalizing_reader.
2018-09-27 12:51:47 +02:00
Vladimir Krivopalov
653fb37ea5 range_tombstone: Remove code that duplicates logic.
The actions performed by the call to set_start() were duplicated by the
immediately following code lines that are removed with this patch.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <20eaa1338c1719ded34f5c9ada69ec03907936f5.1537989044.git.vladimir@scylladb.com>
2018-09-27 12:05:25 +02:00
Vladimir Krivopalov
b74706a8f5 tests: Multiple fixes for draining and clearing range tombstones in normalizing_reader.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-26 19:24:10 -07:00
Vladimir Krivopalov
26d4d276e9 tests: Make sure range tombstone is properly split over rows with non-full keys.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-26 17:19:43 -07:00
Vladimir Krivopalov
fbccae0d15 range_tombstone: Add constructor accepting position_in_partition_views for range bounds.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-26 17:17:18 -07:00
Avi Kivity
e0b34003b5 tests: sstable_mutation_test: await background jobs
We only wait from the last test case, so if an individual test is executed,
a memory leak may be reported.

Fix by waiting from all test cases.
Message-Id: <20180926203723.18026-1-avi@scylladb.com>
2018-09-26 21:48:32 +01:00
Eliran Sinvani
44d93b4d4c cql3: fix incorrect results returned from prepared select with an IN clause
When executing a prepared select statement with a multicolumn IN, the
system returned incorrect results due to a memory violation (a bytes view
referring to an out of scope bytes object).
Added test for the prepared statement results correctness.

Tests:
1. unit (release) with the new test.
2. Python script.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <36c9cf9ed3fe72e3b4801e3cd120678429ce218a.1537947897.git.eliransin@scylladb.com>
2018-09-26 15:23:41 +03:00
Eliran Sinvani
22ad5434d1 cql3 : fix a crash upon preparing select with an IN restriction due to memory violation
When preparing a select query with a multicolumn IN restriction, the
node crashed due to using a parameter after using a move on it.

Tests:
1. UnitTests (release)
2. Preparing a select statement that crashed the system before,
and verify it is not crashing.

Fixes #3204
Fixes #3692

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <7ebd210cd714a460ee5557ac612da970cee03270.1537947897.git.eliransin@scylladb.com>
2018-09-26 15:23:38 +03:00
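The class of bug described in the entry above, using a parameter after it has been moved from, can be avoided by reading what is needed before the move. `restriction` and `make_restriction()` are hypothetical and unrelated to the actual cql3 restriction classes; this only illustrates the ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type holding the values and a count derived from them.
struct restriction {
    std::vector<std::string> values;
    std::size_t value_count;
};

// Reads everything it needs from the parameter *before* moving it. Writing
// restriction{std::move(values), values.size()} instead would read size()
// from an already moved-from vector.
restriction make_restriction(std::vector<std::string> values) {
    std::size_t count = values.size();          // read first
    return restriction{std::move(values), count};  // then move
}
```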
Avi Kivity
8f5e80e61a Revert "setup: add the lazytime XFS version"
This reverts commit f828fe0d59. It causes
scylla_raid_setup to fail on CentOS 7.

Fixes #3784.
2018-09-26 11:10:07 +01:00
Avi Kivity
e8d988caf8 Merge "Enable existing SSTables unit tests for 'mc' format" from Vladimir and Piotr
"
This patchset fixes several issues in SSTables 3.x ('mc') writing and
parsing and extends existing SSTables unit tests to cover the new
format.

The only test disabled temporarily is check_multi_schema because it
turned out that reading SSTables 3.x with a different schema has not
been implemented in full. This will be addressed in a separate patchset.

This patchset depends on the "Support SSTables 3.x in Scylla runtime"
patchset.

Tests: unit {release}
"

* 'projects/sstables-30/unit-tests/v3' of https://github.com/argenet/scylla: (25 commits)
  tests: Enable existing SSTables tests for 'mc' format.
  tests: Fix test_wrong_range_tombstone_order for 'mc' format.
  tests: Extend reader assertions to check clustering keys made full.
  tests: Disable test_old_format_non_compound_range_tombstone_is_read for 'mc' format.
  tests: Disable check_multi_schema for 'mc' format.
  tests: Fix test_promoted_index_read for 'mc' format by using normalizing_reader.
  tests: Fix promoted_index_read to not rely on a specific index length
  tests: Add 'mc' files for test_wrong_range_tombstone_order
  tests: Add 'mc' files for test_wrong_counter_shard_order
  tests: Add 'mc' files for summary_test
  tests: Add 'mc' files for test_promoted_index_read
  tests: Add 'mc' files for test_partition_skipping
  tests: Add 'mc' files for large_partition tests (promoted_index_read, sub_partition_read, sub_partitions_read
  tests: Add 'mc' files for test_counter_read
  tests: Add 'mc' files for test_broken_promoted_index_is_skipped
  tests: SSTables 'mc' files for sliced_mutation_reads_test.
  tests: Introduce normalizing_reader helper for SSTables tests.
  mutation_fragment: Add range_tombstone_stream::empty() method.
  sstables: Make key full when setting a range tombstone start from end open marker.
  sstables: For 'mc' format, use excl_start when split an RT over a row with a full key.
  ...
2018-09-26 11:10:07 +01:00
Avi Kivity
337ee6153a Merge "Support SSTables 3.x in Scylla runtime" from Vladimir and Piotr
"
This patchset makes it possible to use SSTables 'mc' format, commonly
referred to as 'SSTables 3.x', when running a Scylla instance.

Several bugs found along the way are fixed. Also, a configuration option
is introduced to allow running Scylla either with 'mc' or 'la' format
as default.

Tests: unit {release}

+ tested Scylla with both 'la' and 'mc' formats to work fine:

cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE test;
cqlsh:test> CREATE TABLE cfsst3 (pk int, ck int, rc int, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''};
cqlsh:test> INSERT INTO cfsst3 (pk, ck, rc) VALUES ( 4, 7, 8);
    <<flush>>
cqlsh:test> DELETE from cfsst3 WHERE pk = 4 and ck> 3 and ck < 8;
    <<flush>>
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 2, 3);
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 4, 6);
cqlsh:test> SELECT * FROM cfsst3 ;

 pk | ck | rc
----+----+------
  2 |  3 | null
  4 |  6 | null

(2 rows)
    <<Scylla restart>>
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 5, 7);
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 6, 8);
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 7, 9);
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 8, 10);
cqlsh:test> SELECT * from cfsst3 ;

 pk | ck | rc
----+----+------
  5 |  7 | null
  8 | 10 | null
  2 |  3 | null
  4 |  6 | null
  7 |  9 | null
  6 |  8 | null

(6 rows)
"

* 'projects/sstables-30/try-runtime/v8' of https://github.com/argenet/scylla:
  database: Honour enable_sstables_mc_format configuration option.
  sstables: Support SSTables 'mc' format as a feature.
  db: Add configuration option for enabling SSTables 'mc' format.
  tests: Add test for reading a complex column with zero subcolumns (SST3).
  sstables: Fix parsing of complex columns with zero subcolumns.
  sstables: Explicitly cast api::timestamp_type to uint64_t when delta-encoding.
  sstables: Use parser_type instead of abstract_type::parse_type in column_translation.
  bytes: Add helper for turning bytes_view into sstring_view.
  sstables: Only forward the call to fast_forwarding_to in mp_row_consumer_m if filter exists.
  sstables: Fix string formatting for exception messages in m_format_read_helpers.
  sstables: Don't validate timestamps against the max value on parsing.
  sstables: Always store only min bases in serialization_header.
  sstables: Support 'mc' version parsing from filename.
  SST3: Make sure we call consume_partition_end
2018-09-26 11:10:07 +01:00
Vladimir Krivopalov
38c8d1ce05 tests: Enable existing SSTables tests for 'mc' format.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 18:02:46 -07:00
Vladimir Krivopalov
c33e0f3f15 tests: Fix test_wrong_range_tombstone_order for 'mc' format.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 18:02:46 -07:00
Vladimir Krivopalov
ad2b9e44ee tests: Extend reader assertions to check clustering keys made full.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 18:02:46 -07:00
Vladimir Krivopalov
9239195473 tests: Disable test_old_format_non_compound_range_tombstone_is_read for 'mc' format.
This test is not applicable to the 'mc' format as it covers a backward
compatibility case which may only occur with SSTables generated by older
Scylla versions in 'ka' format.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 18:02:46 -07:00
Vladimir Krivopalov
952536c9f5 tests: Disable check_multi_schema for 'mc' format.
Altering types in schema has been disabled in Origin (see
CASSANDRA-12443). We do the same.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 18:02:46 -07:00
Vladimir Krivopalov
86aae36e04 tests: Fix test_promoted_index_read for 'mc' format by using normalizing_reader.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
5422203714 tests: Fix promoted_index_read to not rely on a specific index length
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
be5fe11f22 tests: Add 'mc' files for test_wrong_range_tombstone_order
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
3dd6e6f899 tests: Add 'mc' files for test_wrong_counter_shard_order
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
f08a2b35da tests: Add 'mc' files for summary_test
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
7e40947a80 tests: Add 'mc' files for test_promoted_index_read
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
20f3edba61 tests: Add 'mc' files for test_partition_skipping
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
8c37801ae5 tests: Add 'mc' files for large_partition tests (promoted_index_read, sub_partition_read, sub_partitions_read
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
28c32a353a tests: Add 'mc' files for test_counter_read
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
60c9a25b38 tests: Add 'mc' files for test_broken_promoted_index_is_skipped
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
24342dc27d tests: SSTables 'mc' files for sliced_mutation_reads_test.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
4393233a86 tests: Introduce normalizing_reader helper for SSTables tests.
This is a helper flat_mutation_reader that wraps another reader and
splits range tombstones over rows before emitting them.

It is used to produce the same mutation streams for both old (ka/la) and
new (mc) SSTables formats in unit tests.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
7a5c4f0a63 mutation_fragment: Add range_tombstone_stream::empty() method.
The method checks if the underlying range_tombstone_list is empty.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
eddf846c8a sstables: Make key full when setting a range tombstone start from end open marker.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
fa48a78d71 sstables: For 'mc' format, use excl_start when split an RT over a row with a full key.
This fixes the monotonicity issue as otherwise the range tombstone
emitted after such clustering row has a start position that should be
ordered before that of the row.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
45082ef18c sstables: Don't write promoted index consisting of a single block in 'mc' format.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:55:52 -07:00
Piotr Jastrzebski
8f5ac1d86f SST3: Make sure we emit range tombstone when slicing/fft
If we go past the slice to be read with a range tombstone being opened
we need to emit an RT corresponding to this slice.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-09-25 17:55:52 -07:00
Piotr Jastrzebski
ade8027960 Add mutation_fragment_filter::upper_bound
This method returns end of current position range.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-09-25 17:55:52 -07:00
Piotr Jastrzebski
82ff29cde8 Add clustering_ranges_walker::upper_bound
This method returns end of current position range.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-09-25 17:55:52 -07:00
Piotr Jastrzebski
bff49345cd Add position_in_partition_view::as_end_bound_view
This will be used in sstables 3.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-09-25 17:55:52 -07:00
Vladimir Krivopalov
cd80d6ff65 database: Honour enable_sstables_mc_format configuration option.
Only enable SSTables 'mc' format if the entire cluster supports it and
it is enabled in the configuration file.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
c98937e04c sstables: Support SSTables 'mc' format as a feature.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
650b245657 db: Add configuration option for enabling SSTables 'mc' format.
This flag will only be used for testing purposes until the Scylla 3.0
release and will be removed once SSTables 'mc' testing is completed.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
0edd3c57a9 tests: Add test for reading a complex column with zero subcolumns (SST3).
The files are generated by Scylla as a compaction_history table.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
24590fe88c sstables: Fix parsing of complex columns with zero subcolumns.
Before this fix, a complex column with zero subcolumns would be
incorrectly parsed because the deletion time would be read twice.

Now, this case is handled properly.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
be3613bdb6 sstables: Explicitly cast api::timestamp_type to uint64_t when delta-encoding.
This avoids noisy warnings like "signed value overflow" when ASAN is
turned on.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
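A minimal sketch of the idea behind be3613bdb6, assuming that api::timestamp_type is a signed 64-bit integer (as stated in 84341821b1). The helper names `encode_delta` and `decode_delta` are hypothetical and do not come from the sstables code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using timestamp_type = int64_t;  // assumption: mirrors api::timestamp_type

// Compute the delta in unsigned arithmetic. Unsigned subtraction wraps
// modulo 2^64 and is well defined, whereas the same subtraction on
// int64_t can overflow, which is undefined behaviour and is what
// sanitizers complain about.
uint64_t encode_delta(timestamp_type value, timestamp_type base) {
    return static_cast<uint64_t>(value) - static_cast<uint64_t>(base);
}

// Inverse operation. The final conversion back to int64_t is modular
// since C++20 and behaves the same way on mainstream two's complement
// compilers before that.
timestamp_type decode_delta(uint64_t delta, timestamp_type base) {
    return static_cast<timestamp_type>(static_cast<uint64_t>(base) + delta);
}
```

The round trip holds even for extreme inputs such as the minimum and maximum int64_t values, where a signed subtraction would overflow.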
Vladimir Krivopalov
0048f4814e sstables: Use parser_type instead of abstract_type::parse_type in column_translation.
abstract_type::parse_type() only deals with simple types and fails to
parse wrapped types such as
org.apache.cassandra.db.marshal.FrozenType(org.apache.cassandra.db.marshal.ListType(org.apache.cassandra.db.marshal.UTF8Type))

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
0f298113c7 bytes: Add helper for turning bytes_view into sstring_view.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
9166badebe sstables: Only forward the call to fast_forwarding_to in mp_row_consumer_m if filter exists.
It may happen that we hit the end of partition and then get
fast_forward_to() called in which case we attempt to call it from an
already destroyed object. We need to check the _mf_filter value before
doing so.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
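The guard described in 9166badebe can be sketched roughly like this. It is a simplified model under stated assumptions: `filter` and `consumer` are hypothetical types, and the filter is held in a std::optional that is reset at the end of a partition, which may differ from how mp_row_consumer_m actually stores _mf_filter.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the mutation fragment filter.
struct filter {
    int forwarded_to = 0;
    void fast_forward_to(int position) { forwarded_to = position; }
};

// Hypothetical consumer that drops its filter when a partition ends.
struct consumer {
    std::optional<filter> mf_filter;

    void on_partition_start() { mf_filter.emplace(); }
    void on_partition_end() { mf_filter.reset(); }

    // Forward the call only if the filter still exists. Returns whether
    // the call was forwarded.
    bool fast_forward_to(int position) {
        if (!mf_filter) {
            return false;  // filter already destroyed, nothing to forward to
        }
        mf_filter->fast_forward_to(position);
        return true;
    }
};
```

With this check, a fast-forward request that arrives after the end of the partition becomes a no-op instead of touching a destroyed object.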
Vladimir Krivopalov
fc901eb700 sstables: Fix string formatting for exception messages in m_format_read_helpers.
Before this fix, the code had potential undefined behaviour and could
crash because it added a large value to a const char* and tried to
create a std::string out of the result.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
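The pitfall described in fc901eb700 is easy to reproduce in isolation. The sketch below shows a safe way to build such a message; `describe_bad_value` and the message text are hypothetical and are not taken from m_format_read_helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Buggy shape (shown only as a comment):
//     throw std::runtime_error("Unexpected value: " + value);
// With an integer `value` this is pointer arithmetic on a const char*,
// not string concatenation, so a large value points far outside the
// string literal.

// Safe shape: convert the number to text first, then concatenate
// std::string objects.
std::string describe_bad_value(uint64_t value) {
    return std::string("Unexpected value: ") + std::to_string(value);
}
```

The key point is that at least one operand of `+` must already be a std::string before any number is appended, and the number has to be converted explicitly.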
Vladimir Krivopalov
84341821b1 sstables: Don't validate timestamps against the max value on parsing.
Internally, timestamps are represented as signed integers (int64_t) but
stored as unsigned ones. So it is quite possible to store data with
timestamp that is represented as a number larger than the max value of
int64_t type.
One such example is api::min_timestamp() that is used when generating
system schema tables ("keyspaces"). When cast to uint64_t, it turns into
a large value.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
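To see why the validation removed in 84341821b1 rejects legitimate data, consider the sketch below. It assumes a very small timestamp, here simply the minimum int64_t value as a stand-in for api::min_timestamp(), whose exact value is not given above. `to_stored` and `naive_check_passes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr int64_t int64_max = std::numeric_limits<int64_t>::max();

// Timestamps are signed in memory but stored as unsigned integers.
uint64_t to_stored(int64_t timestamp) {
    return static_cast<uint64_t>(timestamp);
}

// The kind of check that was dropped: reject anything above the largest
// signed value. Every negative timestamp fails it once stored.
bool naive_check_passes(uint64_t stored) {
    return stored <= static_cast<uint64_t>(int64_max);
}
```

A negative timestamp maps to a stored value above the signed maximum, so the naive check fails for it even though the data is valid.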
Vladimir Krivopalov
bdca27ae41 sstables: Always store only min bases in serialization_header.
There previously was an inconsistency in treating min values stored in a
serialization_header. They are written to or read from a Statistics.db
as deltas against fixed bases, but when we parse timeouts from the data
file, we need the full bases, not just deltas.

This inconsistency causes wrong timestamp values if we write an sstable
and then read from it using one and the same sstable object because we
turn min values into bases on write and then don't adjust them back
because we already have them in memory.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
057c26f894 sstables: Support 'mc' version parsing from filename.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Piotr Jastrzebski
d8e6d1ed98 SST3: Make sure we call consume_partition_end
even when we slice and fast forward to.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-09-25 17:23:40 -07:00
Raphael S. Carvalho
745e35fa82 database: Fix sstable resharding for mc format
SSTable format 'mc' doesn't write ancestors to metadata, so resharding
will not work with this new format because it relies on ancestors to
replace old shared sstables with new unshared ones.
The fix is to stop relying on ancestors metadata for this operation.

Fixes #3777.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180922211933.1987-1-raphaelsc@scylladb.com>
2018-09-25 18:37:48 +03:00
Nadav Har'El
05f8ed270b Add docs/metrics.md - documentation on metrics
Today I realised that although we have per-table metrics, they are not
*really* available by default. I was surprised to find that we don't have
(as far as I can tell) a document explaining why it is so, or how to enable
them anyway. Moreover, the more I investigated this issue, the more I
realised how little I know about Scylla's metrics - how they are calculated,
how they are collected, their different types, and so on.

So I sat down to figure out everything I wanted to learn about Scylla metrics,
and then wrote it all down in a new document, docs/metrics.md.

There are some missing pieces in this document marked by TODO, and probably
additional missing pieces that I'm not aware of, but I think this is already
a good start and can be (and should be) improved-on later.

We really need to have more of these documents describing various Scylla
subsystems to new developers - what each subsystem does, why it does what
it does, where the code is, and so on. I am facing these problems every
day as a seasoned developer - I can't even imagine what our new developers
face when trying to understand a subsystem they are not yet familiar with.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180920131103.20590-1-nyh@scylladb.com>
2018-09-25 17:51:20 +03:00
Paweł Dziepak
a3746d3b05 paging: make may_need_paging() more conservative
There is a bad interaction between may_need_paging() and query result
size limiter. The former is trying to avoid the complexity of paged
queries when the number of returned rows is going to be smaller than the
page size. The latter uses the fact that paged queries need not return
all requested rows to limit the size of a query results. Since
may_need_paging() may turn a paged query into non-paged one as a side
effect it disables the oversized result protection.

This patch limits the cases when may_need_paging() disables paging to
the situations when we know for sure that query result size limiter
won't be needed, i.e.: the result is not going to contain more than one
row. If the client knows for sure that the paging is not needed and
the performance impact is worthwhile it can disable paging on its side.
Otherwise, let's default to the safer behaviour.

Fixes #3620.

Message-Id: <20180925134431.24329-1-pdziepak@scylladb.com>
2018-09-25 17:01:04 +03:00
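The change in a3746d3b05 can be summarised with two simplified predicates. This is only a sketch of the decision rule as described in the message above. The real may_need_paging() takes different inputs, and both function names and parameters here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Earlier, more aggressive rule (simplified): paging is considered
// unnecessary whenever the query cannot fill a whole page.
bool may_need_paging_aggressive(uint64_t max_rows, uint64_t page_size) {
    return max_rows >= page_size;
}

// Conservative rule (simplified): paging is skipped only when the result
// cannot contain more than one row, so the result size limiter stays
// active in every other case.
bool may_need_paging_conservative(uint64_t max_rows) {
    return max_rows > 1;
}
```

For a query limited to 10 rows with a page size of 100, the aggressive rule would turn paging off, while the conservative rule keeps it on and with it the oversized result protection.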
Avi Kivity
c6f651ead4 Merge "Use fragmented buffers in commitlog writes" from Paweł
"
This series changes commitlog write path so that it uses fragmented
buffers and therefore avoids large allocations. This is done by first
switching the code to use seastar memory_output_stream interface, which
can handle fragmented buffer without any additional actions from the
user code needed and then making it use buffers of fixed size 128 kB.

Tests: unit(release, debug) dtest(commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup commitlog_test.py:TestCommitLog.test_commitlog_replay_with_alter_table)
"

* tag 'fragmented-commitlog-writes/v3' of https://github.com/pdziepak/scylla:
  commitlog: switch to fragmented buffers
  commitlog: drop buffer pools
  commitlog: drop recovery from bad alloc
  utils: drop data_output
  commitlog: use memory_output_stream
  serialization_visitors: add support for memory_output_stream
  utils: fragmented_temporary_buffer::view: add remove_prefix()
  utils: fragmented_temporary_buffer: add empty() and size_bytes()
  utils: fragmented_temporary_buffer: add get_ostream()
  idl: serializer: don't assume Iterator::value_type is bytes_view
  idl: serializer:  create buffer view from streams
  utils: crc: accept FragmentRange
2018-09-25 12:43:06 +03:00
Avi Kivity
8276ada1c4 tests: sstable_3_x_test: await sstable background tasks
When an sstable is deleted, this work is done as a background task
since it cannot be done from the destructor.  If we don't wait for
that background task, it is detected as a leak by ASAN.

Fix by waiting for background tasks in every test.

A more complete fix would involve having a factory class create
sstables and assume the responsibility for background tasks, and
something similar to with_cql_test_env(), but that is deferred until later.

Tests: sstable_3_x_test (debug).
Message-Id: <20180923111745.8313-1-avi@scylladb.com>
2018-09-24 10:43:58 +02:00
Takuya ASADA
21a12aa458 dist/redhat: specify correct repo file path on scylla-housekeeping services
Currently, both scylla-housekeeping-daily/-restart services mistakenly
specify the repo file path as "@@REPOFILES@@", which is copied from the .in
template and needs to be replaced with the actual path.

Fixes #3776

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180921031605.9330-1-syuu@scylladb.com>
2018-09-23 11:38:26 +03:00
Glauber Costa
f828fe0d59 setup: add the lazytime XFS version
Starting with kernel 4.17 XFS will support the lazytime mount option.
That will be beneficial for Scylla as updating times synchronously is
one of our current sources of stalls.

Fortunately, older kernels are able to parse the option and just ignore
it. We verified that to be the case in a 4.15 kernel on ubuntu.
Therefore, just add the option unconditionally.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180920170017.13215-1-glauber@scylladb.com>
2018-09-20 20:12:44 +03:00
Gleb Natapov
0bf9a78c78 sstables: wrap file into checked file after applying extensions
File extensions can also produce errors that checked file wants to
intercept and act upon. The patch changes the order in which files are
wrapped to make checked file the outermost wrapper, so that it can handle
exceptions generated by all inner wrappers.

Message-Id: <20180920124430.GD2326@scylladb.com>
2018-09-20 15:57:38 +03:00
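Why the wrapping order in 0bf9a78c78 matters can be shown with a toy model. This is a sketch with hypothetical types (`file_impl`, `raw_file`, `failing_extension`, `checked_file`). It does not use the Seastar file API. It only demonstrates that a wrapper can intercept errors from the layers it encloses and from nothing else.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

struct file_impl {
    virtual ~file_impl() = default;
    virtual int read() = 0;
};

struct raw_file : file_impl {
    int read() override { return 42; }
};

// Hypothetical file extension whose own logic fails.
struct failing_extension : file_impl {
    std::unique_ptr<file_impl> inner;
    explicit failing_extension(std::unique_ptr<file_impl> f) : inner(std::move(f)) {}
    int read() override { throw std::runtime_error("extension failure"); }
};

// Hypothetical checked wrapper: records any error raised by inner layers.
struct checked_file : file_impl {
    std::unique_ptr<file_impl> inner;
    bool& error_seen;
    checked_file(std::unique_ptr<file_impl> f, bool& flag)
        : inner(std::move(f)), error_seen(flag) {}
    int read() override {
        try {
            return inner->read();
        } catch (...) {
            error_seen = true;  // intercept, then let the error propagate
            throw;
        }
    }
};

// New order: checked(extension(raw)). The extension error is observed.
bool outermost_check_sees_extension_error() {
    bool seen = false;
    checked_file f(std::make_unique<failing_extension>(std::make_unique<raw_file>()), seen);
    try { f.read(); } catch (const std::runtime_error&) {}
    return seen;
}

// Old order: extension(checked(raw)). The extension error bypasses the check.
bool inner_check_misses_extension_error() {
    bool seen = false;
    failing_extension f(std::make_unique<checked_file>(std::make_unique<raw_file>(), seen));
    try { f.read(); } catch (const std::runtime_error&) {}
    return !seen;
}
```

Both helper functions return true, which illustrates that only the outermost position lets the checked wrapper observe failures raised by extensions.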
Botond Dénes
eb357a385d flat_mutation_reader: make timeout opt-out rather than opt-in
Currently timeout is opt-in, that is, all methods that even have it
default it to `db::no_timeout`. This means that ensuring timeout is used
where it should be is completely up to the author and the reviewers of
the code. As humans are notoriously prone to mistakes this has resulted
in a very inconsistent usage of timeout, many clients of
`flat_mutation_reader` passing the timeout only to some members and only
on certain call sites. This is small wonder considering that some core
operations like `operator()()` only recently received a timeout
parameter and others like `peek()` didn't even have one until this
patch. Both of these methods call `fill_buffer()` which potentially
talks to the lower layers and is supposed to propagate the timeout.
All this makes the `flat_mutation_reader`'s timeout effectively useless.

To bring order to this chaos, make the timeout parameter a mandatory one
on all `flat_mutation_reader` methods that need it. This ensures that
humans now get a reminder from the compiler when they forget to pass the
timeout. Clients can still opt-out from passing a timeout by passing
`db::no_timeout` (the previous default value) but this will be now
explicit and developers should think before typing it.

There were surprisingly few core call sites to fix up. Where a timeout
was available nearby I propagated it to be able to pass it to the
reader, where I couldn't I passed `db::no_timeout`. Authors of the
latter kind of code (view, streaming and repair are some of the notable
examples) should maybe consider propagating down a timeout if needed.
In the test code (the vast majority of the changes) I just used
`db::no_timeout` everywhere.

Tests: unit(release, debug)

Signed-off-by: Botond Dénes <bdenes@scylladb.com>

Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>
2018-09-20 11:31:24 +02:00
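The opt-in versus opt-out distinction from eb357a385d comes down to whether the parameter has a default. Below is a minimal sketch with a hypothetical `reader` class and a hypothetical `no_timeout` constant modelled on db::no_timeout. It is not the flat_mutation_reader interface.

```cpp
#include <cassert>
#include <chrono>

using clock_type = std::chrono::steady_clock;
using timeout_point = clock_type::time_point;

// Hypothetical equivalent of db::no_timeout: a deadline that never expires.
constexpr timeout_point no_timeout = timeout_point::max();

class reader {
public:
    // Opt-in (old shape, shown only as a comment):
    //     bool fill_buffer(timeout_point timeout = no_timeout);
    // Callers could forget the argument and silently get no timeout.

    // Opt-out (new shape): no default value, so every call site must
    // state its choice and the compiler rejects calls that omit it.
    // Returns true if the deadline has not passed yet.
    bool fill_buffer(timeout_point timeout) {
        return clock_type::now() < timeout;
    }
};
```

A caller that really wants no timeout now writes `r.fill_buffer(no_timeout)`, which makes the decision visible at the call site and in review.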
Asias He
de05df216f streaming: Use rpc::source on the shard where it is created
rpc::source can only work on the shard where it is created, thus we can
not apply the load distribution optimization. Disable it and let the
multishard_writer forward the data to the correct shard.

Fixes #3731.

Message-Id: <0d1b4d3e7adcfdc4e392b83aeb2544b95f3f46dd.1537430162.git.asias@scylladb.com>
2018-09-20 12:29:24 +03:00
Avi Kivity
8b2bf73c6f Merge "Fix compaction metadata read/write for SSTables 3.x" from Vladimir
"
In SSTables 3.x, the 'ancestors' field of compaction metadata is no
longer stored in the Statistics.db file.

The newly added test has previously failed due to this inconsistency.

Tests: unit {release}
"

* 'projects/sstables-30/empty_clustering_key/v1' of https://github.com/argenet/scylla:
  tests: Add test for reading table with empty clustering key from SSTables 3.x.
  tests: Update Statistics.db files for SSTables 3.x write tests.
  sstables: Do not parse ancestors from compaction metadata for SSTables 3.x
2018-09-20 09:53:46 +03:00
Vladimir Krivopalov
bf351c4a4f tests: Add test for reading table with empty clustering key from SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-19 20:57:23 -07:00
Vladimir Krivopalov
3bbb013ecd tests: Update Statistics.db files for SSTables 3.x write tests.
Those files have been generated with 'ancestors' field in compaction
metadata and so were invalid.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-19 20:57:23 -07:00
Vladimir Krivopalov
48fa088ec6 sstables: Do not parse ancestors from compaction metadata for SSTables 3.x
Ancestors array has been removed starting from 'ma' format
(CASSANDRA-7066).

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-19 17:11:43 -07:00
Vlad Zolotarov
043ced243e fix_system_distributed_tables.sh: adjust newly added 'request_size' and 'response_size' columns
Adjust the script to the new schema of system_traces.sessions. Two
new columns have been added:
   - request_size:  int
   - response_size: int

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20180919005504.12498-1-vladz@scylladb.com>
2018-09-19 15:46:11 +01:00
Paweł Dziepak
4469f76e7c commitlog: switch to fragmented buffers
So far commitlog was using contiguous buffers for storing the data that
is about to be written to disk. It was able to coalesce small writes so
that multiple small mutations would use the same buffer, but if a
mutation was large the commitlog would attempt to allocate a single,
appropriately large buffer. This excessively stresses the memory
allocator and may cause memory fragmentation to become an issue. The
solution is to use fixed-size buffers of 128 kB, which is the standard
buffer size in Scylla and keep large values fragmented.
2018-09-18 17:22:59 +01:00
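The effect of 4469f76e7c on allocation sizes can be sketched as follows. This is an illustration with standard containers only, assuming 128 kB means 128 * 1024 bytes. `fragment` is a hypothetical helper, not the fragmented_temporary_buffer API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 128 kB is taken to mean 128 * 1024 bytes.
constexpr std::size_t fragment_size = 128 * 1024;

// Split a payload into fragments of at most fragment_size bytes, so that
// no single allocation grows with the size of the payload.
std::vector<std::vector<char>> fragment(const std::vector<char>& payload) {
    std::vector<std::vector<char>> fragments;
    for (std::size_t offset = 0; offset < payload.size(); offset += fragment_size) {
        const std::size_t length = std::min(fragment_size, payload.size() - offset);
        const auto first = payload.begin() + static_cast<std::ptrdiff_t>(offset);
        fragments.emplace_back(first, first + static_cast<std::ptrdiff_t>(length));
    }
    return fragments;
}
```

A 300 kB mutation therefore ends up in two full fragments and one 44 kB fragment rather than in one 300 kB contiguous buffer.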
Paweł Dziepak
7c1add6769 commitlog: drop buffer pools
Buffer pools were added in 7191a130bb
"Commitlog: recycle buffers to reduce fragmentation." They introduce a
lot of complexity and will become unnecessary once the code is switched
to use fixed-size 128kB buffers.
2018-09-18 17:22:59 +01:00
Paweł Dziepak
9fee8b8d76 commitlog: drop recovery from bad alloc
If a node cannot allocate a 128 kB buffer, it is already in very bad shape, so
there isn't much value in trying to recover by attempting smaller
allocations and it just adds more complexity to the segment allocation.
It actually may be better to let some requests fail and give the node a
chance to recover rather than trying to use every last byte of free
memory and end up with bad_alloc in a noexcept context.
2018-09-18 17:22:59 +01:00
Paweł Dziepak
2e5b375309 utils: drop data_output
2018-09-18 17:22:59 +01:00
Paweł Dziepak
fe48aaae46 commitlog: use memory_output_stream
memory_output_stream deals with all required pointer arithmetic and
allows easy transition to fragmented buffers.
2018-09-18 17:22:59 +01:00
Paweł Dziepak
b9ab058834 serialization_visitors: add support for memory_output_stream
2018-09-18 17:22:59 +01:00
Paweł Dziepak
cbe2ef9e5c utils: fragmented_temporary_buffer::view: add remove_prefix()
2018-09-18 17:22:59 +01:00
Alexys Jacob
24b90ef527 configure.py: coding style fixes
configure.py:23:10: E401 multiple imports on one line
configure.py:39:61: W291 trailing whitespace
configure.py:47:1: E302 expected 2 blank lines, found 1
configure.py:53:16: W291 trailing whitespace
configure.py:55:1: E302 expected 2 blank lines, found 1
configure.py:62:1: E302 expected 2 blank lines, found 1
configure.py:63:53: E251 unexpected spaces around keyword / parameter equals
configure.py:63:55: E251 unexpected spaces around keyword / parameter equals
configure.py:63:68: E251 unexpected spaces around keyword / parameter equals
configure.py:63:70: E251 unexpected spaces around keyword / parameter equals
configure.py:63:92: E251 unexpected spaces around keyword / parameter equals
configure.py:63:94: E251 unexpected spaces around keyword / parameter equals
configure.py:64:33: E251 unexpected spaces around keyword / parameter equals
configure.py:64:35: E251 unexpected spaces around keyword / parameter equals
configure.py:65:54: E251 unexpected spaces around keyword / parameter equals
configure.py:65:56: E251 unexpected spaces around keyword / parameter equals
configure.py:65:69: E251 unexpected spaces around keyword / parameter equals
configure.py:65:71: E251 unexpected spaces around keyword / parameter equals
configure.py:65:94: E251 unexpected spaces around keyword / parameter equals
configure.py:65:96: E251 unexpected spaces around keyword / parameter equals
configure.py:66:33: E251 unexpected spaces around keyword / parameter equals
configure.py:66:35: E251 unexpected spaces around keyword / parameter equals
configure.py:68:1: E302 expected 2 blank lines, found 1
configure.py:72:18: E712 comparison to True should be 'if cond is True:' or 'if cond:'
configure.py:80:1: E302 expected 2 blank lines, found 1
configure.py:83:1: E302 expected 2 blank lines, found 1
configure.py:87:1: E302 expected 2 blank lines, found 1
configure.py:87:33: E251 unexpected spaces around keyword / parameter equals
configure.py:87:35: E251 unexpected spaces around keyword / parameter equals
configure.py:87:45: E251 unexpected spaces around keyword / parameter equals
configure.py:87:47: E251 unexpected spaces around keyword / parameter equals
configure.py:88:56: E251 unexpected spaces around keyword / parameter equals
configure.py:88:58: E251 unexpected spaces around keyword / parameter equals
configure.py:90:1: E302 expected 2 blank lines, found 1
configure.py:94:1: E302 expected 2 blank lines, found 1
configure.py:94:42: E251 unexpected spaces around keyword / parameter equals
configure.py:94:44: E251 unexpected spaces around keyword / parameter equals
configure.py:94:54: E251 unexpected spaces around keyword / parameter equals
configure.py:94:56: E251 unexpected spaces around keyword / parameter equals
configure.py:104:42: E251 unexpected spaces around keyword / parameter equals
configure.py:104:44: E251 unexpected spaces around keyword / parameter equals
configure.py:105:42: E251 unexpected spaces around keyword / parameter equals
configure.py:105:44: E251 unexpected spaces around keyword / parameter equals
configure.py:110:1: E302 expected 2 blank lines, found 1
configure.py:114:29: E251 unexpected spaces around keyword / parameter equals
configure.py:114:31: E251 unexpected spaces around keyword / parameter equals
configure.py:114:61: E251 unexpected spaces around keyword / parameter equals
configure.py:114:63: E251 unexpected spaces around keyword / parameter equals
configure.py:116:1: E302 expected 2 blank lines, found 1
configure.py:123:26: E251 unexpected spaces around keyword / parameter equals
configure.py:123:28: E251 unexpected spaces around keyword / parameter equals
configure.py:123:49: E251 unexpected spaces around keyword / parameter equals
configure.py:123:51: E251 unexpected spaces around keyword / parameter equals
configure.py:123:84: E251 unexpected spaces around keyword / parameter equals
configure.py:123:86: E251 unexpected spaces around keyword / parameter equals
configure.py:129:1: E302 expected 2 blank lines, found 1
configure.py:135:1: E302 expected 2 blank lines, found 1
configure.py:137:35: E251 unexpected spaces around keyword / parameter equals
configure.py:137:37: E251 unexpected spaces around keyword / parameter equals
configure.py:137:53: E251 unexpected spaces around keyword / parameter equals
configure.py:137:55: E251 unexpected spaces around keyword / parameter equals
configure.py:137:83: E251 unexpected spaces around keyword / parameter equals
configure.py:137:85: E251 unexpected spaces around keyword / parameter equals
configure.py:143:1: E302 expected 2 blank lines, found 1
configure.py:148:1: E302 expected 2 blank lines, found 1
configure.py:152:5: E301 expected 1 blank line, found 0
configure.py:159:5: E301 expected 1 blank line, found 0
configure.py:161:5: E301 expected 1 blank line, found 0
configure.py:163:5: E301 expected 1 blank line, found 0
configure.py:165:5: E301 expected 1 blank line, found 0
configure.py:168:1: E302 expected 2 blank lines, found 1
configure.py:169:5: F841 local variable 'mach' is assigned to but never used
configure.py:175:1: E302 expected 2 blank lines, found 1
configure.py:178:5: E301 expected 1 blank line, found 0
configure.py:183:5: E301 expected 1 blank line, found 0
configure.py:185:5: E301 expected 1 blank line, found 0
configure.py:187:5: E301 expected 1 blank line, found 0
configure.py:189:5: E301 expected 1 blank line, found 0
configure.py:192:1: E305 expected 2 blank lines after class or function definition, found 1
configure.py:329:5: E123 closing bracket does not match indentation of opening bracket's line
configure.py:335:5: E123 closing bracket does not match indentation of opening bracket's line
configure.py:340:41: E251 unexpected spaces around keyword / parameter equals
configure.py:340:43: E251 unexpected spaces around keyword / parameter equals
configure.py:340:60: E251 unexpected spaces around keyword / parameter equals
configure.py:340:62: E251 unexpected spaces around keyword / parameter equals
configure.py:340:85: E251 unexpected spaces around keyword / parameter equals
configure.py:340:87: E251 unexpected spaces around keyword / parameter equals
configure.py:341:30: E251 unexpected spaces around keyword / parameter equals
configure.py:341:32: E251 unexpected spaces around keyword / parameter equals
configure.py:342:29: E251 unexpected spaces around keyword / parameter equals
configure.py:342:31: E251 unexpected spaces around keyword / parameter equals
configure.py:343:38: E251 unexpected spaces around keyword / parameter equals
configure.py:343:40: E251 unexpected spaces around keyword / parameter equals
configure.py:343:54: E251 unexpected spaces around keyword / parameter equals
configure.py:343:56: E251 unexpected spaces around keyword / parameter equals
configure.py:344:29: E251 unexpected spaces around keyword / parameter equals
configure.py:344:31: E251 unexpected spaces around keyword / parameter equals
configure.py:345:37: E251 unexpected spaces around keyword / parameter equals
configure.py:345:39: E251 unexpected spaces around keyword / parameter equals
configure.py:345:52: E251 unexpected spaces around keyword / parameter equals
configure.py:345:54: E251 unexpected spaces around keyword / parameter equals
configure.py:346:29: E251 unexpected spaces around keyword / parameter equals
configure.py:346:31: E251 unexpected spaces around keyword / parameter equals
configure.py:349:43: E251 unexpected spaces around keyword / parameter equals
configure.py:349:45: E251 unexpected spaces around keyword / parameter equals
configure.py:349:59: E251 unexpected spaces around keyword / parameter equals
configure.py:349:61: E251 unexpected spaces around keyword / parameter equals
configure.py:349:84: E251 unexpected spaces around keyword / parameter equals
configure.py:349:86: E251 unexpected spaces around keyword / parameter equals
configure.py:350:29: E251 unexpected spaces around keyword / parameter equals
configure.py:350:31: E251 unexpected spaces around keyword / parameter equals
configure.py:351:44: E251 unexpected spaces around keyword / parameter equals
configure.py:351:46: E251 unexpected spaces around keyword / parameter equals
configure.py:351:60: E251 unexpected spaces around keyword / parameter equals
configure.py:351:62: E251 unexpected spaces around keyword / parameter equals
configure.py:351:86: E251 unexpected spaces around keyword / parameter equals
configure.py:351:88: E251 unexpected spaces around keyword / parameter equals
configure.py:352:29: E251 unexpected spaces around keyword / parameter equals
configure.py:352:31: E251 unexpected spaces around keyword / parameter equals
configure.py:353:43: E251 unexpected spaces around keyword / parameter equals
configure.py:353:45: E251 unexpected spaces around keyword / parameter equals
configure.py:353:59: E251 unexpected spaces around keyword / parameter equals
configure.py:353:61: E251 unexpected spaces around keyword / parameter equals
configure.py:353:79: E251 unexpected spaces around keyword / parameter equals
configure.py:353:81: E251 unexpected spaces around keyword / parameter equals
configure.py:354:29: E251 unexpected spaces around keyword / parameter equals
configure.py:354:31: E251 unexpected spaces around keyword / parameter equals
configure.py:355:45: E251 unexpected spaces around keyword / parameter equals
configure.py:355:47: E251 unexpected spaces around keyword / parameter equals
configure.py:355:61: E251 unexpected spaces around keyword / parameter equals
configure.py:355:63: E251 unexpected spaces around keyword / parameter equals
configure.py:355:78: E251 unexpected spaces around keyword / parameter equals
configure.py:355:80: E251 unexpected spaces around keyword / parameter equals
configure.py:356:29: E251 unexpected spaces around keyword / parameter equals
configure.py:356:31: E251 unexpected spaces around keyword / parameter equals
configure.py:359:45: E251 unexpected spaces around keyword / parameter equals
configure.py:359:47: E251 unexpected spaces around keyword / parameter equals
configure.py:359:61: E251 unexpected spaces around keyword / parameter equals
configure.py:359:63: E251 unexpected spaces around keyword / parameter equals
configure.py:359:83: E251 unexpected spaces around keyword / parameter equals
configure.py:359:85: E251 unexpected spaces around keyword / parameter equals
configure.py:360:29: E251 unexpected spaces around keyword / parameter equals
configure.py:360:31: E251 unexpected spaces around keyword / parameter equals
configure.py:361:48: E251 unexpected spaces around keyword / parameter equals
configure.py:361:50: E251 unexpected spaces around keyword / parameter equals
configure.py:361:69: E251 unexpected spaces around keyword / parameter equals
configure.py:361:71: E251 unexpected spaces around keyword / parameter equals
configure.py:361:87: E251 unexpected spaces around keyword / parameter equals
configure.py:361:89: E251 unexpected spaces around keyword / parameter equals
configure.py:362:29: E251 unexpected spaces around keyword / parameter equals
configure.py:362:31: E251 unexpected spaces around keyword / parameter equals
configure.py:363:48: E251 unexpected spaces around keyword / parameter equals
configure.py:363:50: E251 unexpected spaces around keyword / parameter equals
configure.py:363:64: E251 unexpected spaces around keyword / parameter equals
configure.py:363:66: E251 unexpected spaces around keyword / parameter equals
configure.py:363:89: E251 unexpected spaces around keyword / parameter equals
configure.py:363:91: E251 unexpected spaces around keyword / parameter equals
configure.py:364:29: E251 unexpected spaces around keyword / parameter equals
configure.py:364:31: E251 unexpected spaces around keyword / parameter equals
configure.py:365:46: E251 unexpected spaces around keyword / parameter equals
configure.py:365:48: E251 unexpected spaces around keyword / parameter equals
configure.py:365:62: E251 unexpected spaces around keyword / parameter equals
configure.py:365:64: E251 unexpected spaces around keyword / parameter equals
configure.py:365:82: E251 unexpected spaces around keyword / parameter equals
configure.py:365:84: E251 unexpected spaces around keyword / parameter equals
configure.py:365:97: E251 unexpected spaces around keyword / parameter equals
configure.py:365:99: E251 unexpected spaces around keyword / parameter equals
configure.py:366:29: E251 unexpected spaces around keyword / parameter equals
configure.py:366:31: E251 unexpected spaces around keyword / parameter equals
configure.py:367:48: E251 unexpected spaces around keyword / parameter equals
configure.py:367:50: E251 unexpected spaces around keyword / parameter equals
configure.py:367:70: E251 unexpected spaces around keyword / parameter equals
configure.py:367:72: E251 unexpected spaces around keyword / parameter equals
configure.py:368:1: E101 indentation contains mixed spaces and tabs
configure.py:368:1: W191 indentation contains tabs
configure.py:368:4: E128 continuation line under-indented for visual indent
configure.py:368:8: E251 unexpected spaces around keyword / parameter equals
configure.py:368:10: E251 unexpected spaces around keyword / parameter equals
configure.py:369:48: E251 unexpected spaces around keyword / parameter equals
configure.py:369:50: E251 unexpected spaces around keyword / parameter equals
configure.py:369:73: E251 unexpected spaces around keyword / parameter equals
configure.py:369:75: E251 unexpected spaces around keyword / parameter equals
configure.py:370:1: E101 indentation contains mixed spaces and tabs
configure.py:370:13: E128 continuation line under-indented for visual indent
configure.py:370:17: E251 unexpected spaces around keyword / parameter equals
configure.py:370:19: E251 unexpected spaces around keyword / parameter equals
configure.py:371:47: E251 unexpected spaces around keyword / parameter equals
configure.py:371:49: E251 unexpected spaces around keyword / parameter equals
configure.py:371:71: E251 unexpected spaces around keyword / parameter equals
configure.py:371:73: E251 unexpected spaces around keyword / parameter equals
configure.py:372:13: E128 continuation line under-indented for visual indent
configure.py:372:17: E251 unexpected spaces around keyword / parameter equals
configure.py:372:19: E251 unexpected spaces around keyword / parameter equals
configure.py:373:50: E251 unexpected spaces around keyword / parameter equals
configure.py:373:52: E251 unexpected spaces around keyword / parameter equals
configure.py:373:76: E251 unexpected spaces around keyword / parameter equals
configure.py:373:78: E251 unexpected spaces around keyword / parameter equals
configure.py:374:13: E128 continuation line under-indented for visual indent
configure.py:374:17: E251 unexpected spaces around keyword / parameter equals
configure.py:374:19: E251 unexpected spaces around keyword / parameter equals
configure.py:375:52: E251 unexpected spaces around keyword / parameter equals
configure.py:375:54: E251 unexpected spaces around keyword / parameter equals
configure.py:375:68: E251 unexpected spaces around keyword / parameter equals
configure.py:375:70: E251 unexpected spaces around keyword / parameter equals
configure.py:375:94: E251 unexpected spaces around keyword / parameter equals
configure.py:375:96: E251 unexpected spaces around keyword / parameter equals
configure.py:375:109: E251 unexpected spaces around keyword / parameter equals
configure.py:375:111: E251 unexpected spaces around keyword / parameter equals
configure.py:376:29: E251 unexpected spaces around keyword / parameter equals
configure.py:376:31: E251 unexpected spaces around keyword / parameter equals
configure.py:377:43: E251 unexpected spaces around keyword / parameter equals
configure.py:377:45: E251 unexpected spaces around keyword / parameter equals
configure.py:377:59: E251 unexpected spaces around keyword / parameter equals
configure.py:377:61: E251 unexpected spaces around keyword / parameter equals
configure.py:377:79: E251 unexpected spaces around keyword / parameter equals
configure.py:377:81: E251 unexpected spaces around keyword / parameter equals
configure.py:378:29: E251 unexpected spaces around keyword / parameter equals
configure.py:378:31: E251 unexpected spaces around keyword / parameter equals
configure.py:379:30: E251 unexpected spaces around keyword / parameter equals
configure.py:379:32: E251 unexpected spaces around keyword / parameter equals
configure.py:379:46: E251 unexpected spaces around keyword / parameter equals
configure.py:379:48: E251 unexpected spaces around keyword / parameter equals
configure.py:379:62: E251 unexpected spaces around keyword / parameter equals
configure.py:379:64: E251 unexpected spaces around keyword / parameter equals
configure.py:380:30: E251 unexpected spaces around keyword / parameter equals
configure.py:380:32: E251 unexpected spaces around keyword / parameter equals
configure.py:380:44: E251 unexpected spaces around keyword / parameter equals
configure.py:380:46: E251 unexpected spaces around keyword / parameter equals
configure.py:380:58: E251 unexpected spaces around keyword / parameter equals
configure.py:380:60: E251 unexpected spaces around keyword / parameter equals
configure.py:395:36: E251 unexpected spaces around keyword / parameter equals
configure.py:395:38: E251 unexpected spaces around keyword / parameter equals
configure.py:395:76: E251 unexpected spaces around keyword / parameter equals
configure.py:395:78: E251 unexpected spaces around keyword / parameter equals
configure.py:398:18: E127 continuation line over-indented for visual indent
configure.py:424:32: W291 trailing whitespace
configure.py:649:18: E124 closing bracket does not match visual indentation
configure.py:650:17: E127 continuation line over-indented for visual indent
configure.py:650:17: W503 line break before binary operator
configure.py:651:17: W503 line break before binary operator
configure.py:652:17: E124 closing bracket does not match visual indentation
configure.py:784:8: E713 test for membership should be 'not in'
configure.py:790:45: W291 trailing whitespace
configure.py:819:32: E261 at least two spaces before inline comment
configure.py:832:5: E123 closing bracket does not match indentation of opening bracket's line
configure.py:836:35: E251 unexpected spaces around keyword / parameter equals
configure.py:836:37: E251 unexpected spaces around keyword / parameter equals
configure.py:836:49: E251 unexpected spaces around keyword / parameter equals
configure.py:836:51: E251 unexpected spaces around keyword / parameter equals
configure.py:845:45: E251 unexpected spaces around keyword / parameter equals
configure.py:845:47: E251 unexpected spaces around keyword / parameter equals
configure.py:845:59: E251 unexpected spaces around keyword / parameter equals
configure.py:845:61: E251 unexpected spaces around keyword / parameter equals
configure.py:848:43: E251 unexpected spaces around keyword / parameter equals
configure.py:848:45: E251 unexpected spaces around keyword / parameter equals
configure.py:869:1: E302 expected 2 blank lines, found 1
configure.py:879:1: E305 expected 2 blank lines after class or function definition, found 1
configure.py:965:118: E225 missing whitespace around operator
configure.py:967:18: E124 closing bracket does not match visual indentation
configure.py:969:27: F821 undefined name 'python'
configure.py:969:73: E251 unexpected spaces around keyword / parameter equals
configure.py:969:75: E251 unexpected spaces around keyword / parameter equals
configure.py:976:7: E201 whitespace after '{'
configure.py:976:12: E203 whitespace before ':'
configure.py:976:73: E202 whitespace before '}'
configure.py:981:58: E251 unexpected spaces around keyword / parameter equals
configure.py:981:60: E251 unexpected spaces around keyword / parameter equals
configure.py:987:10: E222 multiple spaces after operator
configure.py:1001:17: E124 closing bracket does not match visual indentation
configure.py:1026:29: E251 unexpected spaces around keyword / parameter equals
configure.py:1026:31: E251 unexpected spaces around keyword / parameter equals
configure.py:1100:82: W291 trailing whitespace
configure.py:1110:29: E251 unexpected spaces around keyword / parameter equals
configure.py:1110:31: E251 unexpected spaces around keyword / parameter equals
configure.py:1110:49: E251 unexpected spaces around keyword / parameter equals
configure.py:1110:51: E251 unexpected spaces around keyword / parameter equals
configure.py:1111:64: E251 unexpected spaces around keyword / parameter equals
configure.py:1111:66: E251 unexpected spaces around keyword / parameter equals
configure.py:1112:13: E128 continuation line under-indented for visual indent
configure.py:1112:22: E251 unexpected spaces around keyword / parameter equals
configure.py:1112:24: E251 unexpected spaces around keyword / parameter equals
configure.py:1140:106: W291 trailing whitespace
configure.py:1149:86: E127 continuation line over-indented for visual indent
configure.py:1191:116: E251 unexpected spaces around keyword / parameter equals
configure.py:1191:118: E251 unexpected spaces around keyword / parameter equals
configure.py:1191:139: E251 unexpected spaces around keyword / parameter equals
configure.py:1191:141: E251 unexpected spaces around keyword / parameter equals
configure.py:1197:83: E231 missing whitespace after ','
configure.py:1200:76: E231 missing whitespace after ','
configure.py:1215:99: W291 trailing whitespace
configure.py:1242:31: E251 unexpected spaces around keyword / parameter equals
configure.py:1242:33: E251 unexpected spaces around keyword / parameter equals
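
Most of the warnings above are E251, with a few E302/E305 and one E713. As a rough illustration of the target style, here is a minimal sketch; the option names and the `is_disabled` helper are hypothetical and not taken from configure.py:

```python
import argparse

# Hypothetical options for illustration only; not copied from configure.py.
parser = argparse.ArgumentParser(description='style example')
# E251: no spaces around '=' when passing keyword arguments.
parser.add_argument('--mode', action='store', default='release',
                    help='build mode')
parser.add_argument('--with', dest='artifacts', action='append', default=[],
                    help='artifact to build')


# E302/E305: two blank lines before and after a top-level definition.
def is_disabled(name, enabled):
    # E713: write "name not in enabled" rather than "not name in enabled".
    return name not in enabled


args = parser.parse_args(['--with', 'scylla'])
```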

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180917155438.12410-1-ultrabug@gentoo.org>
2018-09-18 13:49:23 +03:00
Avi Kivity
e5e59ea9cf Merge "More SSTables 3.x write tests enriched with read after write." from Vladimir
"
Some of the write tests were missing the read after write validation
which has now been added for better coverage.

Tests: unit {release}
"

* 'projects/sstables-30/more-enriched-tests/v1' of https://github.com/argenet/scylla:
  tests: Enrich test_write_adjacent_range_tombstones_with_rows with read after write
  tests: Enrich test_write_many_range_tombstones with read after write
  tests: Enrich test_write_mixed_rows_and_range_tombstones with read after write
  tests: Enrich test_write_non_adjacent_range_tombstones with read after write
  tests: Enrich test_write_adjacent_range_tombstones with read after write
  tests: Enrich test_write_simple_range_tombstone with read after write.
  tests: Enrich test_write_deleted_column with read after write.
2018-09-18 13:45:52 +03:00
Paweł Dziepak
e464ad4f5d utils: fragmented_temporary_buffer: add empty() and size_bytes() 2018-09-18 11:29:37 +01:00
Paweł Dziepak
f4bb219a8b utils: fragmented_temporary_buffer: add get_ostream() 2018-09-18 11:29:37 +01:00
Paweł Dziepak
196c5a5eee idl: serializer: don't assume Iterator::value_type is bytes_view 2018-09-18 11:29:36 +01:00
Paweł Dziepak
953942b256 idl: serializer: create buffer view from streams 2018-09-18 11:29:36 +01:00
Paweł Dziepak
252cf0c681 utils: crc: accept FragmentRange 2018-09-18 11:29:36 +01:00
Avi Kivity
9d90ba470b Merge "Fix deleted counters handling in SSTables 3.x" from Vladimir
"
This patchset fixes the bug in SSTables 3.x parser that did not properly
handle deleted counter cells.

A write test is enriched to validate read after write so that this case
is covered.

Tests: unit {release}
"

* 'projects/sstables-30/fix-deleted-counters-read/v1' of https://github.com/argenet/scylla:
  tests: Read after write in test_write_counter_table.
  sstables: Fix deleted counter cells processing in SSTables 3.x parser.
2018-09-18 12:20:54 +03:00
Vladimir Krivopalov
8c08ccbd3b tests: Enrich test_write_adjacent_range_tombstones_with_rows with read after write
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 11:06:24 -07:00
Vladimir Krivopalov
f0966a935e tests: Enrich test_write_many_range_tombstones with read after write
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 11:06:10 -07:00
Vladimir Krivopalov
262874a90c tests: Enrich test_write_mixed_rows_and_range_tombstones with read after write
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 11:05:56 -07:00
Vladimir Krivopalov
6fbf4d3589 tests: Enrich test_write_non_adjacent_range_tombstones with read after write
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 11:05:42 -07:00
Vladimir Krivopalov
4bf9c87a1a tests: Enrich test_write_adjacent_range_tombstones with read after write
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 11:05:26 -07:00
Vladimir Krivopalov
5b087daf91 tests: Enrich test_write_simple_range_tombstone with read after write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 11:04:57 -07:00
Vladimir Krivopalov
e63d960b8e tests: Enrich test_write_deleted_column with read after write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 11:04:25 -07:00
Eliran Sinvani
83628f5881 cql3: maintain correctness of multicolumn restriction on mixed order columns
When a query with multicolumn inequality is issued on clustering columns
having mixed order (ASC and DESC together), if the ranges are not
broken into non-overlapping, lexicographically monotonic ones, the node
returns incorrect rows. This is due to the nature of the search
(prefix comparison). The solution is to break the range imposed
by the restriction into several single column restrictions OR-ed
together that will be logically equivalent and preserve the
monotonicity assumption. This commit also fixes incorrect results
returned by a multicolumn query on all-descending columns.

A unit test has been added to account for both issues fixed.
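
The idea can be sketched outside of Scylla. The toy model below assumes a hypothetical table with clustering columns (a, b) stored in (a ASC, b DESC) order and the restriction (a, b) > (1, 2). It is not the cql3 implementation; it only shows that the matching rows do not form one contiguous range in storage order, while each OR-ed single-column restriction does:

```python
from itertools import product

# Hypothetical table: clustering columns (a, b), CLUSTERING ORDER BY (a ASC, b DESC).
# Rows are listed in storage order.
rows = sorted(product(range(4), range(4)), key=lambda r: (r[0], -r[1]))


def matches_tuple(row, bound):
    """The restriction as written: (a, b) > bound, compared lexicographically."""
    return row > bound


def expand(bound):
    """Rewrite (a, b) > (x, y) as: (a = x AND b > y) OR (a > x)."""
    x, y = bound
    return [
        lambda r: r[0] == x and r[1] > y,
        lambda r: r[0] > x,
    ]


def is_contiguous(flags):
    """True if all True values in flags form one unbroken run."""
    idx = [i for i, f in enumerate(flags) if f]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)


bound = (1, 2)
whole = [matches_tuple(r, bound) for r in rows]
parts = [[p(r) for r in rows] for p in expand(bound)]
union = [any(fs) for fs in zip(*parts)]
```

Here `whole` has a gap in storage order, each entry of `parts` is a single run, and `union` selects exactly the same rows as `whole`.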

Fixes #2050
Tests: Unit test, manual tests of the use case in the issue.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <3b96620a3bd8b0614359a3b0757f324d45189dbb.1536478193.git.eliransin@scylladb.com>
2018-09-17 20:35:55 +03:00
Vladimir Krivopalov
e796fa2b02 tests: Read after write in test_write_counter_table.
This covers the case of deleted counter cells.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 10:11:48 -07:00
Vladimir Krivopalov
79ccce147c sstables: Fix deleted counter cells processing in SSTables 3.x parser.
Deleted counter cells should be processed the same way as regular
deleted cells.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-17 10:10:57 -07:00
Alexys Jacob
cd74dfebfb scripts: coding style fixes
scripts/create-relocatable-package.py:24:1: F401 'shutil' imported but unused
scripts/create-relocatable-package.py:24:1: F401 'tempfile' imported but unused
scripts/create-relocatable-package.py:24:16: E401 multiple imports on one line
scripts/create-relocatable-package.py:26:1: E302 expected 2 blank lines, found 1
scripts/create-relocatable-package.py:47:1: E305 expected 2 blank lines after class or function definition, found 1
scripts/create-relocatable-package.py:93:6: E225 missing whitespace around operator

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180917152520.5032-1-ultrabug@gentoo.org>
2018-09-17 18:40:23 +03:00
Alexys Jacob
c80d7b97cc scyllatop: more coding style fixes
tools/scyllatop/metric.py:2:1: F401 're' imported but unused
tools/scyllatop/metric.py:53:20: E221 multiple spaces before operator
tools/scyllatop/metric.py:69:20: E221 multiple spaces before operator

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180917153308.7240-1-ultrabug@gentoo.org>
2018-09-17 18:39:53 +03:00
Raphael S. Carvalho
5bc028f78b database: fix 2x increase in disk usage during cleanup compaction
Don't hold reference to sstables cleaned up, so that file descriptors
for their index and data files will be closed and consequently disk
space released.

Fixes #3735.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180914194047.26288-1-raphaelsc@scylladb.com>
2018-09-17 17:26:46 +03:00
Alexys Jacob
46d101c1f2 scyllatop: coding style fixes
tools/scyllatop/prometheus.py:3:1: F401 'sys' imported but unused
tools/scyllatop/prometheus.py:7:1: E302 expected 2 blank lines, found 1
tools/scyllatop/prometheus.py:12:5: E301 expected 1 blank line, found 0
tools/scyllatop/prometheus.py:17:1: W293 blank line contains whitespace
tools/scyllatop/prometheus.py:22:82: E225 missing whitespace around operator

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180914110847.1862-1-ultrabug@gentoo.org>
2018-09-17 15:45:43 +03:00
Botond Dénes
a84c26799d tests/mutation_reader_test: fix flaky restricted reader timeout test
The test in question is `restricted_reader_timeout`.

Use `eventually_true()` instead of `sleep()` to wait on the timeout
expiring, making the test more robust on overloaded machines.
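
As a language-neutral illustration of why polling is more robust than a fixed sleep, here is a minimal sketch of such a helper. The name mirrors `eventually_true()`, but the signature, retry count, and backoff are assumptions and not the test suite's actual implementation:

```python
import time


def eventually_true(predicate, attempts=10, initial_delay=0.001):
    """Poll predicate with exponential backoff instead of one fixed sleep."""
    delay = initial_delay
    for _ in range(attempts):
        if predicate():
            return True
        time.sleep(delay)
        delay *= 2
    # One last check after the final sleep.
    return predicate()


# The condition becomes true after roughly 20 ms; a slow machine just
# polls a few more times rather than failing.
deadline = time.monotonic() + 0.02
timed_out = eventually_true(lambda: time.monotonic() >= deadline)
```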

Also fix graceful failing, another longstanding issue with this test.
The readers created for the test need different destruction logic
depending on whether the test failed or succeeded. Previously this was
dealt with by using the logic that worked in case of success and using
asserts to abort when the test failed, thus avoiding developers
investigating the invalid memory accesses happening due to the wrong
destruction logic.
The solution is to use BOOST_CHECK() macro in the check that validates
whether timeout works as expected. This allows for execution to continue
even if the test failed, and thus allows for running the proper cleanup
code even when the test failed.

Fixes: #3719
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <911921dffc924f1b0a3e86408757467e9be2b65b.1537169933.git.bdenes@scylladb.com>
2018-09-17 09:40:45 +01:00
Nadav Har'El
0006e21c4d tests/view_complex_test: add missing timestamp
test_partial_delete_selected_column() does a long string of various
updates and deletes, each specifies a different timestamp. In one
of these updates, the timestamp was forgotten. This means that the
server picks the current time, a large number.

As the test is currently written, it doesn't matter which timestamp
was chosen, the test would still succeed (if timestamp >= 15, and it
must be since the timestamp is the time from the epoch).
But the intention was probably to use timestamp = 15, so let's make
this intention clear.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180905095552.11883-2-nyh@scylladb.com>
2018-09-17 00:38:55 +01:00
Nadav Har'El
2ae4ed151e tests/view_complex_test - add test passpoints
We recently saw a failure in test_partial_delete_selected_column() but
this is a very long test doing many operations and comparisons of their
results, and without BOOST_TEST_PASSPOINT() we can't know which of them
really failed.

So let's sprinkle BOOST_TEST_PASSPOINT() calls between the different parts
of test_partial_delete_selected_column(). If this test ever fails again,
we'll know where.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180905095552.11883-1-nyh@scylladb.com>
2018-09-17 00:38:55 +01:00
Jesse Haber-Kucharsky
9d27045c76 auth: Shorten random_device instance life-span
On Fedora 28, creating an instance of `std::random_device` opens a file
descriptor for `/dev/urandom` (observed via `strace`).

By declaring static thread-local instances of `std::random_device`,
these descriptors will be open (barring optimization by the compiler)
for the entire duration of the Scylla process's life.

However, the `std::random_device` instance is only necessary for
initializing the `RandomNumberEngine` for generating salts. With this
change, the file-descriptor is closed immediately after the engine is
initialized.

I considered generalizing this pattern of initialization into a
function, but with only two uses (and simple ones) I think this would
only obscure things.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Tests: unit (release)
Message-Id: <f1b985d99f66e5e64d714fd0f087e235b71557d2.1536697368.git.jhaberku@scylladb.com>
2018-09-12 12:14:21 +01:00
Botond Dénes
dfad223ea2 multishard_mutation_reader: shard_reader: don't do concurrent read-aheads
multishard_mutation_reader starts read-aheads on the
shards-to-be-read-soon. When doing this it didn't check whether the
respective shards had an ongoing read-ahead already. This led to a
single shard executing multiple concurrent read-aheads. This is damaging
for multiple reasons:
    * Can lead to concurrent access of the remote reader's data members.
    * The `shard_reader` was designed around a single read-ahead and
    thus will synchronise foreground reads with only the last one.
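
The guard can be pictured with a small sketch. The class and function names below are hypothetical stand-ins and not the real `shard_reader`; the point is only that a new read-ahead is started when none is pending:

```python
from concurrent.futures import Future


class ShardReader:
    """Toy stand-in for a per-shard reader; all names here are hypothetical."""

    def __init__(self, start_read):
        # start_read() begins a background read and returns a Future.
        self._start_read = start_read
        self._pending = None

    def read_ahead(self):
        # Start a read-ahead only if none is in flight for this shard.
        if self._pending is None:
            self._pending = self._start_read()
        return self._pending

    def fill_buffer(self):
        # A foreground read waits on the single pending read-ahead, if any.
        pending, self._pending = self._pending, None
        if pending is None:
            pending = self._start_read()
        return pending.result()


started = []


def start_read():
    # Simulate a read that has already completed on the remote shard.
    fut = Future()
    fut.set_result(f"batch-{len(started)}")
    started.append(fut)
    return fut


reader = ShardReader(start_read)
reader.read_ahead()
reader.read_ahead()  # no second read is started
first = reader.fill_buffer()
```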

The practical implication of this seen so far was that queries reading
a large number of rows (large enough to reliably trigger the
bug) would stop the read early, due to the `combined_mutation_reader`'s
internal accounting being messed up by concurrent access.

Also add a unit test. Instead of coming up with a very specific, and
very contrived unit test, use the test-case that detected this bug in
the first place: count(*) on a table with lots of rows (>1000). This
unit-test should serve well for detecting any similar bugs in the
future.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <ff1c49be64e2fb443f9aa8c5c8d235e682442248.1536746388.git.bdenes@scylladb.com>
2018-09-12 11:43:18 +01:00
Botond Dénes
6a07b8ae83 multishard_mutation_reader: update shard_reader's comment
The `abandoned` member was renamed to `stopped`. Update the comment
accordingly.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1d655785f28fe1e5fa041f2f49852f0ad88be53e.1536743950.git.bdenes@scylladb.com>
2018-09-12 11:32:08 +02:00
Botond Dénes
d9a2ffad84 mutation_partition: don't move tracing_state early
Currently the `trace_state` is moved into the `querier` object's
constructor when one has to be created. Since the trace_state is used
below this line, this had the effect that on the first page of the
query, when a querier object has to be created, tracing would not work
inside the `querier_cache`, which received a moved-from `trace_state` (a
nullptr effectively).
Change the move to a copy so the other half of the function doesn't use
a moved-from `trace_state`.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4987419781aa287141aa9dc8ce99c5068b564c84.1536739052.git.bdenes@scylladb.com>
2018-09-12 11:32:08 +02:00
Botond Dénes
49704755b0 combined_mutation_reader: propagate timeout in fill_buffer()
All user reads go through the combined reader. Not propagating the
timeout down from there means that the storage layer's timeout
functionality is effectively disabled. Spotted while reading the code.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <7fc10eca1c231dd04ac433913d9e6a51b6b17139.1536657041.git.bdenes@scylladb.com>
2018-09-11 15:44:28 +02:00
Botond Dénes
99ab43a1cc flat_mutation_reader: add timeout parameter to operator()()
For consistency with fast_forward_to() and fill_buffer(), and for
correctness: operator()() calls fill_buffer() and thus should provide a
timeout for the storage layer.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <6e97552ac2372e5846c955d94400b5315dbd2a89.1536657041.git.bdenes@scylladb.com>
2018-09-11 15:44:12 +02:00
Tomasz Grabiec
eb321a0830 Merge "Enrich SSTables 3.x write tests with subsequent read" from Vladimir
As our support for reading SSTables 3.x rows is nearly complete, the
write tests can be extended to read data after write.
This patchset adds reading to a handful of write tests.

* https://github.com/argenet/scylla/tree/projects/sstables-30/enrich-write-tests/v6:
  tests: Factor out the helper building SSTables path for write tests.
  tests: Add validate_read() helper to use in SSTables 3.x write tests.
  tests: Preserve tmpdir in SSTables 3.x write tests upon comparison.
  tests: Read SSTables for write_static_row test after validating write.
  tests: Read SSTables for write_composite_partition_key test after
    validating write.
  tests: Read SSTables for write_composite_clustering_key test after
    validating write.
  tests: Read SSTables for write_wide_partitions test after validating
    write.
  tests: Read SSTables for write_ttled_column test after validating
    write.
  tests: Read SSTables for write_collection_wide_update test after
    validating write.
  tests: Read SSTables for write_collection_incremental_update test
    after validating write.
  tests: Read SSTables for write_missing_columns_large_set test after
    validating write.
  tests: Read SSTables for write_multiple_partitions test after
    validating write.
  tests: Read SSTables for write_multiple_rows test after validating
    write.
  tests: Read SSTables for write_different_types test after validating
    write.
  tests: Read SSTables for write_empty_clustering_values test after
    validating write.
  tests: Read SSTables for write_large_clustering_keys test after
    validating write.
  tests: Read SSTables for write_user_defined_type_table test after
    validating write.
  tests: Read SSTables for write_deleted_row test after validating
    write.
  sstables: Fix SSTables 3.x parsing: check use_row_ttl() for TTLed
    columns.
  tests: Read SSTables for write_ttled_row test after validating write.
  Read SSTables for write_compact_table test after validating write.
  tests: Read SSTables for tests of many partitions after validating
    write.
2018-09-11 15:42:43 +02:00
Duarte Nunes
3f0643f34f Merge 'Misc improvements to stateful range scans' from Botond
"
This series contains miscellaneous improvements to the stateful range
scans. These improvements are either things that I forgot to include in
the original series (tracing), were requested by other developers
(comments), or were discovered while reading the code (lockup and
cleanup).
"

* 'multishard_mutation_query_fixes/v1' of https://github.com/denesb/scylla:
  multishard_mutation_query: add some tracing
  multishard_mutation_query: add comment to `read_context`
  multishard_mutation_query: always cleanup readers properly
  multishard_mutation_query: fix possible deadlock when creating a reader fails
2018-09-11 10:26:05 +01:00
Botond Dénes
7d71b42651 multishard_mutation_query: add some tracing
Add tracing for the following events:
1) Dismantling of the combined buffer.
2) Dismantling of the compaction state.
3) Cleaning up the readers.

(1) and (2) can possibly have adverse effects on the performance of the
query and hence it is important that details about the dismantled
fragments are exposed in the tracing data.
(3) is less critical but it is still good to know how many readers were
created by the read (in case they aren't saved). Since normally (in
stateful queries) this will always be 0, only trace this when it is
non-zero (and is interesting).
2018-09-11 08:18:16 +03:00
Botond Dénes
b41be7c8e5 multishard_mutation_query: add comment to read_context
Explain the purpose of the class and its intended usage and any gotchas
the reader/modifier of the code has to keep in mind.
2018-09-11 08:18:16 +03:00
Botond Dénes
b6e1a8f32d multishard_mutation_query: always cleanup readers properly
Currently the reader cleanup code, which ensures the readers and their
dependent objects are destroyed in the correct order and in a single
smp::submit_to() message, is only run when an attempt is made to save
the readers. However proper cleanup is needed not only then, but also when
the query is not stateful. Rename the current `cleanup()` method to
`stop()`, make it public and call it from a `finally()` block after the
page is finalized to ensure readers are properly cleaned up at all
times.
Also make sure that failures in `stop()` are never propagated so that
a failure in the cleanup doesn't fail the read itself.
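
A minimal sketch of the "always stop, never propagate cleanup failures"
pattern described above, written with plain C++ instead of Seastar futures.
`read_context_sketch`, `run_page()` and the `failed_stops` counter are
hypothetical names used for illustration only; they are not the actual
Scylla code.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the read context; not the Scylla class.
struct read_context_sketch {
    bool stopped = false;
    bool fail_on_stop = false;
    int failed_stops = 0;

    // Cleanup that may throw, e.g. when a shard reader cannot be stopped.
    void stop() {
        stopped = true;
        if (fail_on_stop) {
            throw std::runtime_error("stop failed");
        }
    }
};

// Runs `read_page` and always stops the context afterwards, on both the
// success and the error path. A failure in stop() is counted and swallowed
// so that it cannot fail the read itself.
inline std::string run_page(read_context_sketch& ctx,
                            const std::function<std::string()>& read_page) {
    struct guard {
        read_context_sketch& ctx;
        ~guard() {
            try {
                ctx.stop();
            } catch (...) {
                ++ctx.failed_stops;
            }
        }
    } g{ctx};
    return read_page();
}
```

The guard plays the role of the `finally()` block: it runs whether or not
`read_page` throws, and only the exception of the read can reach the caller.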
2018-09-11 08:18:16 +03:00
Vladimir Krivopalov
c4a4ef6e3c tests: Read SSTables for tests of many partitions after validating write.
This covers five tests, including three for compressed tables:
  - write_many_partitions_deflate
  - write_many_partitions_lz4
  - write_many_partitions_snappy
  - write_many_live_partitions
  - write_many_deleted_partitions

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
f1214bfceb Read SSTables for write_compact_table test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
a39638c0ba tests: Read SSTables for write_ttled_row test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
bcae761d72 sstables: Fix SSTables 3.x parsing: check use_row_ttl() for TTLed columns.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
9b55f06456 tests: Read SSTables for write_deleted_row test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
8869f1a591 tests: Read SSTables for write_user_defined_type_table test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
dae49358d8 tests: Read SSTables for write_large_clustering_keys test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
8c2bc4a16a tests: Read SSTables for write_empty_clustering_values test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
6f23446962 tests: Read SSTables for write_different_types test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
4865f2f5a3 tests: Read SSTables for write_multiple_rows test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
3594b887df tests: Read SSTables for write_multiple_partitions test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
eee775dab7 tests: Read SSTables for write_missing_columns_large_set test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
2d764da415 tests: Read SSTables for write_collection_incremental_update test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
88a3b05210 tests: Read SSTables for write_collection_wide_update test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
abdae2dd9e tests: Read SSTables for write_ttled_column test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
cdf148dc67 tests: Read SSTables for write_wide_partitions test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
5b1a4686eb tests: Read SSTables for write_composite_clustering_key test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
e908d07fe7 tests: Read SSTables for write_composite_partition_key test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
aa5dc16dbb tests: Read SSTables for write_static_row test after validating write.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
42ab8ed3cd tests: Preserve tmpdir in SSTables 3.x write tests upon comparison.
It can be used to do other checks on written files, like reading them
back.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
bc16304e99 tests: Add validate_read() helper to use in SSTables 3.x write tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Vladimir Krivopalov
6cddd7500a tests: Factor out the helper building SSTables path for write tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-10 17:28:48 -07:00
Botond Dénes
b3f1fe14e8 multishard_mutation_query: fix possible deadlock when creating a reader fails
Failing to create a reader (`do_make_remote_reader()`) can lead to a
deadlock if the reader is in any of the future_*_state states, as the
`then()` block is not executed and hence the promise of the first
future in the chain is not set. Avoid this by changing the `then()` to a
`then_wrapped()` and using `set_exception()` and `set_value()`
accordingly, such that the future is resolved on both the happy and
error path.
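
The idea can be sketched without Seastar as follows. `promise_sketch` is a
hypothetical single-threaded stand-in for a promise and `resolve_from()` is
an illustrative helper; neither is the Seastar or Scylla API. The point is
only that the promise is completed on both paths.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical, single-threaded stand-in for a promise; not the Seastar type.
struct promise_sketch {
    std::optional<int> value;
    std::exception_ptr error;

    bool resolved() const { return value.has_value() || error != nullptr; }
    void set_value(int v) { assert(!resolved()); value = v; }
    void set_exception(std::exception_ptr e) { assert(!resolved()); error = std::move(e); }
};

// Completes `p` from the outcome of `make_reader` on both paths.
// If only the success path resolved the promise, a throwing factory would
// leave every waiter on it blocked forever.
inline void resolve_from(promise_sketch& p, const std::function<int()>& make_reader) {
    int reader_id = 0;
    try {
        reader_id = make_reader();
    } catch (...) {
        p.set_exception(std::current_exception());
        return;
    }
    p.set_value(reader_id);
}
```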
2018-09-10 16:41:13 +03:00
Avi Kivity
4553238653 messaging: fix unbounded allocation in TLS RPC server
The non-TLS RPC server has an rpc::resource_limits configuration that limits
its memory consumption, but the TLS server does not. That means a many-node
TLS configuration can OOM if all nodes gang up on a single replica.

Fix by passing the limits to the TLS server too.

Fixes #3757.
Message-Id: <20180907192607.19802-1-avi@scylladb.com>
2018-09-10 12:11:16 +01:00
Gleb Natapov
9e438933a2 mutation_query_test: add test for result size calculation
Check that digest only and digest+data query calculate result size to be
the same.

Message-Id: <20180906153800.GK2326@scylladb.com>
2018-09-06 20:54:57 +03:00
Gleb Natapov
d7674288a9 mutation_partition: accurately account for result size in digest only queries
When measuring_output_stream is used to calculate result's element size
it incorrectly takes into account not only serialized element size, but
also a placeholder that ser::qr_partition__rows/qr_partition__static_row__cells
constructors put in the beginning. Fix it by taking starting point in a
stream before element serialization and subtracting it afterwards.
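
A minimal sketch of measuring an element as a delta of stream positions.
`counting_stream` and `element_size()` are hypothetical names, and the
placeholder is modelled as bytes written before the element; this is an
illustration of the approach, not the Scylla serializer.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stream that only counts bytes; not measuring_output_stream.
struct counting_stream {
    std::size_t written = 0;
    void write(std::string_view bytes) { written += bytes.size(); }
    std::size_t size() const { return written; }
};

// Returns the serialized size of `element` alone. `placeholder` models the
// bytes emitted before the element; they are excluded because the size is
// computed as (position after element) - (position before element).
inline std::size_t element_size(counting_stream& out,
                                std::string_view placeholder,
                                std::string_view element) {
    out.write(placeholder);
    const std::size_t start = out.size();  // starting point before the element
    out.write(element);
    return out.size() - start;
}
```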

Fixes #3755

Message-Id: <20180906153609.GJ2326@scylladb.com>
2018-09-06 20:52:44 +03:00
Takuya ASADA
2136479012 dist/debian: delete mounts.conf on scylla-server.postrm
Since we added mounts.conf in 687372bc48,
we need to delete the file when uninstalling the package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180905204631.9265-1-syuu@scylladb.com>
2018-09-06 16:50:14 +03:00
Gleb Natapov
98092353df mutation_partition: correctly measure static row size when doing digest calculation
The code uses an incorrect output stream when only a digest is requested
and thus gets an incorrect data size. Failing to correctly account
for static row size while calculating digest may cause digest mismatch
between digest and data query.

Fixes #3753.

Message-Id: <20180905131219.GD2326@scylladb.com>
2018-09-06 13:09:41 +03:00
Takuya ASADA
ab361e9897 dist/redhat: add mounts.conf to ghost file
Since we added mounts.conf in 687372bc48,
we need to delete the file when uninstalling the package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180905191037.1570-1-syuu@scylladb.com>
2018-09-05 22:14:48 +03:00
Jesse Haber-Kucharsky
682805b22c auth: Use finite time-out for all QUORUM reads
Commit e664f9b0c6 transitioned internal
CQL queries in the auth. sub-system to be executed with finite time-outs
instead of infinite ones.

It should have also modified the functions in `auth/roles-metadata.cc`
to have finite time-outs.

This change fixes some previously failing dtests, particularly around
repair. Without this change, the QUORUM query fails to terminate when
the necessary consistency level cannot be achieved.

Fixes #3736.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <e244dc3e731b4019f3be72c52a91f23ee4bb68d1.1536163859.git.jhaberku@scylladb.com>
2018-09-05 21:55:26 +03:00
Tomasz Grabiec
82270c8699 storage_proxy: Fix misqualification of reads as foreground or background in some cases
The foreground reads metric is derived from the number of live read
executors minus the number of background reads. Background reads are
counted down when their resolver times out. However, a read executor
may still be around for a while, resulting in such reads being
accounted as foreground.

Usually, the gap in which this happens is short, because executor
reference holders timeout quickly as well. It's not always the case
though. For instance, local read executor doesn't time out quickly
when the target shard has an overloaded CPU, and it takes a while
before the request goes through all the queues, even if IO is not
involved. Observed in #3628.

Fixes #3734.

Another problem is that all reads which received CL responses are
accounted as background, until all replicas respond, but if such read
needs reconciliation, it's still practically a foreground read and
should be accounted as such. Found during code review.

Fixes #3745.

This patch fixes both issues by rearranging accounting to track
foreground reads instead of background reads, and considering all
reads as foreground until the resulting promise is resolved.
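
A minimal sketch of the rearranged accounting, assuming background reads
are derived as live executors minus foreground reads. The names
`read_stats_sketch` and `read_executor_sketch` are hypothetical and do not
correspond to the storage_proxy classes.

```cpp
#include <cassert>

// Hypothetical counters; not the storage_proxy stats structure.
struct read_stats_sketch {
    long live_executors = 0;
    long foreground = 0;
    // Assumption: background reads are derived rather than tracked directly.
    long background() const { return live_executors - foreground; }
};

// A read counts as foreground from creation until its promise is resolved,
// no matter how long the executor object itself stays alive afterwards.
class read_executor_sketch {
    read_stats_sketch& _stats;
    bool _resolved = false;
public:
    explicit read_executor_sketch(read_stats_sketch& stats) : _stats(stats) {
        ++_stats.live_executors;
        ++_stats.foreground;
    }
    read_executor_sketch(const read_executor_sketch&) = delete;
    read_executor_sketch& operator=(const read_executor_sketch&) = delete;

    // Called when the result is handed to the client.
    void resolve_promise() {
        if (!_resolved) {
            _resolved = true;
            --_stats.foreground;
        }
    }

    ~read_executor_sketch() {
        resolve_promise();  // an unresolved read stops being foreground here
        assert(_stats.live_executors > 0);
        --_stats.live_executors;
    }
};
```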

Message-Id: <1535999620-25784-1-git-send-email-tgrabiec@scylladb.com>
2018-09-05 20:42:51 +03:00
Avi Kivity
c168805ca6 Merge "Filtering and fast-forwarding of range tombstones in SSTables 3.x" from Vladimir
"
This patchset adds proper support for sliced reads of partitions
containing range tombstones.

Given the SSTables 3.x representation of range tombstones by separate
start and end markers, we refer to the index for the information about
the currently opened range tombstone, if any, when skipping to the next
promoted index block.

Note that for this we have to take the promoted index block immediately
preceding the one we are jumping to.

Tests: unit {release}
"

* 'projects/sstables-30/range-tombstones-slicing/v3' of https://github.com/argenet/scylla:
  tests: Test filtering and forwarding on a partition with interleaved rows and RTs.
  tests: Add tests for reading wide partitions with range tombstones.
  sstables: Support slicing for range tombstones.
  sstables: Set/reset range tombstone start from end open marker.
  sstables: Fix end_open_marker population in promoted index blocks.
  sstables: Add need_skip() helper to data_consume_context.
  sstables: For end_open_marker, return both position in partition and deletion time.
2018-09-05 20:38:39 +03:00
Vladimir Krivopalov
3d13ee3909 tests: Test filtering and forwarding on a partition with interleaved rows and RTs.
In this test, rows lie inside range tombstones so we split them on
reading.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-05 09:48:17 -07:00
Vladimir Krivopalov
d39e58a97a tests: Add tests for reading wide partitions with range tombstones.
Test the case where rows lie outside range tombstones.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-05 09:48:17 -07:00
Vladimir Krivopalov
ec2047e1e6 sstables: Support slicing for range tombstones.
Both filtering on queried ranges and fast-forwarding are supported.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-05 09:48:17 -07:00
Vladimir Krivopalov
d57380f44c sstables: Set/reset range tombstone start from end open marker.
When we skip through a wide partition using promoted index, we may land
at a position that lies in the middle of a range tombstone so we need to
be aware of it. For this, we check if the previous promoted block has an
end open marker and either set the range tombstone start using it or
reset if missing.

Note several things about the implementation.

Firstly, we have to peek back at the previous promoted index block for the
end open marker, and so we have to always preserve one more promoted
index block when we read the next batch so that we can still access it.

Secondly, we use the previous promoted block end position to build
position in partition for the range tombstone start.

Lastly, we don't have a notion of end open marker in older consumers
that work with SSTables of ka/la formats so we only call the
corresponding methods if the consumer supports them.
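
A minimal sketch of the lookup described above: when skipping to a promoted
index block, the preceding block is consulted for an end open marker. All
type and function names here are hypothetical, and positions are reduced to
integers for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical simplified types; not the sstables classes.
struct deletion_time_sketch {
    long marked_for_delete_at;
};

struct index_block_sketch {
    int start;
    int end;
    // Engaged when a range tombstone is still open at the end of the block.
    std::optional<deletion_time_sketch> end_open_marker;
};

struct open_tombstone_sketch {
    int start_position;
    deletion_time_sketch tomb;
};

// Called when the reader skips to blocks[target]. Returns the range
// tombstone that is open at that point, or std::nullopt to reset it.
inline std::optional<open_tombstone_sketch>
open_tombstone_on_skip(const std::vector<index_block_sketch>& blocks, std::size_t target) {
    if (target == 0 || target >= blocks.size()) {
        return std::nullopt;  // no previous block to peek at
    }
    const index_block_sketch& prev = blocks[target - 1];
    if (!prev.end_open_marker) {
        return std::nullopt;  // nothing open: reset the tombstone start
    }
    // The previous block's end position becomes the tombstone start.
    return open_tombstone_sketch{prev.end, *prev.end_open_marker};
}
```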

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-05 09:48:17 -07:00
Vladimir Krivopalov
939e4893ef sstables: Fix end_open_marker population in promoted index blocks.
We should not access the internal object stored in std::optional when
passing the end_open_marker, especially since it can be disengaged.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-05 09:48:17 -07:00
Vladimir Krivopalov
84bff86fbc sstables: Add need_skip() helper to data_consume_context.
This method tells whether we will need to skip to reach the input
position or not.
It can be used for skipping with index when reading SSTables 3.x because
we only want to set/reset the open range tombstone bound when we
actually move to another promoted index block.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-05 09:48:17 -07:00
Tomasz Grabiec
cd201d1987 db/batchlog_manager: Do not return a value from timer callback
Timer callbacks are std::function<void()>.

Exposed by changing callback_t to noncopyable_function<>.

Message-Id: <1536138045-29209-1-git-send-email-tgrabiec@scylladb.com>
2018-09-05 12:32:21 +03:00
Asias He
89b769a073 storage_service: Wait for range setup before announcing join status
When a joining node announces join status through gossip, other
existing nodes will send writes to the joining node. At this time, it
is possible the joining node hasn't learnt the tokens of other nodes,
which causes an error like the one below:

   token_metadata - sorted_tokens is empty in first_token_index!
   storage_proxy - Failed to apply mutation from 127.0.4.1#0:
   std::runtime_error (sorted_tokens is empty in first_token_index!)

To fix, wait for the token range setup before announcing the join
status.

Fixes: #3382
Tests: 60 run of materialized_views_test.py:TestMaterializedViews.add_dc_during_mv_update_test

Message-Id: <01abb21ae3315ae275297e507c5956e5774557ef.1536128531.git.asias@scylladb.com>
2018-09-05 10:51:43 +03:00
Vlad Zolotarov
dae70e1166 tests: loading_cache_test: configure a validity timeout in test_loading_cache_loading_different_keys to a greater value
Change the validity timeout from 1s to 1h in order to avoid false alarms
on busy systems: for a short value there is a chance that
(loading_cache.size() == num_loaders) check is going to run after some elements
of the cache have already been evicted.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20180904193026.7304-1-vladz@scylladb.com>
2018-09-05 10:19:59 +03:00
Vladimir Krivopalov
ac0c71bdc1 sstables: For end_open_marker, return both position in partition and deletion time.
Prior to this fix, the end_open_marker has been only accessible as a
plain deletion_time structure. Now it also contains the start position
of a promoted index block so that it can be used for setting range
tombstone open bound.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-04 18:16:21 -07:00
Piotr Sarna
f494d03c3f tests: add test case for filtering with DESC clustering order
Refs #3741

Message-Id: <1b8eab8d668eb000b306686c15324e6acde8e616.1535981852.git.sarna@scylladb.com>
2018-09-04 16:05:19 +03:00
Piotr Sarna
8e52b66516 cql3: fix filtering with descending clustering order
When slice::is_satisfied_by() restriction check is performed
on raw data represented as bytes, it should always use a regular
type comparator, not a reversed one. Reversed types are used to
preserve descending clustering order, but comparison with constants
should be used with a regular underlying type comparator (for x < 1
to actually mean 'less than 1' instead of 'bigger than 1, because
the clustering order is reversed').
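
A minimal sketch of the distinction, using integers in place of serialized
values. `sorts_before()` and `filter_less_than()` are hypothetical helpers;
they only illustrate that ordering and restriction checks need different
comparators.

```cpp
#include <cassert>
#include <vector>

// Clustering order of the column; DESC stores values largest first.
enum class clustering_order { asc, desc };

// Comparator used to order rows: reversed for DESC columns.
inline bool sorts_before(int a, int b, clustering_order order) {
    return order == clustering_order::asc ? a < b : b < a;
}

// Restriction `value < bound`: always the regular comparator, whatever the
// clustering order of the column is.
inline bool satisfies_less_than(int value, int bound) {
    return value < bound;
}

// Filters rows (in any storage order) by `value < bound`.
inline std::vector<int> filter_less_than(const std::vector<int>& rows, int bound) {
    std::vector<int> out;
    for (int v : rows) {
        if (satisfies_less_than(v, bound)) {
            out.push_back(v);
        }
    }
    return out;
}
```

Using `sorts_before(v, 1, clustering_order::desc)` as the filter would
instead select the values greater than 1, which is the bug described above.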

Fixes #3741

Message-Id: <3e25fc66688c9253287f2c4f31ede8339b9bbe23.1535981852.git.sarna@scylladb.com>
2018-09-04 16:05:15 +03:00
Piotr Sarna
5b5c9f2707 cql3: fix a 'pratition_key' typo
partition_key got misspelled with 'pratition_key' typo in the original
series.

Message-Id: <de59fe6161df5442b19d8ba4336e2f828b7ede32.1535981852.git.sarna@scylladb.com>
2018-09-04 16:05:09 +03:00
Takuya ASADA
bd8a5664b8 dist/common/scripts/scylla_raid_setup: create scylla-server.service.d when it doesn't exist
When /etc/systemd/system/scylla-server.service.d/capabilities.conf is
not installed, we don't have /etc/systemd/system/scylla-server.service.d/,
need to create it.

Fixes #3738

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180904015841.18433-1-syuu@scylladb.com>
2018-09-04 10:12:32 +03:00
Tomasz Grabiec
4fb3f7e8eb managed_vector: Make external_memory_usage() ignore reserved space
This ensures that row::external_memory_usage() is invariant to
insertion order of cells.

It should be so, so that accounting of a clustering_row, merged from
multiple MVCC versions by the partition_snapshot_flat_reader on behalf
of a memtable flush, doesn't give a greater result than what is used
by the memtable region. Overaccounting leads to assertion failure in
~flush_memory_accounter.
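
A minimal sketch of why counting only the used elements makes the result
independent of how the container grew. It uses `std::vector` as a stand-in
for `managed_vector`, and this free function `external_memory_usage()` is a
hypothetical illustration rather than the Scylla member function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accounts only for the elements in use. Reserved but unused capacity
// depends on the growth history of the container, so it is ignored.
template <typename T>
std::size_t external_memory_usage(const std::vector<T>& v) {
    return v.size() * sizeof(T);
}
```

Two vectors with the same contents then report the same usage even if one
of them reserved much more space than the other.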

Fixes #3625 (hopefully).

Message-Id: <1535982513-19922-1-git-send-email-tgrabiec@scylladb.com>
2018-09-03 17:09:54 +03:00
Takuya ASADA
d78762d627 dist/debian: fix broken debian/changelog
It also needs $MUSTACHE_DIST.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180903094558.3862-1-syuu@scylladb.com>
2018-09-03 14:04:01 +03:00
Duarte Nunes
e49a14e308 Merge 'Stateful range scans' from Botond
"
This series extends the query statefulness, introduced by f8613a841 to
point queries, to range scans as well. This means that queriers will be
saved and reused for range scans too.
This series builds heavily on the infrastructure introduced by stateful
point queries, namely the querier object and the querier_cache. It also
builds on another critical piece of infrastructure, the
multishard_combining_reader, introduced by 2d126a79b.
To make the range scan on a given node suspendable and resumable we move
away from the current code in
`storage_proxy::query_nonsingular_mutations_locally()` and use a
multishard_combining_reader to execute the read. When the page is filled
this reader is dismantled and its shard readers are saved in the
querier cache.
There are of course a lot more details to it but this is the gist of it.

Tests: unit(release, debug), dtest(paging_test.py, paging_additional_test.py)
"

* '1865/range-scans/v7.1' of https://github.com/denesb/scylla: (33 commits)
  query_pagers: generate query_uuid for range-scans as well
  storage_proxy: use preferred/last replicas
  storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent
  db::consistency_level::filter_for_query() add preferred_endpoints
  storage_proxy: use query_mutations_from_all_shards() for range scans
  tests: add unit test for multishard_mutation_query()
  tests/mutation_assertions.hh: add missing include
  multishard_mutation_query: add badness counters
  database: add query_mutations_on_all_shards()
  mutation_compactor: add detach_state()
  flat_mutation_reader: add unpop_mutation_fragment()
  Move reconcilable_result_builder declaration to mutation_query.hh
  mutation_source_test: add an additional REQUIRE()
  mutation: add missing assert to mutation from reader
  querier: add shard_mutation_querier
  querier: prepare for multi-ranges
  tests/querier_cache: add tests specific for multiple entry-types
  querier: split querier into separate data and mutation querier types
  querier: move consume_page logic into a free function
  querier: move all matching related logic into free functions
  ...
2018-09-03 09:09:17 +01:00
Botond Dénes
cd49c23a66 query_pagers: generate query_uuid for range-scans as well
And thus enable stateful range scans.
2018-09-03 10:31:44 +03:00
Botond Dénes
6486d6c8bd storage_proxy: use preferred/last replicas 2018-09-03 10:31:44 +03:00
Botond Dénes
577a06ce1b storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent 2018-09-03 10:31:44 +03:00
Botond Dénes
6e59cee244 db::consistency_level::filter_for_query() add preferred_endpoints
To the second overload (the one without read-repair related params) too.
2018-09-03 10:31:44 +03:00
Botond Dénes
2f66bde26f storage_proxy: use query_mutations_from_all_shards() for range scans 2018-09-03 10:31:44 +03:00
Botond Dénes
6779b63dfe tests: add unit test for multishard_mutation_query() 2018-09-03 10:31:44 +03:00
Botond Dénes
c678b665b4 tests/mutation_assertions.hh: add missing include 2018-09-03 10:31:44 +03:00
Botond Dénes
253407bdc8 multishard_mutation_query: add badness counters
Add badness counters that allow tracking problems. The following
counters are added:
1) multishard_query_unpopped_fragments
2) multishard_query_unpopped_bytes
3) multishard_query_failed_reader_stops
4) multishard_query_failed_reader_saves

The first pair of counters observe the amount of work range scan queries
have to undo on each page. It is normal for these counters to be
non-zero, however sudden spikes in their values can indicate problems.
This undoing of work is needed for stateful range-scans to work.
When stateful queries are enabled the `multishard_combining_reader` is
dismantled and all unconsumed fragments in its and any of its
intermediate reader's buffers are pushed back into the originating shard
reader's buffer (via `unpop_mutation_fragment()`). This also includes
the `partition_start`, the `static_row` (if there is one) and all
extracted and active `range_tombstone` fragments. Together these can
amount to a substantial number of fragments.
(1) counts the number of fragments moved back, while (2) counts the
number of bytes. Monitoring size and quantity separately allows for
detecting edge cases like moving many small fragments or just a few huge
ones. The counters count the fragments/bytes moved back to readers
located on the shard they belong to.

The second pair of counters are added to detect any problems around
saving readers. Since the failure to save a reader will not fail the
read itself, it is necessary to add visibility to these failures by
other means.
(3) counts the number of times stopping a shard reader (waiting
on pending read-aheads and next-partitions) failed while (4)
counts the number of times inserting the reader into the `querier_cache`
failed.
Contrary to the first two counters, which will almost certainly never be
zero, these latter two counters should always be zero. Any other value
indicates problems in the respective shards/nodes.
2018-09-03 10:31:44 +03:00
Botond Dénes
97364c7ad9 database: add query_mutations_on_all_shards()
This method allows for querying a range or ranges on all shards of the
node. Under the hood it uses the multishard_combining_reader for
executing the query.
It supports paging and stateful queries (saving and reusing the readers
between pages). All this is transparent to the client, who only needs to
supply the same query::read_command::query_uuid through the pages of the
query (and supply correct start positions on each page, that match the
stop position of the last page).
2018-09-03 10:31:44 +03:00
Botond Dénes
33d72efa49 mutation_compactor: add detach_state()
Allow the state of the compaction to be detached. The detached state is
a set of mutation fragments, which if replayed through a new compactor
object will result in the latter being in the same state as the previous
one was.
This allows for storing the compaction state in the compacted reader by
using `unpop_mutation_fragment()` to push back the fragments that
comprise the detached state into the reader. This way, if a new
compaction object is created it can just consume the reader and continue
where the previous compaction left off.
2018-09-03 10:31:44 +03:00
Botond Dénes
48054ed810 flat_mutation_reader: add unpop_mutation_fragment()
This is the inverse of `pop_mutation_fragment()`. Allow fragments to be
pushed back into the buffer of the reader to undo a previous consumption
of the fragments.
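
A minimal sketch of the pop/unpop relationship on a buffer modelled as a
`std::deque` of strings. `reader_buffer_sketch` and its methods are
hypothetical names; the real reader works on mutation fragments.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical reader buffer; fragments are modelled as strings.
struct reader_buffer_sketch {
    std::deque<std::string> buffer;

    // Consumes the fragment at the front of the buffer.
    std::string pop_fragment() {
        assert(!buffer.empty());
        std::string f = std::move(buffer.front());
        buffer.pop_front();
        return f;
    }

    // Inverse of pop_fragment(): puts a fragment back at the front.
    // Several fragments have to be pushed back in reverse order of
    // consumption to restore the original sequence.
    void unpop_fragment(std::string f) {
        buffer.push_front(std::move(f));
    }
};
```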
2018-09-03 10:31:44 +03:00
Botond Dénes
3bcd577907 Move reconcilable_result_builder declaration to mutation_query.hh
It will be used by code outside of mutation_partition.cc so it needs to
be public. The definition remains in mutation_partition.cc.
2018-09-03 10:31:44 +03:00
Botond Dénes
b8b34223a4 mutation_source_test: add an additional REQUIRE()
test_streamed_mutation_forwarding_is_consistent_with_slicing already has
a REQUIRE() for the mutation read with the slicing reader. Add another
one for the forwarding reader. This makes it more consistent and also
helps finding problems with either the forwarding or slicing reader.
2018-09-03 10:31:44 +03:00
Botond Dénes
d347866664 mutation: add missing assert to mutation from reader
read_mutation_from_flat_mutation_reader's internal adapter can build a
single mutation only and hence can consume only a single partition.
If more than one partition is pushed down from the producer the
adapter will very likely crash. To avoid unnecessary investigations add
an assert() to fail early and make it clear what the real problem is.
All other consume_ methods have an assert() already for their
invariants so this is just following suit.
2018-09-03 10:31:44 +03:00
Botond Dénes
ecb1e79bcc querier: add shard_mutation_querier
The querier to be used for saving shard readers belonging to a
multishard range scan. This querier doesn't provide a `consume_page`
method as it doesn't support reading from it directly. It is more
of a storage to allow caching the reader and any objects it depends on.
2018-09-03 10:31:44 +03:00
Botond Dénes
07cdf766c5 querier: prepare for multi-ranges
In the next patch a querier will be added that reads multiple ranges as
opposed to a single range that data and mutation queriers read.
To keep `querier_cache` code seamless regarding this difference change all
range-matching logic to work in terms of `dht::partition_ranges_view`.
This allows for a cheap and seamless way of having a single code-base for
the insert/lookup logic. Code actually matching ranges is updated to be
able to handle both singular and multi-ranges while maintaining backward
compatibility.
2018-09-03 10:31:44 +03:00
Botond Dénes
88a7effd8d tests/querier_cache: add tests specific for multiple entry-types 2018-09-03 10:31:44 +03:00
Botond Dénes
c12008b8cb querier: split querier into separate data and mutation querier types
Instead of hiding what compaction method the querier uses (and only
expose it via rejecting `can_be_used_for_page()`) make it very explicit
that these are really two different queriers. This allows using
different indexes for the two queriers in `querier_cache` and
eliminating the possibility of picking up a querier with the wrong
compaction method (read kind).
This also makes it possible to add new querier type(s) that suit the
multishard-query's needs without making a confusing mess of `querier` by
making it a union of all querying logic.

Splitting the queriers this way changes what happens when a lookup finds
a querier of the wrong kind (e.g. emit_only_live::yes for an
emit_only_live::no command). As opposed to dropping the found (but
wrong) querier the querier will now simply not be found by the lookup.
This is a result of using separate search indexes for the different
mutation kinds. This change should have no practical implications.

Splitting is done by making querier templated on `emit_only_live_rows`.
It doesn't make sense to duplicate the entire querier as the two share
99% of the code.
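
A minimal sketch of a querier templated on `emit_only_live_rows`, with one
search index per instantiation. The `_sketch` types are hypothetical and
keyed by a plain integer; they only show why a lookup for one kind can no
longer find a querier of the other kind.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>

enum class emit_only_live_rows { no, yes };

// One template, two distinct types sharing the same code.
template <emit_only_live_rows OnlyLive>
struct querier_sketch {
    int position = 0;
};

using data_querier_sketch = querier_sketch<emit_only_live_rows::yes>;
using mutation_querier_sketch = querier_sketch<emit_only_live_rows::no>;

// Hypothetical cache with a separate index per querier type.
class querier_cache_sketch {
    std::map<int, data_querier_sketch> _data_index;
    std::map<int, mutation_querier_sketch> _mutation_index;

    template <emit_only_live_rows OnlyLive>
    std::map<int, querier_sketch<OnlyLive>>& index() {
        if constexpr (OnlyLive == emit_only_live_rows::yes) {
            return _data_index;
        } else {
            return _mutation_index;
        }
    }

public:
    template <emit_only_live_rows OnlyLive>
    void insert(int key, querier_sketch<OnlyLive> q) {
        index<OnlyLive>().insert_or_assign(key, std::move(q));
    }

    // A querier of the other kind is simply not found, never dropped.
    template <emit_only_live_rows OnlyLive>
    std::optional<querier_sketch<OnlyLive>> lookup(int key) {
        auto& idx = index<OnlyLive>();
        auto it = idx.find(key);
        if (it == idx.end()) {
            return std::nullopt;
        }
        auto q = std::move(it->second);
        idx.erase(it);
        return q;
    }
};
```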
2018-09-03 10:31:44 +03:00
Botond Dénes
e46251ebf6 querier: move consume_page logic into a free function
In preparation for the now single querier being split into multiple more
specialized ones. Make it possible for the multiple queriers to share the
same implementation. Also, the code can now be reused by outside code as
well, not just queriers.
2018-09-03 10:31:44 +03:00
Botond Dénes
c53f17ddb8 querier: move all matching related logic into free functions
So that they can be used for multiple querier classes easily, without
inheritance. The functions are not visible from the header.
Also update the comments on `querier` w.r.t. the removed
checking functions. Change the language to be more general. In practice
these checks are never done by client code, instead they are done by the
`querier_cache`.
2018-09-03 10:31:44 +03:00
Botond Dénes
43f464c52d querier: inline querier::current_position() and make it public 2018-09-03 10:31:44 +03:00
Botond Dénes
86a61ded7d querier: s/position/position_view/
Also treat it as a view, that is take it by value in functions,
instead of reference.
2018-09-03 10:31:44 +03:00
Botond Dénes
6e4ec53679 querier: move position outside of querier
In preparation for having multiple querier types that can share code
without inheritance.
2018-09-03 10:31:44 +03:00
Botond Dénes
a172dfec4e querier: move clustering_position_tracker outside of querier
In preparation for having multiple querier types that can share code
without inheritance.
2018-09-03 10:31:44 +03:00
Botond Dénes
7bd955e993 querier_cache: move insert/lookup related logic into free functions
In preparation for introducing support for multiple entry types in the
querier_cache move all insert/lookup related logic into free functions.
Later these functions will be templated so they can handle multiple
entry types with the same code.
2018-09-03 10:31:44 +03:00
Botond Dénes
cded477b94 querier: return std::optional<querier> instead of using create_fun()
Requiring the caller of lookup() to pass in a `create_fun()` was not
such a good idea in hindsight. It leads to awkward call sites and even
more awkward code when trying to find out whether the lookup was
successful or not.
Returning an optional gives calling code much more flexibility and makes
the code cleaner.
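
A minimal sketch of a lookup that returns `std::optional` so the caller
decides what to do on a miss. `saved_querier_sketch` and
`lookup_cache_sketch` are hypothetical names, and removing the entry on a
hit is an assumption made for this illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical cached object; not the Scylla querier.
struct saved_querier_sketch {
    std::string range;
};

class lookup_cache_sketch {
    std::map<int, saved_querier_sketch> _entries;

public:
    void insert(int key, saved_querier_sketch q) {
        _entries.insert_or_assign(key, std::move(q));
    }

    // Returns the cached querier and removes it from the cache (assumed
    // behaviour), or std::nullopt on a miss. No creation callback needed.
    std::optional<saved_querier_sketch> lookup(int key) {
        auto it = _entries.find(key);
        if (it == _entries.end()) {
            return std::nullopt;
        }
        saved_querier_sketch q = std::move(it->second);
        _entries.erase(it);
        return q;
    }
};
```

The call site can then test the result directly and create a fresh querier
itself when the lookup misses.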
2018-09-03 10:31:44 +03:00
Botond Dénes
5f726e9a89 querier: move all to query namespace
To avoid name clashes.
2018-09-03 10:31:44 +03:00
Botond Dénes
867f69b9d1 dht::i_partitioner: add partition_ranges_view 2018-09-03 10:31:44 +03:00
Botond Dénes
a011a9ebf2 mutation_reader: multishard_combining_reader: support custom dismantler
Add a dismantler functor parameter. When the multishard reader is
destroyed this functor will be called for each shard reader, passing a
future to a `stopped_foreign_reader`. This future becomes available when
the shard reader is stopped, that is, when it finished all in-progress
read-aheads and/or pending next partition calls.

The intended use case for the dismantler functor is a client that needs
to be notified when readers are destroyed and/or has to have access to
any unconsumed fragments from the foreign readers wrapping the shard
readers.
2018-09-03 10:31:44 +03:00
Botond Dénes
f13b878a94 mutation_reader: pass all standard reader params to remote_reader_factory
Extend `remote_reader_factory` interface so that it accepts all standard
mutation reader creation parameters. This allows factory lambdas to be
truly stateless, not having to capture any standard parameters that are
needed for creating the reader.
Standard parameters are those accepted by
`mutation_source::make_reader()`.
2018-09-03 10:31:44 +03:00
Botond Dénes
e67c6d9f39 flat_mutation_reader::impl: add protected buffer() member
To allow implementations to access the buffer in a read-only way.
2018-09-03 10:31:44 +03:00
Botond Dénes
8915293257 multishard_combining_reader: fix incorrect comment 2018-09-03 10:31:44 +03:00
Botond Dénes
75d60b0627 docs: add paged-queries.md design doc 2018-09-03 10:31:44 +03:00
Duarte Nunes
6593226849 Merge branch 'loading_cache: fix a consistency of size() and iterators APIs' from Vlad
"
After we fixed reloading flow it enabled situations when items are no longer cached but
still held in the underlying loading_shared_values object. Since loading_cache::size() returns
the size of its loading_shared_values object and loading_cache::begin()/end()/find() are returning
iterators based on loading_shared_values iterators these APIs may return very weird values, e.g.
size() may return the same value after one of the items has been removed using the remove(key) API.

This series fixes this by switching mentioned above APIs to work on top of lru_list object instead
of loading_shared_values.
"

* 'loading_cache_fix_api_semantics-v1' of https://github.com/vladzcloudius/scylla:
  loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values'
  loading_cache: make size() return the size of lru_list instead of loading_shared_values
2018-09-01 11:05:28 +01:00
Avi Kivity
fd8eae50db build: add relocatable package target
A relocatable package contains the Scylla (and iotune)
executables (in a bin/ directory), any libraries they may need (lib/)
the configuration file defaults (conf/) and supporting scripts (dist/).
The libraries are picked up from the host; including libc and the dynamic
linker (ld.so).

We also provide a thunk script that forces the library path
(LD_LIBRARY_PATH) to point at our libraries, and overrides the
interpreter to point at our ld.so.

With these files, it is possible to run a fully functional Scylla
instance on any Linux distribution. This is similar to chroot or
containers, except that we run in the same namespace as the host.

The packages are created by running

    ninja build/release/scylla-package.tar

or

    ninja --mode debug build/debug/scylla-package.tar
Message-Id: <20180828065352.30730-1-avi@scylladb.com>
2018-08-31 23:14:42 +01:00
Vlad Zolotarov
945d26e4ee loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values'
Reloading may hold value in the underlying loading_shared_values while
the corresponding cache values have already been deleted.

This may create weird situations like this:

<populate cache with 10 entries>
cache.remove(key1);
for (auto& e : cache) {
    std::cout << e << std::endl;
}

<all 10 entries are printed, including the one for "key1">

In order to avoid such situations we are going to make the loading_cache::iterator
to be a transform_iterator of lru_list::iterator instead of loading_shared_values::iterator
because lru_list contains entries only for cached items.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-08-30 20:56:44 -04:00
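The mismatch described in 945d26e4ee and 1e56c7dd58 can be reproduced in a toy model. The class below is a hypothetical simplification, not the real `loading_cache`: `_shared_values` stands in for `loading_shared_values` (an entry stays visible while anyone still holds a pointer to it) and `_lru` stands in for `lru_list` (entries exist only for cached items).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy model of the two containers behind a loading cache.
class toy_loading_cache {
    // Entries that are still referenced by someone, cached or not.
    std::map<std::string, std::weak_ptr<int>> _shared_values;
    // Entries that are actually cached; the list owns a strong reference.
    std::list<std::pair<std::string, std::shared_ptr<int>>> _lru;
public:
    // Returns a pointer that a caller (e.g. a reload in flight) may keep alive.
    std::shared_ptr<int> put(const std::string& key, int value) {
        auto v = std::make_shared<int>(value);
        _shared_values[key] = v;
        _lru.emplace_back(key, v);
        return v;
    }

    // Drops the item from the cache; outside holders keep the value alive.
    void remove(const std::string& key) {
        _lru.remove_if([&](const auto& e) { return e.first == key; });
    }

    // What size() effectively counted before the fix.
    std::size_t shared_size() const {
        std::size_t n = 0;
        for (const auto& [key, weak] : _shared_values) {
            if (!weak.expired()) {
                ++n;
            }
        }
        return n;
    }

    // What size() counts after the fix: cached items only.
    std::size_t size() const {
        return _lru.size();
    }
};
```

In this model, removing a key while a reload still holds its value leaves `shared_size()` unchanged, while `size()` drops immediately, which is the behavior the series switches to.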
Vlad Zolotarov
1e56c7dd58 loading_cache: make size() return the size of lru_list instead of loading_shared_values
reloading flow may hold the items in the underlying loading_shared_values
after they have been removed (e.g. via remove(key) API) thereby loading_shared_values.size()
doesn't represent the correct value for the loading_cache. lru_list.size() on the other hand - does.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-08-30 15:55:30 -04:00
Paweł Dziepak
dbbd664600 Update seastar submodule
* seastar 12f18ce...5712816 (6):
  > tests: add signal_test to test list
  > Merge "Enhancements for memory_output_stream" from Paweł
  > seastar-addr2line: don't print an empty line between backtrace lines
  > seastar-addr2line: add --verbose option
  > seastar-addr2line: make prefix matching non-greedy
  > future: make available() const
2018-08-30 11:41:27 +01:00
Glauber Costa
8dea1b3c61 database: fix directory for information when loading new SSTables from upload dir
When we load new SSTables, we use the directory information from the
entry descriptor to build information about those SSTables. When the
descriptor is created by flush_upload_dir, the sstable directory used in
the descriptor contains the `upload` part. Therefore, we will try to
load SSTables that are in the upload directory when we already moved
them out and fail.

Since the generation also changes, we have been historically fixing the
generation manually, but not the SSTable directory. The reason for that
is that up until recently, the SSTable directory was passed statically
to open_sstables, ignoring whatever the entry descriptor said. Now that
the sstable directory is also derived from the entry descriptor, we
should fix that too.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180829165326.12183-1-glauber@scylladb.com>
2018-08-30 10:34:25 +03:00
Nadav Har'El
2f02d006b3 materialized views: more tests
Additional tests for cases surrounding issue #3362, where base rows
disappear (or not) and view rows need to disappear (or not) as well.
These new tests focus on checking that view_updates::do_delete_old_entry()
is correct.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180829131914.16042-2-nyh@scylladb.com>
2018-08-29 14:33:48 +01:00
Nadav Har'El
16a6f76873 materialized views: simplify do_delete_old_entry()
In previous patches, we gave up on an old (and broken) attempt to track
the timestamps of many unselected base-table columns through one row marker
in the view table - and replaced them by "virtual cells", one per unselected
cell.

The do_delete_old_entry() function still contains old code which maintained
that row marker, and is no longer needed. That old code is not only no longer
needed, it also no longer did anything because all columns now appear in
the view (as virtual columns) so the code ignored them when calculating the
row marker.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180829131914.16042-1-nyh@scylladb.com>
2018-08-29 14:33:41 +01:00
Duarte Nunes
79d796e710 Merge 'Materialized Views: row liveness correction' from Nadav
"
When a view's partition key contains only columns from the base's partition
key (and not an additional one), the liveness - existence or disappearance -
of a view-table row is tied to the liveness of the base table row. And
that, in turn, depends not only on selected columns (base-table columns
SELECTed to also appear in the view) but also on unselected columns.

This means that we may need to keep a view row alive even without data,
just because some unselected column is alive in the base table. Before this
patch set we tried to build a single "row marker" in the view column which
tried to summarize the liveness information in all unselected columns.
But this proved unworkable, as explained in issue #3362 and as will be
demonstrated in unit tests at the end of this series.

Because we can't replace several unselected cells by one row marker, what
we do in this series is to add for each of the unselected cells a "virtual
cell" which contains the cell's liveness information (timestamp, deletion,
ttl) but not its value. For collections, we can't represent the entire
collection by one virtual cell, and rather need a collection of virtual
cells.

Fixes #3362
"

* 'virtual-cols-v3' of https://github.com/nyh/scylla:
  Materialized Views: test that virtual columns are not visible
  Materialized Views: unit test reproducing fixed issue #3362
  Materialized Views: no need for elaborate row marker calculations
  Materialized Views: add unselected columns as virtual columns
  Materialized Views: fill virtual columns
  Do not allow selecting a virtual column
  schema: persist "view virtual" columns to a separate system table
  schema: add "view virtual" flag to schema's column_definition
  Add "empty" type name to CQL parser, but only for internal parsing
2018-08-29 14:32:38 +01:00
Paweł Dziepak
6f1c3e6945 Merge "Convert more execution_stages to inherit scheduling_groups" from Avi
"
Previous work (71471bb322) converted the CQL layer to inheriting
execution stages, paving the way to multiple users sharing the front-end.

This patchset does the same thing to the back-end, converting more execution
stages to preserve the caller's scheduling_group. Since RPC now (8c993e0728)
assigns the correct scheduling group within the replica, we can extend that
work so a statement is executed with the same scheduling group all the way
to sstable parsing, even if we cross nodes in the process. This improves
performance isolation and paves the way to multi-user SLA guarantees.
"

* tag 'inherit-sched_group/v1' of https://github.com/avikivity/scylla:
  database: make database's mutation apply stage inherit its scheduling group from the caller
  database: make database::_mutation_query_stage inherit the scheduling group
  database: make database::_data_query_stage inheriting its caller's scheduling_group
  storage_proxy: make _mutate_stage inherit its caller's scheduling_group
2018-08-28 13:49:31 +01:00
Duarte Nunes
f6aadd8077 Merge 'utils::loading_cache: improve reload() robustness' from Vlad
"This series introduces a few improvements related to a reload flow.

From now on the callback may assume that the "key" parameter value
is kept alive till the end of its execution in the reloading flow.

It may also safely evict as many items from the cache as needed."

Fixes #3606

* 'loading_cache_improve_reload-v1' of https://github.com/vladzcloudius/scylla:
  utils::loading_cache: hold a shared_value_ptr to the value when we reload
  utils::loading_cache::on_timer(): remove not needed capture of "this"
  utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload
2018-08-28 10:52:20 +01:00
Piotr Sarna
aa2bfc0a71 tests: add multi-column pk test to INSERT JSON case
Refs #3687
Message-Id: <6ba1328549ed701691ca7cbdacc7d6fa72f2c3de.1534171422.git.sarna@scylladb.com>
2018-08-28 11:34:13 +03:00
Piotr Sarna
fa72422baa cql3: fix handling multi-column partition key in INSERT JSON
Multiple column partition keys were previously handled incorrectly,
now the implementation is based on from_exploded instead of
from_singular.

Fixes #3687
Message-Id: <09e0bdb0f1c18d49b9e67c21777d93ba1545a13c.1534171422.git.sarna@scylladb.com>
2018-08-28 11:34:11 +03:00
Avi Kivity
1fd9974b6b Merge "tests/loading_cache_test: Fix flakiness" from Duarte
"
Fix loading_cache_test flakiness by retrying assertions.

Tests: unit(loading_cache_test(debug, release))

Fixes #3723
"

* 'loading-cache-test-flake/v4' of https://github.com/duarten/scylla:
  tests/loading_cache_test: Unflake test_loading_cache_loading_reloading
  tests/loading_cache_test: Use eventually() instead of open-coding it
  tests/mutation_reader_test: Extract eventually_true() to eventually.hh
  tests/cql_test_env: Lift eventually() to its own header file
2018-08-28 09:35:09 +03:00
Takuya ASADA
4a5157857a dist/debian: support package renaming on build script
To automatically rename packages on enterprise release, added package name
prefix as a variable on build_deb.sh.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180828010445.11920-1-syuu@scylladb.com>
2018-08-28 09:25:07 +03:00
Avi Kivity
22396d57c2 Update seastar submodule
* seastar 9bb1611...12f18ce (17):
  > correctly configure I/O Scheduler for usage with the YAML file
  > Added support for user-defined signal handlers
  > Added reactor method to modify blocked_reactor_notify_ms
  > configure.py: Use the user-specified compiler for dialect detection
  > seastar-addr2line: clear current trace when omitting already seen trace
  > seastar-addr2line: fix redirecting output to a file
  > seastar-addr2line: don't require a space before the addresses
  > tests: Ensure test thread is always joined
  > README.md: Add cute badges
  > iotune: adjust num-io-queues recommendation
  > dns: add SRV record lookup
  > reactor: define max_aio_per_queue for C++14
  > reactor,alien: silence GCC warnings
  > core,json,net: silence GCC warnings
  > fstream: "using data_sink_impl::put" to silence gcc warning
  > Merge 'Ensure Seastar compiles in C++14 mode' from Jesse
  > Revert "foreign_ptr: allow waiting for the destruction of the managed ptr"
2018-08-28 09:10:14 +03:00
Tomasz Grabiec
75cde85349 Merge "Support reading range tombstones" from Piotr and Vladimir
Implement and test support for reading range tombstones in SSTables 3.

Does not yet support reads which are using slicing or fast forwarding.

From github.com/scylladb/seastar-dev.git haaawk/sstables3/tombstones_v11:

Piotr Jastrzebski (5):
  sstables: Add consumer_m::consume_range_tombstone
  sstables: Support null columns in ck
  sstables: Support reading range_tombstones
  sstables: Test reading range_tombstones
  sstables: Add test for RT with non-full key

Vladimir Krivopalov (2):
  sstables: Add operator<< overload for bound_kind_m.
  keys: Add clustering_key_prefix::make_full helper.
2018-08-27 20:43:38 +02:00
Duarte Nunes
40044c0460 tests/loading_cache_test: Unflake test_loading_cache_loading_reloading
The `loading_cache_test::test_loading_cache_loading_reloading` test
case is flaky, and fails in both debug and release mode. In an
over-provisioned environment, it's possible that when the reactor
runs, the timers for the `sleep()` and for reloading the
`loading_cache` are both expired, and continuations are scheduled with
an arbitrary order, causing the test to fail.

Fixes #3723

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-27 19:24:05 +01:00
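The "retrying assertions" approach used to fix this flakiness can be sketched as a small helper. This is a hypothetical, blocking illustration: the name `toy_eventually_true`, its signature, and the exponential backoff are assumptions, and a real Seastar test would use a reactor-friendly sleep rather than `std::this_thread::sleep_for`.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: re-evaluates `check` until it returns true or the
// attempts run out, doubling the delay between attempts.
template <typename Check>
bool toy_eventually_true(Check&& check, unsigned max_attempts = 10) {
    auto delay = std::chrono::milliseconds(1);
    for (unsigned attempt = 0; attempt < max_attempts; ++attempt) {
        if (check()) {
            return true;
        }
        std::this_thread::sleep_for(delay);
        delay *= 2;
    }
    return false;
}
```

A test would then assert on the helper's result instead of asserting on the condition once, so that the order in which expired timers fire no longer matters.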
Duarte Nunes
0cb03b966d tests/loading_cache_test: Use eventually() instead of open-coding it
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-27 19:24:05 +01:00
Duarte Nunes
b89fa0d67b tests/mutation_reader_test: Extract eventually_true() to eventually.hh
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-27 19:24:05 +01:00
Duarte Nunes
636c5ded6c tests/cql_test_env: Lift eventually() to its own header file
Retrying is needed everywhere.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-27 19:24:00 +01:00
Avi Kivity
5792a59c96 migration_manager: downgrade frightening "Can't send migration request" ERROR
This error is transient, since as soon as the node is up we will be able
to send the migration request.  Downgrade it to a warning to reduce anxiety
among people who actually read the logs (like QA).

The message is also badly worded as no one can guess what a migration
request is, but that is left to another patch.

Fixes #3706.
Message-Id: <20180821070200.18691-1-avi@scylladb.com>
2018-08-27 14:49:36 +02:00
Takuya ASADA
10b67c7934 dist/ami: package scylla-ami as rpm
Now that scylla-ami is no longer a submodule of the scylla repo, it works as
an independent repository, just like scylla-jmx and scylla-tools, and provides
an .rpm package to install the AMI scripts on the AMI.

Most files are gone from dist/ami/files, except scylla_install_ami, which was
copied from scylla-ami: since it has to install the scylla .rpms, it cannot be
packaged in the scylla-ami rpm.

In scylla_install_ami, we dropped the ixgbevf/ena driver code; we will
provide 'scylla-ixgbevf' and 'scylla-ena' DKMS .rpms instead.
They automatically build kernel modules for the current kernel.

A repo of the driver packages is on
https://copr.fedorainfracloud.org/coprs/scylladb/scylla-ami-drivers/

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180821201101.4631-1-syuu@scylladb.com>
2018-08-27 11:48:52 +03:00
Avi Kivity
62750eb517 Merge "Prepare for removing Iterator from simple_memory_input_stream" from Paweł
"
Right now, simple_memory_input_stream takes Iterator as a template
parameter. That iterator is supposed to point to fragments in an
underlying fragmented buffer. This makes no sense, since simple streams
deal only with contiguous buffers.

This series removes any assumption that simple_memory_input_stream has
an iterator_type member from Scylla so that it can be removed.
"

* tag 'prepare-simple-stream-no-iterator/v1' of https://github.com/pdziepak/scylla:
  idl: deserialized_bytes_proxy do not assume presence of iterator_type
  idl-compiler: specify return type of with_serialized_stream() lambdas
2018-08-26 16:29:06 +03:00
Avi Kivity
16478355be Merge "Refactor password handling" from Jesse
"
This series is a refactor of password management, motivated by a
combination of correctness bugs, improving testability, improving
clarity, and adding documentation.

Tests: unit (release)
"

* 'jhk/passwords_refactor/v2' of https://github.com/hakuch/scylla:
  auth: Clean up implementation comments
  auth: Remove unnecessary local variable
  auth: Allow different random engines for salt
  auth: Correct modulo bias in salt generation
  auth: Extract random byte generation for salt
  auth: Split out test for best supported scheme
  auth: Rename function to use full words
  auth: Add domain-specific exception for passwords
  auth: Document passwords interface
  auth: Move password stuff to its own namespace
  auth: Identify password hashing errors correctly
  auth: Add unit tests for password handling
  auth: Move password handling to its own files
  auth: Construct `std::random_device` instances once
2018-08-26 11:18:31 +03:00
Tomasz Grabiec
2afce13967 database: Avoid OOM when soft pressure but nothing to flush
There could be soft pressure, but soft-pressure flusher may not be
able to make progress (Refs #3716). It will keep trying to flush empty
memtables, which block on earlier flushes to complete, and thus
allocate continuations in memory. Those continuations accumulate in
memory and can cause OOM.

flush will take longer to complete. Due to scheduling group isolation,
the soft-pressure flusher will keep getting the CPU.

This causes bad_alloc and crashes of dtest:
limits_test.py:TestLimits.max_cells_test

Fixes #3717

Message-Id: <1535102520-23039-1-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:03:58 +03:00
Tomasz Grabiec
1e50f85288 database: Make soft-pressure memtable flusher not consider already flushed memtables
The flusher picks the memtable list which contains the largest region
according to region_impl::evictable_occupancy().total_space(), which
follows region::occupancy().total_space(). But only the latest
memtable in the list can start flushing. It can happen that the
memtable corresponding to the largest region was already flushed to an
sstable (flush permit released), but not yet fsynced or moved to
cache, so it's still in the memtable list.

The latest memtable in the winning list may be small, or empty, in
which case the soft pressure flusher will not be able to make much
progress. There could be other memtable lists with non-empty
(flushable) latest memtables. This can lead to writes unnecessarily
blocking on dirty.

I observed this for the system memtable group, where it's easy for the
memtables to overshoot small soft pressure limits. The flusher kept
trying to flush empty memtables, while the previous non-empty memtable
was still in the group.

The CPU scheduler makes this worse, because it runs memtable_to_cache
in a separate scheduling group, so it further defers in time the
removal of the flushed memtable from the memtable list.

This patch fixes the problem by making regions corresponding to
memtables which started flushing report evictable_occupancy() as 0, so
that they're picked by the flusher last.

Fixes #3716.
Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:02:34 +03:00
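The selection rule changed by 1e50f85288 can be sketched in a few lines. The `toy_region` type and `pick_for_flush()` function are hypothetical simplifications, not the real `logalloc` or flusher code; the sketch only models "a region whose memtable started flushing reports an evictable occupancy of 0".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a memtable's region.
struct toy_region {
    std::size_t total_space;
    bool flush_started;

    // Regions that already started flushing report 0, so they are picked last.
    std::size_t evictable_occupancy() const {
        return flush_started ? 0 : total_space;
    }
};

// Picks the region with the largest evictable occupancy, or nullptr if
// there are no regions at all.
inline const toy_region* pick_for_flush(const std::vector<toy_region>& regions) {
    auto it = std::max_element(regions.begin(), regions.end(),
        [](const toy_region& a, const toy_region& b) {
            return a.evictable_occupancy() < b.evictable_occupancy();
        });
    return it == regions.end() ? nullptr : &*it;
}
```

Under this rule a large region that is already being flushed no longer wins the selection, so the flusher moves on to a region it can actually make progress on.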
Tomasz Grabiec
364418b5c5 logalloc: Make evictable_occupancy() indicate no free space
Doesn't fix any bug, but it's closer to the truth that all segments
are used rather than none is used.

Message-Id: <1535040132-11153-1-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:02:32 +03:00
Avi Kivity
54ac334f4b Update scylla-ami submodule
* dist/ami/files/scylla-ami c7e5a70...b7db861 (2):
  > scylla-ami-setup.service: run only on first startup
  > Use fstab to mount RAID volume on every reboot
2018-08-26 10:57:32 +03:00
Takuya ASADA
ff55e3c247 dist/common/scripts/scylla_raid_setup: refuse start scylla-server.service when RAID volume is not mounted
Since a Linux system aborts booting when it fails to mount fstab entries,
the user may not be able to see an error message when we use fstab to mount
/var/lib/scylla on the AMI.

Instead of aborting the boot, we can just refuse to start scylla-server.service
when the RAID volume is not mounted, using the RequiresMountsFor directive of
the systemd unit file.

See #3640

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180824185511.17557-1-syuu@scylladb.com>
2018-08-26 10:55:34 +03:00
Avi Kivity
37f9a3c566 database: make database's mutation apply stage inherit its scheduling group from the caller
Like the two preceding patches, convert the mutation apply stage
to an inheriting_concrete_scheduling_group.  This change has two
added benefits: we get rid of a thread_local, and we drop a
with_scheduling_group() inside an execution stage which just creates a bunch
of continuations and somewhat undoes the benefit of the execution stage.
2018-08-24 19:04:49 +03:00
Avi Kivity
ebff1cfc37 database: make database::_mutation_query_stage inherit the scheduling group
Like the preceding patch and for the same reasons, adjust
database::_mutation_query_stage to inherit the scheduling group from its
caller.
2018-08-24 19:04:49 +03:00
Avi Kivity
596fb6f2f7 database: make database::_data_query_stage inheriting its caller's scheduling_group
Now (8c993e0728) that replica-side operations run under the correct
scheduling group, we can inherit the scheduling_group for _data_query_stage
from the caller.  By itself this doesn't do much, but it will later allow us
to have multiple groups for statement executions.
2018-08-24 19:04:49 +03:00
Avi Kivity
908e497f3d storage_proxy: make _mutate_stage inherit its caller's scheduling_group
Right now, storage_proxy's mutate_stage violates isolation by running
in a plain execution_stage without a scheduling_group. This means do_mutate()
will run under the main scheduling_group, at least until we reach the database
apply execution stage, which is correct.

Fix by moving to an inheriting execution stage; this works because the
messaging service will tell RPC to set the correct execution stage for us. We
could explicitly specify statement_scheduling_group, but inheriting the
scheduling group allows us to have multiple statement scheduling groups, later.
2018-08-24 19:04:49 +03:00
Paweł Dziepak
4ca991ea65 idl: deserialized_bytes_proxy do not assume presence of iterator_type
deserialized_bytes_proxy assumes that the provided input stream has
iterator_type that represents the iterator pointing to the next
fragment of the fragmented underlying buffer. This makes little sense
if the input stream is a contiguous one (i.e.
simple_memory_input_stream) so let's not make such assumptions.
2018-08-24 16:19:40 +01:00
Paweł Dziepak
3b7579aa0e idl-compiler: specify return type of with_serialized_stream() lambdas
IDL-generated code uses with_serialized_stream() to optimise for cases
when the underlying buffer is not fragmented. The provided lambda will
be called with either simple or fragmented stream as an argument. The
consequence of this is that both instantiations of generic lambda need to
return the same type. This is a problem if the type is deduced and
depends on the provided input stream (e.g. different type for fragmented
and simple streams). The solution is to explicitly specify the return
type as the type returned by deserialising general utils::input_stream.
This way each instantiation of lambda can return whatever it wants as
long as it is convertible to the type that the serialiser would return
if utils::input_stream was given.
2018-08-24 16:07:20 +01:00
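The return-type problem described in 3b7579aa0e can be shown with a minimal sketch. All names here (`simple_stream`, `fragmented_stream`, `with_stream`, `read_bytes`) are hypothetical stand-ins rather than the IDL-generated code; the sketch assumes only that one generic lambda is instantiated for two stream types whose deserialisers return different types.

```cpp
#include <cassert>
#include <string>
#include <string_view>

struct simple_stream {};      // hypothetical: contiguous buffer
struct fragmented_stream {};  // hypothetical: fragmented buffer

// Hypothetical dispatcher. Its return type is deduced, so both return
// statements must produce exactly the same type.
template <typename Func>
auto with_stream(bool fragmented, Func&& fn) {
    if (fragmented) {
        return fn(fragmented_stream{});
    }
    return fn(simple_stream{});
}

// The deserialisers return different types depending on the stream.
inline std::string_view read_bytes(simple_stream) { return "simple"; }
inline std::string read_bytes(fragmented_stream) { return "fragmented"; }

inline std::string read(bool fragmented) {
    // Without "-> std::string" the two instantiations of the lambda would
    // return std::string_view and std::string, and with_stream() would
    // fail to compile because of inconsistent return type deduction.
    return with_stream(fragmented, [](auto stream) -> std::string {
        return std::string(read_bytes(stream));
    });
}
```

Each instantiation is free to produce its own intermediate type as long as it converts to the declared return type, which mirrors the convertibility requirement stated in the commit message.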
Tomasz Grabiec
10f6b125c8 database: Run system table flushes in the main scheduling group
memtable flushes for system and regular region groups run under the
memtable_scheduling_group, but the controller adjusts shares based on
the occupancy of the regular region group.

It can happen that regular is not under pressure, but system is. In
this case the controller will incorrectly assign low shares to the
memtable flush of system. This may result in high latency and low
throughput for writes in the system group.

I observed writes to the system keyspace timing out (on scylla-2.3-rc2)
in the dtest: limits_test.py:TestLimits.max_cells_test, which went
away after this.

Fixes #3717.

Message-Id: <1535016026-28006-1-git-send-email-tgrabiec@scylladb.com>
2018-08-23 15:07:05 +03:00
Piotr Sarna
94262cf5d0 tests: add null collection test scenario to INSERT JSON
Refs #3664
Message-Id: <a34b9f5e8b9d7e3dd8906b559957220d74734b41.1534848313.git.sarna@scylladb.com>
2018-08-23 11:22:07 +03:00
Piotr Sarna
465045368f cql3: add proper setting of empty collections in INSERT JSON
Previously empty collections were incorrectly added as dead cells,
which resulted in serialization errors later.

Fixes #3664
Message-Id: <a9c90d66c6737641cafe40edb779df490ada0309.1534848313.git.sarna@scylladb.com>
2018-08-23 11:22:05 +03:00
Duarte Nunes
36a293bb23 cell_locking: Use xxhash instead of fnv1a
Being the single user of fnv1a, this allows us to get rid of it. As
the TODO inside fnv1a_hasher.hh indicates, and judging by any
independent benchmark, fnv1a is very slow. As we have added xx_hash
since then, and we know it to be fast, use it instead.

Tests: unit(release/cell_locker_test)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180823081715.26089-1-duarte@scylladb.com>
2018-08-23 11:21:00 +03:00
Piotr Jastrzebski
2997fda1b1 sstables: Add test for RT with non-full key
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 18:28:11 +02:00
Piotr Jastrzebski
c50929233f sstables: Test reading range_tombstones
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 18:28:11 +02:00
Piotr Jastrzebski
7434be348c sstables: Support reading range_tombstones
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 18:27:41 +02:00
Piotr Jastrzebski
d19a108d87 sstables: Support null columns in ck
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 14:32:10 +02:00
Piotr Jastrzebski
3636697663 sstables: Add consumer_m::consume_range_tombstone
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 12:53:15 +02:00
Vladimir Krivopalov
8acf4ddb8e keys: Add clustering_key_prefix::make_full helper.
This method fills non-full clustering key with trailing empty values to
make it full.
This can be used for clustering keys of rows in a compact table as,
unlike in regular tables, they can be non-full.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-22 12:13:23 +02:00
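The padding behavior described for `clustering_key_prefix::make_full` can be sketched on a toy key type. The `toy_key` alias and this free function are hypothetical; the real helper works on serialized key components and takes the column count from the schema.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key representation: one component per clustering column.
using toy_key = std::vector<std::string>;

// Pads a non-full key with trailing empty values until it has one
// component per clustering column; a full key is returned unchanged.
inline toy_key make_full(toy_key prefix, std::size_t clustering_columns) {
    if (prefix.size() < clustering_columns) {
        prefix.resize(clustering_columns);  // new components are empty strings
    }
    return prefix;
}
```

For a table with three clustering columns, a one-component prefix becomes a three-component key whose last two components are empty.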
Amnon Heiman
ab207356a5 API: storage_service stream endpoints
This patch changes how the list of tokens is returned from the
storage_service API.

Instead of creating a vector and constructing a JSON object from it, use the
HTTP streaming capabilities.

This is important for large clusters and prevents large allocations.

Fixes #3701

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180820195631.26792-1-amnon@scylladb.com>
2018-08-22 11:24:38 +03:00
Takuya ASADA
e4f38b7c22 dist/redhat: support package renaming on build script
To automatically rename packages on enterprise release, added package name
prefix as a variable on build_rpm.sh.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180822072105.9420-1-syuu@scylladb.com>
2018-08-22 11:03:39 +03:00
Piotr Sarna
4a274ee7e2 tests: add parsing varint from JSON string test
Refs #3666
Message-Id: <f4205e9484f5385796fade7986e3e38dcbc65bac.1534845398.git.sarna@scylladb.com>
2018-08-21 11:20:11 +01:00
Piotr Sarna
37a5c38471 types: enable deserializing varint from JSON string
Previously deserialization failed because the JSON string
representing a number was unnecessarily quoted.

Fixes #3666
Message-Id: <a0a100dbac7c151d627522174303657d1da05c27.1534845398.git.sarna@scylladb.com>
2018-08-21 11:20:11 +01:00
Tomasz Grabiec
6937cc2d1c Merge 'Fix multi-cell static list updates in the presence of ckeys' from Duarte
Fixes a regression introduced in
9e88b60ef5, which broke the lookup for
prefetched values of lists when a clustering key is specified.

This is the code that was removed from some list operations:

 std::experimental::optional<clustering_key> row_key;
 if (!column.is_static()) {
   row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
 }
 ...
 auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column);

Put it back, in the form of common code in the update_parameters class.

Fixes #3703

* https://github.com/duarten/scylla cql-list-fixes/v1:
  tests/cql_query_test: Test multi-cell static list updates with ckeys
  cql3/lists: Fix multi-cell static list updates in the presence of ckeys
  keys: Add factory for an empty clustering_key_prefix_view
2018-08-21 12:14:30 +02:00
Vladimir Krivopalov
c8422c9a91 sstables: Add operator<< overload for bound_kind_m.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-20 16:22:53 -07:00
Duarte Nunes
ff7304b190 tests/cql_query_test: Test multi-cell static list updates with ckeys
Refs #3703

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-20 21:39:37 +01:00
Duarte Nunes
05731cb5ad cql3/lists: Fix multi-cell static list updates in the presence of ckeys
This patch fixes a regression introduced in
9e88b60ef5, which broke the lookup for
prefetched values of lists when a clustering key is specified.

This is the code that was removed from some list operations:

std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
  row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
...
auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column);

Put it back, in the form of common code in the update_parameters class.

Fixes #3703

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-20 21:39:37 +01:00
Duarte Nunes
ce461b06d7 keys: Add factory for an empty clustering_key_prefix_view
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-20 21:39:37 +01:00
Avi Kivity
231174cda9 build: auto-detect g++ -gz support
Older combinations of g++/binutils don't support -gz, so auto-detect its
presence.

Fixes #3697.
Message-Id: <20180817161113.2287-1-avi@scylladb.com>
2018-08-20 18:48:18 +02:00
Tomasz Grabiec
c31dff8211 Merge 'Skip inside wide partitions using index (rows only)' from Vladimir
This patchset adds support for skipping inside wide partitions using
index for sliced queries. This can significantly reduce disk I/O for
queries that only need to read a small amount of data from a wide
partition.

Other changes include general code clean-up and simplification.

 * github.com/argenet/scylla.git tree/projects/sstables-30/skip_using_index/v6:
  sstables: Support resetting data_consume_rows_context_m to
    indexable_element::cell.
  tests: Add tests to cover skipping with index through SSTables 3.x.
  sstables: Support skipping inside wide partitions using index.
  to_string: Add operator<< overload for std::optional.
  sstables: Use std::optional instead of std::experimental::optional.
2018-08-20 18:39:51 +02:00
Avi Kivity
e605cd4ff8 multishard_writer_test: reduce mutation count in release mode
We see occasional bad_alloc failures in release mode; this is due
to the random mutation generator generating large mutations.

Reduce the mutation count to 300. I tested 100 runs and all passed,
so it reduces the false positive rate to < 1%.
2018-08-20 16:53:05 +03:00
Gleb Natapov
7277ee2939 storage_proxy: do not fail read without speculation on connection error
After ac27d1c93b, if a read executor has just enough targets to
achieve the request's CL and a connection to one of them is dropped
during execution, a ReadFailed error is returned immediately and the
client has no chance to issue a speculative read (retry). The
patch changes the code to not return the ReadFailed error immediately, but
to wait for the timeout instead, giving the client a chance to issue a
speculative read in case the read executor does not have additional
targets to send speculative reads to by itself.

Fixes #3699.
Message-Id: <20180819131646.GK2326@scylladb.com>
2018-08-20 10:12:31 +03:00
Vladimir Krivopalov
f1b9f82ff5 sstables: Use std::optional instead of std::experimental::optional.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 18:20:05 -07:00
Vladimir Krivopalov
7b1d4915a1 to_string: Add operator<< overload for std::optional.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 18:20:05 -07:00
Vladimir Krivopalov
3e92434eed sstables: Support skipping inside wide partitions using index.
This fix adds proper support for skipping inside wide partitions using
index for sliced reads. This significantly reduces disk I/O for filtered
queries.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 18:20:04 -07:00
Vladimir Krivopalov
ec78fb9f13 tests: Add tests to cover skipping with index through SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 18:19:22 -07:00
Vladimir Krivopalov
4bf1e9de3f sstables: Support resetting data_consume_rows_context_m to indexable_element::cell.
Set the proper parsing state when resetting to indexable_element::cell.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 10:09:19 -07:00
Eliran Sinvani
f5f6cf2096 cql3: remove rejection of an IN relation if not on last partition KEY
The constraint is no longer relevant, since Cassandra removed
it in version 2.2. In addition the mechanism for handling this
case is already implemented and is identical in case of
clustering keys with single column EQ,= and IN relations.
(Cartesian product of singular ranges).

A unit test for this test case was added.

Fixes #1735
Tests:
1. Unit Tests.
2. Manual testing with the case described in the issue.
3. dtest: ql_additional_tests.py:TestCQL.composite_row_key_test

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <83b43fdc1ca0e0cc287f66f11816fc71b8bd2925.1534430405.git.eliransin@scylladb.com>
2018-08-16 19:32:43 +01:00
Eliran Sinvani
d743ceae76 cql3: ignore LIMIT in select statement with aggregate
LIMIT should restrict the output result and not the query whose result
set is aggregated. When using an aggregate, the output is guaranteed to
be only one row long. Since LIMIT accepts only non-negative numbers,
it has no effect and can be ignored.

Fixes #2028
Tests: the test case described in the issue, unit tests.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <6c235376c81f052020e2ed23d0a3d071b36d4415.1534416997.git.eliransin@scylladb.com>
2018-08-16 19:31:56 +01:00
Nadav Har'El
8c604921ac Materialized Views: test that virtual columns are not visible
In the previous patches, we added "virtual columns" to materialized views
to solve row liveness issues (issue #3362). Here we add a test that confirms
that although these virtual columns exist in the view, they should not be
visible to the user. They cannot be explicitly SELECTed from the view table,
and a "SELECT *" will skip them.

Refs #3362.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:51:46 +03:00
Nadav Har'El
5ca974547a Materialized Views: unit test reproducing fixed issue #3362
This patch includes several tests reproducing issue #3362 - the effect
of unselected columns on view-table row liveness - and confirming
that it was fixed.

We found two example scenarios to demonstrate the bug. One scenario,
test_3362_with_ttls(), involves an unselected column with a TTL. The other,
test_3362_no_ttls() demonstrates the same bug without using TTL, and using
explicit updates and deletions instead. These two tests are heavily
commented, to explain what they test, and why.

In addition to these two basic tests, we also include similar tests
involving multiple items in a collection column, instead of multiple
separate columns, which demonstrate the same problem exists there (and
why, unfortunately, the "virtual columns" we add in that case need to
be collections too).

We also test that the virtual columns - and the problems they fix -
work not only on columns originally created with the view, but also
with unselected columns added later with ALTER TABLE on the base table.

Refs #3362.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:48:07 +03:00
Nadav Har'El
6c00341383 Materialized Views: no need for elaborate row marker calculations
Now that we have separate virtual cells to represent unselected columns
in a materialized view, we no longer need the elaborate row-marker liveness
calculations which aimed (but failed) to do the same thing. So that code
can be removed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:45:41 +03:00
Nadav Har'El
30f721afab Materialized Views: add unselected columns as virtual columns
When a view's partition key contains only columns from the base's partition
key (and not an additional one), the liveness (existence or disappearance)
of a view-table row is tied to the liveness of the base table row - and
that depends not only on selected columns (base-table columns SELECTed to
also appear in the view) but also on unselected columns.

This means that we may need to keep a view row alive even without data,
just because some unselected column is alive in the base table. Before this
patch we tried to build a single "row marker" in the view column which
summarizes the liveness information in all unselected columns, but this
proved unworkable, as explained in issue #3362 and as will be demonstrated
in unit tests in a later patch.

Because we can't replace several unselected cells by one row marker, what
we do in this patch is to add, for each unselected cell, a "virtual
cell" which contains the cell's liveness information (timestamp, deletion,
ttl) but not its value. For collections, we can't represent the entire
collection by one virtual cell, and rather need a collection of virtual
cells.

This patch just adds the virtual columns to the view schema. Code in
the previous patch, when it notices the virtual columns in the view's
schema, adds the appropriate content into these columns.

We may need to add virtual columns to a view when first created, but also
when an unselected column is added to the base table with "ALTER TABLE",
so both are supported in this patch.

Fixes #3362.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:42:22 +03:00
Nadav Har'El
782baa44ef Materialized Views: fill virtual columns
The add_cells_to_view() function usually adds selected cells from the base
table to the view mutation. For issue #3362, we sometimes want to also
add unselected cells as "virtual" cells -  truncated versions of the
base-table cells just without the values.

This patch contains the code to fill the virtual columns' data using the
regular columns from the base table.

This patch does not yet actually *add* any virtual columns to the schema,
so until that is done (in the next patch), this patch will not yet cause
any behavior change. This is important for bisectability.

Refs #3362.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:38:27 +03:00
Nadav Har'El
3f3a76aa8f Do not allow selecting a virtual column
For issue #3362, we will need to add to a materialized view also unselected
base-table columns as "virtual columns". We need these columns to exist
to keep view rows alive, but we don't want the user to be able to see
them.

In this patch we prevent SELECTing the virtual columns of the view,
and also exclude the virtual columns from a "SELECT *" on a view.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:34:22 +03:00
Nadav Har'El
36a657fc10 schema: persist "view virtual" columns to a separate system table
In the previous patch, we added a "view virtual" flag on columns. In this
patch we add persistence to this flag: I.e., writing it to the on-disk
schema table and reading it back on startup. But the implementation is
not as simple as adding a flag:

In the on-disk system tables, we have a "columns" table listing all the
columns in the database and their types. Cqlsh's "DESCRIBE MATERIALIZED
VIEW" works by reading this "columns" table, and listing all of the
requested view's columns. Therefore, we cannot add "virtual columns" -
which are columns not added by the user and not intended to be seen -
to this list.

We therefore need to create in this patch a separate list for virtual
columns, in a new table "view_virtual_columns". This table is essentially
identical to the existing "columns" table, just separate. We need to write
each column to the appropriate table (columns with the view_virtual flag to
"view_virtual_columns", columns without it to the old "columns"), read
from both on startup, and remember to delete columns from both when a table
is dropped.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:30:06 +03:00
Nadav Har'El
0a1d93138d schema: add "view virtual" flag to schema's column_definition
In this patch we add a flag, "view virtual", that we can set on a
column defined in a schema. In following patches, we will add such virtual
columns to materialized views to allow view rows to remain alive despite
having no data (refs #3362).

After this patch, the "view virtual" flag exists in our in-memory
representation of the schema, but is not persisted to disk - we will
fix this in the next patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:23:09 +03:00
Nadav Har'El
b4fc711903 Add "empty" type name to CQL parser, but only for internal parsing
Even before this patch, Scylla supported the "empty" type (a column with
no content) but only internally - i.e., in code but not in CQL syntax.
The "empty" type was used in dense tables without regular columns, and a
special optimization in db::cql_type_parser::parse() allowed this type
name to be parsed when reading the schema tables, without allowing the
"empty" type to be used by users in CQL statements.

However, parse() only supported "empty" itself, and more complex types
like list<empty> were not recognized by parse(). In the following patches,
we plan to add virtual columns to materialized views, with types empty,
list<empty> or map<something, empty>. We need all these types to work, and
before this patch, they don't. But we want all of these types to only work
internally - when Scylla's code creates these hidden columns; we do not
want to add the "empty" type to CQL's syntax.

This is what we do in this patch: The CQL parser's comparator_type rule
now has a parameter, "internal", used to differentiate internal calls
via db::cql_type_parser::parse() from calls from CQL query parsing.
If a user tries something like:

    CREATE TABLE e (pk empty PRIMARY KEY);

He will get the error:

    Invalid (reserved) user type name empty

Note that here, as usual, unknown types are treated as "user types",
and "empty" is not allowed as a user type name - we "reserve" it in case
one day in the future we will want to allow users a direct syntax to
create empty columns. We already have, following Cassandra, a bunch of
other names reserved from being user type names, including "byte",
"complex", and others (see _reserved_type_names()), and using "empty"
as a type name will result in a similar error message.

Just like all other type names, the name "empty" is not a reserved
keyword in other senses: a user can create a table or a column with
the name "empty", just like he can create one with the name "int".

Refs #3362.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-08-16 15:12:27 +03:00
Duarte Nunes
a4355fe7e7 cql3/query_options: Use _value_views in prepare()
_value_views is the authoritative data structure for the
client-specified values. Indeed, the ctor called
transport::request::read_options() leaves _values completely empty.

In query_options::prepare() we were, however, using _values to
associate values with the client-specified column names, and not
_value_views. Fix this by using _value_views instead.

As for the reasons we didn't see this bug earlier, I assume it's
because very few drivers set the 0x40 (names for values) query options
flag, so column names are usually omitted. This is the right thing to do
since most drivers have enough information to correctly position the values.

Fixes #3688

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814234605.14775-1-duarte@scylladb.com>
2018-08-15 10:38:09 +01:00
Duarte Nunes
8751a58a2b cql3/query_options: Preserve unset values when building value_views
A raw value can be in one of three states: a valid value, an unset
value, a null value. When translating raw_values to their views, we
were treating both unset and null values as null raw_value_views.
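
A minimal sketch of the three-state distinction, assuming simplified
stand-in types: value_state, raw_value_sketch, raw_value_view_sketch and
make_view are hypothetical names, not the actual cql3 classes. It only
illustrates that the view should carry the state over unchanged rather
than folding unset into null:

```cpp
#include <cassert>
#include <string>

// The three states a bound value can be in, as described above.
enum class value_state { value, null, unset };

// Hypothetical owning value: bytes are meaningful only for state == value.
struct raw_value_sketch {
    value_state state;
    std::string bytes;
};

// Hypothetical non-owning view over a raw_value_sketch.
struct raw_value_view_sketch {
    value_state state;
    const std::string* bytes; // non-null only when state == value
};

inline raw_value_view_sketch make_view(const raw_value_sketch& v) {
    // Invariant assumed by this sketch: null and unset carry no bytes.
    assert(v.state == value_state::value || v.bytes.empty());
    // Preserve the state as-is; unset must not collapse into null.
    return raw_value_view_sketch{
        v.state, v.state == value_state::value ? &v.bytes : nullptr};
}
```

With this shape, code that consumes the view can still tell "leave the
column untouched" (unset) apart from "write a null" (null).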

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814231051.14385-1-duarte@scylladb.com>
2018-08-15 10:37:29 +01:00
Duarte Nunes
805ce6e019 cql3/query_processor: Validate presence of statement values timeously
We need to validate before calling query_options::prepare() whether
the set of prepared statement values sent in the query matches the
amount of names we need to bind, otherwise we risk an out-of-bounds
access if the client also specified names together with the values.

Refs #3688

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814225607.14215-1-duarte@scylladb.com>
2018-08-15 10:37:13 +01:00
Eliran Sinvani
d734d316a6 cql3: ensure repeated values in IN clauses don't return repeated rows
When the list of values in the IN list of a single column contains
duplicates, multiple executors are activated since the assumption
is that each value in the IN list corresponds to a different partition.
This results in the same row appearing in the result as many times
as the partition value is duplicated.
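
To illustrate the idea of collapsing repeated IN values before one
executor per value is started, here is a minimal sketch. The helper name
unique_in_values and the use of int keys are assumptions for
illustration; this log does not show how the actual patch removes the
duplicates:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the IN-list values sorted and without
// duplicates, so each distinct partition value is queried only once.
inline std::vector<int> unique_in_values(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    // Postcondition: no two adjacent (hence no two) elements are equal.
    assert(std::adjacent_find(values.begin(), values.end()) == values.end());
    return values;
}
```

Sorting changes the order of the values; whether that is acceptable
depends on the ordering guarantees of the query, which this sketch does
not model.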

Added queries to the IN restriction unit test and fixed a bad result check.

Fixes #2837
Tests: Queries as in the use case from the GitHub issue in both forms,
prepared and plain (using the Python driver); unit tests.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <ad88b7218fa55466be7bc4303dc50326a3d59733.1534322238.git.eliransin@scylladb.com>
2018-08-15 10:21:22 +01:00
Duarte Nunes
a025bf6a7d Merge seastar upstream
Seastar introduced a "compat" namespace, which conflicts with Scylla's
own "compat" namespaces. The merge thus includes changes to scope
uses of Scylla's "compat" namespaces.

* seastar 8ad870f...9bb1611  (5):
  > util/variant_utils: Ensure variant_cast behaves well with rvalues
  > util/std-compat: Fix infinite recursion
  > doc/tutorial: Undo namespace changes
  > util/variant_utils: Add cast_variant()
  > Add compatbility with C++17's library types

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-14 13:07:09 +01:00
Duarte Nunes
25a0a0f83d tests/cql_test_env: Increase eventually() attempts
The current value has proved to be insufficient for our CI
infrastructure.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814112201.8595-1-duarte@scylladb.com>
2018-08-14 12:37:32 +01:00
Duarte Nunes
495a92c5b6 tests/gossip_test: Use RAII for orderly destruction
Change the test so that services are correctly torn down, in the
correct order (e.g., storage_service accesses the messaging_service when
stopping).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814112111.8521-2-duarte@scylladb.com>
2018-08-14 12:27:14 +01:00
Duarte Nunes
3956a77235 tests/gossip_test: Don't bind address to avoid conflicts
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814112111.8521-1-duarte@scylladb.com>
2018-08-14 12:27:02 +01:00
Piotr Sarna
310d0a74b9 cql3: throw proper request exception for INSERT JSON
JSON code is amended in order to return proper
"Missing mandatory PRIMARY KEY part" message instead of generic
"Attempt to access value of a disengaged optional object".

Fixes #3665
Message-Id: <69157d659d51ce5a2d408614ce3ba7bf8e3a5d88.1534161127.git.sarna@scylladb.com>
2018-08-13 23:57:37 +01:00
Piotr Sarna
b73669c329 tests: add parsing numeric values from string
Numeric values (ints, doubles) should accept string representation
when passed in INSERT JSON statement.

Refs #3666
Message-Id: <586fea8fd08fe01f7a133f82f517e26d08d7cb76.1534153955.git.sarna@scylladb.com>
2018-08-13 23:57:37 +01:00
Piotr Sarna
b3f438bfec types: enable parsing numeric JSON values from string
In order to be Cassandra-compatible, JSON values passed in INSERT JSON
statement should accept string parameters for numeric types - int,
double, etc.

Fixes #3666
Message-Id: <4da9a2f68de31492a2e9432493663a62b138c2f2.1534153955.git.sarna@scylladb.com>
2018-08-13 23:57:37 +01:00
Duarte Nunes
5de02ab98c tracing: Pass string_view instead of string to add_query
This resulted in superfluous copies.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180812085326.6260-1-duarte@scylladb.com>
2018-08-13 23:57:37 +01:00
Jesse Haber-Kucharsky
b95bbb2e72 auth: Clean up implementation comments
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
9519a03351 auth: Remove unnecessary local variable
The variable could be declared `const`, but removing it outright seems
more clear and this way we don't have to come up with a name.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
52d3ff057a auth: Allow different random engines for salt
This makes the function useable in more contexts due to
flexibility (including in tests), since the state is not captured and
the characteristics of salt generation can be customized to the caller's
needs.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
836fd954e1 auth: Correct modulo bias in salt generation
Instead of reducing the large value via `%`, which can produce
non-uniformly distributed values when the range is small, we specify the
range in the distribution, which is uniform by construction.
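
A minimal sketch of the approach described above, assuming a
hypothetical 64-character alphabet and helper name (random_salt_chars);
the real salt alphabet and function signatures are not shown in this
log. Taking the engine as a template parameter is likewise an
assumption. The point is only that the bounds are given to
std::uniform_int_distribution instead of applying `%` to the raw engine
output:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical alphabet, chosen only for illustration.
constexpr char salt_alphabet[] =
    "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Draws `length` characters from salt_alphabet using any uniform random
// bit generator supplied by the caller.
template <typename RandomEngine>
std::string random_salt_chars(RandomEngine& engine, std::size_t length) {
    constexpr std::size_t alphabet_size = sizeof(salt_alphabet) - 1; // drop '\0'
    // The distribution yields each index in [0, alphabet_size - 1] with
    // equal probability, which `engine() % alphabet_size` does not
    // guarantee in general.
    std::uniform_int_distribution<std::size_t> pick(0, alphabet_size - 1);
    std::string out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        out.push_back(salt_alphabet[pick(engine)]);
    }
    assert(out.size() == length);
    return out;
}
```

A caller could pass a seeded std::mt19937 in tests for reproducible
output and a different engine in production.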
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
fe58a0b207 auth: Extract random byte generation for salt
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
fd60d61ebf auth: Split out test for best supported scheme
The `generate_salt` function invokes this function internally now.

This change means that `generate_salt` is now thread-safe and therefore
does not have to be invoked by a single thread only when starting the
`password_authenticator`.

This further means that `generate_salt` does not need to be part of the
public interface of the module, and can be moved to the implementation
file.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
adf058bd1f auth: Rename function to use full words
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
9b8cbb8542 auth: Add domain-specific exception for passwords
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
dbea3f5a01 auth: Document passwords interface
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
b272d622f8 auth: Move password stuff to its own namespace
For clarity and nicer function names.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
de01aaf181 auth: Identify password hashing errors correctly
See fce10f2c6e for reference.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
c10fcbf7a5 auth: Add unit tests for password handling
This will mean we can make changes more confidently.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
2a40bcb281 auth: Move password handling to its own files
While the `password_authenticator` is a complex component with lots of
dependencies, password hashing and checking itself is a process with
limited logical state and dependencies, which makes it easy to isolate
and test.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
03cf57db62 auth: Construct std::random_device instances once
`std::random_device` has a lot of implementation-specific behavior, and
as a result we cannot assume much about its performance characteristics.

We initialize thread-specific static instances of `std::random_device`
once so that we don't have the overhead of invoking the ctor during
every invocation of `gensalt`.
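
A minimal sketch of a thread-specific static instance, assuming
hypothetical helper names (thread_random_device, random_below); the
actual code in the auth module is not shown in this log:

```cpp
#include <cassert>
#include <random>

// Returns this thread's std::random_device. The object is constructed
// the first time the calling thread reaches this line and reused after.
inline std::random_device& thread_random_device() {
    static thread_local std::random_device device;
    return device;
}

// Hypothetical helper: a uniform value in [0, bound) drawn from the
// per-thread device, without constructing a new device per call.
inline unsigned random_below(unsigned bound) {
    assert(bound > 0);
    std::uniform_int_distribution<unsigned> dist(0, bound - 1);
    return dist(thread_random_device());
}
```

Because each thread owns its instance, no locking is needed around the
device in this sketch.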
2018-08-13 13:24:45 -04:00
Duarte Nunes
f86811a3c9 Merge seastar upstream
* seastar d40faff...8ad870f (9):
  > reactor: switch indentation
  > properly configure I/O Scheduler when --max-io-requests is passed
  > IOTune: tell users that the evaluation will take a while
  > exceptions: fix compilation with static libstdc++
  > apps/iotune: print out which config file updated
  > foreign_ptr: allow waiting for the destruction of the managed ptr
  > Merge "Improve UX for backtraces read from stdin" from Botond

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-12 14:01:36 +01:00
Avi Kivity
183d5ba178 build: compress debug sections
Compressing debug section reduces build size by 30% with no
significant increase in build time.

Results on a 4-core system (ninja release, size in MB):

before:

18056	build

real	59m43.138s
user	229m3.180s
sys	6m49.460s

after:

12387	build

real	60m30.112s
user	232m8.962s
sys	6m49.364s

Presumably, the difference in debug mode is even greater.
Message-Id: <20180811180444.30578-1-avi@scylladb.com>
2018-08-11 19:41:55 +01:00
Takuya ASADA
2ef1b094d7 dist/common/scripts/scylla_setup: don't proceed RAID setup until user type 'done'
Need to wait for user confirmation before running RAID setup.

See #3659
Fixes #3681

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180810194507.1115-1-syuu@scylladb.com>
2018-08-11 18:48:05 +03:00
Takuya ASADA
b7cf3d7472 dist/common/scripts/scylla_setup: don't mention about interactive mode prompt when running on non-interactive mode
Skip showing the message in non-interactive mode.

Fixes #3674

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180810191945.32693-1-syuu@scylladb.com>
2018-08-11 18:48:03 +03:00
Takuya ASADA
ef9475dd3c dist/common/scripts/scylla_setup: check existence of housekeeping.cfg before asking to run version check
Skip asking to run the version check when housekeeping.cfg already
exists.
Fixes #3657

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180807232313.15525-1-syuu@scylladb.com>
2018-08-11 18:48:02 +03:00
Takuya ASADA
f30b701872 dist/debian: fix install scylla-server.service
In the previous commit we moved debian/scylla-server.service to
debian/scylla-server.scylla-server.service to explicitly specify the
subpackage name, but that doesn't work for dh_installinit without the
'--name' option.

As a result, the current scylla-server .deb package is missing
scylla-server.service, so we need to rename the service back to its
original file name.

Fixes #3675

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180810221944.24837-1-syuu@scylladb.com>
2018-08-11 15:07:37 +03:00
Duarte Nunes
1521dc56ae Merge 'Pass query options to restrictions filter' from Piotr
"
This miniseries fixes ALLOW FILTERING support for prepared statements
by passing correct query options to the filter instead of empty ones.
"

* 'pass_query_options_to_restrictions_filter' of https://github.com/psarna/scylla:
  tests: add testing prepared statements with ALLOW FILTERING
  cql3: pass query options to restrictions filter
2018-08-09 18:15:18 +01:00
Duarte Nunes
95677877c2 Merge 'JSON support fixes' from Piotr
"
This series addresses SELECT/INSERT JSON support issues, namely
handling null values properly and parsing decimals from strings.
It also comes with updated cql tests.

Tests: unit (release)
"

* 'json_fixes_3' of https://github.com/psarna/scylla:
  cql3: remove superfluous null conversions in to_json_string
  tests: update JSON cql tests
  cql3: enable parsing decimal JSON values from string
  cql3: add missing return for dead cells
  cql3: simplify parsing optional JSON values
  cql3: add handling null value in to_json
  cql3: provide to_json_string for optional bytes argument
2018-08-09 18:05:34 +01:00
Piotr Sarna
9ba218c161 cql3: remove superfluous null conversions in to_json_string
Some types checked whether the passed bytes argument was empty and, if so,
returned "null" as a JSON string. Now, with to_json_string(bytes_opt),
that is no longer needed. Also, some types returned "null" instead
of signaling a deserialization error.
2018-08-09 18:07:12 +02:00
Piotr Sarna
fc187fa31e tests: update JSON cql tests
Tests are updated to check for recently fixed issues, i.e.
 * proper handling of null values
 * parsing decimal values from string

Refs #3664
Refs #3666
Refs #3667
2018-08-09 18:07:12 +02:00
Piotr Sarna
957cc712b6 cql3: enable parsing decimal JSON values from string
In order to be Cassandra-compatible, decimal type should be parsable
from both numeric values and strings.

Fixes #3666
2018-08-09 18:07:12 +02:00
Piotr Sarna
f962b85fa3 cql3: add missing return for dead cells
Fixes #3664
2018-08-09 18:07:12 +02:00
Piotr Sarna
cdbeed4e3b cql3: simplify parsing optional JSON values
With new to_json_string implementation that accepts bytes_opt,
parsing optional values can be simplified to remove explicit
branching.
2018-08-09 18:07:12 +02:00
Piotr Sarna
e4396e17cb cql3: add handling null value in to_json
Previously to_json function would fail with null passed as a parameter.

Fixes #3667
2018-08-09 18:07:12 +02:00
Piotr Sarna
52052b53a8 cql3: provide to_json_string for optional bytes argument
In order to handle optional arguments in a neat way, a wrapper
for to_json_string is provided.
2018-08-09 18:07:07 +02:00
Piotr Sarna
4a9014675f tests: add testing prepared statements with ALLOW FILTERING
ALLOW FILTERING support for prepared statements was buggy,
so a test case for prepared statements is added to cql test suite.
2018-08-09 18:06:09 +02:00
Piotr Sarna
8c18aaa511 cql3: pass query options to restrictions filter
Query options may contain bound values needed for checking filtering
restrictions. Previously, empty query_options{} were used, which
caused prepared statements to fail.

Fixes #3677
2018-08-09 17:44:45 +02:00
Eliran Sinvani
3f2bb07599 cql3: Count unpaged select queries
If the counter goes up this can be a possible reason for slowdown in
queries (since it means that potentially a large amount of data will
be sent to the client at once).

Fixes #2478
Tests: cqlsh with PAGING OFF and ON and validating with a print.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <01253cee0b8c1110aaee3da41d1f434ca798b430.1533817568.git.eliransin@scylladb.com>
2018-08-09 13:53:44 +01:00
Tomasz Grabiec
024b3c9fd9 mutation_partition: Fix exception safety of row::apply_monotonically()
When emplace_back() fails, value is already moved-from into a
temporary, which breaks monotonicity expected from
apply_monotonically(). As a result, writes to that cell will be lost.

The fix is to avoid the temporary by in-place construction of
cell_and_hash. To do that, appropriate cell_and_hash constructor was
added.

Found by mutation_test.cc::test_apply_monotonically_is_monotonic with
some modifications to the random mutation generator.

Introduced in 99a3e3a.

Fixes #3678.

Message-Id: <1533816965-27328-1-git-send-email-tgrabiec@scylladb.com>
2018-08-09 15:29:10 +03:00
Tomasz Grabiec
fd543603dd tests: random_mutation_generator: Use collection_member::yes for collection cells
Caused assert failure when collection cells were so large as to
require fragmentation. Currently collection cells are not fragmented,
and deserialization asserts that.

Message-Id: <1533817077-27583-1-git-send-email-tgrabiec@scylladb.com>
2018-08-09 15:27:20 +03:00
Vladimir Krivopalov
55d2fdee9a clustering_key_filter_ranges: Fix move assignment to avoid undefined behaviour.
Get rid of the new(this) trick that results in undefined behaviour
because the class contains a const reference member.

Use std::reference_wrapper instead to ease the transition.
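
A minimal sketch of why std::reference_wrapper helps here, using a
hypothetical stand-in class (range_holder) rather than the real
clustering_key_filter_ranges: a const reference member cannot be
rebound, so assignment needs tricks, whereas a reference_wrapper member
is assignable and the defaulted move assignment simply rebinds it.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class that used to hold a
// `const std::vector<int>&` member.
class range_holder {
    std::reference_wrapper<const std::vector<int>> _ranges;
public:
    explicit range_holder(const std::vector<int>& ranges) : _ranges(ranges) {}
    range_holder(range_holder&&) = default;
    // Well-defined: assigning a reference_wrapper rebinds it to the
    // other object's target; no placement new over *this is needed.
    range_holder& operator=(range_holder&&) = default;
    const std::vector<int>& ranges() const { return _ranges.get(); }
};

// Usage sketch: after move assignment, x refers to the vector y was bound to.
inline void rebinding_demo() {
    std::vector<int> a{1, 2};
    std::vector<int> b{3};
    range_holder x(a);
    range_holder y(b);
    x = std::move(y);
    assert(&x.ranges() == &b);
}
```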

Refs #3032.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <5642bf79659231627dd7f8693c17cb46f274bcda.1533765105.git.vladimir@scylladb.com>
2018-08-09 00:53:17 +01:00
Takuya ASADA
ad7bc313f7 dist/common/scripts: pass format variables to colorprint()
When we use str.format() to pass variables to the message, it always
raises an exception like "KeyError: 'red'", since the message contains color
variables but they are not passed to str.format().
To avoid the error we need to pass all format variables to colorprint()
and run str.format() inside the function.

Fixes #3649

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180803015216.14328-1-syuu@scylladb.com>
2018-08-08 18:37:50 +03:00
Avi Kivity
d6b0c4dda4 config: default murmur3_ignore_msb_bits to 12 even if not specified in scylla.yaml
When murmur3_ignore_msb_bits was introduced in 1.7, we set its default to zero
(to avoid resharding on upgrade) and set it to 12 in the scylla.yaml template
(to make sure we get the right value for new clusters).

Now, however, things have changed:
 - clusters installed before 1.7 are a small minority
 - they should have resharded long ago
 - resharding is much better these days
 - we have more migrations from Cassandra compared to old clusters

To allow clusters that migrated using their cassandra.yaml, and to clean up
the default scylla.yaml, make the default 12.

Users upgrading from pre-1.7 clusters will need to update their scylla.yaml,
or to reshard (which is a good idea anyway).

Fixes #3670.
Message-Id: <20180808063003.26046-1-avi@scylladb.com>
2018-08-08 13:46:06 +02:00
Asias He
d47d46e1a8 streaming: Use streaming_write_priority for the sstable writer
Use the streaming io priority otherwise it uses the default io priority.

Message-Id: <e1836a9a93e7204d4bc9bba9c841d57c8b24aff8.1533715438.git.asias@scylladb.com>
2018-08-08 11:08:06 +03:00
Takuya ASADA
15825d8bf1 dist/common/scripts/scylla_setup: print message when EC2 instance is optimized for Scylla
Currently scylla_ec2_check exits silently when the EC2 instance is optimized
for Scylla, so the result of the check is not clear; we need to output a
message.

Note that this change affects the AMI login prompt too.

Fixes #3655

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180808024256.9601-1-syuu@scylladb.com>
2018-08-08 10:17:52 +03:00
Takuya ASADA
652eb5ae0e dist/common/scripts/scylla_setup: fix typo on interactive setup
Scylls -> Scylla

Fixes #3656

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180808002443.1374-1-syuu@scylladb.com>
2018-08-08 09:15:13 +03:00
Vladimir Krivopalov
7f77087caa tests: Add tests performing compaction on SSTables 3.x.
These tests check the correctness of resulting compacted SSTables based
on the files produced by compacting input files with Cassandra.

Note that output files are not identical to those generated by Cassandra
because Scylla compaction does not yet optimise delta-encoded values
using serialization header.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <3fa05ce72352292d1026ce80ac87552889d10d96.1533667535.git.vladimir@scylladb.com>
2018-08-08 08:50:41 +03:00
Rafi Einstein
c7f41c988f Add a counter to count large partition warning in compaction
Fixes #3562

Tests: dtest(compaction_test.py)
Message-Id: <20180807190324.82014-1-rafie@scylladb.com>
2018-08-07 20:15:09 +01:00
Avi Kivity
c9caaa8e6e docker: adjust for script conversion to Python
Since our scripts were converted to Python, we can no longer
source them from a shell. Execute them directly instead. Also,
we now need to import configuration variables ourselves, since
scylla_prepare, being an independent process, won't do it for
us.

Fixes #3647
Message-Id: <20180802153017.11112-1-avi@scylladb.com>
2018-08-07 15:34:03 +01:00
Takuya ASADA
a300926495 dist/common/scripts/scylla_setup: use specified NIC ifname correctly
The interactive NIC selection prompt mistakenly always returns 'eth0' as the
selected NIC name. Fix it to return the NIC that was actually specified.

Fixes #3651

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180803020724.15155-1-syuu@scylladb.com>
2018-08-06 20:59:19 +03:00
Amnon Heiman
80b1ef0f47 storage_service: Add nodes_status related metrics
This patch adds a metric for a node's own operation mode. The
operation_mode metric represents the enum modes as gauge values according
to: UNKNOWN = 0, STARTING = 1, JOINING = 2, NORMAL = 3, LEAVING = 4,
DECOMMISSIONED = 5, DRAINING = 6, DRAINED = 7, MOVING = 8
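
The mapping above could be sketched as follows. This is only an illustration: the enum and function names are assumptions, and only the numeric values come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the operation modes listed above. The numeric
// values follow the commit message; the type name is an assumption.
enum class operation_mode : std::uint8_t {
    unknown = 0, starting = 1, joining = 2, normal = 3, leaving = 4,
    decommissioned = 5, draining = 6, drained = 7, moving = 8,
};

// Convert a mode to the value a gauge metric would report.
constexpr double to_gauge_value(operation_mode m) {
    return static_cast<double>(static_cast<std::uint8_t>(m));
}
```

A dashboard reading the gauge would then see 3 for a node in NORMAL mode.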

Fixes: #3482

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180806142706.23579-1-amnon@scylladb.com>
2018-08-06 18:19:56 +03:00
Tomasz Grabiec
88053b3bc9 tests: sstables: Replace sleep with accurate synchronization
Message-Id: <1533545829-31109-1-git-send-email-tgrabiec@scylladb.com>
2018-08-06 10:09:39 +01:00
Avi Kivity
13b729bf71 Merge "tracing: store request and response sizes" from Vlad
"
Store sizes of the request and the response for each traced query.

In the example below I traced the cassandra-stress write workload with a default schema using the probabilistic tracing.

Here is an entry created for one of queries:

cassandra@cqlsh> SELECT parameters FROM system_traces.sessions where session_id=30c3a8ea-96bb-11e8-8a97-000000000000;

 parameters
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {'consistency_level': 'LOCAL_ONE', 'page_size': '5000', 'param[0]': 'f749eb03d6a995d8b3496075da8f20aa9228c5db12401e8a37000fa5baa13531...', 'param[1]': '845809b53a9aff7eef8f85308eaef79e03c696653ca23957f1ed5d539dc00463...', 'param[2]': 'd303585def93a5d40e41ceb12880ad3ede3d9f6308a1b1c5e42e911a191f1de1...', 'param[3]': 'be77c7da059d4b52687cd9b3eaa7d04cdfe7b5e38e84a8eea318299a01c7845f...', 'param[4]': '32faaaea1b3d73d9d628a4945b69a8531740348d49ee30c03f697dd2d63e8dee...', 'param[5]': '50503850374d34323330', 'query': 'UPDATE "standard1" SET "C0" = ?,"C1" = ?,"C2" = ?,"C3" = ?,"C4" = ? WHERE KEY=?', 'serial_consistency_level': 'SERIAL'}

(1 rows)
cassandra@cqlsh> SELECT request_size,response_size FROM system_traces.sessions where session_id=30c3a8ea-96bb-11e8-8a97-000000000000;

 request_size | response_size
--------------+---------------
          239 |             4

(1 rows)

Now let's try to read the same keyspace1.standard1 entry (based on the "key" value in "param[5]") from cqlsh and trace it using TRACING ON.

cassandra@cqlsh> TRACING ON
Now Tracing is enabled
cassandra@cqlsh> SELECT * from keyspace1.standard1 where key=0x50503850374d34323330;

 key                    | C0                                                                     | C1                                                                     | C2                                                                     | C3                                                                     | C4
------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------
 0x50503850374d34323330 | 0xf749eb03d6a995d8b3496075da8f20aa9228c5db12401e8a37000fa5baa135315430 | 0x845809b53a9aff7eef8f85308eaef79e03c696653ca23957f1ed5d539dc00463e10e | 0xd303585def93a5d40e41ceb12880ad3ede3d9f6308a1b1c5e42e911a191f1de12924 | 0xbe77c7da059d4b52687cd9b3eaa7d04cdfe7b5e38e84a8eea318299a01c7845fb8a2 | 0x32faaaea1b3d73d9d628a4945b69a8531740348d49ee30c03f697dd2d63e8dee5dde

(1 rows)

Tracing session: 639ca0a0-96bb-11e8-8a97-000000000000

 activity                                                                                                                                 | timestamp                  | source        | source_elapsed
------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                                                                       Execute CQL3 query | 2018-08-02 21:20:20.906000 | 192.168.1.138 |              0
                                                                                                            Parsing a statement [shard 0] | 2018-08-02 21:20:20.906358 | 192.168.1.138 |             --
                                                                                                         Processing a statement [shard 0] | 2018-08-02 21:20:20.906405 | 192.168.1.138 |             47
 Creating read executor for token -5698461774438220979 with all: {192.168.1.138} targets: {192.168.1.138} repair decision: NONE [shard 0] | 2018-08-02 21:20:20.906445 | 192.168.1.138 |             87
                                                                                                    read_data: querying locally [shard 0] | 2018-08-02 21:20:20.906448 | 192.168.1.138 |             90
                                                           Start querying the token range that starts with -5698461774438220979 [shard 0] | 2018-08-02 21:20:20.906452 | 192.168.1.138 |             94
                                                                                                               Querying is done [shard 0] | 2018-08-02 21:20:20.906509 | 192.168.1.138 |            151
                                                                                           Done processing - preparing a result [shard 0] | 2018-08-02 21:20:20.906533 | 192.168.1.138 |            175
                                                                                                                         Request complete | 2018-08-02 21:20:20.906186 | 192.168.1.138 |            186

cassandra@cqlsh> TRACING OFF
Disabled Tracing.

cassandra@cqlsh> SELECT request_size,response_size FROM system_traces.sessions where session_id=639ca0a0-96bb-11e8-8a97-000000000000;

 request_size | response_size
--------------+---------------
           82 |           369

(1 rows)
"

* 'tracing_request_response_size-v2' of https://github.com/vladzcloudius/scylla:
  tracing: move all tracing related API functions to a cold path
  tracing: store a query response size
  tracing: store request size
2018-08-05 18:26:29 +03:00
Jesse Haber-Kucharsky
fce10f2c6e auth: Don't use unsupported hashing algorithms
In previous versions of Fedora, the `crypt_r` function returned
`nullptr` when a requested hashing algorithm was not supported.

This is consistent with the documentation of the function in its man
page.

As of Fedora 28, the function's behavior changes so that the encrypted
text is not `nullptr` on error, but instead the string "*0".

The info pages for `crypt_r` clarify somewhat (and contradict the man
pages):

    Some implementations return `NULL` on failure, and others return an
    _invalid_ hashed passphrase, which will begin with a `*` and will
    not be the same as SALT.

Because of this change of behavior, users running Scylla on a Fedora 28
machine which was upgraded from a previous release would not be able to
authenticate: an unsupported hashing algorithm would be selected,
producing encrypted text that did not match the entry in the table.

With this change, unsupported algorithms are correctly detected and
users should be able to continue to authenticate themselves.
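
A minimal sketch of the detection described above, assuming a failed hash is either a null pointer or a string beginning with `*` (as the quoted info page says). The helper name is hypothetical and this is not the code from the patch.

```cpp
#include <cassert>

// Returns true when a crypt_r()-style result signals failure: either a null
// pointer (older behaviour) or an invalid hash that begins with '*'.
// `crypt_result_is_failure` is a hypothetical helper name.
inline bool crypt_result_is_failure(const char* result) {
    if (result == nullptr) {
        return true;
    }
    return result[0] == '*';
}
```

A caller probing for supported algorithms would skip any scheme for which this returns true and try the next one.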

Fixes #3637.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>
2018-08-05 08:57:36 +03:00
Vlad Zolotarov
896c1822b5 tracing: move all tracing related API functions to a cold path
This patch completes what was started in a4282c2c6e

Make trace_state_ptr a wrapper class around lw_shared_ptr<trace_state> that
hints that bool(trace_state_ptr) is likely to return false.
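
The idea can be sketched like this. It is an illustration only: std::shared_ptr stands in for lw_shared_ptr, every name is hypothetical, and `__builtin_expect` is a GCC/Clang extension rather than standard C++.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the real trace_state class.
struct trace_state_sketch {
    int events = 0;
};

// Wrapper whose bool conversion tells the compiler that "no tracing" is the
// common case, so tracing branches can be laid out off the hot path.
class trace_ptr_sketch {
    std::shared_ptr<trace_state_sketch> _p;
public:
    trace_ptr_sketch() = default;
    explicit trace_ptr_sketch(std::shared_ptr<trace_state_sketch> p) : _p(std::move(p)) {}
    explicit operator bool() const noexcept {
        return __builtin_expect(static_cast<bool>(_p), false);
    }
    trace_state_sketch* operator->() const noexcept { return _p.get(); }
};

// A tracing call site: does nothing when the pointer is empty.
inline void trace_event(const trace_ptr_sketch& p) {
    if (p) {
        ++p->events;
    }
}
```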

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-08-03 12:32:54 -04:00
Vlad Zolotarov
6db90a2e63 tracing: store a query response size
Add a new "response_size" column to system_traces.sessions and store a size of an uncompressed response
for a traced query.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-08-03 12:29:36 -04:00
Vlad Zolotarov
05020921bb tracing: store request size
Add a new column "request_size" to system_traces.sessions and store
the uncompressed request frame data size.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-08-03 12:29:36 -04:00
Avi Kivity
3b42fcfeb2 Merge "Fix exception safety in imr::utils::object" from Paweł
"

There is an exception safety problem in imr::utils::object. If multiple
memory allocations are needed and one of them fails the main object is
going to be freed (as expected). However, at this stage it is not
constructed yet, so when LSA asks its migrator for the size it may get
a meaningless value. The solution is to remember the size until object
is fully created and use sized deallocation in case of failures.

Fixes #3618.

Tests: unit(release, debug/imr_test)
"
2018-08-02 12:10:24 +03:00
Takuya ASADA
1bb463f7e5 dist/debian: install *.service on correct subpackage
We were mistakenly installing scylla-housekeeping-*.service in the scylla-conf
package. Every *.service file should explicitly specify its subpackage name.

Fixes #3642

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180801233042.307-1-syuu@scylladb.com>
2018-08-02 11:39:52 +03:00
Paweł Dziepak
fd44d13145 tests/imr: add test for exception safety in imr::utils::object::make() 2018-08-01 16:50:58 +01:00
Paweł Dziepak
7ec906e657 imr: detect lsa migrator mismatch
Each IMR type needs its own LSA migrator. It is possible that a user will
provide a migrator for a different type than the one whose instance is
being created. This patch adds compile-time detection of that bug.
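
One way such a compile-time check can look, as a sketch under assumptions: the migrator exposes the type it was written for, and a static_assert compares it with the type being created. None of these names come from the IMR code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical migrator that records which type it was written for.
template <typename T>
struct migrator_sketch {
    using structure = T;
    std::size_t size_of() const { return sizeof(T); }
};

struct cell_sketch { int v; };
struct row_sketch { long a; long b; };

// Refuses to compile when the migrator was written for a different type.
template <typename T, typename Migrator>
std::size_t checked_size(const Migrator& m) {
    static_assert(std::is_same<typename Migrator::structure, T>::value,
                  "migrator does not match the type being created");
    return m.size_of();
}
```

With this, `checked_size<cell_sketch>(migrator_sketch<row_sketch>{})` fails at compile time instead of misbehaving at run time.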
2018-08-01 16:50:58 +01:00
Benny Halevy
6b179b0183 HACKING.md: update ./install-dependencies.sh filename
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20180801150813.25408-1-bhalevy@scylladb.com>
2018-08-01 18:09:29 +03:00
Paweł Dziepak
6fbf2d72e9 imr::utils::object_context: fix context_for for backpointer
Each member of a structure may require different deserialisation
context. They are provided by context_for<Tag>() method of the context
used to deserialise the structure itself.

imr::utils::object needs to add backpointer to the structure it manages
so that it can be used in the LSA memory. This is done by creating a
structure that has two members: the backpointer and the actual structure
that imr::utils::object is to manage. imr::utils::object_context creates
appropriate deserialisation context for it.

context_for() is called for each member of a structure. object_context
implementation of context_for() always created a deserialisation context
for the underlying structure regardless which member that was, so it was
done also for backpointer. This is wrong since the context may read the
object on its creation.

The fix is to use no_context_t for the backpointer.
2018-08-01 15:17:25 +01:00
Paweł Dziepak
61749019cb imr::utils::object: fix exception safety if allocation fails
imr::utils::object::make() handles creation of IMR objects. They are
created in three phases:
  1. The size of the object and all additional needed memory allocations
     is determined
  2. All needed buffers are allocated
  3. Data is written to the allocated space

When IMR objects are deallocated LSA asks their migrator for the size.
Migrator may read some parts of the object to figure out its size. This
is a problem if there is allocation failure in make() at point 2.
If one of required allocations fails, the buffers that were already
acquired need to be freed. However, since the object hasn't been fully
created yet migrator won't return a valid value.

The solution for this is to remember object size until all allocations
are completed. This way the LSA won't need to ask migrators for it in
case of failure. imr::alloc::object_allocator already does that but
imr::utils::object doesn't. This patch fixes that.
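
The approach can be illustrated with a toy allocator. This is a sketch, not the LSA or IMR code: the allocator, the failure injection, and the function names are all assumptions. The point is that each buffer's size is remembered next to its pointer, so unwinding never has to inspect a half-built object.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Toy allocator that needs the size back on free, with optional failure
// injection. All names here are hypothetical.
struct sized_allocator_sketch {
    std::size_t live_bytes = 0;
    int allocations_until_failure = -1;   // -1 means never fail

    void* alloc(std::size_t n) {
        if (allocations_until_failure == 0) {
            throw std::bad_alloc();
        }
        if (allocations_until_failure > 0) {
            --allocations_until_failure;
        }
        void* p = ::operator new(n);
        live_bytes += n;
        return p;
    }
    void free(void* p, std::size_t n) noexcept {
        live_bytes -= n;
        ::operator delete(p);
    }
};

// Phase 2 of object creation: allocate every buffer, remembering each size
// so that a failure can be unwound with sized deallocation.
inline std::vector<std::pair<void*, std::size_t>>
allocate_all(sized_allocator_sketch& a, const std::vector<std::size_t>& sizes) {
    std::vector<std::pair<void*, std::size_t>> done;
    done.reserve(sizes.size());
    try {
        for (std::size_t n : sizes) {
            done.emplace_back(a.alloc(n), n);
        }
    } catch (...) {
        for (auto& entry : done) {
            a.free(entry.first, entry.second);
        }
        throw;
    }
    return done;
}
```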
2018-08-01 15:17:13 +01:00
Piotr Sarna
156888fb44 docs: fix system.large_partitions doc entry
For some reason the doc entry for large_partitions was outdated.
It contained incorrect ORDERING information and wrong usage example,
since large_partitions' schema changed multiple times during
the reviewing process.

Message-Id: <1910f270419536ebccffde163ec1bfc67d273306.1533128957.git.sarna@scylladb.com>
2018-08-01 16:12:39 +03:00
Asias He
95849371aa range_streamer: Remove unordered_multimap usage
We need the mapping between dht::token_range to
std::vector<inet_address> and inet_address to dht::token_range_vector in
various places. Currently, we use std::unordered_multimap and convert to
std::unordered_map. It is better to use std::unordered_map in the first
place. The changes are as follows:

- Change from

  std::unordered_multimap<dht::token_range, inet_address>

to

  std::unordered_map<dht::token_range, std::vector<inet_address>>

- Change from

   std::unordered_multimap<inet_address, dht::token_range>

to

   std::unordered_map<inet_address, dht::token_range_vector>
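
A small sketch of the map-of-vectors shape, with std::string standing in for dht::token_range and inet_address (an assumption made so the example compiles on its own; the alias and function names are hypothetical too).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// std::string stands in for dht::token_range and inet_address here.
using range_to_endpoints_sketch =
    std::unordered_map<std::string, std::vector<std::string>>;

// Append an endpoint to the list kept for a range, creating the list on
// first use. This plays the role of unordered_multimap::emplace(range, endpoint).
inline void add_endpoint(range_to_endpoints_sketch& m,
                         const std::string& range,
                         const std::string& endpoint) {
    m[range].push_back(endpoint);
}
```

All endpoints for a range are then available with a single lookup, with no later conversion step.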

Message-Id: <b8ecc41775e46ec064db3ee07510c404583390aa.1533106019.git.asias@scylladb.com>
2018-08-01 13:01:41 +03:00
Gleb Natapov
44a6afad8c cache_hitrate_calculator: fix race when new table is added during calculations
The calculation consists of several parts with preemption point between
them, so a table can be added while calculation is ongoing. Do not
assume that table exists in intermediate data structure.
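
The defensive lookup can be sketched as below. The data structure and names are hypothetical; the real calculator keeps different state. The sketch only shows the pattern of using find() instead of assuming the entry exists.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical per-table statistics collected in an earlier step.
struct hit_stats_sketch {
    double hits = 0;
    double reads = 0;
};

// Tables created after the first step are simply absent from `collected`;
// report a neutral rate for them instead of assuming the entry exists.
inline double hit_rate_for(
        const std::unordered_map<std::string, hit_stats_sketch>& collected,
        const std::string& table) {
    auto it = collected.find(table);
    if (it == collected.end() || it->second.reads == 0) {
        return 0.0;
    }
    return it->second.hits / it->second.reads;
}
```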

Fixes #3636

Message-Id: <20180801093147.GD23569@scylladb.com>
2018-08-01 12:45:03 +03:00
Avi Kivity
620e950fc8 Merge "No infinite time-outs for internal distributed queries" from Jesse
"
This series replaces infinite time-outs in internal distributed
(non-local) CQL queries with finite ones.

The implementation of tracing, which also performs internal queries,
already has finite time-outs, so it is unchanged.

Fixes #3603.
"

* 'jhk/finite_time_outs/v2' of https://github.com/hakuch/scylla:
  Use finite time-outs for internal auth. queries
  Use finite query time-outs for `system_distributed`
2018-08-01 11:23:42 +03:00
Asias He
4a0b561376 storage_service: Get rid of moving operation
The moving operation changes a node's token to a new token. It is
supported only when a node has one token. The legacy moving operation was
useful in the early days, before vnodes were introduced, when a node had
only one token. I don't think it is useful anymore.

In the future, we might support adjusting the number of vnodes to rebalance
the token range each node owns.

Removing it simplifies the cluster operation logic and code.

Fixes #3475

Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>
2018-08-01 11:18:17 +03:00
Asias He
02befb6474 gossip: Log seeds seen
It is useful for debugging bootstrap issues, especially for large
clusters.

Also do not use the `_seeds` as the set_seeds function parameter since
there is a class member called _seeds.

Refs #3417
Message-Id: <15e6bdf06376949ced1bdb845f810da09266783d.1532474820.git.asias@scylladb.com>
2018-08-01 10:57:56 +03:00
Takuya ASADA
2cd99d800b dist/common/scripts/scylla_ntp_setup: fix typo
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1533070539-2147-1-git-send-email-syuu@scylladb.com>
2018-08-01 10:31:07 +03:00
Avi Kivity
2c9b886b6d logalloc: reindent
No functional changes.
Message-Id: <20180731125116.32009-1-avi@scylladb.com>
2018-08-01 00:35:54 +01:00
Jesse Haber-Kucharsky
e664f9b0c6 Use finite time-outs for internal auth. queries 2018-07-31 11:38:16 -04:00
Jesse Haber-Kucharsky
ca44f4de3c Use finite query time-outs for system_distributed 2018-07-31 11:38:15 -04:00
Paweł Dziepak
b20a15bdda Merge "Prevent scheduling leaks when out of memtable space" from Avi
"
When we are out of memtable space (real or virtual), lsa will defer running
our mutation application and run it later when memory is in fact available.
However, it will run it in the main group, giving the write more shares than it
would otherwise get.

This patchset fixes the problem by running those deferred mutation applications
in the correct scheduling group.

Fixes #3638
"

* tag '3638/v2' of https://github.com/avikivity/scylla:
  database: tag dirty memory managers with scheduling groups
  logalloc: run releaser() in user-provided scheduling group
2018-07-31 11:55:19 +01:00
Avi Kivity
2d311c26b3 database: tag dirty memory managers with scheduling groups
dirty memory managers run code on behalf of their callers
in a background fiber, so provide that background fiber with
the scheduling group appropriate to their caller.

 - system: main (we want to let system writes through quickly)
 - dirty: statement (normal user writes)
 - streaming: streaming (streaming writes)
2018-07-31 13:18:21 +03:00
Paweł Dziepak
98217f0d66 Update seastar submodule
* seastar 6b97e00...d40faff (10):
  > tutorial: update build as needed for newer pandoc
  > core: fix __libc_free return type signature
  > future-utils: when_all: avoid calling member function on an uninitialized data member
  > future-util: reduce continuations in when_all (variadic version)
  > future-utils: remove allocation in when_all() if all futures are available
  > future: reduce allocations in when_all()
  > future: fill missing futurize::from_tuple() functions
  > future: expose more types in continuation_base
  > log: predict logger::is_enabled() as false
  > README: add Resources section with information about the mailing list etc.
2018-07-31 10:12:52 +01:00
Avi Kivity
0fc54aab98 logalloc: run releaser() in user-provided scheduling group
Let the user specify which scheduling group should run the
releaser, since it is running functions on the user's behalf.

Perhaps a cleaner interface is to require the user to call
a long-running function for the releaser, and so we'd just
inherit its scheduling group, but that's a much bigger change.
2018-07-31 11:57:58 +03:00
Avi Kivity
f258df099a Update ami submodule
* dist/ami/files/scylla-ami d53834f...c7e5a70 (1):
  > ds2_configure.py: uncomment 'cluster_name' when it's commented out
2018-07-31 09:34:33 +03:00
Avi Kivity
e7ae4beef0 main: run prometheus and API servers under streaming group
Both the Prometheus and the API servers are used for maintenance
operations, similarly to streaming. Run them under the streaming
scheduling group to prevent them from impacting normal operations,
and rename the streaming scheduling group to reflect the more
generic role.

This helps to prevent spikes from Prometheus or API requests from
interfering with the normal workload. Using an existing group is
preferable to creating a new group because in the worst case, all
the non-main-workload groups compete with the main workload.
Consolidating them allows us to give them significant shares in
total without increasing competition in the worst case.

The group's label is unchanged to preserve compatibility with
dashboards.

A nice side effect is that repair, which is initiated by API calls,
gets placed into the maintenance group naturally. Compaction tasks
which are run by compaction manager are not changed.
Message-Id: <20180714160723.23655-1-avi@scylladb.com>
2018-07-30 15:07:33 +01:00
Avi Kivity
a4282c2c6e tracing: move tracing code to cold path
Most queries run without tracing (and those that run with tracing
are not sensitive to a few cycles), so mark the tracing paths as
cold.
Message-Id: <20180723133000.30482-1-avi@scylladb.com>
2018-07-30 15:05:57 +01:00
Rafi Einstein
123f2c2a1c Add a counter for reverse queries
Fixes #3492

Tests: dtest(cql_additional_tests.py)
Message-Id: <20180729202615.22459-1-rafie@scylladb.com>
2018-07-30 12:34:43 +03:00
Takuya ASADA
032b26deeb dist/common/scripts/scylla_ntp_setup: fix typo
Comment on Python is "#" not "//".

Fixes #3629

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180730091022.4512-1-syuu@scylladb.com>
2018-07-30 12:30:53 +03:00
Avi Kivity
04d88e8ff7 scripts: add a script to compute optimal number of compile jobs
This will allow continuous integration to use the optimal number
of compiler jobs, without having to resort to complex calculations
from its scripting environment.

Message-Id: <20180722172050.13148-1-avi@scylladb.com>
2018-07-30 10:15:11 +03:00
Avi Kivity
a4c9330bfc Merge "Optimise paged queries" from Paweł
"
This series adds some optimisations to the paging logic, that attempt to
close the performance gap between paged and not paged queries. The
former are more complex so are always going to be slower, but the
performance loss was unacceptably large.

Fixes #3619.

Performance with paging:
        ./perf_paging_before  ./perf_paging_after   diff
 read              271246.13            312815.49  15.3%

Without paging:
        ./perf_nopaging_before  ./perf_nopaging_after   diff
 read                343732.17              342575.77  -0.3%

Tests: unit(release), dtests(paging_test.py, paging_additional_test.py)
"

* tag 'optimise-paging/v1' of https://github.com/pdziepak/scylla:
  cql3: select statement: don't copy metadata if not needed
  cql3: query_options: make simple getter inlineable
  cql3: metadata: avoid copying column information
  query_pager: avoid visiting result_view if not needed
  query::result_view: add get_last_partition_and_clustering_key()
  query::result_reader: fix const correctness
  tests/uuid: add more tests including make_rand_uuid()
  utils: uuid: don't use std::random_device()
2018-07-26 19:24:03 +03:00
Nadav Har'El
25bd139508 cross-tree: clean up use of std::random_device()
std::random_device() uses the relatively slow /dev/urandom, and we rarely if
ever intend to use it directly - we normally want to use it to seed a faster
random_engine (a pseudo-random number generator).

In many places in the code, we first created a random_device variable, and then
using it created a random_engine variable. However, this practice created the
risk of a programmer accidentally using the random_device object, instead of the
random_engine object, because both have the same API, which hurts performance.

This risk materialized in just two places in the code, utils/uuid.cc and
gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is
not included in this patch, and the fix for gossiper.{cc,hh} is included here.

To avoid risking the same mistake in the future, this patch switches across the
code to an idiom where the random_device object is not *named*, so cannot be
accidentally used. We use the following idiom:

   std::default_random_engine _engine{std::random_device{}()};

Here std::random_device{}() creates the random device (/dev/urandom) and pulls
a random integer from it. It then uses this seed to create the random_engine
(the pseudo-random number generator). The std::random_device{} object is
temporary and unnamed, and cannot be unintentionally used directly.
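
As a usage illustration of the idiom, here is a small class that keeps only the engine as a member. The class name and its method are hypothetical; only the member initializer is taken from the text above.

```cpp
#include <cassert>
#include <random>

// The seed source is an unnamed temporary, so only the fast pseudo-random
// engine can be used after construction.
class jitter_source_sketch {
    std::default_random_engine _engine{std::random_device{}()};
public:
    // Uniform integer in [lo, hi], drawn from the pseudo-random engine.
    int next(int lo, int hi) {
        return std::uniform_int_distribution<int>(lo, hi)(_engine);
    }
};
```

Because no random_device object has a name, there is nothing slow to call by accident in `next()`.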

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180726154958.4405-1-nyh@scylladb.com>
2018-07-26 16:54:58 +01:00
Takuya ASADA
8e4d1350c9 dist/common/scripts/scylla_ntp_setup: ignore ntpdate error
Even if ntpdate fails to adjust the clock, ntpd may be able to recover it later.
Ignore the ntpdate error and keep running the script.

Fixes #3629

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180726080206.28891-1-syuu@scylladb.com>
2018-07-26 14:44:53 +03:00
Paweł Dziepak
3e32245bb8 cql3: select statement: don't copy metadata if not needed 2018-07-26 12:37:20 +01:00
Paweł Dziepak
15775c958a cql3: query_options: make simple getter inlineable 2018-07-26 12:37:06 +01:00
Paweł Dziepak
ef0c999742 cql3: metadata: avoid copying column information
The column-related metadata is shared by all requests done with the same
prepared query. However, the metadata class also contains some additional
flags and paging state which may differ. This patch allows sharing
column information among multiple instances of the metadata class.
2018-07-26 12:17:04 +01:00
Paweł Dziepak
757d9e3b5d query_pager: avoid visiting result_view if not needed
query::result_visitor provides get_last_partition_and_clustering_key()
which allows getting those without iterating through the whole result.
Moreover, the row count may be precomputed in the result; if it isn't, there
is query::result_view::count_partitions_and_rows() for getting it.
2018-07-26 12:14:48 +01:00
Paweł Dziepak
9b6dc52255 query::result_view: add get_last_partition_and_clustering_key()
Paging needs to get last partition and clustering key (if the latter
exists). Previously, this was done by result_view visitor but that is
suboptimal. Let's add a direct getter for those.
2018-07-26 12:12:08 +01:00
Paweł Dziepak
b5ed4c8806 query::result_reader: fix const correctness 2018-07-26 12:11:27 +01:00
Paweł Dziepak
495df277f9 tests/uuid: add more tests including make_rand_uuid() 2018-07-26 12:03:37 +01:00
Paweł Dziepak
b485deb124 utils: uuid: don't use std::random_device()
std::random_device() is extremely slow. This patch modifies
make_rand_uuid() so that it requires only two invocations of the PRNG.
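
A sketch of how a random (version 4) UUID can be built from exactly two 64-bit PRNG outputs. This is not the code in utils/uuid.cc: the function name, the choice of std::mt19937_64, and the bit layout (most significant half first, as in RFC 4122) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <utility>

// Build the two 64-bit halves of a version 4 UUID from exactly two engine
// invocations, then overwrite the version and variant bits.
inline std::pair<std::uint64_t, std::uint64_t>
make_random_uuid_bits(std::mt19937_64& engine) {
    std::uint64_t msb = engine();
    std::uint64_t lsb = engine();
    // Version field: bits 12..15 of the most significant half, set to 4.
    msb = (msb & ~std::uint64_t(0xF000)) | std::uint64_t(0x4000);
    // Variant field: top two bits of the least significant half, set to 10.
    lsb = (lsb & ~(std::uint64_t(3) << 62)) | (std::uint64_t(2) << 62);
    return {msb, lsb};
}
```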
2018-07-26 12:02:32 +01:00
Avi Kivity
b167647bf6 dist: redhat: fix up bad file ownership of rpms/srpms
mock outputs files owned by root. This causes attempts
by scripts that want to junk the working directory (typically
continuous integration) to fail on permission errors.

Fixup those permissions after the fact.
Message-Id: <20180719163553.5186-1-avi@scylladb.com>
2018-07-26 08:20:42 +03:00
Avi Kivity
bea1f715dc storage_proxy: count cross-shard operations
Count operations which were started on one shard and
were performed on another, due to non-shard-aware driver
and/or RPC.
Message-Id: <20180723155118.8545-1-avi@scylladb.com>
2018-07-25 16:21:04 +01:00
Avi Kivity
d6ef74fe36 Merge "Fix JSON string quoting" from Piotr
"

This mini-series covers a regression caused by newest versions
of jsoncpp library, which changed the way of quoting UTF-8 strings.

Tests: unit (release)
"

* 'add_json_quoting_3' of https://github.com/psarna/scylla:
  tests: add JSON unit test
  types: use value_to_quoted_string in JSON quoting
  json: add value_to_quoted_string helper function

Ref #3622.
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2018-07-25 17:49:55 +03:00
Piotr Sarna
b367cff05d tests: add JSON unit test
Since value_to_quoted_string now has an internal implementation,
a unit test is provided to check if strings are quoted
and escaped properly.
2018-07-25 13:16:06 +02:00
Piotr Sarna
d307b5712c types: use value_to_quoted_string in JSON quoting
In order to avoid regressions caused by external libraries,
our own value_to_quoted_string implementation is used.

Fixes #3622
2018-07-25 13:16:06 +02:00
Piotr Sarna
783762a958 json: add value_to_quoted_string helper function
After open-source-parsers/jsoncpp@42a161f commit jsoncpp's version
of valueToQuotedString no longer fits our needs, because too many
UTF-8 characters are unnecessarily escaped. To remedy that,
this commit provides our own string quoting implementation.
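
For illustration, a quoting routine in this spirit might escape only the characters JSON requires and pass UTF-8 bytes through untouched. The function below is a hypothetical sketch, not Scylla's value_to_quoted_string.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Quote a string for JSON, escaping only '"', '\\' and control characters.
// Bytes >= 0x80 (parts of UTF-8 sequences) are copied through unchanged.
inline std::string quote_json_string_sketch(const std::string& in) {
    std::string out = "\"";
    for (unsigned char c : in) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b"; break;
        case '\f': out += "\\f"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
            if (c < 0x20) {
                // Remaining control characters use the \u00XX form.
                char buf[8];
                std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    out += '"';
    return out;
}
```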

Reported-by: Nadav Har'El <nyh@scylladb.com>

Refs #3622
2018-07-25 13:16:00 +02:00
Piotr Sarna
f66aace685 cql3: fix INSERT JSON grammar
Previously CQL grammar wrongfully required INSERT JSON queries
to provide a list of columns, even though they are already
present in JSON itself.
Unfortunately, tests were written with this false assumption as well,
so they are updated.
Message-Id: <33b496cba523f0f27b6cbf5539a90b6feb20269e.1532514111.git.sarna@scylladb.com>
2018-07-25 11:36:59 +01:00
Avi Kivity
b443a9b930 compaction: demote compaction start/end messages to DEBUG level
Compactions start and end all the time, especially with many shards,
and don't contribute much to understanding what is going on these
days. Compaction throughput is available through the metrics and
other information is available via the compaction history table.

Demote compaction start and end messages to DEBUG level to keep
the log clean. Cleaning and resharding compactions are kept as
INFO, at least for now, since they are manual operations and
therefore rarer.
Message-Id: <20180724132859.14109-1-avi@scylladb.com>
2018-07-25 09:53:39 +01:00
Takuya ASADA
58f094e06d dist/debian: fix ImportError on pystache
Seems like pystache is not provided as a dependency, so we need to install it in
build_deb.sh.

Fixes #3627

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180724164852.16094-1-syuu@scylladb.com>
2018-07-25 07:42:19 +03:00
Avi Kivity
e2ad45c3db Merge "Add clustering prefix logic to indexes and filtering" from Piotr
"
This series follows up ALLOW FILTERING support series and depends on
this one: https://groups.google.com/d/msg/scylladb-dev/Qxt3_MP03jI/5ZhRTJ3gBwAJ

The following optimizations regarding clustering key prefix and filtering are
applied:
 * if clustering key restrictions require filtering, but they still
   contain any part of the prefix, this prefix can be used to narrow
   down the query by using it in computing clustering bounds
 * if an indexed query has partition key restrictions and any clustering
   key restrictions that form a prefix, then from now on this prefix
   will be used to narrow down the index query

"

Ref #3611.

* 'use_prefix_with_filtering_and_si_4' of https://github.com/psarna/scylla:
  tests: add prefix cases to indexed filtered queries tests
  cql3: use ck prefix in filtered queries
  cql3: use clustering key prefix in index queries
  cql3: add conversion to ck longest prefix restrictions
  cql3: add prefix_size method to ck restrictions
2018-07-23 15:28:50 +03:00
Piotr Sarna
517a5b66ba tests: add prefix cases to indexed filtered queries tests
More cases related to querying clustering key prefix in an indexed
query are added to secondary index test suite.
2018-07-23 14:10:52 +02:00
Piotr Sarna
8523c24576 cql3: use ck prefix in filtered queries
If a filtering query has restrictions that include any clustering
prefix, the longest prefix will be used to narrow down the query.

Fixes #3611
2018-07-23 14:10:52 +02:00
Piotr Sarna
6cc8ccc771 cql3: use clustering key prefix in index queries
If an indexed query has partition+clustering key restrictions as well
and at least some of these restrictions create a prefix, this prefix
is used in the index query to narrow down the number of rows read.

Refs #3611
2018-07-23 14:10:52 +02:00
Piotr Sarna
ab74f75727 cql3: add conversion to ck longest prefix restrictions
For optimization purposes it's sometimes useful to extract
the longest prefix of clustering key restrictions in order
to narrow down queries.
2018-07-23 14:10:52 +02:00
Piotr Sarna
2e4c493870 cql3: add prefix_size method to ck restrictions
Clustering key restrictions are usually set for at least part
of the clustering key prefix. A method of extracting the longest
prefix's size is added.
2018-07-23 14:10:52 +02:00
Vladimir Krivopalov
ec7f853f49 sstables: Do not pass liveness_info to consume_row_end().
The liveness_info is unconditionally added to the _in_progress_row as of
commit cbfc741d70 so no need to pass it to consume_row_end() and add
conditionally.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <7cd3e599817cbd4b857c3295153602cd2b9a6ef1.1532311852.git.vladimir@scylladb.com>
2018-07-23 13:10:36 +03:00
Avi Kivity
bb79eccf55 tests: sstable_mutation_test: hack around leak during sstable close
sstable close is an asynchronous operation launched in the background,
so we can't wait for it. If the test ends before all operations are
complete, the background operations are detected as leaks.

We need either a proper close(), or maybe a sstables::quiesce() that
waits until there are no sstables alive on the shard, but until then,
a hack.
2018-07-23 12:40:46 +03:00
Avi Kivity
af6ce47082 Merge "Support filtering and fast-forwarding with SSTables 3.x" from Piotr and Vladimir
"
This patchset authored by Piotr fixes ck filtering and fast forwarding in SSTables 3.x.
For now only clustering rows are supported and range tombstones will come next.

Test: unit {release}
"

* 'projects/sstables-30/filtering/v5' of https://github.com/argenet/scylla:
  sstables: Minor clean-up and renaming to clustering_ranges_walker.
  sstables: Add test for filtering and forwarding
  sstables: Fix schema for static row tests
  sstables: Fix ck filtering and fast forwarding
  sstables: Introduce mutation_fragment_filter
2018-07-22 21:11:51 +03:00
Avi Kivity
761931659a Merge "Do not linearise incoming CQL3 requests" from Paweł
"
This series changes the native CQL3 protocol layer so that it works with
fragmented buffers instead of a single temporary_buffer per request.
The main part is fragmented_temporary_buffer which represents a
fragmented buffer consisting of multiple temporary_buffers. It provides
helpers for reading fragmented buffer from an input_stream, interpreting
the data in the fragmented buffer, as well as views that satisfy the
FragmentRange concept.

There are still situations where a fragmented buffer is linearised. That
includes decompressing client requests (this uses reusable buffers in a
similar way to the code that sends compressed responses), CQL statement
restrictions and values that are hard-coded in prepared statements
(hopefully, the values in those cases will be small), value validation
in some cases (blobs are not validated, irrelevant for many fixed-size
small types, but may be a problem for large text cells) as well as
operations on collections.

Tests: unit(release), dtests(cql_prepared_test.py, cql_tests.py, cql_additional_tests.py)
"

* tag 'fragmented-cql3-receive/v1' of https://github.com/pdziepak/scylla: (23 commits)
  types: bytes_view: override fragmented validate()
  cql3: value_view: switch to fragmented_temporary_buffer::view
  types: add validate that accepts fragmented_temporary_buffer::view
  cql3 query_options: add linearize()
  cql3: query_options: use bytes_ostream for temporaries
  cql3: operation: make make_cell accept fragmented_temporary_buffer::view
  atomic_cell: accept fragmented_temporary_buffer::view values
  cql3: avoid ambiguity in a call to update_parameters::make_cell()
  transport: switch to fragmented_temporary_buffer
  transport: extract compression buffers from response class
  tests/reusable_buffer: test fragmented_temporary_buffer support
  utils: reusable_buffer: support fragmented_temporary_buffer
  tests: add test for fragmented_temporary_buffer
  util fragment_range: add general linearisation functions
  utils: add fragmented_temporary_buffer
  tests: add basic test for transport requests and responses
  tests/random-utils: print seed
  tests/random-utils: generate sstrings
  cql3: add value_view printer and equality comparison
  transport: move response outside of cql_server class
  ...
2018-07-22 19:40:37 +03:00
Avi Kivity
30cddd4531 Merge "Support reading promoted index from SSTables 3.x" from Vladimir and Piotr
"
This patchset adds support for reading Index.db files written in
SSTables 3.x ('mc') format.

Note that the offsets map introduced in SSTables 3.x is neither used nor
read yet. It is located last in promoted index and so current parsers
just ignore it for the time being.

Later it should be used to perform binary search of a desired promoted
index block in large partition, thus reducing the complexity from linear
to logarithmic.
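
The logarithmic lookup mentioned above can be sketched as follows. This is
only an illustration under assumptions: block_summary and find_block are
hypothetical names, and positions are simplified to integers, whereas real
promoted index blocks are keyed by clustering prefixes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a promoted index block (not the Scylla type).
struct block_summary {
    std::int64_t first_position;  // simplified: real keys are clustering prefixes
    std::uint64_t data_offset;    // where the block's rows start in the data file
};

// Returns the index of the last block whose first_position <= target, or 0 if
// target precedes every block. Blocks must be sorted by first_position.
inline std::size_t find_block(const std::vector<block_summary>& blocks,
                              std::int64_t target) {
    auto it = std::upper_bound(blocks.begin(), blocks.end(), target,
        [] (std::int64_t pos, const block_summary& b) {
            return pos < b.first_position;
        });
    return it == blocks.begin()
        ? 0
        : static_cast<std::size_t>(it - blocks.begin()) - 1;
}

// Usage example: three blocks starting at positions 0, 10 and 20.
inline void example_find_block() {
    std::vector<block_summary> blocks{{0, 0}, {10, 4096}, {20, 8192}};
    assert(find_block(blocks, 15) == 1);
    assert(find_block(blocks, 20) == 2);
}
```

std::upper_bound needs O(log n) comparisons, compared with a linear scan that
visits every block before the target.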

Tests: unit {release}
"

* 'projects/sstables-30/index_reader/v5' of https://github.com/argenet/scylla:
  sstables: Add getter for end_open_marker to index_reader.
  tests: Add test reading index for a partition comprised of RT markers of boundary types.
  tests: Add test for reading index of a partition comprised of only range tombstones.
  tests: Use std::adjacent_find in index_reader_assertions::has_monotonic_positions()
  tests: Read rows only index
  sstables: Do not seek through the promoted index for static row positions.
  sstables: Read promoted index stored in SSTables 3.x ('mc') format.
  sstables: Make promoted_index_block support clustering keys for both ka/la and mc formats.
  utils: Add overloaded_functor helper.
  position_in_partition: Add a constructor from range_tag_t{}, bound_kind and clustering_key_prefix.
  sstables: Support reading signed vints in continuous_data_consumer.
  sstables: Factor out the code building a vector of fixed clustering values lengths.
  sstables: Remove unused includes from index_entry.hh
  tests: Add test for reading SSTables 3.x index file with empty promoted index.
  tests: Rename sstable_assertions.hh -> tests/index_reader_assertions.hh
  sstables: Support parsing index entries from SSTables 3.x format.
  sstables: move bound_kind_m to header
2018-07-22 16:15:41 +03:00
Vladimir Krivopalov
df1a151f75 sstables: Minor clean-up and renaming to clustering_ranges_walker.
- Renamed _current to _current_range to better reflect its nature as
  there are other similarly named members (_current_start and
  _current_end).

- Don't use a temporary variable for incrementing the change counter.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 16:34:37 -07:00
Piotr Jastrzebski
01611f2083 sstables: Add test for filtering and forwarding
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-07-20 16:34:37 -07:00
Piotr Jastrzebski
3466dc2368 sstables: Fix schema for static row tests 2018-07-20 16:34:37 -07:00
Piotr Jastrzebski
abf3fc1b98 sstables: Fix ck filtering and fast forwarding
Both were broken before this change.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 16:34:37 -07:00
Piotr Jastrzebski
564bcfa4d0 sstables: Introduce mutation_fragment_filter
This class encapsulates the logic related to
clustering key filtering and fast forwarding.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 16:19:07 -07:00
Vladimir Krivopalov
4d3467d793 sstables: Add getter for end_open_marker to index_reader.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:51:13 -07:00
Vladimir Krivopalov
c7285abc9e tests: Add test reading index for a partition comprised of RT markers of boundary types.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:51:13 -07:00
Vladimir Krivopalov
91f96d7d2b tests: Add test for reading index of a partition comprised of only range tombstones.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:51:13 -07:00
Vladimir Krivopalov
fc051954c2 tests: Use std::adjacent_find in index_reader_assertions::has_monotonic_positions()
Not only is this easier to read and understand, but it also doesn't
force the promoted_index_block class to support copying which is
heavyweight and otherwise not needed.
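
A minimal sketch of the idea, assuming a hypothetical move-only block type
and a hypothetical positions_are_monotonic() helper (neither is the code
from this patch). std::adjacent_find only compares neighbours through
references, so the element type never needs to be copied.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical move-only element standing in for a heavyweight block type.
struct block {
    int start;
    int end;
    block(int s, int e) : start(s), end(e) {}
    block(const block&) = delete;
    block(block&&) = default;
    block& operator=(block&&) = default;
};

// True if no block starts before the previous block ends.
inline bool positions_are_monotonic(const std::vector<block>& blocks) {
    // adjacent_find returns the first neighbouring pair for which the
    // predicate holds; here the predicate describes a violation.
    auto it = std::adjacent_find(blocks.begin(), blocks.end(),
        [] (const block& a, const block& b) { return b.start < a.end; });
    return it == blocks.end();
}

// Usage example: elements are emplaced, never copied.
inline void example_monotonic() {
    std::vector<block> blocks;
    blocks.emplace_back(0, 5);
    blocks.emplace_back(5, 9);
    assert(positions_are_monotonic(blocks));
}
```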

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:51:13 -07:00
Vladimir Krivopalov
d4e0fa96e3 tests: Read rows only index
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:51:13 -07:00
Vladimir Krivopalov
5561c713d9 sstables: Do not seek through the promoted index for static row positions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:51:13 -07:00
Vladimir Krivopalov
917528c427 sstables: Read promoted index stored in SSTables 3.x ('mc') format.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:51:13 -07:00
Vladimir Krivopalov
86d14f8166 sstables: Make promoted_index_block support clustering keys for both ka/la and mc formats.
This is a pre-requisite for parsing promoted index blocks written in
SSTables 'mc' format.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:51:13 -07:00
Vladimir Krivopalov
79c2f0095c utils: Add overloaded_functor helper.
The overloaded_functor class template can be used to encompass multiple
lambdas accepting different types into a single callable object that can
be used with any of those types.

One application is visitors for std::variant where different handling is
required for different types.
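
For illustration, the common C++17 form of such a helper looks roughly like
the sketch below. The name overloaded and the describe() function are
assumptions made for this example; the definition added by the patch is not
shown here and may differ.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical helper: inherits operator() from every lambda passed in.
template <typename... Fs>
struct overloaded : Fs... {
    using Fs::operator()...;
};
// Deduction guide so that overloaded{f, g} works without spelling out types.
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Visits a variant with one lambda per alternative.
inline std::string describe(const std::variant<int, std::string>& v) {
    return std::visit(overloaded{
        [] (int i) { return "int:" + std::to_string(i); },
        [] (const std::string& s) { return "string:" + s; },
    }, v);
}

// Usage example.
inline void example_describe() {
    assert(describe(42) == "int:42");
    assert(describe(std::string("abc")) == "string:abc");
}
```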

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Vladimir Krivopalov
593d8faf7d position_in_partition: Add a constructor from range_tag_t{}, bound_kind and clustering_key_prefix.
This facilitates position_in_partition creation when parsing range tombstones bounds from SSTables files.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Vladimir Krivopalov
997ebaaa14 sstables: Support reading signed vints in continuous_data_consumer.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Vladimir Krivopalov
540dfcc9bf sstables: Factor out the code building a vector of fixed clustering values lengths.
This code will be re-used in promoted_index_blocks_parser to parse
clustering key prefixes from SSTables 3.x format.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Vladimir Krivopalov
741d5f3b5d sstables: Remove unused includes from index_entry.hh
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Vladimir Krivopalov
b29b948872 tests: Add test for reading SSTables 3.x index file with empty promoted index.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Vladimir Krivopalov
054eb2df66 tests: Rename sstable_assertions.hh -> tests/index_reader_assertions.hh
The previous name of the file is confusing as we have several
sstable_assertions classes throughout tests but this header only
contains a class for index reader assertions.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Vladimir Krivopalov
f50ffa267f sstables: Support parsing index entries from SSTables 3.x format.
With this patch, index_reader is capable of reading index_entries from
both 'ka'/'la' and 'mc' formats.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Piotr Jastrzebski
d0f8c71e28 sstables: move bound_kind_m to header
and add helper methods.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-07-20 13:50:17 -07:00
Duarte Nunes
6bd087facb Merge 'Make indexed queries with pk restrictions non-filtering' from Piotr
"
Queries that use secondary index and have a full partition key restriction
or full primary key restriction should not require filtering - it's
sufficient to add these restrictions to the index query.
This also adds secondary index tests to cover this case.

Tests: unit (release)
"

* 'si_and_pk_restrictions_2' of https://github.com/psarna/scylla:
  tests: add index + partition key test
  cql3: make index+primary key restrictions filtering-independent
  cql3: use primary key restrictions in filtering index queries
  cql3: add is_all_eq to primary key restrictions
  cql3: add explicit conversion between key restrictions
  cql3: add apply_to() method to single column restriction
  cql3: make primary key restrictions' values unambiguous
2018-07-19 16:54:43 +01:00
Tomasz Grabiec
d5534d6a77 Merge "Improve categorization of messaging verbs into connections" from Avi
Now that verb categorizations also affect scheduling, getting them
correct is more important. The first three patches in this series
improve the infrastructure a little, and the fourth fixes some
categorization errors wrt. repair/streaming verbs.

* https://github.com/avikivity/scylla msg-idx-sanity/v1:
  messaging: choose connection index via a look-up table
  messaging: convert do_get_rpc_client_idx into a switch
  messaging: remove default when computing rpc client index
  messaging: categorize more streaming/repair verbs as streaming
2018-07-19 15:03:15 +02:00
Tomasz Grabiec
ef4fb1f91d sstables: mp_row_consumer_m: Add trace-level logging
Very useful for debugging. The old mp_row_consumer_k_l had this.

Message-Id: <1532000326-28649-1-git-send-email-tgrabiec@scylladb.com>
2018-07-19 14:58:00 +03:00
Asias He
1f06ee3960 range_streamer: Limit nr of nodes to stream in parallel
For example, to bootstrap a 50th node in a cluster

 [shard 0] range_streamer - Bootstrap with
 [127.0.0.8, 127.0.0.2, 127.0.0.24, 127.0.0.21, 127.0.0.49, 127.0.0.44,
 127.0.0.9, 127.0.0.7, 127.0.0.47, 127.0.0.15, 127.0.0.5, 127.0.0.30,
 127.0.0.14, 127.0.0.12, 127.0.0.36, 127.0.0.11, 127.0.0.48, 127.0.0.28,
 127.0.0.33, 127.0.0.10, 127.0.0.41, 127.0.0.4, 127.0.0.40, 127.0.0.3,
 127.0.0.6, 127.0.0.43, 127.0.0.22, 127.0.0.26, 127.0.0.42, 127.0.0.25,
 127.0.0.17, 127.0.0.37, 127.0.0.23, 127.0.0.13, 127.0.0.38, 127.0.0.1,
 127.0.0.18, 127.0.0.20, 127.0.0.39, 127.0.0.27, 127.0.0.34, 127.0.0.32,
 127.0.0.19, 127.0.0.16, 127.0.0.31, 127.0.0.45, 127.0.0.29, 127.0.0.35,
 127.0.0.46]
 for keyspace=keyspace1 started, nodes_to_stream=49, nodes_in_parallel=49

the new node will get data from 49 existing nodes.

Currently, it will stream from all the 49 existing nodes at the same
time. It is not a good idea to stream from all the nodes in parallel
which can overwhelm the bootstrap node, i.e., 49 nodes sending, 1 node
receiving.

To fix this, limit the nr of nodes to stream in parallel. We should have
a better control over the memory usage and parallelism. But for now,
limit the nr of nodes to a maximum of 16 as a starter. With this limit,
each shard can work with as many as 16 remote nodes in parallel, I think
this has enough parallelism for streaming in terms of performance.
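
One simple way to cap the fan-in is to process the source nodes in batches,
sketched below. This is an assumption for illustration only: make_batches is
a hypothetical helper, and the patch itself may bound parallelism by other
means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `nodes` into consecutive batches of at most `max_parallel` entries.
// A caller would stream from one batch at a time.
inline std::vector<std::vector<std::string>>
make_batches(const std::vector<std::string>& nodes, std::size_t max_parallel) {
    std::vector<std::vector<std::string>> batches;
    if (max_parallel == 0) {
        return batches;  // nothing can be scheduled with a zero limit
    }
    for (std::size_t i = 0; i < nodes.size(); i += max_parallel) {
        const std::size_t end = std::min(nodes.size(), i + max_parallel);
        batches.emplace_back(nodes.begin() + i, nodes.begin() + end);
    }
    return batches;
}

// Usage example: 49 source nodes with a limit of 16.
inline void example_batches() {
    std::vector<std::string> nodes;
    for (int i = 1; i <= 49; ++i) {
        nodes.push_back("127.0.0." + std::to_string(i));
    }
    auto batches = make_batches(nodes, 16);
    assert(batches.size() == 4);         // 16 + 16 + 16 + 1
    assert(batches.back().size() == 1);
}
```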

This change affects the bootstrap/decommission/removenode node
operations and does not affect repair.

Refs #2782

Message-Id: <980610dc97490d4f16281a0c3203b9bee73e04e4.1531989557.git.asias@scylladb.com>
2018-07-19 11:44:05 +03:00
Avi Kivity
31d4d37161 Merge "Reduce continuous memory usage in gossip" from Asias
"
Use chunked_vector instead of vector. It won't have compatibility issues
because chunked_vector and vector have the same on wire format.

Refs #278
"

* 'asias/gossip_memory_v2' of github.com:scylladb/seastar-dev:
  gossip: Reduce continuous memory usage
  to_string: Add std::list and utils::chunked_vector support
  serializer: Add chunked_vector support
2018-07-19 09:12:09 +03:00
Tomasz Grabiec
9a0548397c tests: row_cache: Add test for eviction from invalidated partitions
Message-Id: <1531933216-28026-1-git-send-email-tgrabiec@scylladb.com>
2018-07-18 21:06:36 +03:00
Piotr Sarna
82c049692b tests: add index + partition key test
Tests covering querying both index and partition keys are added
- it's checked that such queries do not require filtering.
2018-07-18 18:45:08 +02:00
Piotr Sarna
0c85bdcdc2 cql3: make index+primary key restrictions filtering-independent
If full partition key (or full primary key) is used in an indexed
query, it should not require filtering, because queries like that
can be efficiently narrowed down with stricter index restrictions.
2018-07-18 18:45:08 +02:00
Piotr Sarna
2542630a18 cql3: use primary key restrictions in filtering index queries
If both index and partition key are used in a query, it should not
require filtering, because indexed query can be narrowed down
with partition key information. This commit appends partition key
restrictions to index query.
2018-07-18 18:45:08 +02:00
Piotr Sarna
27590816f0 cql3: add is_all_eq to primary key restrictions
is_all_eq is later needed to decide if restrictions can be used
in an indexed query.
2018-07-18 18:45:08 +02:00
Piotr Sarna
20a349777e cql3: add explicit conversion between key restrictions
Partition and clustering key restrictions sometimes need to be converted
and this commit provides a way to do that.
2018-07-18 18:45:08 +02:00
Piotr Sarna
f1357defd6 cql3: add apply_to() method to single column restriction
This method allows copying single column restriction,
possibly with a new column definition.
2018-07-18 18:44:38 +02:00
Tomasz Grabiec
dc453d4f5d tests: flat_mutation_reader: Use fluent assertions for better error messages
Message-Id: <1531908313-29810-2-git-send-email-tgrabiec@scylladb.com>
2018-07-18 13:52:23 +01:00
Tomasz Grabiec
604c8baed8 tests: flat_mutation_reader_assertions: Introduce produces(mutation_fragment)
Message-Id: <1531908313-29810-1-git-send-email-tgrabiec@scylladb.com>
2018-07-18 13:52:23 +01:00
Tomasz Grabiec
c46813717c tests: sstables: Check that reading large index pages does not cause large allocations
Reproducer of #3597.

Message-Id: <1531914040-5427-1-git-send-email-tgrabiec@scylladb.com>
2018-07-18 14:56:41 +03:00
Piotr Sarna
30f9924ad5 cql3: make primary key restrictions' values unambiguous
using directive must be used to disambiguate the overridden method.
2018-07-18 13:28:37 +02:00
Paweł Dziepak
a0c1c0c921 types: bytes_view: override fragmented validate()
The default implementation linearises the buffer and calls
validate(bytes_view). This is bad and not needed for bytes_type which
doesn't do any validation anyway.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
0b9eed72f4 cql3: value_view: switch to fragmented_temporary_buffer::view 2018-07-18 12:28:06 +01:00
Paweł Dziepak
0551efee3b types: add validate that accepts fragmented_temporary_buffer::view 2018-07-18 12:28:06 +01:00
Paweł Dziepak
8f4cb36ef2 cql3 query_options: add linearize()
Some code in the CQL3 layer requires bytes_view and it is fairly
reasonable to assume that it won't deal with large buffers (e.g.
statement restrictions). query_options already has make_temporary()
which takes ownership of a cql3::raw_value so that the rest of the code
can use cql3::raw_value_view. This patch adds similar linearize()
function which, if necessary, linearises a cql3::raw_value_view and
returns a bytes_view with lifetime tied to the lifetime of query_options.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
3810045f8f cql3: query_options: use bytes_ostream for temporaries
bytes_ostream is going to be more efficient than
std::vector<std::vector<char>> since it can put multiple small values in
a single buffer thus reducing the number of memory allocations.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
dff6cd3e2f cql3: operation: make make_cell accept fragmented_temporary_buffer::view 2018-07-18 12:28:06 +01:00
Paweł Dziepak
cc87263bd8 atomic_cell: accept fragmented_temporary_buffer::view values 2018-07-18 12:28:06 +01:00
Paweł Dziepak
7d7910aa4d cql3: avoid ambiguity in a call to update_parameters::make_cell()
Using initializer lists in calls like foo({}) is ambiguous if foo() has
multiple overloads with more than one accepting a type that is
default-constructible. update_parameters::make_cell() is about to get an
overload that accepts fragmented_temporary_buffer::view as a value, so
let's make sure its call site won't be ambiguous.
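
The ambiguity can be reproduced with a small, hypothetical overload set
(describe_value below is an illustrative name, not the make_cell()
signature). Both parameter types can be constructed from an empty
initializer list, so the call has to name a type to select one overload.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Two overloads whose parameter types are both default-constructible.
inline std::string describe_value(const std::string& v) {
    return "string of size " + std::to_string(v.size());
}
inline std::string describe_value(const std::vector<char>& v) {
    return "vector of size " + std::to_string(v.size());
}

inline std::string describe_empty() {
    // describe_value({});  // would not compile: both overloads accept {}
    return describe_value(std::string{});  // naming the type picks one overload
}

// Usage example.
inline void example_describe_empty() {
    assert(describe_empty() == "string of size 0");
}
```

Spelling out the type at the call site keeps it valid when further
overloads are added later.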
2018-07-18 12:28:06 +01:00
Paweł Dziepak
8c6e544fec transport: switch to fragmented_temporary_buffer
The logic responsible for reading requests was operating on
temporary_buffer<char> and bytes_view. This required all request
messages to be linearised to a contiguous buffer, possibly causing large
allocations. Changing to fragmented_temporary_buffer mostly alleviates this
problem unless the reader code explicitly asks for a contiguous bytes_view.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
f95bb21d99 transport: extract compression buffers from response class
Both compression and decompression code is going to reuse the same pair
of reusable buffers.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
a8c4f41a0b tests/reusable_buffer: test fragmented_temporary_buffer support 2018-07-18 12:28:06 +01:00
Paweł Dziepak
32ba47fb87 utils: reusable_buffer: support fragmented_temporary_buffer
reusable_buffer already supports bytes_ostream which is often used for
handling data sent from Scylla. This patch adds support for
fragmented_temporary_buffer which is going to be mainly used for data
received by Scylla.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
166c9a3b8c tests: add test for fragmented_temporary_buffer 2018-07-18 12:28:06 +01:00
Paweł Dziepak
b152aafd67 util fragment_range: add general linearisation functions
All FragmentRange implementations can be linearised in the same way, so
let's provide linearized() and with_linearized() functions for all of
them.
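
A rough sketch of generic linearisation, assuming fragments that convert to
std::string_view. The names linearized_copy and with_contiguous are
hypothetical stand-ins and do not reproduce the signatures added by this
patch.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Copies every fragment, in order, into one contiguous string.
template <typename FragmentRange>
std::string linearized_copy(const FragmentRange& fragments) {
    std::size_t total = 0;
    for (std::string_view f : fragments) {
        total += f.size();
    }
    std::string out;
    out.reserve(total);  // a single allocation for the whole value
    for (std::string_view f : fragments) {
        out.append(f);
    }
    return out;
}

// Calls `func` with a contiguous view. In this sketch the copy is skipped
// when the range holds exactly one fragment. The view must not outlive the call.
template <typename FragmentRange, typename Func>
auto with_contiguous(const FragmentRange& fragments, Func&& func) {
    auto first = fragments.begin();
    if (first != fragments.end() && std::next(first) == fragments.end()) {
        return func(std::string_view(*first));
    }
    const std::string copy = linearized_copy(fragments);
    return func(std::string_view(copy));
}

// Usage example.
inline void example_linearize() {
    std::vector<std::string> frags{"ab", "cd", "e"};
    assert(linearized_copy(frags) == "abcde");
}
```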
2018-07-18 12:28:06 +01:00
Paweł Dziepak
fc484f0819 utils: add fragmented_temporary_buffer
Seastar output_streams produce temporary_buffer<char>s.
fragmented_temporary_buffer represents a single fragmented buffer that
consists of, possibly multiple, temporary_buffer<char>s.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
b5a72a880b tests: add basic test for transport requests and responses 2018-07-18 12:28:06 +01:00
Paweł Dziepak
054d39b8f7 tests/random-utils: print seed
Knowing the seed will make it easier to investigate failures in
randomised tests.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
9445ce3f84 tests/random-utils: generate sstrings 2018-07-18 12:28:06 +01:00
Paweł Dziepak
46acd76cc8 cql3: add value_view printer and equality comparison
BOOST_CHECK_*() expect compared objects to be equality-comparable and
printable.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
24929fd2ce transport: move response outside of cql_server class 2018-07-18 12:28:06 +01:00
Paweł Dziepak
5986e7a383 transport: drop request_reader::read_value() 2018-07-18 12:28:06 +01:00
Paweł Dziepak
72450e2f7f transport: extract request reading to request_reader 2018-07-18 12:28:06 +01:00
Paweł Dziepak
1eeef4383c transport: fix use-after-free in read_name_and_value_list() 2018-07-18 12:28:06 +01:00
Avi Kivity
31151cadd4 Merge "row_cache: Fix violation of continuity on concurrent eviction and population" from Tomasz
"
The problem happens under the following circumstances:

  - we have a partially populated partition in cache, with a gap in the middle

  - a read with no clustering restrictions trying to populate that gap

  - eviction of the entry for the lower bound of the gap concurrent with population

The population may incorrectly mark the range before the gap as continuous.
This may result in temporary loss of writes in that clustering range. The
problem heals by clearing cache.

Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been
failing sporadically.

The problem is in ensure_population_lower_bound(), which returns true if
current clustering range covers all rows, which means that the populator has a
right to set continuity flag to true on the row it inserts. This is correct
only if the current population range actually starts before all
clustering rows. Otherwise, we're populating from _last_row and should
consult it.

Fixes #3608.
"

* 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla:
  row_cache: Fix violation of continuity on concurrent eviction and population
  position_in_partition: Introduce is_before_all_clustered_rows()
2018-07-18 10:11:34 +03:00
Asias He
506eed325a dht: Fix typo in boot_strapper.cc
Eror -> Error

Message-Id: <ab1050c526f6e70c3a365595376acde7706d86e9.1531877929.git.asias@scylladb.com>
2018-07-18 10:00:27 +03:00
Tomasz Grabiec
894961006b Merge "db/view/view_builder: Fixes to bookkeeping" from Duarte
This series contains a couple of fixes to the bookkeeping of the view
build process, which could cause data to be left behind in the system
tables.

* git@github.com:duarten/scylla.git materialized-views/view-build-fixes/v1:

Duarte Nunes (3):
  db/system_keyspace: Add function to remove view build status of a
    shard
  db/view: Don't have shard 0 clear other shard's status on drop
  db/view: Restrict writes to the distributed system keyspace to shard 0
2018-07-17 18:01:28 +02:00
Tomasz Grabiec
25d09e51ac Merge "db/view/build_progress_virtual_reader: Fixes to clustering key adjusts" from Duarte
This series contains a couple of fixes to the adjusting of clustering
keys in the build_progress_virtual_reader, some of which could
potentially cause heap overflows when querying the legacy system table.

* git@github.com:duarten/scylla.git materialized-views/build-progress-virtual-reader-fixes/v1:

Duarte Nunes (3):
  db/view/build_progress_virtual_reader: Use correct schema to adjust ck
  db/view/build_progress_virtual_reader: Fix full ck detection
  db/view/build_progress_virtual_reader: Also adjust end RT bound
2018-07-17 18:00:30 +02:00
Avi Kivity
9ffa6b9ad6 Merge "Fix leaks and corruption of continuity in cache in case of bad_alloc from key linearization" from Tomasz
"
This series fixes two issues related to bad_allocs and keys which require
linearization (larger than 12.8 KiB). With such keys, comparators may throw if
memory allocation fails. This may cause lookups in partition and rows trees to
fail with bad_alloc.

The first issue (#3583) was that partition version merging
(mutation_partition::apply_monotonically()) was not taking into account that
lookups may fail. If we fail, the partition which is being applied may be
incorrectly left with the clustering range from the beginning of the range up
to the current row marked as continuous, if the current row has the continuity
flag set, because we've moved all of the preceding rows into the target, and
the correct lower bound row is no longer there in the source. This may mark
some discontinuous ranges as continuous. Merging is retried by
allocating_section, and there will be no problem if it eventually succeeds,
original continuity will be reflected in the sum. The problem will persist if
it doesn't eventually succeed, when we're really out of memory.

The user-perceivable effect of this would be temporary loss of writes in the
clustering range which was marked as continuous but shouldn't have been. Introduced in
2.2-rc1.

The second issue (#3585) is that the code which inserts partitions in memtable
and cache will leak the entry if boost::intrusive_set::insert() throws. This
will also cause SIGSEGV when cache tries to evict from such a leaked entry.
"

* tag 'tgrabiec/fix-bad-continuity-on-oom-in-apply-v2' of github.com:tgrabiec/scylla:
  managed_bytes: Mark read_linearize() as an allocation point
  tests: Relax expectation about continuity after failed merging
  tests: mutation_partition: Verify continuity is consistent on bad_alloc on merging
  tests: Switch to seastar's allocation failure injector
  mutation_partition: Introduce set_continuity()
  clustering_interval_set: Introduce contained_in()
  clustering_interval_set: Introduce add() overload accepting another interval set
  mutation_partition: Fix merging to not leave the source with broader continuity on bad_alloc
  mutation_partition: Preserve continuity in case row merging with no tracker throws
  memtable, cache: Fix exception safety of partition entry insertions
2018-07-17 18:19:37 +03:00
Tomasz Grabiec
477d7b439b row_cache: Fix violation of continuity on concurrent eviction and population
ensure_population_lower_bound() returned true if current clustering
range covers all rows, which means that the populator has a right to
set continuity flag to true on the row it inserts. This is correct
only if the current population range actually starts before all
clustering rows. Otherwise we're populating from _last_row, and
should consult it.

The fix introduces a new flag, set when starting to populate, which
indicates if we're populating from the beginning of the range or
not. We cannot simply check if _last_row is set in
ensure_population_lower_bound() because _last_row can be set and then
become empty again.

Fixes #3608
2018-07-17 16:43:21 +02:00
Tomasz Grabiec
8d47d21149 position_in_partition: Introduce is_before_all_clustered_rows() 2018-07-17 16:43:21 +02:00
Tomasz Grabiec
612b223819 managed_bytes: Mark read_linearize() as an allocation point 2018-07-17 16:39:43 +02:00
Tomasz Grabiec
be678a81ee tests: Relax expectation about continuity after failed merging
Currently we check that the sum of continuities is exactly the same as
expected on failure. Relax this to require that continuity is not
broader, since in some bad_alloc scenarios, or preemption, we will
have to mark some ranges as discontinuous.
2018-07-17 16:39:43 +02:00
Tomasz Grabiec
f366ac76e8 tests: mutation_partition: Verify continuity is consistent on bad_alloc on merging 2018-07-17 16:30:01 +02:00
Tomasz Grabiec
d9db79a85d tests: Switch to seastar's allocation failure injector
It catches more allocation sites.
2018-07-17 16:30:01 +02:00
Tomasz Grabiec
6b1fe6cbe5 mutation_partition: Introduce set_continuity() 2018-07-17 16:30:01 +02:00
Tomasz Grabiec
ac772cbd81 clustering_interval_set: Introduce contained_in() 2018-07-17 16:30:01 +02:00
Tomasz Grabiec
d24ebe8565 clustering_interval_set: Introduce add() overload accepting another interval set 2018-07-17 16:30:01 +02:00
Tomasz Grabiec
c6c54021a8 mutation_partition: Fix merging to not leave the source with broader continuity on bad_alloc
When clustering keys are larger than 12.8 KiB they may get fragmented
and key comparator will need to linearize them on comparison. This may
cause lookups in the rows tree to fail with bad_alloc. Partition
version merging (mutation_partition::apply_monotonically()) was not
taking this into account. If we fail on lookup, the partition which is
being applied may be incorrectly left with the clustering range from
the beginning up to the current row marked as continuous, if the current
row has the continuity flag set, because we've moved all of the
preceding rows into the target, and the correct lower bound row is no
longer there in the source. This may mark some discontinuous ranges as
continuous.

Merging is retried by allocating_section, and there will be no problem
if it eventually succeeds, original continuity will be reflected in the
sum. The problem will persist if it doesn't eventually succeed, when
we're really out of memory.

To protect against this, we could reset the continuity flag of the
current row in the source when exiting on exception.

Fixes #3583
2018-07-17 16:30:01 +02:00
Tomasz Grabiec
de5c52f422 mutation_partition: Preserve continuity in case row merging with no tracker throws
Example:

 p:      row{key=A, cont=0} row{key=C, cont=1}
 this:                      row{key=C, cont=0}

When we get to processing key=C, key=A was already moved to this, so p
has stale continuity on key=C, which marks (-inf,C) as continuous,
whereas it should mark only (A, C). That's not a problem if merging
succeeds, but if exception happens at this point, we will violate the
invariant which says that the sum of p and this should yield the same
logical partition. It wouldn't because continuity of the sum is
calculated as a set union, and (-inf, A) would be incorrectly turned
into a continuous range.

This is not a problem currently because continuity is always full when
there is no tracker (memtables), so won't change anyway, and when
there is a tracker (cache) we never merge but overwrite instead, so
there is no memory allocation and thus no possibility for failure. But
better be safe.
2018-07-17 16:30:01 +02:00
Tomasz Grabiec
567da3e063 memtable, cache: Fix exception safety of partition entry insertions
boost::intrusive::set::insert() may throw if keys require
linearization and that fails, in which case we will leak the entry.

When this happens in cache, we will also violate the invariant for
entry eviction, which assumes all tracked entries are linked, and
cause a SEGFAULT.

Use the non-throwing and faster insert_before() instead. Where we
can't use insert_before(), use alloc_strategy_unique_ptr<> to ensure
that entry is deallocated on insert failure.
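
The ownership pattern described above can be sketched roughly as follows. This is a minimal illustration, not the Scylla code: it uses std::set of raw pointers and std::unique_ptr as stand-ins for the intrusive set and alloc_strategy_unique_ptr<>, and the names entry, throwing_less, insert_entry and destroy_all are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <set>

// Hypothetical entry type; the real code links entries into an intrusive set.
struct entry {
    int key;
};

// Comparator that may throw, standing in for key linearization
// failing with bad_alloc during a lookup or insert.
struct throwing_less {
    bool* fail;
    bool operator()(const entry* a, const entry* b) const {
        if (*fail) {
            throw std::bad_alloc();
        }
        return a->key < b->key;
    }
};

// The set only references entries, it does not own them.
using entry_set = std::set<entry*, throwing_less>;

// Keep ownership in a smart pointer until the insert has succeeded.
// If the comparator throws, the unique_ptr frees the entry during
// stack unwinding, so nothing leaks and nothing unlinked stays behind.
bool insert_entry(entry_set& s, int key) {
    auto e = std::make_unique<entry>(entry{key});
    auto [it, inserted] = s.insert(e.get());  // may throw
    if (inserted) {
        e.release();  // the set now references the entry
    }
    return inserted;
}

// Free all entries referenced by the set.
void destroy_all(entry_set& s) {
    for (entry* e : s) {
        delete e;
    }
    s.clear();
}
```

With a raw `new` in place of the smart pointer, the same throw would leave an allocated entry that no container refers to.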

Fixes #3585.
2018-07-17 16:30:01 +02:00
Tomasz Grabiec
c82c0be0be tests: mutation_diff: Ignore differences in memory addresses
Differences in memory addresses are not necessarily differences in
values.

Refs #3571

Message-Id: <1531824919-12737-1-git-send-email-tgrabiec@scylladb.com>
2018-07-17 16:32:04 +03:00
Amos Kong
0fcdab8538 scylla_setup: nic setup dialog is only for interactive mode
The current code raises the dialog even in non-interactive mode, when we pass
options to scylla_setup. This blocked the automated artifact-test.

Fixes #3549

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <58f90e1e2837f31d9333d7e9fb68ce05208323da.1531824972.git.amos@scylladb.com>
2018-07-17 16:31:18 +03:00
Paweł Dziepak
422d1eaeb9 Merge "Improve usability of pkeys in system.large_partitions table" from Avi
"
Partition keys are currently stored in serialized form in the
system.large_partitions table. This is an obstacle to operators
who usually can't deserialize partition keys in their heads.

Improve the situation by deserializing the partition key for them.
"

* tag 'pkey-print/v1' of https://github.com/avikivity/scylla:
  large_partition_handler: output friendly partition key
  keys: schema-aware printing of a partition_key
2018-07-17 13:51:22 +01:00
Avi Kivity
002ac87aac Update seastar submodule
* seastar aac6cf1...6b97e00 (5):
  > Merge "changes to fix travis CI builds" from Kefu
  > tls.cc: Make "close" timeout delay exception proof
  > core/sharded: mark foreign_ptr::get_owner_shard() const
  > core/memory: Expose counter of large allocations
  > tests: add test for multi-fragmented net::packet

Fixes #3461.
Ref scylladb/seastar#474.
2018-07-17 15:43:01 +03:00
Tomasz Grabiec
3f509ee3a2 mutation_partition: Fix exception-safety of row copy constructor
In case population of the vector throws, the vector object would not
be destroyed. It's a managed object, so in addition to causing a leak,
it would corrupt memory if later moved by the LSA, because it would
try to fixup forward references to itself.

Caused sporadic failures and crashes of row_cache_test, especially
with allocation failure injector enabled.

Introduced in 27014a23d7.
Message-Id: <1531757764-7638-1-git-send-email-tgrabiec@scylladb.com>
2018-07-17 13:21:21 +01:00
Asias He
fd71c5718f gossip: Reduce continuous memory usage
Gossip SYN and ACK use std::vector to store a list of gossip_digest,
so the larger the cluster, the more contiguous memory is needed. To reduce
the memory pressure which might cause std::bad_alloc, switch the std::vector
to chunked_vector.

In addition, change add_local_application_state to use std::list instead
of std::vector.

Refs #2782
2018-07-17 20:15:32 +08:00
Avi Kivity
acb3163639 large_partition_handler: output friendly partition key
Use abstract_type::to_string() to prettify partition key components.

Manually tested by setting --compaction-large-partition-warning-threshold-mb
to zero and inspecting the output for compound and non-compound partition
keys.
2018-07-17 14:44:52 +03:00
Avi Kivity
bfd14b4123 keys: schema-aware printing of a partition_key
Add a with_schema() helper to decorate a partition key with its
schema for pretty-printing purposes, and matching operator<<.

This is useful to print partition keys where the operator, who
may not be familiar with the encoding, may see them.
2018-07-17 14:43:12 +03:00
Tomasz Grabiec
d94c7c07a3 lsa: Disable alloc failure injector inside the LSA sanitizer
Message-Id: <1531814822-30259-1-git-send-email-tgrabiec@scylladb.com>
2018-07-17 11:27:56 +01:00
Asias He
77018b7304 to_string: Add std::list and utils::chunked_vector support
It will be used by the gossip code.
2018-07-17 16:14:31 +08:00
Asias He
e4802d2fe3 serializer: Add chunked_vector support
It will be used by the gossip SYN and ACK message soon.
2018-07-17 16:12:50 +08:00
Botond Dénes
cc4acb6e26 storage_proxy: use the original row limits for the final results merging
`query_partition_key_range()` does the final result merging and trimming
(if necessary) to make sure we don't send more rows to the client than
requested. This merging and trimming is done by a continuation attached
to the `query_partition_key_range_concurrent()` which does the actual
querying. The continuation captures by value the `row_limit` and
`partition_limit` fields of the `query::read_command` object of the
query. This has an unexpected consequence. The lambda object is
constructed after the call to `query_partition_key_range_concurrent()`
returns. If this call doesn't defer, any modifications done to the read
command object done by `query_partition_key_range_concurrent()` will be
visible to the lambda. This is undesirable because
`query_partition_key_range_concurrent()` updates the read command object
directly as the vnodes are traversed which in turn will result in the
lambda doing the final trimming according to a decremented `row_limits`,
which will cause the paging logic to declare the query as exhausted
prematurely because the page will not be full.
To avoid all this make a copy of the relevant limit fields before
`query_partition_key_range_concurrent()` is called and pass these copies
to the continuation, thus ensuring that the final trimming will be done
according to the original page limits.
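
A rough sketch of the effect, assuming C++17 evaluation order (the call that produces the object is evaluated before the arguments of the member call on it). The types and functions below (read_command with a single field, ready_result, query_concurrent, merge_buggy, merge_fixed) are simplified, hypothetical stand-ins and not the storage_proxy code; ready_result::then() runs the continuation immediately to mimic a call that does not defer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical, reduced read command with a single limit field.
struct read_command {
    std::uint32_t row_limit;
};

// Stand-in for an already-resolved future: then() runs the
// continuation right away.
struct ready_result {
    std::uint32_t rows;
    template <typename Func>
    std::uint32_t then(Func&& f) {
        return std::forward<Func>(f)(rows);
    }
};

// Stand-in for the querying step: it updates the command in place
// as rows are consumed, and does not defer.
ready_result query_concurrent(read_command& cmd, std::uint32_t found) {
    cmd.row_limit -= std::min(found, cmd.row_limit);
    return ready_result{found};
}

// The capture is initialized after query_concurrent() has returned,
// so it sees the already decremented limit.
std::uint32_t merge_buggy(read_command& cmd, std::uint32_t found) {
    return query_concurrent(cmd, found)
        .then([limit = cmd.row_limit](std::uint32_t rows) {
            return std::min(rows, limit);
        });
}

// Copy the limit before the call and hand the copy to the continuation.
std::uint32_t merge_fixed(read_command& cmd, std::uint32_t found) {
    const std::uint32_t original_limit = cmd.row_limit;
    return query_concurrent(cmd, found)
        .then([original_limit](std::uint32_t rows) {
            return std::min(rows, original_limit);
        });
}
```

In this toy model, a command with a limit of 100 and 60 rows found is trimmed against 40 in merge_buggy() but against the original 100 in merge_fixed().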

Spotted while investigating a dtest failure on my 1865/range-scans/v2
branch. On that branch the way range scans are executed on replicas is
completely refactored. These changes apparently reduce the number of
continuations in the read path to the point where an entire page can be
filled without deferring and thus causing the problem to surface.

Fixes #3605.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f11e80a6bf8089d49ba3c112b25a69edf1a92231.1531743940.git.bdenes@scylladb.com>
2018-07-16 16:54:50 +03:00
Takuya ASADA
9479ff6b1e dist/common/scripts/scylla_prepare: fix error when /etc/scylla/ami_disabled exists
In this part of the script a shell command wasn't converted to python3; fix it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180715075015.13071-1-syuu@scylladb.com>
2018-07-16 09:29:38 +03:00
Avi Kivity
c4013f6fe1 messaging: categorize more streaming/repair verbs as streaming
Since the messaging service will assign a scheduling group based
on the client index, it's more important now to get the verbs categorized
correctly.

Re-categorize REPLICATION_FINISHED, REPAIR_CHECKSUM_RANGE, and most
importantly STREAM_MUTATION_FRAGMENTS to the repair/streaming oriented
connections so we get the correct scheduling.
2018-07-15 15:44:10 +03:00
Avi Kivity
ff3d7839ab messaging: remove default when computing rpc client index
A default means that when adding new verbs, we may forget to
categorize a verb correctly.  Without the default, the compiler
will complain due to -Wswitch.
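
As an illustration of the idea, a switch over an enum with no default label might look like the sketch below. The enumerators and the index values are made up for the example and are not the real messaging_service table.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical subset of verbs, for illustration only.
enum class verb {
    gossip_echo,
    mutation,
    stream_mutation_fragments,
};

// No default label: if a new enumerator is added to verb and not
// handled here, GCC and Clang report it via -Wswitch.
unsigned rpc_client_idx(verb v) {
    switch (v) {
    case verb::gossip_echo:
        return 0;
    case verb::mutation:
        return 1;
    case verb::stream_mutation_fragments:
        return 2;
    }
    // Only reachable if v holds a value outside the enumerators.
    std::abort();
}
```

Adding a `default:` branch would silence the warning and let an uncategorized verb fall into whatever the default happens to be.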
2018-07-15 15:40:29 +03:00
Avi Kivity
fe2db68be8 messaging: convert do_get_rpc_client_idx into a switch
A switch is more readable for multiple choice with no
clearly preferred choice.
2018-07-15 15:26:50 +03:00
Avi Kivity
3b1e04091c messaging: choose connection index via a look-up table
Looking up is faster than a bunch of if()s.
2018-07-15 15:21:06 +03:00
Takuya ASADA
1511d92473 dist/redhat: drop scylla_lib.sh from .rpm
Since we dropped scylla_lib.sh at 58e6ad22b2,
we need to remove it from the RPM spec file too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180712155129.17056-1-syuu@scylladb.com>
2018-07-15 14:46:22 +03:00
Avi Kivity
ef9b36376c Merge "database: support multiple data directories" from Glauber
"
While Cassandra supports multiple data directories, we have been
historically supporting just one. The one-directory model suits us
better because of the I/O Scheduler and so far we have seen very few
requests -- if any, to support this.

Still, the infrastructure needed to support multiple directories can be
beneficial so I am trying to bring this in.

For simplicity, we will treat the first directory in the list as the
main directory. By being able to still associate one singular directory
with a table, most of the code doesn't have to change and we don't have
to worry about how to distribute data between the directories.

In this design:
- We scan all data directories for existing data.
- resharding only happens within a particular data directory.
- snapshot details are accumulated with data for all directories that
  host snapshots for the tables we are examining
- snapshots are created with files in its own directories, but the
  manifest file goes to the main directory. For this one, note that in
  Cassandra the same thing happens, except that there is no "main"
  directory. Still, the manifest file is in just one of them.
- SSTables are flushed into the main directory.
- Compactions write data into the main directory

Despite the restrictions, one example of usage of this is recovery.  If
we have network attached devices for instance, we can quickly attach a
network device to an existing node and make the data immediately
available as it is compacted back to main storage.

Tests: unit (release)
"

* 'multi-data-file-v2' of github.com:glommer/scylla:
  database: change ident
  database: support multiple data directories
  database: allow resharing to specify a directory
  database: support multiple directories in get_snapshot_details
  database: move get_snapshot_info into a seastar::thread
  snapshots: always create the snapshot directory
  sstables: pass sstable dir with entry descriptor
  database: make nodetool listsnapshots print correct information
  sstables: correctly create descriptors for snapshots
2018-07-15 13:31:04 +03:00
Avi Kivity
8ee807321f Merge "scylla streaming with rpc streaming" from Asias
"
This work is on top of Gleb's rpc streaming which is merged recently.

What this series does is to replace scylla streaming service's data plane to
use the new rpc streaming instead of the old rpc verb to send the mutations for
scylla streaming. Other parts of scylla streaming, the control plane, are not
changed.

In my test, to bootstrap a new node to the existing one node cluster, smp 2,
scylla stores data on ramdisk to minimize disk io impact.

I saw a 2x improvement in streaming bandwidth.

Before:
   [shard 0] stream_session - [Stream #2ae92320-5fc8-11e8-911a-000000000000]
   Streaming plan for Bootstrap-ks3-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1570312 KiB, 109521.02 KiB/s
   [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 14.338 seconds

After:
   [shard 0] stream_session - [Stream #e5589ac0-5fc7-11e8-b463-000000000000]
   Streaming plan for Bootstrap-ks3-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1546875 KiB, 220415.36 KiB/s
   [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 7.018 seconds

Tests: dtest update_cluster_layout_tests.py

Fixes: #3591
"

* tag 'asias/scylla_streaming_with_rpc_streaming_v8' of github.com:scylladb/seastar-dev:
  streaming: Add rpc streaming support
  storage_service: Introduce STREAM_WITH_RPC_STREAM feature
  streaming: Add estimate_partitions to send_info
  messaging_service: Add streaming with rpc streaming support
  messaging_service: Add streaming_domain
  database: Add add_sstable_and_update_cache
  database: Add make_streaming_sstable_for_write
2018-07-15 12:36:52 +03:00
Vlad Zolotarov
235520292e utils::loading_cache: hold a shared_value_ptr to the value when we reload
This makes it possible to remove the requirement to hold the key value inside the
_load callback if its value is needed in the asynchronous continuation
inside the callback in the context of a reload.

This also resolves the use-after-free issue when a _load() callback removes
the item for a given key.

See a9b72db34d.1528794135.git.bdenes%40scylladb.com
for a discussion about this.

In addition this patch makes the loading_cache more robust for any existing
and potential situations when cached entries are being removed from inside the
callback. This is achieved by extending the idea implemented by Duarte in the
"utils/loading_cache: Avoid using invalidated iterators" by capturing timestamped_val_ptr
(which is essentially a lw_shared_ptr to an intrusive set entry which holds both the key
and the cached value) instead of a naked pointer.

Tests {debug, release}:
   - Unit tests:
      - loading_cache_test
      - view_build_test
      - auth_test
      - auth_resource_test

   - dtest:
      - auth_test.py

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-07-13 11:27:58 -04:00
Vlad Zolotarov
b44ad5677a utils::loading_cache::on_timer(): remove not needed capture of "this"
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-07-13 11:27:43 -04:00
Vlad Zolotarov
4aa0e5914b utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload
The list of elements that needs to be reloaded may be rather large.
Use chunked_vector in order to make the allocator's life easier.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-07-13 09:53:59 -04:00
Avi Kivity
8c993e0728 messaging: tag RPC services with scheduling groups
Assign a scheduling_group for each RPC service. Assignment is
done by connection (get_rpc_client_idx()) - all verbs on the
same connection are assigned the same group. While this may seem
arbitrary, it avoids priority inversion; if two verbs on the same
connection have different scheduling groups, the verb with the low
shares may cause a backlog and stall the connection, including
following requests from verbs that ought to have higher shares.

The scheduling_group parameters are encapsulated in different
classes as they are passed around to avoid adding dependencies.
Message-Id: <20180708140433.6426-1-avi@scylladb.com>
2018-07-13 13:57:08 +02:00
Vladimir Krivopalov
cf7b42619d clustering_ranges_walker: Improve class consistency and readability.
This patch addresses several issues.
  1. The class no longer uses placement-new trick for move-assignment.
     It was incorrect to use because the class contains const references
     and re-initializing the same region of memory would result in undefined
     behaviour on accessing these members.

  2. Use boost::iterator_range for tracking the current range of
     cr_ranges. It is easier to deal with and avoids possible bugs like
     assigning only one of two iterators.
Message-Id: <4096182c4ee2fb1157e135c487c41012b266ba69.1531440684.git.vladimir@scylladb.com>
2018-07-13 11:23:33 +02:00
Asias He
deff5e7d60 streaming: Add rpc streaming support
This patch changes scylla streaming to use the recently added rpc
streaming feature provided by seastar to send mutation fragments for
scylla streaming instead of the rpc verbs.

It also changes the receiver to write to the sstable file directly,
skipping writing to memtable.
2018-07-13 08:36:47 +08:00
Asias He
71e22fe981 storage_service: Introduce STREAM_WITH_RPC_STREAM feature
With this feature, the node supports scylla streaming using the rpc
streaming.
2018-07-13 08:36:47 +08:00
Asias He
faa6769cdb streaming: Add estimate_partitions to send_info
The sender needs to estimate the number of partitions to send, because
the receiver needs this to prepare the sstables.
2018-07-13 08:36:46 +08:00
Asias He
ddfb4590ce messaging_service: Add streaming with rpc streaming support
Preparation for adding rpc streaming in scylla streaming.

- register_stream_mutation_fragments is used to register the rpc
streaming verb

- make_sink_and_source_for_stream_mutation_fragments is used to get the
sink and source object for the sender

- make_sink_for_stream_mutation_fragments is used to get a sink object
for the receiver
2018-07-13 08:36:46 +08:00
Asias He
671e1b08fe messaging_service: Add streaming_domain
The rpc streaming needs a streaming_domain id for the same logical
server. Choose one for our messaging service.
2018-07-13 08:36:46 +08:00
Asias He
6540051f77 database: Add add_sstable_and_update_cache
Since we can write mutations to sstable directly in streaming, we need
to add those sstables to the system so it can be seen by the query.
Also we need to update the cache so the query reflects the latest data.
2018-07-13 08:36:45 +08:00
Asias He
dfc2739625 database: Add make_streaming_sstable_for_write
This will be used to create sstable for streaming receiver to write the
mutations received from network to sstable file instead of writing to
memtable.
2018-07-13 08:36:45 +08:00
Takuya ASADA
ee61660b76 dist/common/scripts/scylla_ec2_check: support custom NIC ifname on EC2
Since some AMIs use consistent network device naming, the primary NIC
ifname is not always 'eth0'.
But we hardcoded the NIC name as 'eth0' in scylla_ec2_check, so we need to add
a --nic option to specify a custom NIC ifname.

Fixes #3584

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180712142446.15909-1-syuu@scylladb.com>
2018-07-12 18:22:28 +03:00
Tomasz Grabiec
b17f7257a9 sstables: index_reader: Reduce size of index_entry by indirecting promoted_index
Reduces size of index_entry from 384 bytes to 64 bytes by using
indirection for the optional promoted index instead of embedding it.

Improves query time from 9ms to 4ms in a micro benchmark with a very
large index page.

Message-Id: <1531406354-10089-1-git-send-email-tgrabiec@scylladb.com>
2018-07-12 17:46:58 +03:00
Tomasz Grabiec
101dcdbb48 gdb: Fix scylla heapprof command
The type of _frames was changed to static_vector<>.

Message-Id: <1531233685-20786-2-git-send-email-tgrabiec@scylladb.com>
2018-07-12 16:51:30 +03:00
Tomasz Grabiec
059133ffa8 gdb: Introduce iteration wrapper for static_vector
Message-Id: <1531233685-20786-1-git-send-email-tgrabiec@scylladb.com>
2018-07-12 16:51:30 +03:00
Duarte Nunes
63b63b0461 utils/loading_cache: Avoid using invalidated iterators
When periodically reloading the values in the loading_cache, we would
iterate over the list of entries and call the load() function for
those which need to be reloaded.

For some concrete caches, load() can remove the entry from the LRU set,
and can be executed inline from the parallel_for_each(). This means we
could potentially keep iterating using an invalidated iterator.

Fix this by using a temporary container to hold those entries to be
reloaded.
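
The temporary-container approach can be sketched as below. This is a simplified illustration with a std::map and a std::function callback; reload_all and the container types are hypothetical and do not mirror the loading_cache internals.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using cache_map = std::map<std::string, int>;

// First collect the keys that need reloading, then call load() on each.
// load() is allowed to erase entries from the cache, because no
// iterator into the cache is held while it runs.
std::vector<std::string> reload_all(
        cache_map& cache,
        const std::function<void(const std::string&)>& load) {
    std::vector<std::string> to_reload;
    to_reload.reserve(cache.size());
    for (const auto& kv : cache) {
        to_reload.push_back(kv.first);
    }
    for (const auto& key : to_reload) {
        load(key);
    }
    return to_reload;
}
```

Calling load() directly inside the first loop would be unsafe if load() erased the element the loop iterator currently points to.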

Spotted when reading the code.

Also use if constexpr and fix the comment in the function containing
the changes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712124143.13638-1-duarte@scylladb.com>
2018-07-12 13:59:09 +01:00
Botond Dénes
2e7bf9c6f9 loading_cache::reload(): obtain key before calling _load()
The continuation attached to _load() needs the key of the loaded entry
to check whether it was disposed during the load. However if _load()
invalidates the entry the continuation's capture line will access
invalid memory while trying to obtain the key.
To avoid this save a copy of the key before calling _load() and pass it
to both _load() and the continuation.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b571b73076ca863690f907fbd3fb4ff54e597b28.1531393608.git.bdenes@scylladb.com>
2018-07-12 13:42:42 +01:00
Avi Kivity
a4a2f743a8 Merge "Avoid large allocations when reading sstable index pages" from Tomasz
"
If there are a lot of partitions in the index page, index_list may grow large
and require large contiguous blocks of memory, because it's based on
std::vector. That puts pressure on the memory allocator, and if memory is
fragmented, may not be possible to satisfy without a lot of eviction. Switch
to chunked_vector to avoid this.

Refs #3597
"

* 'tgrabiec/avoid-large-alloc-in-index-reader' of github.com:tgrabiec/scylla:
  sstables: Switch index_list to chunked_vector to avoid large allocations
  utils: chunked_vector: Do not require T to be default-constructible for clear()
  utils: chunked_vector: Implement front()
2018-07-12 15:30:18 +03:00
Duarte Nunes
1fb3b924f4 utils/loading_cache: Remove superfluous continuation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712122031.13424-1-duarte@scylladb.com>
2018-07-12 15:22:35 +03:00
Takuya ASADA
8f80d23b07 dist/common/scripts/scylla_util.py: fix typo
Fix typo, and rename get_mode_cpu_set() to get_mode_cpuset(), since the
term 'cpuset' is written without '_' in other places.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180711141923.12675-1-syuu@scylladb.com>
2018-07-12 10:14:55 +03:00
Tomasz Grabiec
8c85b01ad3 gdb: Fix scylla lsa-segment on python 3
Referring to a function parameter via "global" no longer works on
python 3. We should be using "nonlocal", which is absent on python 2
though. To make the script work on both, inline next().

Message-Id: <1531317984-29224-1-git-send-email-tgrabiec@scylladb.com>
2018-07-12 10:14:22 +03:00
Duarte Nunes
a7fdf4fc49 Merge 'ALLOW FILTERING for indexed queries' from Piotr
"
Previous series on ALLOW FILTERING introduced it for regular queries,
but it's also possible to have an indexed query which requires
filtering. This series contains minor fixes that allow treating
indexed+filtered queries properly. The most important part is having
more selective approach of extracting values from restrictions
in read_posting_list() helper function. Before ALLOW FILTERING,
restrictions contained only a single entry that matched the indexed
column, but it's not the case with filtering (and it won't be the case
with multiple indexing support).

This series also comes with test cases for indexed+filtered queries.

Tests: unit (release)
"

* 'allow_filtering_and_si_3' of https://github.com/psarna/scylla:
  tests: add filtering indexed queries tests
  cql3: use single restriction value in index creation
  cql3: add secondary index condition to need_filtering
  cql3: add value_for method
  cql3: add missing inline declarations to restrictions
  cql3: make index detection more specific
  index: add target_column getter to index
2018-07-12 00:17:36 +01:00
Duarte Nunes
55caaec411 db/view/build_progress_virtual_reader: Also adjust end RT bound
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-07-11 23:28:31 +01:00
Duarte Nunes
eda6b88b0e db/view/build_progress_virtual_reader: Fix full ck detection
As an optimization, the virtual reader doesn't change the underlying
key if it is not full, and hence doesn't include the extra clustering
key. However, this detection is broken because it checked for 3
clustering columns, instead of 2.

This patch fixes that by obtaining the clustering key size from the
underlying schema instead of hardcoding the size.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-07-11 23:28:31 +01:00
Duarte Nunes
ff3a0d437a db/view/build_progress_virtual_reader: Use correct schema to adjust ck
The virtual reader adjusts clustering keys obtained from the
underlying, scylla-specific schema, and potentially sheds the extra
clustering key that's absent from the Cassandra-compatible schema.

This patch ensures we use the correct schema to iterate over the
key.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-07-11 23:28:31 +01:00
Duarte Nunes
df66d7db59 db/view: Restrict writes to the distributed system keyspace to shard 0
Writing to the distributed system keyspace should be confined to a
single shard of each host, namely shard 0. We were violating this
constraint by having all shards set the host status to "started". This
could be problematic when the build finishes quickly or there's a
concurrent view drop, such that a write done by shard 0 can have a
smaller timestamp than one done by some other shard.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-07-11 21:45:26 +01:00
Duarte Nunes
e683c1367f db/view: Don't have shard 0 clear other shard's status on drop
Shard 0 can clear the in-progress build status of all shards when a
view finishes building, because we are assured that all writes to the
system table have completed with earlier timestamps.

This is not the case when dropping a view. A drop can happen
concurrently with the build, in which case shard 0 may process the
notification before another shard receives it, and before that shard
writes to the system table.

Fix this by ensuring each shard clears its own status on drop.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-07-11 21:45:26 +01:00
Duarte Nunes
2fa7f10429 db/system_keyspace: Add function to remove view build status of a shard
This patch adds a function that clears the view build in-progress
status for the current shard, similar to the existing one that clears
it across all shards.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-07-11 21:27:39 +01:00
Piotr Sarna
fcfbc804e4 tests: add filtering indexed queries tests
Tests covering ALLOW FILTERING usage while using secondary indexes
as well are added to cql_query_test.
Tests are based on Cassandra's test suite for filtering secondary
indexes + some more simple cases.
2018-07-11 18:06:21 +02:00
Piotr Sarna
7d9715db27 cql3: use single restriction value in index creation
ALLOW FILTERING support caused index-related restrictions to possibly
have more values. In order to remain correct, only those restrictions
which match the indexed columns should be used.
2018-07-11 18:06:21 +02:00
Piotr Sarna
1d75035672 cql3: add secondary index condition to need_filtering
A query that restricts a partition key and an indexed column
needs filtering (after reading an index) and it wasn't
properly detected before.
2018-07-11 18:06:21 +02:00
Piotr Sarna
80ce9b72a1 cql3: add value_for method
In order to extract value from a restriction for just one column,
value_for(column_name, options) method is implemented.
It's needed because once ALLOW FILTERING support was introduced,
index-related restrictions may contain more than 1 value.
2018-07-11 18:06:21 +02:00
Piotr Sarna
c1ad28f28e cql3: add missing inline declarations to restrictions
In order to prevent future compilation errors, externally defined
class methods from single column primary key restrictions are explicitly
marked inline.
2018-07-11 18:06:21 +02:00
Piotr Sarna
02811d8996 cql3: make index detection more specific
Conditions that detect if restrictions need an indexed query weren't
specific enough to work properly with mixed index-filtering queries,
because they would too eagerly assume that partition/clustering key
restrictions have a backing index.
2018-07-11 18:06:21 +02:00
Piotr Sarna
372644c909 index: add target_column getter to index
Target column for an index is later needed to find matching
restrictions.
2018-07-11 18:06:21 +02:00
Tomasz Grabiec
3b2890e1db sstables: Switch index_list to chunked_vector to avoid large allocations
If there are a lot of partitions in the index page, index_list may grow
large and require large contiguous blocks of memory. That puts
pressure on the memory allocator, and if memory is fragmented, may not
be possible to satisfy without a lot of eviction.
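
To illustrate why a chunked layout avoids one large contiguous block, here is a toy container that stores elements in fixed-size chunks. It is only a sketch: tiny_chunked_vector and its members are hypothetical and much simpler than utils::chunked_vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy chunked container: elements live in chunks of at most ChunkSize
// items, so the element storage never needs one allocation larger
// than ChunkSize * sizeof(T).
template <typename T, std::size_t ChunkSize = 1024>
class tiny_chunked_vector {
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(const T& value) {
        if (_chunks.empty() || _chunks.back().size() == ChunkSize) {
            _chunks.emplace_back();
            _chunks.back().reserve(ChunkSize);  // one bounded allocation
        }
        _chunks.back().push_back(value);
        ++_size;
    }
    T& operator[](std::size_t i) {
        return _chunks[i / ChunkSize][i % ChunkSize];
    }
    T& front() {
        return _chunks.front().front();
    }
    std::size_t size() const {
        return _size;
    }
    std::size_t chunk_count() const {
        return _chunks.size();
    }
    // Does not need T to be default-constructible.
    void clear() {
        _chunks.clear();
        _size = 0;
    }
};
```

A plain std::vector holding the same number of elements would need a single allocation of size() * sizeof(T) bytes or more, which is the request that becomes hard to satisfy when memory is fragmented.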
2018-07-11 16:55:20 +02:00
Tomasz Grabiec
b0f5df10d2 utils: chunked_vector: Do not require T to be default-constructible for clear()
resize(), used by clear(), requires T to be default-constructible in
case the vector is expanded. It's not actually needed for clearing,
and there will be users which use clear() with
non-default-constructible T, so implement clear() without using
resize().
2018-07-11 16:55:20 +02:00
Tomasz Grabiec
03832dab97 utils: chunked_vector: Implement front()
std::vector<> has it, so should this, for easy migration.
2018-07-11 16:55:20 +02:00
Piotr Sarna
dcdd8be59c cql3: make index-related tests less timing dependent
Indexes and materialized views take time to build, so checks
that rely on that are now wrapped with 'eventually' blocks.

Message-Id: <6d3def2bc49b76dda11d7a1c9974a8b3d221003f.1531312518.git.sarna@scylladb.com>
2018-07-11 15:45:52 +03:00
Takuya ASADA
58e6ad22b2 dist/common/scripts: drop scylla_lib.sh
Drop scylla_lib.sh since all bash scripts that depended on the library
have already been converted to python3, and all scylla_lib.sh features are
implemented in scylla_util.py.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180711114756.21823-1-syuu@scylladb.com>
2018-07-11 14:54:56 +03:00
Avi Kivity
83d72f3755 Update scylla-ami submodule
* dist/ami/files/scylla-ami 5200f3f...d53834f (1):
  > Merge "AMI scripts python3 conversion" from Takuya
2018-07-11 13:16:08 +03:00
Avi Kivity
693cf77022 Merge "more conversion from bash to python3" from Takuya
"Converted more scripts to python3."

* 'script_python_conversion2_v2' of https://github.com/syuu1228/scylla:
  dist/common/scripts/scylla_util.py: make run()/out() functions shorter
  dist/ami: install python34 to run scylla_install_ami
  dist/common/scripts/scylla_ec2_check: move ec2 related code to class aws_instance
  dist/common/scripts: drop class concolor, use colorprint()
  dist/ami/files/.bash_profile: convert almost all lines to python3
  dist/common/scripts: convert node_exporter_install to python3
  dist/common/scripts: convert scylla_stop to python3
  dist/common/scripts: convert scylla_prepare to python3
2018-07-11 13:14:23 +03:00
Tomasz Grabiec
1de5177175 tests: row_cache: Fix use-after-scope on partition_range passed to readers
The partition_range must outlive the reader.

Message-Id: <1531301583-15476-1-git-send-email-tgrabiec@scylladb.com>
2018-07-11 12:39:30 +03:00
Avi Kivity
28621066e6 observable: allow an observable to disconnect() twice without penalty
Message-Id: <20180711070754.13286-1-avi@scylladb.com>
2018-07-11 10:15:01 +01:00
Avi Kivity
1895483781 observable: add comments explaining the purpose and use of the mechanism
Message-Id: <20180710133706.8791-1-avi@scylladb.com>
2018-07-11 10:15:01 +01:00
Avi Kivity
99d3f0a1b1 tests: add obserable_test to test suite
Message-Id: <20180711071131.13702-1-avi@scylladb.com>
2018-07-11 10:15:01 +01:00
Tomasz Grabiec
fde4a312db gdb: Replace long() with int()
Python 3 doesn't have 'long' anymore, so commands using it fail with
newer GDB. long on python2 is the same as int on python3, both are
arbitrary-precision. On python2 int is fixed-precision, but seems to
be still enough (64 bit), so use that instead.

Message-Id: <1531215600-31899-1-git-send-email-tgrabiec@scylladb.com>
2018-07-10 15:05:02 +03:00
Nadav Har'El
5e47061438 repair: fix small error-handling logic mistake
As noticed by Tomasz Grabiec, we test a future's available() after
having already waited for it with when_all(), which is pointless.

The code after the wrong if() exchanges the contents of a token-range
between this node and several other live neighbors; We can't do this
exchange if either this node is broken or there is no other live neighbor.
So this is what we needed to test, and !available() should have been failed().

Also the test for live_neighbors_checksum.empty() added in commit 7c873f0d1f
is unnecessary - we build live_neighbors and live_neighbors_checksum
together, so if one of them is empty, so is the other.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180710114940.26027-1-nyh@scylladb.com>
2018-07-10 15:04:03 +03:00
Piotr Sarna
559439b6ea tests: add more ALLOW FILTERING tests
More test cases are added to cql_query_test in order to check
ALLOW FILTERING clauses more accurately.

Message-Id: <4c59c1f3eb01558be992d0596e5423c276087387.1531220558.git.sarna@scylladb.com>
2018-07-10 14:44:33 +03:00
Piotr Sarna
aadbfc6b84 cql3: throw instead of log for collection filtering
Original series that introduced filtering logged a warning
when collection restrictions appeared. Instead, an exception
should be thrown until collection restrictions are supported
for ALLOW FILTERING clauses.

Message-Id: <ddaf342d4d6766fadb756f66e5afa0b99ce054f8.1531220558.git.sarna@scylladb.com>
2018-07-10 14:44:29 +03:00
Avi Kivity
7db394ce50 observable: switch to noncopyable_function
std::function's move constructor is not noexcept, so observer's move
constructor and assignment operator also cannot be. Switch to Seastar's
noncopyable_function which provides better guarantees.

Tests: observer_tests (release)
Message-Id: <20180710073628.30702-1-avi@scylladb.com>
2018-07-10 09:42:49 +01:00
Avi Kivity
0a2c9387e8 Merge "Support reading deleted cells" from Piotr
"
Implement and test support for reading deleted cells in SSTables 3.
"

* 'haaawk/sstables3/read-deleted-cells-v2' of ssh://github.com/scylladb/seastar-dev:
  sstables: Test reading deleted cells from SST3
  sstables: Support deleted cells in reading SST3
  test_uncompressed_compound_ck_read: fix comment
  utils: add observer/observable templates
2018-07-10 11:21:00 +03:00
Piotr Jastrzebski
0abdd919c8 sstables: Test reading deleted cells from SST3
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-07-10 10:03:29 +02:00
Piotr Jastrzebski
54fc6dde35 sstables: Support deleted cells in reading SST3
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-07-10 10:03:29 +02:00
Piotr Jastrzebski
f64901fdac test_uncompressed_compound_ck_read: fix comment
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-07-10 10:03:14 +02:00
Avi Kivity
96737d140f utils: add observer/observable templates
An observable is used to decouple an information producer from a consumer
(in the same way as a callback), while allowing multiple consumers (called
observers) to coexist and to manage their lifetime separately.

Two classes are introduced:

 observable: a producer class; when an observable is invoked all observers
        receive the information
 observer: a consumer class; receives information from an observable

Modelled after boost::signals2, with the following changes
 - all signals return void; information is passed from the producer to
   the consumer but not back
 - thread-unsafe
 - modern C++ without preprocessor hacks
 - connection lifetime is always managed rather than leaked by default
 - renamed to avoid the funky "slot" name
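
The pattern described above can be sketched in Python. This is only an illustration of the idea (void notifications, one producer, several independently managed consumers); the class and method names are assumptions and do not mirror the C++ templates:

```python
class Observer:
    """Consumer handle; stops receiving information once disconnected."""

    def __init__(self, observable, callback):
        self._observable = observable
        self._callback = callback

    def disconnect(self):
        if self._observable is not None:
            self._observable._observers.remove(self)
            self._observable = None


class Observable:
    """Producer; invoking it passes the information to every observer."""

    def __init__(self):
        self._observers = []

    def observe(self, callback):
        observer = Observer(self, callback)
        self._observers.append(observer)
        return observer

    def __call__(self, *args):
        # Nothing is returned: information flows producer -> consumer only.
        for observer in list(self._observers):
            observer._callback(*args)


seen_a, seen_b = [], []
on_change = Observable()
a = on_change.observe(seen_a.append)
b = on_change.observe(seen_b.append)
on_change(1)
b.disconnect()  # each observer's lifetime is managed separately
on_change(2)
```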
Message-Id: <20180709172726.5079-1-avi@scylladb.com>
2018-07-09 18:48:44 +01:00
Paweł Dziepak
00a63663d6 bytes_ostream: increase max chunk size to 128 kB
128 kB is the size of the LSA segment and therefore the default size of
any kind of chunks, fragments and buffers.

Message-Id: <20180709155615.22500-1-pdziepak@scylladb.com>
2018-07-09 19:59:51 +03:00
Tomasz Grabiec
1336744a05 mutation_fragment: Fix clustering_row::equal() using incorrect column kind
Incorrect column_kind was passed, which may cause wrong type to be
used for comparison if schema contains static columns. Affects only
tests.

Spotted during code review.
Message-Id: <1531144991-2658-1-git-send-email-tgrabiec@scylladb.com>
2018-07-09 15:25:17 +01:00
Avi Kivity
ed7855a8a6 Update seastar submodule
* seastar 216d499...aac6cf1 (5):
  > reactor: pollable_fd: limit fragment count to IOV_MAX
  > tests: silence more "-Werror=sign-compare" warnings
  > reactor: include <boost/next_prior.hpp>
  > Use `#pragma once` everywhere
  > .gitignore: adds __pycache__ directory
2018-07-09 17:01:44 +03:00
Gleb Natapov
617666efb0 storage_proxy: use logger's exception printer to report read failure
Use existing exception pretty printer since it handles nested
exceptions.

Message-Id: <20180709122826.GT28899@scylladb.com>
2018-07-09 15:31:14 +03:00
Duarte Nunes
156817e00e db/size_estimates_virtual_reader: Use left-exclusive token ranges
We were considering the token ranges in the size_estimates system
table to be inclusive, which is incorrect and incompatible with
Cassandra.

While we ignore the inclusiveness of the partition_range bounds when
selecting sstables, we do take it into account in
estimated_keys_for_range(). We would thus select the correct sstables,
but could over-estimate the range size nonetheless.
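
A toy illustration of the over-estimate, assuming integer tokens and a hypothetical `keys_in_range()` helper (not the estimated_keys_for_range() implementation): with inclusive bounds, a key sitting on a boundary is counted in both adjacent ranges.

```python
def keys_in_range(keys, start, end, left_inclusive):
    """Return the keys in [start, end] or (start, end]."""
    if left_inclusive:
        return [k for k in keys if start <= k <= end]
    return [k for k in keys if start < k <= end]


tokens = [10, 20, 30]
ranges = [(0, 10), (10, 20), (20, 30)]

# Inclusive on both ends: tokens 10 and 20 are each counted twice.
inclusive_total = sum(len(keys_in_range(tokens, s, e, True)) for s, e in ranges)
# Left-exclusive: every token is counted exactly once.
exclusive_total = sum(len(keys_in_range(tokens, s, e, False)) for s, e in ranges)
```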

Tests: virtual_reader_test(release)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180709115919.5106-1-duarte@scylladb.com>
2018-07-09 15:26:32 +03:00
Takuya ASADA
1a5a40e5f6 dist/common/scripts/scylla_util.py: use os.open(O_EXCL) to verify disk is unused
To simplify is_unused_disk(), just try to open the disk instead of
checking multiple block subsystems.
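
A minimal sketch of the approach, assuming Linux semantics where opening a block device with O_EXCL fails if the device is in use (for example, mounted). The body is an assumption about how such a check could look, not a copy of the function in scylla_util.py:

```python
import os


def is_unused_disk(dev):
    """Return True if dev can be opened exclusively, False otherwise."""
    try:
        # On Linux, O_EXCL on a block device fails when it is already in use.
        fd = os.open(dev, os.O_RDONLY | os.O_EXCL)
    except OSError:
        # Covers a busy device as well as a missing or inaccessible path.
        return False
    os.close(fd)
    return True
```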

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180709102729.30066-1-syuu@scylladb.com>
2018-07-09 13:29:15 +03:00
Avi Kivity
7d0df2a06d Update scylla-ami submodule
* dist/ami/files/scylla-ami 67293ba...5200f3f (1):
  > Add custom script options to AMI user-data
2018-07-09 13:21:30 +03:00
Gleb Natapov
ac27d1c93b storage_proxy: fix rpc connection failure handling by read operation
Currently rpc::closed_error is not counted towards replica failure
during read and thus read operation waits for timeout even if one
of the nodes dies. Fix this by counting rpc::closed_error towards
failed attempts.

Fixes #3590.

Message-Id: <20180708123522.GC28899@scylladb.com>
2018-07-09 10:05:31 +03:00
Avi Kivity
2f8537b178 database: demote "Setting compaction strategy" log message to debug level
It's not very helpful in normal operation, and generates much noise,
especially when there are many tables.
Message-Id: <20180708070051.8508-1-avi@scylladb.com>
2018-07-08 10:27:03 +01:00
Avi Kivity
512baf536f storage_proxy: implement write timeouts
Require a timeout parameter for storage_proxy::mutate_begin() and
all its callers (all the way to thrift and cql modification_statement
and batch_statement).

This should fix spurious debug-mode test failures, where overcommit
and general debug slowness result in the default timeouts being
exceeded. Since the tests use infinite timeouts, they should not
time out any more.

Tests: unit (release), with an extra patch that aborts
    when a non-infinite timeout is detected.
Message-Id: <20180707204424.17116-1-avi@scylladb.com>
2018-07-08 10:27:03 +01:00
Takuya ASADA
929ba016ed dist/common/scripts/scylla_util.py: strip double quote from sysconfig parameter
Currently sysconfig_parser.get() returns the parameter including its double
quotes, which causes problems when text is appended using sysconfig_parser.set().
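
The failure mode can be sketched as follows. `strip_quotes()` and the sample values are hypothetical and only illustrate why a quoted value cannot be extended by plain concatenation:

```python
def strip_quotes(value):
    """Remove one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


raw = '"--log-to-syslog 1"'  # value as stored in the sysconfig file
# Appending to the quoted value leaves the new text outside the quotes.
broken = raw + " --smp 2"
# Strip first, append, then quote the whole value again.
fixed = '"{}"'.format(strip_quotes(raw) + " --smp 2")
```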

Fixes #3587

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180706172219.16859-1-syuu@scylladb.com>
2018-07-08 10:47:41 +03:00
Duarte Nunes
1beed0ca16 Merge 'hinted handoff: add rebalancing and unmark as experimental' from Vlad
"
This series adds the last missing part of the HH feature list (as in the design doc) - rebalancing;
and finally removes the "experimental" tag from the HH.
"

* 'hinted_handoff_rebalance-v3' of https://github.com/vladzcloudius/scylla:
  main: remove the "experimental" tag from the hinted handoff feature
  db::hints::manager: implement rebalance() method
2018-07-07 20:38:07 +01:00
Takuya ASADA
a98b4b705c dist/common/scripts/scylla_util.py: make run()/out() functions shorter
Refactored these functions to make them simpler.
2018-07-08 01:13:36 +09:00
Takuya ASADA
e2a032f7ea dist/ami: install python34 to run scylla_install_ami
Since we switched scylla_install_ami to python3, we need to install python3
before launching the script.
2018-07-08 01:13:36 +09:00
Takuya ASADA
4e04fb7d68 dist/common/scripts/scylla_ec2_check: move ec2 related code to class aws_instance
There is duplicated code in both scylla_ec2_check and class aws_instance
in scylla_util.py, so drop this code from scylla_ec2_check and use
class aws_instance.
2018-07-08 01:13:36 +09:00
Takuya ASADA
99d5ca03e7 dist/common/scripts: drop class concolor, use colorprint()
To print colored console output with simpler code, drop class concolor
and use colorprint() instead.
2018-07-08 01:13:36 +09:00
Takuya ASADA
14d117363b dist/ami/files/.bash_profile: convert almost all lines to python3
Since it's .bash_profile we cannot turn the file into a python3 script, but
almost all lines are rewritten in python3; .bash_profile just launches it.
2018-07-08 01:13:35 +09:00
Takuya ASADA
25c3249d8d dist/common/scripts: convert node_exporter_install to python3
Convert bash script to python3.
2018-07-08 01:13:35 +09:00
Takuya ASADA
505fcc92f7 dist/common/scripts: convert scylla_stop to python3
Convert bash script to python3.
2018-07-08 01:13:35 +09:00
Takuya ASADA
eb369942bd dist/common/scripts: convert scylla_prepare to python3
Convert bash script to python3.
2018-07-08 01:13:35 +09:00
Vlad Zolotarov
7495c8e56d dist: scylla_lib.sh: get_mode_cpu_set: split the declaration and assignment to the local variable
In bash, a local variable declaration is a separate operation with its own exit status
(always 0) therefore constructs like

local var=`cmd`

will always result in the 0 exit status ($? value) regardless of the actual
result of "cmd" invocation.

To overcome this we should split the declaration and the assignment to be like this:

local var
var=`cmd`

Fixes #3508

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1529702903-24909-3-git-send-email-vladz@scylladb.com>
2018-07-07 18:04:19 +03:00
Vlad Zolotarov
f3ca17b1a1 dist: scylla_lib.sh: get_mode_cpu_set: don't let the error messages out
References #3508

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1529702903-24909-2-git-send-email-vladz@scylladb.com>
2018-07-07 18:04:18 +03:00
Avi Kivity
e79fccdf7b Update seastar submodule
* seastar d7f35d7...216d499 (10):
  > temporary_buffer: Add clone method()
  > temporary_buffer: Make move-assignment operator noexcept.
  > deleter: Make move-assignment operator noexcept.
  > reactor: don't become inefficient when max_task_backlog is exceeded
  > reactor: switch cumulative time metrics resolution from nanoseconds to milliseconds
  > preempt: annotate for branch prediction
  > tests: silence "-Werror=sign-compare" warnings
  > Merge "Support one I/O Scheduler per device" from Glauber
  > rpc: make rpc server scheduling aware
  > Add SEASTAR_USER_CFLAGS and SEASTAR_ENABLE_WERROR
2018-07-07 17:48:25 +03:00
Vlad Zolotarov
c65a110839 main: remove the "experimental" tag from the hinted handoff feature
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-07-06 19:19:40 -04:00
Vlad Zolotarov
83ba6d84a1 db::hints::manager: implement rebalance() method
Rebalance hints segments that need to be sent among all present shards.

Ensure that after rebalancing the difference between the number of segments
of any two shards is not greater than 1.

Try to minimize the amount of "file rename" operations in order to achieve the needed result.

Note: "Resharding" is a particular case of rebalancing.
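
One way to plan such a rebalance is sketched below. This is an assumption about a possible approach, not the db::hints::manager algorithm; `plan_rebalance()` is a hypothetical name and expects a non-empty list of per-shard segment counts. Giving the leftover "+1" slots to the fullest shards keeps the number of moved segments small.

```python
def plan_rebalance(counts):
    """Return (target counts, moves); each move is (from_shard, to_shard, n)."""
    n = len(counts)
    base, extra = divmod(sum(counts), n)
    # Shards that already hold the most segments keep the extra ones.
    order = sorted(range(n), key=lambda s: counts[s], reverse=True)
    target = [base] * n
    for s in order[:extra]:
        target[s] += 1

    givers = [[s, counts[s] - target[s]] for s in range(n) if counts[s] > target[s]]
    takers = [[s, target[s] - counts[s]] for s in range(n) if counts[s] < target[s]]
    moves = []
    gi = ti = 0
    while gi < len(givers) and ti < len(takers):
        k = min(givers[gi][1], takers[ti][1])
        moves.append((givers[gi][0], takers[ti][0], k))
        givers[gi][1] -= k
        takers[ti][1] -= k
        if givers[gi][1] == 0:
            gi += 1
        if takers[ti][1] == 0:
            ti += 1
    return target, moves


even_target, even_moves = plan_rebalance([5, 0, 1])
odd_target, odd_moves = plan_rebalance([4, 0, 0])
```

In both examples the resulting per-shard counts differ by at most 1.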

Tests: dtest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-07-06 19:18:46 -04:00
Piotr Sarna
77aa97f62a cql3: fix ALLOW FILTERING iterator
In original series cell iterator for regular cells
was erroneously taken by copy instead of by reference,
which will result in iterating over the first value indefinitely.
Also, the same iterator was not updated for collections,
which is fixed too.
Message-Id: <83297adf8121de4fd37257c87f250d61ea9ec80b.1530892191.git.sarna@scylladb.com>
2018-07-06 17:23:12 +01:00
Duarte Nunes
0ec3ff0611 Merge 'Add ALLOW FILTERING metrics' from Piotr
"
This series addresses issue #3575 by adding 3 ALLOW FILTERING related
metrics to help profile queries:
 * number of read requests that required filtering
 * total number of rows read that required filtering
 * number of rows read that required filtering and matched

Tests: unit (release)
"

* 'allow_filtering_metrics_4' of https://github.com/psarna/scylla:
  cql3: publish ALLOW FILTERING metrics
  cql3: add updating ALLOW FILTERING metrics
  cql3: define ALLOW FILTERING metrics
2018-07-06 11:19:37 +01:00
Piotr Sarna
4a435e6f66 cql3: publish ALLOW FILTERING metrics
ALLOW FILTERING related metrics are registered and published.

Fixes #3575
2018-07-06 12:00:37 +02:00
Piotr Sarna
03f2f8633b cql3: add updating ALLOW FILTERING metrics
Metrics related to ALLOW FILTERING queries are now properly
updated on read requests.
2018-07-06 12:00:29 +02:00
Piotr Sarna
8cb242ab0b cql3: define ALLOW FILTERING metrics
The following metrics are defined for ALLOW FILTERING:
 * number of read requests that required filtering
 * total number of rows read that required filtering
 * number of rows read that required filtering and matched
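
How the three counters relate can be sketched as follows. The class and field names are hypothetical and are not the metric names registered by the patch:

```python
from dataclasses import dataclass


@dataclass
class FilteringStats:
    read_requests: int = 0  # read requests that required filtering
    rows_read: int = 0      # rows read by those requests
    rows_matched: int = 0   # rows read that also passed the filter

    def record(self, rows, predicate):
        """Filter one request's rows and update all three counters."""
        self.read_requests += 1
        matched = [row for row in rows if predicate(row)]
        self.rows_read += len(rows)
        self.rows_matched += len(matched)
        return matched


stats = FilteringStats()
result = stats.record([1, 5, 8, 12], lambda v: v > 4)
```

A low ratio of rows_matched to rows_read suggests a query that reads much more than it returns.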
2018-07-06 10:43:18 +02:00
Glauber Costa
82f7f7b36d database: change indentation
Previous patches have used reviewer-oriented indentation. Re-indent.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 17:11:01 -04:00
Glauber Costa
99c8a1917f database: support multiple data directories
While Cassandra supports multiple data directories, we have been
historically supporting just one. The one-directory model suits us
better because of the I/O Scheduler and so far we have seen very few
requests -- if any, to support this.

Still, the infrastructure needed to support multiple directories can be
beneficial so I am trying to bring this in.

For simplicity, we will treat the first directory in the list as the
main directory. By being able to still associate one singular directory
with a table, most of the code doesn't have to change and we don't have
to worry about how to distribute data between the directories.

In this design:
 - We scan all data directories for existing data.
 - resharding only happens within a particular data directory.
 - snapshot details are accumulated with data for all directories that
   host snapshots for the tables we are examining
 - snapshots are created with files in their own directories, but the
   manifest file goes to the main directory. For this one, note that in
   Cassandra the same thing happens, except that there is no "main"
   directory. Still, the manifest file is in just one of them.
 - SSTables are flushed into the main directory.
 - Compactions write data into the main directory

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 16:58:39 -04:00
Glauber Costa
3b46984a1e database: allow resharding to specify a directory
Resharding assumes that all SSTables will be in cf->dir(), but in
reality we will soon have tables in other places. We can specify a
directory in get_all_shared_sstables and specify that directory from the
resharding process.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 16:58:08 -04:00
Glauber Costa
c8b2d441a8 database: support multiple directories in get_snapshot_details
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 16:58:08 -04:00
Glauber Costa
a8ccf4d1e6 database: move get_snapshot_info into a seastar::thread
I am about to add another level of indentation and this code already
shifts right too much. It is not performance critical, so let's use a
thread for that. seastar::threads did not exist when this was first
written.

Also remove one unused continuation from inside the inner scan,
simplifying its code.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 16:58:08 -04:00
Glauber Costa
919c7d6bb9 snapshots: always create the snapshot directory
We currently don't always create the snapshot directory as an
optimization. We have a test at sync time handling this use case.

This works well when all SSTables are created in the same directory, but
if we have more than one data directory then it may not work if we don't
have SSTables in all data directories.

We can fix it by unconditionally creating the directory.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 16:58:08 -04:00
Glauber Costa
86239e4e22 sstables: pass sstable dir with entry descriptor
We have been assuming that all SSTables for a table will be in the same
directory, and we pass the directory name to make_descriptor only
because that's the way in ka and la to find out the keyspace and table
names.

However, SSTables for a given column family could be spread into
multiple directories. So let's pass it down with the descriptor so we
can load from the right place.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 16:45:26 -04:00
Glauber Costa
25a02c61d6 database: make nodetool listsnapshots print correct information
nodetool listsnapshots is currently printing zero sizes for all snapshots.
The reason for that is that we are moving the snapshot directory name in
the capture list, which can be evaluated by the compiler to happen
before we use it as the function parameter.

Fixes #3572

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 16:20:07 -04:00
Glauber Costa
4a62866104 sstables: correctly create descriptors for snapshots
Our regular expression for parsing SSTable files tests for the directory
for the la file format, since that file format does not include the
ks/cf pair in the file name itself.

However, the regular expression does not cover the case in which the
SSTable files are coming from snapshots. This patch extends the regex so
they are also covered.
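
The idea can be sketched with a hypothetical pattern. The directory layout assumed here (data dir, keyspace, table name plus 32 hex digits, optional snapshots/<tag> suffix) and the regex itself are illustrative assumptions, not the expression used in the sstables code:

```python
import re

# Hypothetical pattern: <data dir>/<keyspace>/<table>-<32 hex digits>,
# optionally followed by /snapshots/<tag>.
LA_DIR = re.compile(
    r".*/(?P<ks>[^/]+)/(?P<cf>[^/]+)-[0-9a-f]{32}"
    r"(?:/snapshots/[^/]+)?/?$"
)


def ks_cf_from_dir(path):
    """Return (keyspace, table) parsed from an sstable directory, or None."""
    m = LA_DIR.match(path)
    return (m.group("ks"), m.group("cf")) if m else None


table_dir = "/var/lib/scylla/data/ks1/users-" + "0123456789abcdef" * 2
regular = ks_cf_from_dir(table_dir)
snapshot = ks_cf_from_dir(table_dir + "/snapshots/backup1")
```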

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-07-05 16:19:09 -04:00
Raphael S. Carvalho
dfd1e1229e sstables/compaction_manager: fix typo in function name to reevaluate postponed compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180702185343.26682-1-raphaelsc@scylladb.com>
2018-07-05 18:54:14 +03:00
Takuya ASADA
4df982fe07 dist/common/scripts/scylla_sysconfig_setup: fix typo
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180705133313.16934-1-syuu@scylladb.com>
2018-07-05 16:38:14 +03:00
Avi Kivity
7a1bcd9ad3 Merge "Improve mutation printing in GDB" from Tomasz
"
This is a series of patches which make it possible for a human to examine
contents of cache or memtables from GDB.
"

* 'tgrabiec/gdb-cache-printers' of github.com:tgrabiec/scylla:
  gdb: Add pretty printer for managed_vector
  gdb: Add pretty printer for rows
  gdb: Add mutation_partition pretty printer
  gdb: Add pretty printer for partition_entry
  gdb: Add pretty printer for managed_bytes
  gdb: Add iteration wrapper for intrusive_set_external_comparator
  gdb: Add iteration wrapper for boost intrusive set
2018-07-05 14:08:58 +03:00
Avi Kivity
f55a2fe3a7 main: improve reporting of dns resolution errors
A report that C-Ares returned some errors tells the user nothing.

Improve the error message by including the name of the configuration
variable and its value.
Message-Id: <20180705084959.10872-1-avi@scylladb.com>
2018-07-05 10:24:41 +01:00
Duarte Nunes
c126b00793 Merge 'ALLOW FILTERING support' from Piotr
"
The main idea of this series is to provide a filtering_visitor
as a specialised result_set_builder::visitor implementation
that keeps restriction info and applies it on query results.
Also, since allow_filtering checking is not correct now (e.g. #2025)
on select_statement level, this series tries to fix any issues
related to it.

Still in TODO:
 * handling CONTAINS relation in single column restriction filtering
 * handling multi-column restrictions - especially EQ, which can be
   split into multiple single-column restrictions
 * more tests - it's never enough; especially esoteric cases
   like filtering queries which also use secondary indexes,
   paging tests, etc.

Tests: unit (release)
"

* 'allow_filtering_6' of https://github.com/psarna/scylla:
  tests: add allow_filtering tests to cql_query_test
  cql3: enable ALLOW FILTERING
  service: add filtering_pager
  cql3: optimize filtering partition keys and static rows
  cql3: add filtering visitor
  cql3: move result_set_builder functions to header
  cql3: amend need_filtering()
  cql3: add single column primary key restrictions getters
  cql3: expose single column primary key restrictions
  cql3: add needs_filtering to primary key restrictions
  cql3: add simpler single_column_restriction::is_satisfied_by
2018-07-05 10:18:08 +01:00
Piotr Sarna
a7dd02309f tests: add allow_filtering tests to cql_query_test
Test cases for ALLOW FILTERING are added to cql_query_test suite.
2018-07-05 10:50:43 +02:00
Piotr Sarna
27bf20aa3f cql3: enable ALLOW FILTERING
Enables 'ALLOW FILTERING' queries by transferring control
to result_set_builder::filtering_visitor.
Both regular and primary key columns are allowed,
but some things are left unimplemented:
 - multi-column restrictions
 - CONTAINS queries

Fixes #2025
2018-07-05 10:50:43 +02:00
Piotr Sarna
7b018f6fd6 service: add filtering_pager
For paged results of an 'ALLOW FILTERING' query, a filtering pager
is provided. It's based on a filtering_visitor for result_builder.
2018-07-05 10:50:43 +02:00
Piotr Sarna
a08fba19e3 cql3: optimize filtering partition keys and static rows
If any restriction on partition key or static row part fails,
it will be so for every row that belongs to a partition.
Hence, full check of the rest of the rows is skipped.
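
The short-circuit can be sketched as below; `filter_partition()` and its predicate parameters are hypothetical names used only for illustration:

```python
def filter_partition(partition_key, static_row, rows, partition_ok, static_ok, row_ok):
    """Return the rows of one partition that pass all restrictions."""
    if not partition_ok(partition_key) or not static_ok(static_row):
        # No row in this partition can match; skip the per-row checks.
        return []
    return [row for row in rows if row_ok(row)]


checked = []  # records which rows were individually examined


def row_ok(row):
    checked.append(row)
    return row % 2 == 0


def wants_pk2(pk):
    return pk == "pk2"


def any_static(static_row):
    return True


skipped = filter_partition("pk1", {"s": 0}, [1, 2, 3, 4], wants_pk2, any_static, row_ok)
kept = filter_partition("pk2", {"s": 0}, [1, 2, 3, 4], wants_pk2, any_static, row_ok)
```

Only the rows of the second partition end up in `checked`.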
2018-07-05 10:50:43 +02:00
Piotr Sarna
2a0b720102 cql3: add filtering visitor
In order to filter results of an 'ALLOW FILTERING' query,
a visitor that can take optional filter for result_builder
is provided. It defaults to nop_filter, which accepts
all rows.
2018-07-05 10:50:43 +02:00
Piotr Sarna
1cf5653f89 cql3: move result_set_builder functions to header
Moving function definitions to header is a preparation step
before turning result_set_builder into a template.
2018-07-05 10:50:43 +02:00
Piotr Sarna
4d3d32f465 cql3: amend need_filtering()
Previous implementation of need_filtering() was too eager to assume
that index query should be used, whereas sometimes a query should
just be filtered.
2018-07-05 10:50:39 +02:00
Avi Kivity
dd083122f9 Update scylla-ami submodule
* dist/ami/files/scylla-ami 0fd9d23...67293ba (1):
  > scylla_install_ami: fix broken argument parser

Fixes #3578.
2018-07-05 09:48:06 +03:00
Avi Kivity
f4caa418ff Merge "Fix the "LCS data-loss bug"" from Botond
"
This series fixes the "LCS data-loss bug" where full scans (and
everything that uses them) would miss some small percentage (> 0.001%)
of the keys. This could easily lead to permanent data-loss as compaction
and decommission both use full scans.
aeffbb673 worked around this bug by disabling the incremental reader
selectors (the class identified as the source of the bug) altogether.
This series fixes the underlying issue and reverts aeffbb673.

The root cause of the bug is that the `incremental_reader_selector` uses
the current read position to poll for new readers using
`sstable_set::incremental_selector::select()`. This means that when the
currently open sstables contain no partitions that would intersect with
some of the yet unselected sstables, those sstables would be ignored.
Solve the problem by not calling `select()` with the current read
position and always pass the `next_position` returned in the previous
call. This means that the traversal of the sstable-set happens at a pace
defined by the sstable-set itself and this guarantees that no sstable
will be jumped over. When asked for new readers the
`incremental_reader_selector` will now iteratively call `select()` using
the `next_position` from the previous `select()` call until it either
receives some new, yet unselected sstables, or `next_position` surpasses
the read position (in which case `select()` will be tried again later).
The `sstable_set::incremental_selector` was not suitable in its present
state to support calling `select()` with the `next_position` from a
previous call as in some cases it could not make progress due to
inclusiveness related ambiguities. So in preparation to the above fix
`sstable_set` was updated to work in terms of ring-position instead of
tokens. Ring-position can express positions in a much more fine-grained
way than a token, including positions after/before tokens and keys. This
allows for a clear expression of `next_position` such that calling
`select()` with it guarantees forward progress in the token-space.

Tests: unit(release, debug)

Refs: #3513
"

* 'leveled-missing-keys/v4' of https://github.com/denesb/scylla:
  tests/mutation_reader_test: combined_mutation_reader_test: use SEASTAR_THREAD_TEST_CASE
  tests/mutation_reader_test: refactor combined_mutation_reader_test
  tests/mutation_reader_test: fix reader_selector related tests
  Revert "database: stop using incremental selectors"
  incremental_reader_selector: don't jump over sstables
  mutation_reader: reader_selector: use ring_position instead of token
  sstables_set::incremental_selector: use ring_position instead of token
  compatible_ring_position: refactor to compatible_ring_position_view
  dht::ring_position_view: use token_bound from ring_position
  i_partitioner: add free function ring-position tri comparator
  mutation_reader_merger::maybe_add_readers(): remove early return
  mutation_reader_merger: get rid of _key
2018-07-05 09:33:12 +03:00
Takuya ASADA
3bcc123000 dist/ami: hardcode target for scylla_current_repo since we don't have --target option anymore
We broke build_ami.sh when we dropped Ubuntu support: the scylla_current_repo
command does not finish because of a missing argument ('--target' with no
distribution name, since $TARGET is always blank now).
It needs to be hardcoded as centos.

Fixes #3577

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180705035251.29160-1-syuu@scylladb.com>
2018-07-05 09:31:43 +03:00
Paweł Dziepak
07a429e837 test.py: do not disable human-readable format with --jenkins flag
When test.py is run with --jenkins flag Boost UTF is asked to generate
an XML file with the test results. This automatically disables the
human-readable output printed to stdout. There is no real reason to do
so and it is actually less confusing when the Boost UTF messages are in
the test output together with Scylla logger messages.

Message-Id: <20180704172913.23462-1-pdziepak@scylladb.com>
2018-07-05 09:31:15 +03:00
Raphael S. Carvalho
7d6af5da3a sstables/compaction_manager: properly reevaluate postponed compactions for leveled strategy
Function to reevaluate postponed compaction was called too early for strategies that
don't allow parallel compaction (only leveled strategy (LCS) at this moment).
Such strategies must first have the ongoing compaction deregistered before reevaluating
the postponed ones. Manager uses task list of ongoing compaction to decide if there's
ongoing compaction for a given column family. So compaction could stop making progress
at all *if and only if* we stop flushing new data.

So it could happen that a column family would be left with lots of pending compaction,
leading the user to think all compacting is done, but after reboot, there will be
lots of compaction activity.

We'll both improve method to detect parallel compaction here and also add a call to
reevaluate postponed compaction after compaction is done.

Fixes #3534.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180702185327.26615-1-raphaelsc@scylladb.com>
2018-07-04 16:30:21 +01:00
Botond Dénes
b32f94d31e tests/mutation_reader_test: combined_mutation_reader_test: use SEASTAR_THREAD_TEST_CASE
2018-07-04 17:42:37 +03:00
Botond Dénes
77ad085393 tests/mutation_reader_test: refactor combined_mutation_reader_test
Make combined_mutation_reader_test more interesting:
* Set the levels on the sstables
* Arrange the sstables so that they test for the "jump over sstables"
bug.
* Arrange the sstables so that they test for the "gap between sstables".

While at it also make the code more compact.
2018-07-04 17:42:37 +03:00
Botond Dénes
4b57fc9aea tests/mutation_reader_test: fix reader_selector related tests
Don't assume the partition keys use lexical ordering. Add some
additional checks.
2018-07-04 17:42:37 +03:00
Botond Dénes
a9c465d7d2 Revert "database: stop using incremental selectors"
The data-loss bug is fixed, the incremental selector can be used again.

This reverts commit aeffbb6732.
2018-07-04 17:42:37 +03:00
Botond Dénes
c37aff419e incremental_reader_selector: don't jump over sstables
Passing the current read position to the
`incremental_selector::select()` can lead to "jumping" through sstables.
This can happen when the currently open sstables have no partition that
intersects with a yet unselected sstable that has an intersecting range
nevertheless, in other words there is a gap in the selected sstables
that this unselected one completely fits into. In this case the
unselected sstable will be completely omitted from the read.
The solution is to avoid calling `select()` with a position that
is larger than the `next_position` returned from the previous `select()`
call. Instead, call `select()` repeatedly with the `next_position` from
the previous call, until either at least one new sstable is selected or
the current read position is surpassed. This guarantees that no sstables
will be jumped over. In other words, advance the incremental selector in
a pace defined by itself thus guaranteeing that no sstable will be
jumped over.
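
A toy model of the bug and the fix, assuming integer positions and a simplified `select()` that returns the sstables covering a position plus the next position at which that set changes. All names are hypothetical and the model is far simpler than the real incremental selector:

```python
from collections import namedtuple

SSTable = namedtuple("SSTable", "name first last")


def select(sstables, pos):
    """Return (sstables whose range contains pos, next position of change)."""
    selected = [s for s in sstables if s.first <= pos <= s.last]
    boundaries = [s.first for s in sstables if s.first > pos]
    boundaries += [s.last + 1 for s in sstables if s.last >= pos]
    return selected, (min(boundaries) if boundaries else None)


def new_readers_buggy(sstables, opened, read_pos):
    """Old behaviour: poll with the current read position."""
    selected, _ = select(sstables, read_pos)
    return [s for s in selected if s.name not in opened]


def new_readers_fixed(sstables, opened, selector_pos, read_pos):
    """New behaviour: walk from the previous next_position, step by step."""
    pos = selector_pos
    while pos is not None and pos <= read_pos:
        selected, pos = select(sstables, pos)
        fresh = [s for s in selected if s.name not in opened]
        if fresh:
            return fresh, pos
    return [], pos


# X spans [0, 100] but holds keys only near both ends; Y fits in the gap.
x, y = SSTable("X", 0, 100), SSTable("Y", 40, 60)
_, selector_pos = select([x, y], 0)  # X is opened first
# The reader then jumps from X's early key straight to position 95.
buggy = new_readers_buggy([x, y], {"X"}, 95)
fixed, selector_pos2 = new_readers_fixed([x, y], {"X"}, selector_pos, 95)
```

Polling at position 95 never sees Y, while walking from the stored next_position selects it.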
2018-07-04 17:42:37 +03:00
Botond Dénes
81a03db955 mutation_reader: reader_selector: use ring_position instead of token
sstable_set::incremental_selector was migrated to ring position, follow
suit and migrate the reader_selector to use ring_position as well. Beyond
correctness this also improves efficiency in case of dense tables,
avoiding prematurely selecting sstables that share the token but start
at different keys, although one could argue that this is a niche case.
2018-07-04 17:42:37 +03:00
Botond Dénes
a8e795a16e sstables_set::incremental_selector: use ring_position instead of token
Currently `sstable_set::incremental_selector` works in terms of tokens.
Sstables can be selected with tokens and internally the token-space is
partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as
well. This is problematic for several reasons.
The sub-range sstables cover from the token-space is defined in terms of
decorated keys. It is even possible that multiple sstables cover
multiple non-overlapping sub-ranges of a single token. The current
system is unable to model this and will at best result in selecting
unnecessary sstables.
The usage of token for providing the next position where the
intersecting sstables change [1] causes further problems. Attempting to
walk over the token-space by repeatedly calling `select()` with the
`next_position` returned from the previous call will quite possibly lead
to an infinite loop as a token cannot express inclusiveness/exclusiveness
and thus the incremental selector will not be able to make progress when
the upper and lower bounds of two neighbouring intervals share the same
token with different inclusiveness e.g. [t1, t2](t2, t3].

To solve these problems update incremental_selector to work in terms of
ring position. This makes it possible to partition the token-space
among sstables at decorated key granularity. It also makes it possible
for select() to return a next_position that is guaranteed to make
progress.

partitioned_sstable_set now builds the internal interval map using the
decorated key of the sstables, not just the tokens.
incremental_selector::select() now uses `dht::ring_position_view` as
both the selector and the next_position. ring_position_view can express
positions between keys so it can also include information about
inclusiveness/exclusiveness of the next interval guaranteeing forward
progress.
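
Why a position richer than a bare token guarantees progress can be sketched with tuples. Modelling a ring position as (token, weight) is an assumption made for illustration; it is not the dht::ring_position_view layout:

```python
# Hypothetical weights: before all keys of a token, at a key, after all keys.
BEFORE_ALL_KEYS, AT_KEY, AFTER_ALL_KEYS = -1, 0, 1

t2 = 200  # shared bound of the neighbouring intervals [t1, t2] and (t2, t3]

# A bare token cannot say "just after t2": the next position is t2 itself,
# which still falls inside [t1, t2], so the selector would not advance.
token_next = t2
makes_progress_token = token_next > t2

# A (token, weight) position can: it sorts after every key with token t2
# and before anything belonging to the next token.
key_at_t2 = (t2, AT_KEY)
ring_next = (t2, AFTER_ALL_KEYS)
makes_progress_ring = ring_next > key_at_t2
```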

[1] `sstable_set::incremental_selector::selection::next_position`
2018-07-04 17:42:33 +03:00
Duarte Nunes
33d7de0805 Merge 'Expose sharding information to connections' from Avi
"
In the same way that drivers can route requests to a coordinator that
is also a replica of the data used by the request, we can allow
drivers to route requests directly to the shard. This patchset
adds and documents a way for drivers to know which shard a connection
is connected to, and how to perform this routing.
"

* tag 'shard-info-alt/v1' of https://github.com/avikivity/scylla:
  doc: documented protocol extension for exposing sharding
  transport: expose more information about sharding via the OPTIONS/SUPPORTED messages
  dht: add i_partitioner::sharding_ignore_msb()
2018-07-04 13:01:21 +01:00
Botond Dénes
8084ce3a8e query_pager: use query::is_single_partition() to check for singular range
Use query::is_single_partition() to check whether the queried ranges are
singular or not. The current method of using
`dht::partition_range::is_singular()` is incorrect, as it is possible to
build a singular range that doesn't represent a single partition.
`query::is_single_partition()` correctly checks for this so use it
instead.
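
A rough model of the distinction, under the assumption that a ring
position may carry only a token (no partition key). The classes below
are hypothetical stand-ins, not the real `dht` or `query` types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RingPosition:
    token: int
    key: Optional[bytes] = None  # None models a token-only position


@dataclass(frozen=True)
class PartitionRange:
    start: RingPosition
    end: RingPosition

    def is_singular(self) -> bool:
        # Singular here just means that both bounds are the same position.
        return self.start == self.end


def is_single_partition(r: PartitionRange) -> bool:
    # A singular range names one partition only if its position has a key;
    # a bare token may be shared by any number of partitions.
    return r.is_singular() and r.start.key is not None
```

A range built from the token-only position `RingPosition(7)` is singular
in this model, yet `is_single_partition()` rejects it.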

Found during code-review.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f671f107e8069910a2f84b14c8d22638333d571c.1530675889.git.bdenes@scylladb.com>
2018-07-04 10:04:50 +01:00
Takuya ASADA
3cb7ddaf68 dist/debian/build_deb.sh: make build_deb.sh more simplified
Use is_debian()/is_ubuntu() to detect target distribution, also install
pystache by path since package name is different between Fedora and
CentOS.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180703193224.4773-1-syuu@scylladb.com>
2018-07-04 11:12:26 +03:00
Takuya ASADA
ed1d0b6839 dist/ami/files/.bash_profile: drop Ubuntu support
Drop Ubuntu support on login prompt, too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180703192813.4589-1-syuu@scylladb.com>
2018-07-04 11:12:26 +03:00
Piotr Sarna
f42eaff75e cql3: add single column primary key restrictions getters
Getters for single column partition/clustering key restrictions
are added to statement_restrictions.
2018-07-04 09:48:32 +02:00
Piotr Sarna
a99acbc376 cql3: expose single column primary key restrictions
Underlying single_column_restrictions are exposed
for single_column_primary_key_restrictions via a const method.
2018-07-04 09:48:32 +02:00
Piotr Sarna
f7a2f15935 cql3: add needs_filtering to primary key restrictions
Primary key restrictions sometimes require filtering. These functions
return true if ALLOW FILTERING needs to be enabled in order to satisfy
these restrictions.
2018-07-04 09:48:32 +02:00
Piotr Sarna
6aec9e711f cql3: add simpler single_column_restriction::is_satisfied_by
Currently restriction::is_satisfied_by() accepts only keys and rows
as arguments. In this commit, a version that only takes bytes of data
is provided.
This simpler version applies to single_column_restriction only,
because it compares raw bytes underneath anyway. For other restriction
types, simplified is_satisfied_by is not defined.
2018-07-04 09:48:32 +02:00
Botond Dénes
bf2645c616 compatible_ring_position: refactor to compatible_ring_position_view
compatible_ring_position's sole purpose is to allow creating
boost::icl::interval_map with dht::ring_position as the key and list of
sstables as the value. This function is served equally well if
compatible_ring_position wraps a `dht::ring_position_view` instead of a
`dht::ring_position` with the added benefit of not having to copy the
possibly heavy `dht::decorated_key` around. It also makes it possible
to do lookups with `dht::ring_position_view` which is much more
versatile and allows avoiding copies just to make lookups.
The only downside is that `dht::ring_position_view` requires the
lifetime of the "viewed" object to be taken care of. This is not a
concern however, as so long as an interval is present in the map the
represented sstable is guaranteed to be alive too, as the interval map
participates in the ownership of the stored sstables.

Rename compatible_ring_position to compatible_ring_position_view to
reflect the changes.
While at it upgrade the std::experimental::optional to std::optional.
2018-07-04 08:19:39 +03:00
Botond Dénes
48b07ba5d3 dht::ring_position_view: use token_bound from ring_position
Currently dht::ring_position_view's dht::token constructor takes the
token bound in the form of a raw `uint8_t`. This allows for passing a
weight of "0", which is illegal: a single token does not represent a
single ring position but an interval, as an arbitrary number of keys can
have the same token. dht::ring_position uses an enum in its dht::token
constructor. Import that same enum into the dht::ring_position_view
scope and take a `token_bound` instead of `uint8_t`.
This is especially important as in later patches the internal weight of
the ring_position_view will be exposed and illegal values can cause all
sorts of problems.
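
A sketch of why an enum helps, assuming the two legal weights are -1
(start of the token's interval) and 1 (end). `TokenBound` and
`RingPositionView` here are illustrative Python stand-ins, not the C++
types.

```python
from enum import IntEnum


class TokenBound(IntEnum):
    START = -1  # before all keys sharing the token
    END = 1     # after all keys sharing the token


class RingPositionView:
    def __init__(self, token: int, bound: TokenBound):
        self.token = token
        # Converting through the enum rejects any other weight, including 0.
        self.weight = TokenBound(bound)

    def sort_key(self):
        return (self.token, int(self.weight))
```

With this shape `RingPositionView(5, 0)` raises `ValueError` instead of
silently creating a position that claims to identify a single key.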
2018-07-04 08:19:34 +03:00
Alexys Jacob
8c03c1e2ce Support Gentoo Linux on node_health_check script.
Gentoo Linux was not supported by the node_health_check script
which resulted in the following error message displayed:

"This s a Non-Supported OS, Please Review the Support Matrix"

This patch adds support for Gentoo Linux while adding a TODO note
to add support for authenticated clusters which the script does
not support yet.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180703124458.3788-1-ultrabug@gentoo.org>
2018-07-03 20:18:13 +03:00
Tomasz Grabiec
2ffb621271 Merge "Fix atomic_cell_or_collection::external_memory_usage()" from Paweł
After the transition to the new in-memory representation in
aab6b0ee27 'Merge "Introduce new in-memory
representation for cells" from Paweł'
atomic_cell_or_collection::external_memory_usage() stopped accounting
for the externally stored data. Since it wasn't covered by the unit
tests, the bug remained unnoticed until now.

This series fixes the memory usage calculation and adds proper unit
tests.

* https://github.com/pdziepak/scylla.git fix-external-memory-usage/v1:
  tests/mutation: properly mark atomic_cells that are collection members
  imr::utils::object: expose size overhead
  data::cell: expose size overhead of external chunks
  atomic_cell: add external chunks and overheads to
    external_memory_usage()
  tests/mutation: test external_memory_usage()
2018-07-03 14:58:10 +02:00
Botond Dénes
c236a96d7d tests/cql_query_tess: add unit test for querying empty ranges test
A bug was found recently (#3564) in the paging logic, where the code
assumed the queried ranges list is non-empty. This assumption is
incorrect as there can be valid (if rare) queries that result in the
ranges list being empty. Add a unit test that executes such a query with
paging enabled to detect any future bugs related to assumptions about
the ranges list being non-empty.

Refs: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f5ba308c4014c24bb392060a7e72e7521ff021fa.1530618836.git.bdenes@scylladb.com>
2018-07-03 13:43:17 +01:00
Botond Dénes
59a30f0684 query_pager: be prepared to _ranges being empty
do_fetch_page() checks in the beginning whether there is a saved query
state already, meaning this is not the first page. If there is not, it
checks whether the query is for a single partition or a range scan
to decide whether to enable the stateful queries or not. This check
assumed that there is at least one range in _ranges which will not hold
under some circumstances. Add a check for _ranges being empty.
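
The shape of the guard can be sketched as follows. This is an
illustration only: the function name and the callback are hypothetical,
and how the result feeds into the stateful-query decision is not shown.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_range_is_scan(
    ranges: Sequence[T], is_single_partition: Callable[[T], bool]
) -> bool:
    # Look at ranges[0] only after confirming that the list is non-empty.
    if not ranges:
        return False
    return not is_single_partition(ranges[0])
```

An empty list now yields a definite answer instead of an out-of-bounds
access on the first element.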

Fixes: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <cbe64473f8013967a93ef7b2104c7ca0507afac9.1530610709.git.bdenes@scylladb.com>
2018-07-03 11:05:01 +01:00
Avi Kivity
eafd16266d tests: reduce multishard_mutation_test runtime in debug mode
Debug mode is so slow that generating 1000 mutations is too much for it.
High memory use can also confuse the sanitizers that track each allocation.

Reduce mutation count from 1000 to 10 in debug mode.
2018-07-03 12:01:44 +03:00
Avi Kivity
a36b1f1967 Merge "more scylla_setup fixes" from Takuya
"
Added NIC / Disk existence check, --force-raid mode on
scylla_raid_setup.
"

* 'scylla_setup_fix4' of https://github.com/syuu1228/scylla:
  dist/common/scripts/scylla_raid_setup: verify specified disks are unused
  dist/common/scripts/scylla_raid_setup: add --force-raid to construct raid even only one disk is specified
  dist/common/scripts/scylla_setup: don't accept disk path if it's not block device
  dist/common/scripts/scylla_raid_setup: verify specified disk paths are block device
  dist/common/scripts/scylla_sysconfig_setup: verify NIC existance
2018-07-03 11:03:08 +03:00
Takuya ASADA
d0f39ea31d dist/common/scripts/scylla_raid_setup: verify specified disks are unused
Currently only the interactive mode of scylla_setup verifies that the
selected disks are unused. In non-interactive mode we get an mdadm/mkfs.xfs
program error and a Python backtrace when the disks are busy.

So verify that the disks are unused in scylla_raid_setup as well, and print
a simpler error message.
2018-07-03 14:50:34 +09:00
Takuya ASADA
3289642223 dist/common/scripts/scylla_raid_setup: add --force-raid to construct raid even only one disk is specified
A user may want to start a RAID volume with only one disk. Add an option
to force constructing the RAID even when only one disk is specified.
2018-07-03 14:50:34 +09:00
Takuya ASADA
e0c16c4585 dist/common/scripts/scylla_setup: don't accept disk path if it's not block device
Ignore the input when the specified path is not a block device.
2018-07-03 14:50:34 +09:00
Takuya ASADA
24ca2d85c6 dist/common/scripts/scylla_raid_setup: verify specified disk paths are block device
Verify that the disk paths are block devices; exit with an error if not.
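
One way such a check could look, using only the standard library. The
helper names are hypothetical and this is not the code from
scylla_raid_setup.

```python
import os
import stat


def is_block_device(path: str) -> bool:
    """Return True only if path exists and refers to a block device."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing or unreadable paths are treated as "not a block device".
        return False
    return stat.S_ISBLK(mode)


def non_block_devices(paths):
    """Return the subset of paths that should be rejected."""
    return [p for p in paths if not is_block_device(p)]
```

A caller could print the rejected paths and exit with a non-zero status
when `non_block_devices()` returns a non-empty list.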
2018-07-03 14:50:34 +09:00
Takuya ASADA
99b5cf1f92 dist/common/scripts/scylla_sysconfig_setup: verify NIC existance
Verify NIC existence before writing the sysconfig file to prevent errors
while running scylla.

See #2442
2018-07-03 14:50:34 +09:00
Takuya ASADA
084c824d12 scripts: merge scylla_install_pkg to scylla-ami
scylla_install_pkg was initially written for the one-liner installer, but now
it is only used for creating the AMI, and it is just a few lines of code, so
merge it into the scylla_install_ami script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180612150106.26573-2-syuu@scylladb.com>
2018-07-02 13:20:09 +03:00
Takuya ASADA
fafcacc31c dist/ami: drop Ubuntu AMI support
Drop Ubuntu AMI since it has not been maintained for a long time, and we have
no plan to officially provide it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180612150106.26573-1-syuu@scylladb.com>
2018-07-02 13:20:08 +03:00
Avi Kivity
677991f353 Uodate scylla-ami submodule
* dist/ami/files/scylla-ami 36e8511...0fd9d23 (2):
  > scylla_install_ami: merge scylla_install_pkg
  > scylla_install_ami: drop Ubuntu AMI
2018-07-02 13:19:34 +03:00
Botond Dénes
01bd34d117 i_partitioner: add free function ring-position tri comparator
Having to create an object just to compare two ring positions (or views)
is annoying and unnecessary. Provide a free function version as well.
2018-07-02 11:41:09 +03:00
Botond Dénes
78ecf2740a mutation_reader_merger::maybe_add_readers(): remove early return
It's unnecessary (doesn't prevent anything). The code without it
expresses intent better (and is shorter by two lines).
2018-07-02 11:41:09 +03:00
Botond Dénes
d26b35b058 mutation_reader_merger: get rid of _key
`_key` is only used in a single place and this does not warrant storing
it in a member. Also get rid of current_position() which was used to
query `_key`.
2018-07-02 11:40:43 +03:00
Avi Kivity
0b148d0070 Merge "scylla_setup fixes" from Takuya
"
I found problems in the previously submitted patchsets 'scylla_setup fixes'
and 'more fixes for scylla_setup', so fixed them and merged into one
patchset.

Also added a few more patches.
"

* 'scylla_setup_fix3' of https://github.com/syuu1228/scylla:
  dist/common/scripts/scylla_setup: allow input multiple disk paths on RAID disk prompt
  dist/common/scripts/scylla_raid_setup: skip constructing RAID0 when only one disk specified
  dist/common/scripts/scylla_raid_setup: fix module import
  dist/common/scripts/scylla_setup: check disk is used in MDRAID
  dist/common/scripts/scylla_setup: move unmasking scylla-fstrim.timer on scylla_fstrim_setup
  dist/common/scripts/scylla_setup: use print() instead of logging.error()
  dist/common/scripts/scylla_setup: implement do_verify_package() for Gentoo Linux
  dist/common/scripts/scylla_coredump_setup: run os.remove() when deleting directory is symlink
  dist/common/scripts/scylla_setup: don't include the disk on unused list when it contains partitions
  dist/common/scripts/scylla_setup: skip running rest of the check when the disk detected as used
  dist/common/scripts/scylla_setup: add a disk to selected list correctly
  dist/common/scripts/scylla_setup: fix wrong indent
  dist/common/scripts: sync instance type list for detect NIC type to latest one
  dist/common/scripts: verify systemd unit existance using 'systemctl cat'
2018-07-02 10:21:49 +03:00
Avi Kivity
a45c3aa8c7 Merge "Fix handling of stale write replies in storage_proxy" from Gleb
"
If a coordinator sends write requests with ID=X and restarts it may get a reply to
the request after it restarts and sends another request with the same ID (but to
different replicas). This condition will trigger an assert in a coordinator. Drop
the assertion in favor of a warning and initialize handler id in a way to make
this situation less likely.

Fixes: #3153
"

* 'gleb/write-handler-id' of github.com:scylladb/seastar-dev:
  storage_proxy: initialize write response id counter from wall clock value
  storage_proxy: drop virtual from signal(gms::inet_address)
  storage_proxy: do not assert on getting an unexpected write reply
2018-07-01 17:59:54 +03:00
Gleb Natapov
19e7493d5b storage_proxy: initialize write response id counter from wall clock value
Initializing write response id to the same value on each reboot may
cause a stale id to be taken for an active one if a node restarts after
sending only a couple of write requests and before receiving replies.
On next reboot it will start assigning ids from the same value and
receiving old replies will confuse it. Mitigate this by setting the
initial id to the wall clock value in milliseconds. It will not solve the
problem completely, but will mitigate it.
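
A small sketch of the mitigation, with a hypothetical helper name. It
only illustrates seeding a monotonically increasing counter from the
wall clock; the real counter lives in storage_proxy.

```python
import itertools
import time


def make_response_id_counter(now_ms=None):
    """Return an id generator whose first value is the wall clock in ms."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return itertools.count(now_ms)
```

Two process lifetimes that start a few seconds apart begin at ids that
differ by thousands, so a handful of ids handed out before the restart
are unlikely to be reused right after it. A clock that jumps backwards
can still produce a collision, which is why this is a mitigation only.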
2018-07-01 17:24:40 +03:00
Nadav Har'El
3194ce16b3 repair: fix combination of "-pr" and "-local" repair options
When nodetool repair is used with the combination of the "-pr" (primary
range) and "-local" (only repair with nodes in the same DC) options,
Scylla needs to define the "primary ranges" differently: Rather than
assign one node in the entire cluster to be the primary owner of every
token, we need one node in each data-center - so that a "-local"
repair will cover all the tokens.
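
A simplified illustration of the difference, assuming a ring described
as (token, node, dc) triples and ignoring the replication strategy. The
function and the example ring are hypothetical, not the repair code.

```python
def primary_ranges(ring, node, dc=None):
    """Return the (prev_token, token] ranges for which node is primary.

    With dc=None every node in the cluster competes for ownership; with a
    dc name only the nodes of that data center are considered.
    """
    owners = sorted((t, n) for t, n, d in ring if dc is None or d == dc)
    ranges = []
    for i, (token, owner) in enumerate(owners):
        if owner == node:
            prev_token = owners[i - 1][0]  # index -1 wraps around the ring
            ranges.append((prev_token, token))
    return ranges


RING = [(10, "a", "dc1"), (20, "b", "dc2"), (30, "c", "dc1"), (40, "d", "dc2")]
```

With cluster-wide ownership the dc1 nodes `a` and `c` are primary for
(40, 10] and (20, 30] only, so a "-local" repair in dc1 would skip
(10, 20] and (30, 40]. With per-DC ownership they get (30, 10] and
(10, 30], which together cover the whole ring.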

Fixes #3557.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180701132445.21685-1-nyh@scylladb.com>
2018-07-01 16:39:33 +03:00
Gleb Natapov
569437aaa5 storage_proxy: drop virtual from signal(gms::inet_address)
The function is not overridden, so should not be virtual.
2018-07-01 16:35:59 +03:00
Gleb Natapov
5ee09e5f3b storage_proxy: do not assert on getting an unexpected write reply
In theory we should not get a write reply from a node we did not send a
write to, but in practice a stale reply can be received if a node reboots
between sending a write and getting a reply. Do not assert, but log a
warning instead and ignore the reply.
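
The change in behaviour can be sketched like this. `WriteHandler` and
its fields are hypothetical; only the "warn and ignore" pattern is the
point.

```python
import logging

logger = logging.getLogger("storage_proxy_sketch")


class WriteHandler:
    def __init__(self, targets):
        self.pending = set(targets)

    def signal(self, replica) -> bool:
        """Record a reply; return False if it was not expected."""
        if replica not in self.pending:
            # Previously an assertion; a stale reply is now logged and dropped.
            logger.warning("Ignoring write reply from unexpected replica %s", replica)
            return False
        self.pending.remove(replica)
        return True
```

A reply from an address outside the target set leaves `pending`
untouched, so the handler still waits for the replicas it wrote to.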

Fixes: #3153
2018-07-01 16:35:09 +03:00
Tomasz Grabiec
b464b66e90 row_cache: Fix memtable reads concurrent with cache update missing writes
Introduced in 5b59df3761.

It is incorrect to erase entries from the memtable being moved to
cache if partition update can be preempted because a later memtable
read may create a snapshot in the memtable before memtable writes for
that partition are made visible through cache. As a result the read
may miss some of the writes which were in the memtable. The code was
checking for presence of snapshots when entering the partition, but
this condition may change if update is preempted. The fix is to not
allow erasing if update is preemptible.

This also caused SIGSEGVs because we were assuming that no such
snapshots will be created and hence were not invalidating iterators on
removal of the entries, which results in undefined behavior when such
snapshots are actually created.

Fixes SIGSEGV in dtest: limits_test.py:TestLimits.max_cells_test

Fixes #3532

Message-Id: <1530129009-13716-1-git-send-email-tgrabiec@scylladb.com>
2018-07-01 15:36:05 +03:00
Avi Kivity
f3da043230 Merge "Make in-memory partition version merging preemptable" from Tomasz
"
Partition snapshots go away when the last read using the snapshot is done.
Currently we will synchronously attempt to merge partition versions on this event.
If partitions are large, that may stall the reactor for a significant amount of time,
depending on the size of newer versions. Cache update on memtable flush can
create especially large versions.

The solution implemented in this series is to allow merging to be preemptable,
and continue in the background. Background merging is done by the mutation_cleaner
associated with the container (memtable, cache). There is a single merging process
per mutation_cleaner. The merging worker runs in a separate scheduling group,
introduced here, called "mem_compaction".

When the last user of a snapshot goes away the snapshot is slid to the
oldest unreferenced version first so that the version is no longer reachable
from partition_entry::read(). The cleaner will then keep merging preceding
(newer) versions into it, until it merges a version which is referenced. The
merging is preemptable. If the initial merging is preempted, the snapshot is
enqueued into the cleaner, the worker woken up, and merging will continue
asynchronously.
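
A toy model of this scheme, assuming versions are plain dicts and a
"budget" stands in for the preemption check. `MergeJob` and `Cleaner`
are hypothetical names, not the mutation_cleaner implementation.

```python
from collections import deque


class MergeJob:
    """Merges newer versions (oldest first) into the oldest version."""

    def __init__(self, oldest: dict, newer_versions: list):
        self.oldest = oldest
        self.pending = deque(row for v in newer_versions for row in v.items())

    def merge_some(self, budget: int) -> bool:
        """Merge at most budget rows; return True once everything is merged."""
        for _ in range(budget):
            if not self.pending:
                break
            key, value = self.pending.popleft()
            self.oldest[key] = value  # later (newer) writes overwrite older ones
        return not self.pending


class Cleaner:
    """Continues preempted merges in the background, one job at a time."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, job: MergeJob) -> None:
        self.queue.append(job)

    def run_once(self, budget: int) -> None:
        if self.queue and self.queue[0].merge_some(budget):
            self.queue.popleft()
```

The initial merge runs inline with a small budget; if it returns False
the job is handed to the cleaner, which finishes it in later time
slices.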

When memtable is merged with cache, its cleaner is merged with cache cleaner,
so any outstanding background merges will be continued by the cache cleaner
without disruption.

This reduces scheduling latency spikes in tests/perf_row_cache_update
for the case of large partition with many rows. For -c1 -m1G I saw
them dropping from >23ms to 1-2ms. System-level benchmark using scylla-bench
shows a similar improvement.
"

* tag 'tgrabiec/merge-snapshots-gradually-v4' of github.com:tgrabiec/scylla:
  tests: perf_row_cache_update: Test with an active reader surviving memtable flush
  memtable, cache: Run mutation_cleaner worker in its own scheduling group
  mutation_cleaner: Make merge() redirect old instance to the new one
  mvcc: Use RAII to ensure that partition versions are merged
  mvcc: Merge partition version versions gradually in the background
  mutation_partition: Make merging preemtable
  tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots
2018-07-01 15:32:51 +03:00
Avi Kivity
8eba27829a doc: documented protocol extension for exposing sharding
Document a protocol extension that exposes the sharding algorithm
to drivers, and recommend how to use it to achieve connection-per-core.
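
For illustration, a driver-side token-to-shard mapping might look like
the sketch below. It assumes the algorithm is: bias the signed 64-bit
token to unsigned, shift out the ignored most significant bits, then
scale by the shard count. Treat this as an assumption to be verified
against the protocol extension document itself; `shard_of` and its
parameter names are hypothetical.

```python
def shard_of(token: int, nr_shards: int, ignore_msb: int) -> int:
    """Map a signed 64-bit token to a shard index in [0, nr_shards)."""
    mask = (1 << 64) - 1
    biased = (token + (1 << 63)) & mask     # signed token -> unsigned 64-bit
    biased = (biased << ignore_msb) & mask  # drop the ignored top bits
    return (biased * nr_shards) >> 64       # scale into the shard range
```

A connection pool could then keep one connection per shard and pick the
connection whose shard matches `shard_of()` for the request's token.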
2018-07-01 15:26:30 +03:00
Avi Kivity
28d064e7c0 transport: expose more information about sharding via the OPTIONS/SUPPORTED messages
Provide all information needed for a connection pool to set up a connection
per shard.
2018-07-01 15:26:28 +03:00
Botond Dénes
5fd9c3b9d4 tests/mutation_reader_test: require min shard-count for multishard tests
Tests testing different aspects of `foreign_reader` and
`multishard_combining_reader` are designed to run with a certain minimum
shard count. Running them with any shard count below this minimum makes
them useless at best but can even fail them.
Refuse to run these tests when the shard count is below the required
minimum to avoid an accidental and unnecessary investigation into a
false-positive test failure.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d24159415b6a9d74eafb8355b6e3fba98c1ff7ff.1530274392.git.bdenes@scylladb.com>
2018-07-01 12:44:41 +03:00
Avi Kivity
f73340e6f8 Merge "Index reader and associated types clean-up." from Vladimir
"
This patchset paves the way for reading SSTables 3.x index files.
It aims at streamlining and tidying up the existing index_reader and
helpers and brings no functional or high-level changes.

In v3:
  - do not capture 'found' and just return 'true' in the continuation
    inside advance_and_check_if_present()
  - split code that makes the use of advance_upper_past() internal-only
    into two commits for better readability

GitHub URL: https://github.com/argenet/scylla/tree/projects/sstables-30/index_reader_cleanup/v3

Tests: unit {release}

Performance tests (perf_fast_forward) did not reveal any noticeable
changes. The complete output is below.

========================================
Original code (before the patchset)
========================================
running: large-partition-skips
Testing scanning large partition with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
read    skip      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
1       0         0.336514   1000000    2971642   1000     126956      35       0        0        0        0        0        0        0  99.5%
1       1         1.411239    500000     354299    993     127056       2       0        0        1        1        0        0        0  99.9%
1       8         0.464468    111112     239224    993     127056       2       0        0        1        1        0        0        0  99.8%
1       16        0.330490     58824     177990    993     127056      12       0        0        1        1        0        0        0  99.7%
1       32        0.257010     30304     117910    993     127056      15       0        0        1        1        0        0        0  99.7%
1       64        0.213650     15385      72010    997     127072     268       0        0        3        3        0        0        0  99.5%
1       256       0.159498      3892      24402    993     127056     245       0        0        1        1        0        0        0  95.5%
1       1024      0.088678       976      11006    993     127056     347       0        0        1        1        0        0        0  63.4%
1       4096      0.082627       245       2965    649      22452     389     252        0        1        1        0        0        0  20.0%
64      1         0.411080    984616    2395191   1059     127056      57       1        0        1        1        0        0        0  99.1%
64      8         0.390130    888896    2278461    993     127056       2       0        0        1        1        0        0        0  99.8%
64      16        0.369033    800000    2167828    993     127056       3       0        0        1        1        0        0        0  99.8%
64      32        0.338126    666688    1971714    993     127056      10       0        0        1        1        0        0        0  99.7%
64      64        0.297335    500032    1681711    997     127072      18       0        0        3        3        0        0        0  99.7%
64      256       0.199420    200000    1002910    993     127056     211       0        0        1        1        0        0        0  99.5%
64      1024      0.113953     58880     516704    993     127056     284       0        0        1        1        0        0        0  64.1%
64      4096      0.094596     15424     163051    687      23684     415     248        0        1        1        0        0        0  23.7%

running: large-partition-slicing
Testing slicing of large partition:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000586         1       1706      3        164       2       1        0        1        1        0        0        0   9.0%
0       32        0.000587        32      54539      3        164       2       1        0        1        1        0        0        0   9.9%
0       256       0.000688       256     372343      4        196       2       1        0        1        1        0        0        0  20.7%
0       4096      0.004320      4096     948185     19        676      10       1        0        1        1        0        0        0  36.7%
500000  1         0.000882         1       1134      5        228       3       2        0        1        1        0        0        0  14.3%
500000  32        0.000881        32      36321      5        228       3       2        0        1        1        0        0        0  14.3%
500000  256       0.000961       256     266386      6        260       3       2        0        1        1        0        0        0  21.9%
500000  4096      0.003127      4096    1309805     21        740      14       2        0        1        1        0        0        0  54.0%

running: large-partition-slicing-clustering-keys
Testing slicing of large partition using clustering keys:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000639         1       1564      3        164       2       0        0        1        1        0        0        0  13.9%
0       32        0.000626        32      51154      3        164       2       0        0        1        1        0        0        0  15.3%
0       256       0.000716       256     357560      4        168       2       0        0        1        1        0        0        0  23.1%
0       4096      0.003681      4096    1112743     16        680       8       1        0        1        1        0        0        0  38.5%
500000  1         0.000966         1       1035      4        424       3       2        0        1        1        0        0        0  12.4%
500000  32        0.000911        32      35121      5        296       3       1        0        1        1        0        0        0  13.1%
500000  256       0.000978       256     261645      5        296       3       1        0        1        1        0        0        0  19.1%
500000  4096      0.003155      4096    1298139     11        744       6       1        0        1        1        0        0        0  44.5%

running: large-partition-slicing-single-key-reader
Testing slicing of large partition, single-partition reader:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000756         1       1323      4        484       2       0        0        1        1        0        0        0  11.3%
0       32        0.000625        32      51174      3        164       2       0        0        1        1        0        0        0  15.5%
0       256       0.000705       256     363337      4        196       2       0        0        1        1        0        0        0  24.3%
0       4096      0.003603      4096    1136829     16        900       8       1        0        1        1        0        0        0  44.4%
500000  1         0.000880         1       1136      5        228       3       3        0        1        1        0        0        0  12.6%
500000  32        0.000882        32      36268      5        228       3       1        0        1        1        0        0        0  14.0%
500000  256       0.000965       256     265178      6        260       3       1        0        1        1        0        0        0  20.8%
500000  4096      0.003098      4096    1322024     21        740      14       2        0        1        1        0        0        0  54.6%

running: large-partition-select-few-rows
Testing selecting few rows from a large partition:
stride  rows      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
1000000 1         0.000631         1       1585      3        164       2       2        0        1        1        0        0        0  15.2%
500000  2         0.000873         2       2291      5        228       3       2        0        1        1        0        0        0  13.2%
250000  4         0.001404         4       2850      9        356       5       4        0        1        1        0        0        0  11.9%
125000  8         0.002878         8       2779     21        740      13       8        0        1        1        0        0        0  15.5%
62500   16        0.005184        16       3087     41       1380      25      16        0        1        1        0        0        0  19.3%
2       500000    0.948899    500000     526926   1040     127056      39       0        0        1        1        0        0        0  99.9%

running: large-partition-forwarding
Testing forwarding with clustering restriction in a large partition:
pk-scan   time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
yes       0.001813         2       1103     11       1380       3       8        0        1        1        0        0        0  18.5%
no        0.000922         2       2170      5        228       3       1        0        1        1        0        0        0  14.1%

running: small-partition-skips
Testing scanning small partitions with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
   read    skip      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         1.023396   1000000     977139   1104     139668      12       0        0        2        2        0        0        0  99.7%
-> 1       1         2.176794    500000     229696   6200     177660    5109       0        0     5108     7679        0        0        0  69.9%
-> 1       8         1.130179    111112      98314   6200     177660    5109       0        0     5108     9647        0        0        0  41.5%
-> 1       16        0.972022     58824      60517   6200     177660    5109       0        0     5108     9913        0        0        0  32.0%
-> 1       32        0.880783     30304      34406   6201     177664    5110       0        0     5108    10057        0        0        0  25.2%
-> 1       64        0.829019     15385      18558   6199     177656    5108       0        0     5107    10135        0        0        0  20.4%
-> 1       256       2.248487      3892       1731   5028     168948    3937       0        0     3936     7801        0        0        0   4.6%
-> 1       1024      0.342806       976       2847   2076     146948     985     105        0      984     1955        0        0        0   9.3%
-> 1       4096      0.088605       245       2765    739      18152     492     246        0      247      490        0        0        0  11.1%
-> 64      1         1.796715    984616     548009   6274     177660    5120       0        0     5108     5187        0        0        0  63.1%
-> 64      8         1.688994    888896     526287   6200     177660    5109       0        0     5108     5674        0        0        0  61.2%
-> 64      16        1.593196    800000     502135   6200     177660    5109       0        0     5108     6143        0        0        0  58.7%
-> 64      32        1.438651    666688     463412   6200     177660    5109       0        0     5108     6807        0        0        0  56.5%
-> 64      64        1.290205    500032     387560   6200     177660    5109       0        0     5108     7660        0        0        0  49.2%
-> 64      256       2.136466    200000      93613   5252     170616    4161       0        0     4160     6267        0        0        0  13.8%
-> 64      1024      0.388871     58880     151413   2317     148784    1226     107        0     1225     1844        0        0        0  23.4%
-> 64      4096      0.107253     15424     143809    807      19100     562     244        0      321      482        0        0        0  24.2%

running: small-partition-slicing
Testing slicing small partitions:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.002773         1        361      3         68       2       0        0        1        1        0        0        0  10.5%
0       32        0.002905        32      11015      3         68       2       0        0        1        1        0        0        0  11.6%
0       256       0.003170       256      80764      4        104       2       0        0        1        1        0        0        0  17.8%
0       4096      0.008125      4096     504095     20        616      11       1        0        1        1        0        0        0  54.1%
500000  1         0.002914         1        343      3         72       2       0        0        1        2        0        0        0  10.7%
500000  32        0.002967        32      10786      3         72       2       0        0        1        2        0        0        0  12.6%
500000  256       0.003338       256      76685      5        112       3       0        0        2        2        0        0        0  17.4%
500000  4096      0.008495      4096     482141     21        624      12       1        0        2        2        0        0        0  52.3%

========================================
With the patchset
========================================

running: large-partition-skips
Testing scanning large partition with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
read    skip      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
1       0         0.340110   1000000    2940229   1000     126956      42       0        0        0        0        0        0        0  97.5%
1       1         1.401352    500000     356798    993     127056       2       0        0        1        1        0        0        0  99.9%
1       8         0.463124    111112     239918    993     127056       2       0        0        1        1        0        0        0  99.8%
1       16        0.330050     58824     178228    993     127056      11       0        0        1        1        0        0        0  99.7%
1       32        0.255981     30304     118384    993     127056       8       0        0        1        1        0        0        0  99.7%
1       64        0.215160     15385      71505    997     127072     263       0        0        3        3        0        0        0  99.4%
1       256       0.159702      3892      24370    993     127056     239       0        0        1        1        0        0        0  95.6%
1       1024      0.094403       976      10339    993     127056     298       0        0        1        1        0        0        0  58.9%
1       4096      0.082501       245       2970    649      22452     391     252        0        1        1        0        0        0  20.1%
64      1         0.415227    984616    2371272   1059     127056      52       1        0        1        1        0        0        0  99.3%
64      8         0.391556    888896    2270166    993     127056       2       0        0        1        1        0        0        0  99.8%
64      16        0.372075    800000    2150102    993     127056       4       0        0        1        1        0        0        0  99.7%
64      32        0.337454    666688    1975641    993     127056      15       0        0        1        1        0        0        0  99.7%
64      64        0.296345    500032    1687333    997     127072      21       0        0        3        3        0        0        0  99.7%
64      256       0.199221    200000    1003911    993     127056     204       0        0        1        1        0        0        0  99.4%
64      1024      0.118224     58880     498037    993     127056     275       0        0        1        1        0        0        0  61.8%
64      4096      0.095098     15424     162191    687      23684     417     248        0        1        1        0        0        0  23.7%

running: large-partition-slicing
Testing slicing of large partition:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000585         1       1709      3        164       2       1        0        1        1        0        0        0  10.7%
0       32        0.000589        32      54353      3        164       2       1        0        1        1        0        0        0  10.0%
0       256       0.000688       256     372293      4        196       2       1        0        1        1        0        0        0  20.7%
0       4096      0.004336      4096     944562     19        676      10       1        0        1        1        0        0        0  36.9%
500000  1         0.000877         1       1140      5        228       3       2        0        1        1        0        0        0  13.6%
500000  32        0.000883        32      36222      5        228       3       2        0        1        1        0        0        0  14.4%
500000  256       0.000963       256     265804      6        260       3       2        0        1        1        0        0        0  22.0%
500000  4096      0.003008      4096    1361779     21        740      17       2        0        1        1        0        0        0  56.7%

running: large-partition-slicing-clustering-keys
Testing slicing of large partition using clustering keys:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000623         1       1604      3        164       2       0        0        1        1        0        0        0  13.9%
0       32        0.000624        32      51261      3        164       2       0        0        1        1        0        0        0  14.7%
0       256       0.000714       256     358484      4        168       2       0        0        1        1        0        0        0  22.6%
0       4096      0.003687      4096    1110990     16        680       8       1        0        1        1        0        0        0  38.6%
500000  1         0.000973         1       1028      4        424       3       2        0        1        1        0        0        0  12.1%
500000  32        0.000914        32      35022      5        296       3       1        0        1        1        0        0        0  12.8%
500000  256       0.000986       256     259646      5        296       3       1        0        1        1        0        0        0  19.7%
500000  4096      0.003155      4096    1298122     11        744       6       1        0        1        1        0        0        0  44.5%

running: large-partition-slicing-single-key-reader
Testing slicing of large partition, single-partition reader:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000766         1       1305      4        484       2       0        0        1        1        0        0        0  12.2%
0       32        0.000626        32      51111      3        164       2       0        0        1        1        0        0        0  15.2%
0       256       0.000710       256     360563      4        196       2       0        0        1        1        0        0        0  25.2%
0       4096      0.003963      4096    1033440     16        900       8       1        0        1        1        0        0        0  40.2%
500000  1         0.000877         1       1141      5        228       3       1        0        1        1        0        0        0  12.7%
500000  32        0.000882        32      36272      5        228       3       1        0        1        1        0        0        0  14.2%
500000  256       0.000959       256     266937      6        260       3       1        0        1        1        0        0        0  21.1%
500000  4096      0.003103      4096    1319992     21        740      14       2        0        1        1        0        0        0  53.9%

running: large-partition-select-few-rows
Testing selecting few rows from a large partition:
stride  rows      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
1000000 1         0.000631         1       1586      3        164       2       2        0        1        1        0        0        0  13.8%
500000  2         0.000872         2       2295      5        228       3       2        0        1        1        0        0        0  13.4%
250000  4         0.001483         4       2698      9        356       5       4        0        1        1        0        0        0  11.2%
125000  8         0.002894         8       2764     21        740      13       8        0        1        1        0        0        0  15.6%
62500   16        0.005182        16       3087     41       1380      25      16        0        1        1        0        0        0  19.5%
2       500000    0.942943    500000     530255   1040     127056      38       0        0        1        1        0        0        0  99.9%

running: large-partition-forwarding
Testing forwarding with clustering restriction in a large partition:
pk-scan   time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
yes       0.001807         2       1107     11       1380       3       8        0        1        1        0        0        0  18.9%
no        0.000924         2       2165      5        228       3       1        0        1        1        0        0        0  14.1%

running: small-partition-skips
Testing scanning small partitions with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
   read    skip      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         1.009953   1000000     990145   1104     139668      11       0        0        2        2        0        0        0  99.7%
-> 1       1         2.213846    500000     225851   6200     177660    5109       0        0     5108     7679        0        0        0  70.3%
-> 1       8         1.150029    111112      96617   6200     177660    5109       0        0     5108     9647        0        0        0  42.3%
-> 1       16        0.989438     58824      59452   6200     177660    5109       0        0     5108     9913        0        0        0  33.2%
-> 1       32        0.891590     30304      33989   6201     177664    5110       0        0     5108    10057        0        0        0  26.4%
-> 1       64        0.840952     15385      18295   6199     177656    5108       0        0     5107    10135        0        0        0  21.6%
-> 1       256       2.247875      3892       1731   5028     168948    3937       0        0     3936     7801        0        0        0   5.0%
-> 1       1024      0.345917       976       2821   2076     146948     985     105        0      984     1955        0        0        0  10.0%
-> 1       4096      0.088806       245       2759    739      18152     492     246        0      247      490        0        0        0  11.6%
-> 64      1         1.821995    984616     540406   6274     177660    5119       0        0     5108     5187        0        0        0  63.9%
-> 64      8         1.715052    888896     518291   6200     177660    5109       0        0     5108     5674        0        0        0  61.9%
-> 64      16        1.620385    800000     493710   6200     177660    5109       0        0     5108     6143        0        0        0  59.4%
-> 64      32        1.464497    666688     455233   6200     177660    5109       0        0     5108     6807        0        0        0  56.9%
-> 64      64        1.311386    500032     381300   6200     177660    5109       0        0     5108     7660        0        0        0  50.0%
-> 64      256       2.153954    200000      92853   5252     170616    4161       0        0     4160     6267        0        0        0  14.3%
-> 64      1024      0.350275     58880     168097   2317     148784    1226     107        0     1225     1844        0        0        0  27.5%
-> 64      4096      0.107498     15424     143482    807      19100     562     244        0      321      482        0        0        0  24.5%

running: small-partition-slicing
Testing slicing small partitions:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.002872         1        348      3         68       2       0        0        1        1        0        0        0  10.2%
0       32        0.002833        32      11297      3         68       2       0        0        1        1        0        0        0  12.1%
0       256       0.003145       256      81404      4        104       2       0        0        1        1        0        0        0  17.9%
0       4096      0.008110      4096     505079     20        616      12       1        0        1        1        0        0        0  54.4%
500000  1         0.002934         1        341      3         72       2       1        0        1        2        0        0        0  10.6%
500000  32        0.002871        32      11145      3         72       2       0        0        1        2        0        0        0  12.0%
500000  256       0.003216       256      79598      5        112       3       0        0        2        2        0        0        0  18.3%
500000  4096      0.008557      4096     478692     21        624      12       1        0        2        2        0        0        0  51.9%
"

* 'projects/sstables-30/index_reader_cleanup/v3' of https://github.com/argenet/scylla:
  sstables: Remove "lower_" from index_reader public methods.
  sstables: Make index_reader::advance_upper_past() method private.
  sstables: Stop using index_reader::advance_upper_past() outside the class.
  sstables: Move promoted_index_block from types.hh to index_entry.hh.
  sstables: Factor out promoted index into a separate class.
  sstables: Use std::optional instead of std::experimental optional in index_reader.
2018-07-01 12:30:29 +03:00
Botond Dénes
da53ea7a13 tests.py: add --jobs command line parameter
Allows setting the number of jobs used to run the tests.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d58d6393c6271bffc37ab3b5edc37b00ef485d9c.1529433590.git.bdenes@scylladb.com>
2018-07-01 12:26:41 +03:00
Avi Kivity
db2c029f7a dht: add i_partitioner::sharding_ignore_msb()
While the sharding algorithm is exposed (as cpu_sharding_algorithm_name()),
the ignore_msb parameter is not. Add a function to do that.
2018-07-01 12:17:35 +03:00
Vladimir Krivopalov
b24eb5c11d sstables: Remove "lower_" from index_reader public methods.
The index_reader class public interface has been amended to only deal
with the upper bound cursor along with advancing the lower bound.
Since the class users can only explicitly operate with the lower bound
cursor (take data file position, advance to the next partition, etc), it
no longer makes sense to specify that the method operates on the lower
bound cursor in its name.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-29 11:48:33 -07:00
Vladimir Krivopalov
30109a693b sstables: Make index_reader::advance_upper_past() method private.
No changes made to the code except that it is moved around.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-29 11:47:48 -07:00
Vladimir Krivopalov
80d1d5017f sstables: Stop using index_reader::advance_upper_past() outside the class.
The only case when it needs to be called is when an index_reader is
advanced to a specific partition as part of sstable_reader
initialisation.

Instead, we're passing an optional upper_bound parameter that is used to
call advance_upper_past() internally if partition is found.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-29 11:47:20 -07:00
Duarte Nunes
0db5419ec5 Merge 'Avoid copies when unfreezing frozen_mutation' from Paweł
"
When a frozen mutation gets deserialised, the current implementation copies
its value 3 times: from the IDL buffer to a bytes object, from the bytes object
to an atomic_cell, and then the atomic_cell is copied again. Moreover, the value
gets linearised, which may cause a large allocation.

All of that is very wasteful. This patch devirtualises and reworks IDL
reading code so that when used with partition_builder the cell value is
copied only once and without linearisation: from the IDL buffer to the
final atomic_cell.

perf_simple_query -c4, medians of 30 results:
        ./perf_before  ./perf_after  diff
 read       310576.54     316273.90  1.8%
 write      359913.15     375579.44  4.4%

microbenchmark, perf_idl:

BEFORE
test                                      iterations      median         mad         min         max
frozen_mutation.freeze_one_small_row         2142435   462.431ns     0.125ns   462.306ns   467.659ns
frozen_mutation.unfreeze_one_small_row       1640949   601.422ns     0.082ns   601.340ns   605.279ns
frozen_mutation.apply_one_small_row          1538969   645.993ns     0.405ns   645.588ns   656.510ns

AFTER
test                                      iterations      median         mad         min         max
frozen_mutation.freeze_one_small_row         2139548   455.525ns     0.631ns   454.894ns   456.707ns
frozen_mutation.unfreeze_one_small_row       1760139   566.157ns     0.003ns   566.153ns   584.339ns
frozen_mutation.apply_one_small_row          1582050   610.951ns     0.060ns   610.891ns   613.044ns

Tests: unit(release)
"

* tag 'avoid-copy-unfreeze/v2' of https://github.com/pdziepak/scylla:
  mutation_partition_view: use column_mapping_entry::is_atomic()
  schema: column_mapping_entry: cache abstract_type::is_atomic()
  schema: column_mapping_entry: reduce logic duplication
  mutation_partition_view: do not linearise or copy cell value
  atomic_cell: allow passing value via ser::buffer_view
  mutation_partition_view: pass cell by value to visitor
  mutation_partition_view: devirtualise accept()
  storage_proxy: use mutation_partition_view::{first, last}_row_key()
  mutation_partition_view: add last_row_key() and first_row_key() getters
2018-06-28 22:55:20 +01:00
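The "diff" column in the perf_simple_query summary above is consistent with a plain relative change between the two medians. A minimal sketch of that arithmetic, assuming the column is computed as (after - before) / before; the helper name `percent_change` is made up for illustration:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Medians quoted in the merge message (perf_simple_query -c4, 30 runs each).
read_before, read_after = 310576.54, 316273.90
write_before, write_after = 359913.15, 375579.44

print(f"read  {percent_change(read_before, read_after):.1f}%")    # read  1.8%
print(f"write {percent_change(write_before, write_after):.1f}%")  # write 4.4%
```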
Paweł Dziepak
c45e291084 mutation_partition_view: use column_mapping_entry::is_atomic() 2018-06-28 22:16:42 +01:00
Paweł Dziepak
6c54a97320 schema: column_mapping_entry: cache abstract_type::is_atomic()
IDL deserialisation code calls is_atomic() for each cell. An additional
indirection and a virtual call can be avoided by caching that value in
column_mapping_entry. There is already very similar optimisation done
for column_definitions.
2018-06-28 22:16:42 +01:00
Paweł Dziepak
2bfdc2d781 schema: column_mapping_entry: reduce logic duplication
User-defined constructors often make it more likely that a careless
developer will forget to update one of them when adding a new member to
a structure. The risk of that happening can be reduced by reducing code
duplication with delegating constructors.
2018-06-28 22:16:42 +01:00
Paweł Dziepak
199f9196e9 mutation_partition_view: do not linearise or copy cell value 2018-06-28 22:11:19 +01:00
Paweł Dziepak
92700c6758 atomic_cell: allow passing value via ser::buffer_view 2018-06-28 22:11:19 +01:00
Paweł Dziepak
bf330a99f0 mutation_partition_view: pass cell by value to visitor
mutation_partition_view needs to create an atomic_cell from
IDL-serialised data. Then that cell is passed to the visitor. However,
because generic mutation_partition_visitor interface was used, the cell
was passed by constant reference which forced the visitor to needlessly
copy it.

This patch takes advantage of the fact that mutation_partition_view is
devirtualised now and adjusts the interfaces of its visitors so that the
cell can be passed without copying.
2018-06-28 22:11:19 +01:00
Paweł Dziepak
569176aad1 mutation_partition_view: devirtualise accept()
There are only two types of visitors used and only one of them appears
in the hot path. They can be devirtualised without too much effort,
which also enables future custom interface specialisations specific to
mutation_partition_views and its users, not necessarily in the scope of
more general mutation_partition_visitor.
2018-06-28 22:11:19 +01:00
Paweł Dziepak
6bd71015e7 storage_proxy: use mutation_partition_view::{first, last}_row_key() 2018-06-28 22:11:19 +01:00
Paweł Dziepak
2259eee97c mutation_partition_view: add last_row_key() and first_row_key() getters
Some users (e.g. reconciliation code) need only to know the clustering
key of the first or the last row in the partition. This was done with a
full visitor visiting every single cell of the partition, which is very
wasteful. This patch adds direct getters for the needed information.
2018-06-28 22:11:19 +01:00
Vladimir Krivopalov
a497edcbda sstables: Move promoted_index_block from types.hh to index_entry.hh.
It is only being used by index_reader internally and never exposed so
should not be listed in commonly used types.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-28 12:28:59 -07:00
Vladimir Krivopalov
81fba73e9d sstables: Factor out promoted index into a separate class.
An index entry may or may not have a promoted index. All the optional
fields are better scoped under the same class to avoid lots of separate
optional fields and give better representation.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-28 12:28:59 -07:00
Asias He
bb4d361cf6 storage_service: Limit number of REPLICATION_FINISHED verb can retry
In the removenode operation, if the message servicing is stopped, e.g., due
to disk io error isolation, the node can keep retrying the
REPLICATION_FINISHED verb infinitely.

A Scylla log full of such messages was observed:

[shard 0] storage_service - Fail to send REPLICATION_FINISHED to $IP:0:
seastar::rpc::closed_error (connection is closed)

To fix, limit the number of retries.

Tests: update_cluster_layout_tests.py

Fixes #3542

Message-Id: <638d392d6b39cc2dd2b175d7f000e7fb1d474f87.1529927816.git.asias@scylladb.com>
2018-06-28 19:54:01 +01:00
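The idea of bounding the retries can be sketched as follows. This is an illustration only, not the storage_service code (which is C++); the function name `send_with_retry_limit`, its parameters, and the use of `ConnectionError` to stand in for a closed RPC connection are all assumptions:

```python
from typing import Callable


def send_with_retry_limit(send: Callable[[], None], max_retries: int) -> bool:
    """Call `send` until it succeeds or `max_retries` attempts have failed.

    Returns True on success and False once the retry budget is exhausted,
    so the caller can give up instead of looping forever.
    """
    for attempt in range(1, max_retries + 1):
        try:
            send()
            return True
        except ConnectionError as exc:
            # Hypothetical log line, loosely modelled on the message quoted above.
            print(f"Fail to send (attempt {attempt}/{max_retries}): {exc}")
    return False


def always_closed() -> None:
    """Stand-in for a peer whose connection is permanently closed."""
    raise ConnectionError("connection is closed")


print(send_with_retry_limit(always_closed, max_retries=3))
```

With an unbounded loop the same failing peer would produce log lines indefinitely; here the caller gets a definite failure after three attempts.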
Paweł Dziepak
e9dffc753c tests/mutation: test external_memory_usage() 2018-06-28 19:20:23 +01:00
Paweł Dziepak
8153df7684 atomic_cell: add external chunks and overheads to external_memory_usage() 2018-06-28 19:20:23 +01:00
Paweł Dziepak
2dc78a6ca2 data::cell: expose size overhead of external chunks 2018-06-28 18:01:17 +01:00
Paweł Dziepak
6adc78d690 imr::utils::object: expose size overhead 2018-06-28 18:01:17 +01:00
Paweł Dziepak
e69f2c361c tests/mutation: properly mark atomic_cells that are collection members 2018-06-28 18:00:39 +01:00
Takuya ASADA
972ce88601 dist/common/scripts/scylla_setup: allow input multiple disk paths on RAID disk prompt
Allow "/dev/sda1,/dev/sdb1" style input on RAID disk prompt.
2018-06-29 01:37:19 +09:00
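A minimal sketch of how such comma-separated input could be split into individual device paths. The helper `parse_disk_input` is hypothetical and is not taken from scylla_setup:

```python
from typing import List


def parse_disk_input(line: str) -> List[str]:
    """Split a comma-separated prompt answer into device paths.

    Surrounding whitespace is stripped and empty items are dropped.
    """
    return [part.strip() for part in line.split(",") if part.strip()]


print(parse_disk_input("/dev/sda1,/dev/sdb1"))  # ['/dev/sda1', '/dev/sdb1']
```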
Takuya ASADA
a83c66b402 dist/common/scripts/scylla_raid_setup: skip constructing RAID0 when only one disk specified
When only one disk is specified, create XFS directly on the disk instead of
creating a RAID0 volume on it.
2018-06-29 01:37:19 +09:00
Takuya ASADA
99fb754221 dist/common/scripts/scylla_raid_setup: fix module import
sys module was missing, import it.

Fixes #3548
2018-06-29 01:37:19 +09:00
Takuya ASADA
f2132c61bd dist/common/scripts/scylla_setup: check disk is used in MDRAID
Check whether a disk is used in MDRAID by reading /proc/mdstat.
2018-06-29 01:37:19 +09:00
Takuya ASADA
daccc10a06 dist/common/scripts/scylla_setup: move unmasking scylla-fstrim.timer on scylla_fstrim_setup
Currently, enabling scylla-fstrim.timer is part of 'enable-service', so it
is enabled even when --no-fstrim-setup is specified (or 'No' is entered at the
interactive setup prompt).

To honour --no-fstrim-setup, enable scylla-fstrim.timer in
scylla_fstrim_setup instead of in the enable-service part of scylla_setup.

Fixes #3248
2018-06-29 01:37:19 +09:00
Takuya ASADA
fa6db21fea dist/common/scripts/scylla_setup: use print() instead of logging.error()
Align with the other scripts: use print().
2018-06-29 01:37:19 +09:00
Takuya ASADA
2401115e14 dist/common/scripts/scylla_setup: implement do_verify_package() for Gentoo Linux
Implement Gentoo Linux support on scylla_setup.
2018-06-29 01:37:19 +09:00
Takuya ASADA
9d537cb449 dist/common/scripts/scylla_coredump_setup: run os.remove() when deleting directory is symlink
Since shutil.rmtree() raises an exception when run on a symlink, we need
to check whether the path is a symlink and run os.remove() when it is.

Fixes #3544
2018-06-29 01:37:19 +09:00
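The check described above can be sketched like this. The helper name `remove_path` is an assumption for illustration, not the function used in scylla_coredump_setup:

```python
import os
import shutil


def remove_path(path: str) -> None:
    """Remove `path` whether it is a symlink or a real directory.

    shutil.rmtree() refuses to operate on a symbolic link, so a link is
    removed with os.remove(), which deletes the link itself and leaves
    its target untouched.
    """
    if os.path.islink(path):
        os.remove(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)
```

Testing `os.path.islink()` first matters: `os.path.isdir()` follows symlinks, so a link to a directory would otherwise be passed to `shutil.rmtree()`.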
Takuya ASADA
5b4da4d4bd dist/common/scripts/scylla_setup: don't include the disk on unused list when it contains partitions
The current implementation checks whether a partition is mounted, but
a disk that contains that partition is still marked as unused.
To avoid the problem, skip any disk that contains partitions.

Fixes #3545
2018-06-29 01:37:19 +09:00
Takuya ASADA
83bc72b0ab dist/common/scripts/scylla_setup: skip running rest of the check when the disk detected as used
There is no need to run the remaining checks once the disk has been detected as used.
2018-06-29 01:37:19 +09:00
Takuya ASADA
1650d37dae dist/common/scripts/scylla_setup: add a disk to selected list correctly
When a disk path is typed at the RAID setup prompt, the script mistakenly
splits the input into individual characters,
like ['/', 'd', 'e', 'v', '/', 's', 'd', 'b'].

To fix the issue we need to use selected.append() instead of
selected +=.

See #3545
2018-06-29 01:37:19 +09:00
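The difference between the two operations is standard Python list behaviour and can be reproduced in isolation. The variable names below are illustrative only:

```python
# `+=` on a list extends it with every element of the right-hand iterable.
# A str is an iterable of characters, so the path is split apart.
split_by_char = []
split_by_char += "/dev/sdb"
print(split_by_char)  # ['/', 'd', 'e', 'v', '/', 's', 'd', 'b']

# append() adds its argument as a single element, which is what is wanted.
whole_path = []
whole_path.append("/dev/sdb")
print(whole_path)  # ['/dev/sdb']
```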
Takuya ASADA
4b5826ff5a dist/common/scripts/scylla_setup: fix wrong indent
list_block_devices() should return 'devices' whether or not re.match()
matches.
2018-06-29 01:37:19 +09:00
Takuya ASADA
f828c5c4f3 dist/common/scripts: sync instance type list for detect NIC type to latest one
The current instance type list is outdated; sync it with the latest table from:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#enabling_enhanced_networking

Fixes #3536
2018-06-29 01:37:19 +09:00
Takuya ASADA
6cffb164d6 dist/common/scripts: verify systemd unit existence using 'systemctl cat'
Verify unit existence by running 'systemctl cat {}' silently, and raise an
exception if the unit doesn't exist.
2018-06-29 01:37:19 +09:00
Vladimir Krivopalov
82f76b0947 Use std::reference_wrapper instead of a plain reference in bound_view.
The presence of a plain reference prohibits the bound_view class from
being copyable. The trick employed to work around that was to use
'placement new' for copy-assigning bound_view objects, but this approach
is ill-formed and causes undefined behaviour for classes that have const
and/or reference members.

The solution is to use a std::reference_wrapper instead.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <a0c951649c7aef2f66612fc006c44f8a33713931.1530113273.git.vladimir@scylladb.com>
2018-06-28 11:24:06 +01:00
Avi Kivity
c87a961667 Merge "Add multishard_writer support" from Asias
"
We need a multishard_writer which gets mutation fragments from a producer
(e.g., from the network using the rpc streaming) and consumes the mutation
fragments with a consumer (e.g., write to sstable).

The multishard_writer will take care of the mutation fragments that do not
belong to the current shard.

This multishard_writer will be used in the new scylla streaming.
"

* 'asias/multishard_writer_v10.1' of github.com:scylladb/seastar-dev:
  tests: Add multishard_writer_test to test.py
  tests: Add test for multishard_writer
  multishard_writer: Introduce multishard_writer
  tests: Allow random_mutation_generator to generate mutations belong to remote shard
2018-06-28 12:36:55 +03:00
Asias He
fd8b7efb99 tests: Add multishard_writer_test to test.py
For multishard_writer class testing.
2018-06-28 17:20:29 +08:00
Asias He
4050a4b24e tests: Add test for multishard_writer 2018-06-28 17:20:29 +08:00
Asias He
f4b406cce1 multishard_writer: Introduce multishard_writer
The multishard_writer class gets mutation_fragments generated from
flat_mutation_reader and consumes the mutation_fragments with
multishard_writer::_consumer. If the mutation_fragment does not belong to the
shard multishard_writer is on, it will forward the mutation_fragment to the
correct shard. Future returned by multishard_writer() becomes ready
when all the mutation_fragments are consumed.

Tests: tests/multishard_writer_test.cc
Tests: dtest update_cluster_layout_tests.py

Fixes #3497
2018-06-28 17:20:28 +08:00
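The forwarding behaviour described above can be pictured with a toy routing function. This is a sketch only: the real multishard_writer is C++ built on Seastar, and both `shard_of` (a simple modulo rule) and `route_fragments` are hypothetical stand-ins that do not reflect Scylla's actual sharding algorithm:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shard_of(token: int, shard_count: int) -> int:
    """Hypothetical placement rule for illustration; not Scylla's algorithm."""
    return token % shard_count


def route_fragments(
    fragments: Iterable[Tuple[int, str]], shard_count: int
) -> Dict[int, List[str]]:
    """Group (token, payload) fragments by the shard that owns each token.

    Fragments keep their arrival order within each shard's list.
    """
    per_shard: Dict[int, List[str]] = defaultdict(list)
    for token, payload in fragments:
        per_shard[shard_of(token, shard_count)].append(payload)
    return dict(per_shard)


print(route_fragments([(0, "a"), (1, "b"), (2, "c"), (4, "d")], shard_count=2))
# {0: ['a', 'c', 'd'], 1: ['b']}
```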
Asias He
8eccff1723 tests: Allow random_mutation_generator to generate mutations belong to remote shard
- make_local_keys returns keys of current shard
- make_keys returns keys of current or remote shard
2018-06-28 17:20:28 +08:00
Asias He
27cb41ddeb range_streamer: Use float for time took for stream
It is useful when the total time to stream is small, e.g., 2.0 seconds
and 2.9 seconds. Showing the time as an integer number of seconds is not
accurate in such cases.

Message-Id: <d801b57279981c72acb907ad4b0190ba4d938a3d.1530175052.git.asias@scylladb.com>
2018-06-28 11:39:14 +03:00
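The loss of precision is easy to see with the two durations mentioned in the message. The helper `format_stream_time` and its two-decimal format are assumptions for illustration, not the range_streamer log format:

```python
def format_stream_time(elapsed_seconds: float) -> str:
    """Format a duration while keeping the fractional part."""
    return f"{elapsed_seconds:.2f} seconds"


for elapsed in (2.0, 2.9):
    # int() truncates, so both durations collapse to the same whole number.
    print(f"integer: {int(elapsed)} seconds, float: {format_stream_time(elapsed)}")
```

Reported as integers, both runs show 2 seconds even though one took almost 50% longer.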
Vladimir Krivopalov
fc629b9ca6 sstables: Use std::optional instead of std::experimental optional in index_reader.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-27 16:47:53 -07:00
Tomasz Grabiec
0a1aec2bd6 tests: perf_row_cache_update: Test with an active reader surviving memtable flush
Exposes latency issues caused by mutation_cleaner lifetime issues,
fixed by earlier commits.
2018-06-27 21:51:04 +02:00
Tomasz Grabiec
074be4d4e8 memtable, cache: Run mutation_cleaner worker in its own scheduling group
The worker is responsible for merging MVCC snapshots, which is similar
to merging sstables, but in memory. The new scheduling group will be
therefore called "memory compaction".

We should run it in a separate scheduling group instead of
main/memtables, so that it doesn't disrupt writes and other system
activities. It's also nice for monitoring how much CPU time we spend
on this.
2018-06-27 21:51:04 +02:00
Tomasz Grabiec
6c6ffaee71 mutation_cleaner: Make merge() redirect old instance to the new one
If a memtable snapshot goes away after the memtable started merging to
cache, it would enqueue the snapshots for cleaning on the memtable's
cleaner, which will have to clean without deferring when the memtable
is destroyed. That may stall the reactor. To avoid this, make merge()
cause the old instance of the cleaner to redirect to the new instance
(owned by cache), like we do for regions. This way the snapshots
mentioned earlier can be cleaned after memtable is destroyed,
gracefully.
2018-06-27 21:51:04 +02:00
Tomasz Grabiec
450985dfee mvcc: Use RAII to ensure that partition versions are merged
Before this patch, maybe_merge_versions() had to be manually called
before partition snapshot goes away. That is error prone and makes
client code more complicated. Delegate that task to a new
partition_snapshot_ptr object, through which all snapshots are
published now.
2018-06-27 21:51:04 +02:00
Avi Kivity
e1efda8b0c Merge "Disable sstable filtering based on min/max clustering key components" from Tomasz
"
With DateTiered and TimeWindow, there is a read optimization enabled
which excludes sstables based on overlap with recorded min/max values
of clustering key components. The problem is that it doesn't take into
account partition tombstones and static rows, which should still be
returned by the reader even if there is no overlap in the query's
clustering range. A read which returns no clustering rows can
mispopulate cache, which will appear as partition deletion or writes
to the static row being lost until node restart or eviction of the
partition entry.

There is also a bad interaction between cache population on read and
that optimization. When the clustering range of the query doesn't
overlap with any sstable, the reader will return no partition markers
for the read, which leads cache populator to assume there is no
partition in sstables and it will cache an empty partition. This will
cause later reads of that partition to miss prior writes to that
partition until it is evicted from cache or node is restarted.

Disable until a more elaborate fix is implemented.

Fixes #3552
Fixes #3553
"

* tag 'tgrabiec/disable-min-max-sstable-filtering-v1' of github.com:tgrabiec/scylla:
  tests: Add test for slicing a mutation source with date tiered compaction strategy
  tests: Check that database conforms to mutation source
  database: Disable sstable filtering based on min/max clustering key components
2018-06-27 14:28:27 +03:00
Calle Wilund
054514a47a sstables::compress: Ensure unqualified compressor name if possible
Fixes #3546

Both older origin and scylla writes "known" compressor names (i.e. those
in origin namespace) unqualified (i.e. LZ4Compressor).

This behaviour was not preserved in the virtualization change. But
probably should be.

Message-Id: <20180627110930.1619-1-calle@scylladb.com>
2018-06-27 14:16:50 +03:00
Tomasz Grabiec
d1e8c32b2e gdb: Add pretty printer for managed_vector 2018-06-27 13:07:28 +02:00
Tomasz Grabiec
b0e8547569 gdb: Add pretty printer for rows 2018-06-27 13:07:28 +02:00
Tomasz Grabiec
da19508317 gdb: Add mutation_partition pretty printer 2018-06-27 13:07:28 +02:00
Tomasz Grabiec
d485e1c1d8 gdb: Add pretty printer for partition_entry 2018-06-27 13:07:28 +02:00
Tomasz Grabiec
b51c70ef69 gdb: Add pretty printer for managed_bytes 2018-06-27 13:07:28 +02:00
Tomasz Grabiec
d76cfa77b1 gdb: Add iteration wrapper for intrusive_set_external_comparator 2018-06-27 13:07:24 +02:00
Tomasz Grabiec
aa0b41f0b2 gdb: Add iteration wrapper for boost intrusive set 2018-06-27 13:04:47 +02:00
Tomasz Grabiec
c26a304fbb mvcc: Merge partition version versions gradually in the background
When snapshots go away, typically when the last reader is destroyed,
we used to merge adjacent versions atomically. This could induce
reactor stalls if partitions were large. This is especially true for
versions created on cache update from memtables.

The solution is to allow this process to be preempted and move to the
background. mutation_cleaner keeps a linked list of such unmerged
snapshots and has a worker fiber which merges them incrementally and
asynchronously with regards to reads.

This reduces scheduling latency spikes in tests/perf_row_cache_update
for the case of large partition with many rows. For -c1 -m1G I saw
them dropping from 23ms to 2ms.
2018-06-27 12:48:30 +02:00
Tomasz Grabiec
4d3cc2867a mutation_partition: Make merging preemtable 2018-06-27 12:48:30 +02:00
Tomasz Grabiec
4995a8c568 tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots
Preparation for switching to background merging.
2018-06-27 12:48:30 +02:00
Piotr Sarna
03753cc431 database: make drop_column_family wait on reads in progress
drop_column_family now waits for both writes and reads in progress.
It solves possible liveness issues with row cache, when column_family
could be dropped prematurely, before the read request was finished.

Phaser operation is passed inside database::query() call.
There are other places where reading logic is applied (e.g. view
replicas), but these are guarded with different synchronization
mechanisms, while _pending_reads_phaser applies to regular reads only.

Fixes #3357

Reported-by: Duarte Nunes <duarte@scylladb.com>
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
Message-Id: <d58a5ee10596d0d62c765ee2114ac171b6f087d2.1529928323.git.sarna@scylladb.com>
2018-06-27 10:02:56 +01:00
Piotr Sarna
e1a867cbe3 database: add phaser for reads
Currently drop_column_family waits on write_in_progress phaser,
but there's no such mechanism for reads. This commit adds
a corresponding reads phaser.

Refs #3357

Reported-by: Duarte Nunes <duarte@scylladb.com>
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
Message-Id: <70b5fdd44efbc24df61585baef024b809cabe527.1529928323.git.sarna@scylladb.com>
2018-06-27 10:02:56 +01:00
Tomasz Grabiec
b4879206fb tests: Add test for slicing a mutation source with date tiered compaction strategy
Reproducer for https://github.com/scylladb/scylla/issues/3552
2018-06-26 18:54:44 +02:00
Tomasz Grabiec
826a237c2e tests: Check that database conforms to mutation source 2018-06-26 18:54:44 +02:00
Tomasz Grabiec
19b76bf75b database: Disable sstable filtering based on min/max clustering key components
With DateTiered and TimeWindow, there is a read optimization enabled
which excludes sstables based on overlap with recorded min/max values
of clustering key components. The problem is that it doesn't take into
account partition tombstones and static rows, which should still be
returned by the reader even if there is no overlap in the query's
clustering range. A read which returns no clustering rows can
mispopulate cache, which will appear as partition deletion or writes
to the static row being lost, until node restart or eviction of the
partition entry.

There is also a bad interaction between cache population on read and
that optimization. When the clustering range of the query doesn't
overlap with any sstable, the reader will return no partition markers
for the read, which leads cache populator to assume there is no
partition in sstables and it will cache an empty partition. This will
cause later reads of that partition to miss prior writes to that
partition until it is evicted from cache or node is restarted.

Disable until a more elaborate fix is implemented.

Fixes #3552
Fixes #3553
2018-06-26 18:54:44 +02:00
3240 changed files with 134111 additions and 47444 deletions

3
.dockerignore Normal file
View File

@@ -0,0 +1,3 @@
.git
build
seastar/build

View File

@@ -1,4 +0,0 @@
Scylla doesn't use pull-requests, please send a patch to the [mailing list](mailto:scylladb-dev@googlegroups.com) instead.
See our [contributing guidelines](../CONTRIBUTING.md) and our [Scylla development guidelines](../HACKING.md) for more information.
If you have any questions please don't hesitate to send a mail to the [dev list](mailto:scylladb-dev@googlegroups.com).

5
.gitignore vendored
View File

@@ -19,3 +19,8 @@ CMakeLists.txt.user
__pycache__CMakeLists.txt.user
.gdbinit
resources
.pytest_cache
/expressions.tokens
tags
testlog/*
test/*/*.reject

11
.gitmodules vendored
View File

@@ -1,14 +1,17 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui
url = ../scylla-swagger-ui
ignore = dirty
[submodule "dist/ami/files/scylla-ami"]
path = dist/ami/files/scylla-ami
url = ../scylla-ami
[submodule "xxHash"]
path = xxHash
url = ../xxHash
[submodule "libdeflate"]
path = libdeflate
url = ../libdeflate
[submodule "zstd"]
path = zstd
url = ../zstd

View File

@@ -97,7 +97,7 @@ scan_scylla_source_directories(
service
sstables
streaming
tests
test
thrift
tracing
transport
@@ -138,4 +138,5 @@ target_include_directories(scylla PUBLIC
${SEASTAR_INCLUDE_DIRS}
${Boost_INCLUDE_DIRS}
xxhash
libdeflate
build/release/gen)

View File

@@ -1,6 +1,6 @@
# Asking questions or requesting help
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) for general questions and help.
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.
# Reporting an issue

View File

@@ -20,11 +20,22 @@ $ git submodule update --init --recursive
Scylla depends on the system package manager for its development dependencies.
Running `./install_dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
On Ubuntu and Debian based Linux distributions, some packages
required to build Scylla are missing in the official upstream:
- libthrift-dev and libthrift
- antlr3-c++-dev
Try running ```sudo ./scripts/scylla_current_repo``` to add Scylla upstream,
and get the missing packages from it.
### Build system
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native thread, and up to 3 GB per native thread while linking.
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native
thread, and up to 3 GB per native thread while linking. GCC >= 8.1.1 is
required.
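
As a rough illustration of the memory note above, the following sketch estimates how many parallel build jobs fit in a given amount of RAM. The helper name `max_parallel_jobs` and its parameters are hypothetical and are not part of `configure.py` or Ninja; the 2 GB and 3 GB figures are simply the ones quoted in the note.

```python
def max_parallel_jobs(total_mem_gb: float, cpu_count: int, gb_per_job: float = 2.0) -> int:
    """Return a job count bounded by both memory and CPU count (never below 1).

    Hypothetical helper: gb_per_job defaults to the 2 GB per-thread compile
    estimate; pass 3.0 to model the linking phase.
    """
    by_memory = int(total_mem_gb // gb_per_job)
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    # A machine with 16 GB of RAM and 8 hardware threads.
    print("compile jobs:", max_parallel_jobs(16, 8))
    print("link jobs:", max_parallel_jobs(16, 8, gb_per_job=3.0))
```

The resulting number could then be passed to the build as `-j<N>`; treat it as a starting point rather than a guarantee, since actual memory use varies by translation unit.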
Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.
@@ -43,11 +54,9 @@ The full suite of options for project configuration is available via
$ ./configure.py --help
```
The most important options are:
The most important option is:
- `--mode={release,debug,all}`: Debug mode enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) and allows for debugging with tools like GDB. Debugging builds are generally slower and generate much larger object files than release builds.
- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
- `--enable-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.
@@ -55,6 +64,30 @@ To save time -- for instance, to avoid compiling all unit tests -- you can also
```bash
$ ninja-build build/release/tests/schema_change_test
$ ninja-build build/release/service/storage_proxy.o
```
You can also specify a single mode. For example
```bash
$ ninja-build release
```
Will build everything in release mode. The valid modes are:
* Debug: Enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)
and other sanity checks. It has no optimizations, which allows for debugging with tools like
GDB. Debugging builds are generally slower and generate much larger object files than release builds.
* Release: Fewer checks and more optimizations. It still has debug info.
* Dev: No optimizations or debug info. The objective is to compile and link as fast as possible.
This is useful for the first iterations of a patch.
Note that by default unit tests binaries are stripped so they can't be used with gdb or seastar-addr2line.
To include debug information in the unit test binary, build the test binary with a `_g` suffix. For example,
```bash
$ ninja-build build/release/tests/schema_change_test_g
```
### Unit testing
@@ -83,7 +116,7 @@ The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread
### Preparing patches
All changes to Scylla are submitted as patches to the public mailing list. Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
All changes to Scylla are submitted as patches to the public [mailing list](mailto:scylladb-dev@googlegroups.com). Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/). There are also some guidelines that can help you make the patch review process smoother:
@@ -112,6 +145,8 @@ The usual is "Tests: unit (release)", although running debug tests is encouraged
5. When answering review comments, prefer inline quotes as they make it easier to track the conversation across multiple e-mails.
6. The Linux kernel's [Submitting Patches](https://www.kernel.org/doc/html/v4.19/process/submitting-patches.html) document offers excellent advice on how to prepare patches and patchsets for review. Since the Scylla development process is derived from the kernel's, almost all of the advice there is directly applicable.
### Finding a person to review and merge your patches
You can use the `scripts/find-maintainer` script to find a subsystem maintainer and/or reviewer for your patches. The script accepts a filename in the git source tree as an argument and outputs a list of subsystems the file belongs to and their respective maintainers and reviewers. For example, if you changed the `cql3/statements/create_view_statement.hh` file, run the script as follows:
@@ -164,6 +199,29 @@ On a development machine, one might run Scylla as
$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes
```
To interact with scylla it is recommended to build our versions of
cqlsh and nodetool. They are available at
https://github.com/scylladb/scylla-tools-java and can be built with
```bash
$ sudo ./install-dependencies.sh
$ ant jar
```
cqlsh should work out of the box, but nodetool depends on a running
scylla-jmx (https://github.com/scylladb/scylla-jmx). It can be built
with
```bash
$ mvn package
```
and must be started with
```bash
$ ./scripts/scylla-jmx
```
### Branches and tags
Multiple release branches are maintained on the Git repository at https://github.com/scylladb/scylla. Release 1.5, for instance, is tracked on the `branch-1.5` branch.
@@ -254,7 +312,7 @@ In this example, `10.0.0.2` will be sent up to 16 jobs and the local machine wil
When a compilation is in progress, the status of jobs on all remote machines can be visualized in the terminal with `distccmon-text` or graphically as a GTK application with `distccmon-gnome`.
One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next section speeding up this process.
One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next sections speeding up this process.
### Using the `gold` linker
@@ -264,6 +322,24 @@ Linking Scylla can be slow. The gold linker can replace GNU ld and often speeds
$ sudo alternatives --config ld
```
### Using split dwarf
With debug info enabled, most of the link time is spent copying and
relocating it. It is possible to leave most of the debug info out of
the link by writing it to a side .dwo file. This is done by passing
`-gsplit-dwarf` to gcc.
Unfortunately just `-gsplit-dwarf` would slow down `gdb` startup. To
avoid that the gold linker can be told to create an index with
`--gdb-index`.
More info at https://gcc.gnu.org/wiki/DebugFission.
Both options can be enabled by passing `--split-dwarf` to configure.py.
Note that distcc is *not* compatible with it, but icecream
(https://github.com/icecc/icecream) is.
### Testing changes in Seastar with Scylla
Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.
@@ -277,3 +353,8 @@ $ git remote add local /home/tsmith/src/seastar
$ git remote update
$ git checkout -t local/my_local_seastar_branch
```
### Core dump debugging
Slides:
2018.11.20: https://www.slideshare.net/tomekgrabiec/scylla-core-dump-debugging-tools

View File

@@ -5,8 +5,6 @@ F: Filename, directory, or pattern for the subsystem
---
AUTH
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Calle Wilund <calle@scylladb.com>
R: Vlad Zolotarov <vladz@scylladb.com>
R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
@@ -14,22 +12,17 @@ F: auth/*
CACHE
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
R: Piotr Jastrzebski <piotr@scylladb.com>
F: row_cache*
F: *mutation*
F: tests/mvcc*
COMMITLOG / BATCHLOGa
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Calle Wilund <calle@scylladb.com>
F: db/commitlog/*
F: db/batch*
COORDINATOR
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Gleb Natapov <gleb@scylladb.com>
F: service/storage_proxy*
@@ -49,12 +42,10 @@ M: Pekka Enberg <penberg@scylladb.com>
F: cql3/*
COUNTERS
M: Paweł Dziepak <pdziepak@scylladb.com>
F: counters*
F: tests/counter_test*
GOSSIP
M: Duarte Nunes <duarte@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Asias He <asias@scylladb.com>
F: gms/*
@@ -65,14 +56,11 @@ F: dist/docker/*
LSA
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
F: utils/logalloc*
MATERIALIZED VIEWS
M: Duarte Nunes <duarte@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
R: Duarte Nunes <duarte@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
F: db/view/*
F: cql3/statements/*view*
@@ -82,14 +70,12 @@ F: dist/*
REPAIR
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Asias He <asias@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: repair/*
SCHEMA MANAGEMENT
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
F: db/schema_tables*
F: db/legacy_schema_migrator*
@@ -98,15 +84,13 @@ F: schema*
SECONDARY INDEXES
M: Pekka Enberg <penberg@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
R: Pekka Enberg <penberg@scylladb.com>
F: db/index/*
F: cql3/statements/*index*
SSTABLES
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Raphael S. Carvalho <raphaelsc@scylladb.com>
R: Glauber Costa <glauber@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
@@ -114,18 +98,17 @@ F: sstables/*
STREAMING
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Asias He <asias@scylladb.com>
F: streaming/*
F: service/storage_service.*
THRIFT TRANSPORT LAYER
M: Duarte Nunes <duarte@scylladb.com>
F: thrift/*
ALTERNATOR
M: Nadav Har'El <nyh@scylladb.com>
F: alternator/*
F: alternator-test/*
THE REST
M: Avi Kivity <avi@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
F: *

View File

@@ -1,29 +0,0 @@
Seastar and DPDK
================
Seastar uses the Data Plane Development Kit to drive NIC hardware directly. This
provides an enormous performance boost.
To enable DPDK, specify `--enable-dpdk` to `./configure.py`, and `--dpdk-pmd` as a
run-time parameter. This will use the DPDK package provided as a git submodule with the
seastar sources.
To use your own self-compiled DPDK package, follow this procedure:
1. Setup host to compile DPDK:
- Ubuntu
`sudo apt-get install -y build-essential linux-image-extra-$(uname -r)`
2. Prepare a DPDK SDK:
- Download the latest DPDK release: `wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.8.0.tar.gz`
- Untar it.
- Edit config/common_linuxapp: set CONFIG_RTE_MBUF_REFCNT and CONFIG_RTE_LIBRTE_KNI to 'n'.
- For DPDK 1.7.x: edit config/common_linuxapp:
- Set CONFIG_RTE_LIBRTE_PMD_BOND to 'n'.
- Set CONFIG_RTE_MBUF_SCATTER_GATHER to 'n'.
- Set CONFIG_RTE_LIBRTE_IP_FRAG to 'n'.
- Start the tools/setup.sh script as root.
- Compile a linuxapp target (option 9).
- Install IGB_UIO module (option 11).
- Bind some physical port to IGB_UIO (option 17).
- Configure hugepage mappings (option 14/15).
3. Run a configure.py: `./configure.py --dpdk-target <Path to untared dpdk-1.8.0 above>/x86_64-native-linuxapp-gcc`.

View File

@@ -2,17 +2,23 @@
## Quick-start
To get the build going quickly, Scylla offers a [frozen toolchain](tools/toolchain/README.md)
which would build and run Scylla using a pre-configured Docker image.
Using the frozen toolchain will also isolate all of the installed
dependencies in a Docker container.
Assuming you have met the toolchain prerequisites, which is running
Docker in user mode, building and running is as easy as:
```bash
$ git submodule update --init --recursive
$ sudo ./install-dependencies.sh
$ ./configure.py --mode=release
$ ninja-build -j4 # Assuming 4 system threads.
$ ./build/release/scylla
$ # Rejoice!
```
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
```
Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.
**Note**: GCC >= 8.1.1 is required to compile Scylla.
## Running Scylla
* Run Scylla
@@ -21,10 +27,10 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
```
* run Scylla with one CPU and ./tmp as data directory
* run Scylla with one CPU and ./tmp as work directory
```
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
./build/release/scylla --workdir tmp --smp 1
```
* For more run options:
@@ -32,6 +38,24 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
./build/release/scylla --help
```
## Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
Thrift. There is also experimental support for the API of Amazon DynamoDB,
but being experimental it needs to be explicitly enabled to be used. For more
information on how to enable the experimental DynamoDB compatibility in Scylla,
and the current limitations of this feature, see
[Alternator](docs/alternator/alternator.md) and
[Getting started with Alternator](docs/alternator/getting-started.md).
## Documentation
Documentation can be found in [./docs](./docs) and on the
[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear
definition of what goes where, so when looking for something be sure to check
both.
Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).
User documentation can be found [here](https://docs.scylladb.com/).
## Building Fedora RPM
As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:
@@ -75,4 +99,5 @@ docker run -p $(hostname -i):9042:9042 -i -t <image name>
## Contributing to Scylla
[Hacking howto](HACKING.md)
[Guidelines for contributing](CONTRIBUTING.md)

View File

@@ -1,6 +1,7 @@
#!/bin/sh
VERSION=666.development
PRODUCT=scylla
VERSION=3.3.4
if test -f version
then
@@ -22,3 +23,4 @@ echo "$SCYLLA_VERSION-$SCYLLA_RELEASE"
mkdir -p build
echo "$SCYLLA_VERSION" > build/SCYLLA-VERSION-FILE
echo "$SCYLLA_RELEASE" > build/SCYLLA-RELEASE-FILE
echo "$PRODUCT" > build/SCYLLA-PRODUCT-FILE

78
alternator-test/README.md Normal file
View File

@@ -0,0 +1,78 @@
Tests for Alternator that should also pass, identically, against DynamoDB.
Tests use the boto3 library for AWS API, and the pytest framework
(both are available from Linux distributions, or with "pip install").
To run all tests against the local installation of Alternator on
http://localhost:8000, just run `pytest`.
Some additional pytest options:
* To run all tests in a single file, do `pytest test_table.py`.
* To run a single specific test, do `pytest test_table.py::test_create_table_unsupported_names`.
* Additional useful pytest options, especially useful for debugging tests:
* -v: show the names of each individual test running instead of just dots.
* -s: show the full output of running tests (by default, pytest captures the test's output and only displays it if a test fails)
Add the `--aws` option to test against AWS instead of the local installation.
For example - `pytest --aws test_item.py` or `pytest --aws`.
If you plan to run tests against AWS and not just a local Scylla installation,
the file ~/.aws/credentials should be configured with your AWS key:
```
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
and ~/.aws/config with the default region to use in the test:
```
[default]
region = us-east-1
```
## HTTPS support
In order to run tests with HTTPS, run pytest with `--https` parameter. Note that the Scylla cluster needs to be provided
with alternator\_https\_port configuration option in order to initialize a HTTPS server.
Moreover, running an instance of a HTTPS server requires a certificate. Here's how to easily generate
a key and a self-signed certificate, which is sufficient to run `--https` tests:
```
openssl genrsa 2048 > scylla.key
openssl req -new -x509 -nodes -sha256 -days 365 -key scylla.key -out scylla.crt
```
If this pair is put into `conf/` directory, it will be enough
to allow the alternator HTTPS server to think it's been authorized and properly certified.
Still, boto3 library issues warnings that the certificate used for communication is self-signed,
and thus should not be trusted. For the sake of running local tests this warning is explicitly ignored.
## Authorization
By default, boto3 prepares a properly signed Authorization header with every request.
In order to confirm the authorization, the server recomputes the signature by using
user credentials (user-provided username + a secret key known by the server),
and then checks if it matches the signature from the header.
Early alternator code did not verify signatures at all, which is also allowed by the protocol.
A partial implementation of the authorization verification can be allowed by providing a Scylla
configuration parameter:
```yaml
alternator_enforce_authorization: true
```
The implementation is currently coupled with Scylla's system\_auth.roles table,
which means that an additional step needs to be performed when setting up Scylla
as the test environment. Tests will use the following credentials:
Username: `alternator`
Secret key: `secret_pass`
With CQLSH, it can be achieved by executing this snippet:
```bash
cqlsh -x "INSERT INTO system_auth.roles (role, salted_hash) VALUES ('alternator', 'secret_pass')"
```
Most tests expect the authorization to succeed, so they will pass even with `alternator_enforce_authorization`
turned off. However, test cases from `test_authorization.py` may require this option to be turned on,
so it's advised.
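
To make the signature check described above more concrete, here is a minimal sketch of the key-derivation and comparison steps of AWS Signature Version 4, the scheme boto3 uses to sign requests. It is an illustration under assumptions, not Alternator's actual code: the function names are hypothetical, and building the "string to sign" from the HTTP request is omitted entirely.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    # One HMAC-SHA256 step of the SigV4 key-derivation chain.
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    # SigV4 derives a scoped key: secret -> date -> region -> service -> "aws4_request".
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def signature_matches(secret_key: str, date: str, region: str, service: str,
                      string_to_sign: str, claimed_hex: str) -> bool:
    # Hypothetical server-side check: recompute the signature from the stored
    # secret and compare it with the one claimed in the Authorization header.
    key = derive_signing_key(secret_key, date, region, service)
    expected = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claimed_hex)
```

A server following this outline would look up the secret key for the username named in the header, call `signature_matches`, and reject the request when it returns `False`.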

179
alternator-test/conftest.py Normal file
View File

@@ -0,0 +1,179 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# This file contains "test fixtures", a pytest concept described in
# https://docs.pytest.org/en/latest/fixture.html.
# A "fixture" is some sort of setup which an individual test requires to run.
# The fixture has setup code and teardown code, and if multiple tests
# require the same fixture, it can be set up only once - while still allowing
# the user to run individual tests and automatically set up the fixtures they need.
import pytest
import boto3
from util import create_test_table
# Test that the Boto libraries are new enough. These tests want to test a
# large variety of DynamoDB API features, and to do this we need a new-enough
# version of the Boto libraries (boto3 and botocore) so that they can
# access all these API features.
# In particular, the BillingMode feature was added in botocore 1.12.54.
import botocore
import sys
from distutils.version import LooseVersion
if (LooseVersion(botocore.__version__) < LooseVersion('1.12.54')):
    pytest.exit("Your Boto library is too old. Please upgrade it,\ne.g. using:\n sudo pip{} install --upgrade boto3".format(sys.version_info[0]))
# By default, tests run against a local Scylla installation on localhost:8000/.
# The "--aws" option can be used to run against Amazon DynamoDB in the us-east-1
# region.
def pytest_addoption(parser):
    parser.addoption("--aws", action="store_true",
        help="run against AWS instead of a local Scylla installation")
    parser.addoption("--https", action="store_true",
        help="communicate via HTTPS protocol on port 8043 instead of HTTP when"
        " running against a local Scylla installation")
# "dynamodb" fixture: set up client object for communicating with the DynamoDB
# API. Currently this chooses either Amazon's DynamoDB in the default region
# or a local Alternator installation on http://localhost:8000 - depending on the
# existence of the "--aws" option. In the future we should provide options
# for choosing other Amazon regions or local installations.
# We use scope="session" so that all tests will reuse the same client object.
@pytest.fixture(scope="session")
def dynamodb(request):
    if request.config.getoption('aws'):
        return boto3.resource('dynamodb')
    else:
        # Even though we connect to the local installation, Boto3 still
        # requires us to specify dummy region and credential parameters,
        # otherwise the user is forced to properly configure ~/.aws even
        # for local runs.
        local_url = 'https://localhost:8043' if request.config.getoption('https') else 'http://localhost:8000'
        # Disable verifying in order to be able to use self-signed TLS certificates
        verify = not request.config.getoption('https')
        # Silencing the 'Unverified HTTPS request warning'
        if request.config.getoption('https'):
            import urllib3
            urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        return boto3.resource('dynamodb', endpoint_url=local_url, verify=verify,
                    region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass')
# "test_table" fixture: Create and return a temporary table to be used in tests
# that need a table to work on. The table is automatically deleted at the end.
# We use scope="session" so that all tests will reuse the same table.
# This "test_table" creates a table which has a specific key schema: both a
# partition key and a sort key, and both are strings. Other fixtures (below)
# can be used to create different types of tables.
#
# TODO: Although we are careful about deleting temporary tables when the
# fixture is torn down, in some cases (e.g., interrupted tests) we can be left
# with some tables not deleted, and they will never be deleted. Because all
# our temporary tables have the same test_table_prefix, we can actually find
# and remove these old tables with this prefix. We can have a fixture, which
# test_table will require, which on teardown will delete all remaining tables
# (possibly from an older run). Because the table's name includes the current
# time, we can also remove just tables older than a particular age. Such
# mechanism will allow running tests in parallel, without the risk of deleting
# a parallel run's temporary tables.
@pytest.fixture(scope="session")
def test_table(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
        ],
        AttributeDefinitions=[
                    { 'AttributeName': 'p', 'AttributeType': 'S' },
                    { 'AttributeName': 'c', 'AttributeType': 'S' },
        ])
    yield table
    # We get back here when this fixture is torn down. We ask Dynamo to delete
    # this table, but not wait for the deletion to complete. The next time
    # we create a test_table fixture, we'll choose a different table name
    # anyway.
    table.delete()
# The following fixtures test_table_* are similar to test_table but create
# tables with different key schemas.
@pytest.fixture(scope="session")
def test_table_s(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
    yield table
    table.delete()
@pytest.fixture(scope="session")
def test_table_b(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'B' } ])
    yield table
    table.delete()
@pytest.fixture(scope="session")
def test_table_sb(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'B' } ])
    yield table
    table.delete()
@pytest.fixture(scope="session")
def test_table_sn(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'N' } ])
    yield table
    table.delete()
# "filled_test_table" fixture: Create a temporary table to be used in tests
# that involve reading data - GetItem, Scan, etc. The table is filled with
# 328 items - each consisting of a partition key, clustering key and two
# string attributes. 164 of the items are in a single partition (with the
# partition key 'long') and the 164 other items are each in a separate
# partition. Finally, a 329th item is added with different attributes.
# This table is supposed to be read from, not updated nor overwritten.
# This fixture returns both a table object and the description of all items
# inserted into it.
@pytest.fixture(scope="session")
def filled_test_table(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
])
count = 164
items = [{
'p': str(i),
'c': str(i),
'attribute': "x" * 7,
'another': "y" * 16
} for i in range(count)]
items = items + [{
'p': 'long',
'c': str(i),
'attribute': "x" * (1 + i % 7),
'another': "y" * (1 + i % 16)
} for i in range(count)]
items.append({'p': 'hello', 'c': 'world', 'str': 'and now for something completely different'})
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
yield table, items
table.delete()
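# The item layout described above can be checked without a database. This
# sketch rebuilds the same list of items in plain Python and derives the counts
# mentioned in the comment; build_items() is a hypothetical helper name that
# mirrors the fixture body.

```python
def build_items(count=164):
    # `count` items, each in its own partition
    items = [{'p': str(i), 'c': str(i), 'attribute': 'x' * 7, 'another': 'y' * 16}
             for i in range(count)]
    # `count` items sharing the single partition 'long'
    items += [{'p': 'long', 'c': str(i),
               'attribute': 'x' * (1 + i % 7), 'another': 'y' * (1 + i % 16)}
              for i in range(count)]
    # one extra item with different attributes
    items.append({'p': 'hello', 'c': 'world',
                  'str': 'and now for something completely different'})
    return items

items = build_items()
long_partition = [item for item in items if item['p'] == 'long']
partitions = {item['p'] for item in items}
```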


@@ -0,0 +1,74 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for authorization
import pytest
import botocore
from botocore.exceptions import ClientError
import boto3
import requests
# Test that trying to perform an operation signed with a wrong key
# will not succeed
def test_wrong_key_access(request, dynamodb):
print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
url = dynamodb.meta.client._endpoint.host
with pytest.raises(ClientError, match='UnrecognizedClientException'):
if url.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='wrong_id', aws_secret_access_key='').describe_endpoints()
else:
verify = not url.startswith('https')
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='whatever', aws_secret_access_key='', verify=verify).describe_endpoints()
# A similar test, but this time the user is expected to exist in the database (for local tests)
def test_wrong_password(request, dynamodb):
print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
url = dynamodb.meta.client._endpoint.host
with pytest.raises(ClientError, match='UnrecognizedClientException'):
if url.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='alternator', aws_secret_access_key='wrong_key').describe_endpoints()
else:
verify = not url.startswith('https')
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='wrong_key', verify=verify).describe_endpoints()
# A test ensuring that expired signatures are not accepted
def test_expired_signature(dynamodb, test_table):
url = dynamodb.meta.client._endpoint.host
print(url)
headers = {'Content-Type': 'application/x-amz-json-1.0',
'X-Amz-Date': '20170101T010101Z',
'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
}
response = requests.post(url, headers=headers, verify=False)
assert not response.ok
assert "InvalidSignatureException" in response.text and "Signature expired" in response.text
# A test ensuring that signatures that exceed current time too much are not accepted.
# Watch out - this test is valid only for around the next 1000 years; it needs to be updated later.
def test_signature_too_futuristic(dynamodb, test_table):
url = dynamodb.meta.client._endpoint.host
print(url)
headers = {'Content-Type': 'application/x-amz-json-1.0',
'X-Amz-Date': '30200101T010101Z',
'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
}
response = requests.post(url, headers=headers, verify=False)
assert not response.ok
assert "InvalidSignatureException" in response.text and "Signature not yet current" in response.text
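# A sketch of the kind of check the two tests above exercise: comparing the
# X-Amz-Date header with the current time and rejecting requests outside an
# allowed window. The function name and the 15-minute window are assumptions
# made for illustration, not taken from the server code.

```python
from datetime import datetime, timedelta, timezone

def check_signature_date(x_amz_date, now=None, max_skew=timedelta(minutes=15)):
    """Return an error string if the date is outside the window, else None."""
    now = now or datetime.now(timezone.utc)
    # X-Amz-Date uses the compact ISO 8601 form, e.g. 20170101T010101Z
    signed_at = datetime.strptime(x_amz_date, '%Y%m%dT%H%M%SZ').replace(tzinfo=timezone.utc)
    if signed_at < now - max_skew:
        return 'Signature expired'
    if signed_at > now + max_skew:
        return 'Signature not yet current'
    return None
```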


@@ -0,0 +1,253 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for batch operations - BatchWriteItem, BatchGetItem.
# Note that various other tests in other files also use these operations,
# so they are actually tested by other tests as well.
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_query, multiset
# Test ensuring that items inserted by a batched statement can be properly extracted
# via GetItem. Schema has both hash and sort keys.
def test_basic_batch_write_item(test_table):
count = 7
with test_table.batch_writer() as batch:
for i in range(count):
batch.put_item(Item={
'p': "batch{}".format(i),
'c': "batch_ck{}".format(i),
'attribute': str(i),
'another': 'xyz'
})
for i in range(count):
item = test_table.get_item(Key={'p': "batch{}".format(i), 'c': "batch_ck{}".format(i)}, ConsistentRead=True)['Item']
assert item['p'] == "batch{}".format(i)
assert item['c'] == "batch_ck{}".format(i)
assert item['attribute'] == str(i)
assert item['another'] == 'xyz'
# Test batch write to a table with only a hash key
def test_batch_write_hash_only(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for item in items:
assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
# Test batch delete operation (DeleteRequest): We create a bunch of items, and
# then delete them all.
def test_batch_write_delete(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for item in items:
assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
with test_table_s.batch_writer() as batch:
for item in items:
batch.delete_item(Key={'p': item['p']})
# Verify that all items are now missing:
for item in items:
assert not 'Item' in test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)
# Test a single batch that includes both a write and a delete. This should be fine.
def test_batch_write_and_delete(test_table_s):
p1 = random_string()
p2 = random_string()
test_table_s.put_item(Item={'p': p1})
assert 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
assert not 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
with test_table_s.batch_writer() as batch:
batch.put_item({'p': p2})
batch.delete_item(Key={'p': p1})
assert not 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
assert 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
# It is forbidden to update the same key twice in the same batch.
# DynamoDB says "Provided list of item keys contains duplicates".
def test_batch_write_duplicate_write(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.put_item({'p': p})
batch.put_item({'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.put_item({'p': p, 'c': c})
batch.put_item({'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.put_item({'p': p, 'c': c})
batch.put_item({'p': p, 'c': other})
batch.put_item({'p': other, 'c': c})
def test_batch_write_duplicate_delete(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.delete_item(Key={'p': p})
batch.delete_item(Key={'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.delete_item(Key={'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.delete_item(Key={'p': p, 'c': other})
batch.delete_item(Key={'p': other, 'c': c})
def test_batch_write_duplicate_write_and_delete(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.delete_item(Key={'p': p})
batch.put_item({'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.put_item({'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.put_item({'p': p, 'c': other})
batch.put_item({'p': other, 'c': c})
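# The three duplicate tests above all rely on one rule: a batch is rejected if
# two of its requests, whether puts or deletes, address the same full key. A
# minimal sketch of such a check follows. find_duplicate_keys() is a
# hypothetical helper, and the requests carry plain Python values rather than
# the typed values of the wire format.

```python
def find_duplicate_keys(write_requests, key_names):
    """Return the full keys that appear more than once in the batch."""
    seen = set()
    duplicates = []
    for request in write_requests:
        # A put carries the key inside the item, a delete carries it directly.
        if 'PutRequest' in request:
            source = request['PutRequest']['Item']
        else:
            source = request['DeleteRequest']['Key']
        key = tuple(source[name] for name in key_names)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

# Comparing the whole key tuple is why items that share only one key component
# are accepted.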
# Test that BatchWriteItem's PutRequest completely replaces an existing item.
# It shouldn't merge it with a previously existing value. See also the same
# test for PutItem - test_put_item_replace().
def test_batch_put_item_replace(test_table_s, test_table):
p = random_string()
with test_table_s.batch_writer() as batch:
batch.put_item(Item={'p': p, 'a': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
with test_table_s.batch_writer() as batch:
batch.put_item(Item={'p': p, 'b': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
c = random_string()
with test_table.batch_writer() as batch:
batch.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
with test_table.batch_writer() as batch:
batch.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
# Test that if one of the batch's operations is invalid, because a key
# column is missing or has the wrong type, the entire batch is rejected
# before any write is done.
def test_batch_write_invalid_operation(test_table_s):
# test key attribute with wrong type:
p1 = random_string()
p2 = random_string()
items = [{'p': p1}, {'p': 3}, {'p': p2}]
with pytest.raises(ClientError, match='ValidationException'):
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2]:
assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# test missing key attribute:
p1 = random_string()
p2 = random_string()
items = [{'p': p1}, {'x': 'whatever'}, {'p': p2}]
with pytest.raises(ClientError, match='ValidationException'):
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2]:
assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# Basic test for BatchGetItem, reading several entire items.
# Schema has both hash and sort keys.
def test_batch_get_item(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
# We use the low-level batch_get_item API for lack of a more convenient
# API. At least it spares us the need to encode the key's types...
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ConsistentRead': True}})
print(reply)
got_items = reply['Responses'][test_table.name]
assert multiset(got_items) == multiset(items)
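# BatchGetItem may return items in any order, so the test above compares
# multisets rather than lists. The multiset() helper comes from util and is not
# shown here; the sketch below is one possible way to write such a comparison,
# with hypothetical names freeze() and as_multiset().

```python
from collections import Counter

def freeze(value):
    """Convert nested dicts, lists and sets into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((key, freeze(v)) for key, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    return value

def as_multiset(items):
    """Count each distinct item so that order is ignored but repeats are not."""
    return Counter(freeze(item) for item in items)
```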
# Same, but the schema has just a hash key.
def test_batch_get_item_hash(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p',)} for x in items]
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': keys, 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert multiset(got_items) == multiset(items)
# Test what we get if we try to read two *missing* values in addition to
# an existing one. It turns out the missing items are simply not returned,
# with no sign they are missing.
def test_batch_get_item_missing(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p})
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}, {'p': random_string()}, {'p': p}], 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert got_items == [{'p' : p}]
# If all the keys requested from a particular table are missing, we still
# get a response array for that table - it's just empty.
def test_batch_get_item_completely_missing(test_table_s):
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}], 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert got_items == []
# Test BatchGetItem with AttributesToGet
def test_batch_get_item_attributes_to_get(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'AttributesToGet': wanted, 'ConsistentRead': True}})
got_items = reply['Responses'][test_table.name]
expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
assert multiset(got_items) == multiset(expected_items)
# Test BatchGetItem with ProjectionExpression (just a simple one, with
# top-level attributes)
def test_batch_get_item_projection_expression(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ProjectionExpression': ",".join(wanted), 'ConsistentRead': True}})
got_items = reply['Responses'][test_table.name]
expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
assert multiset(got_items) == multiset(expected_items)

File diff suppressed because it is too large


@@ -0,0 +1,49 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Test for the DescribeEndpoints operation
import boto3
# Test that the DescribeEndpoints operation works as expected: that it
# returns one endpoint (it may return more, but it never does this in
# Amazon), and this endpoint can be used to make more requests.
def test_describe_endpoints(request, dynamodb):
endpoints = dynamodb.meta.client.describe_endpoints()['Endpoints']
# It is not strictly necessary that only a single endpoint be returned,
# but this is what Amazon DynamoDB does today (and so does Alternator).
assert len(endpoints) == 1
for endpoint in endpoints:
assert 'CachePeriodInMinutes' in endpoint.keys()
address = endpoint['Address']
# Check that the address is a valid endpoint by checking that we can
# send it another describe_endpoints() request ;-) Note that the
# address does not include the "http://" or "https://" prefix, and
# we need to choose one manually.
prefix = "https://" if request.config.getoption('https') else "http://"
verify = not request.config.getoption('https')
url = prefix + address
if address.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, verify=verify).describe_endpoints()
else:
# Even though we connect to the local installation, Boto3 still
# requires us to specify dummy region and credential parameters,
# otherwise the user is forced to properly configure ~/.aws even
# for local runs.
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass', verify=verify).describe_endpoints()
# Nothing to check here - if the above call failed with an exception,
# the test would fail.


@@ -0,0 +1,169 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the DescribeTable operation.
# Some attributes used only by a specific major feature will be tested
# elsewhere:
# 1. Tests for describing tables with global or local secondary indexes
# (the GlobalSecondaryIndexes and LocalSecondaryIndexes attributes)
# are in test_gsi.py and test_lsi.py.
# 2. Tests for the stream feature (LatestStreamArn, LatestStreamLabel,
# StreamSpecification) will be in the tests devoted to the stream
# feature.
# 3. Tests for describing a restored table (RestoreSummary, TableId)
# will be together with tests devoted to the backup/restore feature.
import pytest
from botocore.exceptions import ClientError
import re
import time
from util import multiset
# Test that DescribeTable correctly returns the table's name and state
def test_describe_table_basic(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['TableName'] == test_table.name
assert got['TableStatus'] == 'ACTIVE'
# Test that DescribeTable correctly returns the table's schema, in
# AttributeDefinitions and KeySchema attributes
def test_describe_table_schema(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
expected = { # Copied from test_table()'s fixture
'KeySchema': [ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
'AttributeDefinitions': [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
]
}
assert got['KeySchema'] == expected['KeySchema']
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected['AttributeDefinitions'])
# Test that DescribeTable correctly returns the table's billing mode,
# in the BillingModeSummary attribute.
def test_describe_table_billing(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['BillingModeSummary']['BillingMode'] == 'PAY_PER_REQUEST'
# The BillingModeSummary should also contain a
# LastUpdateToPayPerRequestDateTime attribute, which is a date.
# We don't know what date this is supposed to be, but something we
# do know is that the test table was created already with this billing
# mode, so the table creation date should be the same as the billing
# mode setting date.
assert 'LastUpdateToPayPerRequestDateTime' in got['BillingModeSummary']
assert got['BillingModeSummary']['LastUpdateToPayPerRequestDateTime'] == got['CreationDateTime']
# Test that DescribeTable correctly returns the table's creation time.
# We don't know what this creation time is supposed to be, so this test
# cannot be very thorough... We currently just test against something we
# know to be wrong - returning the *current* time, which changes on every
# call.
@pytest.mark.xfail(reason="DescribeTable does not return table creation time")
def test_describe_table_creation_time(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'CreationDateTime' in got
time1 = got['CreationDateTime']
time.sleep(1)
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
time2 = got['CreationDateTime']
assert time1 == time2
# Test that DescribeTable returns the table's estimated item count
# in the ItemCount attribute. Unfortunately, there's not much we can
# really test here... The documentation says that the count can be
# delayed by six hours, so the number we get here may have no relation
# to the current number of items in the test table. The attribute should exist,
# though. This test does NOT verify that ItemCount isn't always returned as
# zero - such a stub implementation will pass this test.
@pytest.mark.xfail(reason="DescribeTable does not return table item count")
def test_describe_table_item_count(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'ItemCount' in got
# Similar test for estimated size in bytes - TableSizeBytes - which again,
# may reflect the size as long as six hours ago.
@pytest.mark.xfail(reason="DescribeTable does not return table size")
def test_describe_table_size(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableSizeBytes' in got
# Test the ProvisionedThroughput attribute returned by DescribeTable.
# This is a very partial test: Our test table is configured without
# provisioned throughput, so obviously it will not have interesting settings
# for it. DynamoDB returns zeros for some of the attributes, even though
# the documentation suggests missing values should have been fine too.
@pytest.mark.xfail(reason="DescribeTable does not return provisioned throughput")
def test_describe_table_provisioned_throughput(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['ProvisionedThroughput']['NumberOfDecreasesToday'] == 0
assert got['ProvisionedThroughput']['WriteCapacityUnits'] == 0
assert got['ProvisionedThroughput']['ReadCapacityUnits'] == 0
# This is a silly test for the RestoreSummary attribute in DescribeTable -
# it should not exist in a table not created by a restore. When testing
# the backup/restore feature, we will have more meaningful tests for the
# value of this attribute in that case.
def test_describe_table_restore_summary(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'RestoreSummary' in got
# This is a silly test for the SSEDescription attribute in DescribeTable -
# by default, a table is encrypted with AWS-owned keys, not using client-
# owned keys, and the SSEDescription attribute is not returned at all.
def test_describe_table_encryption(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'SSEDescription' in got
# This is a silly test for the StreamSpecification attribute in DescribeTable -
# when there are no streams, this attribute should be missing.
def test_describe_table_stream_specification(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'StreamSpecification' in got
# Test that the table has an ARN, a unique identifier for the table which
# includes which zone it is on, which account, and of course the table's
# name. The ARN format is described in
# https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#genref-arns
@pytest.mark.xfail(reason="DescribeTable does not return ARN")
def test_describe_table_arn(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableArn' in got and got['TableArn'].startswith('arn:')
# Test that the table has a TableId.
# TODO: Figure out what this TableId is supposed to be. Is it just a
# unique id that is created with the table and never changes, or something
# else?
@pytest.mark.xfail(reason="DescribeTable does not return TableId")
def test_describe_table_id(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableId' in got
# DescribeTable error path: trying to describe a non-existent table should
# result in a ResourceNotFoundException.
def test_describe_table_non_existent_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException') as einfo:
dynamodb.meta.client.describe_table(TableName='non_existent_table')
# As one of the first error-path tests that we wrote, let's test in more
# detail that the error reply has the appropriate fields:
response = einfo.value.response
print(response)
err = response['Error']
assert err['Code'] == 'ResourceNotFoundException'
assert re.match('Requested resource not found: Table: non_existent_table not found', err['Message'])

File diff suppressed because it is too large

874
alternator-test/test_gsi.py Normal file

@@ -0,0 +1,874 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests of GSI (Global Secondary Indexes)
#
# Note that many of these tests are slower than usual, because many of them
# need to create new tables and/or new GSIs of different types, operations
# which are extremely slow in DynamoDB, often taking minutes (!).
import pytest
import time
from botocore.exceptions import ClientError, ParamValidationError
from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
# GSIs only support eventually consistent reads, so tests that involve
# writing to a table and then expect to read something from it cannot be
# guaranteed to succeed without retrying the read. The following utility
# functions make it easy to write such tests.
# Note that in practice, these repeated reads are almost never necessary:
# Amazon claims that "Changes to the table data are propagated to the global
# secondary indexes within a fraction of a second, under normal conditions"
# and indeed, in practice, the tests here almost always succeed without a
# retry.
def assert_index_query(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
return
print('assert_index_query retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
def assert_index_scan(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
return
print('assert_index_scan retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
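# The two helpers above share one pattern: retry a check a few times, then make
# a final assertion so a persistent mismatch still fails the test. A generic
# sketch of that pattern follows; assert_eventually() and its parameters are
# hypothetical names and not part of the test suite.

```python
import time

def assert_eventually(check, retries=3, delay=1.0):
    """Call `check` up to `retries` times, sleeping `delay` seconds after each
    failure, then assert on one final call."""
    for _ in range(retries):
        if check():
            return
        time.sleep(delay)
    # Final attempt: if this one fails too, the test fails.
    assert check()
```

# Passing the comparison as a callable keeps the retry policy in one place, at
# the cost of losing the detailed diff that pytest prints for a direct
# comparison of two values.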
# Although quite silly, it is actually allowed to create an index which is
# identical to the base table.
def test_gsi_identical(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert multiset(items) == multiset(full_scan(table))
assert_index_scan(table, 'hello', items)
# We can't scan a non-existent index
with pytest.raises(ClientError, match='ValidationException'):
full_scan(table, IndexName='wrong')
table.delete()
# One of the simplest forms of a non-trivial GSI: The base table has a hash
# and sort key, and the index reverses those roles. Other attributes are just
# copied.
@pytest.fixture(scope="session")
def test_table_gsi_1(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield table
table.delete()
def test_gsi_simple(test_table_gsi_1):
items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
with test_table_gsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
c = items[0]['c']
# The index allows a query on just a specific sort key, which isn't
# allowed on the base table.
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
expected_items = [x for x in items if x['c'] == c]
assert_index_query(test_table_gsi_1, 'hello', expected_items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert_index_scan(test_table_gsi_1, 'hello', full_scan(test_table_gsi_1))
def test_gsi_same_key(test_table_gsi_1):
    c = random_string()
    # All these items have the same sort key 'c' but different hash key 'p'
    items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
    with test_table_gsi_1.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    assert_index_query(test_table_gsi_1, 'hello', items,
        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Check we get an appropriate error when trying to read a non-existing index
# of an existing table. Although the documentation specifies that a
# ResourceNotFoundException should be returned if "The operation tried to
# access a nonexistent table or index", in fact in the specific case that
# the table does exist but an index does not - we get a ValidationException.
def test_gsi_missing_index(test_table_gsi_1):
with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
full_query(test_table_gsi_1, IndexName='wrong_name',
KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
full_scan(test_table_gsi_1, IndexName='wrong_name')
# Nevertheless, if the table itself does not exist, a query should return
# a ResourceNotFoundException, not ValidationException:
def test_gsi_missing_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.query(TableName='nonexistent_table', IndexName='any_name', KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.scan(TableName='nonexistent_table', IndexName='any_name')
# Verify that strongly-consistent reads on GSI are *not* allowed.
@pytest.mark.xfail(reason="GSI strong consistency not checked")
def test_gsi_strong_consistency(test_table_gsi_1):
with pytest.raises(ClientError, match='ValidationException.*Consistent'):
full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': ['hi'], 'ComparisonOperator': 'EQ'}}, IndexName='hello', ConsistentRead=True)
with pytest.raises(ClientError, match='ValidationException.*Consistent'):
full_scan(test_table_gsi_1, IndexName='hello', ConsistentRead=True)
# Verify that a GSI is correctly listed in describe_table
@pytest.mark.xfail(reason="DescribeTable provides index names only, no size or item count")
def test_gsi_describe(test_table_gsi_1):
desc = test_table_gsi_1.meta.client.describe_table(TableName=test_table_gsi_1.name)
assert 'Table' in desc
assert 'GlobalSecondaryIndexes' in desc['Table']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert len(gsis) == 1
gsi = gsis[0]
assert gsi['IndexName'] == 'hello'
assert 'IndexSizeBytes' in gsi # actual size depends on content
assert 'ItemCount' in gsi
assert gsi['Projection'] == {'ProjectionType': 'ALL'}
assert gsi['IndexStatus'] == 'ACTIVE'
assert gsi['KeySchema'] == [{'KeyType': 'HASH', 'AttributeName': 'c'},
{'KeyType': 'RANGE', 'AttributeName': 'p'}]
# TODO: check also ProvisionedThroughput, IndexArn
# When a GSI's key includes an attribute not in the base table's key, we
# need to remember to add its type to AttributeDefinitions.
def test_gsi_missing_attribute_definition(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*AttributeDefinitions'):
create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [ { 'AttributeName': 'c', 'KeyType': 'HASH' } ],
'Projection': { 'ProjectionType': 'ALL' }
}
])
# test_table_gsi_1_hash_only is a variant of test_table_gsi_1: It's another
# case where the index doesn't involve non-key attributes. Again the base
# table has a hash and sort key, but in this case the index has *only* a
# hash key (which is the base's hash key). In the materialized-view-based
# implementation, we need to remember the other part of the base key as a
# clustering key.
@pytest.fixture(scope="session")
def test_table_gsi_1_hash_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield table
table.delete()
def test_gsi_key_not_in_index(test_table_gsi_1_hash_only):
    # Test with items with different 'c' values:
    items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
    with test_table_gsi_1_hash_only.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    c = items[0]['c']
    expected_items = [x for x in items if x['c'] == c]
    assert_index_query(test_table_gsi_1_hash_only, 'hello', expected_items,
        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
    # Test items with the same sort key 'c' but different hash key 'p'
    c = random_string()
    items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
    with test_table_gsi_1_hash_only.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    assert_index_query(test_table_gsi_1_hash_only, 'hello', items,
        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
    # Scanning the entire table directly or via the index yields the same
    # results (in different order).
    assert_index_scan(test_table_gsi_1_hash_only, 'hello', full_scan(test_table_gsi_1_hash_only))
# A second scenario of GSI. The base table has just a hash key, and the index
# has a different hash key - one of the non-key attributes from the base table.
@pytest.fixture(scope="session")
def test_table_gsi_2(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_2(test_table_gsi_2):
items1 = [{'p': random_string(), 'x': random_string()} for i in range(10)]
x1 = items1[0]['x']
x2 = random_string()
items2 = [{'p': random_string(), 'x': x2} for i in range(10)]
items = items1 + items2
with test_table_gsi_2.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['x'] == x1]
assert_index_query(test_table_gsi_2, 'hello', expected_items,
KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['x'] == x2]
assert_index_query(test_table_gsi_2, 'hello', expected_items,
KeyConditions={'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
def test_gsi_missing_attribute(test_table_gsi_2):
    p1 = random_string()
    x1 = random_string()
    test_table_gsi_2.put_item(Item={'p': p1, 'x': x1})
    p2 = random_string()
    test_table_gsi_2.put_item(Item={'p': p2})
    # Both items are now in the base table:
    assert test_table_gsi_2.get_item(Key={'p': p1})['Item'] == {'p': p1, 'x': x1}
    assert test_table_gsi_2.get_item(Key={'p': p2})['Item'] == {'p': p2}
    # But only the first item is in the index: it can be found using a
    # Query. The second item must not be found by a scan of the index
    # (although a scan of the base table does find it).
    assert_index_query(test_table_gsi_2, 'hello', [{'p': p1, 'x': x1}],
        KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
    assert any([i['p'] == p1 for i in full_scan(test_table_gsi_2)])
    # Note: with eventually consistent read, we can't really be sure that
    # an item will "never" appear in the index. We do this test last,
    # so if we had a bug and such an item did appear, hopefully we had enough
    # time for the bug to become visible. At least sometimes.
    assert not any([i['p'] == p2 for i in full_scan(test_table_gsi_2, IndexName='hello')])
# Test that when a table has a GSI, if the indexed attribute has the wrong
# type, the update operation is rejected, and the item is added to neither
# the base table nor the index. This is different from the case of a
# *missing* attribute, where the item is added to the base table but not
# the index.
# The following three tests test_gsi_wrong_type_attribute_{put,update,batch}
# test updates using PutItem, UpdateItem, and BatchWriteItem respectively.
def test_gsi_wrong_type_attribute_put(test_table_gsi_2):
# PutItem with wrong type for 'x' is rejected, item isn't created even
# in the base table.
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
test_table_gsi_2.put_item(Item={'p': p, 'x': 3})
assert not 'Item' in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
def test_gsi_wrong_type_attribute_update(test_table_gsi_2):
# An UpdateItem with wrong type for 'x' is also rejected, but naturally
# if the item already existed, it remains as it was.
p = random_string()
x = random_string()
test_table_gsi_2.put_item(Item={'p': p, 'x': x})
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
test_table_gsi_2.update_item(Key={'p': p}, AttributeUpdates={'x': {'Value': 3, 'Action': 'PUT'}})
assert test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': x}
def test_gsi_wrong_type_attribute_batch(test_table_gsi_2):
# In a BatchWriteItem, if any update is forbidden, the entire batch is
# rejected, and none of the updates happen at all.
p1 = random_string()
p2 = random_string()
p3 = random_string()
items = [{'p': p1, 'x': random_string()},
{'p': p2, 'x': 3},
{'p': p3, 'x': random_string()}]
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
with test_table_gsi_2.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2, p3]:
assert not 'Item' in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
# A third scenario of GSI. Index has a hash key and a sort key, both are
# non-key attributes from the base table. This scenario may be very
# difficult to implement in Alternator because Scylla's materialized-views
# implementation only allows one new key column in the view, and here
# we need two (which, also, aren't actual columns, but map items).
@pytest.fixture(scope="session")
def test_table_gsi_3(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'a', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_3(test_table_gsi_3):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
with test_table_gsi_3.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_3, 'hello', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
def test_gsi_update_second_regular_base_column(test_table_gsi_3):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
with test_table_gsi_3.batch_writer() as batch:
for item in items:
batch.put_item(item)
items[3]['b'] = 'updated'
test_table_gsi_3.update_item(Key={'p': items[3]['p']}, AttributeUpdates={'b': {'Value': 'updated', 'Action': 'PUT'}})
assert_index_query(test_table_gsi_3, 'hello', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
# This is the same feature we already tested in test_gsi_missing_attribute()
# above, but on a different table: In that test we used test_table_gsi_2,
# with one indexed attribute, and in this test we use test_table_gsi_3 which
# has two base regular attributes in the view key, and more possibilities
# of which value might be missing. Reproduces issue #6008.
def test_gsi_missing_attribute_3(test_table_gsi_3):
p = random_string()
a = random_string()
b = random_string()
# First, add an item with a missing "a" value. It should appear in the
# base table, but not in the index:
test_table_gsi_3.put_item(Item={'p': p, 'b': b})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p, 'b': b}
# Note: with eventually consistent read, we can't really be sure that
# an item will "never" appear in the index. We hope that if a bug exists
# and such an item did appear, sometimes the delay here will be enough
# for the unexpected item to become visible.
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# Same thing for an item with a missing "b" value:
test_table_gsi_3.put_item(Item={'p': p, 'a': a})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p, 'a': a}
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# And for an item missing both:
test_table_gsi_3.put_item(Item={'p': p})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p}
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# A fourth scenario of GSI. Two GSIs on a single base table.
@pytest.fixture(scope="session")
def test_table_gsi_4(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello_a',
'KeySchema': [
{ 'AttributeName': 'a', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
},
{ 'IndexName': 'hello_b',
'KeySchema': [
{ 'AttributeName': 'b', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
# Test that a base table with two GSIs updates both as expected.
def test_gsi_4(test_table_gsi_4):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
with test_table_gsi_4.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_4, 'hello_a', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'}})
assert_index_query(test_table_gsi_4, 'hello_b', [items[3]],
KeyConditions={'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
# Verify that describe_table lists the two GSIs.
def test_gsi_4_describe(test_table_gsi_4):
desc = test_table_gsi_4.meta.client.describe_table(TableName=test_table_gsi_4.name)
assert 'Table' in desc
assert 'GlobalSecondaryIndexes' in desc['Table']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert len(gsis) == 2
assert multiset([g['IndexName'] for g in gsis]) == multiset(['hello_a', 'hello_b'])
# A scenario for GSI in which the table has both hash and sort key
@pytest.fixture(scope="session")
def test_table_gsi_5(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_5(test_table_gsi_5):
items1 = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
p1, x1 = items1[0]['p'], items1[0]['x']
p2, x2 = random_string(), random_string()
items2 = [{'p': p2, 'c': random_string(), 'x': x2} for i in range(10)]
items = items1 + items2
with test_table_gsi_5.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['x'] == x1]
assert_index_query(test_table_gsi_5, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['p'] == p2 and i['x'] == x2]
assert_index_query(test_table_gsi_5, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
# Verify that DescribeTable correctly returns the schema of both base-table
# and secondary indexes. KeySchema is given for each of the base table and
# indexes, and AttributeDefinitions is merged for all of them together.
def test_gsi_5_describe_table_schema(test_table_gsi_5):
got = test_table_gsi_5.meta.client.describe_table(TableName=test_table_gsi_5.name)['Table']
# Copied from test_table_gsi_5 fixture
expected_base_keyschema = [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' } ]
expected_gsi_keyschema = [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x', 'KeyType': 'RANGE' } ]
expected_all_attribute_definitions = [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' } ]
assert got['KeySchema'] == expected_base_keyschema
gsis = got['GlobalSecondaryIndexes']
assert len(gsis) == 1
assert gsis[0]['KeySchema'] == expected_gsi_keyschema
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)
# Similar DescribeTable schema test for test_table_gsi_2. The peculiarity
# in that table is that the base table has only a hash key p, and the index
# has only a hash key x. Now, while internally Scylla needs to add "p" as a
# clustering key in the materialized view (in Scylla the view key always
# contains the base key), when describing the table, "p" shouldn't be
# returned as a range key, because the user didn't ask for it.
# This test reproduces issue #5320.
@pytest.mark.xfail(reason="GSI DescribeTable spurious range key (#5320)")
def test_gsi_2_describe_table_schema(test_table_gsi_2):
got = test_table_gsi_2.meta.client.describe_table(TableName=test_table_gsi_2.name)['Table']
# Copied from test_table_gsi_2 fixture
expected_base_keyschema = [ { 'AttributeName': 'p', 'KeyType': 'HASH' } ]
expected_gsi_keyschema = [ { 'AttributeName': 'x', 'KeyType': 'HASH' } ]
expected_all_attribute_definitions = [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' } ]
assert got['KeySchema'] == expected_base_keyschema
gsis = got['GlobalSecondaryIndexes']
assert len(gsis) == 1
assert gsis[0]['KeySchema'] == expected_gsi_keyschema
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)
# All tests above involved "ProjectionType: ALL". This test checks how
# "ProjectionType: KEYS_ONLY" works. We note that it projects both
# the index's key, *and* the base table's key. So items which had different
# base-table keys cannot suddenly become the same item in the index.
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_projection_keys_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
items = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
wanted = ['p', 'x']
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert_index_scan(table, 'hello', expected_items)
table.delete()
# Test for "ProjectionType: INCLUDE". The secondary index includes
# its own and the base's keys (as in KEYS_ONLY) plus the extra attributes
# given in NonKeyAttributes.
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_projection_include(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'INCLUDE',
'NonKeyAttributes': ['a', 'b'] }
}
])
# Some items have the projected attributes a,b and some don't:
items = [{'p': random_string(), 'x': random_string(), 'a': random_string(), 'b': random_string(), 'y': random_string()} for i in range(10)]
items = items + [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
wanted = ['p', 'x', 'a', 'b']
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert_index_scan(table, 'hello', expected_items)
print(len(expected_items))
table.delete()
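# A small pure-Python sketch of the projection rule that the two tests above
# expect (KEYS_ONLY keeps the base and index keys, INCLUDE additionally keeps
# NonKeyAttributes, ALL keeps everything). The helper name projected_item()
# and its parameters are hypothetical and exist only for illustration; this
# models the expected results, not how Alternator or DynamoDB implement them.
def projected_item(item, base_key, index_key, projection_type, non_key_attributes=()):
    if projection_type == 'ALL':
        return dict(item)
    wanted = list(base_key) + list(index_key)
    if projection_type == 'INCLUDE':
        wanted += list(non_key_attributes)
    elif projection_type != 'KEYS_ONLY':
        raise ValueError('Unknown ProjectionType: %s' % projection_type)
    # Attributes missing from the item are simply absent from the projection.
    return {k: item[k] for k in wanted if k in item}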
# DynamoDB's documentation says the "Projection" argument of
# GlobalSecondaryIndexes is mandatory, and indeed Boto3 enforces that it
# must be passed. The documentation then goes on to claim that the
# "ProjectionType" member of "Projection" is optional - and Boto3 allows it
# to be missing. But in fact, it is not allowed to be missing: DynamoDB
# complains: "Unknown ProjectionType: null".
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_missing_projection_type(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*ProjectionType'):
create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': {}
}
])
# update_table() for creating a GSI is an asynchronous operation.
# The table's TableStatus changes from ACTIVE to UPDATING for a short while
# and then goes back to ACTIVE, but the new GSI's IndexStatus appears as
# CREATING, until eventually (after a *long* time...) it becomes ACTIVE.
# During the CREATING phase, at some point the Backfilling attribute also
# appears, until it eventually disappears. We need to wait until all three
# markers indicate completion.
# Unfortunately, while boto3 has a client.get_waiter('table_exists') to
# wait for a table to exist, there is no such function to wait for an
# index to come up, so we need to code it ourselves.
def wait_for_gsi(table, gsi_name):
    start_time = time.time()
    # Surprisingly, even for tiny tables this can take a very long time
    # on DynamoDB - often many minutes!
    for i in range(300):
        time.sleep(1)
        desc = table.meta.client.describe_table(TableName=table.name)
        table_status = desc['Table']['TableStatus']
        if table_status != 'ACTIVE':
            print('%d Table status still %s' % (i, table_status))
            continue
        index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
        assert len(index_desc) == 1
        index_status = index_desc[0]['IndexStatus']
        if index_status != 'ACTIVE':
            print('%d Index status still %s' % (i, index_status))
            continue
        # When the index is ACTIVE, this must be after backfilling completed
        assert 'Backfilling' not in index_desc[0]
        print('wait_for_gsi took %d seconds' % (time.time() - start_time))
        return
    raise AssertionError("wait_for_gsi did not complete")
# Similarly to how wait_for_gsi() waits for a GSI to finish adding,
# this function waits for a GSI to be finally deleted.
def wait_for_gsi_gone(table, gsi_name):
    start_time = time.time()
    for i in range(300):
        time.sleep(1)
        desc = table.meta.client.describe_table(TableName=table.name)
        table_status = desc['Table']['TableStatus']
        if table_status != 'ACTIVE':
            print('%d Table status still %s' % (i, table_status))
            continue
        if 'GlobalSecondaryIndexes' in desc['Table']:
            index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
            if len(index_desc) != 0:
                index_status = index_desc[0]['IndexStatus']
                print('%d Index status still %s' % (i, index_status))
                continue
        print('wait_for_gsi_gone took %d seconds' % (time.time() - start_time))
        return
    raise AssertionError("wait_for_gsi_gone did not complete")
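# The completion conditions polled by wait_for_gsi() and wait_for_gsi_gone()
# can also be written as pure predicates over a DescribeTable-shaped dict,
# which makes them checkable without a live server. This is only a sketch:
# the names gsi_is_active() and gsi_is_gone() are hypothetical, and unlike
# wait_for_gsi() the first one returns False (rather than asserting) when
# the index is not listed at all.
def gsi_is_active(desc, gsi_name):
    table = desc['Table']
    if table['TableStatus'] != 'ACTIVE':
        return False
    for gsi in table.get('GlobalSecondaryIndexes', []):
        if gsi['IndexName'] == gsi_name:
            return gsi['IndexStatus'] == 'ACTIVE'
    return False

def gsi_is_gone(desc, gsi_name):
    table = desc['Table']
    if table['TableStatus'] != 'ACTIVE':
        return False
    # The GlobalSecondaryIndexes member may be absent once no index is left.
    return all(gsi['IndexName'] != gsi_name
               for gsi in table.get('GlobalSecondaryIndexes', []))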
# All tests above involved creating a new table with a GSI up-front. This
# test will test creating a base table *without* a GSI, putting data in
# it, and then adding a GSI with the UpdateTable operation. This starts
# a backfilling stage - where data is copied to the index - and when this
# stage is done, the index is usable. Items whose indexed column contains
# the wrong type are silently ignored and not added to the index (it would
# not have been possible to add such items if the GSI was already configured
# when they were added).
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_backfill(dynamodb):
# First create, and fill, a table without GSI. The items in items1
# will have the appropriate string type for 'x' and will later get
# indexed. Items in items2 have no value for 'x', and in items3 'x' is
# not a string, so the items in items2 and items3 will be missing
# from the index we'll create later.
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
items1 = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
items2 = [{'p': random_string(), 'y': random_string()} for i in range(10)]
items3 = [{'p': random_string(), 'x': i} for i in range(10)]
items = items1 + items2 + items3
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert multiset(items) == multiset(full_scan(table))
# Now use UpdateTable to create the GSI
dynamodb.meta.client.update_table(TableName=table.name,
AttributeDefinitions=[{ 'AttributeName': 'x', 'AttributeType': 'S' }],
GlobalSecondaryIndexUpdates=[ { 'Create':
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'x', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}}])
# update_table is an asynchronous operation. We need to wait until it
# finishes and the table is backfilled.
wait_for_gsi(table, 'hello')
# As explained above, only items in items1 got copied to the gsi,
# and Scan on them works as expected.
# Note that we don't need to retry the reads here (i.e., use the
# assert_index_scan() or assert_index_query() functions) because after
# we waited for backfilling to complete, we know all the pre-existing
# data is already in the index.
assert multiset(items1) == multiset(full_scan(table, IndexName='hello'))
# We can also use Query on the new GSI, to search on the attribute x:
assert multiset([items1[3]]) == multiset(full_query(table,
IndexName='hello',
KeyConditions={'x': {'AttributeValueList': [items1[3]['x']], 'ComparisonOperator': 'EQ'}}))
# Let's also test that we cannot add another index with the same name
# as one that already exists
with pytest.raises(ClientError, match='ValidationException.*already exists'):
dynamodb.meta.client.update_table(TableName=table.name,
AttributeDefinitions=[{ 'AttributeName': 'y', 'AttributeType': 'S' }],
GlobalSecondaryIndexUpdates=[ { 'Create':
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'y', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}}])
table.delete()
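# A rough model of the backfill rule exercised by test_gsi_backfill() above:
# only items whose indexed attribute is present *and* has the declared type
# are expected in the new index. The helper items_expected_in_index() is
# hypothetical and, as an assumption, only models string ('S') index keys.
def items_expected_in_index(items, key_attr):
    # Items with a missing attribute, or one that is not a string, are
    # silently skipped, mirroring what the test expects from backfilling.
    return [item for item in items if isinstance(item.get(key_attr), str)]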
# Test deleting an existing GSI using UpdateTable
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_delete(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# So far, we have the index for "x" and can use it:
assert_index_query(table, 'hello', [items[3]],
KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
# Now use UpdateTable to delete the GSI for "x"
dynamodb.meta.client.update_table(TableName=table.name,
GlobalSecondaryIndexUpdates=[{ 'Delete':
{ 'IndexName': 'hello' } }])
# update_table is an asynchronous operation. We need to wait until it
# finishes and the GSI is removed.
wait_for_gsi_gone(table, 'hello')
# Now index is gone. We cannot query using it.
with pytest.raises(ClientError, match='ValidationException.*hello'):
full_query(table, IndexName='hello',
KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
table.delete()
# Utility function for creating a new table with a GSI with the given name,
# and, if creation was successful, deleting it. Useful for testing which
# GSI names work.
def create_gsi(dynamodb, index_name):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
        GlobalSecondaryIndexes=[
            {   'IndexName': index_name,
                'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
                'Projection': { 'ProjectionType': 'ALL' }
            }
        ])
    # Verify that the GSI wasn't just ignored, as Scylla originally did ;-)
    assert 'GlobalSecondaryIndexes' in table.meta.client.describe_table(TableName=table.name)['Table']
    table.delete()
# Like table names (tested in test_table.py), index names must must also
# be 3-255 characters and match the regex [a-zA-Z0-9._-]+. This test
# is similar to test_create_table_unsupported_names(), but for GSI names.
# Note that Scylla is actually more limited in the length of the index
# names, because both table name and index name, together, have to fit in
# 221 characters. But we don't verify here this specific limitation.
def test_gsi_unsupported_names(dynamodb):
# Unfortunately, the boto library checks for names shorter than the
# minimum length (3 characters) immediately, and failure results in
# ParamValidationError. But the other invalid names are passed to
# DynamoDB, which returns an HTTP error response, which results in a
# ClientError exception.
with pytest.raises(ParamValidationError):
create_gsi(dynamodb, 'n')
with pytest.raises(ParamValidationError):
create_gsi(dynamodb, 'nn')
with pytest.raises(ClientError, match='ValidationException.*nnnnn'):
create_gsi(dynamodb, 'n' * 256)
with pytest.raises(ClientError, match='ValidationException.*nyh'):
create_gsi(dynamodb, 'nyh@test')
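# The naming rule checked above (3-255 characters, matching the regex
# [a-zA-Z0-9._-]+) can be sketched as a small standalone predicate. This is
# only an illustration: is_valid_index_name is a hypothetical helper, not
# part of the test suite, boto3 or Alternator, and it ignores Scylla's
# additional limit on the combined table and index name length.

```python
import re

# Hypothetical helper: encodes only the rule stated in the comments above.
_INDEX_NAME_RE = re.compile(r'[a-zA-Z0-9._-]+')

def is_valid_index_name(name):
    """Return True if name is 3-255 characters of letters, digits, '.', '_' or '-'."""
    if not 3 <= len(name) <= 255:
        return False
    # fullmatch() requires the whole string to match, not just a prefix.
    return _INDEX_NAME_RE.fullmatch(name) is not None
```

# With this sketch, 'n', 'nn', 'n' * 256 and 'nyh@test' are rejected, while
# '.alternator_test' and 'n' * 255 are accepted.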
# On the other hand, names following the above rules should be accepted. Even
# names which the Scylla rules forbid, such as a name starting with .
def test_gsi_non_scylla_name(dynamodb):
create_gsi(dynamodb, '.alternator_test')
# Index names with 255 characters are allowed in Dynamo. In Scylla, the
# limit is different - the sum of both table and index length cannot
# exceed 211 characters. So we test a much shorter limit.
# (compare test_create_and_delete_table_very_long_name()).
def test_gsi_very_long_name(dynamodb):
#create_gsi(dynamodb, 'n' * 255) # works on DynamoDB, but not on Scylla
create_gsi(dynamodb, 'n' * 190)
# Verify that ListTables does not list materialized views used for indexes.
# This is hard to test, because we don't really know which table names
# should be listed beyond those we created, and don't want to assume that
# no other test runs in parallel with us. So the method we chose is to use a
# unique random name for an index, and check that no table contains this
# name. This assumes that materialized-view names are composed using the
# index's name (which is currently what we do).
@pytest.fixture(scope="session")
def test_table_gsi_random_name(dynamodb):
index_name = random_string()
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': index_name,
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield [table, index_name]
table.delete()
def test_gsi_list_tables(dynamodb, test_table_gsi_random_name):
table, index_name = test_table_gsi_random_name
# Check that the random "index_name" isn't a substring of any table name:
tables = list_tables(dynamodb)
for name in tables:
assert not index_name in name
# But of course, the table's name should be in the list:
assert table.name in tables


@@ -0,0 +1,35 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the health check
import requests
# Test that a health check can be performed with a GET request
def test_health_works(dynamodb):
    url = dynamodb.meta.client._endpoint.host
    # verify=False, as in the test below, so that the check does not depend
    # on certificate validation when the endpoint is served over HTTPS.
    response = requests.get(url, verify=False)
    assert response.ok
    assert response.content.decode('utf-8').strip() == 'healthy: {}'.format(url.replace('https://', '').replace('http://', ''))
# Test that a health check only works for the root URL ('/')
def test_health_only_works_for_root_path(dynamodb):
    url = dynamodb.meta.client._endpoint.host
    for suffix in ['/abc', '/-', '/index.htm', '/health']:
        print(url + suffix)
        response = requests.get(url + suffix, verify=False)
        assert response.status_code in range(400, 405)
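# The expected health-check body used above is "healthy: " followed by the
# endpoint URL with its scheme removed. A minimal sketch of that computation,
# using urllib.parse instead of chained replace() calls. The function name
# expected_health_body is an assumption made for this illustration.

```python
from urllib.parse import urlsplit

def expected_health_body(endpoint_url):
    """Build the body the health check is expected to return for endpoint_url."""
    # netloc is the "host:port" part of the URL, without scheme or path.
    return 'healthy: {}'.format(urlsplit(endpoint_url).netloc)
```

# Unlike the replace() chain, this also drops a trailing path such as '/'.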


@@ -0,0 +1,402 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the CRUD item operations: PutItem, GetItem, UpdateItem, DeleteItem
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string, random_bytes
# Basic test for creating a new item with a random name, and reading it back
# with strong consistency.
# Only the string type is used for keys and attributes. None of the various
# optional PutItem features (Expected, ReturnValues, ReturnConsumedCapacity,
# ReturnItemCollectionMetrics, ConditionalOperator, ConditionExpression,
# ExpressionAttributeNames, ExpressionAttributeValues) are used, and
# for GetItem strong consistency is requested as well as all attributes,
# but no other optional features (AttributesToGet, ReturnConsumedCapacity,
# ProjectionExpression, ExpressionAttributeNames)
def test_basic_string_put_and_get(test_table):
p = random_string()
c = random_string()
val = random_string()
val2 = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'attribute': val, 'another': val2})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['attribute'] == val
assert item['another'] == val2
# Similar to test_basic_string_put_and_get, just uses UpdateItem instead of
# PutItem. Because the item does not yet exist, it should work the same.
def test_basic_string_update_and_get(test_table):
p = random_string()
c = random_string()
val = random_string()
val2 = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'PUT'}, 'another': {'Value': val2, 'Action': 'PUT'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['attribute'] == val
assert item['another'] == val2
# Test put_item and get_item of various types for the *attributes*,
# including both scalars as well as nested documents, lists and sets.
# The full list of types tested here:
# number, boolean, bytes, null, list, map, string set, number set,
# binary set.
# The keys are still strings.
# Note that only top-level attributes are written and read in this test -
# this test does not attempt to modify *nested* attributes.
# See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/dynamodb.html
# on how to pass these various types to Boto3's put_item().
def test_put_and_get_attribute_types(test_table):
key = {'p': random_string(), 'c': random_string()}
test_items = [
Decimal("12.345"),
42,
True,
False,
b'xyz',
None,
['hello', 'world', 42],
{'hello': 'world', 'life': 42},
{'hello': {'test': 'hi', 'hello': True, 'list': [1, 2, 'hi']}},
set(['hello', 'world', 'hi']),
set([1, 42, Decimal("3.14")]),
set([b'xyz', b'hi']),
]
item = { str(i) : test_items[i] for i in range(len(test_items)) }
item.update(key)
test_table.put_item(Item=item)
got_item = test_table.get_item(Key=key, ConsistentRead=True)['Item']
assert item == got_item
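# For orientation, the Python values used in the test above correspond to
# DynamoDB type tags roughly as sketched below. This is a simplified
# illustration of the mapping described in the Boto3 documentation linked
# above, not Boto3's actual serializer; dynamodb_type_tag is a hypothetical
# name introduced here.

```python
from decimal import Decimal

def dynamodb_type_tag(value):
    """Return the DynamoDB type tag a Python value is assumed to map to."""
    if value is None:
        return 'NULL'
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return 'BOOL'
    if isinstance(value, (int, Decimal)):
        return 'N'
    if isinstance(value, str):
        return 'S'
    if isinstance(value, (bytes, bytearray)):
        return 'B'
    if isinstance(value, list):
        return 'L'
    if isinstance(value, dict):
        return 'M'
    if isinstance(value, (set, frozenset)):
        # A set must be homogeneous: all strings, all numbers or all binary.
        element_tags = {dynamodb_type_tag(v) for v in value}
        for scalar_tag, set_tag in (('S', 'SS'), ('N', 'NS'), ('B', 'BS')):
            if element_tags == {scalar_tag}:
                return set_tag
    raise TypeError('unsupported value: {!r}'.format(value))
```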
# The test_empty_* tests below verify support for empty items, with no
# attributes except the key. This is a difficult case for Scylla, because
# for an empty row to exist, Scylla needs to add a "CQL row marker".
# There are several ways to create empty items - via PutItem, UpdateItem
# and deleting attributes from non-empty items, and we need to check them
# all, in several test_empty_* tests:
def test_empty_put(test_table):
p = random_string()
c = random_string()
test_table.put_item(Item={'p': p, 'c': c})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_put_delete(test_table):
p = random_string()
c = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'hello': 'world'})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_update(test_table):
p = random_string()
c = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_update_delete(test_table):
p = random_string()
c = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Value': 'world', 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
# Test error handling of UpdateItem passed a bad "Action" field.
def test_update_bad_action(test_table):
p = random_string()
c = random_string()
val = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'NONEXISTENT'}})
# A more elaborate UpdateItem test, updating different attributes at different
# times. Includes PUT and DELETE operations.
def test_basic_string_more_update(test_table):
p = random_string()
c = random_string()
val1 = random_string()
val2 = random_string()
val3 = random_string()
val4 = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Value': val1, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val1, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a2': {'Value': val2, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val3, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['a1'] == val3
assert item['a2'] == val2
assert not 'a3' in item
# Test that item operations on a nonexistent table name fail with the
# correct error code.
def test_item_operations_nonexistent_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.put_item(TableName='non_existent_table',
Item={'a':{'S':'b'}})
# Fetching a nonexistent item. According to the DynamoDB doc, "If there is no
# matching item, GetItem does not return any data and there will be no Item
# element in the response."
def test_get_item_missing_item(test_table):
p = random_string()
c = random_string()
assert not "Item" in test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)
# Test that if we have a table with string hash and sort keys, we can't read
# or write items with other key types to it.
def test_put_item_wrong_key_type(test_table):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.put_item(Item={'p': s, 'c': s})
assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s})
def test_update_item_wrong_key_type(test_table, test_table_s):
    b = random_bytes()
    s = random_string()
    n = Decimal("3.14")
    # Should succeed (correct key types)
    test_table.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})
    assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
    # Should fail (incorrect hash key types)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': b, 'c': s}, AttributeUpdates={})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': n, 'c': s}, AttributeUpdates={})
    # Should fail (incorrect sort key types)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': s, 'c': b}, AttributeUpdates={})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': s, 'c': n}, AttributeUpdates={})
    # Should fail (missing hash key)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'c': s}, AttributeUpdates={})
    # Should fail (missing sort key)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': s}, AttributeUpdates={})
    # Should fail (spurious key columns). These checks must use update_item,
    # not get_item, since this test is about UpdateItem.
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': s, 'c': s, 'spurious': s}, AttributeUpdates={})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})
def test_get_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types) but have empty result
assert not "Item" in test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': s, 'c': s})
def test_delete_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.delete_item(Key={'p': s, 'c': s})
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': s, 'c': s})
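# The four *_wrong_key_type tests above all exercise the same three failure
# classes: a missing key attribute, a spurious key attribute, and a key
# attribute of the wrong type. A minimal sketch of such a check follows. It
# is not Alternator's implementation; key_problem and the key_schema format
# (attribute name mapped to 'S', 'B' or 'N') are assumptions made for this
# illustration.

```python
from decimal import Decimal

# Python types assumed acceptable for each DynamoDB key type tag.
_PYTHON_TYPES = {'S': str, 'B': (bytes, bytearray), 'N': (int, Decimal)}

def key_problem(key, key_schema):
    """Return a description of what is wrong with key, or None if it is valid."""
    for name in key_schema:
        if name not in key:
            return 'missing key attribute: ' + name
    for name in key:
        if name not in key_schema:
            return 'spurious key attribute: ' + name
    for name, type_tag in key_schema.items():
        value = key[name]
        # bool is a subclass of int but is never a valid key type.
        if isinstance(value, bool) or not isinstance(value, _PYTHON_TYPES[type_tag]):
            return 'wrong type for key attribute: ' + name
    return None
```

# For a schema {'p': 'S', 'c': 'S'}, the key {'p': 'x', 'c': 'y'} is valid,
# while {'p': 'x'}, {'p': b'x', 'c': 'y'} and {'p': 'x', 'c': 'y',
# 'spurious': 'z'} are each reported as a problem.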
# Most of the tests here arbitrarily used a table with both hash and sort keys
# (both strings). Let's check that a table with *only* a hash key works ok
# too, for PutItem, GetItem, and UpdateItem.
def test_only_hash_key(test_table_s):
s = random_string()
test_table_s.put_item(Item={'p': s, 'hello': 'world'})
assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world'}
test_table_s.update_item(Key={'p': s}, AttributeUpdates={'hi': {'Value': 'there', 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world', 'hi': 'there'}
# Tests for item operations in tables with non-string hash or sort keys.
# These tests focus only on the type of the key - everything else is as
# simple as we can (string attributes, no special options for GetItem
# and PutItem). These tests also focus on individual items only, and
# not about the sort order of sort keys - this should be verified in
# test_query.py, for example.
def test_bytes_hash_key(test_table_b):
    # Bytes values are passed using base64 encoding, which has weird cases
    # depending on len%3 and len%4. So let's try various lengths.
    # The loop variable is named "length" to avoid shadowing the builtin len().
    for length in range(10, 18):
        p = random_bytes(length)
        val = random_string()
        test_table_b.put_item(Item={'p': p, 'attribute': val})
        assert test_table_b.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'attribute': val}
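# Why the range of lengths matters: base64 encodes every 3 input bytes as 4
# output characters, so the amount of '=' padding depends on the input
# length modulo 3. The sketch below, using only the standard library, shows
# the padding for the lengths tried above. base64_padding is a name made up
# for this illustration.

```python
import base64

def base64_padding(length):
    """Return how many '=' padding characters a length-byte value encodes to."""
    encoded = base64.b64encode(bytes(length))  # bytes(n) is n zero bytes
    return len(encoded) - len(encoded.rstrip(b'='))

# Lengths 10..17 cover all three padding cases (2, 1 and 0 characters).
paddings = {length: base64_padding(length) for length in range(10, 18)}
```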
def test_bytes_sort_key(test_table_sb):
p = random_string()
c = random_bytes()
val = random_string()
test_table_sb.put_item(Item={'p': p, 'c': c, 'attribute': val})
assert test_table_sb.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': val}
# Tests for using a large binary blob as hash key, sort key, or attribute.
# DynamoDB strictly limits the size of the binary hash key to 2048 bytes,
# and binary sort key to 1024 bytes, and refuses anything larger. The total
# size of an item is limited to 400KB, which also limits the size of the
# largest attributes. For more details on these limits, see
# https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
# Alternator currently does *not* have these limitations, and can accept much
# larger keys and attributes, but what we do in the following tests is to verify
# that items up to DynamoDB's maximum sizes also work well in Alternator.
def test_large_blob_hash_key(test_table_b):
b = random_bytes(2048)
test_table_b.put_item(Item={'p': b})
assert test_table_b.get_item(Key={'p': b}, ConsistentRead=True)['Item'] == {'p': b}
def test_large_blob_sort_key(test_table_sb):
s = random_string()
b = random_bytes(1024)
test_table_sb.put_item(Item={'p': s, 'c': b})
assert test_table_sb.get_item(Key={'p': s, 'c': b}, ConsistentRead=True)['Item'] == {'p': s, 'c': b}
def test_large_blob_attribute(test_table):
p = random_string()
c = random_string()
b = random_bytes(409500) # a bit less than 400KB
test_table.put_item(Item={'p': p, 'c': c, 'attribute': b })
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': b}
# Checks that it is not allowed to use both old-style AttributeUpdates and
# new-style UpdateExpression in a single UpdateItem request.
def test_update_item_two_update_methods(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'a': {'Value': 3, 'Action': 'PUT'}},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
# Verify that having neither AttributeUpdates nor UpdateExpression is
# allowed, and results in creation of an empty item.
def test_update_item_no_update_method(test_table_s):
p = random_string()
assert not "Item" in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
test_table_s.update_item(Key={'p': p})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p}
# Test GetItem with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_getitem_attributes_to_get(dynamodb, test_table):
p = random_string()
c = random_string()
item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
test_table.put_item(Item=item)
for wanted in [ ['a'], # only non-key attribute
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # Our item doesn't have this
]:
got_item = test_table.get_item(Key={'p': p, 'c': c}, AttributesToGet=wanted, ConsistentRead=True)['Item']
expected_item = {k: item[k] for k in wanted if k in item}
assert expected_item == got_item
# Basic test for DeleteItem, with hash key only
def test_delete_item_hash(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p})
assert 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
test_table_s.delete_item(Key={'p': p})
assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# Basic test for DeleteItem, with hash and sort key
def test_delete_item_sort(test_table):
p = random_string()
c = random_string()
key = {'p': p, 'c': c}
test_table.put_item(Item=key)
assert 'Item' in test_table.get_item(Key=key, ConsistentRead=True)
test_table.delete_item(Key=key)
assert not 'Item' in test_table.get_item(Key=key, ConsistentRead=True)
# Test that PutItem completely replaces an existing item. It shouldn't merge
# it with a previously existing value, as UpdateItem does!
# We test for a table with just hash key, and for a table with both hash and
# sort keys.
def test_put_item_replace(test_table_s, test_table):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
test_table_s.put_item(Item={'p': p, 'b': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
c = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
test_table.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
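# The difference between PutItem (replace) and UpdateItem (merge) checked
# above can be illustrated with a tiny in-memory model. TinyTable is a
# hypothetical stand-in for a hash-key-only table keyed by 'p'; it is not
# how DynamoDB or Alternator store items.

```python
class TinyTable:
    """In-memory model of a table with a single hash key attribute 'p'."""

    def __init__(self):
        self._items = {}

    def put_item(self, item):
        # PutItem semantics: the new item replaces any previous one entirely.
        self._items[item['p']] = dict(item)

    def update_item(self, key, new_attributes):
        # UpdateItem semantics: create the item if needed, then merge.
        existing = self._items.setdefault(key['p'], dict(key))
        existing.update(new_attributes)

    def get_item(self, key):
        return self._items.get(key['p'])
```

# After put_item({'p': 'k', 'a': 'hi'}) followed by
# put_item({'p': 'k', 'b': 'hello'}), attribute 'a' is gone, whereas a
# subsequent update_item({'p': 'k'}, {'c': 'x'}) keeps 'b' and adds 'c'.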

alternator-test/test_lsi.py Normal file

@@ -0,0 +1,365 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests of LSI (Local Secondary Indexes)
#
# Note that many of these tests are slower than usual, because many of them
# need to create new tables and/or new LSIs of different types, operations
# which are extremely slow in DynamoDB, often taking minutes (!).
import pytest
import time
from botocore.exceptions import ClientError, ParamValidationError
from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
# Currently, Alternator's LSIs only support eventually consistent reads, so tests
# that involve writing to a table and then expect to read something from it cannot
# be guaranteed to succeed without retrying the read. The following utility
# functions make it easy to write such tests.
def assert_index_query(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
return
print('assert_index_query retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
def assert_index_scan(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
return
print('assert_index_scan retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
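# The two helpers above share one pattern: retry an eventually consistent
# read a few times and compare the results as multisets. A standalone sketch
# of that pattern follows. It assumes flat items with hashable values, and
# as_multiset is only a guess at what util.multiset does; both function
# names here are hypothetical.

```python
import time
from collections import Counter

def as_multiset(items):
    """Turn a list of flat dict items into an order-insensitive Counter."""
    # Dicts are unhashable, so freeze each item as a sorted tuple of pairs.
    return Counter(tuple(sorted(item.items())) for item in items)

def eventually_equal(fetch, expected_items, attempts=3, delay=1.0):
    """Call fetch() until it returns expected_items in any order, or give up."""
    expected = as_multiset(expected_items)
    for _ in range(attempts):
        if as_multiset(fetch()) == expected:
            return True
        time.sleep(delay)
    # One final read after the last sleep, as in the helpers above.
    return as_multiset(fetch()) == expected
```

# A test would then write, for example,
# assert eventually_equal(lambda: full_scan(table, IndexName='hello'), items)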
# Although quite silly, it is actually allowed to create an index which is
# identical to the base table.
def test_lsi_identical(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'S' }],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'c': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert multiset(items) == multiset(full_scan(table))
assert_index_scan(table, 'hello', items)
# We can't scan a nonexistent index
with pytest.raises(ClientError, match='ValidationException'):
full_scan(table, IndexName='wrong')
table.delete()
# Checks that providing a hash key different from the base table's is not
# allowed, and neither is providing duplicated keys or no sort key at all
def test_lsi_wrong(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'b', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
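# The three rejected cases above (different hash key, duplicated key, no sort
# key) can be summarized as a small check on the key schemas. This is only a
# sketch covering those three cases, not the validation DynamoDB or
# Alternator actually perform; lsi_key_schema_problem is a hypothetical name.

```python
def lsi_key_schema_problem(base_key_schema, lsi_key_schema):
    """Return why lsi_key_schema is rejected, or None if none of the checks fail."""
    def names(schema, key_type):
        return [k['AttributeName'] for k in schema if k['KeyType'] == key_type]

    base_hash = names(base_key_schema, 'HASH')
    lsi_hash = names(lsi_key_schema, 'HASH')
    lsi_range = names(lsi_key_schema, 'RANGE')
    if lsi_hash != base_hash:
        return 'index hash key differs from the base table hash key'
    if len(lsi_range) != 1:
        return 'index must have exactly one sort key'
    if lsi_range[0] in lsi_hash:
        return 'index hash key and sort key must be different attributes'
    return None
```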
# A simple scenario for LSI. The base table has hash and sort keys; the index
# keeps the same hash key but uses a different sort key - one of the non-key
# attributes from the base table.
@pytest.fixture(scope="session")
def test_table_lsi_1(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' },
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_lsi_1(test_table_lsi_1):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
p1, b1 = items1[0]['p'], items1[0]['b']
p2, b2 = random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2}]
items = items1 + items2
with test_table_lsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}})
# A second scenario of LSI. The base table has both hash and sort keys, and
# a separate local index is created on each of four non-key attributes.
@pytest.fixture(scope="session")
def test_table_lsi_4(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x1', 'AttributeType': 'S' },
{ 'AttributeName': 'x2', 'AttributeType': 'S' },
{ 'AttributeName': 'x3', 'AttributeType': 'S' },
{ 'AttributeName': 'x4', 'AttributeType': 'S' },
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello_' + column,
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': column, 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
} for column in ['x1','x2','x3','x4']
])
yield table
table.delete()
def test_lsi_4(test_table_lsi_4):
items1 = [{'p': random_string(), 'c': random_string(),
'x1': random_string(), 'x2': random_string(), 'x3': random_string(), 'x4': random_string()} for i in range(10)]
i_values = items1[0]
i5 = random_string()
items2 = [{'p': i5, 'c': i5, 'x1': i5, 'x2': i5, 'x3': i5, 'x4': i5}]
items = items1 + items2
with test_table_lsi_4.batch_writer() as batch:
for item in items:
batch.put_item(item)
for column in ['x1', 'x2', 'x3', 'x4']:
expected_items = [i for i in items if (i['p'], i[column]) == (i_values['p'], i_values[column])]
assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
KeyConditions={'p': {'AttributeValueList': [i_values['p']], 'ComparisonOperator': 'EQ'},
column: {'AttributeValueList': [i_values[column]], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if (i['p'], i[column]) == (i5, i5)]
assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
KeyConditions={'p': {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'},
column: {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'}})
def test_lsi_describe(test_table_lsi_4):
desc = test_table_lsi_4.meta.client.describe_table(TableName=test_table_lsi_4.name)
assert 'Table' in desc
assert 'LocalSecondaryIndexes' in desc['Table']
lsis = desc['Table']['LocalSecondaryIndexes']
assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_x1', 'hello_x2', 'hello_x3', 'hello_x4'])
# TODO: check projection and key params
# TODO: check also ProvisionedThroughput, IndexArn
# A table with selective projection - only keys are projected into the index
@pytest.fixture(scope="session")
def test_table_lsi_keys_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
yield table
table.delete()
# Check that it's possible to extract a non-projected attribute from the index,
# as the documentation promises
def test_lsi_get_not_projected_attribute(test_table_lsi_keys_only):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
p2, b2, d2 = random_string(), random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
items = items1 + items2
with test_table_lsi_keys_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1 and i['d'] == d1]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
Select='ALL_ATTRIBUTES')
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
Select='ALL_ATTRIBUTES')
expected_items = [{'d': i['d']} for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['d'])
# Check that only projected attributes can be extracted
@pytest.mark.xfail(reason="LSI in alternator currently only implements full projections")
def test_lsi_get_all_projected_attributes(test_table_lsi_keys_only):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
p2, b2, d2 = random_string(), random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
items = items1 + items2
with test_table_lsi_keys_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [{'p': i['p'], 'c': i['c'],'b': i['b']} for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
# Check that strongly consistent reads are allowed for LSI
def test_lsi_consistent_read(test_table_lsi_1):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
p1, b1 = items1[0]['p'], items1[0]['b']
p2, b2 = random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2}]
items = items1 + items2
with test_table_lsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
ConsistentRead=True)
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
ConsistentRead=True)
# A table with both gsi and lsi present
@pytest.fixture(scope="session")
def test_table_lsi_gsi(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x1', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello_g1',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x1', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello_l1',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x1', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
yield table
table.delete()
# Test that GSI and LSI can coexist, even if they're identical
def test_lsi_and_gsi(test_table_lsi_gsi):
desc = test_table_lsi_gsi.meta.client.describe_table(TableName=test_table_lsi_gsi.name)
assert 'Table' in desc
assert 'LocalSecondaryIndexes' in desc['Table']
assert 'GlobalSecondaryIndexes' in desc['Table']
lsis = desc['Table']['LocalSecondaryIndexes']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_l1'])
assert(sorted([gsi['IndexName'] for gsi in gsis]) == ['hello_g1'])
items = [{'p': random_string(), 'c': random_string(), 'x1': random_string()} for i in range(17)]
p1, c1, x1 = items[0]['p'], items[0]['c'], items[0]['x1']
with test_table_lsi_gsi.batch_writer() as batch:
for item in items:
batch.put_item(item)
for index in ['hello_g1', 'hello_l1']:
expected_items = [i for i in items if i['p'] == p1 and i['x1'] == x1]
assert_index_query(test_table_lsi_gsi, index, expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'x1': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})


@@ -0,0 +1,60 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Test for operations on items with *nested* attributes.
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test that we can write a top-level attribute that is a nested document, and
# read it back correctly.
def test_nested_document_attribute_write(test_table_s):
nested_value = {
'a': 3,
'b': {'c': 'hello', 'd': ['hi', 'there', {'x': 'y'}, '42']},
}
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': nested_value})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': nested_value}
# Test that if we have a top-level attribute that is a nested document (i.e.,
# a dictionary), updating this attribute will replace it entirely with a new
# nested document - not merge the new content into the old content.
def test_nested_document_attribute_overwrite(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': {'c': 5}, 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'c': 5}, 'd': 5}
# Moreover, we can overwrite an entire nested document by, say, a string,
# and that's also fine.
def test_nested_document_attribute_overwrite_2(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': 'hi', 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi', 'd': 5}
# Verify that AttributeUpdates cannot be used to update a nested attribute -
# trying to use a dot in the name of the attribute will just create a
# top-level attribute with an actual dot in its name.
def test_attribute_updates_dot(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a.b': {'Value': 3, 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
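# Illustration only (not part of the original test file): a minimal
# pure-Python model of the behavior the tests above check, assuming a PUT in
# AttributeUpdates replaces the whole top-level value and treats the attribute
# name as an opaque string. apply_attribute_updates is a hypothetical helper,
# not an Alternator or boto3 function.

```python
import copy


def apply_attribute_updates(item, attribute_updates):
    """Return a copy of item with PUT-style AttributeUpdates applied.

    Hypothetical model: only the PUT action is handled here.
    """
    result = copy.deepcopy(item)
    for name, update in attribute_updates.items():
        if update.get('Action', 'PUT') != 'PUT':
            raise ValueError('only PUT is modeled in this sketch')
        # The whole top-level value is replaced; nothing is merged, and the
        # name is never split on dots.
        result[name] = copy.deepcopy(update['Value'])
    return result


item = {'p': 'key', 'a': {'b': 3, 'c': 4}, 'd': 5}
replaced = apply_attribute_updates(item, {'a': {'Value': {'c': 5}, 'Action': 'PUT'}})
dotted = apply_attribute_updates({'p': 'key'}, {'a.b': {'Value': 3, 'Action': 'PUT'}})
```

# In this model, "replaced" no longer contains a.b, and "dotted" has a single
# top-level attribute literally named 'a.b'.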


@@ -0,0 +1,201 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the various operations (GetItem, Query, Scan) with a
# ProjectionExpression parameter.
#
# ProjectionExpression is an extension of the legacy AttributesToGet
# parameter. Both parameters request that only a subset of the attributes
# be fetched for each item, instead of all of them. But while AttributesToGet
# was limited to top-level attributes, ProjectionExpression can request also
# nested attributes.
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_query, multiset
# Basic test for ProjectionExpression, requesting only top-level attributes.
# Result should include the selected attributes only - if one wants the key
# attributes as well, one needs to select them explicitly. When no key
# attributes are selected, an item may have *none* of the selected
# attributes, and is then returned as an empty item.
def test_projection_expression_toplevel(test_table):
p = random_string()
c = random_string()
item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
test_table.put_item(Item=item)
for wanted in [ ['a'], # only non-key attribute
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # Our item doesn't have this
]:
got_item = test_table.get_item(Key={'p': p, 'c': c}, ProjectionExpression=",".join(wanted), ConsistentRead=True)['Item']
expected_item = {k: item[k] for k in wanted if k in item}
assert expected_item == got_item
# Various simple tests for ProjectionExpression's syntax, using only top-level
# attributes.
def test_projection_expression_toplevel_syntax(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': 'hello'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a'})['Item'] == {'a': 'hello'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,b')['Item'] == {'a': 'hello', 'b': 'hi'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=' a , b ')['Item'] == {'a': 'hello', 'b': 'hi'}
# Missing or unused names in ExpressionAttributeNames are errors:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#wrong': 'a'})['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a', '#unused': 'b'})['Item']
# It is not allowed to fetch the same top-level attribute twice (or in
# general, list two overlapping attributes). We get an error like
# "Invalid ProjectionExpression: Two document paths overlap with each
# other; must remove or rewrite one of these paths; path one: [a], path
# two: [a]".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a')['Item']
# A comma with nothing after it is a syntax error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,')['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=',a')['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,,b')['Item']
# An empty ProjectionExpression is not allowed. DynamoDB recognizes its
# syntax, but then writes: "Invalid ProjectionExpression: The expression
# can not be empty".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='')['Item']
# The following two tests are similar to test_projection_expression_toplevel()
# which tested the GetItem operation - but these test Scan and Query.
# Both test ProjectionExpression with only top-level attributes.
def test_projection_expression_scan(filled_test_table):
table, items = filled_test_table
for wanted in [ ['another'], # only non-key attributes (one item doesn't have it!)
['c', 'another'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_scan(table, ProjectionExpression=",".join(wanted))
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
def test_projection_expression_query(test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
for wanted in [ ['a'], # only non-key attributes
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression=",".join(wanted))
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
# The previous tests all fetched only top-level attributes. They could all
# be written using AttributesToGet instead of ProjectionExpression (and,
# in fact, we do have similar tests with AttributesToGet in other files),
# but the previous tests checked that the alternative syntax works correctly.
# The following test checks fetching more elaborate attribute paths from
# nested documents.
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_projection_expression_path(test_table_s):
p = random_string()
test_table_s.put_item(Item={
'p': p,
'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5},
'b': 'hello'
})
# Fetching the entire nested document "a" works, of course:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}}
# If we fetch a.b, we get only the content of b - but it's still inside
# the a dictionary:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}]}}
# Similarly, fetching a.b[0] gives us a one-element array in a dictionary.
# Note that [0] is the first element of an array.
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0]')['Item'] == {'a': {'b': [2]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2]')['Item'] == {'a': {'b': [{'x': 'hi', 'y': 'yo'}]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2].y')['Item'] == {'a': {'b': [{'y': 'yo'}]}}
# Trying to read any sort of nonexistent attribute returns an empty item.
# This includes a nonexistent top-level attribute, an attempt to read
# beyond the end of an array or a nonexistent member of a dictionary, as
# well as paths which begin with a nonexistent prefix.
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='x')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3]')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x.y')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3].x')['Item'] == {}
# We can read multiple paths - the results are merged into one object
# structured the same way as in the original item:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[1]')['Item'] == {'a': {'b': [2, 4]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.c')['Item'] == {'a': {'b': [2], 'c': 5}}
# It is not allowed to read the same path multiple times. The error from
# DynamoDB looks like: "Invalid ProjectionExpression: Two document paths
# overlap with each other; must remove or rewrite one of these paths;
# path one: [a, b, [0]], path two: [a, b, [0]]".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[0]')['Item']
# Two paths are considered to "overlap" if the content of one path
# contains the content of the second path. So requesting both "a" and
# "a.b[0]" is not allowed.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a.b[0]')['Item']
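# Illustration only (not part of the original test file): a pure-Python model
# of the path semantics that test_projection_expression_path expects. It is a
# sketch under assumptions, not Alternator's implementation: parse_path,
# build_tree and project_item are hypothetical helpers, and the parser ignores
# "#name" placeholders and most syntax errors.

```python
import re

_MISSING = object()
_TOKEN = re.compile(r'\.?([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]')


def parse_path(path):
    """Split 'a.b[2].y' into ['a', 'b', 2, 'y'] (simplified grammar)."""
    path = path.strip()
    parts, pos = [], 0
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError('bad path: ' + path)
        parts.append(m.group(1) if m.group(1) is not None else int(m.group(2)))
        pos = m.end()
    if not parts:
        raise ValueError('empty path')
    return parts


def build_tree(paths):
    """Build a trie of requested paths; a None child marks a selected leaf."""
    tree = {}
    for parts in paths:
        node = tree
        for i, part in enumerate(parts):
            last = i == len(parts) - 1
            if part in node:
                # Either an earlier path ends here, or this path ends inside
                # an earlier one: the two paths overlap.
                if node[part] is None or last:
                    raise ValueError('two document paths overlap')
                node = node[part]
            elif last:
                node[part] = None
            else:
                node[part] = {}
                node = node[part]
    return tree


def _project(value, tree):
    if tree is None:
        return value
    if isinstance(value, dict):
        out = {}
        for key, sub in tree.items():
            if isinstance(key, str) and key in value:
                got = _project(value[key], sub)
                if got is not _MISSING:
                    out[key] = got
        return out if out else _MISSING
    if isinstance(value, list):
        out = []
        for idx in sorted(k for k in tree if isinstance(k, int)):
            if idx < len(value):
                got = _project(value[idx], tree[idx])
                if got is not _MISSING:
                    out.append(got)
        return out if out else _MISSING
    return _MISSING


def project_item(item, expression):
    tree = build_tree([parse_path(p) for p in expression.split(',')])
    got = _project(item, tree)
    return {} if got is _MISSING else got


doc = {'p': 'k', 'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}, 'b': 'hello'}
merged = project_item(doc, 'a.b[0],a.c')
```

# Building a trie first makes overlap detection a by-product of insertion, so
# 'a,a.b[0]' and 'a.b[0],a.b[0]' are both rejected before any item is read.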
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_query_projection_expression_path(test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression="a.x")
expected_items = [{'a': {'x': x['a']['x']}} for x in items]
assert multiset(expected_items) == multiset(got_items)
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_scan_projection_expression_path(test_table):
# This test is similar to test_query_projection_expression_path above,
# but uses a scan instead of a query. The scan will also return unrelated
# partitions created by other tests (hopefully not too many...) that we
# need to ignore. We also need to ask for "p", so we can filter by it.
p = random_string()
items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = [ x for x in full_scan(test_table, ProjectionExpression="p, a.x") if x['p'] == p]
expected_items = [{'p': p, 'a': {'x': x['a']['x']}} for x in items]
assert multiset(expected_items) == multiset(got_items)
# It is not allowed to use both ProjectionExpression and its older cousin,
# AttributesToGet, together. If trying to do this, DynamoDB produces an error
# like "Can not use both expression and non-expression parameters in the same
# request: Non-expression parameters: {AttributesToGet} Expression
# parameters: {ProjectionExpression}".
def test_projection_expression_and_attributes_to_get(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
with pytest.raises(ClientError, match='ValidationException.*both'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a', AttributesToGet=['b'])['Item']
with pytest.raises(ClientError, match='ValidationException.*both'):
full_scan(test_table_s, ProjectionExpression='a', AttributesToGet=['a'])
with pytest.raises(ClientError, match='ValidationException.*both'):
full_query(test_table_s, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression='a', AttributesToGet=['a'])
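# Illustration only (not part of the original test file): a minimal sketch of
# the request check that the test above exercises. check_projection_parameters
# is a hypothetical helper; it covers only the ProjectionExpression versus
# AttributesToGet pair and uses ValueError in place of a real ClientError.

```python
def check_projection_parameters(request):
    """Reject a request that mixes ProjectionExpression with AttributesToGet."""
    if 'ProjectionExpression' in request and 'AttributesToGet' in request:
        raise ValueError(
            'ValidationException: Can not use both expression and '
            'non-expression parameters in the same request')


def is_rejected(request):
    try:
        check_projection_parameters(request)
    except ValueError:
        return True
    return False


mixed = is_rejected({'ProjectionExpression': 'a', 'AttributesToGet': ['b']})
only_expression = is_rejected({'ProjectionExpression': 'a'})
```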


@@ -0,0 +1,516 @@
# -*- coding: utf-8 -*-
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the Query operation
import random
import pytest
from botocore.exceptions import ClientError, ParamValidationError
from decimal import Decimal
from util import random_string, random_bytes, full_query, multiset
from boto3.dynamodb.conditions import Key, Attr
# Test that querying works fine with the in-stock paginator
def test_query_basic_restrictions(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('query')
# EQ
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long']) == multiset(got_items)
# LT
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
# LE
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'LE'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] <= '14']) == multiset(got_items)
# GT
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['15'], 'ComparisonOperator': 'GT'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] > '15']) == multiset(got_items)
# GE
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'GE'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '14']) == multiset(got_items)
# BETWEEN
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['155', '164'], 'ComparisonOperator': 'BETWEEN'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '155' and item['c'] <= '164']) == multiset(got_items)
# BEGINS_WITH
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['11'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
print([item for item in items if item['p'] == 'long' and item['c'].startswith('11')])
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'].startswith('11')]) == multiset(got_items)
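# Illustration only (not part of the original test file): the expected results
# in the test above are computed with plain Python comparisons. The sketch
# below collects them into one table of predicates, assuming string keys.
# KEY_OPERATORS and matches_key_conditions are hypothetical names.

```python
KEY_OPERATORS = {
    'EQ': lambda value, args: value == args[0],
    'LT': lambda value, args: value < args[0],
    'LE': lambda value, args: value <= args[0],
    'GT': lambda value, args: value > args[0],
    'GE': lambda value, args: value >= args[0],
    # BETWEEN is inclusive on both ends, as in the test's expected list.
    'BETWEEN': lambda value, args: args[0] <= value <= args[1],
    'BEGINS_WITH': lambda value, args: value.startswith(args[0]),
}


def matches_key_conditions(item, key_conditions):
    """True if the item satisfies every condition (conditions are ANDed)."""
    for name, cond in key_conditions.items():
        test = KEY_OPERATORS[cond['ComparisonOperator']]
        if name not in item or not test(item[name], cond['AttributeValueList']):
            return False
    return True


sample = [{'p': 'long', 'c': str(i)} for i in range(20)]
less_than_12 = [i['c'] for i in sample if matches_key_conditions(i, {
    'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
    'c': {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}})]
```

# Note that the comparison is on strings, so "2" is not less than "12".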
# Test that KeyConditionExpression parameter is supported
@pytest.mark.xfail(reason="KeyConditionExpression not supported yet")
def test_query_key_condition_expression(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('query')
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditionExpression=Key("p").eq("long") & Key("c").lt("12")):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
def test_begins_with(dynamodb, test_table):
paginator = dynamodb.meta.client.get_paginator('query')
items = [{'p': 'unorthodox_chars', 'c': sort_key, 'str': 'a'} for sort_key in [u'ÿÿÿ', u'cÿbÿ', u'cÿbÿÿabg'] ]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# TODO(sarna): Once bytes type is supported, the \xFF character should be tested
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [u'ÿÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
got_items += page['Items']
print(got_items)
assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'ÿÿ')])
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [u'cÿbÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
got_items += page['Items']
print(got_items)
assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'cÿbÿ')])
def test_begins_with_wrong_type(dynamodb, test_table_sn):
paginator = dynamodb.meta.client.get_paginator('query')
with pytest.raises(ClientError, match='ValidationException'):
for page in paginator.paginate(TableName=test_table_sn.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [17], 'ComparisonOperator': 'BEGINS_WITH'}
}):
pass
# Items returned by Query should be sorted by the sort key. The following
# tests verify that this is indeed the case, for the three allowed key types:
# strings, binary, and numbers. These tests test not just the Query operation,
# but inherently that the sort-key sorting works.
def test_query_sort_order_string(test_table):
# Insert a lot of random items in one new partition:
# str(i) has a non-obvious sort order (e.g., "100" comes before "2") so is a nice test.
p = random_string()
items = [{'p': p, 'c': str(i)} for i in range(128)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert len(items) == len(got_items)
# Extract just the sort key ("c") from the items
sort_keys = [x['c'] for x in items]
got_sort_keys = [x['c'] for x in got_items]
# Verify that got_sort_keys are already sorted (in string order)
assert sorted(got_sort_keys) == got_sort_keys
# Verify that got_sort_keys are a sorted version of the expected sort_keys
assert sorted(sort_keys) == got_sort_keys
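# Illustration only (not part of the original test file): a quick look at why
# str(i) keys make a useful sort-order test. String order and numeric order
# disagree, so a backend that sorted numerically would fail the assertions
# above.

```python
keys = [str(i) for i in range(128)]
in_string_order = sorted(keys)
in_numeric_order = sorted(keys, key=int)

# "100" sorts before "2" in string order, but after it in numeric order.
string_positions = (in_string_order.index('100'), in_string_order.index('2'))
numeric_positions = (in_numeric_order.index('100'), in_numeric_order.index('2'))
```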
def test_query_sort_order_bytes(test_table_sb):
# Insert a lot of random items in one new partition:
# We arbitrarily use random_bytes(10) for the sort keys.
p = random_string()
items = [{'p': p, 'c': random_bytes(10)} for i in range(128)]
with test_table_sb.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table_sb, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert len(items) == len(got_items)
sort_keys = [x['c'] for x in items]
got_sort_keys = [x['c'] for x in got_items]
# Boto3's "Binary" objects are sorted as if bytes are signed integers.
# This isn't the order that DynamoDB itself uses (byte 0 should be first,
# not byte -128). Sorting the byte array ".value" works.
assert sorted(got_sort_keys, key=lambda x: x.value) == got_sort_keys
assert sorted(sort_keys) == got_sort_keys
def test_query_sort_order_number(test_table_sn):
# This is a list of numbers, sorted in correct order, and each suitable
# for accurate representation by Alternator's number type.
numbers = [
Decimal("-2e10"),
Decimal("-7.1e2"),
Decimal("-4.1"),
Decimal("-0.1"),
Decimal("-1e-5"),
Decimal("0"),
Decimal("2e-5"),
Decimal("0.15"),
Decimal("1"),
Decimal("1.00000000000000000000000001"),
Decimal("3.14159"),
Decimal("3.1415926535897932384626433832795028841"),
Decimal("31.4"),
Decimal("1.4e10"),
]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Finally, verify that we get back exactly the same numbers (with identical
# precision), and in their original sorted order.
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
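# Illustration only (not part of the original test file): a short sketch of
# why the test above uses Decimal rather than float. Two of the listed numbers
# collapse to the same float, so a float-based comparison could neither order
# them nor return them with identical precision.

```python
from decimal import Decimal

one = Decimal("1")
just_above_one = Decimal("1.00000000000000000000000001")
long_pi = Decimal("3.1415926535897932384626433832795028841")

# Decimal keeps every digit given in the string and orders the values exactly.
decimal_distinguishes = one < just_above_one
# Converting to float loses the difference.
float_distinguishes = float(one) < float(just_above_one)
```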
def test_query_filtering_attributes_equality(filled_test_table):
test_table, items = filled_test_table
query_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
query_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxx" ],
"ComparisonOperator": "EQ"
},
"another" : {
"AttributeValueList" : [ "yy" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
# Test that FilterExpression works as expected
@pytest.mark.xfail(reason="FilterExpression not supported yet")
def test_query_filter_expression(filled_test_table):
test_table, items = filled_test_table
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx"))
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
# To be compatible with DynamoDB, QueryFilter may contain only non-key
# attributes; a key attribute in it is rejected with a ValidationException.
def test_query_filtering_key_equality(filled_test_table):
test_table, items = filled_test_table
with pytest.raises(ClientError, match='ValidationException'):
query_filter = {
"c" : {
"AttributeValueList" : [ "5" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
with pytest.raises(ClientError, match='ValidationException'):
query_filter = {
"attribute" : {
"AttributeValueList" : [ "x" ],
"ComparisonOperator": "EQ"
},
"p" : {
"AttributeValueList" : [ "5" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
# Test Query with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_query_attributes_to_get(dynamodb, test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
for wanted in [ ['a'], # only non-key attributes
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
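The expected result in the test above is computed with a dict comprehension. A small self-contained sketch of that projection (the helper name `project` and the sample items are illustrative assumptions) shows how an item with none of the selected attributes becomes an empty item rather than disappearing:

```python
items = [{'p': 'x', 'c': str(i), 'a': str(i * 10)} for i in range(3)]

def project(items, wanted):
    # Keep only the requested attributes that each item actually has.
    return [{k: item[k] for k in wanted if k in item} for item in items]

# Key attributes appear only when they are explicitly requested.
assert project(items, ['c', 'a'])[1] == {'c': '1', 'a': '10'}
assert project(items, ['a'])[1] == {'a': '10'}

# No item has this attribute, so every item is returned as an empty dict.
assert project(items, ['nonexistent']) == [{}, {}, {}]
```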
# Test which keys can be used in a Query on a table with both a hash key and
# a sort key: we can Query by the hash key alone, or by a combination of the
# hash and sort keys, but *cannot* Query by just the sort key, and obviously
# not by any non-key column.
def test_query_which_key(test_table):
p = random_string()
c = random_string()
p2 = random_string()
c2 = random_string()
item1 = {'p': p, 'c': c}
item2 = {'p': p, 'c': c2}
item3 = {'p': p2, 'c': c}
for i in [item1, item2, item3]:
test_table.put_item(Item=i)
# Query by hash key only:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
expected_items = [item1, item2]
assert multiset(expected_items) == multiset(got_items)
# Query by hash key *and* sort key (this is basically a GetItem):
got_items = full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
expected_items = [item1]
assert multiset(expected_items) == multiset(got_items)
# Query by sort key alone is not allowed. DynamoDB reports:
# "Query condition missed key schema element: p".
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# Query by a non-key isn't allowed, for the same reason - that the
# actual hash key (p) is missing in the query:
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# If we try both p and a non-key we get a complaint that the sort
# key is missing: "Query condition missed key schema element: c"
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# If we try p, c and another key, we get an error that
# "Conditions can be of length 1 or 2 only".
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# Test the "Select" parameter of Query. The default Select mode,
# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
# allow returning just specific attributes or just counting the results
# without returning items at all.
@pytest.mark.xfail(reason="Select not supported yet")
def test_query_select(test_table_sn):
numbers = [Decimal(i) for i in range(10)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num, 'x': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order. By default,
# query returns all attributes:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_x_attributes = [x['x'] for x in got_items]
assert got_x_attributes == numbers
# Select=ALL_ATTRIBUTES does exactly the same as the default - return
# all attributes:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_ATTRIBUTES')['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_x_attributes = [x['x'] for x in got_items]
assert got_x_attributes == numbers
# Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
# is just for indexes, when IndexName is specified)
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_PROJECTED_ATTRIBUTES')
# Select=SPECIFIC_ATTRIBUTES requires that either an AttributesToGet
# or ProjectionExpression appears, but then really does nothing beyond
# what AttributesToGet and ProjectionExpression already do:
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES')
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['x'])['Items']
expected_items = [{'x': i} for i in numbers]
assert got_items == expected_items
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression='x')['Items']
assert got_items == expected_items
# Select=COUNT just returns a count - not any items
got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='COUNT')
assert got['Count'] == len(numbers)
assert not 'Items' in got
# Check that a regular query, without Select=COUNT, also returns a count,
# in addition to the items themselves:
got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert got['Count'] == len(numbers)
assert 'Items' in got
# Select with some unknown string generates a validation exception:
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='UNKNOWN')
# Test that the "Limit" parameter can be used to return only some of the
# items in a single partition. The items returned are the first in the
# sorted order.
def test_query_limit(test_table_sn):
numbers = [Decimal(i) for i in range(10)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order.
# First, no Limit so we should get all numbers (we have few of them, so
# it all fits in the default 1MB limitation)
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
# Now try a few different Limit values, and verify that the query
# returns exactly the first Limit sorted numbers.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers[0:limit]
# Unfortunately, the boto3 library forbids a Limit of 0 on its own,
# before even sending a request, so we can't test how the server responds.
with pytest.raises(ParamValidationError):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=0)
# In test_query_limit we tested only that Limit stops the result after the
# right number of items. Here we test that such a stopped result
# can be resumed, via the LastEvaluatedKey/ExclusiveStartKey paging mechanism.
def test_query_limit_paging(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that full_query() returns all these numbers, in sorted order.
# full_query() will do a query with the given limit, and resume it again
# and again until the last page.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
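The paging test relies on `full_query()` from the `util` module, whose code is not shown here. The sketch below is an assumption about how such a helper can work, run against a hypothetical in-memory `FakeTable` instead of a real boto3 table. The fake omits `LastEvaluatedKey` as soon as the partition is exhausted, which is a simplification of real server behavior.

```python
class FakeTable:
    """Hypothetical in-memory stand-in for a boto3 Table holding one partition."""
    def __init__(self, sort_keys):
        self.sort_keys = sorted(sort_keys)

    def query(self, Limit=None, ExclusiveStartKey=None):
        # Resume right after the key the previous page ended on.
        start = 0
        if ExclusiveStartKey is not None:
            start = self.sort_keys.index(ExclusiveStartKey['c']) + 1
        end = len(self.sort_keys) if Limit is None else start + Limit
        page = self.sort_keys[start:end]
        response = {'Items': [{'c': c} for c in page]}
        # LastEvaluatedKey is present only while more items remain.
        if end < len(self.sort_keys):
            response['LastEvaluatedKey'] = {'c': page[-1]}
        return response

def full_query_sketch(table, **kwargs):
    # Keep issuing queries until a response comes back without LastEvaluatedKey.
    response = table.query(**kwargs)
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        items.extend(response['Items'])
    return items

table = FakeTable(range(20))
for limit in [1, 7, 20, 100]:
    assert [x['c'] for x in full_query_sketch(table, Limit=limit)] == list(range(20))
```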
# Test that the ScanIndexForward parameter works, and can be used to
# return items sorted in reverse order. Combining this with Limit can
# be used to return the last items instead of the first items of the
# partition.
@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
def test_query_reverse(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order or reverse
# order, depending on the ScanIndexForward parameter being True or False.
# First, no Limit so we should get all numbers (we have few of them, so
# it all fits in the default 1MB limitation)
reversed_numbers = list(reversed(numbers))
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=True)['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False)['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers
# Now try a few different Limit values, and verify that the query
# returns exactly the first Limit sorted numbers - in regular or
# reverse order, depending on ScanIndexForward.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=True)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers[0:limit]
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=False)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers[0:limit]
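The expectations in this test can be summarized by a small pure-Python model. This is only a sketch of the semantics the test checks (the function `query_model` is hypothetical and talks to no server): sort the partition, reverse it when `ScanIndexForward` is false, then keep the first `Limit` items.

```python
def query_model(sort_keys, limit=None, scan_index_forward=True):
    # Order the partition by sort key, descending when scanning backwards.
    ordered = sorted(sort_keys, reverse=not scan_index_forward)
    # Limit is applied after ordering, so a reverse query yields the last items.
    return ordered if limit is None else ordered[:limit]

numbers = list(range(20))
assert query_model(numbers, limit=3) == [0, 1, 2]
assert query_model(numbers, limit=3, scan_index_forward=False) == [19, 18, 17]
# A Limit larger than the partition simply returns everything.
assert query_model(numbers, limit=10000, scan_index_forward=False) == numbers[::-1]
```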
# Test that paging also works properly with reverse order
# (ScanIndexForward=false), i.e., reverse-order queries can be resumed
@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
def test_query_reverse_paging(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
reversed_numbers = list(reversed(numbers))
# Verify that with ScanIndexForward=False, full_query() returns all
# these numbers in reversed sorted order - getting pages of Limit items
# at a time and resuming the query.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False, Limit=limit)
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers


@@ -0,0 +1,226 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the ReturnValues parameter for the different update operations
# (PutItem, UpdateItem, DeleteItem).
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test trivial support for the ReturnValues parameter in PutItem, UpdateItem
# and DeleteItem - test that "NONE" works (and changes nothing), while a
# completely unsupported value gives an error.
# This test is useful to check that before the ReturnValues parameter is fully
# implemented, it returns an error when a still-unsupported ReturnValues
# option is attempted in the request - instead of simply being ignored.
def test_trivial_returnvalues(test_table_s):
# PutItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
# UpdateItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
# DeleteItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
# Test the ReturnValues parameter on a PutItem operation. Only two settings
# are supported for this parameter for this operation: NONE (the default)
# and ALL_OLD.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_put_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the old value of the item is returned
# in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_OLD')
assert ret['Attributes'] == {'p': p, 'a': 'hi'}
# Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,
# are supported by other operations but not by PutItem:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_OLD')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_NEW')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_NEW')
# Also, obviously, an unsupported setting "DOG" results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='none')
# Test the ReturnValues parameter on a DeleteItem operation. Only two settings
# are supported for this parameter for this operation: NONE (the default)
# and ALL_OLD.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_delete_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the old value of the item is returned
# in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_OLD')
assert ret['Attributes'] == {'p': p, 'a': 'hi'}
# Other ReturnValues options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,
# are supported by other operations but not by DeleteItem:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_OLD')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_NEW')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_NEW')
# Also, obviously, an unsupported setting "DOG" results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='none')
# Test the ReturnValues parameter on an UpdateItem operation. All five
# settings are supported for this parameter for this operation: NONE
# (the default), ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_update_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the entire old value of the item (even
# attributes we did not modify) is returned in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_OLD',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'dog'}
# With ReturnValues=UPDATED_OLD, only the overwritten attributes of the
# old item are returned in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET b = :val, c = :val2',
ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
assert ret['Attributes'] == {'b': 'dog'}
# Even if an update overwrites an attribute by the same value again,
# this is considered an update, and the old value (identical to the
# new one) is returned:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'b': 'cat'}
# Deleting an attribute also counts as overwriting it, of course:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='REMOVE b')
assert ret['Attributes'] == {'b': 'cat'}
# With ReturnValues=ALL_NEW, the entire new value of the item (including
# old attributes we did not modify) is returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_NEW',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'cat'}
# With ReturnValues=UPDATED_NEW, only the new values of the updated
# attributes are returned. Note that "updated attributes" means
# the newly set attributes - it doesn't require that these attributes
# have any previous values:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET b = :val, c = :val2',
ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
assert ret['Attributes'] == {'b': 'cat', 'c': 'hello'}
# Deleting an attribute also counts as overwriting it, but the deleted
# attribute is not returned in the response - so in this case the response
# has no "Attributes" at all.
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='REMOVE b')
assert not 'Attributes' in ret
# In the above examples, UPDATED_NEW is not useful because it just
# returns the new values we already know from the request... UPDATED_NEW
# becomes more useful in read-modify-write operations:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 1})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
assert ret['Attributes'] == {'a': 2}
# An unsupported setting "DOG" also results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='none',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
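The five ReturnValues modes exercised above can be modelled in a few lines of plain Python. This is only a sketch of the behavior the assertions expect, not Scylla's implementation; the function `return_values_sketch` and its arguments are hypothetical, and `updated_attrs` stands for the attribute names touched by the UpdateExpression.

```python
def return_values_sketch(old_item, new_item, updated_attrs, mode):
    """Hypothetical model of what UpdateItem reports for each ReturnValues mode."""
    if mode == 'NONE':
        attributes = {}
    elif mode == 'ALL_OLD':
        attributes = dict(old_item)
    elif mode == 'ALL_NEW':
        attributes = dict(new_item)
    elif mode == 'UPDATED_OLD':
        # Old values of the touched attributes that existed before the update.
        attributes = {k: old_item[k] for k in updated_attrs if k in old_item}
    elif mode == 'UPDATED_NEW':
        # New values of the touched attributes that still exist after the update.
        attributes = {k: new_item[k] for k in updated_attrs if k in new_item}
    else:
        # Unknown or wrongly-cased values are rejected, mirroring the
        # ValidationException expected by the tests.
        raise ValueError('unsupported ReturnValues: ' + mode)
    # An empty result is reported as no "Attributes" entry at all.
    return {'Attributes': attributes} if attributes else {}

old = {'p': 'k', 'a': 'hi', 'b': 'dog'}
new = {'p': 'k', 'a': 'hi', 'b': 'cat', 'c': 'hello'}  # after SET b = 'cat', c = 'hello'
assert return_values_sketch(old, new, ['b', 'c'], 'UPDATED_OLD') == {'Attributes': {'b': 'dog'}}
assert return_values_sketch(old, new, ['b', 'c'], 'UPDATED_NEW') == {'Attributes': {'b': 'cat', 'c': 'hello'}}
```

In this model, removing an attribute under UPDATED_NEW yields no "Attributes" entry, because the removed attribute has no new value to report.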


@@ -0,0 +1,252 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the Scan operation
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_scan_and_count, multiset
from boto3.dynamodb.conditions import Attr
# Test that scanning works fine with/without pagination
def test_scan_basic(filled_test_table):
test_table, items = filled_test_table
for limit in [None,1,2,4,33,50,100,9007,16*1024*1024]:
pos = None
got_items = []
while True:
if limit:
response = test_table.scan(Limit=limit, ExclusiveStartKey=pos) if pos else test_table.scan(Limit=limit)
assert len(response['Items']) <= limit
else:
response = test_table.scan(ExclusiveStartKey=pos) if pos else test_table.scan()
pos = response.get('LastEvaluatedKey', None)
got_items += response['Items']
if not pos:
break
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
def test_scan_with_paginator(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('scan')
got_items = []
for page in paginator.paginate(TableName=test_table.name):
got_items += page['Items']
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
for page_size in [1, 17, 1234]:
got_items = []
for page in paginator.paginate(TableName=test_table.name, PaginationConfig={'PageSize': page_size}):
got_items += page['Items']
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
# Although partitions are scanned in seemingly-random order, inside a
# partition items must be returned by Scan sorted in sort-key order.
# This test verifies this, for string sort key. We'll need separate
# tests for the other sort-key types (number and binary)
def test_scan_sort_order_string(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table)
assert len(items) == len(got_items)
# Extract just the sort key ("c") from the partition "long"
items_long = [x['c'] for x in items if x['p'] == 'long']
got_items_long = [x['c'] for x in got_items if x['p'] == 'long']
# Verify that got_items_long are already sorted (in string order)
assert sorted(got_items_long) == got_items_long
# Verify that got_items_long are a sorted version of the expected items_long
assert sorted(items_long) == got_items_long
# Test Scan with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_scan_attributes_to_get(dynamodb, filled_test_table):
table, items = filled_test_table
for wanted in [ ['another'], # only non-key attributes (one item doesn't have it!)
['c', 'another'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
print(wanted)
got_items = full_scan(table, AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
def test_scan_with_attribute_equality_filtering(dynamodb, filled_test_table):
table, items = filled_test_table
scan_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_scan(table, ScanFilter=scan_filter)
expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" ]
assert multiset(expected_items) == multiset(got_items)
scan_filter = {
"another" : {
"AttributeValueList" : [ "y" ],
"ComparisonOperator": "EQ"
},
"attribute" : {
"AttributeValueList" : [ "xxxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_scan(table, ScanFilter=scan_filter)
expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" and item["another"] == "y" ]
assert multiset(expected_items) == multiset(got_items)
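The expected lists in this test encode the filter semantics by hand. A hedged sketch of the same logic as a reusable predicate follows; the helper `matches_filter` is hypothetical, models only the EQ operator, and assumes that multiple conditions are ANDed together, which is the default behavior.

```python
def matches_filter(item, conditions):
    """Hypothetical model of a ScanFilter/QueryFilter restricted to the EQ operator."""
    for name, cond in conditions.items():
        assert cond['ComparisonOperator'] == 'EQ'  # only EQ is modelled here
        # A missing attribute never equals the wanted value.
        if name not in item or item[name] != cond['AttributeValueList'][0]:
            return False
    return True  # every condition held

scan_filter = {
    'attribute': {'AttributeValueList': ['xxxxx'], 'ComparisonOperator': 'EQ'},
    'another': {'AttributeValueList': ['y'], 'ComparisonOperator': 'EQ'},
}
items = [
    {'p': '1', 'attribute': 'xxxxx', 'another': 'y'},
    {'p': '2', 'attribute': 'xxxxx', 'another': 'yy'},
    {'p': '3', 'another': 'y'},  # lacks "attribute", so it cannot match
]
assert [i['p'] for i in items if matches_filter(i, scan_filter)] == ['1']
```

Checking `name not in item` first avoids a KeyError on items that lack one of the filtered attributes.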
# Test that FilterExpression works as expected
@pytest.mark.xfail(reason="FilterExpression not supported yet")
def test_scan_filter_expression(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx"))
print(got_items)
assert multiset([item for item in items if 'attribute' in item.keys() and item['attribute'] == 'xxxx']) == multiset(got_items)
got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
print(got_items)
assert multiset([item for item in items if 'attribute' in item.keys() and 'another' in item.keys() and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
def test_scan_with_key_equality_filtering(dynamodb, filled_test_table):
table, items = filled_test_table
scan_filter_p = {
"p" : {
"AttributeValueList" : [ "7" ],
"ComparisonOperator": "EQ"
}
}
scan_filter_c = {
"c" : {
"AttributeValueList" : [ "9" ],
"ComparisonOperator": "EQ"
}
}
scan_filter_p_and_attribute = {
"p" : {
"AttributeValueList" : [ "7" ],
"ComparisonOperator": "EQ"
},
"attribute" : {
"AttributeValueList" : [ "x"*7 ],
"ComparisonOperator": "EQ"
}
}
scan_filter_c_and_another = {
"c" : {
"AttributeValueList" : [ "9" ],
"ComparisonOperator": "EQ"
},
"another" : {
"AttributeValueList" : [ "y"*16 ],
"ComparisonOperator": "EQ"
}
}
# Filtering on the hash key
got_items = full_scan(table, ScanFilter=scan_filter_p)
expected_items = [item for item in items if "p" in item.keys() and item["p"] == "7" ]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the sort key
got_items = full_scan(table, ScanFilter=scan_filter_c)
expected_items = [item for item in items if "c" in item.keys() and item["c"] == "9"]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the hash key and an attribute
got_items = full_scan(table, ScanFilter=scan_filter_p_and_attribute)
expected_items = [item for item in items if "p" in item.keys() and "attribute" in item.keys() and item["p"] == "7" and item["attribute"] == "x"*7]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the sort key and an attribute
got_items = full_scan(table, ScanFilter=scan_filter_c_and_another)
expected_items = [item for item in items if "c" in item.keys() and "another" in item.keys() and item["c"] == "9" and item["another"] == "y"*16]
assert multiset(expected_items) == multiset(got_items)
# Test the "Select" parameter of Scan. The default Select mode,
# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
# allow returning just specific attributes or just counting the results
# without returning items at all.
@pytest.mark.xfail(reason="Select not supported yet")
def test_scan_select(filled_test_table):
test_table, items = filled_test_table
# By default, a scan returns all the items, with all their attributes:
got_items = full_scan(test_table)
assert multiset(items) == multiset(got_items)
# Select=ALL_ATTRIBUTES does exactly the same as the default - return
# all attributes:
got_items = full_scan(test_table, Select='ALL_ATTRIBUTES')
assert multiset(items) == multiset(got_items)
# Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
# is just for indexes, when IndexName is specified)
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='ALL_PROJECTED_ATTRIBUTES')
# Select=SPECIFIC_ATTRIBUTES requires that either an AttributesToGet
# or ProjectionExpression appears, but then really does nothing beyond
# what AttributesToGet and ProjectionExpression already do:
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='SPECIFIC_ATTRIBUTES')
wanted = ['c', 'another']
got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression=','.join(wanted))
assert multiset(expected_items) == multiset(got_items)
# Select=COUNT just returns a count - not any items
(got_count, got_items) = full_scan_and_count(test_table, Select='COUNT')
assert got_count == len(items)
assert got_items == []
# Check that we also get a count in regular scans - not just with
# Select=COUNT, but without Select=COUNT we get both items and count:
(got_count, got_items) = full_scan_and_count(test_table)
assert got_count == len(items)
assert multiset(items) == multiset(got_items)
# Select with some unknown string generates a validation exception:
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='UNKNOWN')
# Test parallel scan, i.e., the Segment and TotalSegments options.
# In the following test we check that these parameters allow splitting
# a scan into multiple parts, and that these parts are in fact disjoint,
# and their union is the entire contents of the table. We do not actually
# try to run these queries in *parallel* in this test.
@pytest.mark.xfail(reason="parallel scan not supported yet")
def test_scan_parallel(filled_test_table):
test_table, items = filled_test_table
for nsegments in [1, 2, 17]:
print('Testing TotalSegments={}'.format(nsegments))
got_items = []
for segment in range(nsegments):
got_items.extend(full_scan(test_table, TotalSegments=nsegments, Segment=segment))
# The following comparison verifies that each of the expected items
# in items was returned in one - and just one - of the segments.
assert multiset(items) == multiset(got_items)
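# The property checked above (segments are disjoint and their union is the
# whole table) can be illustrated without a server. The sketch below is only
# a local simulation under stated assumptions: segment_of assigns a key to a
# segment with a CRC32 hash, which is NOT how DynamoDB or Scylla actually
# partition data, and scan_segment, as_counter and all_items are hypothetical
# stand-ins for full_scan(), multiset() and the filled test table.

```python
import zlib
from collections import Counter


def segment_of(key, total_segments):
    """Map a partition key to one segment (illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % total_segments


def scan_segment(items, total_segments, segment):
    """Return the items that fall into the given segment."""
    return [item for item in items
            if segment_of(item["p"], total_segments) == segment]


def as_counter(items):
    """Multiset of items, so duplicates and omissions are both detected."""
    return Counter(tuple(sorted(item.items())) for item in items)


all_items = [{"p": str(i), "c": str(i)} for i in range(100)]

for nsegments in [1, 2, 17]:
    got = []
    for segment in range(nsegments):
        got.extend(scan_segment(all_items, nsegments, segment))
    # Equal multisets mean every item appeared in exactly one segment.
    assert as_counter(got) == as_counter(all_items)
```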


@@ -0,0 +1,276 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for basic table operations: CreateTable, DeleteTable, ListTables.
import pytest
from botocore.exceptions import ClientError
from util import list_tables, test_table_name, create_test_table, random_string
# Utility function for creating a table with a given name and some valid
# schema. This function initiates the table's creation, but doesn't
# wait for the table to actually become ready.
def create_table(dynamodb, name, BillingMode='PAY_PER_REQUEST', **kwargs):
return dynamodb.create_table(
TableName=name,
BillingMode=BillingMode,
KeySchema=[
{
'AttributeName': 'p',
'KeyType': 'HASH'
},
{
'AttributeName': 'c',
'KeyType': 'RANGE'
}
],
AttributeDefinitions=[
{
'AttributeName': 'p',
'AttributeType': 'S'
},
{
'AttributeName': 'c',
'AttributeType': 'S'
},
],
**kwargs
)
# Utility function for creating a table with a given name, and then deleting
# it immediately, waiting for these operations to complete. Since the wait
# uses DescribeTable, this function requires all of CreateTable, DescribeTable
# and DeleteTable to work correctly.
# Note that in DynamoDB, table deletion takes a very long time, so tests
# successfully using this function are very slow.
def create_and_delete_table(dynamodb, name, **kwargs):
table = create_table(dynamodb, name, **kwargs)
table.meta.client.get_waiter('table_exists').wait(TableName=name)
table.delete()
table.meta.client.get_waiter('table_not_exists').wait(TableName=name)
##############################################################################
# Test creating a table, and then deleting it, waiting for each operation
# to have completed before proceeding. Since the wait uses DescribeTable,
# this test requires all of CreateTable, DescribeTable and DeleteTable to
# function properly in their basic use cases.
# Unfortunately, this test is extremely slow with DynamoDB because it takes
# a very long time for a table deletion to actually complete.
def test_create_and_delete_table(dynamodb):
create_and_delete_table(dynamodb, 'alternator_test')
# DynamoDB documentation specifies that table names must be 3-255 characters,
# and match the regex [a-zA-Z0-9._-]+. Names not matching these rules should
# be rejected, and no table be created.
def test_create_table_unsupported_names(dynamodb):
from botocore.exceptions import ParamValidationError, ClientError
# Interestingly, the boto library tests for names shorter than the
# minimum length (3 characters) immediately, and failure results in
# ParamValidationError. But the other invalid names are passed to
# DynamoDB, which returns an HTTP response code, which results in a
# ClientError exception.
with pytest.raises(ParamValidationError):
create_table(dynamodb, 'n')
with pytest.raises(ParamValidationError):
create_table(dynamodb, 'nn')
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, 'n' * 256)
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, 'nyh@test')
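# The naming rule quoted above (3-255 characters, all matching
# [a-zA-Z0-9._-]+) can be written down as a small local check. This is a
# sketch for illustration only: is_valid_table_name and TABLE_NAME_RE are
# hypothetical names, and the real validation happens partly in boto3 and
# partly on the server, as the comments in the test explain.

```python
import re

# The character class from the DynamoDB documentation; the trailing '-' is
# a literal hyphen because it is the last character in the class.
TABLE_NAME_RE = re.compile(r"[a-zA-Z0-9._-]+")


def is_valid_table_name(name):
    """Check both the length limits and the allowed character set."""
    return 3 <= len(name) <= 255 and TABLE_NAME_RE.fullmatch(name) is not None


# The same names the test above tries, plus two that should be accepted.
rejected = ["n", "nn", "n" * 256, "nyh@test"]
accepted = [".alternator_test", "n" * 255]
```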
# On the other hand, names following the above rules should be accepted. Even
# names which the Scylla rules forbid, such as a name starting with .
def test_create_and_delete_table_non_scylla_name(dynamodb):
create_and_delete_table(dynamodb, '.alternator_test')
# Names with 255 characters are allowed in Dynamo, but they are not currently
# supported in Scylla because we create a directory whose name is the table's
# name followed by 33 bytes (underscore and UUID). So currently, we only
# correctly support names with length up to 222.
def test_create_and_delete_table_very_long_name(dynamodb):
# In the future, this should work:
#create_and_delete_table(dynamodb, 'n' * 255)
# But for now, only 222 works:
create_and_delete_table(dynamodb, 'n' * 222)
# We cannot test the following on DynamoDB because it will succeed
# (DynamoDB allows up to 255 bytes)
#with pytest.raises(ClientError, match='ValidationException'):
# create_table(dynamodb, 'n' * 223)
# Test that creating a table with an invalid schema returns a
# ValidationException error.
def test_create_table_invalid_schema(dynamodb):
# The name of the table "created" by this test shouldn't matter, the
# creation should not succeed anyway.
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'HASH' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
{ 'AttributeName': 'z', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'z', 'AttributeType': 'S' }
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
],
AttributeDefinitions=[
{ 'AttributeName': 'z', 'AttributeType': 'S' }
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'k', 'KeyType': 'HASH' },
],
AttributeDefinitions=[
{ 'AttributeName': 'k', 'AttributeType': 'Q' }
],
)
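# The six rejected schemas above suggest a small set of rules: exactly one
# HASH key, at most one RANGE key, every key attribute listed in
# AttributeDefinitions, and a key attribute type of S, N or B. The sketch
# below collects those rules in one place. key_schema_errors is a
# hypothetical helper written for illustration; it is not the server's
# validation code and does not claim to cover every rule DynamoDB enforces.

```python
VALID_KEY_TYPES = {"S", "N", "B"}  # string, number, binary


def key_schema_errors(key_schema, attribute_definitions):
    """Return a list of problems found; an empty list means 'looks valid'."""
    errors = []
    hash_keys = [k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH"]
    range_keys = [k["AttributeName"] for k in key_schema if k["KeyType"] == "RANGE"]
    if len(hash_keys) != 1:
        errors.append("exactly one HASH key is required")
    if len(range_keys) > 1:
        errors.append("at most one RANGE key is allowed")
    defined = {d["AttributeName"]: d["AttributeType"] for d in attribute_definitions}
    for name in hash_keys + range_keys:
        if name not in defined:
            errors.append("key attribute {!r} has no definition".format(name))
        elif defined[name] not in VALID_KEY_TYPES:
            errors.append("key attribute {!r} has invalid type {!r}".format(
                name, defined[name]))
    return errors


def attr(name, kind="S"):
    """Shorthand for one AttributeDefinitions entry."""
    return {"AttributeName": name, "AttributeType": kind}


def key(name, key_type):
    """Shorthand for one KeySchema entry."""
    return {"AttributeName": name, "KeyType": key_type}
```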
# Test that trying to create a table that already exists fails in the
# appropriate way (ResourceInUseException)
def test_create_table_already_exists(dynamodb, test_table):
with pytest.raises(ClientError, match='ResourceInUseException'):
create_table(dynamodb, test_table.name)
# Test that BillingMode error path works as expected - only the values
# PROVISIONED or PAY_PER_REQUEST are allowed. The former requires
# ProvisionedThroughput to be set, the latter forbids it.
# If BillingMode is outright missing, it defaults (as original
# DynamoDB did) to PROVISIONED so ProvisionedThroughput is allowed.
def test_create_table_billing_mode_errors(dynamodb, test_table):
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='unknown')
# billing mode is case-sensitive
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='pay_per_request')
# PAY_PER_REQUEST cannot come with a ProvisionedThroughput:
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(),
BillingMode='PAY_PER_REQUEST', ProvisionedThroughput={'ReadCapacityUnits': 10, 'WriteCapacityUnits': 10})
# On the other hand, PROVISIONED requires ProvisionedThroughput:
# By the way, ProvisionedThroughput not only needs to appear, it must
# have both ReadCapacityUnits and WriteCapacityUnits - but we can't test
# this with boto3, because boto3 has its own verification that if
# ProvisionedThroughput is given, it must have the correct form.
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='PROVISIONED')
# If BillingMode is completely missing, it defaults to PROVISIONED, so
# ProvisionedThroughput is required
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(TableName=test_table_name(),
KeySchema=[{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }])
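# The BillingMode rules exercised above fit in a few lines. The following is
# a sketch of those rules only, assuming nothing beyond what the test's
# comments state; billing_mode_error is a hypothetical helper and the error
# strings are invented for illustration, not copied from DynamoDB.

```python
def billing_mode_error(billing_mode=None, provisioned_throughput=None):
    """Return a description of the problem, or None if the combination is OK."""
    # A missing BillingMode defaults to PROVISIONED.
    mode = "PROVISIONED" if billing_mode is None else billing_mode
    # The comparison is case-sensitive, so 'pay_per_request' is unknown.
    if mode not in ("PROVISIONED", "PAY_PER_REQUEST"):
        return "unknown BillingMode"
    if mode == "PAY_PER_REQUEST" and provisioned_throughput is not None:
        return "PAY_PER_REQUEST forbids ProvisionedThroughput"
    if mode == "PROVISIONED" and provisioned_throughput is None:
        return "PROVISIONED requires ProvisionedThroughput"
    return None


throughput = {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10}
```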
# Our first implementation had a special column name called "attrs" where
# we stored a map for all non-key columns. If the user tried to name one
# of the key columns with this same name, the result was a disaster - Scylla
# went into a bad state after trying to write data with two updates to
# same-named columns.
special_column_name1 = 'attrs'
special_column_name2 = ':attrs'
@pytest.fixture(scope="session")
def test_table_special_column_name(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[
{ 'AttributeName': special_column_name1, 'KeyType': 'HASH' },
{ 'AttributeName': special_column_name2, 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': special_column_name1, 'AttributeType': 'S' },
{ 'AttributeName': special_column_name2, 'AttributeType': 'S' },
],
)
yield table
table.delete()
@pytest.mark.xfail(reason="special attrs column not yet hidden correctly")
def test_create_table_special_column_name(test_table_special_column_name):
s = random_string()
c = random_string()
h = random_string()
expected = {special_column_name1: s, special_column_name2: c, 'hello': h}
test_table_special_column_name.put_item(Item=expected)
got = test_table_special_column_name.get_item(Key={special_column_name1: s, special_column_name2: c}, ConsistentRead=True)['Item']
assert got == expected
# Test that all tables we create are listed, and pagination works properly.
# Note that the DynamoDB setup we run this against may have hundreds of
# other tables, for all we know. We just need to check that the tables we
# created are indeed listed.
def test_list_tables_paginated(dynamodb, test_table, test_table_s, test_table_b):
my_tables_set = {table.name for table in [test_table, test_table_s, test_table_b]}
for limit in [1, 2, 3, 4, 50, 100]:
print("testing limit={}".format(limit))
list_tables_set = set(list_tables(dynamodb, limit))
assert my_tables_set.issubset(list_tables_set)
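# The list_tables() helper imported from util is not shown in this file. The
# sketch below shows one way such a pagination loop could look, using the
# ListTables request and response fields Limit, ExclusiveStartTableName,
# TableNames and LastEvaluatedTableName. To stay self-contained it runs
# against fake_list_tables, a hypothetical in-memory stand-in for the real
# client call; returning names in sorted order is an assumption of the fake.

```python
def fake_list_tables(all_names, Limit=100, ExclusiveStartTableName=None):
    """In-memory imitation of one ListTables call (illustration only)."""
    if not 1 <= Limit <= 100:
        raise ValueError("Limit must be between 1 and 100")
    names = sorted(all_names)
    if ExclusiveStartTableName is not None:
        names = [n for n in names if n > ExclusiveStartTableName]
    page = names[:Limit]
    response = {"TableNames": page}
    if len(names) > Limit:
        # More results remain: tell the caller where to resume.
        response["LastEvaluatedTableName"] = page[-1]
    return response


def list_all_tables(all_names, limit):
    """Follow LastEvaluatedTableName until the listing is exhausted."""
    result = []
    start = None
    while True:
        kwargs = {"Limit": limit}
        if start is not None:
            kwargs["ExclusiveStartTableName"] = start
        response = fake_list_tables(all_names, **kwargs)
        result.extend(response["TableNames"])
        start = response.get("LastEvaluatedTableName")
        if start is None:
            return result


tables = {"table{}".format(i) for i in range(7)}
```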
# Test that pagination limit is validated
def test_list_tables_wrong_limit(dynamodb):
# lower limit (min. 1) is imposed by boto3 library checks
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.meta.client.list_tables(Limit=101)

View File

@@ -0,0 +1,854 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the UpdateItem operations with an UpdateExpression parameter
import random
import string
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string
# The simplest test of using UpdateExpression to set a top-level attribute,
# instead of the older AttributeUpdates parameter.
# Checks only one "SET" action in an UpdateExpression.
def test_update_expression_set(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
# An empty UpdateExpression is NOT allowed, and generates a "The expression
# can not be empty" error. This contrasts with an empty AttributeUpdates which
# is allowed, and results in the creation of an empty item if it didn't exist
# yet (see test_empty_update()).
def test_update_expression_empty(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='')
# A basic test with multiple SET actions in one expression
def test_update_expression_set_multi(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET x = :val1, y = :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': 4, 'y': 4}
# SET can be used to copy an existing attribute to a new one
def test_update_expression_set_copy(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
# Copying a non-existing attribute generates an error
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = z')
# It turns out that attributes to be copied are read before the SET
# starts to write, so "SET x = :val1, y = x" does not work...
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET x = :val1, y = x', ExpressionAttributeValues={':val1': 4})
# SET z=z does nothing if z exists, or fails if it doesn't
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET z = z')
# We can also use name references in either LHS or RHS of SET, e.g.,
# SET #one = #two. We also need to count the references used in the RHS
# when we want to complain about unused names in ExpressionAttributeNames.
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
ExpressionAttributeNames={'#one': 'c', '#two': 'a'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello', 'c': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
ExpressionAttributeNames={'#one': 'c', '#two': 'a', '#three': 'z'})
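# The read-before-write behaviour tested above ("SET x = :val1, y = x" fails
# when x did not exist before) can be modelled as two phases: first resolve
# every right-hand side against the OLD item, then apply all writes. The
# sketch below is a simplified local model under that assumption.
# apply_set_actions and its (target, (kind, operand)) action format are
# hypothetical; KeyError stands in for the server's ValidationException.

```python
def apply_set_actions(item, actions):
    """Apply SET actions with all reads done before any write.

    actions is a list of (target, (kind, operand)) pairs where kind is
    "value" for a literal operand or "attr" for a copy from an attribute.
    """
    resolved = []
    # Phase 1: evaluate every right-hand side against the original item.
    for target, (kind, operand) in actions:
        if kind == "attr":
            if operand not in item:
                raise KeyError("attribute {!r} does not exist".format(operand))
            value = item[operand]
        else:
            value = operand
        resolved.append((target, value))
    # Phase 2: apply the writes to a copy, leaving the input untouched.
    updated = dict(item)
    for target, value in resolved:
        updated[target] = value
    return updated


before = {"p": "k", "a": "hello"}
after = apply_set_actions(before, [("b", ("attr", "a"))])  # SET b = a
```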
# Test for read-before-write action where the value to be read is nested inside
# an expression, such as the - operator or list_append()
def test_update_expression_set_nested_copy(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #n = :two',
ExpressionAttributeNames={'#n': 'n'}, ExpressionAttributeValues={':two': 2})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nn = :seven - #n',
ExpressionAttributeNames={'#nn': 'nn', '#n': 'n'}, ExpressionAttributeValues={':seven': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnn = :nnn',
ExpressionAttributeNames={'#nnn': 'nnn'}, ExpressionAttributeValues={':nnn': [2,4]})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnnn = list_append(:val1, #nnn)',
ExpressionAttributeNames={'#nnnn': 'nnnn', '#nnn': 'nnn'}, ExpressionAttributeValues={':val1': [1,3]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5, 'nnn': [2,4], 'nnnn': [1,3,2,4]}
# Test for getting a key value with read-before-write
def test_update_expression_set_key(test_table_sn):
p = random_string()
test_table_sn.update_item(Key={'p': p, 'c': 7})
test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #n = #p',
ExpressionAttributeNames={'#n': 'n', '#p': 'p'})
test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #nn = #c + #c',
ExpressionAttributeNames={'#nn': 'nn', '#c': 'c'})
assert test_table_sn.get_item(Key={'p': p, 'c': 7}, ConsistentRead=True)['Item'] == {'p': p, 'c': 7, 'n': p, 'nn': 14}
# Simple test for the "REMOVE" action
def test_update_expression_remove(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hi'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hi'}
# Demonstrate that although all DynamoDB examples give UpdateExpression
# action names in uppercase - e.g., "SET", it can actually be any case.
def test_update_expression_action_case(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = :val1', ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
test_table_s.update_item(Key={'p': p}, UpdateExpression='sEt b = :val1', ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
# Demonstrate that whitespace is ignored in UpdateExpression parsing.
def test_update_expression_action_whitespace(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
test_table_s.update_item(Key={'p': p}, UpdateExpression=' set b=:val1 ', ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
# In UpdateExpression, the attribute name can appear directly in the expression
# (without a "#placeholder" notation) only if it is a single "token" as
# determined by DynamoDB's lexical analyzer rules: Such a token is composed of
# alphanumeric characters and underscores, and its first character must be alphabetic. Other
# names cause the parser to see multiple tokens, and produce syntax errors.
def test_update_expression_name_token(test_table_s):
p = random_string()
# Alphanumeric names starting with an alphabetical character work
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET alnum = :val1', ExpressionAttributeValues={':val1': 1})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['alnum'] == 1
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['Alpha_Numeric_123'] == 2
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET A123_ = :val1', ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['A123_'] == 3
# But alphanumeric names cannot start with underscore or digits.
# DynamoDB's lexical analyzer doesn't recognize them, and produces
# a ValidationException looking like:
# Invalid UpdateExpression: Syntax error; token: "_", near: "SET _123"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _123 = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _abc = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123a = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123 = :val1', ExpressionAttributeValues={':val1': 3})
# Various other non-alphanumeric characters split a token and are NOT allowed
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi-there = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi$there = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET "hithere" = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET !hithere = :val1', ExpressionAttributeValues={':val1': 3})
# In addition to the literal names, DynamoDB also allows references to any
# name, using the "#reference" syntax. It turns out the reference name is
# also a token following the rules as above, with one interesting point:
# since "#" already started the token, the next character may be any
# alphanumeric and doesn't need to be only alphabetical.
# Note that the reference target - the actual attribute name - can include
# absolutely any characters, and we use silly_name below as an example
silly_name = '3can include any character!.#='
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 4}, ExpressionAttributeNames={'#Alpha_Numeric_123': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 4
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123a = :val1', ExpressionAttributeValues={':val1': 5}, ExpressionAttributeNames={'#123a': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 5
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123 = :val1', ExpressionAttributeValues={':val1': 6}, ExpressionAttributeNames={'#123': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 6
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #_ = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#_': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 7
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #hi-there = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#hi-there': silly_name})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #!hi = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#!hi': silly_name})
# Just a "#" is not enough as a token. Interestingly, DynamoDB will
# find the bad name in ExpressionAttributeNames before it actually tries
# to parse UpdateExpression, but we can verify the parse fails too by
# using a valid but irrelevant name in ExpressionAttributeNames:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#': silly_name})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#a': silly_name})
# There are also value references, ":reference", for the right-hand
# side of an assignment. These have naming rules similar to those of "#reference".
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :Alpha_Numeric_123', ExpressionAttributeValues={':Alpha_Numeric_123': 8})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 8
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123a', ExpressionAttributeValues={':123a': 9})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123', ExpressionAttributeValues={':123': 10})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 10
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :_', ExpressionAttributeValues={':_': 11})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 11
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :hi!there', ExpressionAttributeValues={':hi!there': 12})
# Just a ":" is not enough as a token.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':': 7})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':a': 7})
# Trying to use a :reference on the left-hand side of an assignment will
# not work. In DynamoDB, it's a different type of token (and generates
# syntax error).
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET :a = :b', ExpressionAttributeValues={':a': 1, ':b': 2})
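# The token rules demonstrated by the test above can be summarised as three
# regular expressions. These patterns are inferred from the accepted and
# rejected cases in this test only; they are a sketch, not DynamoDB's actual
# lexer. NAME_TOKEN, NAME_REF, VALUE_REF and the is_* helpers are
# hypothetical names.

```python
import re

# A literal attribute name: starts with a letter, then letters, digits, '_'.
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# A "#reference": after '#', any of letters, digits, '_' (at least one).
NAME_REF = re.compile(r"#[A-Za-z0-9_]+")
# A ":reference": same rule as "#reference", but introduced by ':'.
VALUE_REF = re.compile(r":[A-Za-z0-9_]+")


def is_name_token(text):
    return NAME_TOKEN.fullmatch(text) is not None


def is_name_ref(text):
    return NAME_REF.fullmatch(text) is not None


def is_value_ref(text):
    return VALUE_REF.fullmatch(text) is not None


good_names = ["alnum", "Alpha_Numeric_123", "A123_"]
bad_names = ["_123", "_abc", "123a", "123", "hi-there", "hi$there",
             '"hithere"', "!hithere"]
good_name_refs = ["#Alpha_Numeric_123", "#123a", "#123", "#_"]
bad_name_refs = ["#hi-there", "#!hi", "#"]
good_value_refs = [":Alpha_Numeric_123", ":123a", ":123", ":_"]
bad_value_refs = [":hi!there", ":"]
```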
# Multiple actions are allowed in one expression, but actions are divided
# into clauses (SET, REMOVE, DELETE, ADD) and each of those can only appear
# once.
def test_update_expression_multi(test_table_s):
p = random_string()
# We can have two SET actions in one SET clause:
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 1, 'b': 2}
# But not two SET clauses - we get error "The "SET" section can only be used once in an update expression"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
# We can have a REMOVE and a SET clause (note no comma between clauses):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = :val2', ExpressionAttributeValues={':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = :val2 REMOVE b', ExpressionAttributeValues={':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 3}
# The same clause (e.g., SET) cannot be used twice, even if interleaved with something else
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 REMOVE a SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
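# The "each clause only once" rule tested above can be checked by listing the
# clause keywords in an expression and looking for repeats. The sketch below
# is a rough approximation, not a real UpdateExpression parser: it assumes
# clause keywords are whitespace-delimited and never appear as bare attribute
# names (clause_names and repeated_clauses are hypothetical helpers).

```python
import re

# A keyword counts only at the start of the expression or after whitespace,
# and only when followed by whitespace, so "#set" and ":set" are ignored.
CLAUSE_RE = re.compile(r"(?:^|\s)(SET|REMOVE|ADD|DELETE)(?=\s)", re.IGNORECASE)


def clause_names(update_expression):
    """Return the clause keywords in order of appearance, upper-cased."""
    return [m.group(1).upper() for m in CLAUSE_RE.finditer(update_expression)]


def repeated_clauses(update_expression):
    """Return the sorted list of clause keywords that appear more than once."""
    names = clause_names(update_expression)
    return sorted({name for name in names if names.count(name) > 1})
```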
# Trying to modify the same attribute twice in the same update is forbidden.
# For "SET a=:v REMOVE a" DynamoDB says: "Invalid UpdateExpression: Two
# document paths overlap with each other; must remove or rewrite one of
# these paths; path one: [a], path two: [a]".
# It is actually good for Scylla that such updates are forbidden, because had
# we allowed "SET a=:v REMOVE a" the result would be surprising - because data
# wins over a delete with the same timestamp, so "a" would be set despite the
# REMOVE command appearing later in the command line.
def test_update_expression_multi_overlap(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
# Neither "REMOVE a SET a = :v" nor "SET a = :v REMOVE a" are allowed:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET a = :v', ExpressionAttributeValues={':v': 'hi'})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v REMOVE a', ExpressionAttributeValues={':v': 'yo'})
# It's also not allowed to set a twice in the same clause
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v1, a = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'})
# Obviously, the paths are compared after the name references are evaluated
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a1 = :v1, #a2 = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'}, ExpressionAttributeNames={'#a1': 'a', '#a2': 'a'})
# The problem isn't just with identical paths - we can't modify two paths that
# "overlap" in the sense that one is the ancestor of the other.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_multi_overlap_nested(test_table_s):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException.*overlap'):
        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, a.b = :val2',
            ExpressionAttributeValues={':val1': {'b': 7}, ':val2': 'there'})
    test_table_s.put_item(Item={'p': p, 'a': {'b': {'c': 2}}})
    with pytest.raises(ClientError, match='ValidationException.*overlap'):
        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.b = :val1, a.b.c = :val2',
            ExpressionAttributeValues={':val1': 'hi', ':val2': 'there'})
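# A minimal sketch of the "overlap" rule exercised above, assuming a document
# path is represented as a list of components (attribute names and list
# indexes). The helper name paths_overlap() is hypothetical and is not part of
# Alternator or boto3: two paths overlap when they are identical or when one
# is a prefix (ancestor) of the other.

```python
def paths_overlap(path1, path2):
    """Return True if the paths are equal or one is an ancestor of the other."""
    shorter = min(len(path1), len(path2))
    # Compare only the common-length prefix of both paths.
    return path1[:shorter] == path2[:shorter]


# "a" is an ancestor of "a.b", so the two paths overlap.
print(paths_overlap(['a'], ['a', 'b']))
# "a.b" and "a.c" are siblings and do not overlap.
print(paths_overlap(['a', 'b'], ['a', 'c']))
```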
# In the previous test we saw that *modifying* the same attribute twice in the
# same update is forbidden; but it is allowed to *read* an attribute in the
# same update that also modifies it, and we check this here.
def test_update_expression_multi_with_copy(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': 'hello'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
    # "REMOVE a SET b = a" works: as noted in test_update_expression_set_copy()
    # the value of 'a' is read before the actual REMOVE operation happens.
    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = a')
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = b REMOVE b')
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 'hello'}
# Test case where a :val1 is referenced, without being defined
def test_update_expression_set_missing_value(test_table_s):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = :val1',
            ExpressionAttributeValues={':val2': 4})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = :val1')
# It is forbidden for ExpressionAttributeValues to contain values not used
# by the expression. DynamoDB produces an error like: "Value provided in
# ExpressionAttributeValues unused in expressions: keys: {:val1}"
def test_update_expression_spurious_value(test_table_s):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1',
            ExpressionAttributeValues={':val1': 3, ':val2': 4})
# Test case where a #name is referenced, without being defined
def test_update_expression_set_missing_name(test_table_s):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET #name = :val1',
            ExpressionAttributeValues={':val2': 4},
            ExpressionAttributeNames={'#wrongname': 'hello'})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET #name = :val1',
            ExpressionAttributeValues={':val2': 4})
# It is forbidden for ExpressionAttributeNames to contain names not used
# by the expression. DynamoDB produces an error like: "Value provided in
# ExpressionAttributeNames unused in expressions: keys: {#b}"
def test_update_expression_spurious_name(test_table_s):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
            ExpressionAttributeNames={'#a': 'hello', '#b': 'hi'},
            ExpressionAttributeValues={':val1': 3, ':val2': 4})
# Test that the key attributes (hash key or sort key) cannot be modified
# by an update
def test_update_expression_cannot_modify_key(test_table):
    p = random_string()
    c = random_string()
    with pytest.raises(ClientError, match='ValidationException.*key'):
        test_table.update_item(Key={'p': p, 'c': c},
            UpdateExpression='SET p = :val1', ExpressionAttributeValues={':val1': 4})
    with pytest.raises(ClientError, match='ValidationException.*key'):
        test_table.update_item(Key={'p': p, 'c': c},
            UpdateExpression='SET c = :val1', ExpressionAttributeValues={':val1': 4})
    with pytest.raises(ClientError, match='ValidationException.*key'):
        test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE p')
    with pytest.raises(ClientError, match='ValidationException.*key'):
        test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE c')
    with pytest.raises(ClientError, match='ValidationException.*key'):
        test_table.update_item(Key={'p': p, 'c': c},
            UpdateExpression='ADD p :val1', ExpressionAttributeValues={':val1': 4})
    with pytest.raises(ClientError, match='ValidationException.*key'):
        test_table.update_item(Key={'p': p, 'c': c},
            UpdateExpression='ADD c :val1', ExpressionAttributeValues={':val1': 4})
    with pytest.raises(ClientError, match='ValidationException.*key'):
        test_table.update_item(Key={'p': p, 'c': c},
            UpdateExpression='DELETE p :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
    with pytest.raises(ClientError, match='ValidationException.*key'):
        test_table.update_item(Key={'p': p, 'c': c},
            UpdateExpression='DELETE c :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
    # As sanity check, verify we *can* modify a non-key column
    test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='SET a = :val1', ExpressionAttributeValues={':val1': 4})
    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 4}
    test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE a')
    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c}
# Test that trying to start an expression with some nonsense like HELLO
# instead of SET, REMOVE, ADD or DELETE, fails.
def test_update_expression_non_existant_clause(test_table_s):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='HELLO b = :val1',
            ExpressionAttributeValues={':val1': 4})
# Test support for "SET a = :val1 + :val2", "SET a = :val1 - :val2"
# Only exactly these combinations work - e.g., it's a syntax error to
# try to add three. Trying to add a string fails.
def test_update_expression_plus_basic(test_table_s):
    p = random_string()
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = :val1 + :val2',
        ExpressionAttributeValues={':val1': 4, ':val2': 3})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 7}
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = :val1 - :val2',
        ExpressionAttributeValues={':val1': 5, ':val2': 2})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
    # Only the addition of exactly two values is supported!
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = :val1 + :val2 + :val3',
            ExpressionAttributeValues={':val1': 4, ':val2': 3, ':val3': 2})
    # Only numeric values can be added - other things like strings or lists
    # cannot be added, and we get an error like "Incorrect operand type for
    # operator or function; operator or function: +, operand type: S".
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = :val1 + :val2',
            ExpressionAttributeValues={':val1': 'dog', ':val2': 3})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = :val1 + :val2',
            ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
# While most of the Alternator code just saves high-precision numbers
# unchanged, the "+" and "-" operations need to calculate with them, and
# we should check the calculation isn't done with some lower-precision
# representation, e.g., double
def test_update_expression_plus_precision(test_table_s):
    p = random_string()
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = :val1 + :val2',
        ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("10000000000000000000001")}
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = :val2 - :val1',
        ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("9999999999999999999999")}
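# A small illustration of why the precision test above matters, using only
# Python's standard decimal module (nothing here is Alternator code): the same
# addition is exact with Decimal but loses the "+ 1" when done in a double.

```python
from decimal import Decimal

big = Decimal("10000000000000000000000")

# Exact arithmetic: the 23-digit result fits in Decimal's default precision.
exact = big + Decimal("1")

# Double arithmetic: doubles near 1e22 are spaced far more than 1 apart,
# so adding 1.0 leaves the value unchanged.
lossy = float(big) + 1.0

print(exact)
print(lossy == float(big))
```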
# Test support for "SET a = b + :val2" et al., i.e., a version of the
# above test_update_expression_plus_basic with read before write.
def test_update_expression_plus_rmw(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': 2})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = a + :val1',
        ExpressionAttributeValues={':val1': 3})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 5
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = :val1 + a',
        ExpressionAttributeValues={':val1': 4})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = :val1 + a',
        ExpressionAttributeValues={':val1': 1})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 10
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = b + a')
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 19
# Test the list_append() function in SET, for the most basic use case of
# concatenating two value references. Because this is the first test of
# functions in SET, we also test some generic features of how functions
# are parsed.
def test_update_expression_list_append_basic(test_table_s):
    p = random_string()
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = list_append(:val1, :val2)',
        ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': [4, 'hello', 'hi', 7]}
    # Unlike the operation name "SET", function names are case-sensitive!
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = LIST_APPEND(:val1, :val2)',
            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
    # As usual, spaces are ignored by the parser
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = list_append(:val1, :val2)',
        ExpressionAttributeValues={':val1': ['a'], ':val2': ['b']})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['a', 'b']}
    # The list_append function only allows two parameters. The parser can
    # correctly parse fewer or more, but then an error is generated: "Invalid
    # UpdateExpression: Incorrect number of operands for operator or function;
    # operator or function: list_append, number of operands: 1".
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = list_append(:val1)',
            ExpressionAttributeValues={':val1': ['a']})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = list_append(:val1, :val2, :val3)',
            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': [7], ':val3': ['a']})
    # If list_append is used on a value that isn't a list, we get the
    # error: "Invalid UpdateExpression: Incorrect operand type for operator
    # or function; operator or function: list_append, operand type: S"
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = list_append(:val1, :val2)',
            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': 'hi'})
# Additional list_append() tests, also using attribute paths as parameters
# (i.e., read-modify-write).
def test_update_expression_list_append(test_table_s):
    p = random_string()
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = :val1',
        ExpressionAttributeValues={':val1': ['hi', 2]})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2]
    # Often, list_append is used to append items to a list attribute
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = list_append(a, :val1)',
        ExpressionAttributeValues={':val1': [4, 'hello']})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2, 4, 'hello']
    # But it can also be used to just concatenate in other ways:
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = list_append(:val1, a)',
        ExpressionAttributeValues={':val1': ['dog']})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello']
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = list_append(a, :val1)',
        ExpressionAttributeValues={':val1': ['cat']})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == ['dog', 'hi', 2, 4, 'hello', 'cat']
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET c = list_append(a, b)')
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['c'] == ['dog', 'hi', 2, 4, 'hello', 'dog', 'hi', 2, 4, 'hello', 'cat']
    # As usual, #references are allowed instead of inline names:
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET #name1 = list_append(#name2,:val1)',
        ExpressionAttributeValues={':val1': [8]},
        ExpressionAttributeNames={'#name1': 'a', '#name2': 'a'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello', 8]
# Test the "if_not_exists" function in SET
# The test also checks additional features of function-call parsing.
def test_update_expression_if_not_exists(test_table_s):
    p = random_string()
    # Since attribute a doesn't exist, set it:
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = if_not_exists(a, :val1)',
        ExpressionAttributeValues={':val1': 2})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
    # Now the attribute does exist, so set does nothing:
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = if_not_exists(a, :val1)',
        ExpressionAttributeValues={':val1': 3})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
    # if_not_exists can also be used to check one attribute and set another,
    # but note that if_not_exists(a, :val) means a's value if it exists,
    # otherwise :val!
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = if_not_exists(c, :val1)',
        ExpressionAttributeValues={':val1': 4})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 4
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = if_not_exists(c, :val1)',
        ExpressionAttributeValues={':val1': 5})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = if_not_exists(a, :val1)',
        ExpressionAttributeValues={':val1': 6})
    # Note how, because 'a' does exist, its value is copied, overwriting b's
    # value:
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 2
    # The parser expects function parameters to be value references, paths,
    # or nested function calls. Anything else causes a syntax error:
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = if_not_exists(non@sense, :val1)',
            ExpressionAttributeValues={':val1': 6})
    # if_not_exists() requires that the first parameter be a path. However,
    # the parser doesn't know this, and also accepts a value reference or a
    # function call as a function parameter. If we try one of these other
    # things the parser succeeds, but we get a later error, looking like:
    # "Invalid UpdateExpression: Operator or function requires a document
    # path; operator or function: if_not_exists"
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = if_not_exists(if_not_exists(a, :val2), :val1)',
            ExpressionAttributeValues={':val1': 6, ':val2': 3})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = if_not_exists(:val2, :val1)',
            ExpressionAttributeValues={':val1': 6, ':val2': 3})
    # Surprisingly, if the wrong argument is a :val value reference, the
    # parser first tries to look it up in ExpressionAttributeValues (and
    # fails if it's missing), before realizing any value reference would be
    # wrong... So the following fails like the above does - but with a
    # different error message (which we do not check here): "Invalid
    # UpdateExpression: An expression attribute value used in expression
    # is not defined; attribute value: :val2"
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET b = if_not_exists(:val2, :val1)',
            ExpressionAttributeValues={':val1': 6})
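# A minimal sketch of the if_not_exists(path, value) semantics checked above,
# modelling an item as a plain dict with top-level attributes only. The Python
# function below is a hypothetical stand-in, not Alternator's implementation:
# it returns the attribute's current value if it exists, otherwise the default.

```python
def if_not_exists(item, name, default):
    """Value of item[name] if that attribute exists, otherwise default."""
    return item[name] if name in item else default


item = {'a': 2}
item['a'] = if_not_exists(item, 'a', 3)  # 'a' exists, so it keeps its value
item['b'] = if_not_exists(item, 'c', 4)  # 'c' is missing, so b gets the default
item['b'] = if_not_exists(item, 'a', 6)  # 'a' exists, so its value is copied to b
print(item)
```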
# When the expression parser parses a function call f(value, value), each
# value may itself be a function call - ad infinitum. So expressions like
# list_append(if_not_exists(a, :val1), :val2) are legal and so is deeper
# nesting.
@pytest.mark.xfail(reason="for unknown reason, DynamoDB does not allow nesting list_append")
def test_update_expression_function_nesting(test_table_s):
    p = random_string()
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
        ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['cat', 'dog']})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog']
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
        ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog', '1', '2']
    # I don't understand why the following expression isn't accepted, but it
    # isn't! It produces an "Invalid UpdateExpression: The function is not
    # allowed to be used this way in an expression; function: list_append".
    # I don't know how to explain it. In any case, the *parsing* works -
    # this is not a syntax error - the failure is in some verification later.
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = list_append(list_append(:val1, :val2), :val3)',
            ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi']})
    # Ditto, the following passes the parser but fails some later check with
    # the same error message as above.
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = list_append(list_append(list_append(:val1, :val2), :val3), :val4)',
            ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi'], ':val4': ['yo']})
# Verify how in SET expressions, "+" (or "-") nests with functions.
# We discover that f(x)+f(y) works but f(x+y) does NOT (results in a syntax
# error on the "+"). This means that the parser has two separate rules:
# 1. set_action: SET path = value + value
# 2. value: VALREF | NAME | NAME (value, ...)
def test_update_expression_function_plus_nesting(test_table_s):
    p = random_string()
    # As explained above, this - with "+" outside the function call, works:
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='SET b = if_not_exists(b, :val1)+:val2',
        ExpressionAttributeValues={':val1': 2, ':val2': 3})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
    # ...but this - with the "+" inside a function parameter, is a syntax
    # error:
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET c = if_not_exists(c, :val1+:val2)',
            ExpressionAttributeValues={':val1': 5, ':val2': 4})
# This test tries to use an undefined function "f". This, obviously, fails,
# but were we to actually print the error we would see "Invalid
# UpdateExpression: Invalid function name; function: f". Not a syntax error.
# This means that the parser accepts any alphanumeric name as a function
# name, and only later use of this function fails because it's not one of
# the supported functions.
def test_update_expression_unknown_function(test_table_s):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException.*f'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = f(b,c,d)')
    with pytest.raises(ClientError, match='ValidationException.*f123_hi'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = f123_hi(b,c,d)')
    # Just like unreferenced column names parsed by the DynamoDB parser,
    # function names must also start with an alphabetic character. Trying
    # to use _f as a function name will result in an actual syntax error,
    # on the "_" token.
    with pytest.raises(ClientError, match='ValidationException.*yntax error'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='SET a = _f(b,c,d)')
# Test "ADD" operation for numbers
def test_update_expression_add_numbers(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': 3, 'b': 'hi'})
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='ADD a :val1',
        ExpressionAttributeValues={':val1': 4})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 7
    # If the value to be added isn't a number, we get an error like "Invalid
    # UpdateExpression: Incorrect operand type for operator or function;
    # operator: ADD, operand type: STRING".
    with pytest.raises(ClientError, match='ValidationException.*type'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='ADD a :val1',
            ExpressionAttributeValues={':val1': 'hello'})
    # Similarly, if the attribute we're adding to isn't a number, we get an
    # error like "An operand in the update expression has an incorrect data
    # type"
    with pytest.raises(ClientError, match='ValidationException.*type'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='ADD b :val1',
            ExpressionAttributeValues={':val1': 1})
# Test "ADD" operation for sets
def test_update_expression_add_sets(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='ADD a :val1',
        ExpressionAttributeValues={':val1': set(['pig'])})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog', 'cat', 'mouse', 'pig'])
    # TODO: right now this test won't detect duplicated values in the returned result,
    # because boto3 parses a set out of the returned JSON anyway. This check should use
    # a lower-level API (if one exists) to ensure that the JSON contains no duplicates
    # in the set representation. It has been verified manually.
    test_table_s.put_item(Item={'p': p, 'a': set(['beaver', 'lynx', 'coati']), 'b': 'hi'})
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='ADD a :val1',
        ExpressionAttributeValues={':val1': set(['coati', 'beaver', 'badger'])})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['beaver', 'badger', 'lynx', 'coati'])
    # The value to be added needs to be a set of the same type - it can't
    # be a single element or anything else. If the value has the wrong type,
    # we get an error like "Invalid UpdateExpression: Incorrect operand type
    # for operator or function; operator: ADD, operand type: STRING".
    with pytest.raises(ClientError, match='ValidationException.*type'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='ADD a :val1',
            ExpressionAttributeValues={':val1': 'hello'})
# Test "DELETE" operation for sets
def test_update_expression_delete_sets(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='DELETE a :val1',
        ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
    # Deleting an element not present in the set is not an error - it just
    # does nothing
    test_table_s.update_item(Key={'p': p},
        UpdateExpression='DELETE a :val1',
        ExpressionAttributeValues={':val1': set(['pig'])})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
    # The value to be deleted must be a set of the same type - it can't
    # be a single element or anything else. If the value has the wrong type,
    # we get an error like "Invalid UpdateExpression: Incorrect operand type
    # for operator or function; operator: DELETE, operand type: STRING".
    with pytest.raises(ClientError, match='ValidationException.*type'):
        test_table_s.update_item(Key={'p': p},
            UpdateExpression='DELETE a :val1',
            ExpressionAttributeValues={':val1': 'hello'})
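# A rough model of the ADD and DELETE set semantics tested above, using plain
# Python sets. The helper names add_to_set() and delete_from_set() are
# hypothetical; this only sketches the behaviour the tests expect (union,
# difference, and rejection of non-set operands), not how Alternator does it.

```python
def add_to_set(current, value):
    """ADD: union of the stored set and the given set; duplicates collapse."""
    if not isinstance(value, (set, frozenset)):
        raise TypeError("ADD on a set attribute requires a set operand")
    return set(current) | set(value)


def delete_from_set(current, value):
    """DELETE: remove the given elements; missing elements are ignored."""
    if not isinstance(value, (set, frozenset)):
        raise TypeError("DELETE on a set attribute requires a set operand")
    return set(current) - set(value)


animals = add_to_set({'beaver', 'lynx', 'coati'}, {'coati', 'beaver', 'badger'})
print(sorted(animals))
print(sorted(delete_from_set({'dog', 'cat', 'mouse'}, {'cat', 'mouse', 'pig'})))
```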
######## Tests for paths and nested attribute updates:
# A dot inside a name in ExpressionAttributeNames is a literal dot, and
# results in a top-level attribute with an actual dot in its name - not
# a nested attribute path.
def test_update_expression_dot_in_name(test_table_s):
    p = random_string()
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
        ExpressionAttributeValues={':val1': 3},
        ExpressionAttributeNames={'#a': 'a.b'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
# A basic test for direct update of a nested attribute: One of the top-level
# attributes is itself a document, and we update only one of that document's
# nested attributes.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_attribute_dot(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
        ExpressionAttributeValues={':val1': 7})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}, 'd': 5}
    # Of course we can also add new nested attributes, not just modify
    # existing ones:
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.d = :val1',
        ExpressionAttributeValues={':val1': 3})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7, 'd': 3}, 'd': 5}
# Similar test, for a list: one of the top-level attributes is a list, we
# can update one of its items.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_attribute_index(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[1] = :val1',
        ExpressionAttributeValues={':val1': 'hello'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'hello', 'three']}
# Test that, just as with top-level attributes, setting a nested attribute
# replaces its old value - potentially an entire nested document - with the
# new value (which may have a different type).
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_different_type(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': {'one': 1, 'two': 2}}})
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
        ExpressionAttributeValues={':val1': 7})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}}
# Yet another test of a nested attribute update. This one uses deeper
# levels of nesting (dots and indexes) and adds #name references to the mix.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_deep(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}}]}})
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c[1].#name.y[1] = :val1',
        ExpressionAttributeValues={':val1': 9}, ExpressionAttributeNames={'#name': 'x'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == {'b': 3, 'c': ['hi', {'x': {'y': [3, 9, 7]}}]}
    # A deep path can also appear on the right-hand-side of an assignment
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.z = a.c[1].#name.y[1]',
        ExpressionAttributeNames={'#name': 'x'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a']['z'] == 9
# A REMOVE operation can be used to remove nested attributes, and also
# individual list items.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_remove(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}, 'q': 2}]}})
    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a.c[1].x.y[1], a.c[1].q')
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == {'b': 3, 'c': ['hi', {'x': {'y': [3, 7]}}]}
# The DynamoDB documentation specifies: "When you use SET to update a list
# element, the contents of that element are replaced with the new data that
# you specify. If the element does not already exist, SET will append the
# new element at the end of the list."
# So if we take a three-element list a and set a[7], the new element
# will be put at the end of the list, not position 7 specifically.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_array_out_of_bounds(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[7] = :val1',
        ExpressionAttributeValues={':val1': 'hello'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello']}
    # The DynamoDB documentation also says: "If you add multiple elements
    # in a single SET operation, the elements are sorted in order by element
    # number."
    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[84] = :val1, a[37] = :val2',
        ExpressionAttributeValues={':val1': 'a1', ':val2': 'a2'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello', 'a2', 'a1']}
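# A minimal sketch of the list-index behaviour described by the documentation
# quotes above, assuming a single SET clause is modelled as a dict mapping
# list index to new value. set_list_elements() is a hypothetical helper, not
# an Alternator or boto3 function: in-range indexes replace the element, and
# out-of-range indexes append to the end in ascending index order.

```python
def set_list_elements(lst, updates):
    """Apply {index: value} updates to a copy of lst and return it."""
    result = list(lst)
    for index in sorted(updates):
        if index < len(lst):
            # The element already exists: replace it in place.
            result[index] = updates[index]
        else:
            # The element does not exist: append instead of padding the list.
            result.append(updates[index])
    return result


a = set_list_elements(['one', 'two', 'three'], {7: 'hello'})
a = set_list_elements(a, {84: 'a1', 37: 'a2'})
print(a)
```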
# Test what happens if we try to write to a.b, which would only make sense if
# a were a nested document, but a doesn't exist, or exists and is NOT a nested
# document but rather a scalar or list or something.
# DynamoDB actually detects this case and prints an error:
# ClientError: An error occurred (ValidationException) when calling the
# UpdateItem operation: The document path provided in the update expression
# is invalid for update
# Because Scylla doesn't read before write, it cannot detect this as an error,
# so we'll probably want to allow for that possibility as well.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_bad_path_dot(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': ['hi']})
    with pytest.raises(ClientError, match='ValidationException.*path'):
        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
            ExpressionAttributeValues={':val1': 7})
    with pytest.raises(ClientError, match='ValidationException.*path'):
        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b.c = :val1',
            ExpressionAttributeValues={':val1': 7})
    with pytest.raises(ClientError, match='ValidationException.*path'):
        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c.c = :val1',
            ExpressionAttributeValues={':val1': 7})
# Similarly for other types of bad paths - using [0] on something which
# isn't an array.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_bad_path_array(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': 'hello'})
    with pytest.raises(ClientError, match='ValidationException.*path'):
        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[0] = :val1',
            ExpressionAttributeValues={':val1': 7})
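The out-of-bounds SET semantics quoted from the DynamoDB documentation in the tests above can be modelled in plain Python. This is only a sketch of the documented behaviour as those tests read it, not Alternator or DynamoDB code; the helper name `apply_list_sets` and its dict-based input are hypothetical and exist only for illustration.

```python
def apply_list_sets(lst, assignments):
    """Model 'SET a[i] = v' on a list attribute (hypothetical helper).

    assignments maps a list index to its new value. In-range indexes
    replace the existing element; out-of-range indexes are appended to
    the end of the list, ordered by element number.
    """
    result = list(lst)
    appended = []
    for index in sorted(assignments):
        if index < len(lst):
            result[index] = assignments[index]
        else:
            appended.append(assignments[index])
    return result + appended


a = apply_list_sets(['one', 'two', 'three'], {7: 'hello'})
a = apply_list_sets(a, {84: 'a1', 37: 'a2'})
print(a)
```

The two calls mirror the two update_item requests in test_nested_attribute_update_array_out_of_bounds: the element set at index 37 lands before the one set at index 84, regardless of the order in which they appear in the expression.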

141
alternator-test/util.py Normal file

@@ -0,0 +1,141 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Various utility functions which are useful for multiple tests
import string
import random
import collections
import time
def random_string(length=10, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for x in range(length))
def random_bytes(length=10):
    return bytearray(random.getrandbits(8) for _ in range(length))
# Utility functions for scan and query into an array of items:
# TODO: add to full_scan and full_query by default ConsistentRead=True, as
# it's not useful for tests without it!
def full_scan(table, **kwargs):
    response = table.scan(**kwargs)
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        items.extend(response['Items'])
    return items
# full_scan_and_count returns both items and count as returned by the server.
# Note that count isn't simply len(items) - the server returns them
# independently. e.g., with Select='COUNT' the items are not returned, but
# count is.
def full_scan_and_count(table, **kwargs):
    response = table.scan(**kwargs)
    items = []
    count = 0
    if 'Items' in response:
        items.extend(response['Items'])
    if 'Count' in response:
        count = count + response['Count']
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        if 'Items' in response:
            items.extend(response['Items'])
        if 'Count' in response:
            count = count + response['Count']
    return (count, items)
# Utility function for fetching the entire results of a query into an array of items
def full_query(table, **kwargs):
    response = table.query(**kwargs)
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        items.extend(response['Items'])
    return items
# To compare two lists of items (each is a dict) without regard for order,
# "==" is not good enough because it will fail if the order is different.
# The following function, multiset(), converts the list into a multiset
# (a set with duplicates) where order doesn't matter, so the multisets can
# be compared.
def freeze(item):
    if isinstance(item, dict):
        return frozenset((key, freeze(value)) for key, value in item.items())
    elif isinstance(item, list):
        return tuple(freeze(value) for value in item)
    return item
def multiset(items):
    return collections.Counter([freeze(item) for item in items])
test_table_prefix = 'alternator_test_'
def test_table_name():
    current_ms = int(round(time.time() * 1000))
    # In the off chance that test_table_name() is called twice in the same millisecond...
    if test_table_name.last_ms >= current_ms:
        current_ms = test_table_name.last_ms + 1
    test_table_name.last_ms = current_ms
    return test_table_prefix + str(current_ms)
test_table_name.last_ms = 0
def create_test_table(dynamodb, **kwargs):
    name = test_table_name()
    print("fixture creating new table {}".format(name))
    table = dynamodb.create_table(TableName=name,
        BillingMode='PAY_PER_REQUEST', **kwargs)
    waiter = table.meta.client.get_waiter('table_exists')
    # Recheck every second instead of the default, lower, frequency. This can
    # save a few seconds on AWS with its very slow table creation, but can
    # save even more on tests on Scylla with its faster table creation turnaround.
    waiter.config.delay = 1
    waiter.config.max_attempts = 200
    waiter.wait(TableName=name)
    return table
# DynamoDB's ListTables request returns up to a single page of table names
# (e.g., up to 100) and it is up to the caller to call it again and again
# to get the next page. This is a utility function which calls it repeatedly
# as much as necessary to get the entire list.
# We deliberately return a list and not a set, because we want the caller
# to be able to recognize bugs in ListTables that cause the same table
# to be returned twice.
def list_tables(dynamodb, limit=100):
    ret = []
    pos = None
    while True:
        if pos:
            page = dynamodb.meta.client.list_tables(Limit=limit, ExclusiveStartTableName=pos)
        else:
            page = dynamodb.meta.client.list_tables(Limit=limit)
        results = page.get('TableNames', None)
        assert results
        ret = ret + results
        newpos = page.get('LastEvaluatedTableName', None)
        if not newpos:
            break
        # It doesn't make sense for Dynamo to tell us we need more pages, but
        # not send anything in *this* page!
        assert len(results) > 0
        assert newpos != pos
        # Note that we only checked that we got back tables, not that we got
        # any new tables not already in ret. So a buggy implementation might
        # still cause an endless loop getting the same tables again and again.
        pos = newpos
    return ret
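A short usage sketch for the multiset() helper defined in util.py above. The two helper functions are copied here so the snippet runs on its own; the sample items are made up for illustration and stand in for what full_scan() or full_query() might return.

```python
import collections


def freeze(item):
    # Recursively turn dicts and lists into hashable equivalents.
    if isinstance(item, dict):
        return frozenset((key, freeze(value)) for key, value in item.items())
    elif isinstance(item, list):
        return tuple(freeze(value) for value in item)
    return item


def multiset(items):
    return collections.Counter([freeze(item) for item in items])


# Hypothetical items; the server may return them in any order.
expected = [{'p': 'x', 'a': [1, 2]}, {'p': 'y'}, {'p': 'y'}]
received = [{'p': 'y'}, {'p': 'x', 'a': [1, 2]}, {'p': 'y'}]

print(expected == received)                      # the lists differ in order
print(multiset(expected) == multiset(received))  # the multisets match
```

Because the result is a Counter rather than a set, an item that appears twice on one side and once on the other still makes the comparison fail, which is what a test looking for duplicated results needs.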

147
alternator/auth.cc Normal file

@@ -0,0 +1,147 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/error.hh"
#include "log.hh"
#include <sstream>
#include <string>
#include <string_view>
#include <gnutls/crypto.h>
#include <seastar/util/defer.hh>
#include "hashers.hh"
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/roles-metadata.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
namespace alternator {
static logging::logger alogger("alternator-auth");
static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {
hmac_sha256_digest digest;
int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());
if (ret) {
throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));
}
return digest;
}
static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {
auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);
auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);
auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);
auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");
return signing;
}
static std::string apply_sha256(std::string_view msg) {
sha256_hasher hasher;
hasher.update(msg.data(), msg.size());
return to_hex(hasher.finalize());
}
static std::string format_time_point(db_clock::time_point tp) {
time_t time_point_repr = db_clock::to_time_t(tp);
std::string time_point_str;
time_point_str.resize(17);
::tm time_buf;
// strftime prints the terminating null character as well
std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));
time_point_str.resize(16);
return time_point_str;
}
void check_expiry(std::string_view signature_date) {
//FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it
std::string expiration_str = format_time_point(db_clock::now() - 15min);
std::string validity_str = format_time_point(db_clock::now() + 15min);
if (signature_date < expiration_str) {
throw api_error("InvalidSignatureException",
fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",
signature_date, expiration_str));
}
if (signature_date > validity_str) {
throw api_error("InvalidSignatureException",
fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",
signature_date, validity_str));
}
}
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {
auto amz_date_it = signed_headers_map.find("x-amz-date");
if (amz_date_it == signed_headers_map.end()) {
throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");
}
std::string_view amz_date = amz_date_it->second;
check_expiry(amz_date);
std::string_view datestamp = amz_date.substr(0, 8);
if (datestamp != orig_datestamp) {
throw api_error("InvalidSignatureException",
format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",
orig_datestamp, datestamp));
}
std::string_view canonical_uri = "/";
std::stringstream canonical_headers;
for (const auto& header : signed_headers_map) {
canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';
}
std::string payload_hash = apply_sha256(body_content);
std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);
std::string_view algorithm = "AWS4-HMAC-SHA256";
std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);
std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope, apply_sha256(canonical_request));
hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);
hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);
return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));
}
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {
static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",
auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);
auto cl = auth::password_authenticator::consistency_for_user(username);
auto timeout = auth::internal_distributed_timeout_config();
return qp.process(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
auto res = f.get0();
auto salted_hash = std::optional<sstring>();
if (res->empty()) {
throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));
}
salted_hash = res->one().get_opt<sstring>("salted_hash");
if (!salted_hash) {
throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));
}
return make_ready_future<std::string>(*salted_hash);
});
}
}
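For readers who want to cross-check the signing-key chain in get_signature_key() and the string-to-sign layout in get_signature() above, here is a minimal Python sketch of the same steps using only the standard library. It is not part of Scylla; the function names mirror the C++ ones for readability, `sign` is a hypothetical wrapper, and the credentials and request text in the usage lines are made-up placeholders.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def get_signature_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # Each step keys the next HMAC with the previous digest.
    k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, "aws4_request")


def sign(secret_key: str, amz_date: str, region: str, service: str,
         canonical_request: str) -> str:
    # Hypothetical wrapper: builds the string to sign and returns a hex signature.
    date_stamp = amz_date[:8]
    scope = "{}/{}/{}/aws4_request".format(date_stamp, region, service)
    request_hash = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    string_to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, request_hash])
    signing_key = get_signature_key(secret_key, date_stamp, region, service)
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder inputs, for illustration only.
signature = sign("not-a-real-secret", "20190801T120000Z", "us-east-1", "dynamodb",
                 "POST\n/\n\nhost:localhost\n\nhost\n" + hashlib.sha256(b"{}").hexdigest())
print(signature)
```

The date stamp is the first eight characters of the X-Amz-Date value, which is also why the C++ code rejects a request whose credential-scope date differs from it.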

46
alternator/auth.hh Normal file

@@ -0,0 +1,46 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <map>
#include <string>
#include <string_view>
#include <array>
#include "gc_clock.hh"
#include "utils/loading_cache.hh"
namespace cql3 {
class query_processor;
}
namespace alternator {
using hmac_sha256_digest = std::array<char, 32>;
using key_cache = utils::loading_cache<std::string, std::string>;
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string);
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username);
}

111
alternator/base64.cc Normal file

@@ -0,0 +1,111 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
// The DynamoDB API dictates that "binary" (a.k.a. "bytes" or "blob") values
// be encoded in the JSON API as base64-encoded strings. This is code to
// convert byte arrays to base64-encoded strings, and back.
#include "base64.hh"
#include <ctype.h>
// Arrays for quickly converting to and from an integer between 0 and 63,
// and the character used in base64 encoding to represent it.
static class base64_chars {
public:
static constexpr const char* to =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
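// One entry per possible unsigned char value (0..255), because the decoder
// indexes this table directly with an input byte.
uint8_t from[256];
base64_chars() {
static_assert(strlen(to) == 64);
for (int i = 0; i < 256; i++) {
from[i] = 255; // signal invalid character
}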
for (int i = 0; i < 64; i++) {
from[(unsigned) to[i]] = i;
}
}
} base64_chars;
std::string base64_encode(bytes_view in) {
std::string ret;
ret.reserve(((4 * in.size() / 3) + 3) & ~3);
int i = 0;
unsigned char chunk3[3]; // chunk of input
for (auto byte : in) {
chunk3[i++] = byte;
if (i == 3) {
ret += base64_chars.to[ (chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
ret += base64_chars.to[ chunk3[2] & 0x3f ];
i = 0;
}
}
if (i) {
// i can be 1 or 2.
for(int j = i; j < 3; j++)
chunk3[j] = '\0';
ret += base64_chars.to[ ( chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
if (i == 2) {
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
} else {
ret += '=';
}
ret += '=';
}
return ret;
}
bytes base64_decode(std::string_view in) {
int i = 0;
int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;
std::string ret;
ret.reserve(in.size() * 3 / 4);
for (unsigned char c : in) {
uint8_t dc = base64_chars.from[c];
if (dc == 255) {
// Any unexpected character, including the "=" character usually
// used for padding, signals the end of the decode.
break;
}
chunk4[i++] = dc;
if (i == 4) {
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
ret += ((chunk4[2] & 0x3) << 6) + chunk4[3];
i = 0;
}
}
if (i) {
// i can be 2 or 3, meaning 1 or 2 more output characters
if (i>=2)
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
if (i==3)
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
}
// FIXME: This copy is sad. The problem is that we need to return "bytes",
// but "bytes" doesn't have an efficient append the way std::string does.
// To fix this we need to use bytes' "uninitialized" feature.
return bytes(ret.begin(), ret.end());
}
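The chunking and padding rules in base64_encode() above can be checked against a reference implementation. The following Python sketch ports the encoder's logic (three input bytes become four output characters, with "=" padding for a final one- or two-byte chunk) and compares it with the standard library's base64 module. It is an illustrative port, not code from the Scylla tree.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_encode(data: bytes) -> str:
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        # Zero-fill a short final chunk, as the C++ code does with chunk3.
        padded = chunk + b"\0" * (3 - len(chunk))
        out.append(ALPHABET[padded[0] >> 2])
        out.append(ALPHABET[((padded[0] & 0x03) << 4) | (padded[1] >> 4)])
        if len(chunk) > 1:
            out.append(ALPHABET[((padded[1] & 0x0F) << 2) | (padded[2] >> 6)])
        else:
            out.append("=")
        if len(chunk) > 2:
            out.append(ALPHABET[padded[2] & 0x3F])
        else:
            out.append("=")
    return "".join(out)


for n in range(8):
    sample = bytes(range(n))
    assert base64_encode(sample) == base64.b64encode(sample).decode("ascii")
print(base64_encode(b"hi"))
```

Note that the C++ decoder is more lenient than the standard library's: it stops at the first character outside the alphabet, including "=", instead of reporting an error.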

34
alternator/base64.hh Normal file

@@ -0,0 +1,34 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string_view>
#include "bytes.hh"
#include "rjson.hh"
std::string base64_encode(bytes_view);
bytes base64_decode(std::string_view);
inline bytes base64_decode(const rjson::value& v) {
return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));
}

564
alternator/conditions.cc Normal file

@@ -0,0 +1,564 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <list>
#include <map>
#include <set>
#include <string_view>
#include "alternator/conditions.hh"
#include "alternator/error.hh"
#include "cql3/constants.hh"
#include <unordered_map>
#include "rjson.hh"
#include "serialization.hh"
#include "base64.hh"
#include <stdexcept>
namespace alternator {
static logging::logger clogger("alternator-conditions");
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator) {
static std::unordered_map<std::string, comparison_operator_type> ops = {
{"EQ", comparison_operator_type::EQ},
{"NE", comparison_operator_type::NE},
{"LE", comparison_operator_type::LE},
{"LT", comparison_operator_type::LT},
{"GE", comparison_operator_type::GE},
{"GT", comparison_operator_type::GT},
{"IN", comparison_operator_type::IN},
{"NULL", comparison_operator_type::IS_NULL},
{"NOT_NULL", comparison_operator_type::NOT_NULL},
{"BETWEEN", comparison_operator_type::BETWEEN},
{"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},
{"CONTAINS", comparison_operator_type::CONTAINS},
{"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
};
if (!comparison_operator.IsString()) {
throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
std::string op = comparison_operator.GetString();
auto it = ops.find(op);
if (it == ops.end()) {
throw api_error("ValidationException", format("Unsupported comparison operator {}", op));
}
return it->second;
}
static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {
bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));
auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));
bytes raw_value = serialize_item(value);
auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));
}
static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {
bytes raw_value = get_key_from_typed_value(value, cdef, type_to_string(cdef.type));
auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));
}
::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {
clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));
auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);
for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {
std::string_view column_name(it->name.GetString(), it->name.GetStringLength());
const rjson::value& condition = it->value;
const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");
const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");
comparison_operator_type op = get_comparison_operator(comp_definition);
if (op != comparison_operator_type::EQ) {
throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");
}
if (attr_list.Size() != 1) {
throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));
}
if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name.data()))) {
// Primary key restriction
filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);
} else {
// Regular column restriction
filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);
}
}
return filtering_restrictions;
}
namespace {
struct size_check {
// True iff size passes this check.
virtual bool operator()(rapidjson::SizeType size) const = 0;
// Check description, such that format("expected array {}", check.what()) is human-readable.
virtual sstring what() const = 0;
};
class exact_size : public size_check {
rapidjson::SizeType _expected;
public:
explicit exact_size(rapidjson::SizeType expected) : _expected(expected) {}
bool operator()(rapidjson::SizeType size) const override { return size == _expected; }
sstring what() const override { return format("of size {}", _expected); }
};
struct empty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size < 1; }
sstring what() const override { return "to be empty"; }
};
struct nonempty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size > 0; }
sstring what() const override { return "to be non-empty"; }
};
} // anonymous namespace
// Check that array has the expected number of elements
static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {
if (!array || !array->IsArray()) {
throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");
}
if (!expected(array->Size())) {
throw api_error("ValidationException",
format("{} operator requires AttributeValueList {}, instead found list size {}",
op, expected.what(), array->Size()));
}
}
struct rjson_engaged_ptr_comp {
bool operator()(const rjson::value* p1, const rjson::value* p2) const {
return rjson::single_value_comp()(*p1, *p2);
}
};
// It's not enough to compare underlying JSON objects when comparing sets,
// as internally they're stored in an array, and the order of elements is
// not important in set equality. See issue #5021
static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {
if (set1.Size() != set2.Size()) {
return false;
}
std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;
for (auto it = set1.Begin(); it != set1.End(); ++it) {
set1_raw.insert(&*it);
}
for (const auto& a : set2.GetArray()) {
if (set1_raw.count(&a) == 0) {
return false;
}
}
return true;
}
// Check if two JSON-encoded values match with the EQ relation
static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {
return check_EQ_for_sets(it1->value, it2->value);
}
}
return *v1 == v2;
}
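// Check if two JSON-encoded values match with the NE relation
static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
// Defined as the negation of check_EQ() so that sets are compared without
// regard to element order. A missing attribute (null) is unequal to anything.
return !check_EQ(v1, v2);
}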
// Check if two JSON-encoded values match with the BEGINS_WITH relation
static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
// BEGINS_WITH requires that its single operand (v2) be a string or
// binary - otherwise it's a validation error. However, problems with
// the stored attribute (v1) will just return false (no match).
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
}
auto it2 = v2.MemberBegin();
if (it2->name != "S" && it2->name != "B") {
throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
auto it1 = v1->MemberBegin();
if (it1->name != it2->name) {
return false;
}
if (it2->name == "S") {
std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());
std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());
return val1.substr(0, val2.size()) == val2;
} else /* it2->name == "B" */ {
// TODO (optimization): Check the begins_with condition directly on
// the base64-encoded string, without making a decoded copy.
bytes val1 = base64_decode(it1->value);
bytes val2 = base64_decode(it2->value);
return val1.substr(0, val2.size()) == val2;
}
}
static std::string_view to_string_view(const rjson::value& v) {
return std::string_view(v.GetString(), v.GetStringLength());
}
static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {
return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");
}
// Check if two JSON-encoded values match with the CONTAINS relation
static bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", kv2.name));
}
if (kv1.name == "S" && kv2.name == "S") {
return to_string_view(kv1.value).find(to_string_view(kv2.value)) != std::string_view::npos;
} else if (kv1.name == "B" && kv2.name == "B") {
return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;
} else if (is_set_of(kv1.name, kv2.name)) {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (*i == kv2.value) {
return true;
}
}
} else if (kv1.name == "L") {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (!i->IsObject() || i->MemberCount() != 1) {
clogger.error("check_CONTAINS received a list whose element is malformed");
return false;
}
const auto& el = *i->MemberBegin();
if (el.name == kv2.name && el.value == kv2.value) {
return true;
}
}
}
return false;
}
// Check if two JSON-encoded values match with the NOT_CONTAINS relation
static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
return !check_CONTAINS(v1, v2);
}
// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
static bool check_IN(const rjson::value* val, const rjson::value& array) {
if (!array[0].IsObject() || array[0].MemberCount() != 1) {
throw api_error("ValidationException",
format("IN operator encountered malformed AttributeValue: {}", array[0]));
}
const auto& type = array[0].MemberBegin()->name;
if (type != "S" && type != "N" && type != "B") {
throw api_error("ValidationException",
"IN operator requires AttributeValueList elements to be of type String, Number, or Binary ");
}
if (!val) {
return false;
}
bool have_match = false;
for (const auto& elem : array.GetArray()) {
if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {
throw api_error("ValidationException",
"IN operator requires all AttributeValueList elements to have the same type ");
}
if (!have_match && *val == elem) {
// Can't return yet, must check types of all array elements. <sigh>
have_match = true;
}
}
return have_match;
}
static bool check_NULL(const rjson::value* val) {
return val == nullptr;
}
static bool check_NOT_NULL(const rjson::value* val) {
return val != nullptr;
}
// Check if two JSON-encoded values match with cmp.
template <typename Comparator>
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
if (kv1.name != kv2.name) {
return false;
}
if (kv1.name == "N") {
return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));
}
if (kv1.name == "S") {
return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),
std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));
}
if (kv1.name == "B") {
return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
}
clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
return false;
}
struct cmp_lt {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
static constexpr const char* diagnostic = "LT operator";
};
struct cmp_le {
// bytes only has <, so we cannot use <=.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs || lhs == rhs; }
static constexpr const char* diagnostic = "LE operator";
};
struct cmp_ge {
// bytes only has <, so we cannot use >=.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs || lhs == rhs; }
static constexpr const char* diagnostic = "GE operator";
};
struct cmp_gt {
// bytes only has <, so we cannot use >.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs; }
static constexpr const char* diagnostic = "GT operator";
};
// True if v is between lb and ub, inclusive. Throws if lb > ub.
template <typename T>
bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
if (ub < lb) {
throw api_error("ValidationException",
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
}
return cmp_ge()(v, lb) && cmp_le()(v, ub);
}
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
if (!v) {
return false;
}
if (!v->IsObject() || v->MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
}
if (!lb.IsObject() || lb.MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
}
if (!ub.IsObject() || ub.MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
}
const auto& kv_v = *v->MemberBegin();
const auto& kv_lb = *lb.MemberBegin();
const auto& kv_ub = *ub.MemberBegin();
if (kv_lb.name != kv_ub.name) {
throw api_error(
"ValidationException",
format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
kv_lb.name, kv_ub.name));
}
if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
return false;
}
if (kv_v.name == "N") {
const char* diag = "BETWEEN operator";
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
}
if (kv_v.name == "S") {
return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
}
if (kv_v.name == "B") {
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
}
throw api_error("ValidationException",
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
kv_lb.name));
}
// Verify one Expect condition on one attribute (whose content is "got")
// for the verify_expected() below.
// This function returns true or false depending on whether the condition
// succeeded - it does not throw ConditionalCheckFailedException.
// However, it may throw ValidationException on input validation errors.
static bool verify_expected_one(const rjson::value& condition, const rjson::value* got) {
const rjson::value* comparison_operator = rjson::find(condition, "ComparisonOperator");
const rjson::value* attribute_value_list = rjson::find(condition, "AttributeValueList");
const rjson::value* value = rjson::find(condition, "Value");
const rjson::value* exists = rjson::find(condition, "Exists");
// There are three types of conditions that Expected supports:
// A value, not-exists, and a comparison of some kind. Each allows
// and requires a different combination of parameters in the request.
if (value) {
if (exists && (!exists->IsBool() || exists->GetBool() != true)) {
throw api_error("ValidationException", "Cannot combine Value with Exists!=true");
}
if (comparison_operator) {
throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");
}
return check_EQ(got, *value);
} else if (exists) {
if (comparison_operator) {
throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");
}
if (!exists->IsBool() || exists->GetBool() != false) {
throw api_error("ValidationException", "Exists!=false requires Value");
}
// Remember Exists=false, so we're checking that the attribute does *not* exist:
return !got;
} else {
if (!comparison_operator) {
throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");
}
comparison_operator_type op = get_comparison_operator(*comparison_operator);
switch (op) {
case comparison_operator_type::EQ:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_EQ(got, (*attribute_value_list)[0]);
case comparison_operator_type::NE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NE(got, (*attribute_value_list)[0]);
case comparison_operator_type::LT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
case comparison_operator_type::LE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_le{});
case comparison_operator_type::GT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
case comparison_operator_type::GE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
case comparison_operator_type::BEGINS_WITH:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
case comparison_operator_type::IN:
verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
return check_IN(got, *attribute_value_list);
case comparison_operator_type::IS_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NULL(got);
case comparison_operator_type::NOT_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NOT_NULL(got);
case comparison_operator_type::BETWEEN:
verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
case comparison_operator_type::CONTAINS:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_CONTAINS(got, (*attribute_value_list)[0]);
case comparison_operator_type::NOT_CONTAINS:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NOT_CONTAINS(got, (*attribute_value_list)[0]);
}
throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
}
}
// Verify that the existing values of the item (previous_item) match the
// conditions given by the Expected and ConditionalOperator parameters
// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).
// This function will throw a ConditionalCheckFailedException API error
// if the values do not match the condition, or ValidationException if there
// are errors in the format of the condition itself.
void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {
const rjson::value* expected = rjson::find(req, "Expected");
if (!expected) {
return;
}
if (!expected->IsObject()) {
throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");
}
// ConditionalOperator can be "AND" for requiring all conditions, or
// "OR" for requiring one condition, and defaults to "AND" if missing.
const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");
bool require_all = true;
if (conditional_operator) {
if (!conditional_operator->IsString()) {
throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");
}
std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());
if (s == "AND") {
// require_all is already true
} else if (s == "OR") {
require_all = false;
} else {
throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");
}
if (expected->GetObject().ObjectEmpty()) {
throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");
}
}
for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {
const rjson::value* got = nullptr;
if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {
got = rjson::find((*previous_item)["Item"], rjson::string_ref_type(it->name.GetString()));
}
bool success = verify_expected_one(it->value, got);
if (success && !require_all) {
// When !require_all, one success is enough!
return;
} else if (!success && require_all) {
// When require_all, one failure is enough!
throw api_error("ConditionalCheckFailedException", "Failed condition.");
}
}
// If we got here and require_all, none of the checks failed, so succeed.
// If we got here and !require_all, all of the checks failed, so fail.
if (!require_all) {
throw api_error("ConditionalCheckFailedException", "None of ORed Expect conditions were successful.");
}
}
}
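
The AND/OR short-circuit logic in verify_expected() above can be sketched in isolation. This is a minimal illustration, not the Alternator code: `condition_failed`, `combine_conditions` and `passes` are hypothetical names, and the per-attribute results are assumed to be computed up front, whereas the real function evaluates each condition lazily inside the loop (so a later malformed condition may never be validated once the outcome is decided).

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for api_error("ConditionalCheckFailedException", ...).
struct condition_failed : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Combine per-attribute results the way verify_expected() does:
// with AND (require_all) the first failure throws,
// with OR (!require_all) the first success returns.
inline void combine_conditions(const std::vector<bool>& results, bool require_all) {
    for (bool success : results) {
        if (success && !require_all) {
            return; // OR: one success is enough
        }
        if (!success && require_all) {
            throw condition_failed("Failed condition.");
        }
    }
    // AND: nothing failed, so succeed. OR: nothing succeeded, so fail.
    if (!require_all) {
        throw condition_failed("None of ORed Expect conditions were successful.");
    }
}

// Convenience wrapper: true if the combined check passes.
inline bool passes(const std::vector<bool>& results, bool require_all) {
    try {
        combine_conditions(results, require_all);
        return true;
    } catch (const condition_failed&) {
        return false;
    }
}
```

Under these assumptions, `passes({true, false}, true)` is false and `passes({false, true}, false)` is true. An empty list passes under AND. The real code rejects an empty Expected combined with an explicit ConditionalOperator before reaching this point, so the empty-OR case of the sketch has no counterpart there.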

49
alternator/conditions.hh Normal file

@@ -0,0 +1,49 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* This file contains definitions and functions related to placing conditions
* on Alternator queries (equivalent of CQL's restrictions).
*
* With conditions, it's possible to add criteria to selection requests (Scan, Query)
* and use them for narrowing down the result set, by means of filtering or indexing.
*
* Ref: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
*/
#pragma once
#include "cql3/restrictions/statement_restrictions.hh"
#include "serialization.hh"
namespace alternator {
enum class comparison_operator_type {
EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
};
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);
::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);
void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);
}

50
alternator/error.hh Normal file

@@ -0,0 +1,50 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
namespace alternator {
// DynamoDB's error messages are described in detail in
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// An error message has a "type", e.g., "ResourceNotFoundException", a coarser
// HTTP code (almost always, 400), and a human readable message. Eventually these
// will be wrapped into a JSON object returned to the client.
class api_error : public std::exception {
public:
using status_type = httpd::reply::status_type;
status_type _http_code;
std::string _type;
std::string _msg;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)
: _http_code(std::move(http_code))
, _type(std::move(type))
, _msg(std::move(msg))
{ }
api_error() = default;
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}
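
The comment in error.hh says the error will eventually be wrapped into a JSON object. A rough sketch of what that wrapping might look like follows. Everything here is an assumption for illustration: `simple_api_error`, `escape_json` and `error_body` are hypothetical names, the `__type` and `message` field names follow the shape of DynamoDB error responses rather than anything shown in this diff, and the escaping handles only quotes and backslashes (real code would use a JSON library).

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for alternator::api_error, without the Seastar types.
struct simple_api_error {
    int http_code = 400;   // almost always 400 (bad request)
    std::string type;      // e.g. "ValidationException"
    std::string msg;       // human-readable message
};

// Minimal escaping: only '"' and '\\'. Control characters are NOT handled.
inline std::string escape_json(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        if (c == '"' || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Hypothetical helper: build the JSON body carrying the error type and message.
inline std::string error_body(const simple_api_error& e) {
    return "{\"__type\":\"" + escape_json(e.type) +
           "\",\"message\":\"" + escape_json(e.msg) + "\"}";
}
```

The HTTP status code would travel in the response status line, not in the body, which is why `error_body` ignores `http_code`.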

2275
alternator/executor.cc Normal file

File diff suppressed because it is too large

71
alternator/executor.hh Normal file

@@ -0,0 +1,71 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
#include <seastar/json/json_elements.hh>
#include "service/storage_proxy.hh"
#include "service/migration_manager.hh"
#include "service/client_state.hh"
#include "stats.hh"
namespace alternator {
class executor {
service::storage_proxy& _proxy;
service::migration_manager& _mm;
public:
using client_state = service::client_state;
stats _stats;
static constexpr auto ATTRS_COLUMN_NAME = ":attrs";
static constexpr auto KEYSPACE_NAME = "alternator";
executor(service::storage_proxy& proxy, service::migration_manager& mm) : _proxy(proxy), _mm(mm) {}
future<json::json_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> list_tables(client_state& client_state, std::string content);
future<json::json_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> describe_endpoints(client_state& client_state, std::string content, std::string host_header);
future<json::json_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<> start();
future<> stop() { return make_ready_future<>(); }
future<> maybe_create_keyspace();
static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
};
}

98
alternator/expressions.cc Normal file

@@ -0,0 +1,98 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "expressions.hh"
#include "alternator/expressionsLexer.hpp"
#include "alternator/expressionsParser.hpp"
#include <seastarx.hh>
#include <seastar/core/print.hh>
#include <seastar/util/log.hh>
#include <functional>
namespace alternator {
template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>
Result do_with_parser(std::string input, Func&& f) {
expressionsLexer::InputStreamType input_stream{
reinterpret_cast<const ANTLR_UINT8*>(input.data()),
ANTLR_ENC_UTF8,
static_cast<ANTLR_UINT32>(input.size()),
nullptr };
expressionsLexer lexer(&input_stream);
expressionsParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());
expressionsParser parser(&tstream);
auto result = f(parser);
return result;
}
parsed::update_expression
parse_update_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::update_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing UpdateExpression '{}': {}", query, std::current_exception()));
}
}
std::vector<parsed::path>
parse_projection_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::projection_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing ProjectionExpression '{}': {}", query, std::current_exception()));
}
}
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
namespace parsed {
void update_expression::add(update_expression::action a) {
std::visit(overloaded {
[&] (action::set&) { seen_set = true; },
[&] (action::remove&) { seen_remove = true; },
[&] (action::add&) { seen_add = true; },
[&] (action::del&) { seen_del = true; }
}, a._action);
_actions.push_back(std::move(a));
}
void update_expression::append(update_expression other) {
if ((seen_set && other.seen_set) ||
(seen_remove && other.seen_remove) ||
(seen_add && other.seen_add) ||
(seen_del && other.seen_del)) {
throw expressions_syntax_error("Each of SET, REMOVE, ADD, DELETE may only appear once in UpdateExpression");
}
std::move(other._actions.begin(), other._actions.end(), std::back_inserter(_actions));
seen_set |= other.seen_set;
seen_remove |= other.seen_remove;
seen_add |= other.seen_add;
seen_del |= other.seen_del;
}
} // namespace parsed
} // namespace alternator
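
update_expression::add() above relies on the C++17 "overloaded" idiom to dispatch on the alternative held by a std::variant. The standalone sketch below shows the same idiom on a simplified variant. The `*_action` structs and `clause_name` are hypothetical names for illustration, not part of the Alternator sources.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Inherit operator() from every lambda passed in, so std::visit can pick
// the overload matching the active alternative.
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// Simplified stand-ins for action::set, action::remove, action::add, action::del.
struct set_action {};
struct remove_action {};
struct add_action {};
struct del_action {};
using action = std::variant<set_action, remove_action, add_action, del_action>;

// Map the active alternative to the clause keyword it came from.
inline std::string clause_name(const action& a) {
    return std::visit(overloaded {
        [] (const set_action&)    { return std::string("SET"); },
        [] (const remove_action&) { return std::string("REMOVE"); },
        [] (const add_action&)    { return std::string("ADD"); },
        [] (const del_action&)    { return std::string("DELETE"); }
    }, a);
}
```

Because every alternative must have a matching lambda, adding a fifth action type without a handler is a compile error, which is one reason to prefer this idiom over an index-based switch.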

214
alternator/expressions.g Normal file

@@ -0,0 +1,214 @@
/*
* Copyright 2019 ScyllaDB
*
* This file is part of Scylla. See the LICENSE.PROPRIETARY file in the
* top-level directory for licensing information.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* The DynamoDB protocol is based on JSON, and most DynamoDB requests
* describe the operation and its parameters via JSON objects such as maps
* and lists. Nevertheless, in some types of requests an "expression" is
* passed as a single string, and we need to parse this string. These
* cases include:
* 1. Attribute paths, such as "a[3].b.c", are used in projection
* expressions as well as inside other expressions described below.
* 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
* used in conditional updates, filters, and other places.
* 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
*
* All these expression syntaxes are very simple: Most of them could be
* parsed as regular expressions, and the parenthesized condition expression
* could be done with a simple hand-written lexical analyzer and recursive-
* descent parser. Nevertheless, we decided to specify these parsers in the
* ANTLR3 language already used in the Scylla project, hopefully making these
* parsers easier to reason about, and easier to change if needed - and
* reducing the amount of boiler-plate code.
*/
grammar expressions;
options {
language = Cpp;
}
@parser::namespace{alternator}
@lexer::namespace{alternator}
/* TODO: explain what these traits things are. I haven't seen them explained
* in any document... Compilation fails without these because definitions
* of "expressionsLexerTraits" and "expressionsParserTraits" are needed.
*/
@lexer::traits {
class expressionsLexer;
class expressionsParser;
typedef antlr3::Traits<expressionsLexer, expressionsParser> expressionsLexerTraits;
}
@parser::traits {
typedef expressionsLexerTraits expressionsParserTraits;
}
@lexer::header {
#include "alternator/expressions.hh"
// ANTLR generates a bunch of unused variables and functions. Yuck...
#pragma GCC diagnostic ignored "-Wunused-variable"
#pragma GCC diagnostic ignored "-Wunused-function"
}
@parser::header {
#include "expressionsLexer.hpp"
}
/* By default, ANTLR3 composes elaborate syntax-error messages, saying which
* token was unexpected, where, and so on, but then dutifully writes these
* error messages to the standard error, and returns from the parser as if
* everything was fine, with a half-constructed output object! If we define
* the "displayRecognitionError" method, it will be called upon to build this
* error message, and we can instead throw an exception to stop the parsing
* immediately. This is good enough for now, for our simple needs, but if
* we ever want to show more information about the syntax error, Cql3.g
* contains an elaborate implementation (it would be nice if we could reuse
* it, not duplicate it).
* Unfortunately, we have to repeat the same definition twice - once for the
* parser, and once for the lexer.
*/
@parser::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
@lexer::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
/*
* Lexical analysis phase, i.e., splitting the input up to tokens.
* Lexical analyzer rules have names starting in capital letters.
* "fragment" rules do not generate tokens, and are just aliases used to
* make other rules more readable.
* Characters *not* listed here, e.g., '=', '(', etc., will be handled
* as individual tokens in their own right.
* Whitespace spans are skipped, so do not generate tokens.
*/
WHITESPACE: (' ' | '\t' | '\n' | '\r')+ { skip(); };
/* shortcuts for case-insensitive keywords */
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
/* These keywords must appear before the generic NAME token below,
* because NAME matches too, and the first to match wins.
*/
SET: S E T;
REMOVE: R E M O V E;
ADD: A D D;
DELETE: D E L E T E;
fragment ALPHA: 'A'..'Z' | 'a'..'z';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA | DIGIT | '_';
INTEGER: DIGIT+;
NAME: ALPHA ALNUM*;
NAMEREF: '#' ALNUM+;
VALREF: ':' ALNUM+;
/*
* Parsing phase - parsing the string of tokens generated by the lexical
* analyzer defined above.
*/
path_component: NAME | NAMEREF;
path returns [parsed::path p]:
root=path_component { $p.set_root($root.text); }
( '.' name=path_component { $p.add_dot($name.text); }
| '[' INTEGER ']' { $p.add_index(std::stoi($INTEGER.text)); }
)*;
update_expression_set_value returns [parsed::value v]:
VALREF { $v.set_valref($VALREF.text); }
| path { $v.set_path($path.p); }
| NAME { $v.set_func_name($NAME.text); }
'(' x=update_expression_set_value { $v.add_func_parameter($x.v); }
(',' x=update_expression_set_value { $v.add_func_parameter($x.v); })*
')'
;
update_expression_set_rhs returns [parsed::set_rhs rhs]:
v=update_expression_set_value { $rhs.set_value(std::move($v.v)); }
( '+' v=update_expression_set_value { $rhs.set_plus(std::move($v.v)); }
| '-' v=update_expression_set_value { $rhs.set_minus(std::move($v.v)); }
)?
;
update_expression_set_action returns [parsed::update_expression::action a]:
path '=' rhs=update_expression_set_rhs { $a.assign_set($path.p, $rhs.rhs); };
update_expression_remove_action returns [parsed::update_expression::action a]:
path { $a.assign_remove($path.p); };
update_expression_add_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_add($path.p, $VALREF.text); };
update_expression_delete_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_del($path.p, $VALREF.text); };
update_expression_clause returns [parsed::update_expression e]:
SET s=update_expression_set_action { $e.add(s); }
(',' s=update_expression_set_action { $e.add(s); })*
| REMOVE r=update_expression_remove_action { $e.add(r); }
(',' r=update_expression_remove_action { $e.add(r); })*
| ADD a=update_expression_add_action { $e.add(a); }
(',' a=update_expression_add_action { $e.add(a); })*
| DELETE d=update_expression_delete_action { $e.add(d); }
(',' d=update_expression_delete_action { $e.add(d); })*
;
// Note the "EOF" token at the end of the update expression. We want the
// parser to match the entire string given to it - not just its beginning!
update_expression returns [parsed::update_expression e]:
(update_expression_clause { e.append($update_expression_clause.e); })* EOF;
projection_expression returns [std::vector<parsed::path> v]:
p=path { $v.push_back(std::move($p.p)); }
(',' p=path { $v.push_back(std::move($p.p)); } )* EOF;

41
alternator/expressions.hh Normal file

@@ -0,0 +1,41 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <stdexcept>
#include <vector>
#include "expressions_types.hh"
namespace alternator {
class expressions_syntax_error : public std::runtime_error {
public:
using runtime_error::runtime_error;
};
parsed::update_expression parse_update_expression(std::string query);
std::vector<parsed::path> parse_projection_expression(std::string query);
} /* namespace alternator */


@@ -0,0 +1,166 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include <string>
#include <variant>
/*
* Parsed representation of expressions and their components.
*
* Types in the alternator::parsed namespace are used for holding the parse
* tree - objects generated by the Antlr rules after parsing an expression.
* Because of the way Antlr works, all these objects are default-constructed
* first, and then assigned when the rule is completed, so all these types
* have only default constructors - but setter functions to set them later.
*/
namespace alternator {
namespace parsed {
// "path" is an attribute's path in a document, e.g., a.b[3].c.
class path {
// All paths have a "root", a top-level attribute, and any number of
// "dereference operators" - each either an index (e.g., "[2]") or a
// dot (e.g., ".xyz").
std::string _root;
std::vector<std::variant<std::string, unsigned>> _operators;
public:
void set_root(std::string root) {
_root = std::move(root);
}
void add_index(unsigned i) {
_operators.emplace_back(i);
}
void add_dot(std::string name) {
_operators.emplace_back(std::move(name));
}
const std::string& root() const {
return _root;
}
bool has_operators() const {
return !_operators.empty();
}
};
// "value" is a value used in the right-hand side of an assignment
// expression, "SET a = ...". It can be a reference to a value included in
// the request (":val"), a path to an attribute from the existing item
// (e.g., "a.b[3].c"), or a function of other such values.
// Note that the real right-hand-side of an assignment is actually a bit
// more general - it allows either a value, or a value+value or value-value -
// see class set_rhs below.
struct value {
struct function_call {
std::string _function_name;
std::vector<value> _parameters;
};
std::variant<std::string, path, function_call> _value;
void set_valref(std::string s) {
_value = std::move(s);
}
void set_path(path p) {
_value = std::move(p);
}
void set_func_name(std::string s) {
_value = function_call {std::move(s), {}};
}
void add_func_parameter(value v) {
std::get<function_call>(_value)._parameters.emplace_back(std::move(v));
}
};
// The right-hand-side of a SET in an update expression can be either a
// single value (see above), or value+value, or value-value.
class set_rhs {
public:
char _op; // '+', '-', or 'v'
value _v1;
value _v2;
void set_value(value&& v1) {
_op = 'v';
_v1 = std::move(v1);
}
void set_plus(value&& v2) {
_op = '+';
_v2 = std::move(v2);
}
void set_minus(value&& v2) {
_op = '-';
_v2 = std::move(v2);
}
};
class update_expression {
public:
struct action {
path _path;
struct set {
set_rhs _rhs;
};
struct remove {
};
struct add {
std::string _valref;
};
struct del {
std::string _valref;
};
std::variant<set, remove, add, del> _action;
void assign_set(path p, set_rhs rhs) {
_path = std::move(p);
_action = set { std::move(rhs) };
}
void assign_remove(path p) {
_path = std::move(p);
_action = remove { };
}
void assign_add(path p, std::string v) {
_path = std::move(p);
_action = add { std::move(v) };
}
void assign_del(path p, std::string v) {
_path = std::move(p);
_action = del { std::move(v) };
}
};
private:
std::vector<action> _actions;
bool seen_set = false;
bool seen_remove = false;
bool seen_add = false;
bool seen_del = false;
public:
void add(action a);
void append(update_expression other);
bool empty() const {
return _actions.empty();
}
const std::vector<action>& actions() const {
return _actions;
}
};
} // namespace parsed
} // namespace alternator
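
The parse-tree types above are default-constructed and then filled in through setters. A minimal standalone sketch of how a grammar action might build the path a.b[3].c follows. The class is a trimmed copy of parsed::path, while `operator_count()` and `make_example_path()` are hypothetical helpers added only for illustration and are not part of the original header:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Trimmed copy of alternator::parsed::path, for illustration only.
class path {
    std::string _root;
    // Each operator is either a ".name" (string) or an "[index]" (unsigned).
    std::vector<std::variant<std::string, unsigned>> _operators;
public:
    void set_root(std::string root) { _root = std::move(root); }
    void add_index(unsigned i) { _operators.emplace_back(i); }
    void add_dot(std::string name) { _operators.emplace_back(std::move(name)); }
    const std::string& root() const { return _root; }
    bool has_operators() const { return !_operators.empty(); }
    // Hypothetical accessor, not present in the original class.
    std::size_t operator_count() const { return _operators.size(); }
};

// Hypothetical helper: builds a.b[3].c step by step, as a parser rule would.
inline path make_example_path() {
    path p;            // default-constructed first
    p.set_root("a");   // top-level attribute
    p.add_dot("b");    // .b
    p.add_index(3);    // [3]
    p.add_dot("c");    // .c
    return p;
}
```

The setter-only design matches the note at the top of the header: the object exists before the rule completes, so it cannot require constructor arguments.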

172
alternator/rjson.cc Normal file

@@ -0,0 +1,172 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "rjson.hh"
#include "error.hh"
#include <seastar/core/print.hh>
#include <algorithm> // std::min
#include <cstring> // std::strncmp
namespace rjson {
static allocator the_allocator;
std::string print(const rjson::value& value) {
string_buffer buffer;
writer writer(buffer);
value.Accept(writer);
return std::string(buffer.GetString());
}
rjson::value copy(const rjson::value& value) {
return rjson::value(value, the_allocator);
}
rjson::value parse(const std::string& str) {
return parse_raw(str.c_str(), str.size());
}
rjson::value parse_raw(const char* c_str, size_t size) {
rjson::document d;
d.Parse(c_str, size);
if (d.HasParseError()) {
throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));
}
rjson::value& v = d;
return std::move(v);
}
rjson::value& get(rjson::value& value, rjson::string_ref_type name) {
auto member_it = value.FindMember(name);
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
}
const rjson::value& get(const rjson::value& value, rjson::string_ref_type name) {
auto member_it = value.FindMember(name);
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
}
rjson::value from_string(const std::string& str) {
return rjson::value(str.c_str(), str.size(), the_allocator);
}
rjson::value from_string(const sstring& str) {
return rjson::value(str.c_str(), str.size(), the_allocator);
}
rjson::value from_string(const char* str, size_t size) {
return rjson::value(str, size, the_allocator);
}
const rjson::value* find(const rjson::value& value, string_ref_type name) {
auto member_it = value.FindMember(name);
return member_it != value.MemberEnd() ? &member_it->value : nullptr;
}
rjson::value* find(rjson::value& value, string_ref_type name) {
auto member_it = value.FindMember(name);
return member_it != value.MemberEnd() ? &member_it->value : nullptr;
}
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member) {
base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), std::move(member), the_allocator);
}
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member) {
base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), rjson::value(member), the_allocator);
}
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member) {
base.AddMember(name, std::move(member), the_allocator);
}
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member) {
base.AddMember(name, rjson::value(member), the_allocator);
}
void push_back(rjson::value& base_array, rjson::value&& item) {
base_array.PushBack(std::move(item), the_allocator);
}
bool single_value_comp::operator()(const rjson::value& r1, const rjson::value& r2) const {
auto r1_type = r1.GetType();
auto r2_type = r2.GetType();
// null is the smallest type and compares with every other type, nothing is lesser than null
if (r1_type == rjson::type::kNullType || r2_type == rjson::type::kNullType) {
return r1_type < r2_type;
}
// only null, true, and false are comparable with each other, other types are not compatible
if (r1_type != r2_type) {
if (r1_type > rjson::type::kTrueType || r2_type > rjson::type::kTrueType) {
throw rjson::error(format("Types are not comparable: {} {}", r1, r2));
}
}
switch (r1_type) {
case rjson::type::kNullType:
// fall-through
case rjson::type::kFalseType:
// fall-through
case rjson::type::kTrueType:
return r1_type < r2_type;
case rjson::type::kObjectType:
throw rjson::error("Object type comparison is not supported");
case rjson::type::kArrayType:
throw rjson::error("Array type comparison is not supported");
case rjson::type::kStringType: {
const size_t r1_len = r1.GetStringLength();
const size_t r2_len = r2.GetStringLength();
size_t len = std::min(r1_len, r2_len);
int result = std::strncmp(r1.GetString(), r2.GetString(), len);
return result < 0 || (result == 0 && r1_len < r2_len);
}
case rjson::type::kNumberType: {
if (r1.IsInt() && r2.IsInt()) {
return r1.GetInt() < r2.GetInt();
} else if (r1.IsUint() && r2.IsUint()) {
return r1.GetUint() < r2.GetUint();
} else if (r1.IsInt64() && r2.IsInt64()) {
return r1.GetInt64() < r2.GetInt64();
} else if (r1.IsUint64() && r2.IsUint64()) {
return r1.GetUint64() < r2.GetUint64();
} else {
// it's safe to call GetDouble() on any number type
return r1.GetDouble() < r2.GetDouble();
}
}
default:
return false;
}
}
} // end namespace rjson
std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {
return os << rjson::print(v);
}
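
The string branch of single_value_comp compares the common prefix and then breaks ties by length. A standalone sketch of that rule, assuming plain std::string_view inputs instead of rjson values (`string_less` is a hypothetical name, not part of rjson):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Sketch of the string comparison rule: compare the common prefix first,
// and if the prefixes are equal, the shorter string sorts first.
inline bool string_less(std::string_view a, std::string_view b) {
    const std::size_t len = std::min(a.size(), b.size());
    // strncmp reads at most len characters from each side, so the views
    // do not need to be NUL-terminated. It does stop at an embedded NUL.
    const int result = std::strncmp(a.data(), b.data(), len);
    return result < 0 || (result == 0 && a.size() < b.size());
}
```

With this rule "abc" sorts before "abcd", and two equal strings are not less than each other, which is what a strict weak ordering requires.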

163
alternator/rjson.hh Normal file

@@ -0,0 +1,163 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
/*
* rjson is a wrapper over rapidjson library, providing fast JSON parsing and generation.
*
* rapidjson has strict copy elision policies, which, among other things, involves
* using provided char arrays without copying them and allows copying objects only explicitly.
* As such, one should be careful when passing strings with limited liveness
* (e.g. data underneath local std::strings) to rjson functions, because created JSON objects
* may end up relying on dangling char pointers. All rjson functions that create JSONs from strings
* have two APIs: one for string_ref_type (more optimal, used when the string is known to live
* at least as long as the object, e.g. a static char array) and one for std::strings. The more optimal
* variants should be used *only* if the liveness of the string is guaranteed, otherwise it will
* result in undefined behaviour.
* Also, bear in mind that methods exposed by rjson::value are generic, but some of them
* work fine only for specific types. In case the type does not match, an rjson::error will be thrown.
* An example of such mismatched usage is calling MemberCount() on a JSON value not of object type
* or calling Size() on a non-array value.
*/
#include <string>
#include <stdexcept>
namespace rjson {
class error : public std::exception {
std::string _msg;
public:
error() = default;
error(const std::string& msg) : _msg(msg) {}
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}
// rapidjson configuration macros
#define RAPIDJSON_HAS_STDSTRING 1
// Default rapidjson policy is to use assert() - which is dangerous for two reasons:
// 1. assert() can be turned off with -DNDEBUG
// 2. assert() crashes a program
// Fortunately, the default policy can be overridden, and so rapidjson errors will
// throw an rjson::error exception instead.
#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)
#include <rapidjson/document.h>
#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include <rapidjson/error/en.h>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace rjson {
using allocator = rapidjson::CrtAllocator;
using encoding = rapidjson::UTF8<>;
using document = rapidjson::GenericDocument<encoding, allocator>;
using value = rapidjson::GenericValue<encoding, allocator>;
using string_ref_type = value::StringRefType;
using string_buffer = rapidjson::GenericStringBuffer<encoding>;
using writer = rapidjson::Writer<string_buffer, encoding>;
using type = rapidjson::Type;
// Returns an object representing JSON's null
inline rjson::value null_value() {
return rjson::value(rapidjson::kNullType);
}
// Returns an empty JSON object - {}
inline rjson::value empty_object() {
return rjson::value(rapidjson::kObjectType);
}
// Returns an empty JSON array - []
inline rjson::value empty_array() {
return rjson::value(rapidjson::kArrayType);
}
// Returns an empty JSON string - ""
inline rjson::value empty_string() {
return rjson::value(rapidjson::kStringType);
}
// Convert the JSON value to a string with JSON syntax, the opposite of parse().
// The representation is dense - without any redundant indentation.
std::string print(const rjson::value& value);
// Copies given JSON value - involves allocation
rjson::value copy(const rjson::value& value);
// Parses a JSON value from given string or raw character array.
// The string/char array liveness does not need to be persisted,
// as both parse() and parse_raw() will allocate member names and values.
// Throws rjson::error if parsing failed.
rjson::value parse(const std::string& str);
rjson::value parse_raw(const char* c_str, size_t size);
// Creates a JSON value (of JSON string type) out of internal string representations.
// The string value is copied, so str's liveness does not need to be persisted.
rjson::value from_string(const std::string& str);
rjson::value from_string(const sstring& str);
rjson::value from_string(const char* str, size_t size);
// Returns a pointer to JSON member if it exists, nullptr otherwise
rjson::value* find(rjson::value& value, rjson::string_ref_type name);
const rjson::value* find(const rjson::value& value, rjson::string_ref_type name);
// Returns a reference to JSON member if it exists, throws otherwise
rjson::value& get(rjson::value& value, rjson::string_ref_type name);
const rjson::value& get(const rjson::value& value, rjson::string_ref_type name);
// Sets a member in given JSON object by moving the member - allocates the name.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference - allocates the name.
// NOTICE: member string liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);
// Sets a member in given JSON object by moving the member.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// NOTICE: member liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);
// Adds a value to a JSON list by moving the item to its end.
// Throws if base_array is not a JSON array.
void push_back(rjson::value& base_array, rjson::value&& item);
struct single_value_comp {
bool operator()(const rjson::value& r1, const rjson::value& r2) const;
};
} // end namespace rjson
namespace std {
std::ostream& operator<<(std::ostream& os, const rjson::value& v);
}
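
The assert-to-exception override used in rjson.hh can be tried in isolation. The sketch below reproduces the same macro shape without rapidjson; `json_error`, `CHECK_OR_THROW` and `checked_at` are hypothetical stand-ins chosen for this example, not names from the header:

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Stand-in for rjson::error (hypothetical name).
class json_error : public std::exception {
    std::string _msg;
public:
    explicit json_error(std::string msg) : _msg(std::move(msg)) {}
    const char* what() const noexcept override { return _msg.c_str(); }
};

// Same shape as the RAPIDJSON_ASSERT override: throw instead of aborting,
// and use #x to put the failed condition into the message.
#define CHECK_OR_THROW(x) do { if (!(x)) throw json_error(std::string("JSON error: condition not met: ") + #x); } while (0)

// Hypothetical checked accessor that relies on the macro.
inline int checked_at(const int* data, int size, int i) {
    CHECK_OR_THROW(i >= 0 && i < size);
    return data[i];
}
```

Unlike assert(), the check stays active under -DNDEBUG, and a caller can catch the exception and turn it into an error response instead of crashing.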

261
alternator/serialization.cc Normal file

@@ -0,0 +1,261 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "base64.hh"
#include "log.hh"
#include "serialization.hh"
#include "error.hh"
#include "rapidjson/writer.h"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
static logging::logger slogger("alternator-serialization");
namespace alternator {
type_info type_info_from_string(std::string type) {
static thread_local const std::unordered_map<std::string, type_info> type_infos = {
{"S", {alternator_type::S, utf8_type}},
{"B", {alternator_type::B, bytes_type}},
{"BOOL", {alternator_type::BOOL, boolean_type}},
{"N", {alternator_type::N, decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_infos.find(type);
if (it == type_infos.end()) {
return {alternator_type::NOT_SUPPORTED_YET, utf8_type};
}
return it->second;
}
type_representation represent_type(alternator_type atype) {
static thread_local const std::unordered_map<alternator_type, type_representation> type_representations = {
{alternator_type::S, {"S", utf8_type}},
{alternator_type::B, {"B", bytes_type}},
{alternator_type::BOOL, {"BOOL", boolean_type}},
{alternator_type::N, {"N", decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_representations.find(atype);
if (it == type_representations.end()) {
throw std::runtime_error(format("Unknown alternator type {}", int8_t(atype)));
}
return it->second;
}
struct from_json_visitor {
const rjson::value& v;
bytes_ostream& bo;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };
void operator()(const string_type_impl& t) {
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
void operator()(const bytes_type_impl& t) const {
bo.write(base64_decode(v));
}
void operator()(const boolean_type_impl& t) const {
bo.write(boolean_type->decompose(v.GetBool()));
}
void operator()(const decimal_type_impl& t) const {
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
// default
void operator()(const abstract_type& t) const {
bo.write(from_json_object(t, Json::Value(rjson::print(v)), cql_serialization_format::internal()));
}
};
bytes serialize_item(const rjson::value& item) {
if (item.IsNull() || item.MemberCount() != 1) {
throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));
}
auto it = item.MemberBegin();
type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings
if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal serialization of type {}", it->name.GetString());
return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));
}
bytes_ostream bo;
bo.write(bytes{int8_t(type_info.atype)});
visit(*type_info.dtype, from_json_visitor{it->value, bo});
return bytes(bo.linearize());
}
struct to_json_visitor {
rjson::value& deserialized;
const std::string& type_ident;
bytes_view bv;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };
void operator()(const decimal_type_impl& t) const {
auto s = to_json_string(*decimal_type, bytes(bv));
//FIXME(sarna): unnecessary copy
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));
}
void operator()(const string_type_impl& t) {
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(reinterpret_cast<const char *>(bv.data()), bv.size()));
}
void operator()(const bytes_type_impl& t) const {
std::string b64 = base64_encode(bv);
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(b64));
}
// default
void operator()(const abstract_type& t) const {
rjson::set_with_string_name(deserialized, type_ident, rjson::parse(t.to_string(bytes(bv))));
}
};
rjson::value deserialize_item(bytes_view bv) {
rjson::value deserialized(rapidjson::kObjectType);
if (bv.empty()) {
throw api_error("ValidationException", "Serialized value empty");
}
alternator_type atype = alternator_type(bv[0]);
bv.remove_prefix(1);
if (atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));
return rjson::parse_raw(reinterpret_cast<const char *>(bv.data()), bv.size());
}
type_representation type_representation = represent_type(atype);
visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});
return deserialized;
}
std::string type_to_string(data_type type) {
static thread_local std::unordered_map<data_type, std::string> types = {
{utf8_type, "S"},
{bytes_type, "B"},
{boolean_type, "BOOL"},
{decimal_type, "N"}, // FIXME: use a specialized Alternator number type instead of the general decimal_type
};
auto it = types.find(type);
if (it == types.end()) {
throw std::runtime_error(format("Unknown type {}", type->name()));
}
return it->second;
}
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
std::string column_name = column.name_as_text();
std::string expected_type = type_to_string(column.type);
const rjson::value& key_typed_value = rjson::get(item, rjson::value::StringRefType(column_name.c_str()));
if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {
throw api_error("ValidationException",
format("Missing or invalid value object for key column {}: {}", column_name, item));
}
return get_key_from_typed_value(key_typed_value, column, expected_type);
}
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type) {
auto it = key_typed_value.MemberBegin();
if (it->name.GetString() != expected_type) {
throw api_error("ValidationException",
format("Type mismatch: expected type {} for key column {}, got type {}",
expected_type, column.name_as_text(), it->name.GetString()));
}
if (column.type == bytes_type) {
return base64_decode(it->value);
} else {
return column.type->from_string(it->value.GetString());
}
}
rjson::value json_key_column_value(bytes_view cell, const column_definition& column) {
if (column.type == bytes_type) {
std::string b64 = base64_encode(cell);
return rjson::from_string(b64);
} else if (column.type == utf8_type) {
return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));
} else if (column.type == decimal_type) {
// FIXME: use specialized Alternator number type, not the more
// general "decimal_type". A dedicated type can be more efficient
// in storage space and in parsing speed.
auto s = to_json_string(*decimal_type, bytes(cell));
return rjson::from_string(s);
} else {
// We shouldn't get here, we shouldn't see such key columns.
throw std::runtime_error(format("Unexpected key type: {}", column.type->name()));
}
}
partition_key pk_from_json(const rjson::value& item, schema_ptr schema) {
std::vector<bytes> raw_pk;
// FIXME: this is a loop, but we really allow only one partition key column.
for (const column_definition& cdef : schema->partition_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_pk.push_back(std::move(raw_value));
}
return partition_key::from_exploded(raw_pk);
}
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
if (schema->clustering_key_size() == 0) {
return clustering_key::make_empty();
}
std::vector<bytes> raw_ck;
// FIXME: this is a loop, but we really allow only one clustering key column.
for (const column_definition& cdef : schema->clustering_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_ck.push_back(std::move(raw_value));
}
return clustering_key::from_exploded(raw_ck);
}
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error("ValidationException", format("{}: invalid number object", diagnostic));
}
auto it = v.MemberBegin();
if (it->name != "N") {
throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));
}
if (it->value.IsNumber()) {
// FIXME(sarna): should use big_decimal constructor with numeric values directly:
return big_decimal(rjson::print(it->value));
}
if (!it->value.IsString()) {
throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));
}
return big_decimal(it->value.GetString());
}
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return {"", nullptr};
}
auto it = v.MemberBegin();
const std::string it_key = it->name.GetString();
if (it_key != "SS" && it_key != "BS" && it_key != "NS") {
return {"", nullptr};
}
return std::make_pair(it_key, &(it->value));
}
}
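
serialize_item() and deserialize_item() above share one layout: a single type-tag byte followed by the type-specific payload. A standalone sketch of that framing, assuming std::string as the byte container in place of Scylla's bytes type (`tag`, `serialize_tagged` and `deserialize_tagged` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Mirrors the order of alternator_type; hypothetical copy for this sketch.
enum class tag : int8_t { S, B, BOOL, N, NOT_SUPPORTED_YET };

// Writes the tag as the first byte, followed by the payload as-is.
inline std::string serialize_tagged(tag t, std::string_view payload) {
    std::string out;
    out.reserve(payload.size() + 1);
    out.push_back(static_cast<char>(t));
    out.append(payload);
    return out;
}

// Reads the tag back and returns a view of the remaining payload.
// The view points into the input, so the input must outlive it.
inline std::pair<tag, std::string_view> deserialize_tagged(std::string_view bytes) {
    if (bytes.empty()) {
        throw std::invalid_argument("Serialized value empty");
    }
    const tag t = static_cast<tag>(bytes[0]);
    bytes.remove_prefix(1);
    return {t, bytes};
}
```

Keeping the tag in the first byte lets the reader pick the right decoder before touching the payload, which is how deserialize_item() chooses between the typed visitor and the JSON fallback.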


@@ -0,0 +1,72 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include "types.hh"
#include "schema.hh"
#include "keys.hh"
#include "rjson.hh"
#include "utils/big_decimal.hh"
namespace alternator {
enum class alternator_type : int8_t {
S, B, BOOL, N, NOT_SUPPORTED_YET
};
struct type_info {
alternator_type atype;
data_type dtype;
};
struct type_representation {
std::string ident;
data_type dtype;
};
type_info type_info_from_string(std::string type);
type_representation represent_type(alternator_type atype);
bytes serialize_item(const rjson::value& item);
rjson::value deserialize_item(bytes_view bv);
std::string type_to_string(data_type type);
bytes get_key_column_value(const rjson::value& item, const column_definition& column);
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type);
rjson::value json_key_column_value(bytes_view cell, const column_definition& column);
partition_key pk_from_json(const rjson::value& item, schema_ptr schema);
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);
// If v encodes a number (i.e., it is a {"N": [...]}), returns an object representing it. Otherwise,
// raises ValidationException with diagnostic.
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
// Checks if a given JSON object encodes a set (i.e., it is a {"SS": [...]}, or "NS", "BS")
// and returns the set's type and a pointer to that set. If the object does not encode a set,
// the returned value is {"", nullptr}.
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);
}

314
alternator/server.cc Normal file

@@ -0,0 +1,314 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/server.hh"
#include "log.hh"
#include <seastar/http/function_handlers.hh>
#include <seastar/json/json_elements.hh>
#include <seastarx.hh>
#include "error.hh"
#include "rjson.hh"
#include "auth.hh"
#include <cctype>
#include "cql3/query_processor.hh"
static logging::logger slogger("alternator-server");
using namespace httpd;
namespace alternator {
static constexpr auto TARGET = "X-Amz-Target";
inline std::vector<std::string_view> split(std::string_view text, char separator) {
std::vector<std::string_view> tokens;
if (text == "") {
return tokens;
}
while (true) {
auto pos = text.find_first_of(separator);
if (pos != std::string_view::npos) {
tokens.emplace_back(text.data(), pos);
text.remove_prefix(pos + 1);
} else {
tokens.emplace_back(text);
break;
}
}
return tokens;
}
// DynamoDB HTTP error responses are structured as follows
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// Our handlers throw an exception to report an error. If the exception
// is of type alternator::api_error, it is unwrapped and properly reported to
// the user directly. Other exceptions are unexpected, and reported as
// Internal Server Error.
class api_handler : public handler_base {
public:
api_handler(const future_json_function& _handle) : _f_handle(
[_handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
return seastar::futurize_apply(_handle, std::move(req)).then_wrapped([rep = std::move(rep)](future<json::json_return_type> resf) mutable {
if (resf.failed()) {
// Exceptions of type api_error are wrapped as JSON and
// returned to the client as expected. Other types of
// exceptions are unexpected, and returned to the user
// as an internal server error:
api_error ret;
try {
resf.get();
} catch (api_error &ae) {
ret = ae;
} catch (rjson::error & re) {
ret = api_error("ValidationException", re.what());
} catch (...) {
ret = api_error(
"Internal Server Error",
format("Internal server error: {}", std::current_exception()),
reply::status_type::internal_server_error);
}
// FIXME: what is this version number?
rep->_content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + ret._type + "\"," +
"\"message\":\"" + ret._msg + "\"}";
rep->_status = ret._http_code;
slogger.trace("api_handler error case: {}", rep->_content);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
slogger.trace("api_handler success case");
auto res = resf.get0();
if (res._body_writer) {
rep->write_body("json", std::move(res._body_writer));
} else {
rep->_content += res._res;
}
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}), _type("json") { }
api_handler(const api_handler&) = default;
future<std::unique_ptr<reply>> handle(const sstring& path,
std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
return _f_handle(std::move(req), std::move(rep)).then(
[this](std::unique_ptr<reply> rep) {
rep->done(_type);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}
protected:
future_handler_function _f_handle;
sstring _type;
};
class health_handler : public handler_base {
virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
rep->set_status(reply::status_type::ok);
rep->write_body("txt", format("healthy: {}", req->get_header("Host")));
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
};
future<> server::verify_signature(const request& req) {
if (!_enforce_authorization) {
slogger.debug("Skipping authorization");
return make_ready_future<>();
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");
}
auto authorization_it = req._headers.find("Authorization");
if (authorization_it == req._headers.end()) {
throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");
}
std::string host = host_it->second;
std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');
std::string credential;
std::string user_signature;
std::string signed_headers_str;
std::vector<std::string_view> signed_headers;
for (std::string_view entry : credentials_raw) {
std::vector<std::string_view> entry_split = split(entry, '=');
if (entry_split.size() != 2) {
if (entry != "AWS4-HMAC-SHA256") {
throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));
}
continue;
}
std::string_view auth_value = entry_split[1];
// Commas appear as an additional (quite redundant) delimiter
if (!auth_value.empty() && auth_value.back() == ',') {
auth_value.remove_suffix(1);
}
if (entry_split[0] == "Credential") {
credential = std::string(auth_value);
} else if (entry_split[0] == "Signature") {
user_signature = std::string(auth_value);
} else if (entry_split[0] == "SignedHeaders") {
signed_headers_str = std::string(auth_value);
signed_headers = split(auth_value, ';');
std::sort(signed_headers.begin(), signed_headers.end());
}
}
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
std::string region(credential_split[2]);
std::string service(credential_split[3]);
std::map<std::string_view, std::string_view> signed_headers_map;
for (const auto& header : signed_headers) {
signed_headers_map.emplace(header, std::string_view());
}
for (auto& header : req._headers) {
std::string header_str;
header_str.resize(header.first.size());
std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);
auto it = signed_headers_map.find(header_str);
if (it != signed_headers_map.end()) {
it->second = std::string_view(header.second);
}
}
auto cache_getter = [] (std::string username) {
return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req,
user = std::move(user),
host = std::move(host),
datestamp = std::move(datestamp),
signed_headers_str = std::move(signed_headers_str),
signed_headers_map = std::move(signed_headers_map),
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,
datestamp, signed_headers_str, signed_headers_map, req.content, region, service, "");
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);
throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");
}
});
}
future<json::json_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {
_executor.local()._stats.total_operations++;
sstring target = req->get_header(TARGET);
std::vector<std::string_view> split_target = split(target, '.');
//NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
std::string op = split_target.empty() ? std::string() : std::string(split_target.back());
slogger.trace("Request: {} {}", op, req->content);
return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
_executor.local()._stats.unsupported_operations++;
throw api_error("UnknownOperationException",
format("Unsupported operation {}", op));
}
//FIXME: Client state can provide more context, e.g. client's endpoint address
// We use unique_ptr because client_state cannot be moved or copied
return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()), [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {
client_state->set_raw_keyspace(executor::KEYSPACE_NAME);
tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);
tracing::trace(trace_state, op);
return callback_it->second(_executor.local(), *client_state, trace_state, std::move(req)).finally([trace_state] {});
});
});
}
void server::set_routes(routes& r) {
api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {
return handle_api_request(std::move(req));
});
r.add(operation_type::POST, url("/"), req_handler);
r.add(operation_type::GET, url("/"), new health_handler);
}
//FIXME: A way to immediately invalidate the cache should be considered,
// e.g. when the system table which stores the keys is changed.
// For now, this propagation may take up to 1 minute.
server::server(seastar::sharded<executor>& e)
: _executor(e), _key_cache(1024, 1min, slogger), _enforce_authorization(false)
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) {
return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req), trace_state = std::move(trace_state)] () mutable { return e.create_table(client_state, std::move(trace_state), req->content); }); }
},
{"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_table(client_state, std::move(trace_state), req->content); }},
{"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_table(client_state, std::move(trace_state), req->content); }},
{"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.put_item(client_state, std::move(trace_state), req->content); }},
{"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.update_item(client_state, std::move(trace_state), req->content); }},
{"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.get_item(client_state, std::move(trace_state), req->content); }},
{"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_item(client_state, std::move(trace_state), req->content); }},
{"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},
{"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.scan(client_state, std::move(trace_state), req->content); }},
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, std::move(trace_state), req->content); }},
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, std::move(trace_state), req->content); }},
{"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.query(client_state, std::move(trace_state), req->content); }},
} {
}
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization) {
_enforce_authorization = enforce_authorization;
if (!port && !https_port) {
return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
" must be specified in order to init an alternator HTTP server instance"));
}
return seastar::async([this, addr, port, https_port, creds] {
try {
_executor.invoke_on_all([] (executor& e) {
return e.start();
}).get();
if (port) {
_control.start().get();
_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
_control.listen(socket_address{addr, *port}).get();
slogger.info("Alternator HTTP server listening on {} port {}", addr, *port);
}
if (https_port) {
_https_control.start().get();
_https_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
_https_control.server().invoke_on_all([creds] (http_server& serv) {
return serv.set_tls_credentials(creds->build_server_credentials());
}).get();
_https_control.listen(socket_address{addr, *https_port}).get();
slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);
}
} catch (...) {
slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());
std::throw_with_nested(std::runtime_error(
format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));
}
});
}
}
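
Note on the Authorization parsing in verify_signature() above: the header is split on spaces, the bare algorithm token must be AWS4-HMAC-SHA256, and each key=value entry may carry a trailing comma. The following is a minimal standalone sketch of that parsing, not the code from this patch. The names split_view, auth_fields and parse_authorization are hypothetical, the helper drops empty tokens (an assumption about split()), and entries are cut at the first '=' rather than split on every '='.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split `text` on `separator`, dropping empty tokens.
std::vector<std::string_view> split_view(std::string_view text, char separator) {
    std::vector<std::string_view> tokens;
    size_t start = 0;
    while (start <= text.size()) {
        size_t end = text.find(separator, start);
        if (end == std::string_view::npos) {
            end = text.size();
        }
        if (end > start) {
            tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    return tokens;
}

// Hypothetical container for the three fields the verifier needs.
struct auth_fields {
    std::string credential;
    std::string signed_headers;
    std::string signature;
};

// Returns std::nullopt when a bare token names an unsupported algorithm.
std::optional<auth_fields> parse_authorization(std::string_view header) {
    auth_fields fields;
    for (std::string_view entry : split_view(header, ' ')) {
        size_t eq = entry.find('=');
        if (eq == std::string_view::npos) {
            if (entry != "AWS4-HMAC-SHA256") {
                return std::nullopt;
            }
            continue;
        }
        std::string_view key = entry.substr(0, eq);
        std::string_view value = entry.substr(eq + 1);
        // Guard against an empty value before looking at its last character.
        if (!value.empty() && value.back() == ',') {
            value.remove_suffix(1);
        }
        if (key == "Credential") {
            fields.credential = std::string(value);
        } else if (key == "SignedHeaders") {
            fields.signed_headers = std::string(value);
        } else if (key == "Signature") {
            fields.signature = std::string(value);
        }
    }
    return fields;
}
```

The Credential value can then be split on '/' to obtain the five components (user, datestamp, region, service, terminator) that verify_signature() checks for.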

alternator/server.hh Normal file

@@ -0,0 +1,54 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "alternator/executor.hh"
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include <seastar/net/tls.hh>
#include <optional>
#include <alternator/auth.hh>
namespace alternator {
class server {
using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, tracing::trace_state_ptr, std::unique_ptr<request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
seastar::httpd::http_server_control _control;
seastar::httpd::http_server_control _https_control;
seastar::sharded<executor>& _executor;
key_cache _key_cache;
bool _enforce_authorization;
alternator_callbacks_map _callbacks;
public:
server(seastar::sharded<executor>& executor);
seastar::future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization);
private:
void set_routes(seastar::httpd::routes& r);
future<> verify_signature(const seastar::httpd::request& r);
future<json::json_return_type> handle_api_request(std::unique_ptr<request>&& req);
};
}
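
Note on alternator_callbacks_map: handle_api_request() takes the text after the last '.' of the target header as the operation name and looks it up in a map from operation name to callback, counting unknown names as unsupported. The sketch below shows that dispatch pattern on its own, under simplifying assumptions: the dispatcher class, its callback signature (request body in, response string out) and the returned JSON are hypothetical and stand in for the executor-based callbacks of the real class.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>

// Extract the operation name: everything after the last '.' in the target,
// or the whole target if it contains no dot.
std::string_view operation_from_target(std::string_view target) {
    size_t dot = target.rfind('.');
    return dot == std::string_view::npos ? target : target.substr(dot + 1);
}

// Hypothetical, simplified callback type: request body in, response out.
using callback = std::function<std::string(const std::string& body)>;

class dispatcher {
    // Keys are string literals, so the string_view keys never dangle.
    std::unordered_map<std::string_view, callback> _callbacks;
    uint64_t _total_operations = 0;
    uint64_t _unsupported_operations = 0;
public:
    dispatcher()
        : _callbacks{
            {"ListTables", [] (const std::string&) { return std::string("{\"TableNames\":[]}"); }},
            {"DescribeEndpoints", [] (const std::string& body) { return body; }},
        } {
    }
    std::string handle(std::string_view target, const std::string& body) {
        ++_total_operations;
        std::string_view op = operation_from_target(target);
        auto it = _callbacks.find(op);
        if (it == _callbacks.end()) {
            ++_unsupported_operations;
            throw std::runtime_error("UnknownOperationException: Unsupported operation " + std::string(op));
        }
        return it->second(body);
    }
    uint64_t total_operations() const { return _total_operations; }
    uint64_t unsupported_operations() const { return _unsupported_operations; }
};
```

Keeping the table in one place means adding an operation is a single new entry, and the lookup miss path is the only place that has to know about unsupported operations.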

alternator/stats.cc Normal file

@@ -0,0 +1,98 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "stats.hh"
#include <seastar/core/metrics.hh>
namespace alternator {
const char* ALTERNATOR_METRICS = "alternator";
stats::stats() : api_operations{} {
// Register the per-operation counters and latency histograms in the "alternator" metrics group
seastar::metrics::label op("op");
_metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")
OPERATION(create_global_table, "CreateGlobalTable")
OPERATION(create_table, "CreateTable")
OPERATION(delete_backup, "DeleteBackup")
OPERATION(delete_item, "DeleteItem")
OPERATION(delete_table, "DeleteTable")
OPERATION(describe_backup, "DescribeBackup")
OPERATION(describe_continuous_backups, "DescribeContinuousBackups")
OPERATION(describe_endpoints, "DescribeEndpoints")
OPERATION(describe_global_table, "DescribeGlobalTable")
OPERATION(describe_global_table_settings, "DescribeGlobalTableSettings")
OPERATION(describe_limits, "DescribeLimits")
OPERATION(describe_table, "DescribeTable")
OPERATION(describe_time_to_live, "DescribeTimeToLive")
OPERATION(get_item, "GetItem")
OPERATION(list_backups, "ListBackups")
OPERATION(list_global_tables, "ListGlobalTables")
OPERATION(list_tables, "ListTables")
OPERATION(list_tags_of_resource, "ListTagsOfResource")
OPERATION(put_item, "PutItem")
OPERATION(query, "Query")
OPERATION(restore_table_from_backup, "RestoreTableFromBackup")
OPERATION(restore_table_to_point_in_time, "RestoreTableToPointInTime")
OPERATION(scan, "Scan")
OPERATION(tag_resource, "TagResource")
OPERATION(transact_get_items, "TransactGetItems")
OPERATION(transact_write_items, "TransactWriteItems")
OPERATION(untag_resource, "UntagResource")
OPERATION(update_continuous_backups, "UpdateContinuousBackups")
OPERATION(update_global_table, "UpdateGlobalTable")
OPERATION(update_global_table_settings, "UpdateGlobalTableSettings")
OPERATION(update_item, "UpdateItem")
OPERATION(update_table, "UpdateTable")
OPERATION(update_time_to_live, "UpdateTimeToLive")
OPERATION_LATENCY(put_item_latency, "PutItem")
OPERATION_LATENCY(get_item_latency, "GetItem")
OPERATION_LATENCY(delete_item_latency, "DeleteItem")
OPERATION_LATENCY(update_item_latency, "UpdateItem")
});
_metrics.add_group("alternator", {
seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,
seastar::metrics::description("number of unsupported operations via Alternator API")),
seastar::metrics::make_total_operations("total_operations", total_operations,
seastar::metrics::description("number of total operations via Alternator API")),
seastar::metrics::make_total_operations("reads_before_write", reads_before_write,
seastar::metrics::description("number of performed read-before-write operations")),
seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,
seastar::metrics::description("number of rows read during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,
seastar::metrics::description("number of rows read and matched during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
seastar::metrics::description("number of rows read and dropped during filtering operations")),
});
}
}
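
Note on the OPERATION macro above: it stamps out one labelled metric per counter field, and filtered_rows_dropped_total is derived at read time as read minus matched. The following sketch reproduces both ideas without Seastar. The types mini_stats, api_counters and metric, the label format and the reduced list of operations are all hypothetical; the real code registers its metrics through seastar::metrics instead of returning a vector.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subset of the per-operation counters.
struct api_counters {
    uint64_t get_item = 0;
    uint64_t put_item = 0;
    uint64_t query = 0;
};

// A metric here is just a name plus a function that reads the current value.
using metric = std::pair<std::string, std::function<uint64_t()>>;

class mini_stats {
public:
    api_counters api_operations;
    uint64_t filtered_rows_read_total = 0;
    uint64_t filtered_rows_matched_total = 0;

    // Build the metric list; the returned readers refer to *this, so the
    // mini_stats object must outlive them.
    std::vector<metric> metrics() const {
        return {
#define OPERATION(name, CamelCaseName) \
            {std::string("operation{op=") + CamelCaseName + "}", [this] { return api_operations.name; }},
            OPERATION(get_item, "GetItem")
            OPERATION(put_item, "PutItem")
            OPERATION(query, "Query")
#undef OPERATION
            // Derived metric: computed on every read rather than stored.
            {"filtered_rows_dropped_total", [this] { return filtered_rows_read_total - filtered_rows_matched_total; }},
        };
    }
};
```

Because every counter needs its own OPERATION line, a field that exists in the struct but has no matching line is simply never exported, so the two lists are worth checking against each other.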

alternator/stats.hh Normal file

@@ -0,0 +1,95 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <cstdint>
#include <seastar/core/metrics_registration.hh>
#include "seastarx.hh"
#include "utils/estimated_histogram.hh"
#include "cql3/stats.hh"
namespace alternator {
// Object holding per-shard statistics related to Alternator.
// While this object is alive, these metrics are also registered to be
// visible by the metrics REST API, with the "alternator" prefix.
class stats {
public:
stats();
// Count of DynamoDB API operations by types
struct {
uint64_t batch_get_item = 0;
uint64_t batch_write_item = 0;
uint64_t create_backup = 0;
uint64_t create_global_table = 0;
uint64_t create_table = 0;
uint64_t delete_backup = 0;
uint64_t delete_item = 0;
uint64_t delete_table = 0;
uint64_t describe_backup = 0;
uint64_t describe_continuous_backups = 0;
uint64_t describe_endpoints = 0;
uint64_t describe_global_table = 0;
uint64_t describe_global_table_settings = 0;
uint64_t describe_limits = 0;
uint64_t describe_table = 0;
uint64_t describe_time_to_live = 0;
uint64_t get_item = 0;
uint64_t list_backups = 0;
uint64_t list_global_tables = 0;
uint64_t list_tables = 0;
uint64_t list_tags_of_resource = 0;
uint64_t put_item = 0;
uint64_t query = 0;
uint64_t restore_table_from_backup = 0;
uint64_t restore_table_to_point_in_time = 0;
uint64_t scan = 0;
uint64_t tag_resource = 0;
uint64_t transact_get_items = 0;
uint64_t transact_write_items = 0;
uint64_t untag_resource = 0;
uint64_t update_continuous_backups = 0;
uint64_t update_global_table = 0;
uint64_t update_global_table_settings = 0;
uint64_t update_item = 0;
uint64_t update_table = 0;
uint64_t update_time_to_live = 0;
utils::estimated_histogram put_item_latency;
utils::estimated_histogram get_item_latency;
utils::estimated_histogram delete_item_latency;
utils::estimated_histogram update_item_latency;
} api_operations;
// Miscellaneous event counters
uint64_t total_operations = 0;
uint64_t unsupported_operations = 0;
uint64_t reads_before_write = 0;
// CQL-derived stats
cql3::cql_stats cql_stats;
private:
// The metric_groups object holds this stat object's metrics registered
// as long as the stats object is alive.
seastar::metrics::metric_groups _metrics;
};
}

View File

@@ -13,7 +13,7 @@
{
"method":"GET",
"summary":"get row cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_row_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -35,7 +35,7 @@
"description":"row cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"get key cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_key_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -70,7 +70,7 @@
"description":"key cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -83,7 +83,7 @@
{
"method":"GET",
"summary":"get counter cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_counter_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -105,7 +105,7 @@
"description":"counter cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -118,7 +118,7 @@
{
"method":"GET",
"summary":"get row cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_row_cache_keys_to_save",
"produces":[
"application/json"
@@ -140,7 +140,7 @@
"description":"row cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -153,7 +153,7 @@
{
"method":"GET",
"summary":"get key cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_key_cache_keys_to_save",
"produces":[
"application/json"
@@ -175,7 +175,7 @@
"description":"key cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -188,7 +188,7 @@
{
"method":"GET",
"summary":"get counter cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_counter_cache_keys_to_save",
"produces":[
"application/json"
@@ -210,7 +210,7 @@
"description":"counter cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -448,7 +448,7 @@
{
"method": "GET",
"summary": "Get key entries",
"type": "int",
"type": "long",
"nickname": "get_key_entries",
"produces": [
"application/json"
@@ -568,7 +568,7 @@
{
"method": "GET",
"summary": "Get row entries",
"type": "int",
"type": "long",
"nickname": "get_row_entries",
"produces": [
"application/json"
@@ -688,7 +688,7 @@
{
"method": "GET",
"summary": "Get counter entries",
"type": "int",
"type": "long",
"nickname": "get_counter_entries",
"produces": [
"application/json"

View File

@@ -121,7 +121,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -172,7 +172,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -223,7 +223,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -231,7 +231,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -544,7 +544,7 @@
"summary":"sstable count for each level. empty unless leveled compaction is used",
"type":"array",
"items":{
"type":"int"
"type": "long"
},
"nickname":"get_sstable_count_per_level",
"produces":[
@@ -611,6 +611,54 @@
}
]
},
{
"path":"/column_family/toppartitions/{name}",
"operations":[
{
"method":"GET",
"summary":"Toppartitions query",
"type":"toppartitions_query_results",
"nickname":"toppartitions",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"duration",
"description":"Duration (in milliseconds) of monitoring operation",
"required":true,
"allowMultiple":false,
"type": "long",
"paramType":"query"
},
{
"name":"list_size",
"description":"number of the top partitions to list",
"required":false,
"allowMultiple":false,
"type": "long",
"paramType":"query"
},
{
"name":"capacity",
"description":"capacity of stream summary: determines amount of resources used in query processing",
"required":false,
"allowMultiple":false,
"type": "long",
"paramType":"query"
}
]
}
]
},
{
"path":"/column_family/metrics/memtable_columns_count/",
"operations":[
@@ -873,7 +921,7 @@
{
"method":"GET",
"summary":"Get memtable switch count",
"type":"int",
"type": "long",
"nickname":"get_memtable_switch_count",
"produces":[
"application/json"
@@ -897,7 +945,7 @@
{
"method":"GET",
"summary":"Get all memtable switch count",
"type":"int",
"type": "long",
"nickname":"get_all_memtable_switch_count",
"produces":[
"application/json"
@@ -1034,7 +1082,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"type": "long",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1187,7 +1235,7 @@
{
"method":"GET",
"summary":"Get all read latency",
"type":"int",
"type": "long",
"nickname":"get_all_read_latency",
"produces":[
"application/json"
@@ -1203,7 +1251,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"type": "long",
"nickname":"get_range_latency",
"produces":[
"application/json"
@@ -1227,7 +1275,7 @@
{
"method":"GET",
"summary":"Get all range latency",
"type":"int",
"type": "long",
"nickname":"get_all_range_latency",
"produces":[
"application/json"
@@ -1243,7 +1291,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"type": "long",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1396,7 +1444,7 @@
{
"method":"GET",
"summary":"Get all write latency",
"type":"int",
"type": "long",
"nickname":"get_all_write_latency",
"produces":[
"application/json"
@@ -1412,7 +1460,7 @@
{
"method":"GET",
"summary":"Get pending flushes",
"type":"int",
"type": "long",
"nickname":"get_pending_flushes",
"produces":[
"application/json"
@@ -1436,7 +1484,7 @@
{
"method":"GET",
"summary":"Get all pending flushes",
"type":"int",
"type": "long",
"nickname":"get_all_pending_flushes",
"produces":[
"application/json"
@@ -1452,7 +1500,7 @@
{
"method":"GET",
"summary":"Get pending compactions",
"type":"int",
"type": "long",
"nickname":"get_pending_compactions",
"produces":[
"application/json"
@@ -1476,7 +1524,7 @@
{
"method":"GET",
"summary":"Get all pending compactions",
"type":"int",
"type": "long",
"nickname":"get_all_pending_compactions",
"produces":[
"application/json"
@@ -1492,7 +1540,7 @@
{
"method":"GET",
"summary":"Get live ss table count",
"type":"int",
"type": "long",
"nickname":"get_live_ss_table_count",
"produces":[
"application/json"
@@ -1516,7 +1564,7 @@
{
"method":"GET",
"summary":"Get all live ss table count",
"type":"int",
"type": "long",
"nickname":"get_all_live_ss_table_count",
"produces":[
"application/json"
@@ -1532,7 +1580,7 @@
{
"method":"GET",
"summary":"Get live disk space used",
"type":"int",
"type": "long",
"nickname":"get_live_disk_space_used",
"produces":[
"application/json"
@@ -1556,7 +1604,7 @@
{
"method":"GET",
"summary":"Get all live disk space used",
"type":"int",
"type": "long",
"nickname":"get_all_live_disk_space_used",
"produces":[
"application/json"
@@ -1572,7 +1620,7 @@
{
"method":"GET",
"summary":"Get total disk space used",
"type":"int",
"type": "long",
"nickname":"get_total_disk_space_used",
"produces":[
"application/json"
@@ -1596,7 +1644,7 @@
{
"method":"GET",
"summary":"Get all total disk space used",
"type":"int",
"type": "long",
"nickname":"get_all_total_disk_space_used",
"produces":[
"application/json"
@@ -2052,7 +2100,7 @@
{
"method":"GET",
"summary":"Get speculative retries",
"type":"int",
"type": "long",
"nickname":"get_speculative_retries",
"produces":[
"application/json"
@@ -2076,7 +2124,7 @@
{
"method":"GET",
"summary":"Get all speculative retries",
"type":"int",
"type": "long",
"nickname":"get_all_speculative_retries",
"produces":[
"application/json"
@@ -2156,7 +2204,7 @@
{
"method":"GET",
"summary":"Get row cache hit out of range",
"type":"int",
"type": "long",
"nickname":"get_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2180,7 +2228,7 @@
{
"method":"GET",
"summary":"Get all row cache hit out of range",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2196,7 +2244,7 @@
{
"method":"GET",
"summary":"Get row cache hit",
"type":"int",
"type": "long",
"nickname":"get_row_cache_hit",
"produces":[
"application/json"
@@ -2220,7 +2268,7 @@
{
"method":"GET",
"summary":"Get all row cache hit",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_hit",
"produces":[
"application/json"
@@ -2236,7 +2284,7 @@
{
"method":"GET",
"summary":"Get row cache miss",
"type":"int",
"type": "long",
"nickname":"get_row_cache_miss",
"produces":[
"application/json"
@@ -2260,7 +2308,7 @@
{
"method":"GET",
"summary":"Get all row cache miss",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_miss",
"produces":[
"application/json"
@@ -2276,7 +2324,7 @@
{
"method":"GET",
"summary":"Get cas prepare",
"type":"int",
"type": "long",
"nickname":"get_cas_prepare",
"produces":[
"application/json"
@@ -2300,7 +2348,7 @@
{
"method":"GET",
"summary":"Get cas propose",
"type":"int",
"type": "long",
"nickname":"get_cas_propose",
"produces":[
"application/json"
@@ -2324,7 +2372,7 @@
{
"method":"GET",
"summary":"Get cas commit",
"type":"int",
"type": "long",
"nickname":"get_cas_commit",
"produces":[
"application/json"
@@ -2816,6 +2864,44 @@
"description":"The column family type"
}
}
},
"toppartitions_record":{
"id":"toppartitions_record",
"description":"nodetool toppartitions query record",
"properties":{
"partition":{
"type":"string",
"description":"Partition key"
},
"count":{
"type":"long",
"description":"Number of read/write operations"
},
"error":{
"type":"long",
"description":"Indication of inaccuracy in counting PKs"
}
}
},
"toppartitions_query_results":{
"id":"toppartitions_query_results",
"description":"nodetool toppartitions query results",
"properties":{
"read":{
"type":"array",
"items":{
"type":"toppartitions_record"
},
"description":"Read results"
},
"write":{
"type":"array",
"items":{
"type":"toppartitions_record"
},
"description":"Write results"
}
}
}
}
}

View File

@@ -118,7 +118,7 @@
{
"method": "GET",
"summary": "Get pending tasks",
"type": "int",
"type": "long",
"nickname": "get_pending_tasks",
"produces": [
"application/json"
@@ -127,6 +127,24 @@
}
]
},
{
"path": "/compaction_manager/metrics/pending_tasks_by_table",
"operations": [
{
"method": "GET",
"summary": "Get pending tasks by table name",
"type": "array",
"items": {
"type": "pending_compaction"
},
"nickname": "get_pending_tasks_by_table",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/compaction_manager/metrics/completed_tasks",
"operations": [
@@ -163,7 +181,7 @@
{
"method": "GET",
"summary": "Get bytes compacted",
"type": "int",
"type": "long",
"nickname": "get_bytes_compacted",
"produces": [
"application/json"
@@ -179,7 +197,7 @@
"description":"A row merged information",
"properties":{
"key":{
"type":"int",
"type": "long",
"description":"The number of sstable"
},
"value":{
@@ -244,6 +262,23 @@
}
}
},
"pending_compaction": {
"id": "pending_compaction",
"properties": {
"cf": {
"type": "string",
"description": "The column family name"
},
"ks": {
"type":"string",
"description": "The keyspace name"
},
"task": {
"type":"long",
"description": "The number of pending tasks"
}
}
},
"history": {
"id":"history",
"description":"Compaction history information",

View File

@@ -110,7 +110,7 @@
{
"method":"GET",
"summary":"Get count down endpoint",
"type":"int",
"type": "long",
"nickname":"get_down_endpoint_count",
"produces":[
"application/json"
@@ -126,7 +126,7 @@
{
"method":"GET",
"summary":"Get count up endpoint",
"type":"int",
"type": "long",
"nickname":"get_up_endpoint_count",
"produces":[
"application/json"
@@ -180,11 +180,11 @@
"description": "The endpoint address"
},
"generation": {
"type": "int",
"type": "long",
"description": "The heart beat generation"
},
"version": {
"type": "int",
"type": "long",
"description": "The heart beat version"
},
"update_time": {
@@ -209,7 +209,7 @@
"description": "Holds a version value for an application state",
"properties": {
"application_state": {
"type": "int",
"type": "long",
"description": "The application state enum index"
},
"value": {
@@ -217,7 +217,7 @@
"description": "The version value"
},
"version": {
"type": "int",
"type": "long",
"description": "The application state version"
}
}

View File

@@ -75,7 +75,7 @@
{
"method":"GET",
"summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",
"type":"int",
"type": "long",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -99,7 +99,7 @@
{
"method":"GET",
"summary":"Get heart beat version for a node",
"type":"int",
"type": "long",
"nickname":"get_current_heart_beat_version",
"produces":[
"application/json"

View File

@@ -99,7 +99,7 @@
{
"method": "GET",
"summary": "Get create hint count",
"type": "int",
"type": "long",
"nickname": "get_create_hint_count",
"produces": [
"application/json"
@@ -123,7 +123,7 @@
{
"method": "GET",
"summary": "Get not stored hints count",
"type": "int",
"type": "long",
"nickname": "get_not_stored_hints_count",
"produces": [
"application/json"

View File

@@ -191,7 +191,7 @@
{
"method":"GET",
"summary":"Get the version number",
"type":"int",
"type": "long",
"nickname":"get_version",
"produces":[
"application/json"

View File

@@ -105,7 +105,7 @@
{
"method":"GET",
"summary":"Get the max hint window",
"type":"int",
"type": "long",
"nickname":"get_max_hint_window",
"produces":[
"application/json"
@@ -128,7 +128,7 @@
"description":"max hint window in ms",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -141,7 +141,7 @@
{
"method":"GET",
"summary":"Get max hints in progress",
"type":"int",
"type": "long",
"nickname":"get_max_hints_in_progress",
"produces":[
"application/json"
@@ -164,7 +164,7 @@
"description":"max hints in progress",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -177,7 +177,7 @@
{
"method":"GET",
"summary":"get hints in progress",
"type":"int",
"type": "long",
"nickname":"get_hints_in_progress",
"produces":[
"application/json"
@@ -602,7 +602,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_write_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -632,7 +632,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_write_metrics_condition_not_met",
"produces": [
"application/json"
@@ -647,7 +647,7 @@
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_read_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -671,28 +671,13 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/condition_not_met",
"operations": [
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"nickname": "get_cas_read_metrics_condition_not_met",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/timeouts",
"operations": [
{
"method": "GET",
"summary": "Get read metrics",
"type": "int",
"type": "long",
"nickname": "get_read_metrics_timeouts",
"produces": [
"application/json"
@@ -707,7 +692,7 @@
{
"method": "GET",
"summary": "Get read metrics",
"type": "int",
"type": "long",
"nickname": "get_read_metrics_unavailables",
"produces": [
"application/json"
@@ -791,6 +776,36 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get CAS read rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_cas_read_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/view_write/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get view write rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_view_write_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/range/moving_average_histogram",
"operations": [
@@ -812,7 +827,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "int",
"type": "long",
"nickname": "get_range_metrics_timeouts",
"produces": [
"application/json"
@@ -827,7 +842,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "int",
"type": "long",
"nickname": "get_range_metrics_unavailables",
"produces": [
"application/json"
@@ -872,7 +887,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "int",
"type": "long",
"nickname": "get_write_metrics_timeouts",
"produces": [
"application/json"
@@ -887,7 +902,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "int",
"type": "long",
"nickname": "get_write_metrics_unavailables",
"produces": [
"application/json"
@@ -956,6 +971,21 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_write/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get CAS write rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_cas_write_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path":"/storage_proxy/metrics/read/estimated_histogram/",
"operations":[
@@ -978,7 +1008,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"type": "long",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1010,7 +1040,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"type": "long",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1042,7 +1072,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"type": "long",
"nickname":"get_range_latency",
"produces":[
"application/json"


@@ -458,7 +458,7 @@
{
"method":"GET",
"summary":"Return the generation value for this node.",
"type":"int",
"type": "long",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -646,7 +646,7 @@
{
"method":"POST",
"summary":"Trigger a cleanup of keys on a single keyspace",
"type":"int",
"type": "long",
"nickname":"force_keyspace_cleanup",
"produces":[
"application/json"
@@ -678,7 +678,7 @@
{
"method":"GET",
"summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",
"type":"int",
"type": "long",
"nickname":"scrub",
"produces":[
"application/json"
@@ -726,7 +726,7 @@
{
"method":"GET",
"summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",
"type":"int",
"type": "long",
"nickname":"upgrade_sstables",
"produces":[
"application/json"
@@ -800,7 +800,7 @@
"summary":"Return an array with the ids of the currently active repairs",
"type":"array",
"items":{
"type":"int"
"type": "long"
},
"nickname":"get_active_repair_async",
"produces":[
@@ -816,7 +816,7 @@
{
"method":"POST",
"summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",
"type":"int",
"type": "long",
"nickname":"repair_async",
"produces":[
"application/json"
@@ -947,7 +947,7 @@
"description":"The repair ID to check for status",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1277,18 +1277,18 @@
},
{
"name":"dynamic_update_interval",
"description":"integer, in ms (default 100)",
"description":"interval in ms (default 100)",
"required":false,
"allowMultiple":false,
"type":"integer",
"type":"long",
"paramType":"query"
},
{
"name":"dynamic_reset_interval",
"description":"integer, in ms (default 600,000)",
"description":"interval in ms (default 600,000)",
"required":false,
"allowMultiple":false,
"type":"integer",
"type":"long",
"paramType":"query"
},
{
@@ -1493,7 +1493,7 @@
"description":"Stream throughput",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1501,7 +1501,7 @@
{
"method":"GET",
"summary":"Get stream throughput mb per sec",
"type":"int",
"type": "long",
"nickname":"get_stream_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1517,7 +1517,7 @@
{
"method":"GET",
"summary":"get compaction throughput mb per sec",
"type":"int",
"type": "long",
"nickname":"get_compaction_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1539,7 +1539,7 @@
"description":"compaction throughput",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1943,7 +1943,7 @@
{
"method":"GET",
"summary":"Returns the threshold for warning of queries with many tombstones",
"type":"int",
"type": "long",
"nickname":"get_tombstone_warn_threshold",
"produces":[
"application/json"
@@ -1965,7 +1965,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1978,7 +1978,7 @@
{
"method":"GET",
"summary":"",
"type":"int",
"type": "long",
"nickname":"get_tombstone_failure_threshold",
"produces":[
"application/json"
@@ -2000,7 +2000,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2013,7 +2013,7 @@
{
"method":"GET",
"summary":"Returns the threshold for rejecting queries due to a large batch size",
"type":"int",
"type": "long",
"nickname":"get_batch_size_failure_threshold",
"produces":[
"application/json"
@@ -2035,7 +2035,7 @@
"description":"batch size debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2059,7 +2059,7 @@
"description":"throttle in kb",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2072,7 +2072,7 @@
{
"method":"GET",
"summary":"Get load",
"type":"int",
"type": "long",
"nickname":"get_metrics_load",
"produces":[
"application/json"
@@ -2088,7 +2088,7 @@
{
"method":"GET",
"summary":"Get exceptions",
"type":"int",
"type": "long",
"nickname":"get_exceptions",
"produces":[
"application/json"
@@ -2104,7 +2104,7 @@
{
"method":"GET",
"summary":"Get total hints in progress",
"type":"int",
"type": "long",
"nickname":"get_total_hints_in_progress",
"produces":[
"application/json"
@@ -2120,7 +2120,7 @@
{
"method":"GET",
"summary":"Get total hints",
"type":"int",
"type": "long",
"nickname":"get_total_hints",
"produces":[
"application/json"
@@ -2164,7 +2164,42 @@
]
}
]
}
},
{
"path":"/storage_service/sstable_info",
"operations":[
{
"method":"GET",
"summary":"SSTable information",
"type":"array",
"items":{
"type":"table_sstables"
},
"nickname":"sstable_info",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"cf",
"description":"column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
],
"models":{
"mapper":{
@@ -2228,11 +2263,11 @@
"description":"The column family"
},
"total":{
"type":"int",
"type":"long",
"description":"The total snapshot size"
},
"live":{
"type":"int",
"type":"long",
"description":"The live snapshot size"
}
}
@@ -2324,6 +2359,92 @@
"description":"The endpoint details"
}
}
},
"named_maps":{
"id":"named_maps",
"properties":{
"group":{
"type":"string"
},
"attributes":{
"type":"array",
"items":{
"type":"mapper"
}
}
}
},
"sstable":{
"id":"sstable",
"properties":{
"size":{
"type":"long",
"description":"Total size in bytes of sstable"
},
"data_size":{
"type":"long",
"description":"The size in bytes on disk of data"
},
"index_size":{
"type":"long",
"description":"The size in bytes on disk of index"
},
"filter_size":{
"type":"long",
"description":"The size in bytes on disk of filter"
},
"timestamp":{
"type":"datetime",
"description":"File creation time"
},
"generation":{
"type":"long",
"description":"SSTable generation"
},
"level":{
"type":"long",
"description":"SSTable level"
},
"version":{
"type":"string",
"enum":[
"ka", "la", "mc"
],
"description":"SSTable version"
},
"properties":{
"type":"array",
"description":"SSTable attributes",
"items":{
"type":"mapper"
}
},
"extended_properties":{
"type":"array",
"description":"SSTable extended attributes",
"items":{
"type":"named_maps"
}
}
}
},
"table_sstables":{
"id":"table_sstables",
"description":"Per-table SSTable info and attributes",
"properties":{
"keyspace":{
"type":"string"
},
"table":{
"type":"string"
},
"sstables":{
"type":"array",
"items":{
"$ref":"sstable"
}
}
}
}
}
}


@@ -32,7 +32,7 @@
{
"method":"GET",
"summary":"Get number of active outbound streams",
"type":"int",
"type": "long",
"nickname":"get_all_active_streams_outbound",
"produces":[
"application/json"
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"Get total incoming bytes",
"type":"int",
"type": "long",
"nickname":"get_total_incoming_bytes",
"produces":[
"application/json"
@@ -72,7 +72,7 @@
{
"method":"GET",
"summary":"Get all total incoming bytes",
"type":"int",
"type": "long",
"nickname":"get_all_total_incoming_bytes",
"produces":[
"application/json"
@@ -88,7 +88,7 @@
{
"method":"GET",
"summary":"Get total outgoing bytes",
"type":"int",
"type": "long",
"nickname":"get_total_outgoing_bytes",
"produces":[
"application/json"
@@ -112,7 +112,7 @@
{
"method":"GET",
"summary":"Get all total outgoing bytes",
"type":"int",
"type": "long",
"nickname":"get_all_total_outgoing_bytes",
"produces":[
"application/json"
@@ -154,7 +154,7 @@
"description":"The peer"
},
"session_index":{
"type":"int",
"type": "long",
"description":"The session index"
},
"connecting":{
@@ -211,7 +211,7 @@
"description":"The ID"
},
"files":{
"type":"int",
"type": "long",
"description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."
},
"total_size":{
@@ -242,7 +242,7 @@
"description":"The peer address"
},
"session_index":{
"type":"int",
"type": "long",
"description":"The session index"
},
"file_name":{


@@ -52,6 +52,21 @@
}
]
},
{
"path":"/system/uptime_ms",
"operations":[
{
"method":"GET",
"summary":"Get system uptime, in milliseconds",
"type":"long",
"nickname":"get_system_uptime",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/system/logger/{name}",
"operations":[


@@ -20,9 +20,9 @@
*/
#include "api.hh"
#include "http/file_handler.hh"
#include "http/transformers.hh"
#include "http/api_docs.hh"
#include <seastar/http/file_handler.hh>
#include <seastar/http/transformers.hh>
#include <seastar/http/api_docs.hh>
#include "storage_service.hh"
#include "commitlog.hh"
#include "gossiper.hh"
@@ -36,11 +36,13 @@
#include "endpoint_snitch.hh"
#include "compaction_manager.hh"
#include "hinted_handoff.hh"
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "stream_manager.hh"
#include "system.hh"
#include "api/config.hh"
logging::logger apilog("api");
namespace api {
static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {


@@ -21,13 +21,15 @@
#pragma once
#include "json/json_elements.hh"
#include <seastar/json/json_elements.hh>
#include <type_traits>
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include <boost/units/detail/utility.hpp>
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "api_init.hh"
#include "seastarx.hh"
@@ -216,4 +218,42 @@ std::vector<T> concat(std::vector<T> a, std::vector<T>&& b) {
return a;
}
template <class T, class Base = T>
class req_param {
public:
sstring name;
sstring param;
T value;
req_param(const request& req, sstring name, T default_val) : name(name) {
param = req.get_query_param(name);
if (param.empty()) {
value = default_val;
return;
}
try {
// boost::lexical_cast does not use boolalpha. Converting a
// true/false throws exceptions. We don't want that.
if constexpr (std::is_same_v<Base, bool>) {
// Cannot use boolalpha because we (probably) want to
// accept 1 and 0 as well as true and false. And True. And fAlse.
std::transform(param.begin(), param.end(), param.begin(), ::tolower);
if (param == "true" || param == "1") {
value = T(true);
} else if (param == "false" || param == "0") {
value = T(false);
} else {
throw boost::bad_lexical_cast{};
}
} else {
value = T{boost::lexical_cast<Base>(param)};
}
} catch (boost::bad_lexical_cast&) {
throw bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));
}
}
operator T() const { return value; }
};
}
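
The boolean branch of `req_param` above can be tried outside Seastar with a small stand-alone sketch. The helper name `parse_bool_param` is hypothetical, and it returns `std::nullopt` where the real class throws `bad_param_exception`. It only mirrors the case-insensitive matching of `true`/`false`/`1`/`0`. Handling of an empty parameter (the default value) is left to the caller.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper, not part of the patch: parse a boolean query
// parameter the way the bool branch of req_param does.
// Returns std::nullopt where req_param would throw bad_param_exception.
std::optional<bool> parse_bool_param(std::string param) {
    // Lower-case a private copy so "True", "fAlse" and similar spellings match.
    std::transform(param.begin(), param.end(), param.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (param == "true" || param == "1") {
        return true;
    }
    if (param == "false" || param == "0") {
        return false;
    }
    return std::nullopt;
}
```

Passing the character through `unsigned char` before `std::tolower` avoids undefined behaviour for negative `char` values. The patch itself calls `::tolower` directly.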


@@ -19,9 +19,11 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "database.hh"
#include "database_fwd.hh"
#include "service/storage_proxy.hh"
#include "http/httpd.hh"
#include <seastar/http/httpd.hh>
namespace service { class load_meter; }
namespace api {
@@ -31,9 +33,11 @@ struct http_context {
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
service::load_meter& lmeter;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
distributed<service::storage_proxy>& _sp,
service::load_meter& _lm)
: db(_db), sp(_sp), lmeter(_lm) {
}
};


@@ -21,8 +21,8 @@
#include "collectd.hh"
#include "api/api-doc/collectd.json.hh"
#include "core/scollectd.hh"
#include "core/scollectd_api.hh"
#include <seastar/core/scollectd.hh>
#include <seastar/core/scollectd_api.hh>
#include "endian.h"
#include <boost/range/irange.hpp>
#include <regex>


@@ -22,10 +22,14 @@
#include "column_family.hh"
#include "api/api-doc/column_family.json.hh"
#include <vector>
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "sstables/sstables.hh"
#include "utils/estimated_histogram.hh"
#include <algorithm>
#include "db/system_keyspace_view_types.hh"
#include "db/data_listeners.hh"
extern logging::logger apilog;
namespace api {
using namespace httpd;
@@ -34,7 +38,7 @@ using namespace std;
using namespace json;
namespace cf = httpd::column_family_json;
const utils::UUID& get_uuid(const sstring& name, const database& db) {
std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
auto pos = name.find("%3A");
size_t end;
if (pos == sstring::npos) {
@@ -46,14 +50,22 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {
} else {
end = pos + 3;
}
return std::make_tuple(name.substr(0, pos), name.substr(end));
}
const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database& db) {
try {
return db.find_uuid(name.substr(0, pos), name.substr(end));
return db.find_uuid(ks, cf);
} catch (std::out_of_range& e) {
throw bad_param_exception("Column family '" + name.substr(0, pos) + ":"
+ name.substr(end) + "' not found");
throw bad_param_exception(format("Column family '{}:{}' not found", ks, cf));
}
}
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
return get_uuid(ks, cf, db);
}
future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(column_family&)> f) {
auto uuid = get_uuid(name, ctx.db.local());
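
As a rough illustration of what `parse_fully_qualified_cf_name` does with a `keyspace:table` path parameter, here is a stand-alone sketch using `std::string` instead of `sstring`. The function name `split_cf_name` is hypothetical. The lines between the two hunks are not shown in this diff, so the fallback to a plain `:` and the exception on a missing separator are assumptions.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Sketch only: split "keyspace:table" or "keyspace%3Atable" into its parts.
// The fallback to ':' and the error case are assumptions, since the diff
// elides those lines.
std::pair<std::string, std::string> split_cf_name(const std::string& name) {
    std::size_t pos = name.find("%3A");   // URL-encoded ':'
    std::size_t end;
    if (pos == std::string::npos) {
        pos = name.find(':');
        if (pos == std::string::npos) {
            throw std::invalid_argument("expected <keyspace>:<table>, got '" + name + "'");
        }
        end = pos + 1;                    // skip the single ':'
    } else {
        end = pos + 3;                    // skip the three characters of "%3A"
    }
    return {name.substr(0, pos), name.substr(end)};
}
```

Splitting the name once and passing the two parts on is what lets the new three-argument `get_uuid` overload and the `get_built_indexes` handler share the same parsing.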
@@ -63,28 +75,28 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f) {
int64_t column_family_stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f) {
int64_t column_family_stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, f](database& db) {
// Histograms information is sample of the actual load
@@ -100,14 +112,14 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
@@ -118,7 +130,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
@@ -134,7 +146,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
@@ -145,7 +157,7 @@ static future<json::json_return_type> get_cf_rate_and_histogram(http_context& c
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db) {
utils::rate_moving_average_and_histogram res;
for (auto i : db.get_column_families()) {
@@ -166,27 +178,27 @@ static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ct
}, std::plus<int64_t>());
}
static int64_t min_row_size(column_family& cf) {
static int64_t min_partition_size(column_family& cf) {
int64_t res = INT64_MAX;
for (auto i: *cf.get_sstables() ) {
res = std::min(res, i->get_stats_metadata().estimated_row_size.min());
res = std::min(res, i->get_stats_metadata().estimated_partition_size.min());
}
return (res == INT64_MAX) ? 0 : res;
}
static int64_t max_row_size(column_family& cf) {
static int64_t max_partition_size(column_family& cf) {
int64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res = std::max(i->get_stats_metadata().estimated_row_size.max(), res);
res = std::max(i->get_stats_metadata().estimated_partition_size.max(), res);
}
return res;
}
static integral_ratio_holder mean_row_size(column_family& cf) {
static integral_ratio_holder mean_partition_size(column_family& cf) {
integral_ratio_holder res;
for (auto i: *cf.get_sstables() ) {
auto c = i->get_stats_metadata().estimated_row_size.count();
res.sub += i->get_stats_metadata().estimated_row_size.mean() * c;
auto c = i->get_stats_metadata().estimated_partition_size.count();
res.sub += i->get_stats_metadata().estimated_partition_size.mean() * c;
res.total += c;
}
return res;
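
The accumulation in `mean_partition_size` is a count-weighted mean whose numerator and denominator are kept apart, so per-shard results can be summed with `std::plus` before the division happens. A minimal sketch of that idea follows. `size_stats` and `ratio` are hypothetical stand-ins for the sstable statistics and for `integral_ratio_holder`, and the truncating `value()` is an assumption based on the "truncated as integrals" comment elsewhere in this file.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-sstable partition-size statistics.
struct size_stats {
    std::int64_t mean;   // mean partition size in this sstable
    std::int64_t count;  // number of partitions sampled
};

// Hypothetical stand-in for integral_ratio_holder: numerator and
// denominator stay separate so partial results can be added first.
struct ratio {
    std::int64_t sub = 0;
    std::int64_t total = 0;

    ratio operator+(const ratio& o) const {
        ratio r;
        r.sub = sub + o.sub;
        r.total = total + o.total;
        return r;
    }
    // Integer division truncates; an empty holder yields 0.
    std::int64_t value() const { return total == 0 ? 0 : sub / total; }
};

ratio accumulate_mean(const std::vector<size_stats>& sstables) {
    ratio res;
    for (const auto& s : sstables) {
        res.sub += s.mean * s.count;  // weight each mean by its sample count
        res.total += s.count;
    }
    return res;
}
```

Averaging the per-sstable means directly would give a small sstable the same weight as a large one. Carrying the sum and the count separately avoids that.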
@@ -242,12 +254,11 @@ class sum_ratio {
uint64_t _n = 0;
T _total = 0;
public:
future<> operator()(T value) {
void operator()(T value) {
if (value > 0) {
_total += value;
_n++;
}
return make_ready_future<>();
}
// Returns average value of all registered ratios.
T get() && {
@@ -396,29 +407,31 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::memtable_switch_count);
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::memtable_switch_count);
});
cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);
return get_cf_stats(ctx, &column_family_stats::memtable_switch_count);
});
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
utils::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_row_size);
res.merge(i->get_stats_metadata().estimated_partition_size);
}
return res;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
uint64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res += i->get_stats_metadata().estimated_row_size.count();
res += i->get_stats_metadata().estimated_partition_size.count();
}
return res;
},
@@ -443,67 +456,67 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::pending_flushes);
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::pending_flushes);
});
cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::pending_flushes);
return get_cf_stats(ctx, &column_family_stats::pending_flushes);
});
cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx,req->param["name"] ,&column_family::stats::reads);
return get_cf_stats_count(ctx,req->param["name"] ,&column_family_stats::reads);
});
cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family::stats::reads);
return get_cf_stats_count(ctx, &column_family_stats::reads);
});
cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, req->param["name"] ,&column_family::stats::writes);
return get_cf_stats_count(ctx, req->param["name"] ,&column_family_stats::writes);
});
cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family::stats::writes);
return get_cf_stats_count(ctx, &column_family_stats::writes);
});
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family_stats::reads);
});
cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family_stats::writes);
});
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
return get_cf_histogram(ctx, &column_family_stats::writes);
});
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::writes);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
return get_cf_histogram(ctx, &column_family_stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -519,11 +532,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_sstable_count);
return get_cf_stats(ctx, req->param["name"], &column_family_stats::live_sstable_count);
});
cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_sstable_count);
return get_cf_stats(ctx, &column_family_stats::live_sstable_count);
});
cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -546,30 +559,36 @@ void set_column_family(http_context& ctx, routes& r) {
return sum_sstable(ctx, true);
});
// FIXME: this refers to partitions, not rows.
cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_row_size, min_int64);
return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, INT64_MAX, min_row_size, min_int64);
return map_reduce_cf(ctx, INT64_MAX, min_partition_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_row_size, max_int64);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), max_row_size, max_int64);
return map_reduce_cf(ctx, int64_t(0), max_partition_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
});
// FIXME: this refers to partitions, not rows.
cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
});
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -776,25 +795,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_prepare;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_cas_propose.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_propose;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_cas_commit.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_commit;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -805,11 +824,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::tombstone_scanned);
});
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::live_scanned);
});
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {
@@ -827,13 +846,28 @@ void set_column_family(http_context& ctx, routes& r) {
return true;
});
cf::get_built_indexes.set(r, [](const_req) {
// FIXME
// Currently there are no index support
return std::vector<sstring>();
cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {
auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);
return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {
std::set<sstring> vp;
for (auto b : vb) {
if (b.view.first == ks) {
vp.insert(b.view.second);
}
}
std::vector<sstring> res;
auto uuid = get_uuid(ks, cf_name, ctx.db.local());
column_family& cf = ctx.db.local().find_column_family(uuid);
res.reserve(cf.get_index_manager().list_indexes().size());
for (auto&& i : cf.get_index_manager().list_indexes()) {
if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {
res.emplace_back(i.metadata().name());
}
}
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {
// FIXME
// Currently there are no information on the compression
@@ -920,5 +954,45 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(container_to_vec(res));
});
});
cf::toppartitions.set(r, [&ctx] (std::unique_ptr<request> req) {
auto name_param = req->param["name"];
auto [ks, cf] = parse_fully_qualified_cf_name(name_param);
api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};
api::req_param<unsigned> capacity(*req, "capacity", 256);
api::req_param<unsigned> list_size(*req, "list_size", 10);
apilog.info("toppartitions query: name={} duration={} list_size={} capacity={}",
name_param, duration.param, list_size.param, capacity.param);
return seastar::do_with(db::toppartitions_query(ctx.db, ks, cf, duration.value, list_size, capacity), [&ctx](auto& q) {
return q.scatter().then([&q] {
return sleep(q.duration()).then([&q] {
return q.gather(q.capacity()).then([&q] (auto topk_results) {
apilog.debug("toppartitions query: processing results");
cf::toppartitions_query_results results;
for (auto& d: topk_results.read.top(q.list_size())) {
cf::toppartitions_record r;
r.partition = sstring(d.item);
r.count = d.count;
r.error = d.error;
results.read.push(r);
}
for (auto& d: topk_results.write.top(q.list_size())) {
cf::toppartitions_record r;
r.partition = sstring(d.item);
r.count = d.count;
r.error = d.error;
results.write.push(r);
}
return make_ready_future<json::json_return_type>(results);
});
});
});
});
});
}
}


@@ -24,6 +24,7 @@
#include "api.hh"
#include "api/api-doc/column_family.json.hh"
#include "database.hh"
#include <seastar/core/future-util.hh>
#include <any>
namespace api {
@@ -38,14 +39,14 @@ template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = get_uuid(name, ctx.db.local());
using mapper_type = std::function<std::any (database&)>;
using reducer_type = std::function<std::any (std::any, std::any)>;
using mapper_type = std::function<std::unique_ptr<std::any>(database&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
return ctx.db.map_reduce0(mapper_type([mapper, uuid](database& db) {
return I(mapper(db.find_column_family(uuid)));
}), std::any(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
})).then([] (std::any r) {
return std::any_cast<I>(std::move(r));
return std::make_unique<std::any>(I(mapper(db.find_column_family(uuid))));
}), std::make_unique<std::any>(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
})).then([] (std::unique_ptr<std::any> r) {
return std::any_cast<I>(std::move(*r));
});
}
@@ -69,30 +70,32 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n
struct map_reduce_column_families_locally {
std::any init;
std::function<std::any (column_family&)> mapper;
std::function<std::any (std::any, std::any)> reducer;
std::any operator()(database& db) const {
auto res = init;
for (auto i : db.get_column_families()) {
res = reducer(res, mapper(*i.second.get()));
}
return res;
std::function<std::unique_ptr<std::any>(column_family&)> mapper;
std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;
future<std::unique_ptr<std::any>> operator()(database& db) const {
auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));
return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
*res = std::move(reducer(std::move(*res), mapper(*i.second.get())));
}).then([res] {
return std::move(*res);
});
}
};
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
using mapper_type = std::function<std::any (column_family&)>;
using reducer_type = std::function<std::any (std::any, std::any)>;
using mapper_type = std::function<std::unique_ptr<std::any>(column_family&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (column_family& cf) mutable {
return I(mapper(cf));
return std::make_unique<std::any>(I(mapper(cf)));
});
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
});
return ctx.db.map_reduce0(map_reduce_column_families_locally{init, std::move(wrapped_mapper), wrapped_reducer}, std::any(init), wrapped_reducer).then([] (std::any res) {
return std::any_cast<I>(std::move(res));
return ctx.db.map_reduce0(map_reduce_column_families_locally{init,
std::move(wrapped_mapper), wrapped_reducer}, std::make_unique<std::any>(init), wrapped_reducer).then([] (std::unique_ptr<std::any> res) {
return std::any_cast<I>(std::move(*res));
});
}
@@ -106,9 +109,9 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f);
int64_t column_family_stats::*f);
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f);
int64_t column_family_stats::*f);
}


@@ -22,15 +22,16 @@
#include "commitlog.hh"
#include <db/commitlog/commitlog.hh>
#include "api/api-doc/commitlog.json.hh"
#include "database.hh"
#include <vector>
namespace api {
template<typename Func>
static auto acquire_cl_metric(http_context& ctx, Func&& func) {
typedef std::result_of_t<Func(db::commitlog *)> ret_type;
template<typename T>
static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {
typedef T ret_type;
return ctx.db.map_reduce0([func = std::forward<Func>(func)](database& db) {
return ctx.db.map_reduce0([func = std::move(func)](database& db) {
if (db.commitlog() == nullptr) {
return make_ready_future<ret_type>();
}
@@ -63,15 +64,15 @@ void set_commitlog(http_context& ctx, routes& r) {
});
httpd::commitlog_json::get_completed_tasks.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));
});
httpd::commitlog_json::get_pending_tasks.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));
});
httpd::commitlog_json::get_total_commit_log_size.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));
});
}


@@ -24,6 +24,7 @@
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
#include <utility>
namespace api {
@@ -38,6 +39,16 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,
return make_ready_future<json::json_return_type>(res);
});
}
static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash> sum_pending_tasks(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>&& a,
const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& b) {
for (auto&& i : b) {
if (i.second) {
a[i.first] += i.second;
}
}
return std::move(a);
}
void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -47,8 +58,8 @@ void set_compaction_manager(http_context& ctx, routes& r) {
for (const auto& c : cm.get_compactions()) {
cm::summary s;
s.ks = c->ks;
s.cf = c->cf;
s.ks = c->ks_name;
s.cf = c->cf_name;
s.unit = "keys";
s.task_type = sstables::compaction_name(c->type);
s.completed = c->total_keys_written;
@@ -61,6 +72,32 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
});
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([&ctx](database& db) {
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
table& cf = *i.second.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);
return make_ready_future<>();
}).then([&tasks] {
return std::move(tasks);
});
});
}, std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), sum_pending_tasks).then(
[](const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& task_map) {
std::vector<cm::pending_compaction> res;
res.reserve(task_map.size());
for (auto i : task_map) {
cm::pending_compaction task;
task.ks = i.first.first;
task.cf = i.first.second;
task.task = i.second;
res.emplace_back(std::move(task));
}
return make_ready_future<json::json_return_type>(res);
});
});
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
@@ -103,29 +140,37 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {
return db::system_keyspace::get_compaction_history().then([] (std::vector<db::system_keyspace::compaction_history_entry> history) {
std::vector<cm::history> res;
res.reserve(history.size());
for (auto& entry : history) {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
res.push_back(std::move(h));
}
return make_ready_future<json::json_return_type>(res);
});
std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){
return s.write("[").then([&s, &first] {
return db::system_keyspace::get_compaction_history([&s, &first](const db::system_keyspace::compaction_history_entry& entry) mutable {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
auto fut = first ? make_ready_future<>() : s.write(", ");
first = false;
return fut.then([&s, h = std::move(h)] {
return formatter::write(s, h);
});
}).then([&s] {
return s.write("]").then([&s] {
return s.close();
});
});
});
});
};
return make_ready_future<json::json_return_type>(std::move(f));
});
cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {


@@ -22,6 +22,7 @@
#include "api/config.hh"
#include "api/api-doc/config.json.hh"
#include "db/config.hh"
#include "database.hh"
#include <sstream>
#include <boost/algorithm/string/replace.hpp>
@@ -43,14 +44,14 @@ json::json_return_type get_json_return_type(const db::seed_provider_type& val) {
return json::json_return_type(val.class_name);
}
std::string format_type(const std::string& type) {
std::string_view format_type(std::string_view type) {
if (type == "int") {
return "integer";
}
return type;
}
future<> get_config_swagger_entry(const std::string& name, const std::string& description, const std::string& type, bool& first, output_stream<char>& os) {
future<> get_config_swagger_entry(std::string_view name, const std::string& description, std::string_view type, bool& first, output_stream<char>& os) {
std::stringstream ss;
if (first) {
first=false;
@@ -87,23 +88,29 @@ future<> get_config_swagger_entry(const std::string& name, const std::string& de
}
namespace cs = httpd::config_json;
#define _get_config_value(name, type, deflt, status, desc, ...) if (id == #name) {return get_json_return_type(ctx.db.local().get_config().name());}
#define _get_config_description(name, type, deflt, status, desc, ...) f = f.then([&os, &first] {return get_config_swagger_entry(#name, desc, #type, first, os);});
void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx, routes& r) {
rb->register_function(r, [] (output_stream<char>& os) {
return do_with(true, [&os] (bool& first) {
rb->register_function(r, [&ctx] (output_stream<char>& os) {
return do_with(true, [&os, &ctx] (bool& first) {
auto f = make_ready_future();
_make_config_values(_get_config_description)
for (auto&& cfg_ref : ctx.db.local().get_config().values()) {
auto&& cfg = cfg_ref.get();
f = f.then([&os, &first, &cfg] {
return get_config_swagger_entry(cfg.name(), std::string(cfg.desc()), cfg.type_name(), first, os);
});
}
return f;
});
});
cs::find_config_id.set(r, [&ctx] (const_req r) {
auto id = r.param["id"];
_make_config_values(_get_config_value)
for (auto&& cfg_ref : ctx.db.local().get_config().values()) {
auto&& cfg = cfg_ref.get();
if (id == cfg.name()) {
return cfg.value_as_json();
}
}
throw bad_param_exception(sstring("No such config entry: ") + id);
});
}


@@ -23,7 +23,7 @@
#include "api/lsa.hh"
#include "api/api.hh"
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "utils/logalloc.hh"
#include "log.hh"


@@ -21,7 +21,7 @@
#include "messaging_service.hh"
#include "message/messaging_service.hh"
#include "rpc/rpc_types.hh"
#include <seastar/rpc/rpc_types.hh>
#include "api/api-doc/messaging_service.json.hh"
#include <iostream>
#include <sstream>
@@ -76,7 +76,7 @@ future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_server_connection_stats([&map, f] (const rpc::client_info& info, const rpc::stats& stats) mutable {
map[gms::inet_address(net::ipv4_address(info.addr))] = f(stats);
map[gms::inet_address(info.addr.addr())] = f(stats);
});
return map;
};
@@ -139,7 +139,7 @@ void set_messaging_service(http_context& ctx, routes& r) {
messaging_verb v = i; // for type safety we use messaging_verb values
auto idx = static_cast<uint32_t>(v);
if (idx >= map->size()) {
throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));
throw std::runtime_error(format("verb index out of bounds: {:d}, map size: {:d}", idx, map->size()));
}
if ((*map)[idx] > 0) {
c.count = (*map)[idx];


@@ -26,6 +26,7 @@
#include "service/storage_service.hh"
#include "db/config.hh"
#include "utils/histogram.hh"
#include "database.hh"
namespace api {
@@ -46,6 +47,10 @@ static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>&
});
}
httpd::utils_json::rate_moving_average_and_histogram get_empty_moving_average() {
return timer_to_json(utils::rate_moving_average_and_histogram());
}
static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
return make_ready_future<json::json_return_type>(val.count);
@@ -76,12 +81,9 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// hinted handoff is not supported currently,
// so we should return false
return make_ready_future<json::json_return_type>(false);
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req) {
auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();
return make_ready_future<json::json_return_type>(enabled);
});
sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
@@ -245,68 +247,40 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
});
sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);
});
sp::get_cas_read_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);
});
sp::get_cas_write_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);
});
sp::get_cas_write_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);
});
sp::get_cas_write_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);
});
sp::get_cas_write_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);
});
sp::get_cas_write_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);
});
sp::get_cas_read_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);
});
sp::get_cas_read_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);
});
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -376,6 +350,21 @@ void set_storage_proxy(http_context& ctx, routes& r) {
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::write);
});
sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);
});
sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);
});
sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// No View metrics are available, so just return empty moving average
return make_ready_future<json::json_return_type>(get_empty_moving_average());
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::read);


@@ -22,19 +22,27 @@
#include "storage_service.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/config.hh"
#include <optional>
#include <time.h>
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include <service/storage_service.hh>
#include <db/commitlog/commitlog.hh>
#include <gms/gossiper.hh>
#include <db/system_keyspace.hh>
#include "http/exception.hh"
#include "service/storage_service.hh"
#include "service/load_meter.hh"
#include "db/commitlog/commitlog.hh"
#include "gms/gossiper.hh"
#include "db/system_keyspace.hh"
#include "seastar/http/exception.hh"
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
#include "log.hh"
#include "release.hh"
#include "sstables/compaction_manager.hh"
#include "sstables/sstables.hh"
#include "database.hh"
#include "db/extensions.hh"
sstables::sstable::version_types get_highest_supported_format();
namespace api {
@@ -48,45 +56,55 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {
throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");
}
static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
std::vector<ss::token_range> res;
for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
r.endpoint_details.push(ed);
static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
res.push_back(r);
r.endpoint_details.push(ed);
}
return res;
return r;
}
void set_storage_service(http_context& ctx, routes& r) {
using ks_cf_func = std::function<future<json::json_return_type>(std::unique_ptr<request>, sstring, std::vector<sstring>)>;
auto wrap_ks_cf = [&ctx](ks_cf_func f) {
return [&ctx, f = std::move(f)](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto column_families = split_cf(req->get_query_param("cf"));
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return f(std::move(req), std::move(keyspace), std::move(column_families));
};
};
ss::local_hostid.set(r, [](std::unique_ptr<request> req) {
return db::system_keyspace::get_local_host_id().then([](const utils::UUID& id) {
return make_ready_future<json::json_return_type>(id.to_sstring());
});
});
ss::get_tokens.set(r, [] (const_req req) {
auto tokens = service::get_local_storage_service().get_token_metadata().sorted_tokens();
return container_to_vec(tokens);
ss::get_tokens.set(r, [] (std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().sorted_tokens(), [](const dht::token& i) {
return boost::lexical_cast<std::string>(i);
}));
});
ss::get_node_tokens.set(r, [] (const_req req) {
gms::inet_address addr(req.param["endpoint"]);
auto tokens = service::get_local_storage_service().get_token_metadata().get_tokens(addr);
return container_to_vec(tokens);
ss::get_node_tokens.set(r, [] (std::unique_ptr<request> req) {
gms::inet_address addr(req->param["endpoint"]);
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().get_tokens(addr), [](const dht::token& i) {
return boost::lexical_cast<std::string>(i);
}));
});
ss::get_commitlog.set(r, [&ctx](const_req req) {
@@ -107,11 +125,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_moving_nodes.set(r, [](const_req req) {
auto points = service::get_local_storage_service().get_token_metadata().get_moving_endpoints();
std::unordered_set<sstring> addr;
for (auto i: points) {
addr.insert(boost::lexical_cast<std::string>(i.second));
}
return container_to_vec(addr);
});
@@ -159,13 +173,13 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
ss::describe_any_ring.set(r, [&ctx](const_req req) {
return describe_ring("");
ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));
});
ss::describe_ring.set(r, [&ctx](const_req req) {
auto keyspace = validate_keyspace(ctx, req.param);
return describe_ring(keyspace);
ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));
});
ss::get_host_id_map.set(r, [](const_req req) {
@@ -175,11 +189,11 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
});
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.lmeter.get_load_map().then([] (auto&& load_map) {
std::vector<ss::map_string_double> res;
for (auto i : load_map) {
ss::map_string_double val;
@@ -237,6 +251,9 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_family.empty()) {
resp = service::get_local_storage_service().take_snapshot(tag, keynames);
} else {
if (keynames.empty()) {
throw httpd::bad_param_exception("The keyspace of column families must be specified");
}
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
@@ -287,38 +304,65 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
return service::get_local_storage_service().is_cleanup_allowed(keyspace).then([&ctx, keyspace,
column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {
if (!is_cleanup_allowed) {
return make_exception_future<json::json_return_type>(
std::runtime_error("Can not perform cleanup operation when topology changes"));
}
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
});
ss::scrub.set(r, wrap_ks_cf([&ctx](std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
// TODO: respect this
auto skip_corrupted = req->get_query_param("skip_corrupted");
auto f = make_ready_future<>();
if (!req_param<bool>(*req, "disable_snapshot", false)) {
auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());
f = parallel_for_each(column_families, [keyspace, tag](sstring cf) {
return service::get_local_storage_service().take_column_family_snapshot(keyspace, cf, tag);
});
}
return f.then([&ctx, keyspace, column_families] {
return ctx.db.invoke_on_all([=] (database& db) {
return do_for_each(column_families, [=, &db](sstring cfname) {
auto& cm = db.get_compaction_manager();
auto& cf = db.find_column_family(keyspace, cfname);
return cm.perform_sstable_scrub(&cf);
});
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
}));
ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
auto disable_snapshot = req->get_query_param("disable_snapshot");
auto skip_corrupted = req->get_query_param("skip_corrupted");
return make_ready_future<json::json_return_type>(json_void());
});
ss::upgrade_sstables.set(r, wrap_ks_cf([&ctx](std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
ss::upgrade_sstables.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
auto exclude_current_version = req->get_query_param("exclude_current_version");
return make_ready_future<json::json_return_type>(json_void());
});
return ctx.db.invoke_on_all([=] (database& db) {
return do_for_each(column_families, [=, &db](sstring cfname) {
auto& cm = db.get_compaction_manager();
auto& cf = db.find_column_family(keyspace, cfname);
return cm.perform_sstable_upgrade(&cf, exclude_current_version);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
}));
ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
@@ -456,7 +500,7 @@ void set_storage_service(http_context& ctx, routes& r) {
return service::get_storage_service().map_reduce(adder<service::storage_service::drain_progress>(), [] (auto& ss) {
return ss.get_drain_progress();
}).then([] (auto&& progress) {
auto progress_str = sprint("Drained %s/%s ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
auto progress_str = format("Drained {}/{} ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
return make_ready_future<json::json_return_type>(std::move(progress_str));
});
});
@@ -561,9 +605,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::join_ring.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().join_ring().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
return make_ready_future<json::json_return_type>(json_void());
});
ss::is_joined.set(r, [] (std::unique_ptr<request> req) {
@@ -667,7 +709,11 @@ void set_storage_service(http_context& ctx, routes& r) {
auto coordinator = std::hash<sstring>()(cf) % smp::count;
return service::get_storage_service().invoke_on(coordinator, [ks = std::move(ks), cf = std::move(cf)] (service::storage_service& s) {
return s.load_new_sstables(ks, cf);
}).then([] {
}).then_wrapped([] (auto&& f) {
if (f.failed()) {
auto msg = fmt::format("Failed to load new sstables: {}", f.get_exception());
return make_exception_future<json::json_return_type>(httpd::server_error_exception(msg));
}
return make_ready_future<json::json_return_type>(json_void());
});
});
@@ -701,7 +747,7 @@ void set_storage_service(http_context& ctx, routes& r) {
} catch (std::out_of_range& e) {
throw httpd::bad_param_exception(e.what());
} catch (std::invalid_argument&){
throw httpd::bad_param_exception(sprint("Bad format in a probability value: \"%s\"", probability.c_str()));
throw httpd::bad_param_exception(format("Bad format in a probability value: \"{}\"", probability.c_str()));
}
});
});
@@ -737,7 +783,7 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
} catch (...) {
throw httpd::bad_param_exception(sprint("Bad format value: "));
throw httpd::bad_param_exception(format("Bad format value: "));
}
});
@@ -819,7 +865,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
});
ss::get_exceptions.set(r, [](const_req req) {
@@ -861,6 +907,133 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));
});
});
ss::sstable_info.set(r, [&ctx] (std::unique_ptr<request> req) {
auto ks = api::req_param<sstring>(*req, "keyspace", {}).value;
auto cf = api::req_param<sstring>(*req, "cf", {}).value;
// The size of this vector is bounded by ks::cf, i.e. it is at most Nks + Ncf long,
// which is not small, but not huge either.
using table_sstables_list = std::vector<ss::table_sstables>;
return do_with(table_sstables_list{}, [ks, cf, &ctx](table_sstables_list& dst) {
return service::get_local_storage_service().db().map_reduce([&dst](table_sstables_list&& res) {
for (auto&& t : res) {
auto i = std::find_if(dst.begin(), dst.end(), [&t](const ss::table_sstables& t2) {
return t.keyspace() == t2.keyspace() && t.table() == t2.table();
});
if (i == dst.end()) {
dst.emplace_back(std::move(t));
continue;
}
auto& ssd = i->sstables;
for (auto&& sd : t.sstables._elements) {
auto j = std::find_if(ssd._elements.begin(), ssd._elements.end(), [&sd](const ss::sstable& s) {
return s.generation() == sd.generation();
});
if (j == ssd._elements.end()) {
i->sstables.push(std::move(sd));
}
}
}
}, [ks, cf](const database& db) {
// see above
table_sstables_list res;
auto& ext = db.get_config().extensions();
for (auto& t : db.get_column_families() | boost::adaptors::map_values) {
auto& schema = t->schema();
if ((ks.empty() || ks == schema->ks_name()) && (cf.empty() || cf == schema->cf_name())) {
// at most Nsstables long
ss::table_sstables tst;
tst.keyspace = schema->ks_name();
tst.table = schema->cf_name();
for (auto sstable : *t->get_sstables_including_compacted_undeleted()) {
auto ts = db_clock::to_time_t(sstable->data_file_write_time());
::tm t;
::gmtime_r(&ts, &t);
ss::sstable info;
info.timestamp = t;
info.generation = sstable->generation();
info.level = sstable->get_sstable_level();
info.size = sstable->bytes_on_disk();
info.data_size = sstable->ondisk_data_size();
info.index_size = sstable->index_size();
info.filter_size = sstable->filter_size();
info.version = sstable->get_version();
if (sstable->has_component(sstables::component_type::CompressionInfo)) {
auto& c = sstable->get_compression();
auto cp = sstables::get_sstable_compressor(c);
ss::named_maps nm;
nm.group = "compression_parameters";
for (auto& p : cp->options()) {
ss::mapper e;
e.key = p.first;
e.value = p.second;
nm.attributes.push(std::move(e));
}
if (!cp->options().count(compression_parameters::SSTABLE_COMPRESSION)) {
ss::mapper e;
e.key = compression_parameters::SSTABLE_COMPRESSION;
e.value = cp->name();
nm.attributes.push(std::move(e));
}
info.extended_properties.push(std::move(nm));
}
sstables::file_io_extension::attr_value_map map;
for (auto* ep : ext.sstable_file_io_extensions()) {
map.merge(ep->get_attributes(*sstable));
}
for (auto& p : map) {
struct {
const sstring& key;
ss::sstable& info;
void operator()(const std::map<sstring, sstring>& map) const {
ss::named_maps nm;
nm.group = key;
for (auto& p : map) {
ss::mapper e;
e.key = p.first;
e.value = p.second;
nm.attributes.push(std::move(e));
}
info.extended_properties.push(std::move(nm));
}
void operator()(const sstring& value) const {
ss::mapper e;
e.key = key;
e.value = value;
info.properties.push(std::move(e));
}
} v{p.first, info};
std::visit(v, p.second);
}
tst.sstables.push(std::move(info));
}
res.emplace_back(std::move(tst));
}
}
std::sort(res.begin(), res.end(), [](const ss::table_sstables& t1, const ss::table_sstables& t2) {
return t1.keyspace() < t2.keyspace() || (t1.keyspace() == t2.keyspace() && t1.table() < t2.table());
});
return res;
}).then([&dst] {
return make_ready_future<json::json_return_type>(stream_object(dst));
});
});
});
}
}


@@ -22,7 +22,7 @@
#include "api/api-doc/system.json.hh"
#include "api/api.hh"
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "log.hh"
namespace api {
@@ -30,6 +30,10 @@ namespace api {
namespace hs = httpd::system_json;
void set_system(http_context& ctx, routes& r) {
hs::get_system_uptime.set(r, [](const_req req) {
return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();
});
hs::get_all_logger_names.set(r, [](const_req req) {
return logging::logger_registry().get_all_logger_names();
});


@@ -21,6 +21,7 @@
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "counters.hh"
#include "types.hh"
/// LSA migrator for cells with irrelevant type
@@ -47,6 +48,23 @@ atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_typ
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value, collection_member cm)
{
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
@@ -56,6 +74,25 @@ atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_typ
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm)
{
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live_counter_update(api::timestamp_type timestamp, int64_t value) {
auto& imr_data = no_type_imr_descriptor();
return atomic_cell(
@@ -111,35 +148,6 @@ atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type,
{
}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
auto ti = data::type_info::make_collection();
data::cell::context ctx(f, ti);
auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
return collection_mutation_view { dv };
}
collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
return get_collection_mutation_view(_data.get());
}
collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)
: _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::operator collection_mutation_view() const
{
return get_collection_mutation_view(_data.get());
}
bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const
{
auto ptr_a = _data.get();
@@ -155,20 +163,20 @@ bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_c
if (a.timestamp() != b.timestamp()) {
return false;
}
if (a.is_live() != b.is_live()) {
return false;
}
if (a.is_live()) {
if (!b.is_live()) {
if (a.is_counter_update() != b.is_counter_update()) {
return false;
}
if (a.is_counter_update()) {
if (!b.is_counter_update()) {
return false;
}
return a.counter_update_value() == b.counter_update_value();
}
if (a.is_live_and_has_ttl() != b.is_live_and_has_ttl()) {
return false;
}
if (a.is_live_and_has_ttl()) {
if (!b.is_live_and_has_ttl()) {
return false;
}
if (a.ttl() != b.ttl() || a.expiry() != b.expiry()) {
return false;
}
@@ -187,19 +195,93 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
return 0;
}
auto ctx = data::cell::context(_data.get(), t.imr_state().type_info());
return data::cell::structure::serialized_object_size(_data.get(), ctx);
auto view = data::cell::structure::make_view(_data.get(), ctx);
auto flags = view.get<data::cell::tags::flags>();
size_t external_value_size = 0;
if (flags.get<data::cell::tags::external_data>()) {
if (flags.get<data::cell::tags::collection>()) {
external_value_size = as_collection_mutation().data.size_bytes();
} else {
auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
external_value_size = cell_view.value_size();
}
// Add overhead of chunk headers. The last one is a special case.
external_value_size += (external_value_size - 1) / data::cell::maximum_external_chunk_length * data::cell::external_chunk_overhead;
external_value_size += data::cell::external_last_chunk_overhead;
}
return data::cell::structure::serialized_object_size(_data.get(), ctx)
+ imr_object_type::size_overhead + external_value_size;
}
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection& c) {
if (!c._data.get()) {
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view& acv) {
if (acv.is_live()) {
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
acv.is_counter_update()
? "counter_update_value=" + to_sstring(acv.counter_update_value())
: to_hex(acv.value().linearize()),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell& ac) {
return os << atomic_cell_view(ac);
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {
auto& type = acvp._type;
auto& acv = acvp._cell;
if (acv.is_live()) {
std::ostringstream cell_value_string_builder;
if (type.is_counter()) {
if (acv.is_counter_update()) {
cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();
} else {
cell_value_string_builder << "shards: ";
counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {
cell_value_string_builder << ::join(", ", ccv.shards());
});
}
} else {
cell_value_string_builder << type.to_string(acv.value().linearize());
}
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
cell_value_string_builder.str(),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell::printer& acp) {
return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));
}
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {
if (!p._cell._data.get()) {
return os << "{ null atomic_cell_or_collection }";
}
using dc = data::cell;
os << "{ ";
if (dc::structure::get_member<dc::tags::flags>(c._data.get()).get<dc::tags::collection>()) {
os << "collection";
if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {
os << "collection ";
auto cmv = p._cell.as_collection_mutation();
os << collection_mutation_view::printer(*p._cdef.type, cmv);
} else {
os << "atomic cell";
os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));
}
return os << " @" << static_cast<const void*>(c._data.get()) << " }";
return os << " }";
}
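
The chunk-header arithmetic that `external_memory_usage()` gains in this file can be checked in isolation. The sketch below is only an illustration of that formula: the three constants are made-up small numbers (the real `data::cell` values are not shown in this diff), and `external_size_with_headers` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical values for illustration only; the real constants are
// data::cell::maximum_external_chunk_length, external_chunk_overhead and
// external_last_chunk_overhead, whose values are not shown in this diff.
constexpr std::size_t max_chunk_len = 8;
constexpr std::size_t chunk_overhead = 3;
constexpr std::size_t last_chunk_overhead = 2;

// Bytes used by an externally stored value of value_size bytes, headers included.
// Precondition: value_size > 0 (value_size - 1 would wrap around for 0).
constexpr std::size_t external_size_with_headers(std::size_t value_size) {
    // Every chunk except the last one is full and pays chunk_overhead.
    const std::size_t non_last_chunks = (value_size - 1) / max_chunk_len;
    return value_size + non_last_chunks * chunk_overhead + last_chunk_overhead;
}

static_assert(external_size_with_headers(1) == 3);   // one chunk: 1 + 2
static_assert(external_size_with_headers(8) == 10);  // one full chunk, still the last: 8 + 2
static_assert(external_size_with_headers(9) == 14);  // one full chunk plus a last chunk: 9 + 3 + 2
```

Subtracting one before dividing is what keeps a value of exactly `max_chunk_len` bytes from being charged for a non-existent extra chunk.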


@@ -26,13 +26,16 @@
#include "tombstone.hh"
#include "gc_clock.hh"
#include "utils/managed_bytes.hh"
#include "net/byteorder.hh"
#include <seastar/net/byteorder.hh>
#include <cstdint>
#include <iosfwd>
#include <seastar/util/gcc6-concepts.hh>
#include "data/cell.hh"
#include "data/schema_info.hh"
#include "imr/utils.hh"
#include "utils/fragmented_temporary_buffer.hh"
#include "serializer.hh"
class abstract_type;
class collection_type_impl;
@@ -150,6 +153,14 @@ public:
}
friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
class printer {
const abstract_type& _type;
const atomic_cell_view& _cell;
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {
@@ -186,6 +197,10 @@ public:
static atomic_cell make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,
collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,
collection_member cm = collection_member::no) {
return make_live(type, timestamp, bytes_view(value), cm);
@@ -193,6 +208,10 @@ public:
static atomic_cell make_live_counter_update(api::timestamp_type timestamp, int64_t value);
static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm = collection_member::no)
{
@@ -208,30 +227,12 @@ public:
static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);
friend class atomic_cell_or_collection;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
};
class collection_mutation_view;
// Represents a mutation of a collection. Actual format is determined by collection type,
// and is:
// set: list of atomic_cell
// map: list of pair<atomic_cell, bytes> (for key/value)
// list: tbd, probably ugly
class collection_mutation {
public:
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
collection_mutation() {}
collection_mutation(const collection_type_impl&, collection_mutation_view v);
collection_mutation(const collection_type_impl&, bytes_view bv);
operator collection_mutation_view() const;
};
class collection_mutation_view {
public:
atomic_cell_value_view data;
class printer : atomic_cell_view::printer {
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class column_definition;


@@ -24,6 +24,7 @@
// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh
#include "types.hh"
#include "types/collection.hh"
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "hashing.hh"
@@ -33,14 +34,12 @@ template<>
struct appending_hash<collection_mutation_view> {
template<typename Hasher>
void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);
auto m_view = ctype->deserialize_mutation_form(cell_bv);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
cell.with_deserialized(*cdef.type, [&] (collection_mutation_view_description m_view) {
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
});
}
};


@@ -22,6 +22,7 @@
#pragma once
#include "atomic_cell.hh"
#include "collection_mutation.hh"
#include "schema.hh"
#include "hashing.hh"
@@ -67,7 +68,19 @@ public:
bytes_view serialize() const;
bool equals(const abstract_type& type, const atomic_cell_or_collection& other) const;
size_t external_memory_usage(const abstract_type&) const;
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
class printer {
const column_definition& _cdef;
const atomic_cell_or_collection& _cell;
public:
printer(const column_definition& cdef, const atomic_cell_or_collection& cell)
: _cdef(cdef), _cell(cell) { }
printer(const printer&) = delete;
printer(printer&&) = delete;
friend std::ostream& operator<<(std::ostream&, const printer&);
};
friend std::ostream& operator<<(std::ostream&, const printer&);
};
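
The nested `printer` class above replaces a plain `operator<<` because printing a cell now needs its type information. A minimal stand-alone sketch of that pattern follows; `cell`, `type_desc` and `cell_printer` are hypothetical stand-ins, not Scylla types, and the output format is invented for the example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct cell { long raw; };                 // stand-in for atomic_cell_or_collection
struct type_desc { std::string unit; };    // stand-in for column_definition

// Pairs a value with the context needed to print it. It only holds
// references, so copy and move are deleted to discourage keeping one
// around longer than the objects it refers to.
class cell_printer {
    const type_desc& _type;
    const cell& _cell;
public:
    cell_printer(const type_desc& t, const cell& c) : _type(t), _cell(c) {}
    cell_printer(const cell_printer&) = delete;
    cell_printer(cell_printer&&) = delete;
    friend std::ostream& operator<<(std::ostream& os, const cell_printer& p) {
        return os << "cell{" << p._cell.raw << p._type.unit << "}";
    }
};
```

A caller streams a temporary, for example `os << cell_printer(t, c)`, so the wrapper never outlives the statement that creates it.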
namespace std {


@@ -72,19 +72,19 @@ public:
return make_ready_future<authenticated_user>(anonymous_user());
}
virtual future<> create(stdx::string_view, const authentication_options& options) const override {
virtual future<> create(std::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> alter(stdx::string_view, const authentication_options& options) const override {
virtual future<> alter(std::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> drop(stdx::string_view) const override {
virtual future<> drop(std::string_view) const override {
return make_ready_future();
}
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {
virtual future<custom_options> query_custom_options(std::string_view role_name) const override {
return make_ready_future<custom_options>();
}


@@ -23,7 +23,6 @@
#include "auth/authorizer.hh"
#include "exceptions/exceptions.hh"
#include "stdx.hh"
namespace cql3 {
class query_processor;
@@ -58,12 +57,12 @@ public:
return make_ready_future<permission_set>(permissions::ALL);
}
virtual future<> grant(stdx::string_view, permission_set, const resource&) const override {
virtual future<> grant(std::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke(stdx::string_view, permission_set, const resource&) const override {
virtual future<> revoke(std::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
@@ -74,7 +73,7 @@ public:
"LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(stdx::string_view) const override {
virtual future<> revoke_all(std::string_view) const override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}


@@ -45,7 +45,7 @@
namespace auth {
authenticated_user::authenticated_user(stdx::string_view name)
authenticated_user::authenticated_user(std::string_view name)
: name(sstring(name)) {
}


@@ -41,7 +41,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <functional>
#include <iosfwd>
#include <optional>
@@ -49,7 +49,6 @@
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include "stdx.hh"
namespace auth {
@@ -67,7 +66,7 @@ public:
/// An anonymous user.
///
authenticated_user() = default;
explicit authenticated_user(stdx::string_view name);
explicit authenticated_user(std::string_view name);
};
///


@@ -57,7 +57,7 @@ inline bool any_authentication_options(const authentication_options& aos) noexce
class unsupported_authentication_option : public std::invalid_argument {
public:
explicit unsupported_authentication_option(authentication_option k)
: std::invalid_argument(sprint("The %s option is not supported.", k)) {
: std::invalid_argument(format("The {} option is not supported.", k)) {
}
};


@@ -45,7 +45,6 @@
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "cql3/query_processor.hh"
#include "db/config.hh"
#include "utils/class_registrator.hh"
const sstring auth::authenticator::USERNAME_KEY("username");


@@ -41,7 +41,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <memory>
#include <set>
#include <stdexcept>
@@ -55,10 +55,10 @@
#include "auth/authentication_options.hh"
#include "auth/resource.hh"
#include "auth/sasl_challenge.hh"
#include "bytes.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
#include "stdx.hh"
namespace db {
class config;
@@ -122,7 +122,7 @@ public:
///
/// The options provided must be a subset of `supported_options()`.
///
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const = 0;
virtual future<> create(std::string_view role_name, const authentication_options& options) const = 0;
///
/// Alter the authentication record of an existing user.
@@ -131,39 +131,25 @@ public:
///
/// Callers must ensure that the specification of `alterable_options()` is adhered to.
///
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const = 0;
virtual future<> alter(std::string_view role_name, const authentication_options& options) const = 0;
///
/// Delete the authentication record for a user. This will disallow the user from logging in.
///
virtual future<> drop(stdx::string_view role_name) const = 0;
virtual future<> drop(std::string_view role_name) const = 0;
///
/// Query for custom options (those corresponding to \ref authentication_options::options).
///
/// If no options are set the result is an empty container.
///
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const = 0;
virtual future<custom_options> query_custom_options(std::string_view role_name) const = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.
///
virtual const resource_set& protected_resources() const = 0;
///
/// A stateful SASL challenge which supports many authentication schemes (depending on the implementation).
///
class sasl_challenge {
public:
virtual ~sasl_challenge() = default;
virtual bytes evaluate_response(bytes_view client_response) = 0;
virtual bool is_complete() const = 0;
virtual future<authenticated_user> get_authenticated_user() const = 0;
};
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};


@@ -41,7 +41,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <functional>
#include <optional>
#include <stdexcept>
@@ -54,7 +54,6 @@
#include "auth/permission.hh"
#include "auth/resource.hh"
#include "seastarx.hh"
#include "stdx.hh"
namespace auth {
@@ -117,14 +116,14 @@ public:
///
/// \throws \ref unsupported_authorization_operation if granting permissions is not supported.
///
virtual future<> grant(stdx::string_view role_name, permission_set, const resource&) const = 0;
virtual future<> grant(std::string_view role_name, permission_set, const resource&) const = 0;
///
/// Revoke a set of permissions from a role for a particular \ref resource.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke(stdx::string_view role_name, permission_set, const resource&) const = 0;
virtual future<> revoke(std::string_view role_name, permission_set, const resource&) const = 0;
///
/// Query for all directly granted permissions.
@@ -138,7 +137,7 @@ public:
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(stdx::string_view role_name) const = 0;
virtual future<> revoke_all(std::string_view role_name) const = 0;
///
/// Revoke all permissions granted to any role for a particular resource.


@@ -28,6 +28,7 @@
#include "database.hh"
#include "schema_builder.hh"
#include "service/migration_manager.hh"
#include "timeout_config.hh"
namespace auth {
@@ -47,9 +48,9 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f
struct empty_state { };
return delay_until_system_ready(as).then([&as, func = std::move(func)] () mutable {
return exponential_backoff_retry::do_until_value(1s, 1min, as, [func = std::move(func)] {
return func().then_wrapped([] (auto&& f) -> stdx::optional<empty_state> {
return func().then_wrapped([] (auto&& f) -> std::optional<empty_state> {
if (f.failed()) {
auth_log.info("Auth task failed with error, rescheduling: {}", f.get_exception());
auth_log.debug("Auth task failed with error, rescheduling: {}", f.get_exception());
return { };
}
return { empty_state() };
@@ -59,16 +60,14 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f
}
future<> create_metadata_table_if_missing(
stdx::string_view table_name,
std::string_view table_name,
cql3::query_processor& qp,
stdx::string_view cql,
std::string_view cql,
::service::migration_manager& mm) {
auto& db = qp.db().local();
if (db.has_schema(meta::AUTH_KS, sstring(table_name))) {
return make_ready_future<>();
}
static auto ignore_existing = [] (seastar::noncopyable_function<future<>()> func) {
return futurize_apply(std::move(func)).handle_exception_type([] (exceptions::already_exists_exception& ignored) { });
};
auto& db = qp.db();
auto parsed_statement = static_pointer_cast<cql3::statements::raw::cf_statement>(
cql3::query_processor::parse_statement(cql));
@@ -77,21 +76,36 @@ future<> create_metadata_table_if_missing(
auto statement = static_pointer_cast<cql3::statements::create_table_statement>(
parsed_statement->prepare(db, qp.get_cql_stats())->statement);
const auto schema = statement->get_cf_meta_data(qp.db().local());
const auto schema = statement->get_cf_meta_data(qp.db());
const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
schema_ptr table = b.build();
return ignore_existing([&mm, table = std::move(table)] () {
return mm.announce_new_column_family(table, false);
});
return mm.announce_new_column_family(b.build(), false);
}
future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db) {
future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db, seastar::abort_source& as) {
static const auto pause = [] { return sleep(std::chrono::milliseconds(500)); };
return do_until([&db] { return db.get_version() != database::empty_version; }, pause).then([&mm] {
return do_until([&mm] { return mm.have_schema_agreement(); }, pause);
return do_until([&db, &as] {
as.check();
return db.get_version() != database::empty_version;
}, pause).then([&mm, &as] {
return do_until([&mm, &as] {
as.check();
return mm.have_schema_agreement();
}, pause);
});
}
const timeout_config& internal_distributed_timeout_config() noexcept {
static const auto t = 5s;
static const timeout_config tc{t, t, t, t, t, t, t};
return tc;
}
}
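
The reworked `wait_for_schema_agreement` above calls `as.check()` before each evaluation of the polling predicate, so a pending abort ends the wait with an exception instead of looping until agreement. A minimal sketch of that pattern in plain standard C++, without Seastar: `abort_flag` and `poll_until` are hypothetical names invented for this illustration, and the sketch blocks the calling thread rather than returning a future.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for seastar::abort_source; not the Seastar type.
struct abort_flag {
    std::atomic<bool> requested{false};

    void request_abort() { requested.store(true); }

    // Plays the role of as.check() in the diff: throw once abort was requested.
    void check() const {
        if (requested.load()) {
            throw std::runtime_error("abort requested");
        }
    }
};

// Poll `done` until it returns true, sleeping `pause` between attempts.
// The abort flag is checked before every evaluation of the predicate.
// Returns the number of times the predicate was evaluated.
inline int poll_until(const std::function<bool()>& done, abort_flag& as,
                      std::chrono::milliseconds pause = std::chrono::milliseconds(1)) {
    int attempts = 0;
    for (;;) {
        as.check();
        ++attempts;
        if (done()) {
            return attempts;
        }
        std::this_thread::sleep_for(pause);
    }
}
```

Because the abort surfaces as an exception, whoever owns the task has to swallow it on shutdown, which is what the extra `handle_exception_type` for `abort_requested_exception` in the `stop()` methods of this change does.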

@@ -22,7 +22,7 @@
#pragma once
#include <chrono>
#include <experimental/string_view>
#include <string_view>
#include <seastar/core/future.hh>
#include <seastar/core/abort_source.hh>
@@ -38,6 +38,7 @@
using namespace std::chrono_literals;
class database;
class timeout_config;
namespace service {
class migration_manager;
@@ -75,11 +76,16 @@ inline future<> delay_until_system_ready(seastar::abort_source& as) {
future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func);
future<> create_metadata_table_if_missing(
stdx::string_view table_name,
std::string_view table_name,
cql3::query_processor&,
stdx::string_view cql,
std::string_view cql,
::service::migration_manager&);
future<> wait_for_schema_agreement(::service::migration_manager&, const database&);
future<> wait_for_schema_agreement(::service::migration_manager&, const database&, seastar::abort_source&);
///
/// Time-outs for internal, non-local CQL queries.
///
const timeout_config& internal_distributed_timeout_config() noexcept;
}

@@ -61,6 +61,7 @@ extern "C" {
#include "cql3/untyped_result_set.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "database.hh"
namespace auth {
@@ -94,11 +95,11 @@ default_authorizer::~default_authorizer() {
static const sstring legacy_table_name{"permissions"};
bool default_authorizer::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<bool> default_authorizer::any_granted() const {
static const sstring query = sprint("SELECT * FROM %s.%s LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);
static const sstring query = format("SELECT * FROM {}.{} LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);
return _qp.process(
query,
@@ -112,7 +113,7 @@ future<bool> default_authorizer::any_granted() const {
future<> default_authorizer::migrate_legacy_metadata() const {
alogger.info("Starting migration of legacy permissions metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);
return _qp.process(
query,
@@ -160,7 +161,7 @@ future<> default_authorizer::start() {
_migration_manager).then([this] {
_finished = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();
if (legacy_metadata_exists()) {
if (!any_granted().get0()) {
@@ -178,7 +179,7 @@ future<> default_authorizer::start() {
future<> default_authorizer::stop() {
_as.request_abort();
return _finished.handle_exception_type([](const sleep_aborted&) {});
return _finished.handle_exception_type([](const sleep_aborted&) {}).handle_exception_type([](const abort_requested_exception&) {});
}
future<permission_set>
@@ -187,8 +188,7 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc
return make_ready_future<permission_set>(permissions::NONE);
}
static const sstring query = sprint(
"SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?",
static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? AND {} = ?",
PERMISSIONS_NAME,
meta::AUTH_KS,
PERMISSIONS_CF,
@@ -210,13 +210,12 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc
future<>
default_authorizer::modify(
stdx::string_view role_name,
std::string_view role_name,
permission_set set,
const resource& resource,
stdx::string_view op) const {
std::string_view op) const {
return do_with(
sprint(
"UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
format("UPDATE {}.{} SET {} = {} {} ? WHERE {} = ? AND {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
PERMISSIONS_NAME,
@@ -228,23 +227,22 @@ default_authorizer::modify(
return _qp.process(
query,
db::consistency_level::ONE,
infinite_timeout_config,
internal_distributed_timeout_config(),
{permissions::to_strings(set), sstring(role_name), resource.name()}).discard_result();
});
}
future<> default_authorizer::grant(stdx::string_view role_name, permission_set set, const resource& resource) const {
future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "+");
}
future<> default_authorizer::revoke(stdx::string_view role_name, permission_set set, const resource& resource) const {
future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "-");
}
future<std::vector<permission_details>> default_authorizer::list_all() const {
static const sstring query = sprint(
"SELECT %s, %s, %s FROM %s.%s",
static const sstring query = format("SELECT {}, {}, {} FROM {}.{}",
ROLE_NAME,
RESOURCE_NAME,
PERMISSIONS_NAME,
@@ -254,7 +252,7 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {
return _qp.process(
query,
db::consistency_level::ONE,
infinite_timeout_config,
internal_distributed_timeout_config(),
{},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
std::vector<permission_details> all_details;
@@ -272,9 +270,8 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {
});
}
future<> default_authorizer::revoke_all(stdx::string_view role_name) const {
static const sstring query = sprint(
"DELETE FROM %s.%s WHERE %s = ?",
future<> default_authorizer::revoke_all(std::string_view role_name) const {
static const sstring query = format("DELETE FROM {}.{} WHERE {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME);
@@ -282,7 +279,7 @@ future<> default_authorizer::revoke_all(stdx::string_view role_name) const {
return _qp.process(
query,
db::consistency_level::ONE,
infinite_timeout_config,
internal_distributed_timeout_config(),
{sstring(role_name)}).discard_result().handle_exception([role_name](auto ep) {
try {
std::rethrow_exception(ep);
@@ -293,8 +290,7 @@ future<> default_authorizer::revoke_all(stdx::string_view role_name) const {
}
future<> default_authorizer::revoke_all(const resource& resource) const {
static const sstring query = sprint(
"SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",
ROLE_NAME,
meta::AUTH_KS,
PERMISSIONS_CF,
@@ -311,8 +307,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
res->begin(),
res->end(),
[this, res, resource](const cql3::untyped_result_set::row& r) {
static const sstring query = sprint(
"DELETE FROM %s.%s WHERE %s = ? AND %s = ?",
static const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,

@@ -77,13 +77,13 @@ public:
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;
virtual future<> grant(stdx::string_view, permission_set, const resource&) const override;
virtual future<> grant(std::string_view, permission_set, const resource&) const override;
virtual future<> revoke( stdx::string_view, permission_set, const resource&) const override;
virtual future<> revoke( std::string_view, permission_set, const resource&) const override;
virtual future<std::vector<permission_details>> list_all() const override;
virtual future<> revoke_all(stdx::string_view) const override;
virtual future<> revoke_all(std::string_view) const override;
virtual future<> revoke_all(const resource&) const override;
@@ -96,7 +96,7 @@ private:
future<> migrate_legacy_metadata() const;
future<> modify(stdx::string_view, permission_set, const resource&, stdx::string_view) const;
future<> modify(std::string_view, permission_set, const resource&, std::string_view) const;
};
} /* namespace auth */

@@ -41,25 +41,24 @@
#include "auth/password_authenticator.hh"
extern "C" {
#include <crypt.h>
#include <unistd.h>
}
#include <algorithm>
#include <chrono>
#include <random>
#include <string_view>
#include <optional>
#include <boost/algorithm/cxx11/all_of.hpp>
#include <seastar/core/reactor.hh>
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/passwords.hh"
#include "auth/roles-metadata.hh"
#include "cql3/untyped_result_set.hh"
#include "log.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
#include "database.hh"
namespace auth {
@@ -82,6 +81,8 @@ static const class_registrator<
cql3::query_processor&,
::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");
static thread_local auto rng_for_salt = std::default_random_engine(std::random_device{}());
password_authenticator::~password_authenticator() {
}
@@ -91,82 +92,11 @@ password_authenticator::password_authenticator(cql3::query_processor& qp, ::serv
, _stopped(make_ready_future<>()) {
}
// TODO: blowfish
// Origin uses Java bcrypt library, i.e. blowfish salt
// generation and hashing, which is arguably a "better"
// password hash than sha/md5 versions usually available in
// crypt_r. Otoh, glibc 2.7+ uses a modified sha512 algo
// which should be the same order of safe, so the only
// real issue should be salted hash compatibility with
// origin if importing system tables from there.
//
// Since bcrypt/blowfish is _not_ (afaict) available
// as a dev package/lib on most linux distros, we'd have to
// copy and compile for example OWL crypto
// (http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/)
// to be fully bit-compatible.
//
// Until we decide this is needed, let's just use crypt_r,
// and some old-fashioned random salt generation.
static constexpr size_t rand_bytes = 16;
static thread_local crypt_data tlcrypt = { 0, };
static sstring hashpw(const sstring& pass, const sstring& salt) {
auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);
if (res == nullptr) {
throw std::system_error(errno, std::system_category());
}
return res;
}
static bool checkpw(const sstring& pass, const sstring& salted_hash) {
auto tmp = hashpw(pass, salted_hash);
return tmp == salted_hash;
}
static sstring gensalt() {
static sstring prefix;
std::random_device rd;
std::default_random_engine e1(rd());
std::uniform_int_distribution<char> dist;
sstring valid_salt = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
sstring input(rand_bytes, 0);
for (char&c : input) {
c = valid_salt[dist(e1) % valid_salt.size()];
}
sstring salt;
if (!prefix.empty()) {
return prefix + input;
}
// Try in order:
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), &tlcrypt)) {
prefix = pfx;
return salt;
}
}
throw std::runtime_error("Could not initialize hashing algorithm");
}
static sstring hashpw(const sstring& pass) {
return hashpw(pass, gensalt());
}
static bool has_salted_hash(const cql3::untyped_result_set_row& row) {
return !row.get_or<sstring>(SALTED_HASH, "").empty();
}
static const sstring update_row_query = sprint(
"UPDATE %s SET %s = ? WHERE %s = ?",
static const sstring update_row_query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name(),
SALTED_HASH,
meta::roles_table::role_col_name);
@@ -174,17 +104,17 @@ static const sstring update_row_query = sprint(
static const sstring legacy_table_name{"credentials"};
bool password_authenticator::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<> password_authenticator::migrate_legacy_metadata() const {
plogger.info("Starting migration of legacy authentication metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);
return _qp.process(
query,
db::consistency_level::QUORUM,
infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
auto username = row.get_as<sstring>("username");
auto salted_hash = row.get_as<sstring>(SALTED_HASH);
@@ -192,7 +122,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {
return _qp.process(
update_row_query,
consistency_for_user(username),
infinite_timeout_config,
internal_distributed_timeout_config(),
{std::move(salted_hash), username}).discard_result();
}).finally([results] {});
}).then([] {
@@ -209,8 +139,8 @@ future<> password_authenticator::create_default_if_missing() const {
return _qp.process(
update_row_query,
db::consistency_level::QUORUM,
infinite_timeout_config,
{hashpw(DEFAULT_USER_PASSWORD), DEFAULT_USER_NAME}).then([](auto&&) {
internal_distributed_timeout_config(),
{passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt), DEFAULT_USER_NAME}).then([](auto&&) {
plogger.info("Created default superuser authentication record.");
});
}
@@ -221,8 +151,6 @@ future<> password_authenticator::create_default_if_missing() const {
future<> password_authenticator::start() {
return once_among_shards([this] {
gensalt(); // do this once to determine usable hashing
auto f = create_metadata_table_if_missing(
meta::roles_table::name,
_qp,
@@ -231,7 +159,7 @@ future<> password_authenticator::start() {
_stopped = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();
if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash).get0()) {
if (legacy_metadata_exists()) {
@@ -256,10 +184,10 @@ future<> password_authenticator::start() {
future<> password_authenticator::stop() {
_as.request_abort();
return _stopped.handle_exception_type([] (const sleep_aborted&) { });
return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});
}
db::consistency_level password_authenticator::consistency_for_user(stdx::string_view role_name) {
db::consistency_level password_authenticator::consistency_for_user(std::string_view role_name) {
if (role_name == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
}
@@ -285,10 +213,10 @@ authentication_option_set password_authenticator::alterable_options() const {
future<authenticated_user> password_authenticator::authenticate(
const credentials_map& credentials) const {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
throw exceptions::authentication_exception(format("Required key '{}' is missing", USERNAME_KEY));
}
if (!credentials.count(PASSWORD_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));
throw exceptions::authentication_exception(format("Required key '{}' is missing", PASSWORD_KEY));
}
auto& username = credentials.at(USERNAME_KEY);
@@ -300,8 +228,7 @@ future<authenticated_user> password_authenticator::authenticate(
// Rely on query processing caching statements instead, and lets assume
// that a map lookup string->statement is not gonna kill us much.
return futurize_apply([this, username, password] {
static const sstring query = sprint(
"SELECT %s FROM %s WHERE %s = ?",
static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",
SALTED_HASH,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
@@ -309,13 +236,17 @@ future<authenticated_user> password_authenticator::authenticate(
return _qp.process(
query,
consistency_for_user(username),
infinite_timeout_config,
internal_distributed_timeout_config(),
{username},
true);
}).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
auto salted_hash = std::optional<sstring>();
if (!res->empty()) {
salted_hash = res->one().get_opt<sstring>(SALTED_HASH);
}
if (!salted_hash || !passwords::check(password, *salted_hash)) {
throw exceptions::authentication_exception("Username and/or password are incorrect");
}
return make_ready_future<authenticated_user>(username);
@@ -323,13 +254,15 @@ future<authenticated_user> password_authenticator::authenticate(
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.what()));
} catch (exceptions::authentication_exception& e) {
std::throw_with_nested(e);
} catch (...) {
std::throw_with_nested(exceptions::authentication_exception("authentication failed"));
}
});
}
future<> password_authenticator::create(stdx::string_view role_name, const authentication_options& options) const {
future<> password_authenticator::create(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
}
@@ -337,17 +270,16 @@ future<> password_authenticator::create(stdx::string_view role_name, const authe
return _qp.process(
update_row_query,
consistency_for_user(role_name),
infinite_timeout_config,
{hashpw(*options.password), sstring(role_name)}).discard_result();
internal_distributed_timeout_config(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
}
future<> password_authenticator::alter(stdx::string_view role_name, const authentication_options& options) const {
future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
}
static const sstring query = sprint(
"UPDATE %s SET %s = ? WHERE %s = ?",
static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name(),
SALTED_HASH,
meta::roles_table::role_col_name);
@@ -355,21 +287,23 @@ future<> password_authenticator::alter(stdx::string_view role_name, const authen
return _qp.process(
query,
consistency_for_user(role_name),
infinite_timeout_config,
{hashpw(*options.password), sstring(role_name)}).discard_result();
internal_distributed_timeout_config(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
}
future<> password_authenticator::drop(stdx::string_view name) const {
static const sstring query = sprint(
"DELETE %s FROM %s WHERE %s = ?",
future<> password_authenticator::drop(std::string_view name) const {
static const sstring query = format("DELETE {} FROM {} WHERE {} = ?",
SALTED_HASH,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.process(query, consistency_for_user(name), infinite_timeout_config, {sstring(name)}).discard_result();
return _qp.process(
query, consistency_for_user(name),
internal_distributed_timeout_config(),
{sstring(name)}).discard_result();
}
future<custom_options> password_authenticator::query_custom_options(stdx::string_view role_name) const {
future<custom_options> password_authenticator::query_custom_options(std::string_view role_name) const {
return make_ready_future<custom_options>();
}
@@ -378,75 +312,13 @@ const resource_set& password_authenticator::protected_resources() const {
return resources;
}
::shared_ptr<authenticator::sasl_challenge> password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge : public sasl_challenge {
const password_authenticator& _self;
public:
plain_text_password_challenge(const password_authenticator& self) : _self(self) {
}
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
* sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).
* The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}
* authzId is optional, and in fact we don't care about it here as we'll
* set the authzId to match the authnId (that is, there is no concept of
* a user being authorized to act on behalf of another).
*
* @param bytes encoded credentials string sent by the client
* @return map containing the username/password pairs in the form an IAuthenticator
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response) override {
plogger.debug("Decoding credentials from client token");
sstring username, password;
auto b = client_response.crbegin();
auto e = client_response.crend();
auto i = b;
while (i != e) {
if (*i == 0) {
sstring tmp(i.base(), b.base());
if (password.empty()) {
password = std::move(tmp);
} else if (username.empty()) {
username = std::move(tmp);
}
b = ++i;
continue;
}
++i;
}
if (username.empty()) {
throw exceptions::authentication_exception("Authentication ID must not be null");
}
if (password.empty()) {
throw exceptions::authentication_exception("Password must not be null");
}
_credentials[USERNAME_KEY] = std::move(username);
_credentials[PASSWORD_KEY] = std::move(password);
_complete = true;
return {};
}
bool is_complete() const override {
return _complete;
}
future<authenticated_user> get_authenticated_user() const override {
return _self.authenticate(_credentials);
}
private:
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
::shared_ptr<sasl_challenge> password_authenticator::new_sasl_challenge() const {
return ::make_shared<plain_sasl_challenge>([this](std::string_view username, std::string_view password) {
credentials_map credentials{};
credentials[USERNAME_KEY] = sstring(username);
credentials[PASSWORD_KEY] = sstring(password);
return this->authenticate(credentials);
});
}
}
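
The removed `plain_text_password_challenge` comment above describes the SASL PLAIN layout: an optional authzId, then the authnId, then the password, separated by NUL bytes. The `plain_sasl_challenge` class that replaces it is not shown in this diff, so the following is only a sketch of decoding that layout in standard C++. `plain_credentials` and `decode_sasl_plain` are hypothetical names, and the sketch assumes there is no trailing NUL after the password, as in RFC 4616.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch.
struct plain_credentials {
    std::string authzid;   // optional authorization identity (may be empty)
    std::string username;  // authentication identity (authnId)
    std::string password;
};

// Decode "authzId<NUL>authnId<NUL>password".
// Returns std::nullopt when a separator is missing or when the
// username or password is empty.
inline std::optional<plain_credentials> decode_sasl_plain(std::string_view msg) {
    const std::size_t first = msg.find('\0');
    if (first == std::string_view::npos) {
        return std::nullopt;
    }
    const std::size_t second = msg.find('\0', first + 1);
    if (second == std::string_view::npos) {
        return std::nullopt;
    }
    plain_credentials c{
        std::string(msg.substr(0, first)),
        std::string(msg.substr(first + 1, second - first - 1)),
        std::string(msg.substr(second + 1))};
    if (c.username.empty() || c.password.empty()) {
        return std::nullopt;
    }
    return c;
}
```

The decoded username and password would then be handed to a callback, in the same way the new `new_sasl_challenge()` passes them to `authenticate()` through a lambda.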

@@ -61,7 +61,7 @@ class password_authenticator : public authenticator {
seastar::abort_source _as;
public:
static db::consistency_level consistency_for_user(stdx::string_view role_name);
static db::consistency_level consistency_for_user(std::string_view role_name);
password_authenticator(cql3::query_processor&, ::service::migration_manager&);
@@ -81,13 +81,13 @@ public:
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override;
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override;
virtual future<> create(std::string_view role_name, const authentication_options& options) const override;
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override;
virtual future<> alter(std::string_view role_name, const authentication_options& options) const override;
virtual future<> drop(stdx::string_view role_name) const override;
virtual future<> drop(std::string_view role_name) const override;
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override;
virtual future<custom_options> query_custom_options(std::string_view role_name) const override;
virtual const resource_set& protected_resources() const override;

auth/passwords.cc Normal file

@@ -0,0 +1,84 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/passwords.hh"
#include <cerrno>
#include <optional>
extern "C" {
#include <crypt.h>
#include <unistd.h>
}
namespace auth::passwords {
static thread_local crypt_data tlcrypt = { 0, };
namespace detail {
scheme identify_best_supported_scheme() {
const auto all_schemes = { scheme::bcrypt_y, scheme::bcrypt_a, scheme::sha_512, scheme::sha_256, scheme::md5 };
// "Random", for testing schemes.
const sstring random_part_of_salt = "aaaabbbbccccdddd";
for (scheme c : all_schemes) {
const sstring salt = sstring(prefix_for_scheme(c)) + random_part_of_salt;
const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
if (e && (e[0] != '*')) {
return c;
}
}
throw no_supported_schemes();
}
sstring hash_with_salt(const sstring& pass, const sstring& salt) {
auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);
if (!res || (res[0] == '*')) {
throw std::system_error(errno, std::system_category());
}
return res;
}
const char* prefix_for_scheme(scheme c) noexcept {
switch (c) {
case scheme::bcrypt_y: return "$2y$";
case scheme::bcrypt_a: return "$2a$";
case scheme::sha_512: return "$6$";
case scheme::sha_256: return "$5$";
case scheme::md5: return "$1$";
default: return nullptr;
}
}
} // namespace detail
no_supported_schemes::no_supported_schemes()
: std::runtime_error("No allowed hashing schemes are supported on this system") {
}
bool check(const sstring& pass, const sstring& salted_hash) {
return detail::hash_with_salt(pass, salted_hash) == salted_hash;
}
} // namespace auth::passwords
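
`identify_best_supported_scheme()` above tries each scheme prefix in order of preference and treats both a null pointer and an output starting with `'*'` as failure. The selection logic can be sketched without linking against libcrypt by passing the hashing call in as a callable. `probe_fn`, `is_valid_crypt_output`, and `best_supported_scheme` are hypothetical names for this illustration; in the real code the probe is a direct call to `crypt_r`.

```cpp
#include <functional>
#include <initializer_list>
#include <stdexcept>
#include <string>

enum class scheme { bcrypt_y, bcrypt_a, sha_512, sha_256, md5 };

inline const char* prefix_for_scheme(scheme s) noexcept {
    switch (s) {
    case scheme::bcrypt_y: return "$2y$";
    case scheme::bcrypt_a: return "$2a$";
    case scheme::sha_512: return "$6$";
    case scheme::sha_256: return "$5$";
    case scheme::md5: return "$1$";
    }
    return "";
}

// Failure is reported either as a null pointer or as a token that
// begins with '*', which is what the diff checks for.
inline bool is_valid_crypt_output(const char* out) noexcept {
    return out != nullptr && out[0] != '*';
}

// Hypothetical stand-in for hashing a fixed password with the given salt.
using probe_fn = std::function<const char*(const std::string& salt)>;

// Return the first scheme, in order of preference, that the probe accepts.
inline scheme best_supported_scheme(const probe_fn& probe) {
    for (scheme s : {scheme::bcrypt_y, scheme::bcrypt_a, scheme::sha_512,
                     scheme::sha_256, scheme::md5}) {
        const std::string salt = std::string(prefix_for_scheme(s)) + "aaaabbbbccccdddd";
        if (is_valid_crypt_output(probe(salt))) {
            return s;
        }
    }
    throw std::runtime_error("no supported hashing scheme");
}
```

Injecting the probe keeps the ordering and failure handling testable on machines whose C library supports a different set of schemes.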

auth/passwords.hh Normal file

@@ -0,0 +1,125 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <random>
#include <stdexcept>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace auth::passwords {
class no_supported_schemes : public std::runtime_error {
public:
no_supported_schemes();
};
///
/// Apache Cassandra uses a library to provide the bcrypt scheme. Many Linux implementations do not support bcrypt, so
/// we support alternatives. The cost is loss of direct compatibility with Apache Cassandra system tables.
///
enum class scheme {
bcrypt_y,
bcrypt_a,
sha_512,
sha_256,
md5
};
namespace detail {
template <typename RandomNumberEngine>
sstring generate_random_salt_bytes(RandomNumberEngine& g) {
static const sstring valid_bytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
static constexpr std::size_t num_bytes = 16;
std::uniform_int_distribution<std::size_t> dist(0, valid_bytes.size() - 1);
sstring result(num_bytes, 0);
for (char& c : result) {
c = valid_bytes[dist(g)];
}
return result;
}
///
/// Test each allowed hashing scheme and report the best supported one on the current system.
///
/// \throws \ref no_supported_schemes when none of the known schemes is supported.
///
scheme identify_best_supported_scheme();
const char* prefix_for_scheme(scheme) noexcept;
///
/// Generate an implementation-specific salt string for hashing passwords.
///
/// The `RandomNumberEngine` is used to generate the string, which has an implementation-specific length.
///
/// \throws \ref no_supported_schemes when no known hashing schemes are supported on the system.
///
template <typename RandomNumberEngine>
sstring generate_salt(RandomNumberEngine& g) {
static const scheme scheme = identify_best_supported_scheme();
static const sstring prefix = sstring(prefix_for_scheme(scheme));
return prefix + generate_random_salt_bytes(g);
}
///
/// Hash a password combined with an implementation-specific salt string.
///
/// \throws \ref std::system_error when an unexpected implementation-specific error occurs.
///
sstring hash_with_salt(const sstring& pass, const sstring& salt);
} // namespace detail
///
/// Run a one-way hashing function on cleartext to produce a salted hash.
///
/// Prior to applying the hashing function, a random salt is combined with the cleartext. The random salt bytes are generated
/// using the random number engine `g`.
///
/// The result contains the hashed password and also the salt used, in an implementation-specific format.
///
/// \throws \ref std::system_error when the underlying implementation fails to hash the cleartext.
///
template <typename RandomNumberEngine>
sstring hash(const sstring& pass, RandomNumberEngine& g) {
return detail::hash_with_salt(pass, detail::generate_salt(g));
}
///
/// Check that cleartext matches previously hashed cleartext with salt.
///
/// \ref salted_hash is the result of invoking \ref hash, which is the implementation-specific combination of the hashed
/// password and the salt that was generated for it.
///
/// \returns `true` if the cleartext matches the salted hash.
///
/// \throws \ref std::system_error when an unexpected implementation-specific error occurs.
///
bool check(const sstring& pass, const sstring& salted_hash);
} // namespace auth::passwords
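
`generate_random_salt_bytes` above draws indices from a `std::uniform_int_distribution<std::size_t>` bounded to the alphabet. The removed `gensalt()` instead drew a `char` and reduced it modulo the alphabet size, and `char` is not one of the integer types the standard allows for `std::uniform_int_distribution`. A standalone sketch of the new approach using `std::string` in place of `sstring`: `salt_alphabet` and `random_salt` are names chosen for this illustration.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <string_view>

// The 64 characters accepted in the random part of the salt.
inline constexpr std::string_view salt_alphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";

// Build a salt of `length` characters, each drawn uniformly from the alphabet.
// The engine is taken by reference so the caller controls seeding and reuse.
template <typename RandomNumberEngine>
std::string random_salt(RandomNumberEngine& g, std::size_t length = 16) {
    std::uniform_int_distribution<std::size_t> dist(0, salt_alphabet.size() - 1);
    std::string result(length, '\0');
    for (char& c : result) {
        c = salt_alphabet[dist(g)];
    }
    return result;
}
```

Taking the engine as a parameter is also what lets the caller in this change keep a single `thread_local` engine (`rng_for_salt`) instead of constructing a `std::random_device` on every call, and it lets tests pass a seeded engine for reproducible output.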

@@ -24,19 +24,9 @@
#include "auth/authorizer.hh"
#include "auth/common.hh"
#include "auth/service.hh"
#include "db/config.hh"
namespace auth {
permissions_cache_config permissions_cache_config::from_db_config(const db::config& dc) {
permissions_cache_config c;
c.max_entries = dc.permissions_cache_max_entries();
c.validity_period = std::chrono::milliseconds(dc.permissions_validity_in_ms());
c.update_period = std::chrono::milliseconds(dc.permissions_update_interval_in_ms());
return c;
}
permissions_cache::permissions_cache(const permissions_cache_config& c, service& ser, logging::logger& log)
: _cache(c.max_entries, c.validity_period, c.update_period, log, [&ser, &log](const key_type& k) {
log.debug("Refreshing permissions for {}", k.first);

@@ -22,7 +22,7 @@
#pragma once
#include <chrono>
#include <experimental/string_view>
#include <string_view>
#include <functional>
#include <iostream>
#include <optional>
@@ -37,7 +37,6 @@
#include "auth/resource.hh"
#include "auth/role_or_anonymous.hh"
#include "log.hh"
#include "stdx.hh"
#include "utils/hash.hh"
#include "utils/loading_cache.hh"
@@ -59,8 +58,6 @@ namespace auth {
class service;
struct permissions_cache_config final {
static permissions_cache_config from_db_config(const db::config&);
std::size_t max_entries;
std::chrono::milliseconds validity_period;
std::chrono::milliseconds update_period;

@@ -61,7 +61,7 @@ std::ostream& operator<<(std::ostream& os, resource_kind kind) {
return os;
}
static const std::unordered_map<resource_kind, stdx::string_view> roots{
static const std::unordered_map<resource_kind, std::string_view> roots{
{resource_kind::data, "data"},
{resource_kind::role, "roles"}};
@@ -101,24 +101,25 @@ static permission_set applicable_permissions(const role_resource_view& rv) {
 permission::DESCRIBE>();
 }
-resource::resource(resource_kind kind) : _kind(kind), _parts{sstring(roots.at(kind))} {
+resource::resource(resource_kind kind) : _kind(kind) {
+_parts.emplace_back(roots.at(kind));
 }
-resource::resource(resource_kind kind, std::vector<sstring> parts) : resource(kind) {
-_parts.reserve(parts.size() + 1);
+resource::resource(resource_kind kind, utils::small_vector<sstring, 3> parts) : resource(kind) {
 _parts.insert(_parts.end(), std::make_move_iterator(parts.begin()), std::make_move_iterator(parts.end()));
 }
-resource::resource(data_resource_t, stdx::string_view keyspace)
-: resource(resource_kind::data, std::vector<sstring>{sstring(keyspace)}) {
+resource::resource(data_resource_t, std::string_view keyspace) : resource(resource_kind::data) {
+_parts.emplace_back(keyspace);
 }
-resource::resource(data_resource_t, stdx::string_view keyspace, stdx::string_view table)
-: resource(resource_kind::data, std::vector<sstring>{sstring(keyspace), sstring(table)}) {
+resource::resource(data_resource_t, std::string_view keyspace, std::string_view table) : resource(resource_kind::data) {
+_parts.emplace_back(keyspace);
+_parts.emplace_back(table);
 }
-resource::resource(role_resource_t, stdx::string_view role)
-: resource(resource_kind::role, std::vector<sstring>{sstring(role)}) {
+resource::resource(role_resource_t, std::string_view role) : resource(resource_kind::role) {
+_parts.emplace_back(role);
 }
 sstring resource::name() const {
@@ -173,7 +174,7 @@ data_resource_view::data_resource_view(const resource& r) : _resource(r) {
 }
 }
-std::optional<stdx::string_view> data_resource_view::keyspace() const {
+std::optional<std::string_view> data_resource_view::keyspace() const {
 if (_resource._parts.size() == 1) {
 return {};
 }
@@ -181,7 +182,7 @@ std::optional<stdx::string_view> data_resource_view::keyspace() const {
 return _resource._parts[1];
 }
-std::optional<stdx::string_view> data_resource_view::table() const {
+std::optional<std::string_view> data_resource_view::table() const {
 if (_resource._parts.size() <= 2) {
 return {};
 }
@@ -210,7 +211,7 @@ role_resource_view::role_resource_view(const resource& r) : _resource(r) {
 }
 }
-std::optional<stdx::string_view> role_resource_view::role() const {
+std::optional<std::string_view> role_resource_view::role() const {
 if (_resource._parts.size() == 1) {
 return {};
 }
@@ -230,9 +231,9 @@ std::ostream& operator<<(std::ostream& os, const role_resource_view& v) {
 return os;
 }
-resource parse_resource(stdx::string_view name) {
-static const std::unordered_map<stdx::string_view, resource_kind> reverse_roots = [] {
-std::unordered_map<stdx::string_view, resource_kind> result;
+resource parse_resource(std::string_view name) {
+static const std::unordered_map<std::string_view, resource_kind> reverse_roots = [] {
+std::unordered_map<std::string_view, resource_kind> result;
 for (const auto& pair : roots) {
 result.emplace(pair.second, pair.first);
@@ -241,7 +242,7 @@ resource parse_resource(stdx::string_view name) {
 return result;
 }();
-std::vector<sstring> parts;
+utils::small_vector<sstring, 3> parts;
 boost::split(parts, name, [](char ch) { return ch == '/'; });
 if (parts.empty()) {
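
The hunks above follow one pattern: the single-argument constructor seeds `_parts` with the root name, and the other constructors delegate to it and append their own parts from `std::string_view` arguments. The sketch below reproduces that pattern with standard-library types only. It is an illustration under stated assumptions, not the project's code: `std::string` and `std::vector` stand in for `sstring` and `utils::small_vector`, a hand-written `split_name` stands in for `boost::split`, and `parse_resource_sketch` returns `std::optional` for an unknown root because the real function's error handling is not visible in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

enum class resource_kind { data, role };

static const std::unordered_map<resource_kind, std::string_view> roots{
    {resource_kind::data, "data"},
    {resource_kind::role, "roles"}};

// Stand-in for auth::resource; std::vector<std::string> replaces the
// small_vector<sstring, 3> used by the real class (assumption).
class resource {
    resource_kind _kind;
    std::vector<std::string> _parts;
public:
    // Base constructor: the root name is always the first part.
    explicit resource(resource_kind kind) : _kind(kind) {
        _parts.emplace_back(roots.at(kind));
    }
    // Delegates to the base constructor, then appends the remaining parts.
    resource(resource_kind kind, const std::vector<std::string_view>& parts) : resource(kind) {
        for (std::string_view p : parts) {
            _parts.emplace_back(p);
        }
    }
    resource_kind kind() const { return _kind; }
    const std::vector<std::string>& parts() const { return _parts; }
};

// Hypothetical helper: splits on '/' and returns views into `name`.
// Always yields at least one element (an empty name gives one empty part).
std::vector<std::string_view> split_name(std::string_view name) {
    std::vector<std::string_view> out;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = name.find('/', start);
        if (pos == std::string_view::npos) {
            out.push_back(name.substr(start));
            break;
        }
        out.push_back(name.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}

// Sketch of the lookup: map the first part back to a resource_kind.
std::optional<resource> parse_resource_sketch(std::string_view name) {
    static const std::unordered_map<std::string_view, resource_kind> reverse_roots = [] {
        std::unordered_map<std::string_view, resource_kind> result;
        for (const auto& pair : roots) {
            result.emplace(pair.second, pair.first);
        }
        return result;
    }();
    const std::vector<std::string_view> parts = split_name(name);
    assert(!parts.empty());
    const auto it = reverse_roots.find(parts.front());
    if (it == reverse_roots.end()) {
        return std::nullopt;  // unknown root (assumed behaviour for this sketch)
    }
    const std::vector<std::string_view> rest(parts.begin() + 1, parts.end());
    return resource(it->second, rest);
}
```

Keying `reverse_roots` by `std::string_view` is safe here because the views point at string literals with static storage duration; the parsed parts are copied into owning strings before the function returns.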

Some files were not shown because too many files have changed in this diff