Compare commits

..

706 Commits

Author SHA1 Message Date
Konstantin Osipov
fd293768e7 storage_proxy: do not touch all_replicas.front() if it's empty.
The list of all endpoints for a query can be empty if we have
replication_factor 0 or there are no live endpoints for this token.
Do not access all_replicas.front() in this case.

Fixes #5935.
Message-Id: <20200306192521.73486-2-kostja@scylladb.com>

(cherry picked from commit 9827efe554)
2020-06-22 18:29:15 +03:00
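The guard described in this commit can be sketched in isolation (a minimal stand-alone illustration with hypothetical names, not Scylla's actual storage_proxy code):

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: pick a coordinator from the replica list, but
// never touch front() when the list is empty (RF=0, or no live
// endpoints for this token).
std::optional<std::string> pick_coordinator(const std::vector<std::string>& all_replicas) {
    if (all_replicas.empty()) {
        return std::nullopt;  // calling front() here would be undefined behavior
    }
    return all_replicas.front();
}
```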
Gleb Natapov
22dfa48585 cql transport: do not log broken pipe error when a client closes its side of a connection abruptly
Fixes #5661

Message-Id: <20200615075958.GL335449@scylladb.com>
(cherry picked from commit 7ca937778d)
2020-06-21 13:09:22 +03:00
Benny Halevy
2f3d7f1408 cql3::util::maybe_quote: avoid stack overflow and fix quote doubling
The function was reimplemented to solve the following issues.
The custom implementation also improved its performance
by close to 19%.

Using regex_match("[a-z][a-z0-9_]*") may cause stack overflow on long input strings
as found with the limits_test.py:TestLimits.max_key_length_test dtest.

std::regex_replace does not replace in-place so no doubling of
quotes was actually done.

Add unit test that reproduces the crash without this fix
and tests various string patterns for correctness.

Note that defining the regex with std::regex::optimize
still ended up with stack overflow.

Fixes #5671

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0329fe1fd1)
2020-06-21 13:07:21 +03:00
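The reimplementation can be approximated without std::regex: a hand-rolled scan (no backtracking, so no stack-overflow risk on long inputs) plus actual quote doubling. This is an illustrative sketch of the approach, not the exact Scylla code:

```cpp
#include <string>

// Sketch of a regex-free maybe_quote: identifiers matching
// [a-z][a-z0-9_]* are returned as-is; anything else is wrapped in
// double quotes, with embedded quotes doubled.
std::string maybe_quote(const std::string& s) {
    bool plain = !s.empty() && s[0] >= 'a' && s[0] <= 'z';
    for (size_t i = 1; plain && i < s.size(); ++i) {
        char c = s[i];
        plain = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
    }
    if (plain) {
        return s;
    }
    std::string out = "\"";
    for (char c : s) {
        out += c;
        if (c == '"') {
            out += '"';  // double embedded quotes (the part std::regex_replace missed)
        }
    }
    out += '"';
    return out;
}
```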
Gleb Natapov
76a08df939 commitlog: fix size of a write used to zero a segment
Due to a bug the entire segment is written in one huge write of 32 MB.
The idea was to split it into writes of 128 KB, so fix it.

Fixes #5857

Message-Id: <20200220102939.30769-1-gleb@scylladb.com>
(cherry picked from commit df2f67626b)
2020-06-21 13:03:05 +03:00
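The intended chunking can be sketched as a small planning function (hypothetical shape, not the commitlog's real API): instead of one 32 MB write, the segment is zeroed in 128 KB pieces.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: plan the (offset, length) of each write used to zero a
// segment, in 128 KB chunks instead of one huge write.
std::vector<std::pair<size_t, size_t>> plan_zero_writes(size_t segment_size,
                                                        size_t chunk = 128 * 1024) {
    std::vector<std::pair<size_t, size_t>> writes;
    for (size_t off = 0; off < segment_size; off += chunk) {
        writes.emplace_back(off, std::min(chunk, segment_size - off));
    }
    return writes;
}
```

A 32 MB segment splits into 256 writes of 128 KB each.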
Amnon Heiman
6aa129d3b0 api/storage_service.cc: stream result of token_range
The response of the get token range API can become big, which can cause
large allocations and stalls.

This patch replaces the implementation so that it streams the results
using the HTTP streaming capabilities instead of serializing and sending
one big buffer.

Fixes #6297

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 7c4562d532)
2020-06-21 12:57:48 +03:00
Takuya ASADA
b4f781e4eb scylla_post_install.sh: fix operator precedence issue with multiple statements
In bash, 'A || B && C' is a problem: since && and || have the same
precedence, C is evaluated even when A is true.
To avoid the issue we need to make B && C a single statement.

Fixes #5764

(cherry picked from commit b6988112b4)
2020-06-21 12:47:05 +03:00
Takuya ASADA
27594ca50e scylla_raid_setup: create missing directories
We need to create hints, view_hints, saved_caches directories
on RAID volume.

Fixes #5811

(cherry picked from commit 086f0ffd5a)
2020-06-21 12:45:27 +03:00
Rafael Ávila de Espíndola
0f2f0d65d7 configure: Reduce the dynamic linker path size
gdb has a SO_NAME_MAX_PATH_SIZE of 512, so we use that as the path
size.

Fixes: #6494

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200528202741.398695-2-espindola@scylladb.com>
(cherry picked from commit aa778ec152)
2020-06-21 12:29:16 +03:00
Tomasz Grabiec
31c2f8a3ae row_cache: Fix undefined behavior on key linearization
This is relevant only when using partition or clustering keys which
have a representation in memory which is larger than 12.8 KB (10% of
LSA segment size).

There are several places in code (cache, background garbage
collection) which may need to linearize keys because of performing key
comparison, but it's not done safely:

 1) the code does not run with the LSA region locked, so pointers may
get invalidated on linearization if it needs to reclaim memory. This
is fixed by running the code inside an allocating section.

 2) LSA region is locked, but the scope of
with_linearized_managed_bytes() encloses the allocating section. If
allocating section needs to reclaim, linearization context will
contain invalidated pointers. The fix is to reorder the scopes so
that linearization context lives within an allocating section.

Example of 1 can be found in
range_populating_reader::handle_end_of_stream() where it performs a
lookup:

  auto prev = std::prev(it);
  if (prev->key().equal(*_cache._schema, *_last_key->_key)) {
     it->set_continuous(true);

but handle_end_of_stream() is not invoked under allocating section.

Example of 2 can be found in mutation_cleaner_impl::merge_some() where
it does:

  return with_linearized_managed_bytes([&] {
  ...
    return _worker_state->alloc_section(region, [&] {

Fixes #6637.
Refs #6108.

Tests:

  - unit (all)

Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e81fc1f095)
2020-06-21 11:58:59 +03:00
Yaron Kaikov
ec12331f11 release: prepare for 3.3.4 2020-06-15 21:19:02 +03:00
Avi Kivity
ccc463b5e5 tools: toolchain: regenerate for gnutls 3.6.14
CVE-2020-13777.

Fixes #6627.

Toolchain source image registry disambiguated due to tighter podman defaults.
2020-06-15 08:05:58 +03:00
Calle Wilund
4a9676f6b7 gms::inet_address: Fix sign extension error in custom address formatting
Fixes #5808

It seems some GCC versions will generate sign-extending code here. Mine
does not, but this should be more correct anyhow.

Added small stringify test to serialization_test for inet_address

(cherry picked from commit a14a28cdf4)
2020-06-09 20:16:50 +03:00
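The class of bug fixed above can be illustrated outside Scylla: when a raw address byte passes through a (possibly signed) char, formatting it as an integer can sign-extend 0xFF into -1. Extracting octets into uint8_t avoids it. A hypothetical sketch:

```cpp
#include <cstdint>
#include <string>

// Format a host-order IPv4 address as dotted decimal. Each octet is
// extracted into uint8_t, not char/int8_t: with a signed byte type,
// a compiler may sign-extend, printing 0xFF as -1 instead of 255.
std::string format_ipv4(uint32_t ip) {
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        uint8_t octet = (ip >> shift) & 0xff;  // unsigned: no sign extension
        out += std::to_string(octet);
        if (shift) {
            out += '.';
        }
    }
    return out;
}
```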
Takuya ASADA
aaf4989c31 aws: update enhanced networking supported instance list
Sync enhanced networking supported instance list to latest one.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Fixes #6540

(cherry picked from commit 969c4258cf)
2020-06-09 16:03:00 +03:00
Asias He
b29f954f20 gossip: Make is_safe_for_bootstrap more strict
Consider

1. Start n1, n2 in the cluster
2. Stop n2 and delete all data for n2
3. Start n2 to replace itself with replace_address_first_boot: n2
4. Kill n2 before n2 finishes the replace operation
5. Remove replace_address_first_boot: n2 from scylla.yaml of n2
6. Delete all data for n2
7. Start n2

At step 7, n2 will be allowed to bootstrap as a new node, because the
application state of n2 in the cluster is HIBERNATE which is not
rejected in the check of is_safe_for_bootstrap. As a result, n2 will
replace n2 with different tokens and a different host_id, as if the
old n2 node was removed from the cluster silently.

Fixes #5172

(cherry picked from commit cdcedf5eb9)
2020-05-25 14:30:53 +03:00
Eliran Sinvani
5546d5df7b Auth: return correct error code when role is not found
Scylla returns the wrong error code (0000 - server internal error)
in response to trying to do authentication/authorization operations
that involve a non-existing role.
This commit changes those cases to return error code 2200 (invalid
query) which is the correct one and also the one that Cassandra
returns.
Tests:
    Unit tests (Dev)
    All auth and auth_role dtests

(cherry picked from commit ce8cebe34801f0ef0e327a32f37442b513ffc214)

Fixes #6363.
2020-05-25 12:58:38 +03:00
Amnon Heiman
541c29677f storage_service: get_range_to_address_map prevent use after free
The implementation of get_range_to_address_map has a default behaviour:
when given an empty keyspace, it uses the first non-system keyspace
(where "first" is basically just an arbitrary keyspace).

The current implementation has two issues. First, it uses a reference to
a string that is held on the stack of another function. In other words,
there is a use-after-free; it is not clear why we never hit it.

Second, it calls get_non_system_keyspaces twice. Though this is not
a bug, it's redundant (get_non_system_keyspaces uses a loop, so calling
that function does have a cost).

This patch solves both issues. First, by changing the implementation to
hold a string instead of a reference to a string.

Second, it stores the results from get_non_system_keyspaces and reuses
them; this is more efficient and keeps the returned values on the local
stack.

Fixes #6465

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 69a46d4179)
2020-05-25 12:48:48 +03:00
Hagit Segev
06f18108c0 release: prepare for 3.3.3 2020-05-24 23:28:07 +03:00
Tomasz Grabiec
90002ca3d2 sstables: index_reader: Fix overflow when calculating promoted index end
When index file is larger than 4GB, offset calculation will overflow
uint32_t and _promoted_index_end will be too small.

As a result, promoted_index_size calculation will underflow and the
rest of the page will be interpreted as a promoted index.

The partitions which are in the remainder of the index page will not
be found by single-partition queries.

Data is not lost.

Introduced in 6c5f8e0eda.

Fixes #6040
Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com>

(cherry picked from commit a6c87a7b9e)
2020-05-24 09:46:11 +03:00
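The overflow pattern is easy to reproduce in isolation: accumulating file offsets in uint32_t wraps past 4 GB, and the subsequent size subtraction then underflows to a huge value. A hedged sketch with hypothetical names, not the sstable reader's real code:

```cpp
#include <cstdint>

// Computing the end offset of a promoted index range in an index file.
// Done in 32-bit arithmetic, positions past 4 GB wrap around, making
// the end offset too small and the later (end - start) size
// computation underflow. 64-bit arithmetic is correct for large files.
uint64_t promoted_index_end(uint64_t entry_position, uint64_t entry_size) {
    return entry_position + entry_size;
}
```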
Rafael Ávila de Espíndola
da23902311 repair: Make sure sinks are always closed
In a recent next failure I got the following backtrace

    function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101
    at ./seastar/include/seastar/core/shared_ptr.hh:463
    at repair/row_level.cc:2059

This patch changes a few functions to use finally to make sure the sink
is always closed.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200515202803.60020-1-espindola@scylladb.com>
(cherry picked from commit 311fbe2f0a)

Ref #6414
2020-05-20 09:00:57 +03:00
Asias He
2b0dc21f97 repair: Fix race between write_end_of_stream and apply_rows
Consider: n1, n2, n1 is the repair master, n2 is the repair follower.

=== Case 1 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
   is written to sstable, r2 is not written yet, r1 belongs to
   partition 1, r2 belongs to partition 2. It yields after row r1 is
   written.
   data: partition_start, r1
3) n1 sends repair_row_level_stop to n2 because error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream()
   data: partition_start, r1, partition_end
5) Step 2 resumes to apply the rows.
   data: partition_start, r1, partition_end, partition_end, partition_start, r2

=== Case 2 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
   is written to sstable, r2 is not written yet, r1 belongs to partition
   1, r2 belongs to partition 2. It yields after partition_start for r2
   is written but before _partition_opened is set to true.
   data: partition_start, r1, partition_end, partition_start
3) n1 sends repair_row_level_stop to n2 because error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream().
   Since _partition_opened[node_idx] is false, partition_end is skipped,
   end_of_stream is written.
   data: partition_start, r1, partition_end, partition_start, end_of_stream

This causes unbalanced partition_start and partition_end in the stream
written to sstables.

To fix, serialize the write_end_of_stream and apply_rows with a semaphore.

Fixes: #6394
Fixes: #6296
Fixes: #6414
(cherry picked from commit b2c4d9fdbc)
2020-05-20 08:22:05 +03:00
Piotr Dulikowski
b544691493 hinted handoff: don't keep positions of old hints in rps_set
When sending hints from one file, rps_set field in send_one_file_ctx
keeps track of commitlog positions of hints that are being currently
sent, or have failed to be sent. At the end of the operation, if sending
of some hints failed, we will choose position of the earliest hint that
failed to be sent, and will retry sending that file later, starting from
that position. This position is stored in _last_not_complete_rp.

Usually, this set has a bounded size, because we impose a limit of at
most 128 hints being sent concurrently. Because we do not attempt to
send any more hints after a failure is detected, rps_set should not have
more than 128 elements at a time.

Due to a bug, commitlog positions of old hints (older than
gc_grace_seconds of the destination table) were inserted into rps_set
but not removed after checking their age. This could cause rps_set to
grow very large when replaying a file with old hints.

Moreover, if the file mixed expired and non-expired hints (which could
happen if it had hints to two tables with different gc_grace_seconds),
and sending of some non-expired hints failed, then positions of expired
hints could influence the calculation of _last_not_complete_rp, and
more hints than necessary would be resent on the next retry.

This simple patch removes commitlog position of a hint from rps_set when
it is detected to be too old.

Fixes #6422

(cherry picked from commit 85d5c3d5ee)
2020-05-20 08:06:17 +03:00
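The bookkeeping rule behind this fix (and the related one for discarded hints) can be modeled with a plain std::set standing in for rps_set: a position is inserted when a send starts, kept only if the send failed, and erased when the hint is dropped. A toy sketch under those assumptions, not the hinted-handoff code itself:

```cpp
#include <cstdint>
#include <set>

enum class hint_outcome { sent_ok, send_failed, expired };

// Toy model of rps_set maintenance: only positions of hints that
// actually failed (and must be retried) may remain in the set.
// Expired (too-old) hints are dropped, not retried, so their
// positions are erased -- the bug was leaving them in, letting the
// set grow and skewing the retry position.
void track_hint(std::set<uint64_t>& rps_set, uint64_t rp, hint_outcome outcome) {
    rps_set.insert(rp);  // registered when the send is attempted
    if (outcome != hint_outcome::send_failed) {
        rps_set.erase(rp);  // success or expiry: no retry needed
    }
}
```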
Piotr Dulikowski
d420b06844 hinted handoff: remove discarded hint positions from rps_set
Related commit: 85d5c3d

When attempting to send a hint, an exception might occur that results in
that hint being discarded (e.g. keyspace or table of the hint was
removed).

When such an exception is thrown, position of the hint will already be
stored in rps_set. We are only allowed to retain positions of hints that
failed to be sent and needed to be retried later. Dropping a hint is not
an error, therefore its position should be removed from rps_set - but
current logic does not do that.

Because of that bug, hint files with many discardable hints might cause
rps_set to grow large when the file is replayed. Furthermore, leaving
positions of such hints in rps_set might cause more hints than necessary
to be re-sent if some non-discarded hints fail to be sent.

This commit fixes the problem by removing positions of discarded hints
from rps_set.

Fixes #6433

(cherry picked from commit 0c5ac0da98)
2020-05-20 08:04:10 +03:00
Avi Kivity
b3a2cb2f68 Update seastar submodule
* seastar 0ebd89a858...30f03aeba9 (1):
  > timer: add scheduling_group awareness

Fixes #6170.
2020-05-10 18:39:20 +03:00
Hagit Segev
c8c057f5f8 release: prepare for 3.3.2 2020-05-10 18:16:28 +03:00
Gleb Natapov
038bfc925c storage_proxy: limit read repair only to replicas that answered during speculative reads
The speculative reader has more targets than needed for CL. In case
there is a digest mismatch the repair runs between all of them, but
that violates the provided CL. The patch makes it so that repair runs
only between replicas that answered (there will be CL of them).

Fixes #6123

Reviewed-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402132245.GA21956@scylladb.com>
(cherry picked from commit 36a24bbb70)
2020-05-07 19:48:37 +03:00
Mike Goltsov
13a4e7db83 fix error in fstrim service (scylla_util.py)
On a CentOS 7 machine:

fstrim.timer is not enabled, only unmasked, by scylla_fstrim_setup on installation.
When trying to run the scylla-fstrim service manually, you get an error:

Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module>
    main()
  File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main
    cfg = parse_scylla_dirs_with_default(conf=args.config)
  File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default
    if key not in y or not y[k]:
NameError: name 'k' is not defined

It is caused by an error in scylla_util.py.

Fixes #6294.

(cherry picked from commit 068bb3a5bf)
2020-05-07 19:45:50 +03:00
Juliusz Stasiewicz
727d6cf8f3 atomic_cell: special rule for printing counter cells
Until now, attempts to print a counter update cell would end up
calling abort() because `atomic_cell_view::value()` has no
specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`,
i.e. the counter update IMR type. Such a visitor is not easy to write
if we want to intercept counters only (and not all int64_t values).

Anyway, a linearized byte representation of a counter cell would not
be helpful without knowing whether it consists of counter shards or
a counter update (delta) - and this must be known upon `deserialize`.

This commit introduces a simple approach: it determines the cell type
at a high level (from `atomic_cell_view`) and prints counter contents
via `counter_cell_view` or `atomic_cell_view::counter_update_value()`.

Fixes #5616

(cherry picked from commit 0ea17216fe)
2020-05-07 19:40:47 +03:00
Tomasz Grabiec
6d6d7b4abe sstables: Release reserved space for sharding metadata
The intention of the code was to clear sharding metadata
chunked_vector so that it doesn't bloat memory.

The type of c is `chunked_vector*`. Assigning `{}`
clears the pointer while the intended behavior was to reset the
`chunked_vector` instance. The original instance is left unmodified
with all its reserved space.

Because of this, the previous fix had no effect because token ranges
are stored entirely inline and popping them doesn't realease memory.

Fixes #4951

Tests:
  - sstable_mutation_test (dev)
  - manual using scylla binary on customer data on top of 2019.1.5

Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1584559892-27653-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5fe626a887)
2020-05-07 19:06:22 +03:00
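The pointer-vs-object pitfall can be reproduced with std::vector standing in for chunked_vector (an illustrative sketch, not the sstables code):

```cpp
#include <vector>

// The bug: assigning {} to the *pointer* only nulls a local copy of
// it; the pointed-to container keeps all its contents and capacity.
void buggy_clear(std::vector<int>* c) {
    c = {};   // clears the pointer, not the vector
    (void)c;  // suppress unused-assignment warning
}

// The fix: reset the pointed-to object itself, releasing its storage
// via move-assignment from an empty temporary.
void fixed_clear(std::vector<int>* c) {
    *c = std::vector<int>{};
}
```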
Tomasz Grabiec
28f974b810 Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael
Row cache needs to be invalidated whenever data in sstables
changes. Cleanup removes data from sstables which doesn't belong to
the node anymore, which means cache must be invalidated on cleanup.
Currently, stale data can be returned when a node re-owns ranges whose
data is still stored in the node's row cache, because cleanup didn't
invalidate the cache.

Fixes #4446.

tests:
- unit tests (dev mode)
- dtests:
    update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test
    cleanup_test.py

(cherry picked from commit d0b6be0820)
2020-05-07 16:24:51 +03:00
Piotr Sarna
5fdadcaf3b network_topology_strategy: validate integers
In order to prevent users from creating a network topology
strategy instance with invalid inputs, it's not enough to use
std::stol() on the input: a string "3abc" still returns the number '3',
but will later confuse cqlsh and other drivers, when they ask for
topology strategy details.
The error message is now more human readable, since for incorrect
numeric inputs it used to return a rather cryptic message:
    ServerError: stol()
This commit fixes the issue and comes with a simple test.

Fixes #3801
Tests: unit(dev)
Message-Id: <7aaae83d003738f047d28727430ca0a5cec6b9c6.1583478000.git.sarna@scylladb.com>

(cherry picked from commit 5b7a35e02b)
2020-05-07 16:24:49 +03:00
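The validation gap can be shown with plain std::stol: it stops at the first non-digit, so "3abc" parses as 3. Checking the end position (and catching the exceptions) rejects such input. A hypothetical sketch of the pattern:

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Parse a replication factor, rejecting trailing garbage like "3abc".
// std::stol alone accepts it; checking that the whole string was
// consumed turns the call into a real validation.
std::optional<long> parse_rf(const std::string& s) {
    try {
        size_t pos = 0;
        long value = std::stol(s, &pos);
        if (pos != s.size() || value < 0) {
            return std::nullopt;  // trailing characters, or negative RF
        }
        return value;
    } catch (const std::exception&) {
        return std::nullopt;  // not a number at all, or out of range
    }
}
```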
Pekka Enberg
a960394f27 scripts/jobs: Keep memory reserve when calculating parallelism
The "jobs" script is used to determine the amount of compilation
parallelism on a machine. It attempts to ensure each GCC process has at
least 4 GB of memory per core. However, in the worst case scenario, we
could end up having the GCC processes take up all the system memory,
forcin swapping or OOM killer to kick in. For example, on a 4 core
machine with 16 GB of memory, this worst case scenario seems easy to
trigger in practice.

Fix up the problem by keeping a 1 GB of memory reserve for other
processes and calculating parallelism based on that.

Message-Id: <20200423082753.31162-1-penberg@scylladb.com>
(cherry picked from commit 7304a795e5)
2020-05-04 19:01:54 +03:00
Piotr Sarna
3216a1a70a alternator: fix signature timestamps
Generating timestamps for auth signatures used a non-thread-safe
::gmtime function instead of thread-safe ::gmtime_r.

Tests: unit(dev)
Fixes #6345

(cherry picked from commit fb7fa7f442)
2020-05-04 17:08:13 +03:00
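The fix is essentially a one-call substitution: ::gmtime returns a pointer to static storage shared by all threads, while ::gmtime_r (POSIX) writes into a caller-provided struct tm. A sketch of the thread-safe variant, using the AWS-style timestamp format as an assumed example:

```cpp
#include <ctime>
#include <string>

// Thread-safe UTC timestamp formatting. gmtime() writes to a static
// buffer shared across threads; gmtime_r() fills a caller-owned
// struct tm, so concurrent signers cannot corrupt each other's result.
std::string format_amz_date(time_t t) {
    struct tm tm_buf {};
    gmtime_r(&t, &tm_buf);  // POSIX; reentrant
    char buf[32];
    strftime(buf, sizeof(buf), "%Y%m%dT%H%M%SZ", &tm_buf);
    return buf;
}
```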
Avi Kivity
5a7fd41618 Merge 'Fix hang in multishard_writer' from Asias
"
This series fixes a hang in multishard_writer when an error happens. It contains
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead

Fixes #6241
Refs: #6248
"

* asias-stream_fix_multishard_writer_hang:
  repair: Fix hang when the writer is dead
  mutation_writer_test: Add test_multishard_writer_producer_aborts
  multishard_writer: Abort the queue attached to consumers when producer fails

(cherry picked from commit 8925e00e96)
2020-05-01 20:13:00 +03:00
Raphael S. Carvalho
dd24ba7a62 api/service: fix segfault when taking a snapshot without keyspace specified
If no keyspace is specified when taking snapshot, there will be a segfault
because keynames is unconditionally dereferenced. Let's return an error
because a keyspace must be specified when column families are specified.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
(cherry picked from commit 02e046608f)

Fixes #6336.
2020-04-30 12:57:14 +03:00
Avi Kivity
204f6dd393 Update seastar submodule
* seastar a0bdc6cd85...0ebd89a858 (1):
  > http server: fix "Date" header format

Fixes #6253.
2020-04-26 19:31:44 +03:00
Nadav Har'El
b1278adc15 alternator: unzero "scylla_alternator_total_operations" metric
In commit 388b492040, which was only supposed
to move around code, we accidentally lost the line which does

    _executor.local()._stats.total_operations++;

So after this commit this counter was always zero...
This patch returns the line incrementing this counter.

Arguably, this counter is not very important - a user can also calculate
this number by summing up all the counters in the scylla_alternator_operation
array (these are counters for individual types of operations). Nevertheless,
as long as we do export a "scylla_alternator_total_operations" metric,
we need to correctly calculate it and can't leave it zero :-)

Fixes #5836

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219162820.14205-1-nyh@scylladb.com>
(cherry picked from commit b8aed18a24)
2020-04-19 19:07:31 +03:00
Botond Dénes
ee9677ef71 schema: schema(): use std::stable_sort() to sort key columns
When multiple key columns (clustering or partition) are passed to
the schema constructor, all having the same column id, the expectation
is that these columns will retain the order in which they were passed to
`schema_builder::with_column()`. Currently however this is not
guaranteed, as the schema constructor sorts key columns by column id
with `std::sort()`, which doesn't guarantee that equally comparing
elements retain their order. This can be an issue for indexes, whose
schemas are built independently on each node. If there is any room for
variance in the key column order, this can result in different
nodes having incompatible schemas for the same index.
The fix is to use `std::stable_sort()` which guarantees that the order
of equally comparing elements won't change.

This is a suspected cause of #5856, although we don't have hard proof.

Fixes: #5856
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
[avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes
      unstable at 17 elements, and the failing schema had a
      clustering key with 23 elements]
Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com>
(cherry picked from commit a4aa753f0f)
2020-04-19 18:19:05 +03:00
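The difference can be demonstrated directly: std::sort gives no ordering guarantee for elements that compare equal, while std::stable_sort preserves their input order. A minimal model of key columns sharing a column id (hypothetical types, not Scylla's schema code):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct column {
    int id;            // key columns may share the same id
    std::string name;
};

// Sort by id only. With std::sort, columns comparing equal may end up
// in any order once the range grows large enough; std::stable_sort
// guarantees they keep the order in which they were added.
void sort_key_columns(std::vector<column>& cols) {
    std::stable_sort(cols.begin(), cols.end(),
                     [](const column& a, const column& b) { return a.id < b.id; });
}
```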
Nadav Har'El
2060e361cf materialized views: fix corner case of view updates used by Alternator
While CQL does not allow creation of a materialized view with more than one
base regular column in the view's key, in Alternator we do allow this - both
partition and clustering key may be a base regular column. We had a bug in
the logic handling this case:

If the new base row is missing a value for *one* of the view key columns,
we shouldn't create a view row. Similarly, if the existing base row was
missing a value for *one* of the view key columns, a view row does not
exist and doesn't need to be deleted.  This was done incorrectly, and made
decisions based on just one of the key columns, and the logic is now
fixed (and I think, simplified) in this patch.

With this patch, the Alternator test which previously failed because of
this problem now passes. The patch also includes new tests in the existing
C++ unit test test_view_with_two_regular_base_columns_in_key. This test
was already supposed to cover various cases of two-new-key-columns
updates, but missed the cases explained above. These new tests failed
badly before this patch - some of them had clean write errors, others
caused crashes. With this patch, they pass.

Fixes #6008.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200312162503.8944-1-nyh@scylladb.com>
(cherry picked from commit 635e6d887c)
2020-04-19 15:24:19 +03:00
Hagit Segev
6f939ffe19 release: prepare for 3.3.1 2020-04-18 00:23:31 +03:00
Kamil Braun
69105bde8a sstables: freeze types nested in collection types in legacy sstables
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).

SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).

This commit does two things:
1. for all SSTables created after this commit, include a new feature
   flag, CorrectUDTsInCollections, presence of which implies that frozen
   UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
   nested inside collections are always frozen, even if they don't have
   the tag. This assumption is safe to be made, because at the time of
   this commit, Scylla does not allow non-frozen (multi-cell) types inside
   collections or UDTs, and because of point 1 above.

There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.

Fixes #6130.

(cherry picked from commit 3d811e2f95)
2020-04-17 09:12:28 +03:00
Kamil Braun
e09e9a5929 sstables: move definition of column_translation::state::build to a .cc file
Ref #6130
2020-04-17 09:12:28 +03:00
Piotr Sarna
2308bdbccb alternator: use partition tombstone if there's no clustering key
As @tgrabiec helpfully pointed out, creating a row tombstone
for a table which does not have a clustering key in its schema
creates something that looks like an open-ended range tombstone.
That's problematic for KA/LA sstable formats, which are incapable
of writing such tombstones, so a workaround is provided
in order to allow using KA/LA in alternator.

Fixes #6035
Cherry-picked from 0a2d7addc0
2020-04-16 12:14:10 +02:00
Asias He
a2d39c9a2e gossip: Add an option to force gossip generation
Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation
number g1, g2, g3.

n1, n2, n3 running scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure)

One year later, the user wants to upgrade n1, n2, n3 to a new version.

When n3 does a rolling restart with the new version, n3 will use a
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.

Such unnecessary marking of a node as down can cause availability issues.
For example:

DC1: n1, n2
DC2: n3, n4

When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.

To fix, we can start the new node with a gossip generation within the
MAX_GENERATION_DIFFERENCE limit.

Once all the nodes run a version with commit
0a52ecb6df, the option is no longer
needed.

Fixes #5164

(cherry picked from commit 743b529c2b)
2020-03-27 12:49:23 +01:00
Asias He
5fe2ce3bbe gossiper: Always use the new generation number
User reported an issue that after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster while the node is up
and running normally.

Consider the following:

- n1, n2, n3 in the cluster
- n3 shutdown itself
- n3 send shutdown verb to n1 and n2
- n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to
  INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
  storage_service::prepare_to_join,
- n3 receives a response from n1, in gossiper::handle_ack_msg; since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 1. Fiber 1 finishes faster than fiber 2; it
  sets _in_shadow_round = false
- n3 receives a response from n2, in gossiper::handle_ack_msg; since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 2. Fiber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
  gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2, apply application state about n3 into
  endpoint_state_map, at this point endpoint_state_map contains
  information including n3 itself from n2.
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
  with new generation number generated correctly in
  storage_service::prepare_to_join, but in
  maybe_initialize_local_state(generation_nbr), it will not set new
  generation and heartbeat if the endpoint_state_map contains itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop, in gossiper::run,
  hbs.update_heart_beat() the heartbeat is set to the number starting
  from 0.
- n1 and n2 will not get updates from n3 because they use the same
  generation number but n1 and n2 have a larger heartbeat version
- n1 and n2 will mark n3 as down even though n3 is alive.

To fix, always use the new generation number.

Fixes: #5800
Backports: 3.0 3.1 3.2
(cherry picked from commit 62774ff882)
2020-03-27 12:49:20 +01:00
Piotr Sarna
aafa34bbad cql: fix qualifying indexed columns for filtering
When qualifying columns to be fetched for filtering, we also check
if the target column is not used as an index - in which case there's
no need of fetching it. However, the check was incorrectly assuming
that any restriction is eligible for indexing, while it's currently
only true for EQ. The fix makes a more specific check and contains
many dynamic casts, but these will hopefully be gone once our
long-planned "restrictions rewrite" is done.
This commit comes with a test.

Fixes #5708
Tests: unit(dev)

(cherry picked from commit 767ff59418)
2020-03-22 09:00:51 +01:00
Hagit Segev
7ae2cdf46c release: prepare for 3.3.0 2020-03-19 21:46:44 +02:00
Hagit Segev
863f88c067 release: prepare for 3.3.rc3 2020-03-15 22:45:30 +02:00
Avi Kivity
90b4e9e595 Update seastar submodule
* seastar f54084c08f...a0bdc6cd85 (1):
  > tls: Fix race and stale memory use in delayed shutdown

Fixes #5759 (maybe)
2020-03-12 19:41:50 +02:00
Konstantin Osipov
434ad4548f locator: correctly select endpoints if RF=0
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug when we first add at least one endpoint
to the list, and then compare list size and replication factor.
If RF=0 this never yields true.
Fix by moving the RF check before at least one endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.

Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>

(cherry picked from commit ac6f64a885)
2020-03-12 12:09:46 +02:00
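The off-by-one check can be sketched as a loop (hypothetical names, not the locator code itself): testing the list size before adding an endpoint makes RF=0 terminate immediately, while testing only after the first push never compares true for RF=0.

```cpp
#include <string>
#include <vector>

// Sketch of the corrected endpoint selection: check the replication
// factor BEFORE appending. If the check were done only after at least
// one endpoint had been added, rf == 0 could never stop the loop and
// the list would include endpoints it should not.
std::vector<std::string> select_endpoints(const std::vector<std::string>& ring,
                                          size_t rf) {
    std::vector<std::string> out;
    for (const auto& ep : ring) {
        if (out.size() >= rf) {  // handles rf == 0 correctly
            break;
        }
        out.push_back(ep);
    }
    return out;
}
```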
Avi Kivity
cbbb15af5c logalloc: increase capacity of _regions vector outside reclaim lock
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:

1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
   to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.

To fix, resize the _regions vector outside the lock.

Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>

(cherry picked from commit c020b4e5e2)
2020-03-12 11:25:20 +02:00
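The fix above is an ordering change; a rough sketch of the pattern (hypothetical names; the real code uses an LSA-specific reclaim lock, not std::mutex):

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical sketch of the fix: grow the vector's capacity *before*
// taking the reclaim lock, so that any allocation this triggers can
// still free segments via reclaim if memory is tight.
std::mutex reclaim_lock;
std::vector<void*> regions;

void add_region(void* r) {
    regions.reserve(regions.size() + 1);        // may allocate; lock not held
    std::lock_guard<std::mutex> guard(reclaim_lock);
    regions.push_back(r);                       // no reallocation under the lock
}
```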
Benny Halevy
3231580c05 dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids
By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with
the binary's build-id when stripping its debug info as it is passed
the `--build-id-seed <version>.<release>` option.

To prevent that we need to set the following macros as follows:
  unset `_unique_build_ids`
  set `_no_recompute_build_ids` to 1

Fixes #5881

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
2020-03-09 15:21:50 +02:00
Piotr Sarna
62364d9dcd Merge 'cql3: do_execute_base_query: fix null deref ...
... when clustering key is unavailable' from Benny

This series fixes null pointer dereference seen in #5794

efd7efe cql3: generate_base_key_from_index_pk; support optional index_ck
7af1f9e cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
7fe1a9e cql3: do_execute_base_query: fixup indentation

Fixes #5794

Branches: 3.3

Test: unit(dev) secondary_indexes_test:TestSecondaryIndexes.test_truncate_base(debug)

* bhalevy/fix-5794-generate_base_key_from_index_pk:
  cql3: do_execute_base_query: fixup indentation
  cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
  cql3: generate_base_key_from_index_pk; support optional index_ck

(cherry picked from commit 4e95b67501)
2020-03-09 15:20:01 +02:00
Takuya ASADA
3bed8063f6 dist/debian: fix "unable to open node-exporter.service.dpkg-new" error
It seems like *.service is conflicting at install time because the file
is installed twice, both via debian/*.service and debian/scylla-server.install.

We don't need to use *.install, so we can just drop the line.

Fixes #5640

(cherry picked from commit 29285b28e2)
2020-03-03 12:40:39 +02:00
Yaron Kaikov
413fcab833 release: prepare for 3.3.rc2 2020-02-27 14:45:18 +02:00
Juliusz Stasiewicz
9f3c3036bf cdc: set TTLs on CDC log cells
Cells in CDC logs used to be created while completely neglecting
TTLs (the TTLs from `cdc = {...'ttl':600}`). This patch adds TTLs
to all cells; there are no row markers, so we need not set TTL
there.

Fixes #5688

(cherry picked from commit 67b92c584f)
2020-02-26 18:12:55 +02:00
Benny Halevy
ff2e108a6d gossiper: do_stop_gossiping: copy live endpoints vector
It can be resized asynchronously by mark_dead.

Fixes #5701

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>
(cherry picked from commit f45fabab73)
2020-02-26 13:00:11 +02:00
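The fix above is the classic iterate-over-a-copy pattern; a hedged sketch with hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: shutdown walks a snapshot of the live endpoints,
// so a concurrent mark_dead() erasing from the original vector cannot
// invalidate the loop's iterators mid-iteration.
std::vector<std::string> live_endpoints = {"n1", "n2", "n3"};

size_t announce_shutdown() {
    auto snapshot = live_endpoints;   // copy; safe against concurrent resize
    size_t notified = 0;
    for (const auto& ep : snapshot) {
        (void)ep;                     // send the shutdown message here
        ++notified;
    }
    return notified;
}
```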
Gleb Natapov
ade788ffe8 commitlog: use commitlog IO scheduling class for segment zeroing
There may be other commitlog writes waiting for zeroing to complete, so
not using proper scheduling class causes priority inversion.

Fixes #5858.

Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
(cherry picked from commit 6a78cc9e31)
2020-02-26 12:51:10 +02:00
Benny Halevy
1f8bb754d9 storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service
Match subscription done in main() and avoid cross shard access
to _lifecycle_subscribers vector.

Fixes #5385

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>
(cherry picked from commit 5b0ea4c114)
2020-02-25 16:39:49 +02:00
Tomasz Grabiec
7b2eb09225 Merge fixes for use-after-frees related to shutdown of services
Backport of 884d5e2bcb and
4839ca8491.

Fixes crashes when scylla is stopped early during boot.

Merged from https://github.com/xemul/scylla/tree/br-mm-combined-fixes-for-3.3

Fixes #5765.
2020-02-25 13:34:01 +01:00
Pavel Emelyanov
d2293f9fd5 migration_manager: Abort and wait cluster upgrade waiters
The maybe_schedule_schema_pull waits for schema_tables_v3 to
become available. This is unsafe in case migration manager
goes away before the feature is enabled.

Fix this by subscribing to the feature with feature::listener and
waiting on a condition variable in maybe_schedule_schema_pull.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Pavel Emelyanov
25b31f6c23 migration_manager: Abort and wait delayed schema pulls
The sleep is interrupted with the abort source; the "wait" part
is done with the existing _background_tasks gate. Also we need
to make sure the gate stays alive till the end of the function,
so make use of the async_sharded_service (migration manager is
already such).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Pavel Emelyanov
742a1ce7d6 storage_service: Unregister from gossiper notifications ... at all
This unregistration doesn't happen currently, but doesn't seem to
cause any problems in general, as on stop gossiper is stopped and
nothing from it hits the storage_service.

However (!), if an exception pops up between the point where storage_service
is subscribed to gossiper and the point where the drain_on_shutdown defer
action is set up, then we _may_ get into the following situation:

- main's stuff gets unrolled back
- gossiper is not stopped (drain_on_shutdown defer is not set up)
- migration manager is stopped (with deferred action in main)
- a notification comes from gossiper
    -> storage_service::on_change might want to pull schema with
       the help of local migration manager
    -> assert(local_is_initialized) strikes

Fix this by registering storage_service with gossiper a bit earlier
(both are already initialized by that time) and setting up the unregister
defer right afterwards.

Test: unit(dev), manual start-stop
Bug: #5628

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200130190343.25656-1-xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Avi Kivity
4ca9d23b83 Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"
This reverts commit bdc542143e. Exposes a data resurrection
bug (#5838).
2020-02-24 10:02:58 +02:00
Avi Kivity
9e97f3a9b3 Update seastar submodule
* seastar dd686552ff...f54084c08f (2):
  > reactor: fallback to epoll backend when fs.aio-max-nr is too small
  > util: move read_sys_file_as() from iotune to seastar header, rename read_first_line_as()

Fixes #5638.
2020-02-20 10:25:00 +02:00
Piotr Dulikowski
183418f228 hh: handle counter update hints correctly
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.

When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.

When a hint's destination node is no longer a replica for its mutation, the
hint is instead sent to all of the current replicas. Previously,
storage_proxy::mutate was used for that purpose. It was incorrect
because that function treats mutations for counter tables as mutations
containing only a delta (by how much to increase/decrease the counter).
These two types of mutations have different serialization format, so in
this case a "shards" mutation is reinterpreted as "delta" mutation,
which can cause data corruption to occur.

This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.

Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
2020-02-19 16:49:12 +02:00
Piotr Sarna
756574d094 db,view: fix generating view updates for partition tombstones
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can be simply generated from the new row. However, lack of the
existing row can also be the result of a partition/range tombstone.
If that's the case, it needs to be applied, because it's entirely
possible that this partition tombstone also hides the new row.
Without taking the partition tombstone into account, creating
a future tombstone and inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
This patch comes with a test which was proven to fail before the
changes.

Branches 3.1,3.2,3.3
Fixes #5793

Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
(cherry picked from commit e93c54e837)
2020-02-16 20:26:28 +02:00
Rafael Ávila de Espíndola
a348418918 service: Add a lock around migration_notifier::_listeners
Before this patch the iterations over migration_notifier::_listeners
could race with listeners being added and removed.

The addition side is not modified, since it is common to add a
listener during construction and it would require a fairly big
refactoring. Instead, the iteration is modified to use indexes instead
of iterators so that it is still valid if another listener is added
concurrently.

For removal we use a rw lock, since removing an element invalidates
indexes too. There are only a few places that needed refactoring to
handle unregister_listener returning a future<>, so this is probably
OK.

Fixes #5541.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200120192819.136305-1-espindola@scylladb.com>
(cherry picked from commit 27bd3fe203)
2020-02-16 20:13:42 +02:00
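The index-based iteration described above can be sketched as follows (hypothetical types; the real code also takes an rw-lock to exclude removal, which invalidates indexes too):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical sketch: unlike an iterator, an index stays valid when
// another listener is appended concurrently, because size() is re-read
// on every pass and operator[] does not cache the backing storage.
std::vector<std::function<void()>> listeners;
int fired = 0;

void notify_listeners() {
    for (size_t i = 0; i < listeners.size(); ++i) {  // index, not iterator
        listeners[i]();  // valid even if push_back() ran between calls
    }
}
```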
Avi Kivity
06c0bd0681 Update seastar submodule
* seastar 3f3e117de3...dd686552ff (1):
  > perftune.py: Use safe_load() for fix arbitrary code execution

Fixes #5630.
2020-02-16 15:53:16 +02:00
Avi Kivity
223c300435 Point seastar submodule at scylla-seastar.git branch-3.3
This allows us to backport seastar patches to Scylla 3.3.
2020-02-16 15:51:46 +02:00
Gleb Natapov
ac8bef6781 commitlog: fix flushing an entry marked as "sync" in periodic mode
After 546556b71b we can have mixed writes into commitlog,
some flush immediately, some do not. If a non-flushing write races with
a flushing one and becomes responsible for writing back its buffer into a
file, the flush will be skipped, which will cause the assert in
batch_cycle() to trigger since the flush position will not be advanced.
Fix that by checking whether the flush was skipped and, in that case,
explicitly flushing our file position.

Fixes #5670

Message-Id: <20200128145103.GI26048@scylladb.com>
(cherry picked from commit c654ffe34b)
2020-02-16 15:48:40 +02:00
Pavel Solodovnikov
68691907af lwt: fix handling of nulls in parameter markers for LWT queries
This patch affects the LWT queries with IF conditions of the
following form: `IF col in :value`, i.e. if the parameter
marker is used.

When executing a prepared query with a bound value
of `(None,)` (tuple with null, example for Python driver), it is
serialized not as NULL but as "empty" value (serialization
format differs in each case).

Therefore, Scylla deserializes the parameters in the request as
empty `data_value` instances, which are, in turn, translated
to non-empty `bytes_opt` with empty byte-string value later.

Account for this case too in the CAS condition evaluation code.

Example of a problem this patch aims to fix:

Suppose we have a table `tbl` with a boolean field `test` and
INSERT a row with NULL value for the `test` column.

Then the following update query fails to apply due to the
error in IF condition evaluation code (assume `v=(null)`):
`UPDATE tbl SET test=false WHERE key=0 IF test IN :v`
returns false in `[applied]` column, but is expected to succeed.

Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286)

Fixes: #5710

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit bcc4647552)
2020-02-16 15:29:28 +02:00
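The distinction involved above can be shown with a hedged sketch (hypothetical helper, not Scylla's CAS evaluation code): a NULL arrives as a disengaged optional, while an "empty" serialized value arrives as an engaged optional holding zero bytes, and the condition code must account for both:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for Scylla's bytes_opt.
using bytes_opt = std::optional<std::vector<uint8_t>>;

// Sketch: condition evaluation must treat a disengaged optional (NULL)
// and an engaged-but-empty optional ("empty" value) the same way.
bool is_null_like(const bytes_opt& v) {
    return !v.has_value() || v->empty();
}
```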
Avi Kivity
f59d2fcbf1 Merge "stop passing tracing state pointer in client_state" from Gleb
"
client_state is used simultaneously by many requests running in parallel
while tracing state pointer is per request. Both those facts do not sit
well together and as a result sometimes tracing state is being overwritten
while still been used by active request which may cause incorrect trace
or even a crash.
"

Fixes #5700.

Backported from 9f1f60fc38

* 'gleb/trace_fix_3.3_backport' of ssh://github.com/scylladb/seastar-dev:
  client_state: drop the pointer to a tracing state from client_state
  transport: pass tracing state explicitly instead of relying on it being in the client_state
  alternator: pass tracing state explicitly instead of relying on it being in the client_state
2020-02-16 15:23:41 +02:00
Asias He
bdc542143e streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations
The table::flush_streaming_mutations is used in the days when streaming
data goes to memtable. After switching to the new streaming, data goes
to sstables directly in streaming, so the sstables generated in
table::flush_streaming_mutations will be empty.

It is unnecessary to invalidate the cache if no sstables are added. To
avoid unnecessary cache invalidation, which pokes holes in the cache, skip
calling _cache.invalidate() if the set of sstables is empty.

The steps are:

- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
  new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges

In summary, this patch will avoid a lot of cache invalidation for
streaming.

Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
2020-02-16 15:16:24 +02:00
Botond Dénes
061a02237c row: append(): downgrade assert to on_internal_error()
This assert, added by 060e3f8 is supposed to make sure the invariant of
the append() is respected, in order to prevent building an invalid row.
The assert however proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.

Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
2020-02-16 15:12:46 +02:00
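The assert-to-error downgrade above can be sketched minimally (on_internal_error here is a stand-in, not the actual Seastar API): instead of assert(), which aborts the whole node, throw so only the offending operation fails and the node stays available:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in: the real helper also logs before throwing.
void on_internal_error(const std::string& msg) {
    throw std::logic_error(msg);
}

// Sketch of the invariant check: reject out-of-order clustering rows
// with an exception rather than aborting the process.
void append_row(int last_key, int new_key) {
    if (new_key <= last_key) {
        on_internal_error("out-of-order clustering row in append()");
    }
    // ... append the row ...
}
```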
Gleb Natapov
35b6505517 client_state: drop the pointer to a tracing state from client_state
client_state is shared between requests and tracing state is per
request. It is not safe to use the former as a container for the later
since a state can be overwritten prematurely by subsequent requests.

(cherry picked from commit 31cf2434d6)
2020-02-13 13:45:56 +02:00
Gleb Natapov
866c04dd64 transport: pass tracing state explicitly instead of relying on it being in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per request.
Currently the next request may overwrite the tracing state of the previous
one, causing, in the best case, a wrong trace to be taken, or a crash if the
overwritten pointer is freed prematurely.

Fixes #5700

(cherry picked from commit 9f1f60fc38)
2020-02-13 13:45:56 +02:00
Gleb Natapov
dc588e6e7b alternator: pass tracing state explicitly instead of relying on it being in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per
request. This is not yet an issue for alternator, since it creates a
new client_state object for each request, but first of all it should not,
and second, the trace state will be dropped from client_state by a later
patch.

(cherry picked from commit 38fcab3db4)
2020-02-13 13:45:56 +02:00
Takuya ASADA
f842154453 dist/debian: keep /etc/systemd .conf files on 'remove'
Since dpkg does not re-install conffiles when they are removed by the user,
we are currently missing dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.

Fixes #5734

(cherry picked from commit 43097854a5)
2020-02-12 14:26:40 +02:00
Yaron Kaikov
b38193f71d dist/docker: Switch to 3.3 release repository (#5756)
Change the SCYLLA_REPO_URL variable to point to branch-3.3 instead of
master. This ensures that Docker image builds that don't specify the
variable build from the right repository by default.
2020-02-10 11:11:38 +02:00
Rafael Ávila de Espíndola
f47ba6dc06 lua: Handle nil returns correctly
This is a minimum backport to 3.3.

With this patch lua nil values are mapped to CQL null values instead
of producing an error.

Fixes #5667

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200203164918.70450-1-espindola@scylladb.com>
2020-02-09 18:55:42 +02:00
Hagit Segev
0d0c1d4318 release: prepare for 3.3.rc1 2020-02-09 15:55:24 +02:00
Takuya ASADA
9225b17b99 scylla_post_install.sh: fix 'integer expression expected' error
awk returns a float value on Debian, which causes a postinst script failure
since we compare it as an integer value.
Replaced it with sed + bash.

Fixes #5569

(cherry picked from commit 5627888b7c)
2020-02-04 14:30:04 +02:00
Gleb Natapov
00b3f28199 db/system_keyspace: use user memory limits for local.paxos table
Treat writes to local.paxos as user memory, as the number of writes is
dependent on the amount of user data written with LWT.

Fixes #5682

Message-Id: <20200130150048.GW26048@scylladb.com>
(cherry picked from commit b08679e1d3)
2020-02-02 17:36:52 +02:00
Rafael Ávila de Espíndola
1bbe619689 types: Fix encoding of negative varint
We would sometimes produce an unnecessary extra 0xff prefix byte.

The new encoding matches what cassandra does.

This was both an efficiency and a correctness issue, as using varint in a
key could produce different tokens.

Fixes #5656

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
2020-02-02 16:00:58 +02:00
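The redundant-prefix problem above can be illustrated with a minimal sketch of big-endian, minimal-length two's-complement encoding (the general format Cassandra uses for varint; this is an illustrative re-implementation, not Scylla's code). A leading 0x00 byte is redundant when the next byte's sign bit is clear, and a leading 0xff byte is redundant when the next byte's sign bit is set:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch: encode a 64-bit value as minimal-length
// big-endian two's complement, stripping redundant sign-extension
// bytes while keeping the sign bit representable.
std::vector<uint8_t> encode_varint(int64_t v) {
    std::vector<uint8_t> out;
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xff));
    }
    while (out.size() > 1) {
        bool redundant_zero = out[0] == 0x00 && (out[1] & 0x80) == 0;
        bool redundant_ff   = out[0] == 0xff && (out[1] & 0x80) != 0;
        if (!redundant_zero && !redundant_ff) {
            break;
        }
        out.erase(out.begin());   // drop the redundant prefix byte
    }
    return out;
}
```

Keeping an extra 0xff prefix on a negative value still decodes to the same number, but the byte sequence differs, which is exactly why tokens computed from the key bytes could diverge.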
Avi Kivity
c36f71c783 test: make eventually() more patient
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.

Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible compared to false-positive failures, since they throw away large
test efforts and sometimes require a person to investigate the problem,
only to conclude it is a false positive.

This patch therefore makes eventually() more patient, by a factor of
32.

Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>

(cherry picked from commit ec5b721db7)
2020-02-01 13:20:22 +02:00
Pekka Enberg
f5471d268b release: prepare for 3.3.rc0 2020-01-30 14:00:51 +02:00
Takuya ASADA
fd5c65d9dc dist/debian: Use tilde for release candidate builds
We need to add '~' to handle rcX version correctly on Debian variants
(merged at ae33e9f), but when we moved to relocated package we mistakenly
dropped the code, so add the code again.

Fixes #5641

(cherry picked from commit dd81fd3454)
2020-01-28 18:34:48 +02:00
Avi Kivity
3aa406bf00 tools: toolchain: dbuild: relax process limit in container
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.

Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).

I checked that --pids-limit is supported by old versions of
docker and by podman.

Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>

(cherry picked from commit 897320f6ab)
2020-01-28 18:14:01 +02:00
Piotr Sarna
c0253d9221 db,view: fix checking for secondary index special columns
A mistake in handling legacy checks for special 'idx_token' column
resulted in not recognizing materialized views backing secondary
indexes properly. The mistake is really a typo, but with bad
consequences - instead of checking the view schema for being an index,
we asked for the base schema, which is definitely not an index of
itself.

Branches 3.1,3.2 (asap)
Fixes #5621
Fixes #4744

(cherry picked from commit 9b379e3d63)
2020-01-21 23:32:11 +02:00
Avi Kivity
12bc965f71 atomic_cell: consistently use comma as separator in pretty-printers
The atomic_cell pretty printers use a mix of commas and semicolons.
This change makes them use commas everywhere, for consistency.
Message-Id: <20200116133327.2610280-1-avi@scylladb.com>
2020-01-16 17:26:33 +01:00
Nadav Har'El
1ed21d70dc merge: CDC: do mutation augmentation from storage proxy
Merged pull request https://github.com/scylladb/scylla/pull/5567
from Calle Wilund:

Fixes #5314

Instead of tying CDC handling into cql statement objects, this patch set
moves it to storage proxy, i.e. shared code for mutating stuff. This means
we automatically handle cdc for code paths outside cql (i.e. alternator).

It also adds api handling (though initially inefficient) for batch statements.

CDC is tied into storage proxy by giving the former a ref to the latter (per
shard). Initially this is not a constructor parameter, because right now we
have chicken-and-egg issues here. Hopefully, Pavel's refactoring of migration
manager and notifications will untie these and this relationship can become
nicer.

The actual augmentation can (as stated above) be made much more efficient.
Hopefully, the stream management refactoring will deal with expensive stream
lookup, and eventually, we can maybe coalesce pre-image selects for batches.
However, that is left as an exercise for when deemed needed.

The augmentation API has an optional return value for a "post-image handler"
to be used, iff returned, after the mutation call is finished (and successful).
It is not yet actually invoked from storage_proxy, but it is at least in the
call chain.
2020-01-16 17:12:56 +02:00
Avi Kivity
e677f56094 Merge "Enable general centos RPM (not only centos7)" from Hagit 2020-01-16 14:13:24 +02:00
Tomasz Grabiec
36d90e637e Merge "Relax migration manager dependencies" from Pavel Emalyanov
The set makes the dependencies between mm and other services cleaner;
in particular, after the set:

- the query processor no longer needs migration manager
  (which doesn't need query processor either)

- the database no longer needs migration manager, thus the mutual
  dependency between these two is dropped, only migration manager
  -> database is left

- the migration manager -> storage_service dependency is relaxed,
  one more patchset will be needed to remove it, thus dropping one
  more mutual dependency between them, only the storage_service
  -> migration manager will be left

- the migration manager is stopped on drain, but several more
  services need it on stop, thus causing use-after-free problems;
  in particular, a bug was caught where the view builder crashes
  when unregistering from the notifier list on stop. Fixed.

Tests: unit(dev)
Fixes: #5404
2020-01-16 12:12:25 +01:00
Hagit Segev
d0405003bd building-packages doc: Update no specific el7 on path 2020-01-16 12:49:08 +02:00
Rafael Ávila de Espíndola
c42a2c6f28 configure: Add -O1 when compiling generated parsers
Enabling asan enables a few cleanup optimizations in gcc. The net
result is that using

  -fsanitize=address -fno-sanitize-address-use-after-scope

produces code that uses a lot less stack than if the file is compiled
with just -O0.

This patch adds -O1 in addition to
-fno-sanitize-address-use-after-scope to protect the unfortunate
developer that decides to build in dev mode with --cflags='-O0 -g'.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-2-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola
317e0228a8 configure: Put user flags after the mode flags
It is sometimes convenient to build with flags that don't match any
existing mode.

Recently I was tracking a bug that would not reproduce with debug, but
reproduced with dev, so I tried debugging the result of

./configure.py --cflags="-O0 -g"

While the binary had debug info, it still had optimizations because
configure.py put the mode flags after the user flags (-O0 -O1). This
patch flips the order (-O1 -O0) so that the flags passed in the
command line win.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-1-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Gleb Natapov
51281bc8ad lwt: fix write timeout exception reporting
CQL transport code relies on an exception's C++ type to create correct
reply, but in lwt we converted some mutation_timeout exceptions to more
generic request_timeout while forwarding them which broke the protocol.
Do not drop type information.

Fixes #5598.

Message-Id: <20200115180313.GQ9084@scylladb.com>
2020-01-16 12:05:50 +02:00
Piotr Jastrzębski
0c8c1ec014 config: fix description of enable_deprecated_partitioners
Murmur3 is the default partitioner.
ByteOrder and Random are the deprecated ones
and should be mentioned in the description.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-16 12:05:50 +02:00
Nadav Har'El
9953a33354 merge "Adding a schema file when creating a snapshot"
Merged pull request https://github.com/scylladb/scylla/pull/5294 from
Amnon Heiman:

To use a snapshot we need a schema file that is similar to the result of
running cql DESCRIBE command.

The DESCRIBE is implemented in the cql driver so the functionality needs
to be re-implemented inside scylla.

This series adds a describe method to the schema file and uses it when doing
a snapshot.

There are different approaches to handling materialized views and
secondary indexes.

This implementation creates each schema.cql file in its own relevant
directory, so the schema for a materialized view, for example, will be
placed in the snapshot directory of the table of that view.

Fixes #4192
2020-01-16 12:05:50 +02:00
Piotr Dulikowski
c383652061 gossip: allow for aborting on sleep
This commit makes most sleeps in gossip.cc abortable. It is now possible
to quickly shut down a node during startup, most notably during the
phase while it waits for gossip to settle.
2020-01-16 12:05:50 +02:00
Avi Kivity
e5e0642f2a tools: toolchain: add dependencies for building debian and rpm packages
This reduces network traffic and eliminates time for installation when
building packages from the frozen toolchain, as well as isolating the
build from updates to those dependencies which may cause breakage.
2020-01-16 12:05:50 +02:00
Pekka Enberg
da9dae3dbe Merge 'test.py: add support for CQL tests' from Kostja
This patch set adds support for CQL tests to test.py,
as well as many other improvements:

* --name is now a positional argument
* test output is preserved in testlog/${mode}
* concise output format
* better color support
* arbitrary number of test suites
* per-suite yaml-based configuration
* options --jenkins and --xunit are removed and xml
  files are generated for all runs

A simple driver is written in C++ to read CQL from
standard input, execute it in embedded mode and produce output.

The patch is checked with BYO.

Reviewed-by: Dejan Mircevski <dejan@scylladb.com>
* 'test.py' of github.com:/scylladb/scylla-dev: (39 commits)
  test.py: introduce BoostTest and virtualize custom boost arguments
  test.py: sort tests within a suite, and sort suites
  test.py: add a basic CQL test
  test.py: add CQL .reject files to gitignore
  test.py: print a colored unidiff in case of test failure
  test.py: add CqlTestSuite to run CQL tests
  test.py: initial import of CQL test driver, cql_repl
  test.py: remove custom colors and define a color palette
  test.py: split test output per test mode
  test.py: remove tests_to_run
  test.py: virtualize Test.run(), to introduce CqlTest.Run next
  test.py: virtualize test search pattern per TestSuite
  test.py: virtualize write_xunit_report()
  test.py: ensure print_summary() is agnostic of test type
  test.py: tidy up print_summary()
  test.py: introduce base class Test for CQL and Unit tests
  test.py: move the default arguments handling to UnitTestSuite
  test.py: move custom unit test command line arguments to suite.yaml
  test.py: move command line argument processing to UnitTestSuite
  test.py: introduce add_test(), which is suite-specific
  ...
2020-01-16 12:05:50 +02:00
Pekka Enberg
e8b659ec5d dist/docker: Remove Ubuntu-based Docker image
The Ubuntu-based Docker image uses Scylla 1.0 and has not been updated
since 2017. Let's remove it as unmaintained.

Message-Id: <20200115102405.23567-1-penberg@scylladb.com>
2020-01-16 12:05:50 +02:00
Avi Kivity
546556b71b Merge "allow commitlog to wait for specific entries to be flushed on disk" from Gleb
"
Currently commitlog supports two modes of operation. First is 'periodic'
mode where all commitlog writes are ready the moment they are stored in
a memory buffer and the memory buffer is flushed to a storage periodically.
Second is a 'batch' mode where each write is flushed as soon as possible
(after previous flush completed) and writes are only ready after they
are flushed.

The first option is not very durable, the second is not very efficient.
This series adds an option to mark some writes as "more durable" in
periodic mode meaning that they will be flushed immediately and reported
complete only after the flush is complete (flushing a durable write also
flushes all writes that came before it). It also changes paxos to use
those durable writes to store paxos state.

Note that strictly speaking the last patch is not needed since after
writing to an actual table the code updates paxos table and the later
uses durable writes that make sure all previous writes are flushed. Given
that both writes are supposed to run on the same shard, this should be enough.
But it feels right to make base table writes durable as well.
"

* 'gleb/commilog_sync_v4' of github.com:scylladb/seastar-dev:
  paxos: immediately sync commitlog entries for writes made by paxos learn stage
  paxos: mark paxos table schema as "always sync"
  schema: allow schema to be marked as 'always sync to commitlog'
  commitlog: add test for per entry sync mode
  database: pass sync flag from db::apply function to the commitlog
  commitlog: add sync method to entry_writer
2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola
2ebd1463b2 tests: Handle null and not present values differently
Before this patch result_set_assertions was handling both null values
and missing values in the same way.

This patch changes the handling of missing values so that now checking
for a null value is not the same as checking for a value not being
present.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114184116.75546-1-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Botond Dénes
0c52c2ba50 data: make cell::make_collection(): more consistent and safer
3ec889816 changed cell::make_collection() to take different code paths
depending on whether its `data` argument is nothrow copyable/movable or
not. In case it is not, it is wrapped in a view to make it so (see the
above mentioned commit for a full explanation), relying on the methods
pre-existing requirement for callers to keep `data` alive while the
created writer is in use.
On closer look however it turns out that this requirement is neither
respected, nor enforced, at least not on the code level. The real
requirement is that the underlying data represented by `data` is kept
alive. If `data` is a view, it is not expected to be kept alive and
callers don't, it is instead copied into `make_collection()`.
Non-views however *are* expected to be kept alive. This makes the API
error prone.
To avoid any future errors due to this ambiguity, require all `data`
arguments to be nothrow copyable and movable. Callers are now required
to pass views of nonconforming objects.

This patch is a usability improvement and is not fixing a bug. The
current code works as-is because it happens to conform to the underlying
requirements.

Refs: #5575
Refs: #5341

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200115084520.206947-1-bdenes@scylladb.com>
2020-01-16 12:05:50 +02:00
Amnon Heiman
ac8aac2b53 tests/cql_query_test: Add schema describe tests
This patch adds tests for the describe method.

test_describe_simple_schema tests regular tables.

test_describe_view_schema tests view and index.

Each test creates a table, finds the schema, calls the describe method and
compares the result to the string that was used to create the table.

The view tests also verify that adding an index or view does not change
the base table.

When comparing results, leading and trailing whitespace is ignored
and all combinations of whitespace and newlines are treated equally.

Additional tests may be added at a future phase if required.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:07:57 +02:00
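The whitespace-insensitive comparison described above can be sketched with a hypothetical normalization helper (an illustration, not the actual test code):

```cpp
#include <cctype>
#include <string>

// Collapse every run of whitespace/newlines into a single space and
// drop leading/trailing whitespace, so two schema strings compare
// equal regardless of formatting.
std::string normalize_ws(const std::string& s) {
    std::string out;
    bool in_ws = true; // true initially, so leading whitespace is dropped
    for (unsigned char c : s) {
        if (std::isspace(c)) {
            in_ws = true;
        } else {
            if (in_ws && !out.empty()) {
                out += ' ';
            }
            out += static_cast<char>(c);
            in_ws = false;
        }
    }
    return out;
}
```

Two CREATE TABLE statements that differ only in indentation and line breaks then normalize to the same string.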
Amnon Heiman
028525daeb database: add schema.cql file when creating a snapshot
When creating a snapshot we need to add a schema.cql file in the
snapshot directory that describes the table in that snapshot.

This patch adds the file using the schema describe method.

get_snapshot_details and manifest_json_filter were modified to ignore
the schema.cql file.

Fixes #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Amnon Heiman
82367b325a schema: Add a describe method
This patch adds a describe method to a table schema.

It acts similar to a DESCRIBE cql command that is implemented in a CQL
driver.

The method supports tables, secondary indexes, local indexes and
materialized views.

relates to: #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Amnon Heiman
6f58d51c83 secondary_index_manager: add the index_name_from_table_name function
index_name_from_table_name is the reverse of index_table_name:
it takes a table name that was generated for an index and returns the name
of the index that generated that table.

Relates to #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
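Assuming the backing table is named by appending an `_index` suffix to the index name (an assumption made here for illustration), the round trip can be sketched as:

```cpp
#include <string>

const std::string suffix = "_index"; // assumed naming convention

std::string index_table_name(const std::string& index_name) {
    return index_name + suffix;
}

// Reverse mapping: strip the suffix to recover the index name.
// Precondition: table_name was produced by index_table_name().
std::string index_name_from_table_name(const std::string& table_name) {
    return table_name.substr(0, table_name.size() - suffix.size());
}
```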
Pavel Emelyanov
555856b1cd migration_manager: Use in-place value factory
The factory is a purely stateless thing; it makes no difference which
instance of it we use, so we may omit referencing the storage_service
in passive_announce.

This is the 2nd simple migration_manager -> storage_service link to cut
(more to come later).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
f129d8380f migration_manager: Get database through storage_proxy
There are several places where migration_manager needs a storage_service
reference to get the database from, thus forming a mutual dependency
between them. This is the simplest case where the migration_manager
link to the storage_service can be cut -- the database reference can be
obtained from storage_proxy instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
5cf365d7e7 database: Explicitly pass migration_manager through init_non_system_keyspace
This is the last place where database code needs the migration_manager
instance to be alive, so now the mutual dependency between these two
is gone: only the migration_manager needs the database, but not
vice versa.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
ebebf9f8a8 database: Do not request migration_manager instance for passive_announce
The helper in question is static, so no need to play with the
migration_manager instances.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
3f84256853 migration_manager: Remove register/unregister helpers
In the 2nd patch the migration_manager kept those for
simpler patching, but now we can drop them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
9e4b41c32a tests: Switch on migration notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
9d31bc166b cdc: Use migration_notifier to (un)register for events
If none is provided, get it from storage_service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:19 +03:00
Pavel Emelyanov
ecab51f8cc storage_service: Use migration_notifier (and stop worrying)
The storage_service needs migration_manager for notifications and
carefully handles the manager's stop process so as not to demolish the
listeners list from under itself. From now on this dependency is
no longer valid (the storage_service still seems to need the
migration_manager, but that is a different story).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
7814ed3c12 cql_server: Use migration_notifier in events_notifier
This patch removes an implicit cql_server -> migration_manager
dependency, as the former's event notifier uses the latter
for notifications.

This dependency also breaks a loop:
storage_service -> cql_server -> migration_manager -> storage_service

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
d9edcb3f15 query_processor: Use migration_notifier
This patch breaks one (probably harmless, but still) dependency
loop: query_processor -> migration_manager -> storage_proxy
 -> tracing -> query_processor.

The first link is not needed, as the query_processor needs the
migration_manager purely to (un)subscribe to notifications.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
2735024a53 auth: Use migration_notifier
The same as with view builder. The constructor still needs both,
but the life-time reference is now for notifier only.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
28f1250b8b view_builder: Use migration notifier
The migration manager itself is still needed on start to wait
for schema agreement, but there's no longer the need for the
life-time reference on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
7cfab1de77 database: Switch on mnotifier from migration_manager
Do not call the local migration manager instance to send notifications;
call the local migration notifier, which will always be alive.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
f45b23f088 storage_service: Keep migration_notifier
The storage service will need this to initialize sub-services.
It also registers itself with the notifier.

That said, it's convenient to have the migration notifier on board.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
e327feb77f database: Prepare to use on-database migration_notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
f240d5760c migration_manager: Split notifier from main class
The _listeners list on the migration_manager class and the corresponding
notify_xxx helpers have nothing to do with its instances; they
are just a transport for notification delivery.

At the same time some services need the migration manager to be alive
at their stop time to unregister from it, while the manager itself
may need them for its needs.

The proposal is to move the migration notifier into a complete separate
sharded "service". This service doesn't need anything, so it's started
first and stopped last.

While it's not effectively a "migration" notifier, we inherited the name
from Cassandra and renaming it will "scramble neurons in the old-timers'
brains but will make it easier for newcomers" as Avi says.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:19 +03:00
Pavel Emelyanov
074cc0c8ac migration_manager: Helpers for on_before_ notifications
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:27:27 +03:00
Pavel Emelyanov
1992755c72 storage_service: Kill initialization helper from init.cc
The helper just makes further patching more complex, so drop it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:27:27 +03:00
Konstantin Osipov
a665fab306 test.py: introduce BoostTest and virtualize custom boost arguments 2020-01-15 13:37:25 +03:00
Gleb Natapov
51672e5990 paxos: immediately sync commitlog entries for writes made by paxos learn stage 2020-01-15 12:15:42 +02:00
Gleb Natapov
0fc48515d8 paxos: mark paxos table schema as "always sync"
We want all writes to the paxos table to be persisted on storage before
being declared complete.
2020-01-15 12:15:42 +02:00
Gleb Natapov
16e0fc4742 schema: allow schema to be marked as 'always sync to commitlog'
All writes that use this schema will be immediately persisted on
storage.
2020-01-15 12:15:42 +02:00
Gleb Natapov
0ce70c7a04 commitlog: add test for per entry sync mode 2020-01-15 12:15:42 +02:00
Gleb Natapov
29574c1271 database: pass sync flag from db::apply function to the commitlog
Allow upper layers to request a mutation to be persisted on disk before
making the future ready, independent of which mode the commitlog is running in.
2020-01-15 12:15:42 +02:00
Gleb Natapov
e0bc4aa098 commitlog: add sync method to entry_writer
If the method returns true, the commitlog should sync to the file immediately
after writing the entry and wait for the flush to complete before returning.
2020-01-15 12:15:42 +02:00
Piotr Sarna
9aab75db60 alternator: clean up single value rjson comparator
The comparator is refreshed to ensure the following:
 - null compares less to all other types;
 - null, true and false are comparable against each other,
   while other types are only comparable against themselves and null.

Comparing mixed types is not currently reachable from the alternator
API, because it's only used for sets, which can only use
strings, binary blobs and numbers - thus, no new pytest cases are added.

Fixes #5454
2020-01-15 10:57:49 +02:00
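The ordering rules listed above can be sketched with a hypothetical kind-based comparator. The real code compares rjson values and their payloads; this only captures the type rules, and the false-before-true ordering is an assumption.

```cpp
#include <stdexcept>

// null < everything; null/false/true are mutually comparable; other
// types compare only against themselves or null.
enum class kind { null_, false_, true_, number, string };

int compare(kind a, kind b) {
    if (a == kind::null_ || b == kind::null_ ||
        (a <= kind::true_ && b <= kind::true_)) {
        // null, false, true are ordered by the enum: null < false < true
        return a < b ? -1 : (a == b ? 0 : 1);
    }
    if (a != b) {
        throw std::invalid_argument("types are not comparable");
    }
    return 0; // same non-scalar kind: would compare the payloads here
}
```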
Juliusz Stasiewicz
d87d01b501 storage_proxy: intercept rpc::closed_error if counter leader is down (#5579)
When counter mutation is about to be sent, a leader is elected, but
if the leader fails after election, we get `rpc::closed_error`. The
exception propagates high up, causing all connections to be dropped.

This patch intercepts `rpc::closed_error` in `storage_proxy::mutate_counters`
and translates it to `mutation_write_failure_exception`.

References #2859
2020-01-15 09:56:45 +01:00
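The translation described in the patch can be sketched as follows. The exception types here are simplified stand-ins for seastar's `rpc::closed_error` and Scylla's `mutation_write_failure_exception`, which carry more state.

```cpp
#include <stdexcept>

struct closed_error : std::runtime_error {
    closed_error() : std::runtime_error("connection closed") {}
};
struct mutation_write_failure : std::runtime_error {
    mutation_write_failure() : std::runtime_error("write failure") {}
};

// Sketch of the fix: a transport error to a dead counter leader is
// reported as a write failure instead of propagating up and tearing
// down all client connections.
void mutate_counters(bool leader_is_down) {
    try {
        if (leader_is_down) {
            throw closed_error(); // what rpc raises when the peer died
        }
        // ... forward the counter mutation to the elected leader ...
    } catch (const closed_error&) {
        throw mutation_write_failure();
    }
}
```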
Konstantin Osipov
a351ea57d5 test.py: sort tests within a suite, and sort suites
This makes it easier to navigate the test artefacts.

No need to sort suites since they are already
stored in a dict.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
ba87e73f8e test.py: add a basic CQL test 2020-01-15 11:41:19 +03:00
Konstantin Osipov
44d31db1fc test.py: add CQL .reject files to gitignore
To avoid accidental commit, add .reject files to .gitignore
2020-01-15 11:41:19 +03:00
Konstantin Osipov
4f64f0c652 test.py: print a colored unidiff in case of test failure
Print a colored unidiff between result and reject files in case of test
failure.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
d3f9e64028 test.py: add CqlTestSuite to run CQL tests
Run the test and compare results. Manage temporary
and .reject files.

Now that there are CQL tests, improve logging.

run_test success no longer means test success.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
b114bfe0bd test.py: initial import of CQL test driver, cql_repl
cql_repl is a simple program which reads CQL from stdin,
executes it, and writes results to stdout.

It supports --input, --output and --log options.
--log is directed to cql_test.log by default,
--input is stdin by default,
--output is stdout by default.

The result set output is printed with a basic
JSON visitor.
2020-01-15 11:41:16 +03:00
Konstantin Osipov
0ec27267ab test.py: remove custom colors and define a color palette
Using a standard Python module improves readability,
and allows using colors easily in other output.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
0165413405 test.py: split test output per test mode
Store test temporary files and logs in ${testdir}/${mode}.
Remove --jenkins and --xunit, and always write XML
files at a predefined location: ${testdir}/${mode}/xml/.

Use .xunit.xml extension for tests which XML output is
in xunit format, and junit.xml for an accumulated output
of all non-boost tests in junit format.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
4095ab08c8 test.py: remove tests_to_run
Avoid storing each test twice; use the per-suite test
lists to construct a global iterable.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
169128f80b test.py: virtualize Test.run(), to introduce CqlTest.Run next 2020-01-15 10:53:24 +03:00
Konstantin Osipov
d05f6c3cc7 test.py: virtualize test search pattern per TestSuite
CQL tests have .cql extension, while unit tests
have .cc.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
abcc182ab3 test.py: virtualize write_xunit_report()
Make sure any non-boost test can participate in the report.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
18aafacfad test.py: ensure print_summary() is agnostic of test type
Introduce a virtual Test.print_summary() to print
a failed test summary.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
21fbe5fa81 test.py: tidy up print_summary()
Now that we have tabular output, make print_summary()
more concise.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
c171882b51 test.py: introduce base class Test for CQL and Unit tests 2020-01-15 10:53:24 +03:00
Konstantin Osipov
fd6897d53e test.py: move the default arguments handling to UnitTestSuite
Move UnitTeset default seastar argument handling to UnitTestSuite
(cleanup).
2020-01-15 10:53:24 +03:00
Konstantin Osipov
d3126f08ed test.py: move custom unit test command line arguments to suite.yaml
Load the command line arguments, if any, from suite.yaml, rather
than keep them hard-coded in test.py.

This allows the operations team to have easier access to these.

Note I had to sacrifice dynamic smp count for mutation_reader_test
(the new smp count is fixed at 3) since this is part
of test configuration now.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
ef6cebcbd2 test.py: move command line argument processing to UnitTestSuite 2020-01-15 10:53:24 +03:00
Konstantin Osipov
4a20617be3 test.py: introduce add_test(), which is suite-specific 2020-01-15 10:53:24 +03:00
Konstantin Osipov
7e10bebcda test.py: move long test list to suite.yaml
Use suite.yaml for long tests
2020-01-15 10:53:24 +03:00
Konstantin Osipov
32ffde91ba test.py: move test id assignment to TestSuite
Going forward finding and creating tests will be
a responsibility of TestSuite, so the id generator
needs to be shared.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
b5b4944111 test.py: move repeat handling to TestSuite
This way we can avoid iterating over all tests
to handle --repeat.
Besides, going forward the tests will be stored
in two places: in the global list of all tests,
for the runner, and per suite, for suite-based
reporting, so it's easier if TestSuite
is fully responsible for finding and adding tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
34a1b49fc3 test.py: move add_test_list() to TestSuite 2020-01-15 10:53:24 +03:00
Konstantin Osipov
44e1c4267c test.py: introduce test suites
- UnitTestSuite - for test/unit tests
- BoostTestSuite - a tweak on UnitTestSuite, with options
  to log xml test output to a dedicated file
2020-01-15 10:53:24 +03:00
Konstantin Osipov
eed3201ca6 test.py: use path, rather than test kind, for search pattern
Going forward there may be multiple suites of the same kind.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
f95c97667f test.py: support arbitrary number of test suites
Scan entire test/ for folders that contain suite.yaml,
and load tests from these folders. Skip the rest.

Each folder with a suite.yaml is expected to have a valid
suite configuration in the yaml file.

A suite is a folder with test of the same type. E.g.
it can be a folder with unit tests, boost tests, or CQL
tests.

The harness will use suite.yaml to create an appropriate
suite test driver, to execute tests in different formats.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
c1f8169cd4 test.py: add suite.yaml to boost and unit tests
The plan is to move suite-specific settings to the
configuration file.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
ec9ad04c8a test.py: move 'success' to TestUnit class
There will be other success attributes: program return
status 0 doesn't mean the test is successful for all tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
b4aa4d35c3 test.py: save test output in tmpdir
It is handy to have it so that a reference of a failed
test is available without re-running it.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
f4efe03ade test.py: always produce xml output, derive output paths from tmpdir
It reduces the number of configurations to re-test when test.py is
modified, and simplifies usage of test.py in build tools, since you no
longer need to bother with extra arguments.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
d2b546d464 test.py: output job count in the log 2020-01-15 10:53:24 +03:00
Konstantin Osipov
233f921f9d test.py: make test output brief&tabular
New format:

% ./test.py --verbose --mode=release
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[1/111]   boost/UUID_test                                    release  [ PASS ]
[2/111]   boost/enum_set_test                                release  [ PASS ]
[3/111]   boost/like_matcher_test                            release  [ PASS ]
[4/111]   boost/observable_test                              release  [ PASS ]
[5/111]   boost/allocation_strategy_test                     release  [ PASS ]
^C
% ./test.py foo
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[3/3]     unit/memory_footprint_test                          debug   [ PASS ]
------------------------------------------------------------------------------
2020-01-15 10:53:24 +03:00
Konstantin Osipov
879bea20ab test.py: add a log file
Going forward I'd like to make terminal output brief&tabular,
but some test details are necessary to preserve so that a failure
is easy to debug. This information now goes to the log file.

- open and truncate the log file on each harness start
- log options of each invoked test in the log, so that
  a failure is easy to reproduce
- log test result in the log

Since tests are run concurrently, having an exact
trace of concurrent execution also helps
debugging flaky tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
cbee76fb95 test.py: gitignore the default ./test.py tmpdir, ./testlog 2020-01-15 10:53:24 +03:00
Konstantin Osipov
1de69228f1 test.py: add --tmpdir
It will be used for test log files.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
caf742f956 test.py: flake8 style fix 2020-01-15 10:53:24 +03:00
Konstantin Osipov
dab364c87d test.py: sort imports 2020-01-15 10:53:24 +03:00
Konstantin Osipov
7ec4b98200 test.py: make name a positional argument.
Accept multiple test names, treat test name
as a substring, and if the same name is given
multiple times, run the test multiple times.
2020-01-15 10:53:24 +03:00
Dejan Mircevski
bb2e04cc8b alternator: Improve comments on comparators
Some comparator methods in conditions.cc use unexpected operators;
explain why.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-01-14 22:25:55 +02:00
Tomasz Grabiec
c8a5a27bd9 Merge "storage_service: Move load_broadcaster away" from Pavel E.
The storage_service struct is a collection of diverse things,
most of them required only on start and on stop and/or running
on shard 0 (but it is nonetheless sharded).

As a part of clearing up this structure and the inter-component
dependencies it generates, here's the sanitation of load_broadcaster.
2020-01-14 19:26:06 +01:00
Calle Wilund
313ed91ab0 cdc: Listen for migration callbacks on all shards
Fixes #5582

... but only populate the log on shard 0.

Migration manager callbacks are slightly asymmetric. Notifications
for pre-create/update mutations are sent only on the initiating shard
(necessary, because we consider the mutations mutable),
but "created" callbacks are sent on all shards (immutable).

We must subscribe on all shards, but still do the population of the cdc
table only once, otherwise we can either miss a table creation or
populate it more than once.

v2:
- Add test case
Message-Id: <20200113140524.14890-1-calle@scylladb.com>
2020-01-14 16:35:41 +01:00
Avi Kivity
2138657d3a Update seastar submodule
* seastar 36cf5c5ff0...3f3e117de3 (16):
  > memcached: don't use C++17-only std::optional
  > reactor: Comment why _backend is assigned in constructor body
  > log: restore --log-to-stdout for backward compatibility
  > used_size.hh: Include missing headers
  > core: Move some code from reactor.cc to future.cc
  > future-util: move parallel_for_each to future-util.cc
  > task: stop wrapping tasks with unique_ptr
  > Merge "Setup timer signal handler in backend constructor" from Pavel
Fixes #5524
  > future: avoid a branch in future's move constructor if type is trivial
  > utils: Expose used_size
  > stream: Call get_future early
  > future-util: Move parallel_for_each_state code to a .cc
  > memcached: log exceptions
  > stream: Delete dead code
  > core: Turn pollable_fd into a simple proxy over pollable_fd_state.
  > Merge "log to std::cerr" from Benny
2020-01-14 16:56:25 +02:00
Pavel Emelyanov
e1ed8f3f7e storage_service: Remove _shadow_token_metadata
This is the part of de-bloating storage_service.

The field in question is used to temporarily keep the _token_metadata
value during shard-wide replication. There's no need to have it as a
class member; any "local" copy is enough.

Also, as the size of token_metadata is huge, and invoke_on_all()
copies the function for each shard, keep one local copy of the metadata
using do_with() and pass it into the invoke_on_all() by reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Reviewed-by:  Asias He <asias@scylladb.com>
Message-Id: <20200113171657.10246-1-xemul@scylladb.com>
2020-01-14 16:29:10 +02:00
Rafael Ávila de Espíndola
054f5761a7 types: Refactor code into a serialize_varint helper
This is a bit cleaner and avoids a boost::multiprecision::cpp_int copy
while serializing a decimal.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200110221422.35807-1-espindola@scylladb.com>
2020-01-14 16:28:27 +02:00
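A sketch of varint serialization in the CQL wire format (big-endian two's complement, minimal length). This is an illustration only; the actual helper operates on `boost::multiprecision::cpp_int` and writes into a serialization buffer.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Serialize an integer as CQL "varint": big-endian two's complement
// using the minimal number of bytes.
std::vector<uint8_t> serialize_varint(int64_t v) {
    std::vector<uint8_t> out;
    // Peel off bytes little-endian first, stopping once the remaining
    // bytes would be pure sign extension.
    while (true) {
        uint8_t byte = static_cast<uint8_t>(v & 0xff);
        int64_t rest = v >> 8; // arithmetic shift keeps the sign
        out.push_back(byte);
        bool sign = byte & 0x80;
        if ((rest == 0 && !sign) || (rest == -1 && sign)) {
            break;
        }
        v = rest;
    }
    std::reverse(out.begin(), out.end()); // big-endian on the wire
    return out;
}
```

For example, 128 needs a leading zero byte (0x00 0x80) so it is not read back as -128, while -1 is the single byte 0xff.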
Avi Kivity
6c84dd0045 cql3: update_statement: do not set query option always_return_static_content for list read-before-write
The query option always_return_static_content was added for lightweight
transactions in commits e0b31dd273 (infrastructure) and 65b86d155e
(actual use). However, the flag was added unconditionally to
update_parameters::options. This caused it to be set for list
read-modify-write operations, not just for lightweight transactions.
This is a little wasteful, and worse, it breaks compatibility as old
nodes do not understand the always_return_static_content flag and
complain when they see it.

To fix, remove the always_return_static_content from
update_parameters::options and only set it from compare-and-swap
operations that are used to implement lightweight transactions.

Fixes #5593.

Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200114135133.2338238-1-avi@scylladb.com>
2020-01-14 16:15:20 +02:00
Hagit Segev
ef88e1e822 CentOS RPMs: Remove target to enable general centos. 2020-01-14 14:31:03 +02:00
Alejo Sanchez
6909d4db42 cql3: BYPASS CACHE query counter
This patch is the first part of requested full scan metrics.
It implements a counter of SELECT queries with BYPASS CACHE option.

In scope of #5209

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20200113222740.506610-2-alejo.sanchez@scylladb.com>
2020-01-14 12:19:00 +02:00
Rafael Ávila de Espíndola
dca1bc480f everywhere: Use serialized(foo) instead of data_value(foo).serialize()
This is just a simple cleanup that reduces the size of another patch I
am working on and is an independent improvement.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114051739.370127-1-espindola@scylladb.com>
2020-01-14 12:17:12 +02:00
Pavel Emelyanov
b9f28e9335 storage_service: Remove dead drain branch
The drain_in_progress variable here is the future that's set by the
drain() operation itself. Its promise is set when the drain() finishes.

The check for this future in the beginning of drain() is pointless.
No two drain()-s can run in parallel because of the run_with_api_lock()
protection. Doing a 2nd drain after a successful 1st one is also
impossible due to the _operation_mode check. A 2nd drain after an
_exceptioned_ (and thus incomplete) 1st one used to deadlock; after
this patch it will try to drain for the 2nd time, but that should be ok.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200114094724.23876-1-xemul@scylladb.com>
2020-01-14 12:07:29 +02:00
Piotr Sarna
36ec43a262 Merge "add table with connected cql clients" from Juliusz
This change introduces system.clients table, which provides
information about CQL clients connected.

The PK is the client's IP address; the CK consists of the outgoing port number
and client_type (which will be extended in the future to thrift/alternator/redis).
The table also supplies shard_id and username. Other columns,
like connection_stage, driver_name, driver_version...,
are currently empty but exist for C* compatibility and future use.

This is an ordinary table (i.e. non-virtual) and it's updated upon
accepting connections. This is also why C*'s request_count column
was not introduced. In case of an abrupt DB stop the table should not persist,
so it is truncated on startup.

Resolves #4820
2020-01-14 10:01:07 +02:00
Avi Kivity
1f46133273 Merge "data: make cell::make_collection() exception safe" from Botond
"
Most of the code in `cell` and the `imr` infrastructure it is built on
is `noexcept`. This means that extra care must be taken to avoid rogue
exceptions as they will bring down the node. The changes introduced by
0a453e5d3a did just that - introduced a rogue `std::bad_alloc` into this
code path by violating an undocumented and unvalidated assumption --
that fragment ranges passed to `cell::make_collection()` are nothrow
copyable and movable.

This series refactors `cell::make_collection()` such that it does not
have this assumption anymore and is safe to use with any range.

Note that the unit test included in this series, that was used to find
all the possible exception sources will not be currently run in any of
our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not
being set. I plan to address this in a followup because setting this
flags fails other tests using the failure injection mechanism. This is
because these tests are normally run with the failure injection disabled
so failures managed to lurk in without anyone noticing.

Fixes: #5575
Refs: #5341

Tests: unit(dev, debug)
"

* 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla:
  test: mutation_test: add exception safety test for large collection serialization
  data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
  utils/fragment_range.hh: introduce fragment_range_view
2020-01-14 10:01:06 +02:00
Nadav Har'El
5b08ec3d2c alternator: error on unsupported ScanIndexForward=false
We do not yet support the ScanIndexForward=false option for reversing
the sort order of a Query operation, as reported in issue #5153.
But even before implementing this feature, it is important that we
produce an error if a user attempts to use it - instead of outright
ignoring this parameter and giving the user wrong results. This is
what this patch does.

Before this patch, the reverse-order query in the xfailing test
test_query.py::test_query_reverse seems to succeed - yet gives
results in the wrong order. With this patch, the query itself fails -
stating that the ScanIndexForward=false argument is not supported.

Refs #5153

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200105113719.26326-1-nyh@scylladb.com>
2020-01-14 10:01:06 +02:00
Pavel Emelyanov
c4bf532d37 storage_service: Fix race in removenode/force_removenode/other
Here's another theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and some other operation.
Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

At this point some other operation may start. Say, drain:

  run_with_api_lock
    _operation_in_progress = "drain"
    storage_service::drain
      ...
      go to sleep somewhere

Now let's go back to the 1st op that wakes up from its sleep.
The code it executes is

    while (!ss._operation_in_progress.empty()) {
        sleep_abortable()
    }

and while the drain is running it will never exit.

However (! and this is the core of the race) should the drain
operation happen _before_ the force_removenode, another check
for _operation_in_progress would have made the latter exit with
the "Operation drain is in progress, try again" message.

Fix this inconsistency by making the check for current operation
every wake-up from the sleep_abortable.

Fixes #5591

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-14 10:01:06 +02:00
Pavel Emelyanov
cc92683894 storage_service: Fix race and deadlock in removenode/force_removenode
Here's a theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and removenode (again)
operations. Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      _force_remove_completion = false
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

! at this point we have _force_remove_completion = false and
_operation_in_progress = <empty>, which opens the following
opportunity for the 3rd removenode:

  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now here's what we have in 2nd and 3rd ops:

1. _operation_in_progress = "removenode" (set by 3rd) prevents the
   force_removenode from exiting its loop
2. _force_remove_completion = false (set by 1st on exit) prevents
   the removenode from waiting on replicating_nodes list

One can start the 4th call with force_removenode, it will proceed and
wake up the 3rd op, but after it we'll have two force_removenode-s
running in parallel and killing each other.

I propose not to set _force_remove_completion to false in removenode,
but to just exit and let the owner of this flag unset it once it gets
control back.
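The proposed flag ownership can be sketched as follows (a toy model with hypothetical names, not the real storage_service): removenode observes _force_remove_completion and aborts without touching it; only force_removenode, the flag's owner, resets it once it regains control.

```cpp
#include <stdexcept>
#include <string>

struct service_sketch {
    bool _force_remove_completion = false;
    std::string _operation_in_progress;

    // Called when removenode wakes up from its replicating_nodes wait.
    void removenode_check_abort() {
        if (_force_remove_completion) {
            // Do NOT reset the flag here: that was the bug. Leave it set
            // so a 3rd removenode cannot sneak in with stale state.
            throw std::runtime_error("removenode aborted");
        }
    }

    // Called by force_removenode once the aborted removenode has exited.
    void force_removenode_done() {
        _force_remove_completion = false;  // the owner unsets its own flag
    }
};
```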

Fixes #5590

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-14 10:01:06 +02:00
Benny Halevy
ff55b5dca3 cql3: functions: limit sum overflow detection to integral types
Other types do not have a wider accumulator at the moment,
and static_cast<accumulator_type>(ret) != _sum evaluates as
false for NaN/Inf floating point values.
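For the integral case, the detection this commit keeps can be sketched like this (illustrative, not the actual aggregate-function code): accumulate into a wide __int128 and check, when producing the result, whether it round-trips through the narrower return type.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

int64_t checked_sum(const std::vector<int64_t>& values) {
    __int128 sum = 0;  // wide accumulator; int64 inputs cannot overflow it
    for (int64_t v : values) {
        sum += v;
    }
    auto ret = static_cast<int64_t>(sum);
    if (static_cast<__int128>(ret) != sum) {
        // This round-trip check is meaningful for integrals; for
        // float/double with NaN/Inf it would evaluate to false,
        // hence the commit limits detection to integral types.
        throw std::overflow_error("sum does not fit in bigint");
    }
    return ret;
}
```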

Fixes #5586

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>
2020-01-14 10:01:06 +02:00
Avi Kivity
e3310201dd atomic_cell_or_collection: type-aware print atomic_cell or collection components
Now that atomic_cell_view and collection_mutation_view have
type-aware printers, we can use them in the type-aware atomic_cell_or_collection
printer.
Message-Id: <20191231142832.594960-1-avi@scylladb.com>
2020-01-14 10:01:06 +02:00
Avi Kivity
931b196d20 mutation_partition: row: resolve column name when in schema-aware printer
Instead of printing the column id, print the full column name.
Message-Id: <20191231142944.595272-1-avi@scylladb.com>
2020-01-14 10:01:06 +02:00
Nadav Har'El
4aa323154e merge: Pretty print canonical_mutation objects
Merged pull request https://github.com/scylladb/scylla/pull/5533
from Avi Kivity:

canonical_mutation objects are used for schema reconciliation, which is a
fragile area and thus deserves some debugging help.

This series makes canonical_mutation objects printable.
2020-01-14 10:01:06 +02:00
Takuya ASADA
5241deda2d dist: nonroot: fix CLI tool path for nonroot (#5584)
The CLI tool path is hardcoded; we need to specify the correct path for nonroot installs.
2020-01-14 10:01:06 +02:00
Nadav Har'El
1511b945f8 merge: Handle multiple regular base columns in view pk
Merged patch series from Piotr Sarna:

"Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.

This series is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.

Fixes #5006
Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)"

Piotr Sarna (3):
  db,view: fix checking if partition key is empty
  view: handle multiple regular base columns in view pk
  test: add a case for multiple base regular columns in view key

 alternator-test/test_gsi.py              |  1 -
 view_info.hh                             |  5 +-
 cql3/statements/alter_table_statement.cc |  2 +-
 db/view/view.cc                          | 77 ++++++++++++++----------
 mutation_partition.cc                    |  2 +-
 test/boost/cql_query_test.cc             | 58 ++++++++++++++++++
 6 files changed, 109 insertions(+), 36 deletions(-)
2020-01-14 10:01:00 +02:00
Nadav Har'El
f16e3b0491 merge: bouncing lwt request to an owning shard
Merged patch series from Gleb Natapov:

"LWT is much more efficient if a request is processed on the shard that
owns the token for the request, because otherwise processing will
bounce to the owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running LWT. It works by
returning an error from the LWT code when the shard is incorrect,
specifying the shard the request should be moved to. The error is
processed by the transport code, which jumps to the correct shard and
re-processes the incoming message there.

The nicer way to achieve the same would be to jump to the right shard
inside storage_proxy::cas(), but unfortunately with the current
implementation of the modification statements they are unusable on
a shard different from the one where they were created, so the jump
must happen before a modification statement for a cas() is created.
When we fix our CQL code to be more cross-shard friendly this can be
reworked to do the jump in the storage_proxy."

Gleb Natapov (4):
  transport: change make_result to take a reference to cql result
    instead of shared_ptr
  storage_service: move start_native_transport into a thread
  lwt: Process lwt request on an owning shard
  lwt: drop invoke_on in paxos_state prepare and accept

 auth/service.hh                           |   5 +-
 message/messaging_service.hh              |   2 +-
 service/client_state.hh                   |  30 +++-
 service/paxos/paxos_state.hh              |  10 +-
 service/query_state.hh                    |   6 +
 service/storage_proxy.hh                  |   2 +
 transport/messages/result_message.hh      |  20 +++
 transport/messages/result_message_base.hh |   4 +
 transport/request.hh                      |   4 +
 transport/server.hh                       |  25 ++-
 cql3/statements/batch_statement.cc        |   6 +
 cql3/statements/modification_statement.cc |   6 +
 cql3/statements/select_statement.cc       |   8 +
 message/messaging_service.cc              |   2 +-
 service/paxos/paxos_state.cc              |  48 ++---
 service/storage_proxy.cc                  |  47 ++++-
 service/storage_service.cc                | 120 +++++++------
 test/boost/cql_query_test.cc              |   1 +
 thrift/handler.cc                         |   3 +
 transport/messages/result_message.cc      |   5 +
 transport/server.cc                       | 203 ++++++++++++++++------
 21 files changed, 377 insertions(+), 180 deletions(-)
2020-01-14 09:59:59 +02:00
Botond Dénes
300728120f test: mutation_test: add exception safety test for large collection serialization
Use `seastar::memory::local_failure_injector()` to inject all possible
`std::bad_alloc`:s into the collection serialization code path. The test
just checks that there are no `std::abort()`:s caused by any of the
exceptions.

The test will not be run if `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` is
not defined.
2020-01-13 16:53:35 +02:00
Botond Dénes
3ec889816a data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
`cell::make_collection()` assumes that all ranges passed to it are
nothrow copyable and movable views. This is not guaranteed, is not
expressed in the interface and is not mentioned in the comments either.
The changes introduced by 0a453e5d3a to collection serialization, making
it use fragmented buffers, fell into this trap, as it passes
`bytes_ostream` to `cell::make_collection()`. `bytes_ostream`'s copy
constructor allocates and hence can throw, triggering an
`std::terminate()` inside `cell::make_collection()` as the latter is
noexcept.

To solve this issue, non-nothrow copyable and movable ranges are now
wrapped in a `fragment_range_view` to make them so.
`cell::make_collection()` already requires callers to keep alive the
range for the duration of the call, so this does not introduce any new
requirements to the callers. Additionally, to avoid any future
accidents, do not accept temporaries for the `data` parameter. We don't
ever want to move this param anyway, we will either have a trivially
copyable view, or a potentially heavy-weight range that we will create a
trivially copyable view of.
2020-01-13 16:53:35 +02:00
Botond Dénes
b52b4d36a2 utils/fragment_range.hh: introduce fragment_range_view
A lightweight, trivially copyable and movable view for fragment ranges.
Allows for uniform treatment of all kinds of ranges, i.e. treating all
of them as a view. Currently `fragment_range.hh` provides lightweight,
view-like adaptors for empty and single-fragment ranges (`bytes_view`). To
allow code to treat owning multi-fragment ranges the same way as the
former two, we need a view for the latter as well -- this is
`fragment_range_view`.
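The idea can be sketched as a tiny non-owning wrapper (an assumed shape, not Scylla's actual fragment_range_view interface): the view is trivially copyable and noexcept-constructible regardless of how heavy the wrapped range is, and the caller keeps the range alive for the duration of use.

```cpp
#include <string>
#include <type_traits>
#include <vector>

template <typename Range>
class range_view_sketch {
    const Range* _range;  // non-owning: the caller keeps the range alive
public:
    explicit range_view_sketch(const Range& r) noexcept : _range(&r) {}
    auto begin() const { return _range->begin(); }
    auto end() const { return _range->end(); }
};

// Copying the view never allocates, even though copying the underlying
// std::vector<std::string> could throw.
static_assert(std::is_trivially_copyable<
                  range_view_sketch<std::vector<std::string>>>::value,
              "views must be trivially copyable");
```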
2020-01-13 16:52:59 +02:00
Calle Wilund
75f2b2876b cdc: Remove free function for mutation augmentation 2020-01-13 13:18:55 +00:00
Calle Wilund
3eda3122af cdc: Move mutation augment from cql3::modification_statement to storage proxy
Using the attached service object
2020-01-13 13:18:55 +00:00
Juliusz Stasiewicz
27dfda0b9e main/transport: using the infrastructure of system.clients
Resolves #4820. The startup path in main.cc now cleans up the
system.clients table if it exists. Also, server.cc now calls
functions that notify about CQL clients connecting/disconnecting.
2020-01-13 14:07:04 +01:00
Pavel Emelyanov
148da64a7e storage_service: Move load_broadcaster away
This simplifies the storage_service API and fixes the
complaint about shared_ptr usage instead of unique_ptr.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-13 13:55:09 +03:00
Pavel Emelyanov
b6e1e6df64 misc_services: Introduce load_meter
There's a lonely get_load_map() call on storage_service that
needs only the load broadcaster and always runs on shard 0.

The next patch will move all of this into its own non-sharded helper
container; this is preparation for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-13 13:53:08 +03:00
Gleb Natapov
5753ab7195 lwt: drop invoke_on in paxos_state prepare and accept
Since LWT requests now run on the owning shard, there is no longer
a need for a cross-shard call at the paxos_state level. RPC calls may
still arrive at the wrong shard, so we need to make a cross-shard call there.
2020-01-13 10:26:02 +02:00
Gleb Natapov
d28dd4957b lwt: Process lwt request on an owning shard
LWT is much more efficient if a request is processed on the shard that
owns the token for the request, because otherwise processing will
bounce to the owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running LWT. It works by
returning an error from the LWT code when the shard is incorrect,
specifying the shard the request should be moved to. The error is
processed by the transport code, which jumps to the correct shard and
re-processes the incoming message there.
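The bounce pattern described here can be sketched as follows (names and types are illustrative stand-ins, not the actual Scylla transport/Paxos code): the statement layer reports the owning shard via an error, and the dispatch layer re-runs the request there.

```cpp
#include <stdexcept>

// Hypothetical error type carrying the shard that owns the token.
struct wrong_shard_error {
    unsigned owner;
};

unsigned owning_shard(int token, unsigned shard_count) {
    return static_cast<unsigned>(token) % shard_count;
}

// Stand-in for the LWT code path: refuses to run on the wrong shard.
int process_cas(int token, unsigned this_shard, unsigned shard_count) {
    unsigned owner = owning_shard(token, shard_count);
    if (owner != this_shard) {
        throw wrong_shard_error{owner};
    }
    return token * 2;  // placeholder for the actual Paxos round result
}

// Stand-in for the transport code: catches the error and re-processes the
// message on the indicated shard (in Scylla this would be a cross-shard
// submit; simulated here by a plain re-invocation).
int dispatch(int token, unsigned this_shard, unsigned shard_count) {
    try {
        return process_cas(token, this_shard, shard_count);
    } catch (const wrong_shard_error& e) {
        return process_cas(token, e.owner, shard_count);
    }
}
```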
2020-01-13 10:26:02 +02:00
Piotr Sarna
3853594108 alternator-test: turn off TLS self-signed verification
Two test cases did not ignore TLS self-signed warnings, which are used
locally for testing HTTPS.

Fixes #5557

Tests(test_health, test_authorization)
Message-Id: <8bda759dc1597644c534f94d00853038c2688dd7.1578394444.git.sarna@scylladb.com>
2020-01-10 15:31:30 +02:00
Rafael Ávila de Espíndola
5313828ab8 cql3: Fix indentation
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200109025855.10591-2-espindola@scylladb.com>
2020-01-09 10:42:55 +02:00
Rafael Ávila de Espíndola
4da6dc1a7f cql3: Change a lambda capture order to match another
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200109025855.10591-1-espindola@scylladb.com>
2020-01-09 10:42:49 +02:00
Avi Kivity
6d454d13ac db/schema_tables: make gratuitous generic lambdas in do_merge_schema() concrete
Those gratuitous lambdas make life harder for IDE users by hiding the actual
types from the IDEs.
Message-Id: <20200107154746.1918648-1-avi@scylladb.com>
2020-01-08 17:43:18 +01:00
Avi Kivity
454074f284 Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz
"
The original fix (10f6b125c8) didn't take into account the case where
a memtable had a failed flush (Refs flush) but is not flushable
because it's not the latest in the memtable list. If that happens, it
means no other memtable is flushable either, because otherwise it
would have been picked due to evictable_occupancy(). Therefore the
right action is to not flush anything in this case.

Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.

Fixes #3717
"

* tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla:
  database: Avoid OOMing with flush continuations after failed memtable flush
  lsa: Introduce operator bool() to occupancy_stats
  lsa: Expose region_impl::evictable_occupancy in the region class
2020-01-08 16:58:54 +02:00
Gleb Natapov
feed544c5d paxos: fix truncation time checking during learn stage
The comparison is done in milliseconds, not microseconds.

Fixes #5566

Message-Id: <20200108094927.GN9084@scylladb.com>
2020-01-08 14:37:07 +01:00
Gleb Natapov
2832f1d9eb storage_service: move start_native_transport into a thread
The code runs only once and it is simpler if it runs in a seastar thread.
2020-01-08 14:57:57 +02:00
Gleb Natapov
7fb2e8eb9f transport: change make_result to take a reference to cql result instead of shared_ptr 2020-01-08 14:57:57 +02:00
Avi Kivity
0bde5906b3 Merge "cql3: detect and handle int overflow in aggregate functions #5537" from Benny
"
Fix overflow handling in sum() and avg().

sum:
 - aggregated into __int128
 - detect overflow when computing result and log a warning if found

avg:
 - fix division function to divide the accumulator type _sum (__int128 for integers) by _count

Add unit tests for both cases

Test:
  - manual test against Cassandra 3.11.3 to make sure the results in the scylla unit test agree with it.
  - unit(dev), cql_query_test(debug)

Fixes #5536
"

* 'cql3-sum-overflow' of https://github.com/bhalevy/scylla:
  test: cql_query_test: test avg overflow
  cql3: functions: protect against int overflow in avg
  test: cql_query_test: test sum overflow
  cql3: functions: detect and handle int overflow in sum
  exceptions: sort exception_code definitions
  exceptions: define additional cassandra CQL exception codes
2020-01-08 10:39:38 +02:00
Avi Kivity
d649371baa Merge "Fix crash on SELECT SUM(udf(...))" from Rafael
"
We were failing to start a thread when the UDF call was nested in an
aggregate function call like SUM.
"

* 'espindola/fix-sum-of-udf' of https://github.com/espindola/scylla:
  cql3: Fix indentation
  cql3: Add missing with_thread_if_needed call
  cql3: Implement abstract_function_selector::requires_thread
  remove make_ready_future call
2020-01-08 10:25:42 +02:00
Benny Halevy
dafbd88349 query: initialize read_command timestamp to now
This was initialized to api::missing_timestamp but
should be set to either a client-provided timestamp or
the server's.

Unlike write operations, this timestamp need not be unique
like the one generated by client_state::get_timestamp.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-2-bhalevy@scylladb.com>
2020-01-08 10:19:07 +02:00
Benny Halevy
39325cf297 storage_proxy: fix int overflow in service::abstract_read_executor::execute
exec->_cmd->read_timestamp may be initialized by default to api::min_timestamp,
causing:
  service/storage_proxy.cc:3328:116: runtime error: signed integer overflow: 1577983890961976 - -9223372036854775808 cannot be represented in type 'long int'
  Aborting on shard 1.

Do not optimize cross-dc repair if read_timestamp is missing (or just negative).
We're interested in reads that happen within write_timeout of a write.
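The guard can be sketched as follows (the condition's shape is assumed from the message, not copied from storage_proxy.cc): bail out of the optimization for non-positive timestamps so the subtraction can never overflow a signed 64-bit integer.

```cpp
#include <cstdint>

// Returns whether the read is recent enough (relative to the write
// timeout) to enable the cross-dc repair optimization.
bool read_within_write_timeout(int64_t now_us, int64_t read_timestamp_us,
                               int64_t write_timeout_us) {
    if (read_timestamp_us <= 0) {
        // Missing (api::missing_timestamp) or bogus timestamp: skip the
        // optimization instead of risking signed overflow in now - ts.
        return false;
    }
    return now_us - read_timestamp_us < write_timeout_us;
}
```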

Fixes #5556

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-1-bhalevy@scylladb.com>
2020-01-08 10:18:59 +02:00
Raphael S. Carvalho
390c8b9b37 sstables: Move STCS implementation to source file
A header-only implementation can potentially create problems with duplicate symbols.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200107154258.9746-1-raphaelsc@scylladb.com>
2020-01-08 09:55:35 +02:00
Benny Halevy
20a0b1a0b6 test: cql_query_test: test avg overflow
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:50:50 +02:00
Benny Halevy
1c81422c1b cql3: functions: protect against int overflow in avg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
9053ef90c7 test: cql_query_test: test sum overflow
Add unit tests for summing up int's and bigint's
with possible handling of overflow.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
e97a111f64 cql3: functions: detect and handle int overflow in sum
Detect integer overflow in cql sum functions and throw an error.
Note that Cassandra quietly truncates the sum if it doesn't fit
in the input type, but we prefer to break compatibility in this
case. See https://issues.apache.org/jira/browse/CASSANDRA-4914?focusedCommentId=14158400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14158400

Fixes #5536

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
98260254df exceptions: sort exception_code definitions
Be compatible with Cassandra source.
It's easier to maintain this way.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:21 +02:00
Benny Halevy
30d0f1df75 exceptions: define additional cassandra CQL exception codes
As of e9da85723a

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:40:57 +02:00
Rafael Ávila de Espíndola
282228b303 cql3: Fix indentation
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola
4316bc2e18 cql3: Add missing with_thread_if_needed call
This fixes an assert when doing sum(udf(...)).

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola
d301d31de0 cql3: Implement abstract_function_selector::requires_thread
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:24 -08:00
Rafael Ávila de Espíndola
dc9b3b8ff2 remove make_ready_future call
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:10:27 -08:00
Calle Wilund
9f6b22d882 cdc: Assign self to storage proxy object 2020-01-07 12:01:58 +00:00
Calle Wilund
fc5904372b storage_proxy: Add (optional) cdc service object pointer member
The cdc service is assigned from outside, post construction, mainly
because of the chicken-and-egg dependencies in main startup. It would
be nice to have it unconditionally, but this is workable.
2020-01-07 12:01:58 +00:00
Calle Wilund
d6003253dd storage_proxy: Move mutate_counters to private section
It is (and shall) only be called from inside storage proxy,
and we would like this to be reflected in the interface
so our eventual moving of cdc logic into the mutate call
chains become easier to verify and comprehend.
2020-01-07 12:01:58 +00:00
Calle Wilund
b6c788fccf cdc: Add augmentation call to cdc service
To eventually replace the free function.
The main difference is that this is built to both handle batches
correctly and to eventually allow hanging the cdc object on the
storage proxy, with caches on the cdc object.
2020-01-07 12:01:58 +00:00
Piotr Sarna
04dc8faec9 test: add a case for multiple base regular columns in view key
The test case checks that having two base regular columns
in the materialized view key (not obtainable via CQL)
still works fine when values are inserted or deleted.
If TTL were involved and these columns had different expiration
rules, the case would be more complicated, but it's not possible
for a user to reach that case - neither with CQL, nor with alternator.
2020-01-07 12:19:06 +01:00
Piotr Sarna
155a47cc55 view: handle multiple regular base columns in view pk
Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.
This patch is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.

Fixes #5006

Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)

Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>
2020-01-07 12:18:39 +01:00
Avi Kivity
6e0a073b2e mutation_partition: use type-aware printing of the clustering row
Now that position_in_partition_view has type-aware printing, use it
to provide a human readable version of clustering keys.
Message-Id: <20191231151315.602559-2-avi@scylladb.com>
2020-01-07 12:17:11 +01:00
Avi Kivity
488c42408a position_in_partition_view: add type-aware printer
If the position_in_partition_view represents a clustering key,
we can now see it with the clustering key decoded according to
the schema.
Message-Id: <20191231151315.602559-1-avi@scylladb.com>
2020-01-07 12:15:09 +01:00
Piotr Sarna
54315f89cd db,view: fix checking if partition key is empty
The previous implementation did not take into account that a column
in a partition key might exist in a mutation but in a DEAD state,
i.e. if it has been deleted. There are no regressions for CQL, while
for alternator and its capability of having two regular base columns
in a view key, this additional check must be performed.
2020-01-07 12:05:36 +01:00
Avi Kivity
3a3c20d337 schema_tables: de-templatize diff_table_or_view()
This reduces code bloat and makes the code friendlier for IDEs, as the
IDE now understands the type of create_schema.
Message-Id: <20191231134803.591190-1-avi@scylladb.com>
2020-01-07 11:56:54 +01:00
Avi Kivity
e5e42672f5 sstables: reduce bloat from sstables::write_simple()
sstables::write_simple() has quite a lot of boilerplate
which gets replicated into each template instance. Move
all of that into a non-template do_write_simple(), leaving
only things that truly depend on the component being written
in the template, and encapsulating them with a
noncopyable_function.

An explicit template instantiation was added, since this
is used in a header file. Before, it likely worked by
accident and stopped working when the template became
small enough to inline.

Tests: unit (dev)
Message-Id: <20200106135453.1634311-1-avi@scylladb.com>
2020-01-07 11:56:11 +01:00
Avi Kivity
8f7f56d6a0 schema_tables: make gratuitous generic lambda in create_tables_from_partitions() concrete
The generic lambda made IDE searches for create_table_from_table_row() fail.
Message-Id: <20191231135210.591972-1-avi@scylladb.com>
2020-01-07 11:49:10 +01:00
Avi Kivity
92fd83d3af schema_tables: make gratuitous generic lambda in create_table_from_name() concrete
The lambda made IDE searches for read_table_mutations fail.
Message-Id: <20191231135103.591741-1-avi@scylladb.com>
2020-01-07 11:48:56 +01:00
Avi Kivity
dd6dd97df9 schema_tables: make gratuitous generic lambda in merge_tables_and_views() concrete
The generic lambda made IDE searches for create_table_from_mutations fail.
Message-Id: <20191231135059.591681-1-avi@scylladb.com>
2020-01-07 11:48:39 +01:00
Avi Kivity
c63cf02745 canonical_mutation: add pretty printing
Add type-aware printing of canonical_mutation objects.
2020-01-07 12:06:31 +02:00
Avi Kivity
e093121687 mutation_partition_view: add virtual visitor
mutation_partition_view now supports a compile-time resolved visitor.
This is performant but results in bloat when the performance is not
needed. Furthermore, the template function that applies the object
to the visitor is private and out-of-line, to reduce compile time.

To allow visitation on mutation_partition_view objects, add a virtual
visitor type and a non-template accept function.

Note: mutation_partition_visitor is very similar to the new type,
but different enough to break the template visitor which is used
to implement the new visitor.

The new visitor will be used to implement pretty printing for
canonical_mutation.
2020-01-07 12:06:31 +02:00
Avi Kivity
75d9909b27 collection_mutation_view: add type-aware pretty printer
Add a way for the user to associate a type with a collection_mutation_view
and get a nice printout.
2020-01-07 12:06:29 +02:00
Rafael Ávila de Espíndola
b80852c447 main: Explicitly allow scylla core dumps
I have not looked into the security reason for disabling it when
a program has file capabilities.

Fixes #5560

[avi: remove extraneous semicolon]
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200106231836.99052-1-espindola@scylladb.com>
2020-01-07 11:15:59 +02:00
Rafael Ávila de Espíndola
07f1cb53ea tests: run with ASAN_OPTIONS='disable_coredump=0:abort_on_error=1'
These are the same options we use in seastar.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200107001513.122238-1-espindola@scylladb.com>
2020-01-07 11:11:49 +02:00
Takuya ASADA
238a25a0f4 docker: fix typo of scylla-jmx script path (#5551)
The path should be /opt/scylladb/jmx, not /opt/scylladb/scripts/jmx.

Fixes #5542
2020-01-07 10:54:16 +02:00
Asias He
401854dbaf repair: Avoid duplicated partition_end write
Consider this:

1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down which calls
   wait_for_writer_done(). A duplicate partition_end is written.

To fix, track the partition_start and partition_end written, avoid
unpaired writes.
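A toy model of the pairing fix (names illustrative, not the actual repair writer): remember whether a partition_start has been written, and let partition_end be a no-op when no partition is open, so a teardown path cannot emit a duplicate.

```cpp
struct repair_writer_sketch {
    bool _partition_open = false;
    int _ends_written = 0;

    void write_partition_start() {
        _partition_open = true;
    }

    void write_partition_end() {
        if (!_partition_open) {
            return;  // e.g. called again from the teardown path:
                     // refuse to write an unpaired partition_end
        }
        _ends_written++;
        _partition_open = false;
    }
};
```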

Backports: 3.1 and 3.2
Fixes: #5527
2020-01-06 14:06:02 +02:00
Eliran Sinvani
e64445d7e5 debian-reloc: Propagate PRODUCT variable to renaming command in debian pkg
commit 21dec3881c introduced
a bug that will cause scylla debian build to fail. This is
because the commit relied on the environment PRODUCT variable
to be exported (and as a result, to propagate to the rename
command that is executed by find in a subshell).
This commit fixes it by explicitly passing the PRODUCT variable
into the rename command.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20200106102229.24769-1-eliransin@scylladb.com>
2020-01-06 12:31:58 +02:00
Asias He
38d4015619 gossiper: Remove HIBERNATE status from dead state
In scylla, the replacing node is set to HIBERNATE status; it is the
only place we use the HIBERNATE status. The replacing node is supposed
to be alive and updating its heartbeat, so it is not supposed to be in
a dead state.

This patch fixes the following problem in replacing.

   1) start n1, n2
   2) n2 is down
   3) start n3 to replace n2, but kill n3 in the middle of the replace
   4) start n4 to replace n2

After step 3 and step 4, the old n3 will stay in gossip forever until a
full cluster shutdown. Note that n3 will only stay in gossip, not in
the system.peers table. Users will see annoying, endless log lines like
the following on all the nodes:

   rpc - client $ip_of_n3:7000: fail to connect: Connection refused

Fixes: #5449
Tests: replace_address_test.py + manual test
2020-01-06 11:47:31 +02:00
Amos Kong
c5ec1e3ddc scylla_ntp_setup: check redhat variant version by parse_version (#5434)
VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7", so
scylla_ntp_setup fails on OEL 7.7 with a ValueError:

- ValueError: invalid literal for int() with base 10: '7.7'

This patch changes redhat_version() to return the version string and
compare it with parse_version().

Fixes #5433

Signed-off-by: Amos Kong <amos@scylladb.com>
2020-01-06 11:43:14 +02:00
Asias He
145fd0313a streaming: Fix map access in stream_manager::get_progress
When the progress is queried, e.g. from nodetool netstats,
the progress info might not be updated yet.

Fix it by checking before accessing the map, to avoid errors like:

std::out_of_range (_Map_base::at)
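The defensive lookup can be sketched like this (a generic stand-in, not the actual stream_manager code): use find() and a default instead of at(), which throws std::out_of_range for a peer whose progress hasn't been recorded yet.

```cpp
#include <map>
#include <string>

// Returns the recorded progress for a peer, or 0 if none exists yet.
long progress_or_zero(const std::map<std::string, long>& progress,
                      const std::string& peer) {
    auto it = progress.find(peer);
    return it == progress.end() ? 0 : it->second;
}
```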

Fixes: #5437
Tests: nodetool_additional_test.py:TestNodetool.netstats_test
2020-01-06 10:31:15 +02:00
Rafael Ávila de Espíndola
98cd8eddeb tests: Run with halt_on_error=1:abort_on_error=1
This depends on the just emailed fixes to undefined behavior in
tests. With this change we should quickly notice if a change
introduces undefined behavior.

Fixes #4054

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>

Message-Id: <20191230222646.89628-1-espindola@scylladb.com>
2020-01-05 17:20:31 +02:00
Rafael Ávila de Espíndola
dc5ecc9630 enum_option_test: Add explicit underlying types to enums
We expect to be able to create variables with out-of-range values, so
these enums need explicit underlying types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200102173422.68704-1-espindola@scylladb.com>
2020-01-05 17:20:31 +02:00
Nadav Har'El
f0d8dd4094 merge: CDC rolling upgrade
Merged pull request https://github.com/scylladb/scylla/pull/5538 from
Avi Kivity and Piotr Jastrzębski.

This series prepares CDC for rolling upgrade. This consists of
reducing the footprint of cdc, when disabled, on the schema, adding
a cluster feature, and redacting the cdc column when transferring
it to other nodes. The latter is needed because we'll want to backport
this to 3.2, which doesn't have canonical_mutations yet.
2020-01-05 17:13:12 +02:00
Gleb Natapov
720c0aa285 commitlog: update last sync timestamp when cycle a buffer
If the in-memory buffer does not have enough space for an incoming
mutation it is written to a file, but the code missed updating the
timestamp of the last sync, so we may sync too often.
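A simplified model of the fix (plain integers for time, hypothetical names): cycling the buffer out to disk now also refreshes the last-sync timestamp, so the periodic sync path doesn't immediately sync again.

```cpp
struct segment_sketch {
    long _last_sync_time = 0;
    int _syncs = 0;

    // Buffer is full: write it out. This counts as a sync as far as
    // durability is concerned, so record the time (the missing update).
    void cycle(long now) {
        _last_sync_time = now;
    }

    // Periodic sync: only sync if enough time has passed.
    void maybe_sync(long now, long period) {
        if (now - _last_sync_time >= period) {
            _syncs++;
            _last_sync_time = now;
        }
    }
};
```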
Message-Id: <20200102155049.21291-9-gleb@scylladb.com>
2020-01-05 16:13:59 +02:00
Gleb Natapov
14746e4218 commitlog: drop segment gate
The code that enters the gate never defers before leaving, so the gate
behaves like a flag. Let's use the existing flag to prohibit adding data to a
closed segment.
Message-Id: <20200102155049.21291-8-gleb@scylladb.com>
2020-01-05 16:13:59 +02:00
Gleb Natapov
f8c8a5bd1f test: fix error reporting in commitlog_test
Message-Id: <20200102155049.21291-7-gleb@scylladb.com>
2020-01-05 16:13:58 +02:00
Gleb Natapov
680330ae70 commitlog: introduce segment::close() function.
Currently the segment closing code is spread over several functions and
activated based on the _closed flag. Make segment closing explicit
by moving all the code into a close() function and calling it where
the _closed flag is set.
Message-Id: <20200102155049.21291-6-gleb@scylladb.com>
2020-01-05 16:13:55 +02:00
Gleb Natapov
a1ae08bb63 commitlog: remove unused segment::flush() parameter
Message-Id: <20200102155049.21291-5-gleb@scylladb.com>
2020-01-05 16:13:55 +02:00
Gleb Natapov
1e15e1ef44 commitlog: cleanup segment sync()
Call cycle() only once.
Message-Id: <20200102155049.21291-4-gleb@scylladb.com>
2020-01-05 16:13:54 +02:00
Gleb Natapov
3d3d2c572e commitlog: move segment shutdown code from sync()
Currently sync() does two completely different things based on the
shutdown parameter. Separate the code into two different functions.
Message-Id: <20200102155049.21291-3-gleb@scylladb.com>
2020-01-05 16:13:54 +02:00
Gleb Natapov
89afb92b28 commitlog: drop superfluous this
Message-Id: <20200102155049.21291-2-gleb@scylladb.com>
2020-01-05 16:13:53 +02:00
Piotr Jastrzebski
95feeece0b scylla_tables: treat empty cdc props as disabled
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
396e35bf20 cdc: add schema_change test for cdc_options
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables generated after CDC was enabled
and a table with CDC enabled was created,
in order to make sure that the digest computed
including the CDC column does not change spuriously either.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
c08e6985cd cdc: allow cluster rolling upgrade
The addition of the cdc column in scylla_tables changes how schema
digests are calculated, and affects the ABI of schema update
messages (adding a column changes other columns' indexes
in frozen_mutation).

To fix this, extend the schema_tables mechanism with support
for the cdc column, and adjust schemas and mutations to remove
that column when sending schemas during upgrade.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
caa0a4e154 tests: disable CDC in schema_change_tests
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
129af99b94 cdc: Return reference from cluster_supports_cdc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
4639989964 cdc: Add CDC_OPTIONS schema_feature
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Avi Kivity
c150f2e5d7 schema_tables, cdc: don't store empty cdc columns in scylla_tables
An empty cdc column in scylla_tables is hashed differently from
a missing column. This causes schema mismatch when a schema is
propagated to another node, because the other node will redact
the schema column completely if the cluster feature isn't enabled,
and an empty value is hashed differently from a missing value.

Store a tombstone instead. Tombstones are removed before
digesting, so they don't affect the outcome.

This change also undoes the changes in 386221da84 ("schema_tables:
 handle 'cdc' options") to schema_change_test
test_merging_does_not_alter_tables_which_didnt_change. That change
enshrined the breakage into the test, instead of fixing the root cause,
which was that we added an extra mutation to the schema (for
cdc options, which were disabled).
2020-01-05 14:36:18 +02:00
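The distinction the commit above relies on can be illustrated with a toy model (hypothetical names; real digests hash frozen mutations, not strings): an empty value contributes to the digest input, a missing value does not, and a tombstone is purged before digesting, so it behaves like a missing value.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy row: column name -> value; nullopt models a tombstone (dead cell).
using row = std::map<std::string, std::optional<std::string>>;

// Returns the canonical byte string that would be fed to the digest.
std::string digest_input(const row& r) {
    std::string d;
    for (const auto& [name, value] : r) {
        if (!value) {
            continue;  // tombstones are removed before digesting
        }
        d += name + "=" + *value + ";";  // an empty value still contributes
    }
    return d;
}
```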
Rafael Ávila de Espíndola
3d641d4062 lua: Use existing cpp_int cast logic
Different versions of boost have different rules for what conversions
from cpp_int to smaller integers are allowed.

We already had a function that worked with all supported versions, but
it was not being used by Lua.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200104041028.215153-1-espindola@scylladb.com>
2020-01-05 12:10:54 +02:00
Rafael Ávila de Espíndola
88b5aadb05 tests: cql_test_env: wait for two futures starting internal services
I noticed this while looking at the crashes next is currently
experiencing.

While I have no idea if this fixes the issue, it does avoid broken
future warnings (for no_sharded_instance_exception) in a debug build.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200103201540.65324-1-espindola@scylladb.com>
2020-01-05 12:09:59 +02:00
Avi Kivity
4b8e2f5003 Update seastar submodule
* seastar 0525bbb08...36cf5c5ff (6):
  > memcached: Fix use after free in shutdown
  > Revert "task: stop wrapping tasks with unique_ptr"
  > task: stop wrapping tasks with unique_ptr
  > http: Change exception formating to the generic seastar one
  > Merge "Avoid a few calls to ~exception_ptr" from Rafael
  > tests: fix core generation with asan
2020-01-03 15:48:53 +02:00
Nadav Har'El
44c2a44b54 alternator-test: test for ConditionExpression feature
This patch adds a very comprehensive test for the ConditionExpression
feature, i.e., the newer syntax of conditional writes replacing
the old-style "Expected" - for the UpdateItem, PutItem and DeleteItem
operations.

I wrote these tests while closely following the DynamoDB ConditionExpression
documentation, and attempted to cover all conceivable features, subfeatures
and subcases of the ConditionExpression syntax - to serve as a test for a
future support for this feature in Alternator (see issue #5053).

As usual, all these tests pass on AWS DynamoDB, but because we haven't yet
implemented this feature in Alternator, all but one xfail on Alternator.

Refs #5053.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191229143556.24002-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
aad5eeab51 alternator: better error messages when Alternator port is taken
If Alternator is requested to be enabled on a specific port but the port is
already taken, the boot fails as expected - but the error log is confusing;
It currently looks something like this:

WARN  2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
... (many more messages about the server shutting down)
INFO  2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

There are two problems here. First, the "WARN" should really be an "ERROR",
because it causes the server to be shut down and the user must see this error.
Second, the final line in the log, something the user is likely to see first,
contains only the ultimate cause for the exception (an address already in use)
but not the information what this address was needed for.

This patch solves both issues, and the log now looks like:

ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alterna
tor HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system
:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
...
INFO  2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191224124127.7093-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
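The two-level error report above can be reproduced with standard C++ exception nesting; this is a generic sketch (hypothetical function names), not the actual init code:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Unwind a nested exception chain into "outer: inner" form.
std::string describe(const std::exception& e) {
    std::string s = e.what();
    try {
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        s += ": " + describe(inner);
    }
    return s;
}

// Simulate the failing startup path: wrap the low-level bind error with
// context about what was being set up, keeping both messages.
std::string start_server() {
    try {
        try {
            throw std::runtime_error("Address already in use");
        } catch (...) {
            std::throw_with_nested(std::runtime_error(
                "Failed to set up Alternator HTTP server on 0.0.0.0 port 8000"));
        }
    } catch (const std::exception& e) {
        return describe(e);
    }
    return "";  // unreachable
}
```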
Nadav Har'El
1f64a3bbc9 alternator: error on unsupported ReturnValues option
We don't yet support the ReturnValues option on the PutItem, UpdateItem or
DeleteItem operations (see issue #5053), but if a user tries to use such
an option anyway, we silently ignore it. It's better to fail,
reporting the unsupported option.

In this patch we check the ReturnValues option and if it is anything but
the supported default ("NONE"), we report an error.

Also added a test to confirm this fix. The test verifies that "NONE" is
allowed, and something which is unsupported (e.g., "DOG") is not ignored
but rather causes an error.

Refs #5053.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191216193310.20060-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola
dc93228b66 reloc: Turn the default flags into common flags
These are flags we always want to enable. In particular, we want them
to be used by the bots, but the bots run this script with
--configure-flags, so they were being discarded.

We put the user option later so that they can override the common
options.

Fixes #5505

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola
d4dfb6ff84 build-id: Handle the binary having multiple PT_NOTE headers
There is no requirement that all notes be placed in a single
PT_NOTE. It looks like recent lld's actually put each section in its
own PT_NOTE.

This change looks for build-id in all PT_NOTE headers.

Fixes #5525

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191227000311.421843-1-espindola@scylladb.com>
2020-01-03 15:48:20 +02:00
Avi Kivity
1e9237d814 dist: redhat: use parallel compression for rpm payload
rpm compression uses xz, which is painfully slow. Adjust the
compression settings to run on all threads.

The xz utility documentation suggests that 0 threads is
equivalent to all CPUs, but apparently the library interface
(which rpmbuild uses) doesn't think the same way.

Message-Id: <20200101141544.1054176-1-avi@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
de1171181c user defined types: fix support for case-sensitive type names
In the current code, support for case-sensitive (quoted) user-defined type
names is broken. For example, a test doing:

    CREATE TYPE "PHone" (country_code int, number text)
    CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk))

Fails - the first line creates the type with the case-sensitive name PHone,
but the second line wrongly ends up looking for the lowercased name phone,
and fails with an exception "Unknown type ks.phone".

The problem is in cql3_type_name_impl. This class is used to convert a
type object into its proper CQL syntax - for example frozen<list<int>>.
The problem is that for a user-defined type, we forgot to quote its name
if not lowercase, and the result is wrong CQL; For example, a list of
PHone will be written as list<PHone> - but this is wrong because the CQL
parser, when it sees this expression, lowercases the unquoted type name
PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>.

The solution is for cql3_type_name_impl to use for a user-defined type
its get_name_as_cql_string() method instead of get_name_as_string().

get_name_as_cql_string() is a new method which prints the name of the
user type as it should be in a CQL expression, i.e., quoted if necessary.

The bug in the above test was apparently caused when our code serialized
the type name to disk as the string PHone (without any quoting), and then
later deserialized it using the CQL type parser, which converted it into
a lowercase phone. With this patch, the type's name is serialized as
"PHone", with the quotes, and deserialized properly as the type PHone.
While the extra quotes may seem excessive, they are necessary for the
correct CQL type expression - remember that the type expression may be
significantly more complex, e.g., frozen<list<"PHone">> and all of this,
including the quotes, is necessary for our parser to be able to translate
this string back into a type object.

This patch may cause breakage to existing databases which used case-
sensitive user-defined types, but I argue that these use cases were
already broken (as demonstrated by this test) so we won't break anything
that actually worked before.

Fixes #5544

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200101160805.15847-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
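The quoting rule that get_name_as_cql_string() applies can be sketched like this (simplified: a real implementation would also quote reserved keywords, which is omitted here):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Quote a CQL identifier unless it is already a valid unquoted
// (all-lowercase) name; embedded double quotes are doubled.
std::string quote_if_needed(const std::string& name) {
    bool plain = !name.empty();
    for (unsigned char c : name) {
        if (!(std::islower(c) || std::isdigit(c) || c == '_')) {
            plain = false;
        }
    }
    if (plain && std::isdigit(static_cast<unsigned char>(name[0]))) {
        plain = false;  // identifiers cannot start with a digit
    }
    if (plain) {
        return name;
    }
    std::string out = "\"";
    for (char c : name) {
        out += c;
        if (c == '"') {
            out += c;  // double embedded quotes
        }
    }
    out += "\"";
    return out;
}
```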
Pavel Emelyanov
34f8762c4d storage_service: Drop _update_jobs
This field is write-only.
Leftover from 83ffae1 (storage_service: Drop block_until_update_pending_ranges_finished)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226091210.20966-1-xemul@scylladb.com>
2020-01-03 15:48:20 +02:00
Pavel Emelyanov
f2b20e7083 cache_hitrate_calculator: Do not reinvent the peering_sharded_service
The class in question wants to run its own instances on different
shards, for this sake it keeps reference on sharded self to call
invoke_on() on. There's a handy peering_sharded_service<> in seastar
for the same, using it makes the code nicer and shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226112401.23960-1-xemul@scylladb.com>
2020-01-03 15:48:19 +02:00
Rafael Ávila de Espíndola
bbed9cac35 cql3: move function creation to a .cc file
We had a lot of code in a .hh file that, while using templates, was
only used for creating functions during startup.

This moves it to a new .cc file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200101002158.246736-1-espindola@scylladb.com>
2020-01-03 15:48:19 +02:00
Benny Halevy
c0883407fe scripts: Add cpp-name-format: pretty printer
Pretty-print cpp-names, useful for deciphering complex backtraces.

For example, the following line:
    service::storage_proxy::init_messaging_service()::{lambda(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>)#1}::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360

Is formatted as:
    service::storage_proxy::init_messaging_service()::{
      lambda(
        seastar::rpc::client_info const&,
        seastar::rpc::opt_time_point,
        std::vector<
          frozen_mutation,
          std::allocator<frozen_mutation>
        >,
        db::consistency_level,
        std::optional<tracing::trace_info>
      )#1
    }::operator()(
      seastar::rpc::client_info const&,
      seastar::rpc::opt_time_point,
      std::vector<
        frozen_mutation,
        std::allocator<frozen_mutation>
      >,
      db::consistency_level,
      std::optional<tracing::trace_info>
    ) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191226142212.37260-1-bhalevy@scylladb.com>
2020-01-01 12:08:12 +02:00
Rafael Ávila de Espíndola
75817d1fe7 sstable: Add checks to help track problems with large_data_handler use after free
I can't quite figure out how we were trying to write a sstable with
the large data handler already stopped, but the backtrace suggests a
good place to add extra checks.

This patch adds two checks: one at the start and one at the end of
sstable::write_components. The first one should give us better
backtraces if the large_data_handler is already stopped. The second
one should help catch some race conditions.

Refs: #5470
Message-Id: <20191231173237.19040-1-espindola@scylladb.com>
2020-01-01 12:03:31 +02:00
Rafael Ávila de Espíndola
3c34e2f585 types: Avoid an unaligned load in json integer serialization
The patch also adds a test that makes the fixed issue easier to
reproduce.

Fixes #5413
Message-Id: <20191231171406.15980-1-espindola@scylladb.com>
2019-12-31 19:23:42 +02:00
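The usual fix for this class of bug is to load through memcpy rather than casting a possibly-misaligned pointer; a generic sketch of the standard technique (not necessarily the exact Scylla change):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reading through a cast like *reinterpret_cast<const int64_t*>(p) is
// undefined behavior when p is not suitably aligned; memcpy typically
// compiles to the same single load on x86 but is well-defined everywhere.
int64_t read_unaligned_int64(const char* p) {
    int64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```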
Gleb Natapov
bae5cb9f37 commitlog: remove unused argument during segment creation
Since 99a5a77234 all segments are created
equal and "active" argument is never true, so drop it.

Message-Id: <20191231150639.GR9084@scylladb.com>
2019-12-31 17:14:03 +02:00
Rafael Ávila de Espíndola
aa535a385d enum_option_test: Add an explicit underlying type to an enum
We expect to be able to create a variable with an out-of-range value,
so the enum needs an explicit underlying type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191230222029.88942-1-espindola@scylladb.com>
2019-12-31 16:59:00 +02:00
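A minimal illustration of the rule (hypothetical enum, not the one in enum_option_test): without an explicit underlying type the compiler may pick a type just wide enough for the listed enumerators, making out-of-range values unrepresentable; `: int` makes any int value valid.

```cpp
#include <cassert>

// With ": int", every value of int is a valid value of the enum, so
// storing 42 via static_cast is well-defined even though no enumerator
// has that value.
enum class test_opt : int { none = 0, all = 1 };

int roundtrip(int v) {
    test_opt o = static_cast<test_opt>(v);
    return static_cast<int>(o);
}
```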
Nadav Har'El
48a914c291 Fix uninitialized members
Merged pull request https://github.com/scylladb/scylla/pull/5532 from
Benny Halevy:

Initialize bool members in row_level_repair and _storage_service causing
ubsan errors.

Fixes #5531
2019-12-31 10:32:54 +02:00
Takuya ASADA
aa87169670 dist/debian: add procps on Depends
We require the procps package to use sysctl in the postinst script for scylla-kernel-conf.

Fixes #5494

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191218234100.37844-1-syuu@scylladb.com>
2019-12-30 19:30:35 +02:00
Avi Kivity
972127e3a8 atomic_cell: add type-aware pretty printing
The standard printer for atomic_cell prints the value as hex,
because atomic_cell does not include the type. Add a type-aware
printer that allows the user to provide the type.
2019-12-30 18:27:04 +02:00
Avi Kivity
19f68412ad atomic_cell: move pretty printers from database.cc to atomic_cell.cc
atomic_cell.cc is the logical home for atomic_cell pretty printers,
and since we plan to add more pretty printers, start by tidying up.
2019-12-30 18:20:30 +02:00
Eliran Sinvani
21dec3881c debian-reloc: rename buld product to the name specified in SCYLLA-VERSION-GEN
When the product name is other than "scylla", the debian
packaging scripts go over all files that start with "scylla-"
and change the prefix to the actual product name.
However, if there are no such files in the directory,
the script will fail since the renaming command will
get the wildcard string instead of an actual file name.
This patch replaces the command with an equivalent one
that only operates on files if there are any.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191230143250.18101-1-eliransin@scylladb.com>
2019-12-30 17:45:50 +02:00
Takuya ASADA
263385cb4b dist: stop replacing /usr/lib/scylla with symlink (#5530)
Since we merged /usr/lib/scylla with /opt/scylladb, we removed
/usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb.
However, RPM does not support replacing a directory with a symlink,
so we were doing a dirty hack using an RPM scriptlet, which caused
multiple issues on upgrade/downgrade.
(See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/)

To minimize Scylla upgrade/downgrade issues on the user side, it's better
to keep the /usr/lib/scylla directory.
Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb,
we can create symlinks for each setup script, like
/usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>.

Fixes #5522
Fixes #4585
Fixes #4611
2019-12-30 13:52:24 +02:00
Hagit Segev
9d454b7dc6 reloc/build_rpm.sh: Fix '--builddir' option handling (#5519)
The '--builddir' option value is assigned to the "builddir" variable,
which is wrong. The correct variable is "BUILDDIR" so use that instead
to fix the '--builddir' option.

Also, add logging to the script when executing the "dist/redhat_build.rpm.sh"
script to simplify debugging.
2019-12-30 13:25:22 +02:00
Benny Halevy
8aa5d84dd8 storage_service: initialize _is_bootstrap_mode
Hit the following ubsan error with bootstrap_test:TestBootstrap.manual_bootstrap_test in debug mode:
  service/storage_service.cc:3519:37: runtime error: load of value 190, which is not a valid value for type 'bool'

The use site is:
  service::storage_service::is_cleanup_allowed(seastar::basic_sstring<char, unsigned int, 15u, true>)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const at /local/home/bhalevy/dev/scylla/service/storage_service.cc:3519

While at it, initialize `_initialized` to false as well, just in case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-30 11:44:58 +02:00
Benny Halevy
474ffb6e54 repair: initialize row_level_repair: _zero_rows
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'

Fixes #5531

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-30 11:44:58 +02:00
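The pattern behind both initializer fixes above is giving bool members in-class initializers; a sketch with a stand-in struct (not the real storage_service):

```cpp
#include <cassert>

// An uninitialized bool member holds an indeterminate value; UBSan then
// reports loads like "value 190, which is not a valid value for 'bool'".
// An in-class initializer makes every constructor start from a valid state.
struct service_state {
    bool _is_bootstrap_mode = false;  // was: bool _is_bootstrap_mode;
    bool _initialized = false;        // initialized "just in case" too
};
```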
Fabiano Lucchese
d7795b1efa scylla_setup: Support for enforcing optimal Linux clocksource setting (#5499)
A Linux machine typically has multiple clocksources with different
performance characteristics. Selecting a high-performance clocksource can
result in better performance for ScyllaDB, so this should be considered
whenever starting it up.

This patch introduces the possibility of enforcing an optimized Linux
clocksource in Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing the clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in the scylla_server configuration
file. This parameter is read by perftune.py which, if it is set to "yes",
(non-persistently) sets the clocksource. On x86, the TSC clocksource is used.

Fixes #4474
Fixes #5474
Fixes #5480
2019-12-30 10:54:14 +02:00
Avi Kivity
e223154268 cdc: options: return an empty options map when cdc is disabled
This is compatible with 3.1 and below, which didn't have that schema
field at all.
2019-12-29 16:34:37 +02:00
Benny Halevy
27e0aee358 docs/debugging.md: fix anchor links
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191229074136.13516-1-bhalevy@scylladb.com>
2019-12-29 16:26:26 +02:00
Pavel Solodovnikov
aba9a11ff0 cql: pass variable_specifications via lw_shared_ptr
Instances of `variable_specifications` are passed around as
shared_ptr's, which are redundant in this case since the class
is marked as `final`. Use `lw_shared_ptr` instead since we know
for sure it's not a polymorphic pointer.

Tests: unit(debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>
2019-12-29 16:26:26 +02:00
Benny Halevy
4c884908bb directories: Keep a unique set of directories to initialize
If any two directories of data/commitlog/hints/view_hints
are the same, we still end up running verify_owner_and_mode
and disk_sanity(check_direct_io_support) in parallel
on the same directories and hit #5510.

This change uses std::set rather than std::vector to
collect a unique set of directories that need initialization.

Fixes #5510

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>
2019-12-29 16:26:26 +02:00
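The change can be sketched as follows (hypothetical helper name; the real code collects the configured directories):

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <string>

// Collecting the directories in a std::set deduplicates them, so even if
// data/commitlog/hints/view_hints point at the same path, that path is
// verified and sanity-checked only once.
std::set<std::string> dirs_to_initialize(std::initializer_list<std::string> configured) {
    return std::set<std::string>(configured);
}
```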
Gleb Natapov
60a851d3a5 commitlog: always flush segments atomically with writing
db::commitlog::segment::batch_cycle() assumes that after a write
for a certain position completes (as reported by
_pending_ops.wait_for_pending()) it will also be flushed, but this is
true only if writing and flushing are atomic wrt _pending_ops lock.
It usually is unless flush_after is set to false when cycle() is
called. In this case only writing is done under the lock. This
is exactly what happens when a segment is closed. Flush is skipped
because zero header is added after the last entry and then flushed, but
this optimization breaks batch_cycle() assumption. Fix it by flushing
after the write atomically even if a segment is being closed.

Fixes #5496

Message-Id: <20191224115814.GA6398@scylladb.com>
2019-12-24 14:52:23 +02:00
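The invariant the fix restores can be modeled with a single lock covering both the write and the flush (a simplified, synchronous stand-in for the _pending_ops lock; the real code is asynchronous):

```cpp
#include <cassert>
#include <mutex>
#include <vector>

struct segment_model {
    std::mutex _lock;              // stands in for the _pending_ops lock
    std::vector<int> _written;
    std::vector<int> _flushed;

    // With the fix, the flush happens in the same critical section as the
    // write, so anyone who observes the write complete also sees it flushed.
    void cycle(int data) {
        std::lock_guard<std::mutex> g(_lock);
        _written.push_back(data);
        _flushed = _written;       // flush atomically with the write
    }
};
```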
Pavel Emelyanov
a5cdfea799 directories: Do not mess with per-shard base dir
The hints and view_hints directory has per-shard sub-dirs,
and the directories code tries to create, check and lock
all of them, including the base one.

The manipulations in question are excessive -- it's enough
to check and lock either the base dir, or all the per-shard
ones, but not everything. Let's take the latter approach for
its simplicity.

Fixes #5510

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Looks-good-to: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223142429.28448-1-xemul@scylladb.com>
2019-12-24 14:49:28 +02:00
Benny Halevy
f8f5db42ca dbuild: try to pull image if not present locally
Pekka Enberg <penberg@scylladb.com> wrote:
> Image might not be present, but the subsequent "docker run" command will automatically pull it.

Just letting "docker run" fail produces a rather confusing error message
referring to docker help, but we want to provide the user
with our own help. So still fail early, but also try to pull the image
if "docker image inspect" failed, indicating it's not present locally.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-4-bhalevy@scylladb.com>
2019-12-24 11:13:23 +02:00
Benny Halevy
ee2f97680a dbuild: just die when no image-id is provided
Suggested-by: Pekka Enberg <penberg@scylladb.com>
> This will print all the available Docker images,
> many (most?) of them completely unrelated.
> Why not just print an error saying that no image was specified,
> and then perhaps print usage.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-3-bhalevy@scylladb.com>
2019-12-24 11:13:22 +02:00
Benny Halevy
87b2f189f7 dbuild: s/usage/die/
Suggested-by: Dejan Mircevski <dejan@scylladb.com>
> The use pattern of this function strongly suggests a name like `die`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-2-bhalevy@scylladb.com>
2019-12-24 11:13:21 +02:00
Benny Halevy
718e9eb341 table: move_sstables_from_staging: fix use after free of shared_sstable
Introduced in 4b3243f5b9

Reproducible with materialized_views_test:TestMaterializedViews.mv_populating_from_existing_data_during_node_remove_test
and read_amplification_test:ReadAmplificationTest.no_read_amplification_on_repair_with_mv_test

==955382==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200023de18 at pc 0x00000051d788 bp 0x7f8a0563fcc0 sp 0x7f8a0563fcb0
READ of size 8 at 0x60200023de18 thread T1 (reactor-1)
    #0 0x51d787 in seastar::lw_shared_ptr<sstables::sstable>::lw_shared_ptr(seastar::lw_shared_ptr<sstables::sstable> const&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:289
    #1 0x10ba189 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl
e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530
    #2 0x109c4f1 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl
e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1556
    #3 0x106941a in do_for_each<__gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >, table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(
std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:618
    #4 0x1069203 in operator() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:626
    #5 0x10ba589 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36
    #6 0x10ba668 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging
(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44
    #7 0x10ba7c0 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging
(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563
    ...

0x60200023de18 is located 8 bytes inside of 16-byte region [0x60200023de10,0x60200023de20)
freed by thread T1 (reactor-1) here:
    #0 0x7f8a153b796f in operator delete(void*) (/lib64/libasan.so.5+0x11096f)
    #1 0x6ab4d1 in __gnu_cxx::new_allocator<seastar::lw_shared_ptr<sstables::sstable> >::deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128
    #2 0x612052 in std::allocator_traits<std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::deallocate(std::allocator<seastar::lw_shared_ptr<sstables::sstable> >&, seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470
    #3 0x58fdfb in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::_M_deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351
    #4 0x52a790 in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332
    #5 0x52a99b in std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680
    #6 0xff60fa in ~<lambda> /local/home/bhalevy/dev/scylla/table.cc:2477
    #7 0xff7202 in operator() /local/home/bhalevy/dev/scylla/table.cc:2496
    #8 0x106af5b in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1573
    #9 0x102f5d5 in futurize_apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1645
    #10 0x102f9ee in operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/semaphore.hh:488
    #11 0x109d2f1 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36
    #12 0x109d42c in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable>
 >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44
    #13 0x109d595 in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable>
 >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563
    ...

Fixes #5511

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191222214326.1229714-1-bhalevy@scylladb.com>
2019-12-23 15:20:41 +02:00
Konstantin Osipov
476fbc60be test.py: prepare to remove custom colors
Add dbuild dependency on python3-colorama,
which will be used in test.py instead of a hand-made palette.

[avi: update tools/toolchain/image]
Message-Id: <20191223125251.92064-2-kostja@scylladb.com>
2019-12-23 15:13:22 +02:00
Pavel Emelyanov
d361894b9d batchlog_manager: Speed up token_metadata endpoints counting a bit
In this place we only need to know the number of endpoints,
while the current code additionally shuffles them before counting.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:45 +02:00
Pavel Emelyanov
6e06c88b4c token_metadata: Remove unused helper
There are two _identical_ methods in the token_metadata class:
get_all_endpoints_count() and number_of_endpoints().
The former is used (called) while the latter is not, so
let's remove it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:43 +02:00
Pavel Emelyanov
2662d9c596 migration_manager: Remove run_may_throw() first argument
It's unused in this function. Also, this helps get rid
of global instances of components.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:42 +02:00
Pavel Emelyanov
703b16516a storage_service: Remove unused helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:41 +02:00
Takuya ASADA
e0071b1756 reloc: don't archive dist/ami/files/*.rpm on relocatable package
We should skip archiving dist/ami/files/*.rpm in the relocatable package,
since it isn't used.
The same goes for packer and variables.json.

Fixes #5508

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191223121044.163861-1-syuu@scylladb.com>
2019-12-23 14:19:51 +02:00
Tomasz Grabiec
28dec80342 db/schema_tables: Add trace-level logging of schema digesting
This greatly helps to narrow down the source of schema digest mismatch
between nodes. The intended use is to enable this logger on disagreeing
nodes, trigger schema digest recalculation, observe which
mutations differ in digest, and then examine their content.

Message-Id: <1574872791-27634-1-git-send-email-tgrabiec@scylladb.com>
2019-12-23 12:28:22 +02:00
Konstantin Osipov
1116700bc9 test.py: do not return 0 if there are failed tests
Fix a return value regression introduced when switching to asyncio.

Message-Id: <20191222134706.16616-2-kostja@scylladb.com>
2019-12-22 16:14:32 +02:00
Asias He
7322b749e0 repair: Do not return working_row_buf_nr in get combined row hash verb
In commit b463d7039c (repair: Introduce
get_combined_row_hash_response), working_row_buf_nr is returned in
REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It is
scheduled to be part of the 3.1 release; however, it was accidentally
not backported to 3.1.

In order to keep repair compatible between 3.1 and 3.2, we need to drop
the working_row_buf_nr in the 3.2 release.

Fixes: #5490
Backports: 3.2
Tests: Run repair in a mixed 3.1 and 3.2 cluster
2019-12-21 20:13:15 +02:00
Takuya ASADA
8eaecc5ed6 dist/common/scripts/scylla_setup: add swap existence check
Show warnings when no swap is configured on the node.

Closes #2511

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191220080222.46607-1-syuu@scylladb.com>
2019-12-21 20:03:58 +02:00
Pavel Solodovnikov
5a15bed569 cql3: return result_set by cref in cql3::result::result_set
Changes summary:
* make `cql3::result_set` movable-only
* change signature of `cql3::result::result_set` to return by cref
* adjust available call sites to the aforementioned method to accept cref

Motivation behind this change is elimination of dangerous API,
which can easily set a trap for developers who don't expect that
result_set would be returned by value.

There is no point in copying the `result_set` around, so make
`cql3::result::result_set` cache the `result_set` internally in a
`unique_ptr` member variable and return a const reference, to
minimize unnecessary copies.

Tests: unit(debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191220115100.21528-1-pa.solodovnikov@scylladb.com>
2019-12-21 16:56:42 +02:00
Takuya ASADA
3a6cb0ed8c install.sh: drop limits.d from nonroot mode
The file is only required for root mode.

Fixes #5507

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191220101940.52596-1-syuu@scylladb.com>
2019-12-21 15:26:08 +02:00
Botond Dénes
08bb0bd6aa mutation_fragment_stream_validator: wrap exceptions into own exception type
So a higher level component using the validator to validate a stream can
catch only validation errors, and let any other incidental exception
through.

This allows building data correctors on top of the
`mutation_fragment_stream_validator`, by filtering a fragment stream
through a validator, catching invalid fragment stream exceptions and
dropping the respective fragments from the stream.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>
2019-12-20 12:05:00 +01:00
Rafael Ávila de Espíndola
91c7f5bf44 Print build-id on startup
Fixes #5426

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191218031556.120089-1-espindola@scylladb.com>
2019-12-19 15:43:04 +02:00
Avi Kivity
440ad6abcc Revert "relocatable: Check that patchelf didn't mangle the PT_LOAD headers"
This reverts commit 237ba74743. While it
works for the scylla executable, it fails for iotune, which is built
by seastar. It should be reinstated after we pass the correct link
parameters to the seastar build system.
2019-12-19 11:20:34 +02:00
Pekka Enberg
c0aea19419 Merge "Add a timeout for housekeeping for offline installs" from Amnon
"
This series solves an issue with scylla_setup and prevents it from
waiting forever if housekeeping cannot check for a new Scylla version.

Fixes #5302

It should be backported to versions that support offline installations.
"

* 'scylla_setup_timeout' of git://github.com/amnonh/scylla:
  scylla_setup: do not wait forever if no reply is returned from housekeeping
  scylla_util.py: Add optional timeout to out function
2019-12-19 08:18:19 +02:00
Rafael Ávila de Espíndola
8d777b3ad5 relocatable: Use a super long path for the dynamic linker
Having a long path allows patchelf to change the interpreter without
changing the PT_LOAD headers and therefore without moving the
build-id out of the first page.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-1-espindola@scylladb.com>
2019-12-18 19:10:59 +02:00
Pavel Solodovnikov
c451f6d82a LWT: Fix required participants calculation for LOCAL_SERIAL CL
Suppose we have a multi-dc setup (e.g. 9 nodes distributed across
3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]).

When a query that uses LWT is executed with LOCAL_SERIAL consistency
level, the `storage_proxy::get_paxos_participants` function
incorrectly calculates the number of required participants to serve
the query.

In the example above it's calculated to be 5 (i.e. the number of
nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL,
which is equivalent to LOCAL_QUORUM cl in this case).

This behavior results in an exception being thrown when executing
the following query with LOCAL_SERIAL cl:

INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS

Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'}

Tests: unit(dev), dtest(consistency_test.py)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>
2019-12-18 16:58:32 +01:00
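The quorum arithmetic behind the fix can be sketched as follows. This is a minimal illustration with made-up names (`required_participants`, `rf_by_dc`), not Scylla's actual code: LOCAL_SERIAL needs a quorum only of the local datacenter's replicas, while SERIAL needs a quorum of all replicas.

```python
# Hypothetical sketch of the participant calculation described above;
# rf_by_dc maps datacenter name -> replication factor.
def required_participants(cl: str, rf_by_dc: dict, local_dc: str) -> int:
    if cl == "LOCAL_SERIAL":
        # Quorum of the local datacenter only (like LOCAL_QUORUM).
        return rf_by_dc[local_dc] // 2 + 1
    # SERIAL: quorum over the total replication factor.
    return sum(rf_by_dc.values()) // 2 + 1

# The 9-node, 3-datacenter setup from the commit message.
rf = {"dc1": 3, "dc2": 3, "dc3": 3}
```

With `rf` as above, `required_participants("SERIAL", rf, "dc1")` is 5 while `required_participants("LOCAL_SERIAL", rf, "dc1")` is 2, matching the numbers in the error message.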
Botond Dénes
cd6bf3cb28 scylla-gdb.py: static_vector: update for changed storage
The actual buffer is now in a member called 'data'. Leave the old
`dummy.dummy` and `dummy` as fall-back. This seems to change every
Fedora release.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191218153544.511421-1-bdenes@scylladb.com>
2019-12-18 17:39:56 +02:00
Tomasz Grabiec
5865d08d6c migration_manager: Recalculate schema only on shard 0
Schema is node-global, update_schema_version_and_announce() updates
all shards.  We don't need to recalculate it from every shard, so
install the listeners only on shard 0. Reduces noise in the logs.

Message-Id: <1574872860-27899-1-git-send-email-tgrabiec@scylladb.com>
2019-12-18 16:43:26 +02:00
Pavel Emelyanov
998f51579a storage_service: Rip join_ring config option
The option in question apparently does not work: several sharded objects
are start()-ed (and thus instantiated) in join_token_ring, while instances
of these objects are used during init of other components.

This leads to a broken seastar local_is_initialized assertion on sys_dist_ks,
but reading the code shows more examples, e.g. the auth_service is started
on join, but is used for thrift and cql server initialization.

The suggestion is to remove the option instead of fixing it. The is_joined
logic is kept since on-start joining can still take some time and it's safer
to report the real status from the API.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191203140717.14521-1-xemul@scylladb.com>
2019-12-18 12:45:13 +02:00
Nadav Har'El
8157f530f5 merge: CDC: handle schema changes
Merged pull request https://github.com/scylladb/scylla/pull/5366 from Calle Wilund:

Moves schema creation/alter/drop awareness to use new "before" callbacks from
migration manager, and adds/modifies log and streams table as part of the base
table modification.

Makes schema changes semi-atomic per node. While this does not deal with updates
coming in before a schema change has propagated across the cluster, it now falls
into the same pit as when this happens without CDC.

An added side effect is that schemas are now transparent across all subsystems,
not just cql.

Patches:
  cdc_test: Add small test for altering base schema (add column)
  cdc: Handle schema changes via migration manager callbacks
  migration_manager: Invoke "before" callbacks for table operations
  migration_listener: Add empty base class and "before" callbacks for tables
  cql_test_env: Include cdc service in cql tests
  cdc: Add sharded service that does nothing.
  cdc: Move "options" to separate header to avoid too much header inclusion
  cdc: Remove some code from header
2019-12-17 23:04:36 +02:00
Avi Kivity
1157ee16a5 Update seastar submodule
* seastar 00da4c8760...0525bbb08f (7):
  > future: Simplify future_state_base::any move constructor
  > future: don't create temporary tuple on future::get().
  > future: don't instantiate new future on future::then_wrapped().
  > future: clean-up the Result handling in then_wrapped().
  > Merge "Fix core dumps when asan is enabled" from Rafael
  > future: Move ignore to the base class
  > future: Don't delete in ignore
2019-12-17 19:47:50 +02:00
Botond Dénes
638623b56b configure.py: make build.ninja target depend on SCYLLA-VERSION-GEN
Currently `SCYLLA-VERSION-GEN` is not a dependency of any target and
hence changes done to it will not be picked up by ninja. To trigger a
rebuild and hence version changes to appear in the `scylla` target
binary, one has to do `touch configure.py`. This is counter intuitive
and frustrating to people who don't know about it and wonder why their
changed version is not appearing as the output of `scylla --version`.

This patch makes `SCYLLA-VERSION-GEN` a dependency of `build.ninja`,
making the `build.ninja` target out-of-date whenever
`SCYLLA-VERSION-GEN` is changed and hence will trigger a rerun of
`configure.py` when the next target is built, allowing a build of e.g.
`scylla` to pick up any changes done to the version automatically.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191217123955.404172-1-bdenes@scylladb.com>
2019-12-17 17:40:04 +02:00
Avi Kivity
7152ba0c70 Merge "tests: automatically search for unit tests" from Kostja
"
This patch set rearranges the test files so that
it is now possible to search for tests automatically,
and adds this functionality to test.py
"

* 'test.py.requeue' of ssh://github.com/scylladb/scylla-dev:
  cmake: update CMakeLists.txt to scan test/ rather than tests/
  test.py: automatically lookup all unit and boost tests
  tests: move all test source files to their new locations
  tests: move a few remaining headers
  tests: move another set of headers to the new test layout
  tests: move .hh files and resources to new locations
  tests: remove executable property from data_listeners_test.cc
2019-12-17 17:32:18 +02:00
Amnon Heiman
dd42f83013 scylla_setup: do not wait forever if no reply is returned from housekeeping
When scylla is installed without network connectivity, the check for
whether a newer version is available can cause scylla_setup to wait forever.

This patch adds a limit to the time scylla_setup will wait for a reply.

When there is no reply, the relevant error will be shown, saying that it was
unable to check for a newer version, but this will not block the setup
script.

Fixes #5302

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-12-17 14:56:47 +02:00
Nadav Har'El
aa1de5a171 merge: Synchronize snapshot and staging sstable deletion using sem
Merged pull request https://github.com/scylladb/scylla/pull/5343 from
Benny Halevy.

Fixes #5340

Hold the sstable_deletion_sem in table::move_sstables_from_subdirs to
serialize access to the staging directory. It now synchronizes snapshot,
compaction deletion of sstables, and view_update_generator moving of
sstables from staging.

Tests:

    unit (dev) [except for test_user_function_timestamp_return, which fails for me locally, but also on master]
    snapshot_test.py (dev)
2019-12-17 14:06:02 +02:00
Juliusz Stasiewicz
7fdc8563bf system_keyspace: Added infrastructure for table `system.clients'
I used the following as a reference:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java
At this moment there is only info about the IP, the client's outgoing port,
the client 'type' (i.e. CQL/thrift/alternator), shard ID and username.
Column `request_count' is NOT present and the CK consists of
(`port', `client_type'), contrary to what C* has: (`port').

Code that notifies `system.clients` about new connections goes
to top-level files `connection_notifier.*`. Currently only CQL
clients are observed, but enum `client_type` can be used in future
to notify about connections with other protocols.
2019-12-17 11:31:28 +01:00
Benny Halevy
4b3243f5b9 table: move_sstables_from_staging_in_thread with _sstable_deletion_sem
Hold the _sstable_deletion_sem while moving sstables from the staging directory
so not to move them under the feet of table::snapshot.

Fixes #5340

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
0446ce712a view_update_generator::start: use variable binding
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
5d7c80c148 view_update_generator::start: fix indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
02784f46b9 view_update_generator: handle errors when processing sstable
The consumer may throw; in this case, break from the loop and retry.

move_sstable_from_staging_in_thread may theoretically throw too;
ignore the error in this case since the sstable was already processed:
individual move failures are already ignored and moving from staging
will be retried upon restart.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
abda12107f sstables: move_to_new_dir: add do_sync_dirs param
To be used for "batch" move of several sstables from staging
to the base directory, allowing the caller to sync the directories
once when all are moved rather than for each one of them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
6efef84185 sstable: return future from move_to_new_dir
distributed_loader::probe_file needlessly creates a seastar
thread for it and the next patch will use it as part of
a parallel_for_each loop to move a list of sstables
(and sync the directories once at the end).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
0d2a7111b2 view_update_generator: sstable_with_table: std::move constructor args
Just a small optimization.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:19:55 +02:00
Nadav Har'El
fc85c49491 alternator: error on unsupported parallel scan
We do not yet support the parallel Scan options (TotalSegments, Segment),
as reported in issue #5059. But even before implementing this feature, it
is important that we produce an error if a user attempts to use it - instead
of outright ignoring this parameter. This is what this patch does.

The patch also adds a full test, test_scan.py::test_scan_parallel, for the
parallel scan feature. The test passes on DynamoDB, and still xfails
on Alternator after this patch - but now the Scan request fails immediately
reporting the unsupported option - instead of what the pre-patch code did:
returning the wrong results and the test failing just when the results
do not match the expectations.

Refs #5059.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191217084917.26191-1-nyh@scylladb.com>
2019-12-17 11:27:56 +02:00
Avi Kivity
f7d69b0428 Revert "Merge "bouncing lwt request to an owning shard" from Gleb"
This reverts commit 64cade15cc, reversing
changes made to 9f62a3538c.

This commit is suspected of corrupting the response stream.

Fixes #5479.
2019-12-17 11:06:10 +02:00
Rafael Ávila de Espíndola
237ba74743 relocatable: Check that patchelf didn't mangle the PT_LOAD headers
Should avoid issue #4983 showing up again.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-2-espindola@scylladb.com>
2019-12-16 20:18:32 +02:00
Avi Kivity
3b7aca3406 Merge "db: Don't create a reference to nullptr" from Rafael
"
Only the first patch is needed to fix the undefined behavior, but the
followup ones simplify the memory management around user types.
"

* 'espindola/fix-5193-v2' of ssh://github.com/espindola/scylla:
  db: Don't use lw_shared_ptr for user_types_metadata
  user_types_metadata: don't implement enable_lw_shared_from_this
  cql3: pass a const user_types_metadata& to prepare_internal
  db: drop special case for top level UDTs
  db: simplify db::cql_type_parser::parse
  db: Don't create a reference to nullptr
  Add test for loading a schema with a non native type
2019-12-16 17:10:58 +02:00
Konstantin Osipov
d6bc7cae67 cmake: update CMakeLists.txt to scan test/ rather than tests/
A follow up on directory rename.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
e079a04f2a test.py: automatically lookup all unit and boost tests 2019-12-16 17:47:42 +03:00
Konstantin Osipov
1c8736f998 tests: move all test source files to their new locations
1. Move tests to test (using singular seems to be a convention
   in the rest of the code base)
2. Move boost tests to test/boost, other
   (non-boost) unit tests to test/unit, tests which are
   expected to be run manually to test/manual.

Update configure.py and test.py with new paths to tests.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
2fca24e267 tests: move a few remaining headers
Move sstable_test.hh, test_table.hh and cql_assertions.hh from tests/ to
test/lib or test/boost and update dependent .cc files.
Move tests/perf_sstable.hh to test/perf/perf_sstable.hh
2019-12-16 17:47:42 +03:00
Konstantin Osipov
b9bf1fbede tests: move another set of headers to the new test layout
Move another small subset of headers to test/
with the same goals:
- preserve bisectability
- make the revision history traceable after a move

Update dependent files.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
8047d24c48 tests: move .hh files and resources to new locations
The plan is to move the unstructured content of tests/ directory
into the following directories of test/:

test/lib - shared header and source files for unit tests
test/boost - boost unit tests
test/unit - non-boost unit tests
test/manual - tests intended to be run manually
test/resource - binary test resources and configuration files

In order to not break git bisect and preserve the file history,
first move most of the header files and resources.
Update paths to these files in .cc files, which are not moved.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
644595e15f tests: remove executable property from data_listeners_test.cc
The executable flag must have been committed to git by mistake.
2019-12-16 17:47:41 +03:00
Benny Halevy
d2e00abe13 tests: commitlog_test: test_allocation_failure: improve error reporting
We're seeing the following error from this test from time to time:
  fatal error: in "test_allocation_failure": std::runtime_error: Did not get expected exception from writing too large record

This is not reproducible and the error string does not contain
enough information to figure out what happened exactly, therefore
this patch adds an exception if the call succeeded unexpectedly
and also prints the unexpected exception if one was caught.

Refs #4714

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191215052434.129641-1-bhalevy@scylladb.com>
2019-12-16 15:38:48 +01:00
Asias He
6b7344f6e5 streaming: Fix typo in stream_result_future::maybe_complete
s/progess/progress/

Refs: #5437
2019-12-16 11:12:03 +02:00
Dejan Mircevski
f3883cd935 dbuild: Fix podman invocation (#5481)
The is_podman check was depending on `docker -v` printing "podman" in
the output, but that doesn't actually work, since podman prints $0.
Use `docker --help` instead, which will output "podman".

Also return podman's return status, which was previously being
dropped.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-16 11:11:48 +02:00
Avi Kivity
00ae4af94c Merge "Sanitize and speed-up (a bit) directories set up" from Pavel
"
On start there are two things that scylla does on data/commitlog/etc.
dirs: locks and verifies permissions. Right now these two actions are
managed by different approaches, it's convenient to merge them.

Also the introduced in this set directories class makes a ground for
better --workdir option handling. In particular, right now the db::config
entries are modified after options parse to update directories with
the workdir prefix. With the directories class at hands will be able
to stop doing this.
"

* 'br-directories-cleanup' of https://github.com/xemul/scylla:
  directories: Make internals work on fs::path
  directories: Cleanup adding dirs to the vector to work on
  directories: Drop seastar::async usage
  directories: Do touch_and_lock and verify sequentially
  directories: Do touch_and_lock in parallel
  directories: Move the whole stuff into own .cc file
  directories: Move all the dirs code into .init method
  file_lock: Work with fs::path, not sstring
2019-12-15 16:02:46 +02:00
Takuya ASADA
5e502ccea9 install.sh: setup workdir correctly on nonroot mode
Specify correct workdir on nonroot mode, to set correct path of
data / commitlog / hints directories at once.

Fixes #5475

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191213012755.194145-1-syuu@scylladb.com>
2019-12-15 16:00:57 +02:00
Avi Kivity
c25d51a4ea Revert "scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)"
This reverts commit 4333b37f9e. It breaks upgrades,
and the user question is not informative enough for the user to make a correct
decision.

Fixes #5478.
Fixes #5480.
2019-12-15 14:37:40 +02:00
Pavel Emelyanov
23a8d32920 directories: Make internals work on fs::path
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
373fcfdb3e directories: Cleanup adding dirs to the vector to work on
The unordered_set is turned into a vector, since fs::path has no
hash() specialization, which a set needs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
14437da769 directories: Drop seastar::async usage
Now the only future-returning operation remaining is the call to
parallel_for_each(); all the rest is non-blocking preparation,
so we can drop the seastar::async and just return the future
from parallel_for_each.

The indentation is now good, as in the previous patch it was prepared
just for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
06f4f3e6d8 directories: Do touch_and_lock and verify sequentially
The goal is to drop the seastar::async() usage.

Currently we have two places that return futures -- the two
parallel_for_each() calls.  We can either chain them together or,
since both are working on the same set of directories, chain
actions inside them.

For code simplicity I propose to chain actions.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
8d0c820aa1 directories: Do touch_and_lock in parallel
The list of paths that should be touch-and-locked is already
at hand; this shortens the code and makes it slightly faster
(in theory).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
71a528d404 directories: Move the whole stuff into own .cc file
In order not to pollute the root dir, place the code in the
utils/ directory, "utils" namespace.

While doing this -- move the touch_and_lock from the
class declaration.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Benny Halevy
9ec98324ed messaging_service: unregister_handler: return rpc unregister_handler future
Now that seastar returns it.

Fixes https://github.com/scylladb/scylla/issues/5228

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191212143214.99328-1-bhalevy@scylladb.com>
2019-12-12 16:38:36 +02:00
Pavel Emelyanov
f2b3c17e66 directories: Move all the dirs code into .init method
The seastar::async usage is temporary, added for bisect-safety;
soon it will go away. For this reason the indentation in the
.init method is not "canonical", but is prepared for a one-patch
drop of the seastar::async.

The hinted_handoff_enabled arg is there, as it's not just a
config parameter; it had been parsed in main.cc.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 17:33:11 +03:00
Pavel Emelyanov
82ef2a7730 file_lock: Work with fs::path, not sstring
The main.cc code that converts sstring to fs::path
will be patched soon; file_desc::open belongs
to seastar and works on sstrings.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 17:32:10 +03:00
Konstantin Osipov
bc482ee666 test.py: remove an unused option
Message-Id: <20191204142622.89920-2-kostja@scylladb.com>
2019-12-12 15:53:35 +02:00
Avi Kivity
64cade15cc Merge "bouncing lwt request to an owning shard" from Gleb
"
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move request to correct shard before running lwt.  It works by returning
an error from lwt code if a shard is incorrect one specifying the shard
the request should be moved to. The error is processed by the transport
code that jumps to a correct shard and re-process incoming message there.
"

* 'gleb/bounce_lwt_request' of github.com:scylladb/seastar-dev:
  lwt: take raw lock for entire cas duration
  lwt: drop invoke_on in paxos_state prepare and accept
  lwt: Process lwt request on an owning shard
  storage_service: move start_native_transport into a thread
  transport: change make_result to take a reference to cql result instead of shared_ptr
2019-12-12 15:50:22 +02:00
Nadav Har'El
9f62a3538c alternator: fix BEGINS_WITH operator for blobs
The implementation of Expected's BEGINS_WITH operator on blobs was
incorrect, naively comparing the base64-encoded strings, which doesn't
work. This patch fixes the code to compare the decoded strings.

The reason why the BEGINS_WITH test missed this bug was that we forgot
to check the blob case and only tested the string case; So this patch
also adds the missing test - which reproduces this bug, and verifies
its fix.

Fixes #5457

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191211115526.29862-1-nyh@scylladb.com>
2019-12-12 14:02:56 +01:00
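The pitfall this fix addresses can be demonstrated in a few lines. This is a standalone illustration, not the Alternator code itself: because base64 encodes bytes in 3-byte groups and pads with '=', a byte-level prefix is generally not a textual prefix of the encoding.

```python
import base64

raw, prefix = b"hello world", b"hello"

# What the server receives: base64-encoded blobs.
enc_value = base64.b64encode(raw)      # b'aGVsbG8gd29ybGQ='
enc_prefix = base64.b64encode(prefix)  # b'aGVsbG8='

# Naive check on the base64 text: wrong -- the padding '=' in the
# encoded prefix means it is not a textual prefix of the full encoding.
naive = enc_value.startswith(enc_prefix)

# Correct check: decode first, then compare the raw bytes.
correct = base64.b64decode(enc_value).startswith(base64.b64decode(enc_prefix))
```

Here `naive` is False even though `b"hello world"` really does begin with `b"hello"`; only the decoded comparison gives the right answer.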
Dejan Mircevski
27b8b6fe9d cql3: Fix needs_filtering() for clustering columns
The LIKE operator requires filtering, so needs_filtering() must check
is_LIKE().  This already happens for partition columns, but it was
overlooked for clustering columns in the initial implementation of
LIKE.

Fixes #5400.

Tests: unit(dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-12 01:19:13 +02:00
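The rule this fix enforces can be sketched abstractly. The names below are invented for the sketch (not the actual cql3 restriction classes): a LIKE restriction forces filtering no matter which column kind it applies to.

```python
# Minimal model: restrictions as (column_kind, operator) pairs.
def needs_filtering(restrictions):
    # LIKE cannot be served by an index or slice lookup alone, so any
    # LIKE -- on a partition *or* clustering column -- requires filtering.
    return any(op == "LIKE" for _kind, op in restrictions)
```

The pre-fix bug corresponds to checking only the partition-column entries, which would miss `("clustering", "LIKE")`.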
Benny Halevy
d1bcb39e7f hinted handoff: log message after removing hints directory (#5372)
To be used by dtest as an indicator that endpoint's hints
were drained and hints directory is removed.

Refs #5354

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-12 01:16:19 +02:00
Rafael Ávila de Espíndola
3b61cf3f0b db: Don't use lw_shared_ptr for user_types_metadata
The user_types_metadata can simply be owned by the keyspace. This
simplifies the code since we never have to worry about nulls and the
ownership is now explicit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
a55838323b user_types_metadata: don't implement enable_lw_shared_from_this
It looks like this was done just to avoid including
user_types_metadata.hh, which seems a bit much considering that it
requires adding a specialization to the seastar namespace.

A followup patch will also stop using lw_shared_ptr for
user_types_metadata.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
f7c2c60b07 cql3: pass a const user_types_metadata& to prepare_internal
We never modify the user_types_metadata via prepare_internal, so we
can pass it a const reference.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
99cb8965be db: drop special case for top level UDTs
This was originally done in 7f64a6ec4b,
but that commit was reverted in
8517eecc28.

The revert was done because the original change would call parse_raw
for non UDT types. Unlike the old patch, this one doesn't change the
behavior of non UDT types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
7ae9955c5f db: simplify db::cql_type_parser::parse
The variant of db::cql_type_parser::parse that has a
user_types_metadata argument was only used from the variant that
didn't. This inlines one in the other.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
2092e1ef6f db: Don't create a reference to nullptr
The user_types variable can be null during db startup since we have to
create types before reading the system table defining user types.

This avoids undefined behavior, but it is unlikely that it was causing
more serious problems since the variable is only used when creating
user types and we don't create any until after all system tables are
read, in which case the user_types variable is not null.

Fixes #5193

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
6143941535 Add test for loading a schema with a non native type
This would have found the error with the previous version of the patch
series.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:43:34 -08:00
Gleb Natapov
64cfb9b1f6 lwt: take raw lock for entire cas duration
It will prevent parallel update by the same coordinator and should
reduce contention.
2019-12-11 14:41:31 +02:00
Gleb Natapov
898d2330a2 lwt: drop invoke_on in paxos_state prepare and accept
Since lwt requests are now running on an owning shard there is no longer
a need to invoke cross shard call.
2019-12-11 14:41:31 +02:00
Gleb Natapov
964c532c4f lwt: Process lwt request on an owning shard
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt.  It works by
returning an error from the lwt code if the shard is incorrect, specifying
the shard the request should be moved to. The error is processed by the
transport code, which jumps to the correct shard and re-processes the
incoming message there.
2019-12-11 14:41:31 +02:00
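The bounce mechanism described above can be modeled in a few lines. The names and the ownership function here are purely illustrative, not Scylla's API: the LWT layer refuses to run on a non-owning shard and reports the owner, and the transport layer retries there.

```python
class BounceToShard(Exception):
    """Raised by the LWT layer when another shard owns the token."""
    def __init__(self, shard):
        self.shard = shard

def owning_shard(token, shard_count):
    return token % shard_count  # stand-in for the real token-to-shard mapping

def process_lwt(token, current_shard, shard_count):
    owner = owning_shard(token, shard_count)
    if owner != current_shard:
        raise BounceToShard(owner)  # tell transport where the request belongs
    return f"applied on shard {current_shard}"

def transport_dispatch(token, start_shard, shard_count):
    try:
        return process_lwt(token, start_shard, shard_count)
    except BounceToShard as e:
        # Re-process the incoming message on the owning shard.
        return process_lwt(token, e.shard, shard_count)
```

With 4 shards, a request for token 10 arriving on shard 0 bounces once to shard 2 and is applied there, instead of bouncing repeatedly during processing.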
Gleb Natapov
54be057af3 storage_service: move start_native_transport into a thread
The code runs only once and it is simple if it runs in a seastar thread.
2019-12-11 14:41:31 +02:00
Gleb Natapov
007ba3e38e transport: change make_result to take a reference to cql result instead of shared_ptr 2019-12-11 14:41:31 +02:00
Nadav Har'El
9e5c6995a3 alternator-test: add tests for ReturnValues parameter
This patch adds comprehensive tests for the ReturnValue parameter of
the write operations (PutItem, UpdateItem, DeleteItem), which can return
pre-write or post-write values of the modified item. The tests are in
a new test file, alternator-test/test_returnvalues.py.

This feature is not yet implemented in Alternator, so all the new
tests xfail on Alternator (and all pass on AWS).

Refs #5053

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127163735.19499-1-nyh@scylladb.com>
2019-12-11 13:26:39 +01:00
Nadav Har'El
ab69bfc111 alternator-test: add xfailing tests for ScanIndexForward
This patch adds tests for Query's "ScanIndexForward" parameter, which
can be used to return items in reversed sort order.
We test that a Limit works and returns the given number of *last* items
in the sort order, and also that such reverse queries can be resumed,
i.e., paging works in the reverse order.

These tests pass against AWS DynamoDB, but fail against Alternator (which
doesn't support ScanIndexForward yet), so it is marked xfail.

Refs #5153.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127114657.14953-1-nyh@scylladb.com>
2019-12-11 13:26:39 +01:00
Pekka Enberg
6bc18ba713 storage_proxy: Remove reference to MBean interface
The JMX interface is implemented by the scylla-jmx project, not scylla.
Therefore, let's remove this historical reference to MBeans from
storage_proxy.

Message-Id: <20191211121652.22461-1-penberg@scylladb.com>
2019-12-11 14:24:28 +02:00
Avi Kivity
63474a3380 Merge "Add experimental_features option" from Dejan
"
Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser.

Fixes #5338
"

* 'vecexper' of https://github.com/dekimir/scylla:
  config: Add `experimental_features` option
  utils: Add enum_option
2019-12-11 14:23:08 +02:00
Avi Kivity
56b9bdc90f Update seastar submodule
* seastar e440e831c8...00da4c8760 (7):
  > Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi
Fixes #5443.
  > install-dependencies.sh: fix arch dependencies
  > Merge " rpc: fix use-after-free during rpc teardown vs. rpc server message handling" from Benny
  > Merge "testing: improve the observability of abandoned failed futures" from Botond
  > rework the fair_queue tester
  > directory_test: Update to use run instead of run_deprecated
  > log: support fmt 6.0 branch with chrono.h for log
2019-12-11 14:17:49 +02:00
Benny Halevy
105c8ef5a9 messaging_service: wait on unregister_handler
Prepare for returning future<> from seastar rpc
unregister_handler.

Refs https://github.com/scylladb/scylla/issues/5228

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191208153924.1953-1-bhalevy@scylladb.com>
2019-12-11 14:17:41 +02:00
Nadav Har'El
06c3802a1a storage_proxy: avoid overflow in view-backlog delay calculation
In the calculate_delay() code for view-backlog flow control, we calculate
a delay and cap it at a "budget" - the remaining timeout. This timeout is
measured in milliseconds, but the capping calculation converted it into
microseconds, which overflowed if the timeout is very large. This causes
some tests which enable the UB sanitizer to fail.

We fix this problem by comparing the delay to the budget in millisecond
resolution, not in microsecond resolution. Then, if the calculated delay
is short enough, we return it using its full microsecond resolution.

Fixes #5412

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191205131130.16793-1-nyh@scylladb.com>
2019-12-11 14:10:54 +02:00
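The fix described above can be modeled with a small sketch (hypothetical Python, not the actual C++ calculate_delay() code; names are assumptions): compare delay and budget in millisecond resolution, and return the full microsecond resolution only once the delay is known to fit in the budget.

```python
# Hypothetical model of the fix: compare in ms, so a very large budget is
# never converted to us (which could overflow int64 in the C++ code), and
# keep full us resolution only for a delay that fits in the budget.
def capped_delay_us(delay_us: int, budget_ms: int) -> int:
    if delay_us // 1000 >= budget_ms:
        # delay exceeds the budget: cap it; budget_ms is small in this
        # branch, so the ms->us conversion is safe
        return budget_ms * 1000
    return delay_us  # short enough: keep full microsecond resolution
```

With a huge budget the delay is returned untouched, so no overflowing ms-to-us conversion of the budget ever happens.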
Nadav Har'El
2824d8f6aa Merge: alternator: Fix EQ operator for sets
Merged pull request https://github.com/scylladb/scylla/pull/5453
from Piotr Sarna:

Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:

[1, 3, 2] == [1, 2, 3]

Fixes #5021
2019-12-11 13:20:25 +02:00
Piotr Sarna
421db1dc9d alternator-test: remove XFAIL from set EQ test
With this series merged, test_update_expected_1_eq_set from
test_expected.py suite starts passing.
2019-12-11 12:07:39 +01:00
Piotr Sarna
a8e45683cb alternator: add EQ comparison for sets
Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:
[1, 3, 2] == [1, 2, 3]

Fixes #5021
2019-12-11 12:07:39 +01:00
Piotr Sarna
fb37394995 schema_tables: notify table deletions before creations
If a set of mutations contains both an entry that deletes a table
and an entry that adds a table with the same name, it's expected
to be a replacement operation (delete old + create new),
rather than a useless "try to create a table even though it exists
already and then immediately delete the original one" operation.
As such, notifications about the deletions should be performed
before notifications about the creations. The place that originally
suffered from this wrong order is view building - which in this case
created an incorrect duplicated entry in the view building bookkeeping,
and then immediately deleted it, resulting in having old, deprecated
entries with stale UUIDS lying in the build queue and never proceeding,
because the underlying table is long gone.
The issue is fixed by ensuring the order of notifications:
 - drops are announced first, view drops are announced before table drops;
 - creations follow, table creations are announced before views;
 - finally, changes to tables and views are announced;

Fixes #4382

Tests: unit(dev), mv_populating_from_existing_data_during_node_stop_test
2019-12-11 12:48:29 +02:00
Benny Halevy
d544df6c3c dist/ami/build_ami.sh: support incremental build of rpms (#5191)
Iterate over an array holding all rpm names to see if any
of them is missing from `dist/ami/files`. If they are missing,
look them up in build/redhat/RPMS/x86_64 so that if reloc/build_rpm.sh
was run manually before dist/ami/build_ami.sh we can just collect
the built rpms from its output dir.

If we're still missing any rpms, then run reloc/build_rpm.sh
and copy the required rpms from build/redhat/RPMS/x86_64.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
2019-12-11 12:48:29 +02:00
Amnon Heiman
f43285f39a api: replace swagger definition to use long instead of int (#5380)
In swagger 1.2 int is defined as int32.

We originally used int following the jmx definition; in practice we use
uint and int64 internally in many places.

While the API formats the type correctly, an external system that uses
a swagger-based code generator can run into a type mismatch.

This patch replaces every use of int in a return type with long, which is defined as int64.

Changing the return type has no impact on the system, but it does help
external systems that use code generators from swagger.

Fixes #5347

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-12-11 12:48:29 +02:00
Nadav Har'El
2abac32f2e Merged: alternator: Implement CONTAINS and NOT_CONTAINS in Expected
Merged pull request https://github.com/scylladb/scylla/pull/5447
by Dejan Mircevski.

Adds the last missing operators in the "Expected" parameter and re-enable
their tests.

Fixes #5034.
2019-12-11 12:48:29 +02:00
Cem Sancak
86b8036502 Fix DPDK mode in prepare script
Fixes #5455.
2019-12-11 12:48:29 +02:00
Calle Wilund
35089da983 conf/config: Add better descriptive text on server/client encryption
Provide some explanation on prio strings + direction to gnutls manual.
Document client auth option.
Remove confusing/misleading statement on "custom options"

Message-Id: <20191210123714.12278-1-calle@scylladb.com>
2019-12-11 12:48:28 +02:00
Dejan Mircevski
32af150f1d alternator: Implement NOT_CONTAINS operator in Expected
Enable existing NOT_CONTAINS test, add NOT_CONTAINS to the list of
recognized operators, implement check_NOT_CONTAINS, and hook it up to
verify_expected_one().

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 15:31:47 -05:00
Dejan Mircevski
bd2bd3c7c8 alternator: Implement CONTAINS operator in Expected
Enable existing CONTAINS test, implement check_CONTAINS, and hook it
up to verify_expected_one().

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 15:31:47 -05:00
Dejan Mircevski
5a56fd384c config: Add experimental_features option
When the user wants to turn on only some experimental features, they
can use this new option.  The existing `experimental` option is
preserved for backwards compatibility.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 11:47:03 -05:00
Piotr Sarna
9504bbf5a4 alternator: move unwrap_set to serialization header
The utility function for unwrapping a set is going to be useful
across source files, so it's moved to serialization.hh/serialization.cc.
2019-12-10 15:08:47 +01:00
Piotr Sarna
4660e58088 alternator: move rjson value comparison to rjson.hh
The comparison struct is going to be useful across source files,
so it's moved into rjson header, where it conceptually belongs anyway.
2019-12-10 15:08:47 +01:00
Botond Dénes
db0e2d8f90 scylla-gdb.py: document and add safety net to seastar::thread related commands
Almost all commands provided by `scylla-gdb.py` are safe to use. The
worst that could happen if they fail is that you won't get the desired
information. There is one notable exception: `scylla thread`. If
anything goes wrong while this command is executed - gdb crashes, a bug
in the command, etc. - there is a good chance the process under
examination will crash. Sometimes this is fine, but other times e.g.
when live debugging a production node, this is unacceptable.
To avoid any accidents add documentation to all commands working with
`seastar::thread`. And since most people don't read documentation,
especially when debugging under pressure, add a safety net to the
`scylla thread` command. When run, this command will now warn of the
dangers and will ask for explicit acknowledgment of the risk of crash,
by means of passing an `--iamsure` flag. When this flag is missing, it
will refuse to run. I am sure this will be very annoying but I am also
sure that the avoided crashes are worth it.

As part of making `scylla thread` safe, its argument parsing code is
migrated to `argparse`. This changes the usage but this should be fine
because it is well documented.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191129092838.390878-1-bdenes@scylladb.com>
2019-12-10 11:51:57 +02:00
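The safety net could look roughly like this (a hypothetical Python sketch of the argparse-based handling; everything except the `--iamsure` flag name is an assumption):

```python
import argparse

# Hypothetical sketch of the --iamsure safety net for `scylla thread`.
parser = argparse.ArgumentParser(prog="scylla thread")
parser.add_argument("--iamsure", action="store_true",
                    help="acknowledge that this command may crash the process")
parser.add_argument("thread", help="seastar::thread to switch to (assumed argument)")

def run(argv):
    args = parser.parse_args(argv)
    if not args.iamsure:
        return "refusing to run: pass --iamsure to accept the risk of a crash"
    return f"switching to {args.thread}"
```

Without the flag the command refuses to run; with it, execution proceeds as before.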
Eliran Sinvani
765db5d14f build_ami: Trim ami description attribute to the allowed size
The ami description attribute is only allowed to be 255
characters long. When build_ami.sh generates an ami, it
generates an ami description which is a concatenation
of all of the components' version strings. It can
happen that the description string is too long which
eventually causes the ami build to fail. This patch
trims the description string to 255 characters.
It is ok since the individual versions of the components
are also saved in tags attached to the image.

Tests:
 1. Reproduced with a long description and
    validated that it doesn't fail after the fix.

Fixes #5435

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191209141143.28893-1-eliransin@scylladb.com>
2019-12-10 11:51:57 +02:00
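The trimming itself is trivial; a sketch (Python, function name hypothetical):

```python
# Hypothetical sketch: AMI descriptions may be at most 255 characters.
AMI_DESCRIPTION_LIMIT = 255

def trim_description(desc: str) -> str:
    return desc[:AMI_DESCRIPTION_LIMIT]

assert len(trim_description("x" * 400)) == 255
assert trim_description("short") == "short"
```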
Fabiano Lucchese
4333b37f9e scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)
A Linux machine typically has multiple clocksources with distinct
performances. Setting a high-performant clocksource might result in
better performance for ScyllaDB, so this should be considered whenever
starting it up.

This patch introduces the possibility of enforcing optimized Linux
clocksource to Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in scylla_server configuration
file. This parameter is read by perftune.py which, if set to "yes", proceeds
to (non persistently) setting the clocksource. On x86, TSC clocksource is
used.

Fixes #4474
2019-12-10 11:51:57 +02:00
Pavel Emelyanov
3a21419fdb features: Remove _FEATURE suffix from hinted_handoff feature name
All the other features are named w/o one. The internal const-s
are all different, but I'm fixing it separately.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191209154310.21649-1-xemul@scylladb.com>
2019-12-10 11:51:57 +02:00
Dejan Mircevski
a26bd9b847 utils: Add enum_option
This allows us to accept command-line options with a predefined set of
valid arguments.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-09 09:45:59 -05:00
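The idea behind enum_option can be illustrated with Python's argparse choices (a loose analogy, not the C++ implementation; the feature names used here are hypothetical):

```python
import argparse

# Hypothetical analogy to enum_option: an option whose arguments must come
# from a predefined set of valid values.
parser = argparse.ArgumentParser()
parser.add_argument("--experimental-features", nargs="*", default=[],
                    choices=["cdc", "lwt", "udf"])  # hypothetical feature names

args = parser.parse_args(["--experimental-features", "cdc", "lwt"])
assert args.experimental_features == ["cdc", "lwt"]
```

An argument outside the predefined set is rejected at parse time instead of being silently accepted.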
Calle Wilund
7c5e4c527d cdc_test: Add small test for altering base schema (add column) 2019-12-09 14:35:04 +00:00
Calle Wilund
cb0117eb44 cdc: Handle schema changes via migration manager callbacks
This allows us to create/alter/drop log and desc tables "atomically"
with the base, by including these mutations in the original mutation
set, i.e. batch create/alter tables.

Note that population does not happen until types are actually
already put into database (duh), thus there _is_ still a gap
between creating cdc and it being truly usable. This may or may
not need handling later.
2019-12-09 14:35:04 +00:00
Rafael Ávila de Espíndola
761b19cee5 build: Split the build and host linker flags
A general build system knows about 3 machines:

* build: where the building is running
* host: where the built software will run
* target: the machine the software will produce code for

The target machine is only relevant for compilers, so we can ignore
it.

Until now we could ignore the build and host distinction too. This
patch adds the first difference: don't use host ld_flags when linking
build tools (gen_crc_combine_table).

The reason for this change is to make it possible to build with
-Wl,--dynamic-linker pointing to a path that will exist on the host
machine, but may not exist on the build machine.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191207030408.987508-1-espindola@scylladb.com>
2019-12-09 15:54:57 +02:00
Calle Wilund
27183f648d migration_manager: Invoke "before" callbacks for table operations
Potentially allowing (cdc) augmentation of mutations.

Note: only does the listener part in seastar::thread, to avoid
changing call behaviour.
2019-12-09 12:12:09 +00:00
Calle Wilund
f78a3bf656 migration_listener: Add empty base class and "before" callbacks for tables
Empty base type makes for less boiler plate in implementations.
The "before" callbacks are for listeners who need to potentially
react/augment type creation/alteration _before_ actually
committing type to schema tables (and holding the semaphore for this).

I.e. it is for cdc to add/modify log/desc tables "atomically" with base.
2019-12-09 12:12:09 +00:00
Calle Wilund
4e406105b1 cql_test_env: Include cdc service in cql tests 2019-12-09 12:12:09 +00:00
Calle Wilund
a21e140169 cdc: Add sharded service that does nothing.
But one that can be used to hang functionality onto eventually.
2019-12-09 12:12:09 +00:00
Calle Wilund
2787b0c4f8 cdc: Move "options" to separate header to avoid too much header inclusion
cdc should not contaminate the whole universe.
2019-12-09 12:12:09 +00:00
fastio
8f326b28f4 Redis: Combine all the source files redis/commands/* into redis/commands.{hh,cc}
Fixes: #5394

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-08 13:54:33 +02:00
Avi Kivity
9c63cd8da5 sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417)
The vm.swappiness sysctl controls the kernel's preference for swapping
anonymous memory vs page cache. Since Scylla uses very large amounts
of anonymous memory, and tiny amounts of page cache, the correct setting
is to prefer swapping page cache. If the kernel swaps anonymous memory
the reactor will stall until the page fault is satisfied. On the other
hand, page cache pages usually belong to other applications, usually
backup processes that read Scylla files.

This setting has been used in production in Scylla Cloud for a while
with good results.

Users can opt out by not installing the scylla-kernel-conf package
(same as with the other kernel tunables).
2019-12-08 13:04:25 +02:00
Avi Kivity
0e319e0359 Update seastar submodule
* seastar 166061da3...e440e831c (8):
  > Fail tests on ubsan errors
  > future: make a couple of asserts more strict
  > future: Move make_ready out of line
  > config: Do not allow zero rates
Fixes #5360
  > future: add new state to avoid temporaries in get_available_state().
  > future: avoid temporary future_state on get_available_state().
  > future: inline future::abandoned
  > noncopyable_function: Avoid uninitialized warning on empty types
2019-12-06 18:33:23 +02:00
Piotr Sarna
0718ff5133 Merge 'min/max on collections returns human-readable result' from Juliusz
Previously, scylla used min/max(blob)->blob overload for collections,
tuples and UDTs; effectively making the results being printed as blobs.
This PR adds "dynamically"-typed min()/max() functions for compound types.

These types can be complicated, like map<int,set<tuple<..., and created
in runtime, so functions for them are created on-demand,
similarly to tojson(). The comparison remains unchanged - underneath
this is still byte-by-byte weak lex ordering.

Fixes #5139

* jul-stas/5139-minmax-bad-printing-collections:
  cql_query_tests: Added tests for min/max/count on collections
  cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
2019-12-06 16:40:17 +01:00
Juliusz Stasiewicz
75955beb0b cql_query_tests: Added tests for min/max/count on collections
This tests the new min/max functions for collections and tuples. CFs
in the test suite were named according to the types being tested, e.g.
`cf_map<int,text>', which is not a valid CF name. Therefore, these
names required "escaping" of invalid characters, here: simply
replacing them with '_'.
2019-12-06 12:15:49 +01:00
Juliusz Stasiewicz
9efad36fb8 cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
Before:
cqlsh> insert into ks.list_types (id, val) values (1, [3,4,5]);
cqlsh> select max(val) from ks.list_types;

 system.max(val)
------------------------------------------------------------
 0x00000003000000040000000300000004000000040000000400000005

After:
cqlsh> select max(val) from ks.list_types;

 system.max(val)
--------------------
 [3, 4, 5]

This is accomplished similarly to `tojson()`/`fromjson()`: functions
are generated on demand from within `cql3::functions::get()`.
Because collections can have a variety of types, including UDTs
and tuples, it would be impossible to statically define max(T t)->T
for every T. Until now, max(blob)->blob overload was used.

Because `impl_max/min_function_for` is templated with the
input/output type, which can be defined in runtime, we need type-erased
("dynamic") versions of these functors. They work identically, i.e.
they compare byte representations of lhs and rhs with
`bytes::operator<`.

Resolves #5139
2019-12-06 12:14:51 +01:00
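As the commit notes, the comparison stays byte-by-byte lexicographic on the serialized form; only the declared return type changes, so the result is pretty-printed instead of shown as a blob. In sketch form (illustrative Python, hypothetical names):

```python
# Hypothetical sketch: max() over serialized collection values is still a
# plain lexicographic comparison of the byte representations.
def dynamic_max(serialized_values):
    return max(serialized_values)  # bytes compare lexicographically

assert dynamic_max([b"\x01\x02", b"\x01\x03"]) == b"\x01\x03"
```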
Avi Kivity
a18a921308 docs: maintainer.md: use command line to merge multi-commit pull requests
If you merge a pull request that contains multiple patches via
the github interface, it will document itself as the committer.

Work around this brain damage by using the command line.
2019-12-06 10:59:46 +01:00
Botond Dénes
7b37a700e1 configure.py: make tests explicitly depend on libseastar_testing.a
So that changes to libseastar_testing.a make all test targets out of
date.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191205142436.560823-1-bdenes@scylladb.com>
2019-12-05 19:30:34 +02:00
Piotr Sarna
3a46b1bb2b Merge "handle hints on separate connection and scheduling group" from Piotr
Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write.

The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of requests saturate the connection, negatively impacting the other one.

Information about new RPC support is propagated through new gossip feature HINTED_HANDOFF_SEPARATE_CONNECTION.

Fixes #4974.

Tests: unit(release)
2019-12-05 17:25:26 +01:00
Calle Wilund
c11874d851 gms::inet_address: Use special ostream formatting to match Java
To make gms::inet_address::to_string() similar in output to origin.
The sole purpose being quick and easy fix of API/JMX ipv6
formatting of endpoints etc, where strings are used as lexical
comparisons instead of textual representation.

A better, but more work, solution is to fix the scylla-jmx
bridge to do explicit parse + re-format of addresses, but there
are many such callpoints.

An even better solution would be to fix nodetool to not make this
mistake of doing lexical comparisons, but then we risk breaking
merge compatibility. But could be an option for a separate
nodeprobe impl.

Message-Id: <20191204135319.1142-1-calle@scylladb.com>
2019-12-05 17:01:26 +02:00
Gleb Natapov
4893bc9139 tracing: split adding prepared query parameters from stopping of a trace
Currently query_options objects is passed to a trace stopping function
which makes it mandatory to make them alive until the end of the
query. The reason for that is to add prepared statement parameters to
the trace.  All other query options that we want to put in the trace are
copied into trace_state::params_values, so lets copy prepared statement
parameters there too. Trace enabled case will become a little bit more
expensive but on the other hand we can drop a continuation that holds
query_options object alive from a fast path. It is safe to drop the call
to stop_foreground_prepared() here since the tracing will be stopped
in process_request_one().

Message-Id: <20191205102026.GJ9084@scylladb.com>
2019-12-05 17:00:47 +02:00
Tomasz Grabiec
aa173898d6 Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz
Selected semaphores' names are now included in exception messages in
case of timeout or when admission queue overflows.

Resolves #5281
2019-12-05 14:19:56 +01:00
Nadav Har'El
5b2f35a21a Merge "Redis: fix the options related to Redis API, fix the DEL and GET command"
Merged pull request https://github.com/scylladb/scylla/pull/5381 by
Peng Jian, fixing multiple small issues with Redis:

* Rename the options related to Redis API, and describe them clearly.
* Rename redis_transport_port to redis_port
* Rename redis_transport_port_ssl to redis_ssl_port
* Rename redis_default_database_count to redis_database_count
* Remove unnecessary option enable_redis_protocol
* Modify the default value of options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM

* Fix the DEL command: support deleting multiple keys in one command.

* Fix the GET command: return an empty string when the required key does not exist.

* Fix the redis-test/test_del_non_existent_key: mark xfail.
2019-12-05 11:58:34 +02:00
Avi Kivity
85822c7786 database: fix schema use-after-move in make_multishard_streaming_reader
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.

Fix by evaluating full_slice before moving the schema.

Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.

Fixes #5419.
2019-12-05 11:58:34 +02:00
Piotr Sarna
79c3a508f4 table: Reduce read amplification in view update generation
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs #5409
Fixes #4615
Fixes #5418
2019-12-05 11:58:34 +02:00
Konstantin Osipov
6a5e7c0e22 tests: reduce the number of iterations of dynamic_bitset_test
This test's execution time dominates overall test execution time in
dev/release mode by a serious margin: reducing it improves the
test.py turnaround by over 70%.

Message-Id: <20191204135315.86374-2-kostja@scylladb.com>
2019-12-05 11:58:34 +02:00
Avi Kivity
07427c89a2 gdb: change 'scylla thread' command to access fs_base register directly
Currently, 'scylla thread' uses arch_prctl() to extract the value of
fsbase, used to reference thread local variables. gdb 8 added support
for directly accessing the value as $fs_base, so use that instead. This
works from core dumps as well as live processes, as you don't need to
execute inferior functions.

The patch is required for debugging threads in core dumps, but not
sufficient, as we still need to set $rip and $rsp, and gdb still[1]
doesn't allow this.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=9370
2019-12-05 11:58:34 +02:00
Piotr Dulikowski
adfa7d7b8d messaging_service: don't move unsigned values in handlers
Performing std::move on integral types is pointless. This commit gets
rid of moves of values of `unsigned` type in rpc handlers.
2019-12-05 00:58:31 +01:00
Piotr Dulikowski
77d2ceaeba storage_proxy: handle hints through separate rpc verb 2019-12-05 00:51:52 +01:00
Piotr Dulikowski
2609065090 storage_proxy: move register_mutation handler to local lambda
This refactor makes it possible to reuse the lambda in following
commits.
2019-12-05 00:51:52 +01:00
Piotr Dulikowski
6198ee2735 hh: introduce HINTED_HANDOFF_SEPARATE_CONNECTION feature
The feature introduced by this commit declares that hints can be sent
using the new dedicated RPC verb. Before using the new verb, nodes need
to know if other nodes in the cluster will be able to handle the new
RPC verb.
2019-12-05 00:51:52 +01:00
Piotr Dulikowski
2e802ca650 hh: add HINT_MUTATION verb
Introduce a new verb dedicated for receiving and sending hints:
HINT_MUTATION. It is handled on the streaming connection, which is
separate from the one used for handling mutations sent by coordinator
during a write.

The intent of using a separate connection is to increase fairness while
handling hints and user requests - this way, a situation can be avoided
in which one type of requests saturate the connection, negatively
impacting the other one.
2019-12-05 00:51:49 +01:00
Avi Kivity
fd951a36e3 Merge "Let compaction wait on background deletions" from Benny
"
In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done.
However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted.

This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish.

Fixes #4909

Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction
"
2019-12-04 11:18:41 +02:00
Takuya ASADA
c9d8606786 dist/common/scripts/scylla_ntp_setup: relax RHEL version check
We may be able to use the chrony setup script on future versions of
RHEL/CentOS, so it is better to run chrony setup when the RHEL version
is >= 8, not only on 8 exactly.

Note that on Fedora it still provides ntp/ntpdate package, so we run
ntp setup on it for now. (same on debian variants)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191203192812.5861-1-syuu@scylladb.com>
2019-12-04 10:59:14 +02:00
Juliusz Stasiewicz
430b2ad19d commitlog+region_group: timeout exceptions with names
`segment_manager' now uses a decorated version of `timed_out_error'
with hardcoded name. On the other hand `region_group' uses named
`on_request_expiry' within its `expiring_fifo'.
2019-12-03 19:07:19 +01:00
Avi Kivity
91d3f2afce docs: maintainers.md: fix typo in git push --force-with-lease
Just one lease, not many.

Reported by Piotr Sarna.
2019-12-03 18:17:46 +01:00
Calle Wilund
56a5e0a251 commitlog_replayer: Ensure applied frozen_mutation is safe during apply
Fixes #5211

In 79935df959 replay apply-call was
changed from one with no continuation to one with. But the frozen
mutation arg was still just lambda local.

Change to use do_with for this case as well.

Message-Id: <20191203162606.1664-1-calle@scylladb.com>
2019-12-03 18:28:01 +02:00
Juliusz Stasiewicz
d043393f52 db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore
Exception messages contain semaphore's name (provided in ctor).
This affects the queue overflow exception as well as timeout
exception. Also, custom throwing function in ctor was changed
to `prethrow_action', i.e. metrics can still be updated there but
now callers have no control over the type of the exception being
thrown. This affected `restricted_reader_max_queue_length' test.
`reader_concurrency_semaphore'-s docs are updated accordingly.
2019-12-03 15:41:34 +01:00
Amos Kong
e26b396f16 scylla-docker: fix default data_directories in scyllasetup.py (#5399)
Use default data_file_directories if it's not assigned in scylla.yaml

Fixes #5398

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 13:58:17 +02:00
Rafael Ávila de Espíndola
1cd17887fa build: strip debug when configured with --debuginfo 0
In a build configured with --debuginfo 0 the scylla binary still ends
up with some debug info from the libraries that are statically linked
in.

We should avoid compiling subprojects (including seastar) with debug
info when none is needed, but this at least avoids it showing up in
the binary.

The main motivation for this is that it is confusing to get a binary
with *some* debug info in it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191127215843.44992-1-espindola@scylladb.com>
2019-12-03 12:41:04 +02:00
Tomasz Grabiec
0a453e5d30 Merge "Use fragmented buffers for collection de/serialization" from Botond
This series refactors the collection de/serialization code to use
fragmented buffers, avoiding the large allocations and the associated
pains when working with large collections. Currently all operations that
involve collections require deserializing them, executing the operation,
then serializing them again to their internal storage format. The
de/serialization operations happen in linearized buffers, which means
that we have to allocate a buffer large enough to hold the *entire*
collection. This can cause immense pressure on the memory allocator,
which, in the face of memory fragmentation, might be unable to serve the
allocation at all. We've seen this causing all sorts of nasty problems,
including but not limited to: failing compactions, failing memtable
flush, OOM crash and etc.

Users are strongly discouraged from using large collections, yet they
are still a fact of life and have been haunting us since forever.

The proper solution for these problems would be to come up with an
in-memory format for collections, however that is a major effort, with a
lot of unknowns. This is something we plan on doing at some point but
until it happens we should make life less painful for those with large
collections.

The goal of this series is to avoid the need to allocate these large
buffers. Serialization now happens into a `bytes_ostream` which
automatically fragments the values internally. Deserialization happens
with `utils::linearizing_input_stream` (introduced by this series), which
linearizes only the individual collection cells, but not the entire
collection.
An important goal of this series was to introduce the least amount of
risk, and hence the least amount of code. This series does not try to
make a revolution and completely revamp and optimize the
de/serialization codepaths. These codepaths have their days numbered so
investing a lot of effort into them is in vain. We can apply incremental
optimizations where we deem it necessary.

Fixes: #5341
2019-12-03 10:31:34 +01:00
fastio
01599ffbae Redis API: Support deleting multiple keys in one DEL command, fix the return value of the GET command.
Support deleting multiple keys in one DEL command.
Returning the number of actually deleted keys is not yet supported.
Return an empty string to the client for the GET command when the requested key does not exist.

Fixes: #5334

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-03 17:27:40 +08:00
fastio
039b83ad3b Redis API: Rename options related to Redis API, describe them clearly, and remove an unnecessary one.
Rename option redis_transport_port to redis_port, which the redis transport listens on for clients.
Rename option redis_transport_port_ssl to redis_ssl_port, which the redis TLS transport listens on for clients.
Rename option redis_database_count. Set the redis database count.
Rename option redis_keyspace_opitons to redis_keyspace_replication_strategy_options. Set the replication strategy for redis keyspace.
Remove option enable_redis_protocol, which is unnecessary.

Fixes: #5335

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-03 17:13:35 +08:00
Nadav Har'El
7b93360c8d Merge: redis: skip processing request of EOF
Merged pull request https://github.com/scylladb/scylla/pull/5393/ by
Amos Kong:
When I test redis commands with echo and nc, there is a spurious error at the end.
I checked with strace: currently, if the client reads nothing from stdin, it
shuts down the socket, and the redis server reads nothing (0 bytes) from the socket. But
it still tries to process the empty command and returns an error.

$ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379
| ...
|    read(0, "*1\r\n$4\r\nping\r\n", 8192)   = 14
|    select(5, [4], [4], [], NULL)           = 1 (out [4])
|>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14
|    select(5, [0 4], [], [], NULL)          = 1 (in [0])
|    recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket)
|    read(0, "", 8192)                       = 0
|>>> shutdown(4, SHUT_WR)                    = 0
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32
|    write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG
|    -ERR unknown command ''
|    ) = 32
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0
|    close(1)                                = 0
|    close(4)                                = 0

Current result:
  $ echo -n -e '' |nc localhost 6379
  -ERR unknown command ''
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
  -ERR unknown command ''

Expected:
  $ echo -n -e '' |nc localhost 6379
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
2019-12-03 10:40:20 +02:00
Avi Kivity
83feb9ea77 tools: toolchain: update frozen image
Commit 96009881d8 added diffutils to the dependencies via
Seastar's install-dependencies.sh, after it was inadvertently
dropped in 1164ff5329 (update to Fedora 31; diffutils is no
longer brought in as a side effect of something else).

Regenerate the image to include diffutils.

Ref #5401.
2019-12-03 10:36:55 +02:00
Amos Kong
fb9af2a86b redis-test: add test_raw_cmd.py
This patch adds subtests for EOF processing; they read and write the socket
directly using protocol commands.

We can add more tests in the future; tests going through a Redis client module
would hide some protocol errors.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 10:47:56 +08:00
Amos Kong
4fa862adf4 redis: skip processing request of EOF
When I test redis commands with echo and nc, there is a spurious error at the end.
I checked with strace: currently, if the client reads nothing from stdin, it
shuts down the socket, and the redis server reads nothing (0 bytes) from the socket. But
it still tries to process the empty command and returns an error.

$ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379
| ...
|    read(0, "*1\r\n$4\r\nping\r\n", 8192)   = 14
|    select(5, [4], [4], [], NULL)           = 1 (out [4])
|>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14
|    select(5, [0 4], [], [], NULL)          = 1 (in [0])
|    recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket)
|    read(0, "", 8192)                       = 0
|>>> shutdown(4, SHUT_WR)                    = 0
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32
|    write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG
|    -ERR unknown command ''
|    ) = 32
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0
|    close(1)                                = 0
|    close(4)                                = 0

Current result:
  $ echo -n -e '' |nc localhost 6379
  -ERR unknown command ''
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
  -ERR unknown command ''

Expected:
  $ echo -n -e '' |nc localhost 6379
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 10:47:56 +08:00
Rafael Ávila de Espíndola
bb114de023 dbuild: Fix confusion about relabeling
podman needs to relabel directories in exactly the same cases as docker
does. The difference is that podman cannot relabel /tmp.

The reason it was working before is that in practice anyone using
dbuild has already relabeled any directories that need relabeling,
with the exception of /tmp, since it is recreated on every boot.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-2-espindola@scylladb.com>
2019-12-02 18:38:16 +02:00
Rafael Ávila de Espíndola
867cdbda28 dbuild: Use a temporary directory for /tmp
With this we don't have to use --security-opt label=disable.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-1-espindola@scylladb.com>
2019-12-02 18:38:14 +02:00
Botond Dénes
1d1f8b0d82 tests: mutation_test: add large collection allocation test
Checking that there are no large allocations when a large collection is
de/serialized.
2019-12-02 17:13:53 +02:00
Avi Kivity
28355af134 docs: add maintainer's handbook (#5396)
This is a list of recipes used by maintainers to maintain
scylla.git.
2019-12-02 15:01:54 +02:00
Calle Wilund
8c6d6254cf cdc: Remove some code from header 2019-12-02 13:00:19 +00:00
Botond Dénes
4c59487502 collection_mutation: don't linearize the buffer on deserialization
Use `utils::linearizing_input_stream` for the deserialization of the
collection. This avoids linearizing the entire cell value; instead, only
individual values are linearized as they are deserialized from the
buffer.
2019-12-02 10:10:31 +02:00
Botond Dénes
690e9d2b44 utils: introduce linearizing_input_stream
`linearizing_input_stream` allows transparently reading linearized
values from a fragmented buffer. This is done by linearizing on-the-fly
only those read values that happen to be split across multiple
fragments. This reduces the size of the largest allocation from the size
of the entire buffer (when the entire buffer is linearized) to the size
of the largest read value. This is a huge gain when the buffer contains
loads of small objects, and modest gains when the buffer contains few
large objects. But the even in the worst case the size of the largest
allocation will be less or equal compared to the case where the entire
buffer is linearized.

This stream is planned to be used as glue code between the fragmented
cell value and the collection deserialization code which expects to be
reading linearized values.
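The fast/slow-path idea can be sketched in Python (a hypothetical simplification; the real implementation is C++ code in utils/): a read that fits inside one fragment returns a slice of that fragment, and only a read that straddles a fragment boundary is copied into a fresh buffer of its own size.

```python
class LinearizingInputStream:
    """Sketch of a stream over a fragmented buffer that only copies
    (linearizes) values which span fragment boundaries."""

    def __init__(self, fragments):
        self._fragments = list(fragments)  # list of bytes chunks
        self._idx = 0   # current fragment index
        self._pos = 0   # offset within the current fragment

    def read(self, n):
        frag = self._fragments[self._idx]
        if self._pos + n <= len(frag):
            # Fast path: the value is contiguous, no copy larger than n.
            out = frag[self._pos:self._pos + n]
            self._pos += n
        else:
            # Slow path: the value straddles fragments; copy just n bytes,
            # not the whole buffer.
            parts = []
            while n > 0:
                frag = self._fragments[self._idx]
                take = min(n, len(frag) - self._pos)
                parts.append(frag[self._pos:self._pos + take])
                self._pos += take
                n -= take
                if self._pos == len(frag) and n > 0:
                    self._idx += 1
                    self._pos = 0
            out = b"".join(parts)
        # Normalize: step over an exhausted fragment if one follows.
        if self._pos == len(self._fragments[self._idx]) and self._idx + 1 < len(self._fragments):
            self._idx += 1
            self._pos = 0
        return out
```

The largest allocation is thus bounded by the largest single read, matching the commit's worst-case claim.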
2019-12-02 10:10:31 +02:00
Botond Dénes
065d8d37eb tests: random-utils: get_string(): add overload that takes engine parameter 2019-12-02 10:10:31 +02:00
Botond Dénes
2f9307c973 collection_mutation: use a fragmented buffer for serialization
For the serialization `bytes_ostream` is used.
2019-12-02 10:10:31 +02:00
Botond Dénes
fc5b096f73 imr: value_writer::write_to_destination(): don't dereference chunk iterator eagerly
Currently the loop which writes data from the fragmented origin to the
destination moves to the next chunk eagerly: after writing the value of
the current chunk, if the current chunk is exhausted, it advances.
This is a problem when writing the last piece of data from the last
chunk: the chunk is exhausted and we eagerly attempt to move to the
next chunk, which doesn't exist, so dereferencing it fails. The
solution is not to be eager about moving to the next chunk and only to
advance when we actually have more data to write and hence expect more
chunks.
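The eager-vs-lazy advance can be illustrated with a small Python sketch (a hypothetical simplification of the C++ chunk-iterator loop, not the actual imr code):

```python
def copy_chunks_lazy(chunks, total):
    """Copy `total` bytes from an iterable of chunks, advancing to the
    next chunk only when more data is still needed (the fix)."""
    out = bytearray()
    it = iter(chunks)
    chunk = next(it)
    pos = 0
    while total > 0:
        if pos == len(chunk):
            # Lazy advance: total > 0 guarantees a next chunk exists.
            chunk = next(it)
            pos = 0
        take = min(total, len(chunk) - pos)
        out += chunk[pos:pos + take]
        pos += take
        total -= take
    # An eager variant would call next(it) right after exhausting a chunk,
    # raising StopIteration when the write ends exactly at the last chunk.
    return bytes(out)
```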
2019-12-02 10:10:31 +02:00
Botond Dénes
875314fc4b bytes_ostream: make it a FragmentRange
The presence of `const_iterator` seems to be a requirement as well
although it is not part of the concept. But perhaps it is just an
assumption made by code using it.
2019-12-02 10:10:31 +02:00
Botond Dénes
4054ba0c45 serialization: accept any CharOutputIterator
Not just bytes::output_iterator. Allow writing into streams other than
just `bytes`. In fact we should be very careful with writing into
`bytes` as they require potentially large contiguous allocations.

The `write()` method is now templatized also on the type of its first
argument, which now accepts any CharOutputIterator. Due to our poor
use of namespaces, this now collides with `write` defined inside
`db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to
be templatized on the data type it reads from, and de-templatizing it
resolves the clash.
2019-12-02 10:10:31 +02:00
Botond Dénes
07007edab9 bytes_ostream: add output_iterator
To allow it being used for serialization code, which works in terms of
output iterators.
2019-12-02 10:10:31 +02:00
Takuya ASADA
c5a95210fe dist/common/scripts/scylla_setup: list virtio-blk devices correctly on interactive RAID setup
Currently the interactive RAID setup prompt does not list virtio-blk devices
for the following reasons:
 - We fail to match the '-p' option in the 'lsblk --help' output due to misuse
   of a regex function, so list_block_devices() always skips using lsblk output.
 - We don't check for the existence of /dev/vd* when skipping lsblk.
 - We mistakenly excluded virtio-blk devices from the 'lsblk -pnr' output with
   the '-e' option, but we actually need them.

To fix the problem we need to use re.search() instead of re.match() to match
the '-p' option in 'lsblk --help', add '/dev/vd*' to the block device list,
and drop the '-e 252' option of lsblk, which excludes virtio-blk.

Additionally, it is better to parse the 'TYPE' field of the lsblk output; we
should skip 'loop' and 'rom' devices since they are not disk devices.
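The re.match/re.search distinction above can be demonstrated in a few lines of Python (the help-text string here is illustrative, not lsblk's actual output):

```python
import re

help_text = "lsblk [options]\n -p, --paths    print complete device path"

# re.match anchors at the beginning of the string, so it never finds -p
# in the middle of the help output:
assert re.match(r"-p,\s+--paths", help_text) is None

# re.search scans the whole string, which is what the setup script needs:
assert re.search(r"-p,\s+--paths", help_text) is not None
```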

Fixes #4066

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191201160143.219456-1-syuu@scylladb.com>
2019-12-01 18:36:48 +02:00
Takuya ASADA
124da83103 dist/common/scripts: use chrony as NTP server on RHEL8/CentOS8
We need to use chrony as NTP server on RHEL8/CentOS8, since it dropped
ntpd/ntpdate.

Fixes #4571

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191101174032.29171-1-syuu@scylladb.com>
2019-12-01 18:35:03 +02:00
Nadav Har'El
b82417ba27 Merge "alternator: Implement Expected operators LE, GE, and BETWEEN"
Merged pull request https://github.com/scylladb/scylla/pull/5392 from
Dejan Mircevski.

Refs #5034

The patches:
  alternator: Implement LE operator in Expected
  alternator: Implement GE operator in Expected
  alternator: Make cmp diagnostic a value, not funct
  utils: Add operator<< for big_decimal
  alternator: Implement BETWEEN operator in Expected
2019-12-01 16:11:11 +02:00
Nadav Har'El
8614c30bcf Merge "implement echo command"
Merged pull request https://github.com/scylladb/scylla/pull/5387 from
Amos Kong:

This patch implements the echo command, which returns the string back to the client.

Reference:

    https://redis.io/commands/echo
2019-12-01 10:29:57 +02:00
Amos Kong
49fee4120e redis-test: add test_echo
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-30 13:32:00 +08:00
Amos Kong
3e2034f07b redis: implement echo command
This patch implements the echo command, which returns the string back to the client.

Reference:
- https://redis.io/commands/echo

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-30 13:30:35 +08:00
Dejan Mircevski
dcb1b360ba alternator: Implement BETWEEN operator in Expected
Enable existing BETWEEN test, and add some more coverage to it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 16:47:21 -05:00
Dejan Mircevski
c43b286f35 utils: Add operator<< for big_decimal
... and remove an existing duplicate from lua.cc.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 15:32:09 -05:00
Dejan Mircevski
e0d77739cc alternator: Make cmp diagnostic a value, not funct
All check_compare diagnostics are static strings, so there's no need
to call functions to get them.  Instead of a function, make diagnostic
a simple value.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 15:09:05 -05:00
Dejan Mircevski
65cb84150a alternator: Implement GE operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 12:29:08 -05:00
Dejan Mircevski
f201f0eaee alternator: Implement LE operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 11:59:52 -05:00
Avi Kivity
96009881d8 Update seastar submodule
* seastar 8eb6a67a4...166061da3 (3):
  > install-dependencies.sh: add diffutils
  > reactor: replace std::optional (in _network_stack_ready) with compat::optional
  > noncopyable_function: disable -Wuninitialized warning in noncopyable_function_base

Ref #5386.
2019-11-29 12:50:48 +02:00
Tomasz Grabiec
6562c60c86 Merge "test.py: terminate children upon signal" from Kostja
Allows a signal to terminate the outstanding
test tasks, to avoid dangling children.
2019-11-29 12:05:03 +02:00
Pekka Enberg
bb227cf2b4 Merge "Fix default directories in Scylla setup scripts" from Amos
"Fix two problem in scylla_io_setup:

 - Problem 1: paths of default directories are invalid, introduced by
   commit 5ec1915 ("scylla_io_setup: assume default directories under
   /var/lib/scylla").

 - Problem 2: wrong path join, introduced by commit 31ddb21
   ("dist/common/scripts: support nonroot mode on setup scripts").

Fix a problem in scylla_io_setup, scylla_fstrim and scylla_blocktune.py:

  - Fixed default scylla directories when they aren't assigned in
    scylla.yaml"

Fixes #5370

Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>

* 'scylla_io_setup' of git://github.com/amoskong/scylla:
  use parse_scylla_dirs_with_default to get scylla directories
  scylla_io_setup: fix data_file_directories check
  scylla_util: introduce helper to process the default scylla directories
  scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
  scylla_io_setup: fix path join of default scylla directories
2019-11-29 12:05:03 +02:00
Ultrabug
61f1e6e99c test.py: fix undefined variable 'options' in write_xunit_report() 2019-11-28 19:06:22 +03:00
Ultrabug
5bdc0386c4 test.py: comparison to False should be 'if cond is False:' 2019-11-28 19:06:22 +03:00
Ultrabug
737b1cff5e test.py: use isinstance() for type comparison 2019-11-28 19:06:22 +03:00
Konstantin Osipov
c611325381 test.py: terminate children upon signal
Use asyncio as a more modern way to work with concurrency:
process signals in an event loop and terminate all outstanding
tests before exiting.

Breaking change: this commit requires Python 3.7 or
newer to run this script. The patch adds a version
check and a message to enforce it.
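A minimal asyncio sketch of the approach (hypothetical names; the real logic lives in test.py): spawn the test subprocesses, register signal handlers on the event loop, and terminate every still-running child when a signal arrives, so none is left dangling.

```python
import asyncio
import signal

async def run_tests(cmds):
    # Spawn all test subprocesses.
    procs = [await asyncio.create_subprocess_exec(*cmd) for cmd in cmds]

    def terminate_all():
        # Terminate any children that are still running.
        for p in procs:
            if p.returncode is None:
                p.terminate()

    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, terminate_all)

    # Wait for every child so none is left dangling.
    await asyncio.gather(*(p.wait() for p in procs))

# asyncio.run() and get_running_loop() require Python 3.7+,
# matching the version check mentioned above.
```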
2019-11-28 19:06:22 +03:00
Botond Dénes
cf24f4fe30 imr: move documentation to docs/
Where all the other documentation is, and hence where people would be
looking for it.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191128144612.378244-1-bdenes@scylladb.com>
2019-11-28 16:47:52 +02:00
Avi Kivity
36dd0140a8 Update seastar submodule
* seastar 5c25de907a...8eb6a67a4b (1):
  > util/backtrace.hh: add missing print.hh include
2019-11-28 16:47:16 +02:00
Benny Halevy
7aef39e400 tracing: one_session_records: keep local tracing ptr
Similar to trace_state keep shared_ptr<tracing> _local_tracing_ptr
in one_session_records when constructed so it can be used
during shutdown.

Fixes #5243

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-28 15:24:10 +01:00
Gleb Natapov
75499896ab client_state: store _user as optional instead of shared_ptr
_user cannot outlive client_state class instance, so there is no point
in holding it in shared_ptr.

Tested: debug test.py and dtest auth_test.py

Message-Id: <20191128131217.26294-5-gleb@scylladb.com>
2019-11-28 15:48:59 +02:00
Gleb Natapov
1538cea043 cql: modification_statement: store _restrictions as optional instead of shared_ptr
_restrictions can be optional since its lifetime is managed by
modification_statement class explicitly.

Message-Id: <20191128131217.26294-4-gleb@scylladb.com>
2019-11-28 15:48:54 +02:00
Gleb Natapov
ce5d6d5eee storage_service: store thrift server as an optional instead of shared_ptr
Only do_stop_rpc_server uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.

Message-Id: <20191128131217.26294-3-gleb@scylladb.com>
2019-11-28 15:48:51 +02:00
Gleb Natapov
b9b99431a8 storage_service: store cql server as an optional instead of shared_ptr
Only do_stop_native_transport() uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.

Message-Id: <20191128131217.26294-2-gleb@scylladb.com>
2019-11-28 15:48:47 +02:00
Avi Kivity
2b7e97514a Update seastar submodule
* seastar 6f0ef32514...5c25de907a (7):
  > shared_future: Fix crash when all returned futures time out
Fixes #5322.
  > future: don't create temporaries on get_value().
  > reactor: lower the default stall threshold to 200ms
  > reactor: Simplify network initialization
  > reactor: Replace most std::function with noncopyable_function
  > futures: Avoid extra moves in SEASTAR_TYPE_ERASE_MORE mode
  > inet_address: Make inet_address == operator ignore scope (again)
2019-11-28 14:48:01 +02:00
Juliusz Stasiewicz
fa12394dfe reader_concurrency_semaphore: cosmetic changes
Added line breaks, replaced unused include, included seastarx.hh
instead of `using namespace seastar`.
2019-11-28 13:39:08 +01:00
Nadav Har'El
fde336a882 Merged "5139 minmax bad printing"
Merged pull request https://github.com/scylladb/scylla/pull/5311 from
Juliusz Stasiewicz:

This is a partial solution to #5139 (only for two types) because of the
above and because collections are much harder to do. They are coming in
a separate PR.
2019-11-28 14:06:43 +02:00
Juliusz Stasiewicz
3b9ebca269 tests/cql_query_test: add test for aggregates on inet+time_type
This tests the max(), min() and count() system functions on
arguments of types `net::inet_address` and `time_native_type`.
2019-11-28 11:20:43 +01:00
Juliusz Stasiewicz
9c23d89531 cql3/functions: add missing min/max/count for inet and time type
References #5139. Aggregate functions like max(), when invoked
on `inet_address' and `time_native_type', used to pick the
max(blob)->blob overload, casting both the argument and the result
to bytes. This is because the appropriate calls to
`aggregate_fcts::make_XXX_function()' were missing; this commit
adds them. Behavior remains the same, but clients now see
user-friendly representations of the aggregate result, not binary.

Comparing inet addresses without inet::operator< is done with a
trick: ADL is bypassed by wrapping std::min/std::max and providing
an overload of the wrapper for the inet type.
2019-11-28 11:18:31 +01:00
Pavel Emelyanov
8532093c61 cql: The cql_server does not need proxy reference
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191127153842.4098-1-xemul@scylladb.com>
2019-11-28 10:58:46 +01:00
Amos Kong
e2eb754d03 use parse_scylla_dirs_with_default to get scylla directories
Use default data_file_directories/commitlog_directory if it's not assigned
in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 15:48:14 +08:00
Amos Kong
bd265bda4f scylla_io_setup: fix data_file_directories check
Use default data_file_directories if it's not assigned in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 15:47:56 +08:00
Amos Kong
123c791366 scylla_util: introduce helper to process the default scylla directories
Currently we support assigning workdir from scylla.yaml, yet many setup
scripts hardcode '/var/lib/scylla'.

Some setup scripts get scylla directories by parsing scylla.yaml; introduce
parse_scylla_dirs_with_default(), which adds default values if scylla
directories aren't assigned in scylla.yaml.
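A hypothetical sketch of such a defaults-filling helper (the names and default paths here are assumptions drawn from this series, not the actual scylla_util.py code):

```python
# Assumed defaults; the commitlog path matches the one cited in this series.
DEFAULTS = {
    "data_file_directories": ["/var/lib/scylla/data"],
    "commitlog_directory": "/var/lib/scylla/commitlog",
}

def parse_scylla_dirs_with_default(yaml_cfg):
    """Return the parsed scylla.yaml config with missing or empty
    directory settings filled in from the defaults."""
    cfg = dict(yaml_cfg)
    for key, default in DEFAULTS.items():
        if not cfg.get(key):
            cfg[key] = default
    return cfg
```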

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:54:32 +08:00
Amos Kong
b75061b4bc scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:38:01 +08:00
Amos Kong
ada0e92b85 scylla_io_setup: fix path join of default scylla directories
Currently we check invalid paths for some default scylla directories;
the paths don't exist, so tuning is always skipped. This is caused by
two problems.

Problem 1: paths of default directories are invalid

Introduced by commit 5ec191536e, we try to tune some scylla default directories
if they exist. But the directory paths we try are wrong.

For example:
- What we check: /var/lib/scylla/commitlog_directory
- Correct one: /var/lib/scylla/commitlog

Problem 2: wrong path join

Introduced by commit 31ddb2145a, default_path might be replaced from
'/var/lib/scylla/' to '/var/lib/scylla'.

Our code tries to check an invalid path that is wrongly joined, e.g.:
'/var/lib/scyllacommitlog'
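The join bug can be reproduced in a couple of lines of Python; os.path.join avoids it regardless of whether the base path carries a trailing slash:

```python
import os

base_with_slash = "/var/lib/scylla/"
base_no_slash = "/var/lib/scylla"

# Naive string concatenation only works if the base has a trailing slash:
assert base_no_slash + "commitlog" == "/var/lib/scyllacommitlog"  # the bug

# os.path.join is correct either way:
assert os.path.join(base_with_slash, "commitlog") == "/var/lib/scylla/commitlog"
assert os.path.join(base_no_slash, "commitlog") == "/var/lib/scylla/commitlog"
```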

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:37:58 +08:00
Amos Kong
d4a26f2ad0 scylla_util: get_scylla_dirs: return default data/commitlog directories if they aren't set (#5358)
The default values of data_file_directories and commitlog_directory were
commented out by commit e0f40ed16a. This causes scylla_util.py:get_scylla_dirs() to
fail when checking the values.

This patch changed get_scylla_dirs() to return default data/commitlog
directories if they aren't set.

Fixes #5358 

Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-27 13:52:05 +02:00
Nadav Har'El
cb1ed5eab2 alternator-test: test Query's Limit parameter
Add a test, test_query.py::test_query_limit, to verify that the Limit
parameter correctly limits the number of rows returned by the Query.
This was supposed to already work correctly - but we never had a test for
it. As we hoped, the test passes (on both Alternator and DynamoDB).

Another test, test_query.py::test_query_limit_paging, verifies that
paging can be done with any setting of Limit. We already had tests
for paging of the Scan operation, but not for the Query operation.

Refs #5153

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-27 12:27:26 +01:00
Nadav Har'El
c01ca661a0 alternator-test: Select parameter of Query and Scan
This is a comprehensive test for the "Select" parameter of Query and Scan
operations, but only for the base-table case, not index, so another future
patch should add similar tests in test_gsi.py and test_lsi.py as well.

The main use of the Select parameter is to allow returning just the count
of items, instead of their content, but it also has other esoteric options,
all of which we test here.

The test currently succeeds on AWS DynamoDB, demonstrating that the test
is correct, but fails on Alternator because the "Select" parameter is not
yet supported. So the test is marked xfail.

Refs #5058

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-27 12:22:33 +01:00
Botond Dénes
9d09f57ba5 scylla-gdb.py: scylla_smp_queues: use lazy initialization
Currently the command tries to read all seastar smp queues in its
initialization code in the constructor. This constructor is run each
time `scylla-gdb.py` is sourced in `gdb` which leads to slowdowns and
sometimes also annoying errors because the sourcing happens in the wrong
context and seastar symbols are not available.
Avoid this by running this initializing code lazily, on the first
invocation.
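The lazy-initialization pattern can be sketched in Python (hypothetical names; the real command resolves seastar's smp queues through gdb symbols): the expensive lookup is deferred from the constructor, which runs whenever the script is sourced, to the first invocation.

```python
class ScyllaSmpQueues:
    """Sketch: defer expensive symbol lookup until first invocation,
    instead of doing it when the script is sourced."""

    def __init__(self):
        self._queues = None  # nothing resolved at source time

    def _init_queues(self):
        # Stand-in for reading seastar's smp queues via gdb symbols.
        return ["queue-{}".format(i) for i in range(4)]

    def invoke(self):
        if self._queues is None:  # resolved lazily, once
            self._queues = self._init_queues()
        return self._queues
```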

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191127095408.112101-1-bdenes@scylladb.com>
2019-11-27 12:04:57 +01:00
Tomasz Grabiec
87b72dad3e Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov
This patchset adds missing "const" function qualifiers throughout
the Scylla code base, which would make code less error-prone.

The changeset incorporates Kostja's work regarding const qualifiers
in the cql code hierarchy along with a follow-up patch addressing the
review comment of the corresponding patch set (the patch subject is
"cql: propagate const property through prepared statement tree.").
2019-11-27 10:56:20 +01:00
Rafael Ávila de Espíndola
91b43f1f06 dbuild: fix podman with selinux enabled
With this change I am able to run tests using docker-podman. The
option also exists in docker.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126194101.25221-1-espindola@scylladb.com>
2019-11-26 21:50:56 +02:00
Rafael Ávila de Espíndola
480055d3b5 dbuild: Fix missing docker options
With the recent changes docker was missing a few options. In
particular, it was missing -u.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126194347.25699-1-espindola@scylladb.com>
2019-11-26 21:45:31 +02:00
Rafael Ávila de Espíndola
c0a2cd70ff lua: fix test with boost 1.66
The boost 1.67 release notes say

Changed maximum supported year from 10000 to 9999 to resolve various issues

So change the test to use a larger number so that we get an exception
with both boost 1.66 and boost 1.67.

Fixes #5344

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126180327.93545-1-espindola@scylladb.com>
2019-11-26 21:17:15 +02:00
Pavel Solodovnikov
55a1d46133 cql: some more missing const qualifiers
There are several virtual functions in public interfaces named "is_*"
that clearly should be marked as "const", so fix that.
2019-11-26 17:57:51 +03:00
Pavel Solodovnikov
412f1f946a cql: remove "mutable" on _opts in select_statement
_opts initialization can be safely done in the constructor, hence no need to make it mutable.
2019-11-26 17:55:10 +03:00
Piotr Sarna
d90dbd6ab0 Merge "support podman as a replacement to docker" from Avi
Docker on Fedora 31 is flakey, and is not supported at all on RHEL 8.
Podman is a drop-in replacement for docker; this series adds support
for using podman in dbuild.

Apart from actually working on Fedora 31 hosts,
podman is nicer in being more secure and not requiring a daemon.

Fixes #5332
2019-11-26 15:17:49 +01:00
Tomasz Grabiec
5c9fe83615 Merge "Sanitize sub-modules shutting down" from Pavel
As suggested in issue #4586, here is a helper that prints a
"shutting down foo" message, then shuts foo down, then
prints the "[it] was successful" one. In between it catches
the exception (if any) and logs a warning.

By "then" I mean literally then, not the seastar's then() :)

Fixes: #4586
2019-11-26 15:14:22 +02:00
Piotr Sarna
9c5a5a5ac2 treewide: add names to semaphores
By default, semaphore exceptions bring along very little context:
either that a semaphore was broken or that it timed out.
In order to make debugging easier without introducing significant
runtime costs, a notion of named semaphore is added.
A named semaphore is simply a semaphore with statically defined
name, which is present in its errors, bringing valuable context.
A semaphore defined as:

  auto sem = semaphore(0);

will present the following message when it breaks:
"Semaphore broken"
However, a named semaphore:

  auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"});

will present a message with at least some debugging context:

  "Semaphore broken: io_concurrency_sem"

It's not much, but it would really help in pinpointing bugs
without having to inspect core dumps.

At the same time, it does not incur any costs for normal
semaphore operations (except for its creation), but instead
only uses more CPU in case an error is actually thrown,
which is considered rare and not to be on the hot path.

Refs #4999

Tests: unit(dev), manual: hardcoding a failure in view building code
2019-11-26 15:14:21 +02:00
Avi Kivity
6fbb724140 conf: remove unsupported options from scylla.yaml (#5299)
These unsupported options do nothing except to confuse users who
try to tune them.

Options removed:

hinted_handoff_throttle_in_kb
max_hints_delivery_threads
batchlog_replay_throttle_in_kb
key_cache_size_in_mb
key_cache_save_period
key_cache_keys_to_save
row_cache_size_in_mb
row_cache_save_period
row_cache_keys_to_save
counter_cache_size_in_mb
counter_cache_save_period
counter_cache_keys_to_save
memory_allocator
saved_caches_directory
concurrent_reads
concurrent_writes
concurrent_counter_writes
file_cache_size_in_mb
index_summary_capacity_in_mb
index_summary_resize_interval_in_minutes
trickle_fsync
trickle_fsync_interval_in_kb
internode_authenticator
native_transport_max_threads
native_transport_max_concurrent_connections
native_transport_max_concurrent_connections_per_ip
rpc_server_type
rpc_min_threads
rpc_max_threads
rpc_send_buff_size_in_bytes
rpc_recv_buff_size_in_bytes
internode_send_buff_size_in_bytes
internode_recv_buff_size_in_bytes
thrift_framed_transport_size_in_mb
concurrent_compactors
compaction_throughput_mb_per_sec
sstable_preemptive_open_interval_in_mb
inter_dc_stream_throughput_outbound_megabits_per_sec
cross_node_timeout
streaming_socket_timeout_in_ms
dynamic_snitch_update_interval_in_ms
dynamic_snitch_reset_interval_in_ms
dynamic_snitch_badness_threshold
request_scheduler
request_scheduler_options
throttle_limit
default_weight
weights
request_scheduler_id
2019-11-26 15:14:21 +02:00
Amos Kong
817f34d1a9 ami: support new aws instance types: c5d, m5d, m5ad, r5d, z1d (#5330)
Currently scylla_io_setup is skipped in scylla_setup, because we didn't
support those new instance types.

I manually executed scylla_io_setup, and the scylla-server started and worked
well.

Let's apply this patch first, then check if there is some new problem in
ami-test.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-26 15:14:21 +02:00
Konstantin Osipov
90346236ac cql: propagate const property through prepared statement tree.
cql_statement is a class representing a prepared statement in Scylla.
It is used concurrently during execution, so it is important that its
state is not changed by execution.

Add const qualifier to the execution methods family, throughout the
cql hierarchy.

Mark a few places which do mutate prepared statement state during
execution as mutable. While these do not affect production today, as
the code ages they may become a source of latent bugs; eventually they
should be moved out of the prepared state or evaluated at prepare time:

cf_property_defs::_compaction_strategy_class
list_permissions_statement::_resource
permission_altering_statement::_resource
property_definitions::_properties
select_statement::_opts
2019-11-26 14:18:17 +03:00
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base 2019-11-26 02:24:49 +03:00
Pavel Emelyanov
50a1ededde main: Remove now unused defer-with-log helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
a0f92d40ee main: Shut down sighup handler with verbose helper
And (!) fix the misprinted variable name.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
0719369d83 repair: Remove extra logging on shutdown
The shutdown start/finish messages are already printed in verbose_shutdown()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
2d64fc3a3e main: Shut down database with verbose_shutdown helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
636c300db5 main: Shut down prometheus with verbose_shutdown()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

---

v2:
- Register stop earlier so that exceptions in start/listen do
  not prevent prometheus.stop() from being called
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
804b152527 main: Sanitize shutting down callbacks
As suggested in issue #4586, here is a helper that prints
the "shutting down foo" message, then shuts foo down, then
prints the "shutting down foo was successful" message. In between
it catches the exception (if any) and warns about it in the logs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:45:49 +03:00
Nadav Har'El
4160b3630d Merge "Return preimage from CDC only when it's enabled"
Merged pull request https://github.com/scylladb/scylla/pull/5218
from Piotr Jastrzębski:

Users should be able to decide whether they need preimage or not. There is
already an option for that but it's not respected by the implementation.
This PR adds support for this functionality.

Tests: unit(dev).

Individual patches:
  cdc: Don't take storage_proxy as transformer::pre_image_select param
  cdc::append_log_mutations: use do_with instead of shared_ptr
  cdc::append_log_mutations: fix undefined behavior
  cdc: enable preimage in test_pre_image_logging test
  cdc: Return preimage only when it's requested
  cdc: test both enabled and disabled preimage in test_pre_image_logging
2019-11-25 14:32:17 +02:00
Pavel Emelyanov
f6ac969f1e mm: Stop migration manager
Before stopping the db itself, stop the migration service.
It must be stopped before RPC, but RPC is not stopped yet
itself, so we should be safe here.

Here's the tail of the resulting logs:

INFO  2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager
INFO  2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service
INFO  2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service
INFO  2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started
INFO  2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished
INFO  2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server
INFO  2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete.

Also -- on drain, stop the mm before the commitlog is stopped.
[Tomasz: mm needs the cl because pulling schema changes from other nodes
involves applying them into the database. So cl/db needs to be
stopped after mm is stopped.]

The drain logs would look like

...
INFO  2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service
INFO  2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service
INFO  2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED:

and then on stop

...
INFO  2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager
INFO  2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started
INFO  2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished
INFO  2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server
INFO  2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete.

Fixes #5300

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191125080605.7661-1-xemul@scylladb.com>
2019-11-25 12:59:01 +01:00
Asias He
6ec602ff2c repair: Fix rx_hashes_nr metrics (#5213)
In get_full_row_hashes_with_rpc_stream and
repair_get_row_diff_with_rpc_stream_process_op, which were introduced in
the "Repair switch to rpc stream" series, the rx_hashes_nr metrics are not
updated correctly.

In the test we have 3 nodes and run repair on node3; we make sure the
following metrics are correct.

assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'],
   	    node3_metrics['scylla_repair_rx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'],
   	    node3_metrics['scylla_repair_tx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'],
   	    node3_metrics['scylla_repair_rx_row_nr'])
assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'],
   	    node3_metrics['scylla_repair_tx_row_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'],
   	    node3_metrics['scylla_repair_rx_row_bytes'])
assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'],
            node3_metrics['scylla_repair_tx_row_bytes'])

Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test
Fixes: #5339
Backports: 3.2
2019-11-25 13:57:37 +02:00
Piotr Jastrzebski
2999cb5576 cdc: test both enabled and disabled preimage in test_pre_image_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
222b94c707 cdc: Return preimage only when it's requested
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
c94a5947b7 cdc: enable preimage in test_pre_image_logging test
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
595c9f9d32 cdc::append_log_mutations: fix undefined behavior
The code was iterating over a collection that was modified
at the same time. Iterators were used for that and collection
modification can invalidate all iterators.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
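The bug class fixed in the commit above — iterating over a collection while modifying it, which invalidates C++ iterators — can be sketched in Python (a hypothetical illustration, not the actual C++ code from the patch):

```python
# Sketch of the bug class fixed above: modifying a collection while
# iterating over it. In C++ this invalidates iterators (undefined
# behavior); Python detects it and raises RuntimeError.
def buggy_append(mutations):
    for key in mutations:            # iterating the dict directly...
        mutations["log_" + key] = 1  # ...while inserting into it -> error

def fixed_append(mutations):
    for key in list(mutations):      # iterate over a stable snapshot
        mutations["log_" + key] = 1  # safe: the snapshot is unaffected

m = {"a": 0, "b": 0}
try:
    buggy_append(dict(m))
    raised = False
except RuntimeError:
    raised = True

fixed = dict(m)
fixed_append(fixed)
assert raised
assert set(fixed) == {"a", "b", "log_a", "log_b"}
```

The fix in either language is the same idea: take a snapshot (or an index-based walk) so in-flight insertions cannot invalidate the traversal.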
Piotr Jastrzebski
f0f44f9c51 cdc::append_log_mutations: use do_with instead of shared_ptr
This will not only save some allocations but also improve
code readability.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
b8d9158c21 cdc: Don't take storage_proxy as transformer::pre_image_select param
transformer has access to storage_proxy through its _ctx field.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Nadav Har'El
3eab6cd549 Merged "toolchain: update to Fedora 31"
Merged pull request https://github.com/scylladb/scylla/pull/5310 from
Avi Kivity:

This is a minor update as gcc and boost versions did not change. A notable
update is patchelf 0.10, which adds support for large binaries.

A few minor issues exposed by the update are fixed in preparatory patches.

Patches:
  dist: rpm: correct systemd post-uninstall scriptlet
  build: force xz compression on rpm binary payload
  tools: toolchain: update to Fedora 31
2019-11-24 13:38:45 +02:00
Tomasz Grabiec
e3d025d014 row_cache: Fix abort on bad_alloc during cache update
Since 90d6c0b, cache will abort when trying to detach partition
entries while they're updated. This should never happen. It can happen
though, when the update fails on bad_alloc, because the cleanup guard
invalidates the cache before it releases partition snapshots (held by
"update" coroutine).

Fix by destroying the coroutine first.

Fixes #5327.

Tests:
  - row_cache_test (dev)

Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
2019-11-24 12:06:51 +02:00
Rafael Ávila de Espíndola
8599f8205b rpmbuild: don't use dwz
By default rpm uses dwz to merge the debug info from various
binaries. Unfortunately, it looks like addr2line has not been updated
to handle this:

// This works
$ addr2line  -e build/release/scylla 0x1234567

$ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug

// now this fails
$ addr2line -e build/release/scylla 0x1234567

I think the issue is

https://sourceware.org/bugzilla/show_bug.cgi?id=23652

Fixes #5289

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123015734.89331-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
25d5d39b3c reloc: Force using sha1 for build-ids
The default build-id used by lld is xxhash, which is 8 bytes long. rpm
requires build-ids to be at least 16 bytes long
(https://github.com/rpm-software-management/rpm/issues/950). We force
using sha1 for now. That has no impact in gold and bfd since that is
their default. We set it in here instead of configure.py to not slow
down regular builds.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123020801.89750-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
b5667b9c31 build: don't compress debug info in executables
By default we were compressing debug info only in release
executables. The idea, if I understand it correctly, is that those are
the ones we ship, so we want a more compact binary.

I don't think that was doing anything useful. The compression is just
gzip, so when we ship a .tar.xz, having the debug info compressed
inside the scylla binary probably reduces the overall compression a
bit.

When building an rpm the situation is amusing. As part of the rpm
build process the debug info is decompressed and extracted to an
external file.

Given that most of the link time goes to compressing debug info, it is
probably a good idea to just skip that.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123022825.102837-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Tomasz Grabiec
d84859475e Merge "Refactor test.py and cleanup resources" from Kostja
Structure the code to be able to introduce futures.
Apply trivial cleanups.
Switch to asyncio and use it to work with processes and
handle signals. Cleanup all processes upon signal.
2019-11-24 11:35:29 +02:00
Tomasz Grabiec
e166fdfa26 Merge "Optimize LWT query phase" from Vladimir Davydov
This patch implements a simple optimization for LWT: it makes PAXOS
prepare phase query locally and return the current value of the modified
key so that a separate query is not necessary. For more details see
patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial
preparatory refactoring.
2019-11-24 11:35:29 +02:00
Pavel Solodovnikov
4879db70a6 system_keyspace: support timeouts in queries to system.paxos table.
Also introduce supplementary `execute_cql_with_timeout` function.

Remove redundant comment for `execute_cql`.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191121214148.57921-1-pa.solodovnikov@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
bf5f864d80 paxos: piggyback result query on prepare response
Current LWT implementation uses at least three network round trips:
 - first, execute PAXOS prepare phase
 - second, query the current value of the updated key
 - third, propose the change to participating replicas

(there's also learn phase, but we don't wait for it to complete).

The idea behind the optimization implemented by this patch is simple:
piggyback the current value of the updated key on the prepare response
to eliminate one round trip.

To generate less network traffic, only the closest to the coordinator
replica sends data while other participating replicas send digests which
are used to check data consistency.

Note, this patch changes the API of some RPC calls used by PAXOS, but
this should be okay as long as the feature in the early development
stage and marked experimental.

To assess the impact of this optimization on LWT performance, I ran a
simple benchmark that starts a number of concurrent clients each of
which updates its own key (uncontended case) stored in a cluster of
three AWS i3.2xlarge nodes located in the same region (us-west-1) and
measures the aggregate bandwidth and latency. The test uses shard-aware
gocql driver. Here are the results:

                latency 99% (ms)    bandwidth (rq/s)    timeouts (rq/s)
    clients     before  after       before  after       before  after
          1          2      2          626    637            0      0
          5          4      3         2616   2843            0      0
         10          3      3         4493   4767            0      0
         50          7      7        10567  10833            0      0
        100         15     15        12265  12934            0      0
        200         48     30        13593  14317            0      0
        400        185     60        14796  15549            0      0
        600        290     94        14416  15669            0      0
        800        568    118        14077  15820            2      0
       1000        710    118        13088  15830            9      0
       2000       1388    232        13342  15658           85      0
       3000       1110    363        13282  15422          233      0
       4000       1735    454        13387  15385          329      0

That is, this optimization improves max LWT bandwidth by about 15%
and allows running 3-4x more clients while maintaining the same level
of system responsiveness.
2019-11-24 11:35:29 +02:00
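The data-vs-digest scheme described above can be sketched abstractly (a heavily simplified, hypothetical model — not Scylla's PAXOS code): only the replica closest to the coordinator ships the value with its prepare response, the rest ship digests, and the coordinator checks them for consistency.

```python
import hashlib

def digest(value):
    # Stand-in for the digest replicas compute over the row data.
    return hashlib.sha256(value.encode()).hexdigest()

def prepare_responses(replicas, closest):
    """Each replica answers prepare; only the closest ships the value,
    the rest ship digests, so the extra read round trip is eliminated."""
    out = []
    for name, value in replicas.items():
        if name == closest:
            out.append(("data", value))
        else:
            out.append(("digest", digest(value)))
    return out

def value_consistent(responses):
    # Coordinator side: verify every digest matches the shipped value.
    data = next(v for kind, v in responses if kind == "data")
    return all(d == digest(data) for kind, d in responses if kind == "digest")

replicas = {"r1": "v1", "r2": "v1", "r3": "v1"}
assert value_consistent(prepare_responses(replicas, "r1"))
replicas["r3"] = "stale"    # a diverged replica is detected by its digest
assert not value_consistent(prepare_responses(replicas, "r1"))
```

On a digest mismatch a real implementation would fall back to a full data read; the sketch only shows how the piggybacked value plus digests replace the separate query round trip.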
Rafael Ávila de Espíndola
6160b9017d commitlog: make sure a file is closed
If allocate or truncate throws, we have to close the file.

Fixes #4877

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191114174810.49004-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
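The failure path fixed above can be sketched as follows (hypothetical names, not the actual commitlog code): once a file is open, any throwing setup step such as allocate or truncate must close it on the error path too.

```python
class FakeFile:
    """Stand-in for an open segment file whose truncate fails."""
    def __init__(self):
        self.closed = False
    def truncate(self, size):
        raise OSError("simulated allocation failure")
    def close(self):
        self.closed = True

def create_segment(f, size):
    # The bug was leaking the open file when allocate/truncate threw;
    # the fix is to close it on the exception path before re-raising.
    try:
        f.truncate(size)
    except Exception:
        f.close()
        raise

f = FakeFile()
try:
    create_segment(f, 32 << 20)
except OSError:
    pass
assert f.closed
```

In C++ the same guarantee is usually obtained with an RAII guard rather than an explicit catch-close-rethrow.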
Vladimir Davydov
3d1d4b018f paxos: remove unnecessary move constructor invocations
invoke_on() guarantees that captured objects won't be destroyed until the
future returned by the invoked function is resolved, so there's no need
to move key, token, and proposal when calling the paxos_state::*_impl helpers.
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
cfb079b2c9 types: Refactor duplicated value_cast implementation
The two implementations of value_cast were almost identical.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-3-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
ef2e96c47c storage_proxy: factor out helper to sort endpoints by proximity
We need it for PAXOS.
2019-11-24 11:35:29 +02:00
Nadav Har'El
854e6c8d7b alternator-test: test_health_only_works_for_root_path: remove wrong check
The test_health_only_works_for_root_path test checks that while Alternator's
HTTP server responds to a "GET /" request with success ("health check"), it
should respond to different URLs with failures (page not found).

One of the URLs it tested was "/..", but unfortunately some versions of
Python's HTTP client canonicalize this request to just a "/", causing the
request to unexpectedly succeed - and the test to fail.

So this patch just drops the "/.." check. A few other nonsense URLs are
attempted by the test - e.g., "/abc".

Fixes #5321

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
63d4590336 storage_proxy: move digest_algorithm upper
We need it for PAXOS.

Mark it as static inline while we are at it.
2019-11-24 11:35:29 +02:00
Nadav Har'El
43d3e8adaf alternator: make DescribeTable return table schema
One of the fields still missing in DescribeTable's response (Refs #5026)
was the table's schema - KeySchema and AttributeDefinitions.

This patch adds this missing feature, and enables the previously-xfailing
test test_describe_table_schema.

A complication of this patch is that in a table with secondary indexes,
we need to return not just the base table's schema, but also the indexes'
schema. The existing tests did not cover that feature, so we add here
two more tests in test_gsi.py for that.

One of these secondary-index schema tests, test_gsi_2_describe_table_schema,
still fails, because it outputs a range-key which Scylla added to a view
because of its own implementation needs, but wasn't in the user's
definition of the GSI. I opened a separate issue #5320 for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
f5c2a23118 serializer: add reference_wrapper handling
Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats
reference_wrapper<T> wrapped in std::optional<> or std::variant<>, or
std::tuple<> as T.

We need it to avoid copying query::result while serializing
paxos::promise.
2019-11-24 11:35:29 +02:00
Botond Dénes
89f9b89a89 scylla-gdb.py: scylla task_histogram: scan all tasks with -a or -s 0
Currently even if `-a` or `-s 0` is provided, `scylla task_histogram`
will scan a limited amount of pages due to a bug in the scan loop's stop
condition, which will be trigger a stop once the default sample limit is
reached. Fix the loop by skipping this check when the user wants to scan
all tasks.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>
2019-11-24 11:35:29 +02:00
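The stop-condition fix can be sketched like this (an illustrative model, not the gdb script itself): the limit check must be skipped when the user asked for everything.

```python
def scan(tasks, limit):
    """limit=0 means scan everything, as with -a / -s 0."""
    found = []
    for t in tasks:
        # The buggy loop stopped unconditionally at the default sample
        # limit; the fix skips the check when no limit was requested.
        if limit and len(found) >= limit:
            break
        found.append(t)
    return found

assert len(scan(range(100), 10)) == 10   # default: stop at the limit
assert len(scan(range(100), 0)) == 100   # -s 0 / -a: no early stop
```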
Vladimir Davydov
1452653fbc query_context: fix use after free of timeout_config in execute_cql_with_timeout
timeout_config is used by reference by cql3::query_processor::process(),
see cql3::query_options, so the caller must make sure it doesn't go away.
2019-11-24 11:35:29 +02:00
Avi Kivity
ff7e78330c tools: toolchain: dbuild: work around "podman logs --follow" hang
At least some versions of 'podman logs --follow' hang when the
container eventually exits (also happens with docker on recent
versions). Fortunately, we don't need to use 'podman logs --follow'
and can use the more natural non-detached 'podman run', because
podman does not proxy SIGTERM and instead shuts down the container
when it receives it.

So, to work around the problem, use the same code path in interactive
and non-interactive runs, when podman is in use instead of docker.
2019-11-22 13:59:05 +02:00
Avi Kivity
702834d0e4 tools: dbuild: avoid uid/gid/selinux hacks when using podman
With docker, we went to considerable lengths to ensure that
access to mounted volume was done using the calling user, including
supplementary groups. This avoids root-owned files being left around
after a build, and ensures that access to group-shared files (like
/var/cache/ccache) works as expected.

All of this is unnecessary and broken when using podman. Podman
uses a proxy to access files on behalf of the container, so naturally
all access is done using the calling user's identity. Since it remaps
user and group IDs, assigning the host uid/gid is meaningless. Using
--userns host also breaks, because sudo no longer works.

Fix this by making all the uid/gid/selinux games specific to docker and
ignore them when using podman. To preserve the functionality of tools
that depend on $HOME, set that according to the host setting.
2019-11-22 13:58:29 +02:00
Tomasz Grabiec
9d7f8f18ab database: Avoid OOMing with flush continuations after failed memtable flush
The original fix (10f6b125c8) didn't
take into account the case where there was a failed memtable flush (Refs
flush) but the memtable is not flushable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable either, because otherwise it would have been picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.

Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.

Fixes #3717
2019-11-22 12:08:36 +01:00
Tomasz Grabiec
fb28543116 lsa: Introduce operator bool() to occupancy_stats 2019-11-22 12:08:28 +01:00
Tomasz Grabiec
a69fda819c lsa: Expose region_impl::evictable_occupancy in the region class 2019-11-22 12:08:10 +01:00
Avi Kivity
1c181c1b85 tools: dbuild: don't mount duplicate volumes
podman refuses to start with duplicate volumes, which routinely
happen if the toplevel directory is the working directory. Detect
this and avoid the duplicate.
2019-11-22 10:13:30 +02:00
Konstantin Osipov
b8b5834cf1 test.py: simplify message output in run_test() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
90a8f79d7e test.py: use UnitTest class where possible 2019-11-21 23:16:22 +03:00
Konstantin Osipov
8cd8cfc307 test.py: rename harness command line arguments to 'options'
The UnitTest class juggles with the name 'args' quite a bit to
construct the command line for a unit test, so let's keep the harness
command line arguments and the unit test command line arguments a bit
apart by consistently calling the former 'options' and the latter 'args'.

Rename usage() to parse_cmd_line().
2019-11-21 23:16:22 +03:00
Konstantin Osipov
e5d624d055 test.py: consolidate argument handling in UnitTest constructor
Create unique UnitTest objects in find_tests() for each found match,
including repeat, to ensure each test has its own unique id.
This will also be used to store execution state in the test.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
dd60673cef test.py: move --collectd to standard args 2019-11-21 23:16:22 +03:00
Konstantin Osipov
fe12f73d7f test.py: introduce class UnitTest 2019-11-21 23:16:22 +03:00
Konstantin Osipov
bbcdee37f7 test.py: add add_test_list() to find_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
4723afa09c test.py: add long tests with add_test() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
13f1e2abc6 test.py: store the non-default seastar arguments along with definition 2019-11-21 23:16:22 +03:00
Konstantin Osipov
72ef11eb79 test.py: introduce add_test() to find_tests()
To avoid code duplication, and to build upon later.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
b50b24a8a7 test.py: avoid an unnecessary loop in find_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
a5103d0092 test.py: move args.repeat processing to find_tests()
It somewhat stands in the way of using asyncio

This patch also implements a more comprehensive
fix for #5303, since we not only have --repeat, but
run some tests in different configurations, in which
case xml output is also overwritten.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
0f0a49b811 test.py: introduce print_summary() and write_xunit_report()
(One more moving of the code around).
2019-11-21 23:16:22 +03:00
Konstantin Osipov
22166771ef test.py: rename test_to_run tests_to_run 2019-11-21 23:16:22 +03:00
Konstantin Osipov
1d94d9827e test.py: introduce run_all_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
29087e1349 test.py: move out run_test() routine
(Trivial code refactoring.)
2019-11-21 23:16:22 +03:00
Konstantin Osipov
79506fc5ab test.py: introduce find_tests()
Trivial code refactoring.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
a44a1c4124 test.py: remove print_status_succint
(Trivial code cleanup.)
2019-11-21 23:16:22 +03:00
Konstantin Osipov
b9605c1d37 test.py: move mode list evaluation to usage() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
0c4df5a548 test.py: add usage() 2019-11-21 23:16:22 +03:00
Pavel Emelyanov
e0f40ed16a cli: Add the --workdir|-W option
When starting scylla daemon as non-root the initialization fails
because standard /var/lib/scylla is not accessible by regular users.
Making the default dir accessible for user is not very convenient
either, as it will cause conflicts if two or more instances of scylla
are in use.

This problem can be resolved by specifying --commitlog-directory,
--data-file-directories, etc on start, but it's too much typing. I
propose to revive Nadav's --home option that allows to move all the
directories under the same prefix in one go.

Unlike Nadav's approach the --workdir option doesn't do any tricky
manipulations with existing directories. Instead, as Pekka suggested,
the individual directories are placed under the workdir if and only
if the respective option is NOT provided. Otherwise the directory
configuration is taken as is, regardless of whether its path is
absolute or relative.

The values substitution is done early on start. Avi suggested that
this is unsafe wrt HUP config re-read and proper paths must be
resolved on the fly, but this patch doesn't address that yet, here's
why.

First of all, the respective options are MustRestart now and the
substitution is done before HUP handler is installed.

Next, commitlog and data_file values are copied on start, so marking
the options as LiveUpdate won't have any effect.

Finally, the existing named_value::operator() returns a reference,
so returning a calculated (and thus temporary) value is not possible
(from my current understanding, correct me if I'm wrong). Thus if we
want the *_directory() to return calculated value all callers of them
must be patched to call something different (e.g. *_directory.get() ?)
which will lead to more confusion and errors.

Changes v3:
 - the option is --workdir back again
 - the existing *directory are only affected if unset
 - default config doesn't have any of these set
 - added the short -W alias

Changes v2:
 - the option is --home now
 - all other paths are changed to be relative

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191119130059.18066-1-xemul@scylladb.com>
2019-11-21 15:07:39 +02:00
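The substitution rule described in the message above — each directory lands under the workdir if and only if its option was not set explicitly — can be sketched as (hypothetical helper and option names, not the actual config code):

```python
import os.path

def resolve_dirs(workdir, overrides):
    """Place each directory under workdir iff its option was NOT provided."""
    defaults = {"data_file_directories": "data",
                "commitlog_directory": "commitlog",
                "hints_directory": "hints"}
    resolved = {}
    for opt, sub in defaults.items():
        if opt in overrides:
            # Explicit setting wins, absolute or relative, taken as is.
            resolved[opt] = overrides[opt]
        else:
            resolved[opt] = os.path.join(workdir, sub)
    return resolved

r = resolve_dirs("/tmp/scylla", {"commitlog_directory": "/mnt/cl"})
assert r["commitlog_directory"] == "/mnt/cl"            # untouched
assert r["data_file_directories"] == "/tmp/scylla/data" # workdir default
```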
Rafael Ávila de Espíndola
5417c5356b types: Move get_castas_fctn to cql3
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-9-espindola@scylladb.com>
2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola
f06d6df4df types: Simplify casts to string
These now just use the to_string member functions, which makes it
possible to move the code to another file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-8-espindola@scylladb.com>
2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola
786b1ec364 types: Move json code to its own file
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-7-espindola@scylladb.com>
2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola
af8e207491 types: Avoid using deserialize_value in json code
This makes it independent of internal functions and makes it possible
to move it to another file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-6-espindola@scylladb.com>
2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola
ed65e2c848 types: Move cql3_kind to the cql3 directory
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-5-espindola@scylladb.com>
2019-11-21 12:08:47 +02:00
Rafael Ávila de Espíndola
bd560e5520 types: Fix dynamic types of some data_value objects
I found these mismatched types while converting some member functions
to standalone functions, since they have to use the public API that
has more type checks.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-4-espindola@scylladb.com>
2019-11-21 12:08:46 +02:00
Rafael Ávila de Espíndola
0d953d8a35 types: Add a test for value_cast
We had no tests on when value_cast throws or when it moves the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-2-espindola@scylladb.com>
2019-11-21 12:08:45 +02:00
Konstantin Osipov
002ff51053 lua: make sure the latest master builds on Debian/Ubuntu
Use pkg-config to search for Lua dependencies rather
than hard-code include and link paths.

Avoid using boost internals, not present in earlier
versions of boost.

Reviewed-by: Rafael Avila de Espindola <espindola@scylladb.com>
Message-Id: <20191120170005.49649-1-kostja@scylladb.com>
2019-11-21 07:57:12 +02:00
Pavel Solodovnikov
d910899d61 configure.py: support multi-threaded linking via gold
Use `-Wl,--threads` flag to enable multi-threaded linking when
using `ld.gold` linker.

Additional compilation test is required because it depends on whether
or not the `gold` linker has been compiled with `--enable-threads` option.

This patch introduces a substantial improvement to the link times of
`scylla` binary in release and debug modes (around 30 percent).

Local setup reports the following numbers with release build for
linking only build/release/scylla:

Single-threaded mode:
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.30
Multi-threaded mode:
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.57

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191120163922.21462-1-pa.solodovnikov@scylladb.com>
2019-11-20 19:28:00 +02:00
Nadav Har'El
89d6d668cb Merge "Redis API in Scylla"
Merged patch series from Peng Jian, adding optionally-enabled Redis API
support to Scylla. This feature is experimental, and partial - the extent
of this support is detailed in docs/redis/redis.md.

Patches:
   Document: add docs/redis/redis.md
   redis: Redis API in Scylla
   Redis API: graft redis module to Scylla
   redis-test: add test cases for Redis API
2019-11-20 16:59:13 +02:00
Piotr Sarna
086e744f8f scripts/find-maintainer: refresh maintainers list
This commit attempts to make the maintainers list up-to-date
to the best of my knowledge, because it got really stale over time.

Message-Id: <eab6d3f481712907eb83e91ed2b8dbfa0872155f.1574261533.git.sarna@scylladb.com>
2019-11-20 16:56:31 +02:00
Glauber Costa
73aff1fc95 api: export system uptime via REST
This will be useful for tools like nodetool that want to query the uptime
of the system.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190619110850.14206-1-glauber@scylladb.com>
2019-11-20 16:44:11 +02:00
Tomasz Grabiec
9a686ac551 Merge "scylla-gdb: active sstables: support k_l/mc sstable readers" from Benny
Fixes #5277
2019-11-19 23:49:39 +01:00
Avi Kivity
1164ff5329 tools: toolchain: update to Fedora 31
This is a minor update as gcc and boost versions do not change.

glibc-langpack-en no longer gets pulled in by default. As it is required
by some locale use somewhere, it is added to the explicit dependencies.
2019-11-20 00:08:30 +02:00
Avi Kivity
301c835cbf build: force xz compression on rpm binary payload
Fedora 31 switched the default compression to zstd, which isn't readable
by some older rpm distributions (CentOS 7 in particular). Tell it to use
the older xz compression instead, so packages produced on Fedora 31 can
be installed on older distributions.
2019-11-20 00:08:24 +02:00
Avi Kivity
3ebd68ef8a dist: rpm: correct systemd post-uninstall scriptlet
The post-uninstall scriptlet requires a parameter, but older versions
of rpm survived without it. Fedora 31's rpm is more strict, so supply
this parameter.
2019-11-20 00:03:49 +02:00
Peng Jian
e6adddd8ef redis-test: add test cases for Redis API
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-20 04:56:16 +08:00
Peng Jian
f2801feb66 Redis API: graft redis module to Scylla
In this document, the detailed design and implementation of Redis API in
Scylla is provided.

v2: build: work around ragel 7 generated code bug (suggested by Avi)
    Ragel 7 incorrectly emits some unused variables that don't compile.
    As a workaround, sed them away.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-20 04:55:58 +08:00
Peng Jian
0737d9e84d redis: Redis API in Scylla
Scylla has advantages and amazing features. If Redis is built on top of Scylla,
it gets the above features automatically: great progress has been achieved
in cluster master management, data persistence, failover and replication.

The benefit to users is that it is easy to use and develop against in their
production environment, while taking advantage of Scylla.

Using Ragel to parse the Redis request, the server obtains the command name
and the parameters from the request, invokes Scylla's internal API to
read and write the data, then replies to the client.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-11-20 04:55:56 +08:00
Peng Jian
708a42c284 Document: add docs/redis/redis.md
In this document, the detailed design and implementation of Redis API in
Scylla is provided.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-11-20 04:46:33 +08:00
Nadav Har'El
9b9609c65b merge: row_marker: correct row expiry condition
Merged patch set by Piotr Dulikowski:

This change corrects condition on which a row was considered expired by its
TTL.

The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.

Fixes #4263,
Fixes #5290.

Tests:

    unit(dev)
    python test described in issue #5290
2019-11-19 18:14:15 +02:00
Amnon Heiman
9df10e2d4b scylla_util.py: Add optional timeout to out function
It is useful to have an option to limit the execution time of a shell
script.

This patch adds an optional timeout parameter; if it is provided, the
command returns a failure when the duration is exceeded.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-11-19 17:30:28 +02:00
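The behavior described above can be sketched in Python (a hypothetical stand-in for the actual `out()` helper in scylla_util.py, shown only to illustrate the timeout semantics):

```python
import subprocess

def out(cmd, timeout=None):
    # Run a shell command and return its stripped output. If `timeout`
    # (in seconds) is given and the command runs longer than that,
    # subprocess.TimeoutExpired is raised instead of blocking forever.
    return subprocess.check_output(cmd, shell=True, timeout=timeout).decode().strip()
```

With a timeout set, a hung command raises `subprocess.TimeoutExpired` rather than stalling the setup script indefinitely.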
Nadav Har'El
b38c3f1288 Merge "Add separate counters for accesses to system tables"
Merged patch series from Juliusz Stasiewicz:

Welcome to my first PR to Scylla!
The task was intended as a warm-up ("noob") exercise; its description is
here: #4182 Sorry, I also couldn't help it and did some scouting: edited
descriptions of some metrics and shortened a few annoyingly long lines of code.
2019-11-19 15:21:56 +02:00
Piotr Dulikowski
9be842d3d8 row_marker: tests for row expiration 2019-11-19 13:45:30 +01:00
Tomasz Grabiec
5e4abd75cc main: Abort on EBADF and ENOTSOCK by default
Those are typically symptoms of use-after-free or memory corruption in
the program. It's better to catch such errors sooner rather than later.

That situation is also dangerous: if a valid descriptor lands under the
invalid access instead of the one intended for the operation, the
operation may be performed on the wrong file and result in corruption.

Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>
2019-11-19 13:07:33 +02:00
Piotr Dulikowski
589313a110 row_marker: correct expiration condition
This change corrects condition on which a row was considered expired by
its TTL.

The logic that decides when a row becomes expired was inconsistent with
the logic that decides if a single cell is expired. A single cell
becomes expired when `expiry_timestamp <= now`, while a row became
expired when `expiry_timestamp < now` (notice the strict inequality).
For rows inserted with TTL, this caused non-key cells to expire (change
their values to null) one second before the row disappeared. Now, row
expiry logic uses non-strict inequality.

Fixes: #4263, #5290.

Tests:
- unit(dev)
- python test described in issue #5290
2019-11-19 11:46:59 +01:00
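The off-by-one-second inconsistency described above can be illustrated with a small Python sketch (illustrative only, not the actual row_marker code):

```python
from datetime import datetime

def cell_expired(expiry, now):
    # cells always used the non-strict check: expiry_timestamp <= now
    return expiry <= now

def row_expired_old(expiry, now):
    # the buggy row check used strict `<`: off by one second
    return expiry < now

def row_expired_fixed(expiry, now):
    # after the fix, rows use the same non-strict check as cells
    return expiry <= now

now = datetime(2019, 11, 19, 12, 0, 0)
# Exactly at the expiry timestamp, cells and the old row check disagreed:
assert cell_expired(now, now) and not row_expired_old(now, now)
assert row_expired_fixed(now, now) == cell_expired(now, now)
```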
Pekka Enberg
505f2c1008 test.py: Append test repeat cycle to output XML filename
Currently, we overwrite the same XML output file for each test repeat
cycle. This can cause invalid XML to be generated if the XML contents
don't match exactly for every iteration.

Fix the problem by appending the test repeat cycle in the XML filename
as follows:

  $ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test

  $ ls -1 *.xml
  jenkins_test.release.vint_serialization_test.0.boost.xml
  jenkins_test.release.vint_serialization_test.1.boost.xml
  jenkins_test.release.vint_serialization_test.2.boost.xml


Fixes #5303.

Message-Id: <20191119092048.16419-1-penberg@scylladb.com>
2019-11-19 11:30:47 +02:00
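The naming scheme can be sketched as follows (hypothetical helper, not the actual test.py code):

```python
def xml_output_name(prefix, mode, test_name, cycle):
    # One distinct file per repeat cycle, so reruns never clobber
    # each other's XML output.
    return "{}.{}.{}.{}.boost.xml".format(prefix, mode, test_name, cycle)

# e.g. three repeats of the same test produce three distinct files
names = [xml_output_name("jenkins_test", "release", "vint_serialization_test", i)
         for i in range(3)]
```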
Rafael Ávila de Espíndola
750adee6e3 lua: fix build with boost 1.67 and older vs fmt
It is not completely clear why the fmt base code fails with boost
1.67, but it is easy to avoid.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191118210540.129603-1-espindola@scylladb.com>
2019-11-19 11:14:00 +02:00
Tomasz Grabiec
ff567649fa Merge "gossip: Limit number of pending gossip ACK and ACK2 messages" from Asias
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, the unlimited ACK
message can consume unlimited amount of memory which causes OOM
eventually.

To fix, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from the peer has
the latest information, it is safe to drop the previous SYN message
and keep only the latest one. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.
2019-11-18 10:52:38 +01:00
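The "at most one pending SYN per peer" idea can be sketched in Python (hypothetical names and shapes; the real logic lives in the gossiper's C++ ACK handling):

```python
class PeerGossipState:
    """Sketch: allow at most one in-flight ACK and one pending SYN
    per peer, instead of an unbounded queue of ACK messages."""

    def __init__(self):
        self.ack_in_flight = False
        self.pending_syn = None

    def on_syn(self, syn):
        if self.ack_in_flight:
            # An ACK is still being sent: remember only the newest SYN
            # and drop any older pending one -- it is stale anyway.
            self.pending_syn = syn
            return None
        self.ack_in_flight = True
        return ("ACK", syn)

    def on_ack_sent(self):
        # The in-flight ACK finished; answer the latest queued SYN, if any.
        self.ack_in_flight = False
        syn, self.pending_syn = self.pending_syn, None
        return self.on_syn(syn) if syn is not None else None
```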
Benny Halevy
f9e93bba38 sstables: compaction: move cleanup parameter to compaction_descriptor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>
2019-11-18 10:52:20 +01:00
Avi Kivity
1fe062aed4 Merge "Add basic UDF support" from Rafael
"

This patch series adds only UDF support, UDA will be in the next patch series.

With this, all CQL types are mapped to Lua. Right now we set up a new
lua state and copy the values for each argument and return. This will
be optimized once profiled.

We require --experimental to enable UDF in case there is some change
to the table format.
"

* 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits)
  Lua: Document the conversions between Lua and CQL
  Lua: Implement decimal subtraction
  Lua: Implement decimal addition
  Lua: Implement support for returning decimal
  Lua: Implement decimal to string conversion
  Lua: Implement decimal to floating point conversion
  Lua: Implement support for decimal arguments
  Lua: Implement support for returning varint
  Lua: Implement support for returning duration
  Lua: Implement support for duration arguments
  Lua: Implement support for returning inet
  Lua: Implement support for inet arguments
  Lua: Implement support for returning time
  Lua: Implement support for time arguments
  Lua: Implement support for returning timeuuid
  Lua: Implement support for returning uuid
  Lua: Implement support for uuid and timeuuid arguments
  Lua: Implement support for returning date
  Lua: Implement support for date arguments
  Lua: Implement support for returning timestamp
  ...
2019-11-17 16:38:19 +02:00
Konstantin Osipov
48f3ca0fcb test.py: use the configured build modes from ninja mode_list
Add mode_list rule to ninja build and use it by default when searching
for tests in test.py.

Now it is no longer necessary to explicitly specify the test mode when
invoking test.py.

(cherry picked from commit a211ff30c7f2de12166d8f6f10d259207b462d4b)
2019-11-17 13:42:10 +01:00
Nadav Har'El
2fb2eb27a2 sstables: allow non-traditional characters in table name
The goal of this patch is to fix issue #5280, a rather serious Alternator
bug, where Scylla fails to restart when an Alternator table has secondary
indexes (LSI or GSI).

Traditionally, Cassandra allows table names to contain only alphanumeric
characters and underscores. However, most of our internal implementation
doesn't actually have this restriction. So Alternator uses the characters
':' and '!' in the table names to mark global and local secondary indexes,
respectively. And this actually works. Or almost...

This patch fixes a problem of listing, during boot, the sstables stored
for tables with such non-traditional names. The sstable listing code
needlessly assumes that the *directory* name, i.e., the CF names, matches
the "\w+" regular expression. When an sstable is found in a directory not
matching such regular expression, the boot fails. But there is no real
reason to require such a strict regular expression. So this patch relaxes
this requirement, and allows Scylla to boot with Alternator's GSI and LSI
tables and their names which include the ":" and "!" characters, and in
fact any other name allowed as a directory name.

Fixes #5280.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191114153811.17386-1-nyh@scylladb.com>
2019-11-17 14:27:47 +02:00
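The overly strict check can be demonstrated with Python's `re` (illustrative only; the actual listing code is C++, and the index name shown is made up):

```python
import re

# The old listing code effectively required directory (table) names
# to match \w+, which Alternator's GSI/LSI tables violate because
# their names contain ':' and '!'.
strict = re.compile(r"\w+$")

assert strict.match("users")                       # traditional name: accepted
assert not strict.match("users:by_email!indexed")  # Alternator name: rejected
```

After the fix, any name that is valid as a directory name is accepted.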
Shlomi Livne
3e873812a4 Document backport queue and procedure (#5282)
This document adds information about how fixes are tracked to be
backported into releases and what is the procedure that is followed to
backport those fixes.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2019-11-17 01:45:24 -08:00
Benny Halevy
c215ad79a9 scylla-gdb: resolve: add startswith parameter
Allow filtering the resolved addresses by a startswith string.

The common use case is resolving vtable ptrs, when the output of
`find_vptrs` may be too long for the memory size of the host running
gdb. In this case the number of vtable ptrs is considerably smaller
than the total number of objects returned by `find_vptrs` (e.g. 462
vs. 69625 in an OOM core I examined from scylla --smp=2 --memory=1024M)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-17 11:40:54 +02:00
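The filtering itself amounts to a prefix test over the resolved names, sketched here in plain Python (hypothetical data shapes, not the actual scylla-gdb.py code):

```python
def filter_resolved(resolved, startswith=""):
    # Keep only (address, symbol) pairs whose symbol begins with the
    # given prefix, e.g. "vtable" to keep just vtable ptrs.
    return [(addr, sym) for addr, sym in resolved if sym.startswith(startswith)]

syms = [(0x1000, "vtable for flat_mutation_reader"),
        (0x2000, "operator new(unsigned long)"),
        (0x3000, "vtable for row_cache")]
```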
Benny Halevy
2f688dcf08 scylla-gdb.py: find_single_sstable_readers: fix support for sstable_mutation_reader
provide template arguments for k_l and m readers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-17 11:02:05 +02:00
Kamil Braun
a67e887dea sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285)
CQL tracing would only report file I/O involving one sstable, even if
multiple sstables were read from during the query.

Steps to reproduce:

- create a table with NullCompactionStrategy
- insert row, flush memtables
- insert row, flush memtables
- restart Scylla
- tracing on
- select * from table

The trace would only report DMA reads from one of the two sstables.

Kudos to @denesb for catching this.

Related issue: #4908
2019-11-17 00:38:37 -08:00
Tomasz Grabiec
a384d0af76 Merge "A set of cleanups over main() code" from Pavel E.
There are ... signs of massive start/stop code rework in the
main() function. While fixing the sub-modules interdependencies
during start/stop I've polished these signs too, so here's the
simplest ones.
2019-11-15 15:25:18 +01:00
Pavel Emelyanov
1dc490c81c tracing: Move register_tracing_keyspace_backend forward decl into proper header
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
7e81df71ba main: Shorten developer_mode() evaluation
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
1bd68d87fc main: Do not carry pctx all over the code
v2:
- do not use struct initialization extension

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
655b6d0d1e main: Hide start_thrift
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
26f2b2ce5e main,db: Kill some unused .hh includes
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
f5b345604f main: Factor out get_conf_sub
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
924d52573d main: Remove unused return_value variable (and capture)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Juliusz Stasiewicz
1cfa458409 metrics: separate counters for `system' KS accesses
Resolves #4182. Metrics per system tables are accumulated separately,
depending on the origin of query (DB internals vs clients).
2019-11-14 13:14:39 +01:00
Juliusz Stasiewicz
b1e4d222ed cql3: cosmetics - improved description of metrics 2019-11-14 10:35:42 +01:00
Rafael Ávila de Espíndola
10bcbaf348 Lua: Document the conversions between Lua and CQL
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
6ffddeae5e Lua: Implement decimal subtraction
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
aba8e531d1 Lua: Implement decimal addition
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bb84eabbb3 Lua: Implement support for returning decimal
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bc17312a86 Lua: Implement decimal to string conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
e83d5bf375 Lua: Implement decimal to floating point conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b568bf4f54 Lua: Implement support for decimal arguments
This is just the minimum to pass a value to Lua. Right now you can't
actually do anything with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
6c3f050eb4 Lua: Implement support for returning varint
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dc377abd68 Lua: Implement support for returning duration
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
c3f021d2e4 Lua: Implement support for duration arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9208b2f498 Lua: Implement support for returning inet
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
64be94ab01 Lua: Implement support for inet arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
faf029d472 Lua: Implement support for returning time
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
772f2a4982 Lua: Implement support for time arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
484f498534 Lua: Implement support for returning timeuuid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9c2daf6554 Lua: Implement support for returning uuid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ae1a1a4085 Lua: Implement support for uuid and timeuuid arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
f8aeed5beb Lua: Implement support for returning date
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
384effa54b Lua: Implement support for date arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
63bc960152 Lua: Implement support for returning timestamp
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ee95756f62 Lua: Implement support for timestamp arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
1c6d5507b4 Lua: Implement support for returning counter
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
0d9d53b5da Lua: Implement support for counter arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
74c4e58b6b Lua: Add a test for nested types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b226511ce8 Lua: Implement support for returning maps
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
5c8d1a797f Lua: Implement support for map arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b5b15ce4e6 Lua: Implement support for returning set
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
cf7ba441e4 Lua: Implement support for set arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
02f076be43 Lua: Implement support for returning udt
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
92c8e94d9a Lua: Implement support for udt arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
a7c3f6f297 Lua: Implement support for returning list
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
688736f5ff Lua: Implement support for returning tuple
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ab5708a711 Lua: Implement support for list and tuple arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
534f29172c Lua: Implement support for returning boolean
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b03c580493 Lua: Implement support for boolean arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dcfe397eb6 Lua: Implement support for returning floating point
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
cf4b7ab39a Lua: Implement support for returning blob
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
3d22433cd4 Lua: Implement support for blob arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dd754fcf01 Lua: Implement support for returning ascii
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
affb1f8efd Lua: Implement support for returning text
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
f8ed347ee7 Lua: Implement support for string arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
0e4f047113 Lua: Implement a visitor for return values
This adds support for all integer types. Followup commits will
implement the missing types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
34b770e2fb Lua: Push varint as decimal
This makes it substantially simpler to support both varint and
decimal, which will be implemented in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9b3cab8865 Lua: Implement support for varint to integer conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
5a40264d97 Lua: Implement support for varint arguments
Right now it is not possible to do anything with the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
3230b8bd86 Lua: Implement support for floating point arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9ad2cc2850 Lua: Implement a visitor for arguments
With this we support all simple integer types. Followup patches will
implement the missing types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ee1d87a600 Lua: Plug in the interpreter
This adds a wrapper around the Lua interpreter so that function
executions are interruptible and return futures.

With this patch it is possible to write and use simple UDFs that take
and return integer values.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bc3bba1064 Lua: Add lua.cc and lua.hh skeleton files
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
7015e219ca Lua: Link with liblua
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
61200ebb04 Lua: Add config options
This patch just adds the config options that we will expose for the
lua runtime.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
d9337152f3 Use threads when executing user functions
This adds a requires_thread predicate to functions and propagates that
up until we get to code that already returns futures.

We can then use the predicate to decide if we need to use
seastar::async.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
52b48b415c Test that schema digests with UDFs don't change
This refactors test_schema_digest_does_not_change to also test a
schema with user defined functions and user defined aggregates.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
fc72a64c67 Add schema propagation and storage for UDF
With this it is possible to create user defined functions and
aggregates and they are saved to disk and the schema change is
propagated.

It is just not possible to call them yet.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ce6304d920 UDF: Add a feature and config option to track if udf is enabled
It can only be enabled with --experimental.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:40:47 -08:00
Rafael Ávila de Espíndola
dd17dfcbef Reject "OR REPLACE ... IF NOT EXISTS" in the grammar
The parser now rejects having both OR REPLACE and IF NOT EXISTS in the
same statement.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
e7e3dab4aa Convert UDF parsing code to c++
For now this just constructs the corresponding c++ classes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
5c45f3b573 Update UDF syntax
This updates UDF syntax to the current specification.

In particular, this removes DETERMINISTIC and adds "CALLED ON NULL
INPUT" and "RETURNS NULL ON NULL INPUT".

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
c75cd5989c transport: Add support for FUNCTION and AGGREGATE to schema_change
While at it, modernize the code a bit and add a test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
dac3cf5059 Clear functions between cql_test_env runs
At some point we should make the function list non static, but this
allows us to write tests for now.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
de1a970b93 cql: convert functions to add, remove and replace functions
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
33f9d196f9 Add iterator version of functions::find
This avoids allocating a std::vector and is more flexible since the
iterator can be passed to erase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
7f9dadee5c Implement functions::type_equals.
Since the types are uniqued we can just use ==.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
5cef5a1b38 types: Add a friend visitor over data_value
This is a simple wrapper that allows code that is not in the types
hierarchy to visit a data_value.

Will be used by UDF.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
9bf9a84e4d types: Move the data_value visitor to a header
It will be used by the UDF implementation.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Asias He
f32ae00510 gossip: Limit number of pending gossip ACK2 messages
Similar to "gossip: Limit number of pending gossip ACK messages", limit
the number of pending gossip ACK2 messages in gossiper::handle_ack_msg.

Fixes #5210
2019-10-25 12:44:28 +08:00
Asias He
15148182ab gossip: Limit number of pending gossip ACK messages
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, the unlimited ACK
message can consume unlimited amount of memory which causes OOM
eventually.

To fix, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from peer has the
latest information, so it is safe to drop the previous SYN message and
keep the latest one only. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.

Fixes #5210
2019-10-25 12:44:28 +08:00
Benny Halevy
7827e3f11d tests: test_large_data: do not stop database
Now that compaction returns only after the compacted sstables are
deleted, we no longer need to stop the database to force waiting
for deletes (which were previously done asynchronously).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
19b67d82c9 table::on_compaction_completion: fix indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
8dd6e13468 table::on_compaction_completion: wait for background deletes
Don't let background deletes accumulate uncontrollably.

Fixes #4909

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
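The change — start the deletes but return only once they all finish — can be sketched with asyncio (a simplified model, not the Seastar C++ code):

```python
import asyncio

deleted = []

async def delete_sstable(path):
    # stand-in for the asynchronous sstable delete
    await asyncio.sleep(0)
    deleted.append(path)

async def on_compaction_completion(paths):
    # Previously the deletes were fire-and-forget; now we gather them,
    # so completion implies the old sstables are really gone and
    # background deletes cannot accumulate without bound.
    await asyncio.gather(*(delete_sstable(p) for p in paths))

asyncio.run(on_compaction_completion(["sst-1", "sst-2"]))
```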
Benny Halevy
da6645dc2c table: refresh_snapshot before deleting any sstables
The row cache must not hold references to any sstable we're
about to delete.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:29 +03:00
2597 changed files with 19987 additions and 7200 deletions

.gitignore

@@ -22,3 +22,5 @@ resources
 .pytest_cache
 /expressions.tokens
 tags
+testlog/*
+test/*/*.reject


@@ -97,7 +97,7 @@ scan_scylla_source_directories(
 service
 sstables
 streaming
-tests
+test
 thrift
 tracing
 transport


@@ -5,8 +5,6 @@ F: Filename, directory, or pattern for the subsystem
 ---
 AUTH
-M: Paweł Dziepak <pdziepak@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 R: Vlad Zolotarov <vladz@scylladb.com>
 R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
@@ -14,22 +12,17 @@ F: auth/*
 CACHE
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Paweł Dziepak <pdziepak@scylladb.com>
 R: Piotr Jastrzebski <piotr@scylladb.com>
 F: row_cache*
 F: *mutation*
 F: tests/mvcc*
 COMMITLOG / BATCHLOG
-M: Paweł Dziepak <pdziepak@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 F: db/commitlog/*
 F: db/batch*
 COORDINATOR
-M: Paweł Dziepak <pdziepak@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Gleb Natapov <gleb@scylladb.com>
 F: service/storage_proxy*
@@ -49,12 +42,10 @@ M: Pekka Enberg <penberg@scylladb.com>
 F: cql3/*
 COUNTERS
-M: Paweł Dziepak <pdziepak@scylladb.com>
 F: counters*
 F: tests/counter_test*
 GOSSIP
-M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: gms/*
@@ -65,14 +56,11 @@ F: dist/docker/*
 LSA
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Paweł Dziepak <pdziepak@scylladb.com>
 F: utils/logalloc*
 MATERIALIZED VIEWS
-M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
-R: Nadav Har'El <nyh@scylladb.com>
-R: Duarte Nunes <duarte@scylladb.com>
+M: Nadav Har'El <nyh@scylladb.com>
 F: db/view/*
 F: cql3/statements/*view*
@@ -82,14 +70,12 @@ F: dist/*
 REPAIR
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 F: repair/*
 SCHEMA MANAGEMENT
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 F: db/schema_tables*
 F: db/legacy_schema_migrator*
@@ -98,15 +84,13 @@ F: schema*
 SECONDARY INDEXES
-M: Pekka Enberg <penberg@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
-R: Nadav Har'El <nyh@scylladb.com>
+M: Nadav Har'El <nyh@scylladb.com>
+R: Pekka Enberg <penberg@scylladb.com>
 F: db/index/*
 F: cql3/statements/*index*
 SSTABLES
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Raphael S. Carvalho <raphaelsc@scylladb.com>
 R: Glauber Costa <glauber@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
@@ -114,18 +98,17 @@ F: sstables/*
 STREAMING
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: streaming/*
 F: service/storage_service.*
 THRIFT TRANSPORT LAYER
-M: Duarte Nunes <duarte@scylladb.com>
 F: thrift/*
+ALTERNATOR
+M: Nadav Har'El <nyh@scylladb.com>
+F: alternator/*
+F: alternator-test/*
 THE REST
 M: Avi Kivity <avi@scylladb.com>
-M: Paweł Dziepak <pdziepak@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Nadav Har'El <nyh@scylladb.com>
 F: *


@@ -27,10 +27,10 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
 ```
-* run Scylla with one CPU and ./tmp as data directory
+* run Scylla with one CPU and ./tmp as work directory
 ```
-./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
+./build/release/scylla --workdir tmp --smp 1
 ```
 * For more run options:


@@ -1,7 +1,7 @@
 #!/bin/sh
 PRODUCT=scylla
-VERSION=3.2.5
+VERSION=3.3.4
 if test -f version
 then


@@ -55,7 +55,7 @@ def test_expired_signature(dynamodb, test_table):
 'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
 'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
 }
-response = requests.post(url, headers=headers)
+response = requests.post(url, headers=headers, verify=False)
 assert not response.ok
 assert "InvalidSignatureException" in response.text and "Signature expired" in response.text
@@ -69,6 +69,6 @@ def test_signature_too_futuristic(dynamodb, test_table):
 'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
 'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
 }
-response = requests.post(url, headers=headers)
+response = requests.post(url, headers=headers, verify=False)
 assert not response.ok
 assert "InvalidSignatureException" in response.text and "Signature not yet current" in response.text

File diff suppressed because it is too large.


@@ -41,7 +41,6 @@ def test_describe_table_basic(test_table):
# Test that DescribeTable correctly returns the table's schema, in
# AttributeDefinitions and KeySchema attributes
@pytest.mark.xfail(reason="DescribeTable does not yet return schema")
def test_describe_table_schema(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
expected = { # Copied from test_table()'s fixture


@@ -86,7 +86,6 @@ def test_update_expected_1_eq_true(test_table_s):
# Check that set equality is checked correctly. Unlike string equality (for
# example), it cannot be done with just naive string comparison of the JSON
# representation, and we need to allow for any order.
@pytest.mark.xfail(reason="bug in EQ test of sets")
def test_update_expected_1_eq_set(test_table_s):
p = random_string()
# Because boto3 sorts the set values we give it, in order to generate a
@@ -171,7 +170,6 @@ def test_update_expected_1_ne_false(test_table_s):
)
# Tests for Expected with ComparisonOperator = "LE":
@pytest.mark.xfail(reason="ComparisonOperator=LE in Expected not yet implemented")
def test_update_expected_1_le(test_table_s):
p = random_string()
# LE should work for string, number, and binary type
@@ -308,7 +306,6 @@ def test_update_expected_1_lt(test_table_s):
)
# Tests for Expected with ComparisonOperator = "GE":
@pytest.mark.xfail(reason="ComparisonOperator=GE in Expected not yet implemented")
def test_update_expected_1_ge(test_table_s):
p = random_string()
# GE should work for string, number, and binary type
@@ -526,7 +523,6 @@ def test_update_expected_1_null(test_table_s):
)
# Tests for Expected with ComparisonOperator = "CONTAINS":
@pytest.mark.xfail(reason="ComparisonOperator=CONTAINS in Expected not yet implemented")
def test_update_expected_1_contains(test_table_s):
# true cases. CONTAINS can be used for two unrelated things: checking substrings
# (in string or binary) and membership (in set or list).
@@ -609,7 +605,6 @@ def test_update_expected_1_contains(test_table_s):
)
# Tests for Expected with ComparisonOperator = "NOT_CONTAINS":
@pytest.mark.xfail(reason="ComparisonOperator=NOT_CONTAINS in Expected not yet implemented")
def test_update_expected_1_not_contains(test_table_s):
# true cases. NOT_CONTAINS can be used for two unrelated things: checking substrings
# (in string or binary) and membership (in set or list).
@@ -699,14 +694,21 @@ def test_update_expected_1_not_contains(test_table_s):
def test_update_expected_1_begins_with_true(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'a': {'Value': 'hello', 'Action': 'PUT'}})
AttributeUpdates={'a': {'Value': 'hello', 'Action': 'PUT'},
'd': {'Value': bytearray('hi there', 'utf-8'), 'Action': 'PUT'}})
# Case where expected and update are on different attribute:
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'b': {'Value': 3, 'Action': 'PUT'}},
Expected={'a': {'ComparisonOperator': 'BEGINS_WITH',
'AttributeValueList': ['hell']}}
)
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 3}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 3
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'b': {'Value': 4, 'Action': 'PUT'}},
Expected={'d': {'ComparisonOperator': 'BEGINS_WITH',
'AttributeValueList': [bytearray('hi', 'utf-8')]}}
)
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 4
# For BEGINS_WITH, AttributeValueList must have a single element
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
@@ -798,13 +800,13 @@ def test_update_expected_1_in(test_table_s):
)
# Tests for Expected with ComparisonOperator = "BETWEEN":
@pytest.mark.xfail(reason="ComparisonOperator=BETWEEN in Expected not yet implemented")
def test_update_expected_1_between(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'a': {'Value': 2, 'Action': 'PUT'},
'b': {'Value': 'cat', 'Action': 'PUT'},
'c': {'Value': bytearray('cat', 'utf-8'), 'Action': 'PUT'}})
'c': {'Value': bytearray('cat', 'utf-8'), 'Action': 'PUT'},
'd': {'Value': set([2, 4, 7]), 'Action': 'PUT'}})
# true cases:
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'z': {'Value': 2, 'Action': 'PUT'}},
@@ -842,6 +844,10 @@ def test_update_expected_1_between(test_table_s):
AttributeUpdates={'z': {'Value': 2, 'Action': 'PUT'}},
Expected={'a': {'ComparisonOperator': 'BETWEEN', 'AttributeValueList': ['cat', 'dog']}}
)
with pytest.raises(ClientError, match='ConditionalCheckFailedException'):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'z': {'Value': 2, 'Action': 'PUT'}},
Expected={'q': {'ComparisonOperator': 'BETWEEN', 'AttributeValueList': [0, 100]}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['z'] == 6
# The given AttributeValueList array must contain exactly two items of the
# same type, and in the right order. Any other input is considered a validation
@@ -858,10 +864,18 @@ def test_update_expected_1_between(test_table_s):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'z': {'Value': 2, 'Action': 'PUT'}},
Expected={'a': {'ComparisonOperator': 'BETWEEN', 'AttributeValueList': [4, 3]}})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'z': {'Value': 2, 'Action': 'PUT'}},
Expected={'b': {'ComparisonOperator': 'BETWEEN', 'AttributeValueList': ['dog', 'aardvark']}})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'z': {'Value': 2, 'Action': 'PUT'}},
Expected={'a': {'ComparisonOperator': 'BETWEEN', 'AttributeValueList': [4, 'dog']}})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'z': {'Value': 2, 'Action': 'PUT'}},
Expected={'d': {'ComparisonOperator': 'BETWEEN', 'AttributeValueList': [set([1]), set([2])]}})
##############################################################################
# Instead of ComparisonOperator and AttributeValueList, one can specify either


@@ -377,7 +377,6 @@ def test_gsi_3(test_table_gsi_3):
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
@pytest.mark.xfail(reason="GSI in alternator currently have a bug on updating the second regular base column")
def test_gsi_update_second_regular_base_column(test_table_gsi_3):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
with test_table_gsi_3.batch_writer() as batch:
@@ -389,6 +388,34 @@ def test_gsi_update_second_regular_base_column(test_table_gsi_3):
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
# This is the same feature we already tested in test_gsi_missing_attribute()
# above, but on a different table: In that test we used test_table_gsi_2,
# with one indexed attribute, and in this test we use test_table_gsi_3 which
# has two base regular attributes in the view key, and more possibilities
# of which value might be missing. Reproduces issue #6008.
def test_gsi_missing_attribute_3(test_table_gsi_3):
p = random_string()
a = random_string()
b = random_string()
# First, add an item with a missing "a" value. It should appear in the
# base table, but not in the index:
test_table_gsi_3.put_item(Item={'p': p, 'b': b})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p, 'b': b}
# Note: with eventually consistent read, we can't really be sure that
# an item will "never" appear in the index. We hope that if a bug exists
# and such an item did appear, sometimes the delay here will be enough
# for the unexpected item to become visible.
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# Same thing for an item with a missing "b" value:
test_table_gsi_3.put_item(Item={'p': p, 'a': a})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p, 'a': a}
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# And for an item missing both:
test_table_gsi_3.put_item(Item={'p': p})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p}
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# A fourth scenario of GSI. Two GSIs on a single base table.
@pytest.fixture(scope="session")
@@ -477,6 +504,52 @@ def test_gsi_5(test_table_gsi_5):
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
# Verify that DescribeTable correctly returns the schema of both base-table
# and secondary indexes. KeySchema is given for each of the base table and
# indexes, and AttributeDefinitions is merged for all of them together.
def test_gsi_5_describe_table_schema(test_table_gsi_5):
got = test_table_gsi_5.meta.client.describe_table(TableName=test_table_gsi_5.name)['Table']
# Copied from test_table_gsi_5 fixture
expected_base_keyschema = [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' } ]
expected_gsi_keyschema = [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x', 'KeyType': 'RANGE' } ]
expected_all_attribute_definitions = [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' } ]
assert got['KeySchema'] == expected_base_keyschema
gsis = got['GlobalSecondaryIndexes']
assert len(gsis) == 1
assert gsis[0]['KeySchema'] == expected_gsi_keyschema
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)
# Similar DescribeTable schema test for test_table_gsi_2. The peculiarity
# in that table is that the base table has only a hash key p, and index
# only a hash key x. Now, while internally Scylla needs to add "p" as a
# clustering key in the materialized view (in Scylla the view key always
# contains the base key), when describing the table, "p" shouldn't be
# returned as a range key, because the user didn't ask for it.
# This test reproduces issue #5320.
@pytest.mark.xfail(reason="GSI DescribeTable spurious range key (#5320)")
def test_gsi_2_describe_table_schema(test_table_gsi_2):
got = test_table_gsi_2.meta.client.describe_table(TableName=test_table_gsi_2.name)['Table']
# Copied from test_table_gsi_2 fixture
expected_base_keyschema = [ { 'AttributeName': 'p', 'KeyType': 'HASH' } ]
expected_gsi_keyschema = [ { 'AttributeName': 'x', 'KeyType': 'HASH' } ]
expected_all_attribute_definitions = [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' } ]
assert got['KeySchema'] == expected_base_keyschema
gsis = got['GlobalSecondaryIndexes']
assert len(gsis) == 1
assert gsis[0]['KeySchema'] == expected_gsi_keyschema
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)
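The multiset() helper used in the assertions above is imported from the local util module and isn't shown in this diff; a minimal sketch of such a helper (hypothetical implementation, assuming items are JSON-like dicts) could be:

```python
import collections

def _freeze(value):
    # Dicts and lists are unhashable, so recursively convert each
    # value into a canonical hashable form before counting.
    if isinstance(value, dict):
        return tuple(sorted((k, _freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return tuple(_freeze(v) for v in value)
    return value

def multiset(items):
    # Two lists compare equal as multisets iff their Counters are
    # equal - same elements, same multiplicities, in any order.
    return collections.Counter(_freeze(i) for i in items)
```

This makes `multiset(got['AttributeDefinitions']) == multiset(expected)` insensitive to the arbitrary ordering of the attribute-definition list.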
# All tests above involved "ProjectionType: ALL". This test checks how
# "ProjectionType:: KEYS_ONLY" works. We note that it projects both
# the index's key, *and* the base table's key. So items which had different


@@ -29,6 +29,7 @@ def test_health_works(dynamodb):
# Test that a health check only works for the root URL ('/')
def test_health_only_works_for_root_path(dynamodb):
url = dynamodb.meta.client._endpoint.host
for suffix in ['/abc', '/..', '/-', '/index.htm', '/health']:
response = requests.get(url + suffix)
for suffix in ['/abc', '/-', '/index.htm', '/health']:
print(url + suffix)
response = requests.get(url + suffix, verify=False)
assert response.status_code in range(400, 405)


@@ -20,7 +20,7 @@
import random
import pytest
from botocore.exceptions import ClientError
from botocore.exceptions import ClientError, ParamValidationError
from decimal import Decimal
from util import random_string, random_bytes, full_query, multiset
from boto3.dynamodb.conditions import Key, Attr
@@ -356,3 +356,161 @@ def test_query_which_key(test_table):
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# Test the "Select" parameter of Query. The default Select mode,
# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
# allow returning just specific attributes or just counting the results
# without returning items at all.
@pytest.mark.xfail(reason="Select not supported yet")
def test_query_select(test_table_sn):
numbers = [Decimal(i) for i in range(10)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num, 'x': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order. By default,
# query returns all attributes:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_x_attributes = [x['x'] for x in got_items]
assert got_x_attributes == numbers
# Select=ALL_ATTRIBUTES does exactly the same as the default - return
# all attributes:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_ATTRIBUTES')['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_x_attributes = [x['x'] for x in got_items]
assert got_x_attributes == numbers
# Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
# is just for indexes, when IndexName is specified)
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_PROJECTED_ATTRIBUTES')
# Select=SPECIFIC_ATTRIBUTES requires that either a AttributesToGet
# or ProjectionExpression appears, but then really does nothing:
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES')
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['x'])['Items']
expected_items = [{'x': i} for i in numbers]
assert got_items == expected_items
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression='x')['Items']
assert got_items == expected_items
# Select=COUNT just returns a count - not any items
got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='COUNT')
assert got['Count'] == len(numbers)
assert not 'Items' in got
# Check again that we also get a count - not just with Select=COUNT,
# but without Select=COUNT we also get the items:
got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert got['Count'] == len(numbers)
assert 'Items' in got
# Select with some unknown string generates a validation exception:
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='UNKNOWN')
# Test that the "Limit" parameter can be used to return only some of the
# items in a single partition. The items returned are the first in the
# sorted order.
def test_query_limit(test_table_sn):
numbers = [Decimal(i) for i in range(10)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order.
# First, no Limit so we should get all numbers (we have few of them, so
# it all fits in the default 1MB limitation)
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
# Now try a few different Limit values, and verify that the query
# returns exactly the first Limit sorted numbers.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers[0:limit]
# Unfortunately, the boto3 library forbids a Limit of 0 on its own,
# before even sending a request, so we can't test how the server responds.
with pytest.raises(ParamValidationError):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=0)
# In test_query_limit we tested just that Limit allows stopping the result
# after the right number of items. Here we test that such a stopped result
# can be resumed, via the LastEvaluatedKey/ExclusiveStartKey paging mechanism.
def test_query_limit_paging(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that full_query() returns all these numbers, in sorted order.
# full_query() will do a query with the given limit, and resume it again
# and again until the last page.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
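full_query() is imported from the local util module; a plausible sketch of the resume loop it performs (hypothetical implementation, assuming a boto3 Table resource) is:

```python
def full_query(table, **kwargs):
    # Run the query, then keep resuming it with ExclusiveStartKey set
    # to the previous page's LastEvaluatedKey, until the server stops
    # returning LastEvaluatedKey (i.e., the last page was reached).
    response = table.query(**kwargs)
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.query(
            ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        items.extend(response['Items'])
    return items
```

With a small Limit this issues many requests of Limit items each, which is exactly what the paging tests below rely on.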
# Test that the ScanIndexForward parameter works, and can be used to
# return items sorted in reverse order. Combining this with Limit can
# be used to return the last items instead of the first items of the
# partition.
@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
def test_query_reverse(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order or reverse
# order, depending on the ScanIndexForward parameter being True or False.
# First, no Limit so we should get all numbers (we have few of them, so
# it all fits in the default 1MB limitation)
reversed_numbers = list(reversed(numbers))
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=True)['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False)['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers
# Now try a few different Limit values, and verify that the query
# returns exactly the first Limit sorted numbers - in regular or
# reverse order, depending on ScanIndexForward.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=True)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers[0:limit]
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=False)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers[0:limit]
# Test that paging also works properly with reverse order
# (ScanIndexForward=false), i.e., reverse-order queries can be resumed
@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
def test_query_reverse_paging(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
reversed_numbers = list(reversed(numbers))
# Verify that with ScanIndexForward=False, full_query() returns all
# these numbers in reversed sorted order - getting pages of Limit items
# at a time and resuming the query.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False, Limit=limit)
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers


@@ -0,0 +1,226 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the ReturnValues parameter for the different update operations
# (PutItem, UpdateItem, DeleteItem).
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test trivial support for the ReturnValues parameter in PutItem, UpdateItem
# and DeleteItem - test that "NONE" works (and changes nothing), while a
# completely unsupported value gives an error.
# This test is useful to check that before the ReturnValues parameter is fully
# implemented, it returns an error when a still-unsupported ReturnValues
# option is attempted in the request - instead of simply being ignored.
def test_trivial_returnvalues(test_table_s):
# PutItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
# UpdateItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
# DeleteItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
# Test the ReturnValues parameter on a PutItem operation. Only two settings
# are supported for this parameter for this operation: NONE (the default)
# and ALL_OLD.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_put_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the old value of the item is returned
# in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_OLD')
assert ret['Attributes'] == {'p': p, 'a': 'hi'}
# Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,
# are supported by other operations but not by PutItem:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_OLD')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_NEW')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_NEW')
# Also, obviously, an unsupported setting "DOG" results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='none')
# Test the ReturnValues parameter on a DeleteItem operation. Only two settings
# are supported for this parameter for this operation: NONE (the default)
# and ALL_OLD.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_delete_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the old value of the item is returned
# in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_OLD')
assert ret['Attributes'] == {'p': p, 'a': 'hi'}
# Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW -
# are supported by other operations but not by DeleteItem:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_OLD')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_NEW')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_NEW')
# Also, obviously, an unsupported setting "DOG" results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='none')
# Test the ReturnValues parameter on a UpdateItem operation. All five
# settings are supported for this parameter for this operation: NONE
# (the default), ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_update_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the entire old value of the item (even
# attributes we did not modify) is returned in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_OLD',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'dog'}
# With ReturnValues=UPDATED_OLD, only the overwritten attributes of the
# old item are returned in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET b = :val, c = :val2',
ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
assert ret['Attributes'] == {'b': 'dog'}
# Even if an update overwrites an attribute by the same value again,
# this is considered an update, and the old value (identical to the
# new one) is returned:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'b': 'cat'}
# Deleting an attribute also counts as overwriting it, of course:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='REMOVE b')
assert ret['Attributes'] == {'b': 'cat'}
# With ReturnValues=ALL_NEW, the entire new value of the item (including
# old attributes we did not modify) is returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_NEW',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'cat'}
# With ReturnValues=UPDATED_NEW, only the new value of the updated
# attributes are returned. Note that "updated attributes" means
# the newly set attributes - it doesn't require that these attributes
# have any previous values
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET b = :val, c = :val2',
ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
assert ret['Attributes'] == {'b': 'cat', 'c': 'hello'}
# Deleting an attribute also counts as updating it, but the deleted
# attribute's value is not returned in the response - so "Attributes" is absent here.
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='REMOVE b')
assert not 'Attributes' in ret
# In the above examples, UPDATED_NEW is not useful because it just
# returns the new values we already know from the request... UPDATED_NEW
# becomes more useful in read-modify-write operations:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 1})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
assert ret['Attributes'] == {'a': 2}
# An unsupported setting "DOG" also results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='none',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
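The ReturnValues behaviors exercised above can be summarized in a small standalone sketch. This is a hypothetical model written for illustration - `returned_attributes` and its arguments are invented here, not Alternator's or boto3's API:

```python
# Hypothetical model of the ReturnValues modes exercised by the tests
# above; not Alternator's actual implementation.
def returned_attributes(old_item, new_item, updated_names, mode):
    if mode == 'NONE':
        return None
    if mode == 'UPDATED_OLD':
        attrs = {k: old_item[k] for k in updated_names if k in old_item}
    elif mode == 'UPDATED_NEW':
        attrs = {k: new_item[k] for k in updated_names if k in new_item}
    elif mode == 'ALL_OLD':
        attrs = dict(old_item)
    elif mode == 'ALL_NEW':
        attrs = dict(new_item)
    else:
        # Unknown or wrongly-cased modes (e.g. "DOG", "none") are rejected.
        raise ValueError('ValidationException: bad ReturnValues ' + mode)
    return attrs if attrs else None

old = {'p': 'x', 'a': 'hi', 'b': 'dog'}
new = {'p': 'x', 'a': 'hi', 'b': 'cat'}
# SET b = 'cat' with UPDATED_OLD returns only the old value of 'b':
assert returned_attributes(old, new, ['b'], 'UPDATED_OLD') == {'b': 'dog'}
# ALL_NEW returns the entire new item, including untouched attributes:
assert returned_attributes(old, new, ['b'], 'ALL_NEW') == new
# REMOVE b with UPDATED_NEW has no new value to return:
assert returned_attributes(new, {'p': 'x', 'a': 'hi'}, ['b'], 'UPDATED_NEW') is None
```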


@@ -19,7 +19,7 @@
import pytest
from botocore.exceptions import ClientError
-from util import random_string, full_scan, multiset
+from util import random_string, full_scan, full_scan_and_count, multiset
from boto3.dynamodb.conditions import Attr
# Test that scanning works fine with/without pagination
@@ -189,3 +189,64 @@ def test_scan_with_key_equality_filtering(dynamodb, filled_test_table):
got_items = full_scan(table, ScanFilter=scan_filter_c_and_another)
expected_items = [item for item in items if "c" in item.keys() and "another" in item.keys() and item["c"] == "9" and item["another"] == "y"*16]
assert multiset(expected_items) == multiset(got_items)
# Test the "Select" parameter of Scan. The default Select mode,
# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
# allow returning just specific attributes or just counting the results
# without returning items at all.
@pytest.mark.xfail(reason="Select not supported yet")
def test_scan_select(filled_test_table):
test_table, items = filled_test_table
# By default, a scan returns all the items, with all their attributes:
got_items = full_scan(test_table)
assert multiset(items) == multiset(got_items)
# Select=ALL_ATTRIBUTES does exactly the same as the default - return
# all attributes:
got_items = full_scan(test_table, Select='ALL_ATTRIBUTES')
assert multiset(items) == multiset(got_items)
# Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
# is just for indexes, when IndexName is specified)
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='ALL_PROJECTED_ATTRIBUTES')
# Select=SPECIFIC_ATTRIBUTES requires that either an AttributesToGet
# or ProjectionExpression appears, but then really does nothing beyond
# what AttributesToGet and ProjectionExpression already do:
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='SPECIFIC_ATTRIBUTES')
wanted = ['c', 'another']
got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression=','.join(wanted))
assert multiset(expected_items) == multiset(got_items)
# Select=COUNT just returns a count - not any items
(got_count, got_items) = full_scan_and_count(test_table, Select='COUNT')
assert got_count == len(items)
assert got_items == []
# Check that we also get a count in regular scans - not just with
# Select=COUNT; without Select=COUNT we get both items and count:
(got_count, got_items) = full_scan_and_count(test_table)
assert got_count == len(items)
assert multiset(items) == multiset(got_items)
# Select with some unknown string generates a validation exception:
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='UNKNOWN')
# Test parallel scan, i.e., the Segments and TotalSegments options.
# In the following test we check that these parameters allow splitting
# a scan into multiple parts, and that these parts are in fact disjoint,
# and their union is the entire contents of the table. We do not actually
# try to run these queries in *parallel* in this test.
@pytest.mark.xfail(reason="parallel scan not supported yet")
def test_scan_parallel(filled_test_table):
test_table, items = filled_test_table
for nsegments in [1, 2, 17]:
print('Testing TotalSegments={}'.format(nsegments))
got_items = []
for segment in range(nsegments):
got_items.extend(full_scan(test_table, TotalSegments=nsegments, Segment=segment))
# The following comparison verifies that each of the expected items
# was returned in one - and just one - of the segments.
assert multiset(items) == multiset(got_items)
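The disjoint-union property this test verifies can be illustrated with a toy partitioner. The modulo bucket function below is an invented stand-in for DynamoDB's real segment assignment:

```python
# Toy model of Segment/TotalSegments: each item lands in exactly one
# segment, so the segments are pairwise disjoint and their union is the
# whole table. The modulo "partitioner" is for illustration only.
items = [{'p': i} for i in range(100)]

def scan_segment(items, segment, total_segments):
    return [it for it in items if it['p'] % total_segments == segment]

for nsegments in [1, 2, 17]:
    got = []
    for segment in range(nsegments):
        got.extend(scan_segment(items, segment, nsegments))
    # Every item appears exactly once across all segments:
    assert sorted(got, key=lambda it: it['p']) == items
```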


@@ -39,6 +39,26 @@ def full_scan(table, **kwargs):
items.extend(response['Items'])
return items
# full_scan_and_count returns both items and count as returned by the server.
# Note that count isn't simply len(items) - the server returns them
# independently. e.g., with Select='COUNT' the items are not returned, but
# count is.
def full_scan_and_count(table, **kwargs):
response = table.scan(**kwargs)
items = []
count = 0
if 'Items' in response:
items.extend(response['Items'])
if 'Count' in response:
count = count + response['Count']
while 'LastEvaluatedKey' in response:
response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
if 'Items' in response:
items.extend(response['Items'])
if 'Count' in response:
count = count + response['Count']
return (count, items)
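The accumulation loop above can be exercised against a canned sequence of responses, with a fake paged scan standing in for a real `table.scan`:

```python
# Fake paginated scan responses: the first page returns Items and Count
# plus a LastEvaluatedKey; the second page (think Select='COUNT')
# returns only a Count.
pages = iter([
    {'Items': [{'p': 'a'}], 'Count': 1, 'LastEvaluatedKey': {'p': 'a'}},
    {'Count': 2},
])

def fake_scan(**kwargs):
    return next(pages)

# The same logic as full_scan_and_count, with fake_scan for table.scan:
response = fake_scan()
items = list(response.get('Items', []))
count = response.get('Count', 0)
while 'LastEvaluatedKey' in response:
    response = fake_scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response.get('Items', []))
    count += response.get('Count', 0)

assert count == 3             # counts are summed across pages
assert items == [{'p': 'a'}]  # pages without Items contribute nothing
```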
# Utility function for fetching the entire results of a query into an array of items
def full_query(table, **kwargs):
response = table.query(**kwargs)


@@ -66,8 +66,9 @@ static std::string format_time_point(db_clock::time_point tp) {
time_t time_point_repr = db_clock::to_time_t(tp);
std::string time_point_str;
time_point_str.resize(17);
+::tm time_buf;
// strftime prints the terminating null character as well
-std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", std::gmtime(&time_point_repr));
+std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));
time_point_str.resize(16);
return time_point_str;
}
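The hunk above replaces `std::gmtime`, which returns a pointer to shared static storage and is therefore not thread-safe, with the reentrant `::gmtime_r` writing into a caller-owned buffer. The timestamp format itself is 16 characters, which is why the buffer is sized 17 (for `strftime`'s trailing NUL) and then trimmed back. A quick check of the format, in Python for brevity (where `time.gmtime` is already safe):

```python
import time

# "%Y%m%dT%H%M%SZ" produces a 16-character timestamp; the C++ code sizes
# its buffer at 17 to leave room for strftime's trailing NUL and then
# resizes back to 16.
stamp = time.strftime('%Y%m%dT%H%M%SZ', time.gmtime(0))
assert stamp == '19700101T000000Z'
assert len(stamp) == 16
```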


@@ -29,6 +29,7 @@
#include "rjson.hh"
#include "serialization.hh"
#include "base64.hh"
#include <stdexcept>
namespace alternator {
@@ -47,7 +48,9 @@ comparison_operator_type get_comparison_operator(const rjson::value& comparison_
{"NOT_NULL", comparison_operator_type::NOT_NULL},
{"BETWEEN", comparison_operator_type::BETWEEN},
{"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},
-}; //TODO: CONTAINS
+{"CONTAINS", comparison_operator_type::CONTAINS},
+{"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
+};
if (!comparison_operator.IsString()) {
throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
@@ -143,9 +146,44 @@ static void verify_operand_count(const rjson::value* array, const size_check& ex
}
}
struct rjson_engaged_ptr_comp {
bool operator()(const rjson::value* p1, const rjson::value* p2) const {
return rjson::single_value_comp()(*p1, *p2);
}
};
// It's not enough to compare underlying JSON objects when comparing sets,
// as internally they're stored in an array, and the order of elements is
// not important in set equality. See issue #5021
static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {
if (set1.Size() != set2.Size()) {
return false;
}
std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;
for (auto it = set1.Begin(); it != set1.End(); ++it) {
set1_raw.insert(&*it);
}
for (const auto& a : set2.GetArray()) {
if (set1_raw.count(&a) == 0) {
return false;
}
}
return true;
}
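Why plain JSON comparison is not enough for sets (issue #5021) is easy to see: the serialized arrays can differ in order while the sets they encode are equal. A minimal Python sketch of the order-insensitive check:

```python
# DynamoDB string sets are serialized as JSON arrays, so a naive
# element-by-element comparison is order-sensitive:
ss1, ss2 = ['dog', 'cat'], ['cat', 'dog']
assert ss1 != ss2            # the arrays differ...
assert set(ss1) == set(ss2)  # ...but the sets they encode are equal

def sets_equal(a, b):
    # Mirrors check_EQ_for_sets above: equal sizes, and every element of
    # one side appears in the other (sets contain no duplicates).
    return len(a) == len(b) and set(a) == set(b)

assert sets_equal(ss1, ss2)
assert not sets_equal(['dog'], ['dog', 'cat'])
```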
// Check if two JSON-encoded values match with the EQ relation
static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {
-return v1 && *v1 == v2;
if (!v1) {
return false;
}
if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {
return check_EQ_for_sets(it1->value, it2->value);
}
}
return *v1 == v2;
}
// Check if two JSON-encoded values match with the NE relation
@@ -174,9 +212,70 @@ static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
if (it1->name != it2->name) {
return false;
}
-std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());
-std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());
-return val1.substr(0, val2.size()) == val2;
if (it2->name == "S") {
std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());
std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());
return val1.substr(0, val2.size()) == val2;
} else /* it2->name == "B" */ {
// TODO (optimization): Check the begins_with condition directly on
// the base64-encoded string, without making a decoded copy.
bytes val1 = base64_decode(it1->value);
bytes val2 = base64_decode(it2->value);
return val1.substr(0, val2.size()) == val2;
}
}
static std::string_view to_string_view(const rjson::value& v) {
return std::string_view(v.GetString(), v.GetStringLength());
}
static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {
return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");
}
// Check if two JSON-encoded values match with the CONTAINS relation
static bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", kv2.name));
}
if (kv1.name == "S" && kv2.name == "S") {
return to_string_view(kv1.value).find(to_string_view(kv2.value)) != std::string_view::npos;
} else if (kv1.name == "B" && kv2.name == "B") {
return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;
} else if (is_set_of(kv1.name, kv2.name)) {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (*i == kv2.value) {
return true;
}
}
} else if (kv1.name == "L") {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (!i->IsObject() || i->MemberCount() != 1) {
clogger.error("check_CONTAINS received a list whose element is malformed");
return false;
}
const auto& el = *i->MemberBegin();
if (el.name == kv2.name && el.value == kv2.value) {
return true;
}
}
}
return false;
}
// Check if two JSON-encoded values match with the NOT_CONTAINS relation
static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
return !check_CONTAINS(v1, v2);
}
// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
@@ -221,13 +320,13 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic()));
cmp.diagnostic));
}
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic()));
cmp.diagnostic));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
@@ -237,7 +336,7 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara
return false;
}
if (kv1.name == "N") {
-return cmp(unwrap_number(*v1, cmp.diagnostic()), unwrap_number(v2, cmp.diagnostic()));
+return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));
}
if (kv1.name == "S") {
return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),
@@ -252,15 +351,80 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara
struct cmp_lt {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
-const char* diagnostic() const { return "LT operator"; }
+static constexpr const char* diagnostic = "LT operator";
};
struct cmp_le {
// bytes only has <, so we cannot use <=.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs || lhs == rhs; }
static constexpr const char* diagnostic = "LE operator";
};
struct cmp_ge {
// bytes only has <, so we cannot use >=.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs || lhs == rhs; }
static constexpr const char* diagnostic = "GE operator";
};
struct cmp_gt {
-// bytes only has <
+// bytes only has <, so we cannot use >.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs; }
-const char* diagnostic() const { return "GT operator"; }
+static constexpr const char* diagnostic = "GT operator";
};
// True if v is between lb and ub, inclusive. Throws if lb > ub.
template <typename T>
bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
if (ub < lb) {
throw api_error("ValidationException",
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
}
return cmp_ge()(v, lb) && cmp_le()(v, ub);
}
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
if (!v) {
return false;
}
if (!v->IsObject() || v->MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
}
if (!lb.IsObject() || lb.MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
}
if (!ub.IsObject() || ub.MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
}
const auto& kv_v = *v->MemberBegin();
const auto& kv_lb = *lb.MemberBegin();
const auto& kv_ub = *ub.MemberBegin();
if (kv_lb.name != kv_ub.name) {
throw api_error(
"ValidationException",
format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
kv_lb.name, kv_ub.name));
}
if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
return false;
}
if (kv_v.name == "N") {
const char* diag = "BETWEEN operator";
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
}
if (kv_v.name == "S") {
return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
}
if (kv_v.name == "B") {
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
}
throw api_error("ValidationException",
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
kv_lb.name));
}
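The BETWEEN rules implemented above - inclusive bounds, an error when the lower bound exceeds the upper, and a plain False (not an error) when the operand's type differs from the bounds' type - can be modeled briefly. The `(type, value)` tuples here are a hypothetical stand-in for single-member AttributeValue objects like `{"N": "5"}`:

```python
# Toy model of the BETWEEN checks above; (type, value) tuples stand in
# for rjson AttributeValue objects.
def check_between(v, lb, ub):
    (tv, vv), (tl, lv), (tu, uv) = v, lb, ub
    if tl != tu:
        raise ValueError('BETWEEN bounds must have the same type')
    if tv != tl:
        return False  # different type: not between, but not an error
    if uv < lv:
        raise ValueError('BETWEEN requires lower_bound <= upper_bound')
    return lv <= vv <= uv  # inclusive on both ends

assert check_between(('N', 5), ('N', 1), ('N', 9))
assert check_between(('N', 1), ('N', 1), ('N', 1))        # inclusive
assert not check_between(('S', 'x'), ('N', 1), ('N', 9))  # type mismatch
```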
// Verify one Expect condition on one attribute (whose content is "got")
// for the verify_expected() below.
// This function returns true or false depending on whether the condition
@@ -306,9 +470,15 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
case comparison_operator_type::LT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
case comparison_operator_type::LE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_le{});
case comparison_operator_type::GT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
case comparison_operator_type::GE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
case comparison_operator_type::BEGINS_WITH:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
@@ -321,10 +491,17 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
case comparison_operator_type::NOT_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NOT_NULL(got);
-default:
-// FIXME: implement all the missing types, so there will be no default here.
-throw api_error("ValidationException", format("ComparisonOperator {} is not yet supported", *comparison_operator));
case comparison_operator_type::BETWEEN:
verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
case comparison_operator_type::CONTAINS:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_CONTAINS(got, (*attribute_value_list)[0]);
case comparison_operator_type::NOT_CONTAINS:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NOT_CONTAINS(got, (*attribute_value_list)[0]);
}
throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
}
}


@@ -37,7 +37,7 @@
namespace alternator {
enum class comparison_operator_type {
-EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
+EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
};
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);


@@ -35,6 +35,7 @@
#include "query-result-reader.hh"
#include "cql3/selection/selection.hh"
#include "cql3/result_set.hh"
#include "cql3/type_json.hh"
#include "bytes.hh"
#include "cql3/update_parameters.hh"
#include "server.hh"
@@ -237,17 +238,75 @@ static std::string get_string_attribute(const rjson::value& value, rjson::string
attribute_name, value));
}
return attribute_value->GetString();
}
// Convenience function for getting the value of a boolean attribute, or a
// default value if it is missing. If the attribute exists, but is not a
// bool, a descriptive api_error is thrown.
static bool get_bool_attribute(const rjson::value& value, rjson::string_ref_type attribute_name, bool default_return) {
const rjson::value* attribute_value = rjson::find(value, attribute_name);
if (!attribute_value) {
return default_return;
}
if (!attribute_value->IsBool()) {
throw api_error("ValidationException", format("Expected boolean value for attribute {}, got: {}",
attribute_name, value));
}
return attribute_value->GetBool();
}
// Convenience function for getting the value of an integer attribute, or
// an empty optional if it is missing. If the attribute exists, but is not
// an integer, a descriptive api_error is thrown.
static std::optional<int> get_int_attribute(const rjson::value& value, rjson::string_ref_type attribute_name) {
const rjson::value* attribute_value = rjson::find(value, attribute_name);
if (!attribute_value)
return {};
if (!attribute_value->IsInt()) {
throw api_error("ValidationException", format("Expected integer value for attribute {}, got: {}",
attribute_name, value));
}
return attribute_value->GetInt();
}
// Sets a KeySchema object inside the given JSON parent describing the key
// attributes of the given schema as being either HASH or RANGE keys.
// Additionally, adds to a given map mappings between the key attribute
// names and their type (as a DynamoDB type string).
static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>& attribute_types) {
rjson::value key_schema = rjson::empty_array();
for (const column_definition& cdef : schema.partition_key_columns()) {
rjson::value key = rjson::empty_object();
rjson::set(key, "AttributeName", rjson::from_string(cdef.name_as_text()));
rjson::set(key, "KeyType", "HASH");
rjson::push_back(key_schema, std::move(key));
attribute_types[cdef.name_as_text()] = type_to_string(cdef.type);
}
for (const column_definition& cdef : schema.clustering_key_columns()) {
rjson::value key = rjson::empty_object();
rjson::set(key, "AttributeName", rjson::from_string(cdef.name_as_text()));
rjson::set(key, "KeyType", "RANGE");
rjson::push_back(key_schema, std::move(key));
attribute_types[cdef.name_as_text()] = type_to_string(cdef.type);
// FIXME: this "break" can avoid listing some clustering key columns
// we added for GSIs just because they existed in the base table -
// but not in all cases. We still have issue #5320. See also
// reproducer in test_gsi_2_describe_table_schema.
break;
}
rjson::set(parent, "KeySchema", std::move(key_schema));
}
-future<json::json_return_type> executor::describe_table(client_state& client_state, std::string content) {
+future<json::json_return_type> executor::describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.describe_table++;
rjson::value request = rjson::parse(content);
elogger.trace("Describing table {}", request);
schema_ptr schema = get_table(_proxy, request);
-tracing::add_table_name(client_state.get_trace_state(), schema->ks_name(), schema->cf_name());
+tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
rjson::value table_description = rjson::empty_object();
rjson::set(table_description, "TableName", rjson::from_string(schema->cf_name()));
@@ -268,6 +327,11 @@ future<json::json_return_type> executor::describe_table(client_state& client_sta
rjson::set(table_description, "BillingModeSummary", rjson::empty_object());
rjson::set(table_description["BillingModeSummary"], "BillingMode", "PAY_PER_REQUEST");
rjson::set(table_description["BillingModeSummary"], "LastUpdateToPayPerRequestDateTime", rjson::value(creation_date_seconds));
std::unordered_map<std::string,std::string> key_attribute_types;
// Add base table's KeySchema and collect types for AttributeDefinitions:
describe_key_schema(table_description, *schema, key_attribute_types);
table& t = _proxy.get_db().local().find_column_family(schema);
if (!t.views().empty()) {
rjson::value gsi_array = rjson::empty_array();
@@ -282,6 +346,8 @@ future<json::json_return_type> executor::describe_table(client_state& client_sta
}
sstring index_name = cf_name.substr(delim_it + 1);
rjson::set(view_entry, "IndexName", rjson::from_string(index_name));
// Add indexes's KeySchema and collect types for AttributeDefinitions:
describe_key_schema(view_entry, *vptr, key_attribute_types);
// Local secondary indexes are marked by an extra '!' sign occurring before the ':' delimiter
rjson::value& index_array = (delim_it > 1 && cf_name[delim_it-1] == '!') ? lsi_array : gsi_array;
rjson::push_back(index_array, std::move(view_entry));
@@ -293,23 +359,32 @@ future<json::json_return_type> executor::describe_table(client_state& client_sta
rjson::set(table_description, "GlobalSecondaryIndexes", std::move(gsi_array));
}
}
// Use map built by describe_key_schema() for base and indexes to produce
// AttributeDefinitions for all key columns:
rjson::value attribute_definitions = rjson::empty_array();
for (auto& type : key_attribute_types) {
rjson::value key = rjson::empty_object();
rjson::set(key, "AttributeName", rjson::from_string(type.first));
rjson::set(key, "AttributeType", rjson::from_string(type.second));
rjson::push_back(attribute_definitions, std::move(key));
}
rjson::set(table_description, "AttributeDefinitions", std::move(attribute_definitions));
// FIXME: still missing some response fields (issue #5026)
// FIXME: more attributes! Check https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TableDescription.html#DDB-Type-TableDescription-TableStatus but also run a test to see what DynamoDB really fills
// maybe for TableId or TableArn use schema.id().to_sstring().c_str();
// Of course, the whole schema is missing!
rjson::value response = rjson::empty_object();
rjson::set(response, "Table", std::move(table_description));
elogger.trace("returning {}", response);
return make_ready_future<json::json_return_type>(make_jsonable(std::move(response)));
}
-future<json::json_return_type> executor::delete_table(client_state& client_state, std::string content) {
+future<json::json_return_type> executor::delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.delete_table++;
rjson::value request = rjson::parse(content);
elogger.trace("Deleting table {}", request);
std::string table_name = get_table_name(request);
-tracing::add_table_name(client_state.get_trace_state(), KEYSPACE_NAME, table_name);
+tracing::add_table_name(trace_state, KEYSPACE_NAME, table_name);
if (!_proxy.get_db().local().has_schema(KEYSPACE_NAME, table_name)) {
throw api_error("ResourceNotFoundException",
@@ -406,14 +481,14 @@ static std::pair<std::string, std::string> parse_key_schema(const rjson::value&
}
-future<json::json_return_type> executor::create_table(client_state& client_state, std::string content) {
+future<json::json_return_type> executor::create_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.create_table++;
rjson::value table_info = rjson::parse(content);
elogger.trace("Creating table {}", table_info);
std::string table_name = get_table_name(table_info);
const rjson::value& attribute_definitions = table_info["AttributeDefinitions"];
-tracing::add_table_name(client_state.get_trace_state(), KEYSPACE_NAME, table_name);
+tracing::add_table_name(trace_state, KEYSPACE_NAME, table_name);
schema_builder builder(KEYSPACE_NAME, table_name);
auto [hash_key, range_key] = parse_key_schema(table_info);
@@ -656,7 +731,12 @@ static mutation make_item_mutation(const rjson::value& item, schema_ptr schema)
// Scylla proper, to implement the operation to replace an entire
// collection ("UPDATE .. SET x = ..") - see
// cql3::update_parameters::make_tombstone_just_before().
-row.apply(tombstone(ts-1, gc_clock::now()));
const bool use_partition_tombstone = schema->clustering_key_size() == 0;
if (use_partition_tombstone) {
m.partition().apply(tombstone(ts-1, gc_clock::now()));
} else {
row.apply(tombstone(ts-1, gc_clock::now()));
}
return m;
}
@@ -674,18 +754,24 @@ static future<std::unique_ptr<rjson::value>> maybe_get_previous_item(
bool need_read_before_write,
alternator::stats& stats);
-future<json::json_return_type> executor::put_item(client_state& client_state, std::string content) {
+future<json::json_return_type> executor::put_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.put_item++;
auto start_time = std::chrono::steady_clock::now();
rjson::value update_info = rjson::parse(content);
elogger.trace("Updating value {}", update_info);
schema_ptr schema = get_table(_proxy, update_info);
-tracing::add_table_name(client_state.get_trace_state(), schema->ks_name(), schema->cf_name());
+tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
if (rjson::find(update_info, "ConditionExpression")) {
throw api_error("ValidationException", "ConditionExpression is not yet implemented in alternator");
}
auto return_values = get_string_attribute(update_info, "ReturnValues", "NONE");
if (return_values != "NONE") {
// FIXME: Need to support also the ALL_OLD option. See issue #5053.
throw api_error("ValidationException", format("Unsupported ReturnValues={} for PutItem operation", return_values));
}
const bool has_expected = update_info.HasMember("Expected");
const rjson::value& item = update_info["Item"];
@@ -694,11 +780,11 @@ future<json::json_return_type> executor::put_item(client_state& client_state, st
return maybe_get_previous_item(_proxy, client_state, schema, item, has_expected, _stats).then(
[this, schema, has_expected, update_info = rjson::copy(update_info), m = std::move(m),
-&client_state, start_time] (std::unique_ptr<rjson::value> previous_item) mutable {
+&client_state, start_time, trace_state] (std::unique_ptr<rjson::value> previous_item) mutable {
if (has_expected) {
verify_expected(update_info, previous_item);
}
-return _proxy.mutate(std::vector<mutation>{std::move(m)}, db::consistency_level::LOCAL_QUORUM, default_timeout(), client_state.get_trace_state(), empty_service_permit()).then([this, start_time] () {
+return _proxy.mutate(std::vector<mutation>{std::move(m)}, db::consistency_level::LOCAL_QUORUM, default_timeout(), trace_state, empty_service_permit()).then([this, start_time] () {
_stats.api_operations.put_item_latency.add(std::chrono::steady_clock::now() - start_time, _stats.api_operations.put_item_latency._count + 1);
// Without special options on what to return, PutItem returns nothing.
return make_ready_future<json::json_return_type>(json_string(""));
@@ -721,22 +807,32 @@ static mutation make_delete_item_mutation(const rjson::value& key, schema_ptr sc
clustering_key ck = ck_from_json(key, schema);
check_key(key, schema);
mutation m(schema, pk);
-auto& row = m.partition().clustered_row(*schema, ck);
-row.apply(tombstone(api::new_timestamp(), gc_clock::now()));
const bool use_partition_tombstone = schema->clustering_key_size() == 0;
if (use_partition_tombstone) {
m.partition().apply(tombstone(api::new_timestamp(), gc_clock::now()));
} else {
auto& row = m.partition().clustered_row(*schema, ck);
row.apply(tombstone(api::new_timestamp(), gc_clock::now()));
}
return m;
}
-future<json::json_return_type> executor::delete_item(client_state& client_state, std::string content) {
+future<json::json_return_type> executor::delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.delete_item++;
auto start_time = std::chrono::steady_clock::now();
rjson::value update_info = rjson::parse(content);
schema_ptr schema = get_table(_proxy, update_info);
-tracing::add_table_name(client_state.get_trace_state(), schema->ks_name(), schema->cf_name());
+tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
if (rjson::find(update_info, "ConditionExpression")) {
throw api_error("ValidationException", "ConditionExpression is not yet implemented in alternator");
}
auto return_values = get_string_attribute(update_info, "ReturnValues", "NONE");
if (return_values != "NONE") {
// FIXME: Need to support also the ALL_OLD option. See issue #5053.
throw api_error("ValidationException", format("Unsupported ReturnValues={} for DeleteItem operation", return_values));
}
const bool has_expected = update_info.HasMember("Expected");
const rjson::value& key = update_info["Key"];
@@ -746,11 +842,11 @@ future<json::json_return_type> executor::delete_item(client_state& client_state,
return maybe_get_previous_item(_proxy, client_state, schema, key, has_expected, _stats).then(
[this, schema, has_expected, update_info = rjson::copy(update_info), m = std::move(m),
-&client_state, start_time] (std::unique_ptr<rjson::value> previous_item) mutable {
+&client_state, start_time, trace_state] (std::unique_ptr<rjson::value> previous_item) mutable {
if (has_expected) {
verify_expected(update_info, previous_item);
}
-return _proxy.mutate(std::vector<mutation>{std::move(m)}, db::consistency_level::LOCAL_QUORUM, default_timeout(), client_state.get_trace_state(), empty_service_permit()).then([this, start_time] () {
+return _proxy.mutate(std::vector<mutation>{std::move(m)}, db::consistency_level::LOCAL_QUORUM, default_timeout(), trace_state, empty_service_permit()).then([this, start_time] () {
_stats.api_operations.delete_item_latency.add(std::chrono::steady_clock::now() - start_time, _stats.api_operations.delete_item_latency._count + 1);
// Without special options on what to return, DeleteItem returns nothing.
return make_ready_future<json::json_return_type>(json_string(""));
@@ -783,7 +879,7 @@ struct primary_key_equal {
}
};
future<json::json_return_type> executor::batch_write_item(client_state& client_state, std::string content) {
future<json::json_return_type> executor::batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.batch_write_item++;
rjson::value batch_info = rjson::parse(content);
rjson::value& request_items = batch_info["RequestItems"];
@@ -793,7 +889,7 @@ future<json::json_return_type> executor::batch_write_item(client_state& client_s
for (auto it = request_items.MemberBegin(); it != request_items.MemberEnd(); ++it) {
schema_ptr schema = get_table_from_batch_request(_proxy, it);
tracing::add_table_name(client_state.get_trace_state(), schema->ks_name(), schema->cf_name());
tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
std::unordered_set<primary_key, primary_key_hash, primary_key_equal> used_keys(1, primary_key_hash{schema}, primary_key_equal{schema});
for (auto& request : it->value.GetArray()) {
if (!request.IsObject() || request.MemberCount() != 1) {
@@ -826,7 +922,7 @@ future<json::json_return_type> executor::batch_write_item(client_state& client_s
}
}
return _proxy.mutate(std::move(mutations), db::consistency_level::LOCAL_QUORUM, default_timeout(), client_state.get_trace_state(), empty_service_permit()).then([] () {
return _proxy.mutate(std::move(mutations), db::consistency_level::LOCAL_QUORUM, default_timeout(), trace_state, empty_service_permit()).then([] () {
// Without special options on what to return, BatchWriteItem returns nothing,
// unless there are UnprocessedItems - it's possible to just stop processing a batch
// due to throttling. TODO(sarna): Consider UnprocessedItems when returning.
@@ -911,21 +1007,6 @@ static std::string get_item_type_string(const rjson::value& v) {
return it->name.GetString();
}
// Check if a given JSON object encodes a set (i.e., it is a {"SS": [...]},
// or similarly "NS" or "BS") and if so return the set's type and a pointer
// to that set. If the object does not encode a set, the returned value is {"", nullptr}.
static const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return {"", nullptr};
}
auto it = v.MemberBegin();
const std::string it_key = it->name.GetString();
if (it_key != "SS" && it_key != "BS" && it_key != "NS") {
return {"", nullptr};
}
return std::make_pair(it_key, &(it->value));
}
// Take two JSON-encoded list values (remember that a list value is
// {"L": [...the actual list]}) and return the concatenation, again as
// a list value.
@@ -944,50 +1025,6 @@ static rjson::value list_concatenate(const rjson::value& v1, const rjson::value&
return ret;
}
struct single_value_rjson_comp {
bool operator()(const rjson::value& r1, const rjson::value& r2) const {
auto r1_type = r1.GetType();
auto r2_type = r2.GetType();
switch (r1_type) {
case rjson::type::kNullType:
return r1_type < r2_type;
case rjson::type::kFalseType:
return r1_type < r2_type;
case rjson::type::kTrueType:
return r1_type < r2_type;
case rjson::type::kObjectType:
throw rjson::error("Object type comparison is not supported");
case rjson::type::kArrayType:
throw rjson::error("Array type comparison is not supported");
case rjson::type::kStringType: {
const size_t r1_len = r1.GetStringLength();
const size_t r2_len = r2.GetStringLength();
size_t len = std::min(r1_len, r2_len);
int result = std::strncmp(r1.GetString(), r2.GetString(), len);
return result < 0 || (result == 0 && r1_len < r2_len);
}
case rjson::type::kNumberType: {
if (r1_type != r2_type) {
throw rjson::error("All numbers in a set should have the same type");
}
if (r1.IsDouble()) {
return r1.GetDouble() < r2.GetDouble();
} else if (r1.IsInt()) {
return r1.GetInt() < r2.GetInt();
} else if (r1.IsUint()) {
return r1.GetUint() < r2.GetUint();
} else if (r1.IsInt64()) {
return r1.GetInt64() < r2.GetInt64();
} else {
return r1.GetUint64() < r2.GetUint64();
}
}
default:
return false;
}
}
};
// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and return the sum of both sets,
// again as a set value.
static rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {
@@ -1000,7 +1037,7 @@ static rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {
throw api_error("ValidationException", "UpdateExpression: ADD operation for sets must be given sets as arguments");
}
rjson::value sum = rjson::copy(*set1);
std::set<rjson::value, single_value_rjson_comp> set1_raw;
std::set<rjson::value, rjson::single_value_comp> set1_raw;
for (auto it = sum.Begin(); it != sum.End(); ++it) {
set1_raw.insert(rjson::copy(*it));
}
@@ -1025,7 +1062,7 @@ static rjson::value set_diff(const rjson::value& v1, const rjson::value& v2) {
if (!set1 || !set2) {
throw api_error("ValidationException", "UpdateExpression: DELETE operation can only be performed on a set");
}
std::set<rjson::value, single_value_rjson_comp> set1_raw;
std::set<rjson::value, rjson::single_value_comp> set1_raw;
for (auto it = set1->Begin(); it != set1->End(); ++it) {
set1_raw.insert(rjson::copy(*it));
}
@@ -1384,17 +1421,22 @@ static future<std::unique_ptr<rjson::value>> maybe_get_previous_item(
}
future<json::json_return_type> executor::update_item(client_state& client_state, std::string content) {
future<json::json_return_type> executor::update_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.update_item++;
auto start_time = std::chrono::steady_clock::now();
rjson::value update_info = rjson::parse(content);
elogger.trace("update_item {}", update_info);
schema_ptr schema = get_table(_proxy, update_info);
tracing::add_table_name(client_state.get_trace_state(), schema->ks_name(), schema->cf_name());
tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
if (rjson::find(update_info, "ConditionExpression")) {
throw api_error("ValidationException", "ConditionExpression is not yet implemented in alternator");
}
auto return_values = get_string_attribute(update_info, "ReturnValues", "NONE");
if (return_values != "NONE") {
// FIXME: Need to also support the ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW options. See issue #5053.
throw api_error("ValidationException", format("Unsupported ReturnValues={} for UpdateItem operation", return_values));
}
if (!update_info.HasMember("Key")) {
throw api_error("ValidationException", "UpdateItem requires a Key parameter");
@@ -1441,7 +1483,7 @@ future<json::json_return_type> executor::update_item(client_state& client_state,
return maybe_get_previous_item(_proxy, client_state, schema, pk, ck, has_update_expression, expression, has_expected, _stats).then(
[this, schema, expression = std::move(expression), has_update_expression, ck = std::move(ck), has_expected,
update_info = rjson::copy(update_info), m = std::move(m), attrs_collector = std::move(attrs_collector),
attribute_updates = rjson::copy(attribute_updates), ts, &client_state, start_time] (std::unique_ptr<rjson::value> previous_item) mutable {
attribute_updates = rjson::copy(attribute_updates), ts, &client_state, start_time, trace_state] (std::unique_ptr<rjson::value> previous_item) mutable {
if (has_expected) {
verify_expected(update_info, previous_item);
}
@@ -1572,7 +1614,7 @@ future<json::json_return_type> executor::update_item(client_state& client_state,
row.apply(row_marker(ts));
elogger.trace("Applying mutation {}", m);
return _proxy.mutate(std::vector<mutation>{std::move(m)}, db::consistency_level::LOCAL_QUORUM, default_timeout(), client_state.get_trace_state(), empty_service_permit()).then([this, start_time] () {
return _proxy.mutate(std::vector<mutation>{std::move(m)}, db::consistency_level::LOCAL_QUORUM, default_timeout(), trace_state, empty_service_permit()).then([this, start_time] () {
// Without special options on what to return, UpdateItem returns nothing.
_stats.api_operations.update_item_latency.add(std::chrono::steady_clock::now() - start_time, _stats.api_operations.update_item_latency._count + 1);
return make_ready_future<json::json_return_type>(json_string(""));
@@ -1599,7 +1641,7 @@ static db::consistency_level get_read_consistency(const rjson::value& request) {
return consistent_read ? db::consistency_level::LOCAL_QUORUM : db::consistency_level::LOCAL_ONE;
}
future<json::json_return_type> executor::get_item(client_state& client_state, std::string content) {
future<json::json_return_type> executor::get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.get_item++;
auto start_time = std::chrono::steady_clock::now();
rjson::value table_info = rjson::parse(content);
@@ -1607,7 +1649,7 @@ future<json::json_return_type> executor::get_item(client_state& client_state, st
schema_ptr schema = get_table(_proxy, table_info);
tracing::add_table_name(client_state.get_trace_state(), schema->ks_name(), schema->cf_name());
tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
rjson::value& query_key = table_info["Key"];
db::consistency_level cl = get_read_consistency(table_info);
@@ -1642,7 +1684,7 @@ future<json::json_return_type> executor::get_item(client_state& client_state, st
});
}
future<json::json_return_type> executor::batch_get_item(client_state& client_state, std::string content) {
future<json::json_return_type> executor::batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
// FIXME: In this implementation, an unbounded batch size can cause an
// unbounded response JSON object to be buffered in memory, unbounded
// parallelism of the requests, and an unbounded amount of non-preemptable
@@ -1670,7 +1712,7 @@ future<json::json_return_type> executor::batch_get_item(client_state& client_sta
for (auto it = request_items.MemberBegin(); it != request_items.MemberEnd(); ++it) {
table_requests rs;
rs.schema = get_table_from_batch_request(_proxy, it);
tracing::add_table_name(client_state.get_trace_state(), KEYSPACE_NAME, rs.schema->cf_name());
tracing::add_table_name(trace_state, KEYSPACE_NAME, rs.schema->cf_name());
rs.cl = get_read_consistency(it->value);
rs.attrs_to_get = calculate_attrs_to_get(it->value);
auto& keys = (it->value)["Keys"];
@@ -1810,7 +1852,7 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag
for (const column_definition& cdef : schema.partition_key_columns()) {
rjson::set_with_string_name(last_evaluated_key, cdef.name_as_text(), rjson::empty_object());
rjson::value& key_entry = last_evaluated_key[cdef.name_as_text()];
rjson::set_with_string_name(key_entry, type_to_string(cdef.type), rjson::parse(cdef.type->to_json_string(*exploded_pk_it)));
rjson::set_with_string_name(key_entry, type_to_string(cdef.type), rjson::parse(to_json_string(*cdef.type, *exploded_pk_it)));
++exploded_pk_it;
}
auto ck = paging_state.get_clustering_key();
@@ -1820,7 +1862,7 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag
for (const column_definition& cdef : schema.clustering_key_columns()) {
rjson::set_with_string_name(last_evaluated_key, cdef.name_as_text(), rjson::empty_object());
rjson::value& key_entry = last_evaluated_key[cdef.name_as_text()];
rjson::set_with_string_name(key_entry, type_to_string(cdef.type), rjson::parse(cdef.type->to_json_string(*exploded_ck_it)));
rjson::set_with_string_name(key_entry, type_to_string(cdef.type), rjson::parse(to_json_string(*cdef.type, *exploded_ck_it)));
++exploded_ck_it;
}
}
@@ -1836,10 +1878,11 @@ static future<json::json_return_type> do_query(schema_ptr schema,
db::consistency_level cl,
::shared_ptr<cql3::restrictions::statement_restrictions> filtering_restrictions,
service::client_state& client_state,
cql3::cql_stats& cql_stats) {
cql3::cql_stats& cql_stats,
tracing::trace_state_ptr trace_state) {
::shared_ptr<service::pager::paging_state> paging_state = nullptr;
tracing::trace(client_state.get_trace_state(), "Performing a database query");
tracing::trace(trace_state, "Performing a database query");
if (exclusive_start_key) {
partition_key pk = pk_from_json(*exclusive_start_key, schema);
@@ -1856,7 +1899,7 @@ static future<json::json_return_type> do_query(schema_ptr schema,
auto partition_slice = query::partition_slice(std::move(ck_bounds), {}, std::move(regular_columns), selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, query::max_partitions);
auto query_state_ptr = std::make_unique<service::query_state>(client_state, empty_service_permit());
auto query_state_ptr = std::make_unique<service::query_state>(client_state, trace_state, empty_service_permit());
command->slice.options.set<query::partition_slice::option::allow_short_read>();
auto query_options = std::make_unique<cql3::query_options>(cl, infinite_timeout_config, std::vector<cql3::raw_value>{});
@@ -1888,7 +1931,7 @@ static future<json::json_return_type> do_query(schema_ptr schema,
// 2. Filtering - by passing appropriately created restrictions to pager as a last parameter
// 3. Proper timeouts instead of gc_clock::now() and db::no_timeout
// 4. Implement parallel scanning via Segments
future<json::json_return_type> executor::scan(client_state& client_state, std::string content) {
future<json::json_return_type> executor::scan(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.scan++;
rjson::value request_info = rjson::parse(content);
elogger.trace("Scanning {}", request_info);
@@ -1898,6 +1941,10 @@ future<json::json_return_type> executor::scan(client_state& client_state, std::s
if (rjson::find(request_info, "FilterExpression")) {
throw api_error("ValidationException", "FilterExpression is not yet implemented in alternator");
}
if (get_int_attribute(request_info, "Segment") || get_int_attribute(request_info, "TotalSegments")) {
// FIXME: need to support parallel scan. See issue #5059.
throw api_error("ValidationException", "Scan Segment/TotalSegments is not yet implemented in alternator");
}
rjson::value* exclusive_start_key = rjson::find(request_info, "ExclusiveStartKey");
//FIXME(sarna): ScanFilter is deprecated in favor of FilterExpression
@@ -1921,7 +1968,7 @@ future<json::json_return_type> executor::scan(client_state& client_state, std::s
partition_ranges = filtering_restrictions->get_partition_key_ranges(query_options);
ck_bounds = filtering_restrictions->get_clustering_bounds(query_options);
}
return do_query(schema, exclusive_start_key, std::move(partition_ranges), std::move(ck_bounds), std::move(attrs_to_get), limit, cl, std::move(filtering_restrictions), client_state, _stats.cql_stats);
return do_query(schema, exclusive_start_key, std::move(partition_ranges), std::move(ck_bounds), std::move(attrs_to_get), limit, cl, std::move(filtering_restrictions), client_state, _stats.cql_stats, trace_state);
}
static dht::partition_range calculate_pk_bound(schema_ptr schema, const column_definition& pk_cdef, comparison_operator_type op, const rjson::value& attrs) {
@@ -2044,14 +2091,14 @@ calculate_bounds(schema_ptr schema, const rjson::value& conditions) {
return {std::move(partition_ranges), std::move(ck_bounds)};
}
future<json::json_return_type> executor::query(client_state& client_state, std::string content) {
future<json::json_return_type> executor::query(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content) {
_stats.api_operations.query++;
rjson::value request_info = rjson::parse(content);
elogger.trace("Querying {}", request_info);
schema_ptr schema = get_table_or_view(_proxy, request_info);
tracing::add_table_name(client_state.get_trace_state(), schema->ks_name(), schema->cf_name());
tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
rjson::value* exclusive_start_key = rjson::find(request_info, "ExclusiveStartKey");
db::consistency_level cl = get_read_consistency(request_info);
@@ -2067,6 +2114,11 @@ future<json::json_return_type> executor::query(client_state& client_state, std::
if (rjson::find(request_info, "FilterExpression")) {
throw api_error("ValidationException", "FilterExpression is not yet implemented in alternator");
}
bool forward = get_bool_attribute(request_info, "ScanIndexForward", true);
if (!forward) {
// FIXME: need to support the !forward (i.e., reverse sort order) case. See issue #5153.
throw api_error("ValidationException", "ScanIndexForward=false is not yet implemented in alternator");
}
//FIXME(sarna): KeyConditions are deprecated in favor of KeyConditionExpression
rjson::value& conditions = rjson::get(request_info, "KeyConditions");
@@ -2089,7 +2141,7 @@ future<json::json_return_type> executor::query(client_state& client_state, std::
throw api_error("ValidationException", format("QueryFilter can only contain non-primary key attributes: Primary key attribute: {}", ck_defs.front()->name_as_text()));
}
}
return do_query(schema, exclusive_start_key, std::move(partition_ranges), std::move(ck_bounds), std::move(attrs_to_get), limit, cl, std::move(filtering_restrictions), client_state, _stats.cql_stats);
return do_query(schema, exclusive_start_key, std::move(partition_ranges), std::move(ck_bounds), std::move(attrs_to_get), limit, cl, std::move(filtering_restrictions), client_state, _stats.cql_stats, std::move(trace_state));
}
static void validate_limit(int limit) {
@@ -2198,18 +2250,20 @@ future<> executor::maybe_create_keyspace() {
});
}
static void create_tracing_session(executor::client_state& client_state) {
static tracing::trace_state_ptr create_tracing_session() {
tracing::trace_state_props_set props;
props.set<tracing::trace_state_props::full_tracing>();
client_state.create_tracing_session(tracing::trace_type::QUERY, props);
return tracing::tracing::get_local_tracing_instance().create_session(tracing::trace_type::QUERY, props);
}
void executor::maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query) {
tracing::trace_state_ptr executor::maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query) {
tracing::trace_state_ptr trace_state;
if (tracing::tracing::get_local_tracing_instance().trace_next_query()) {
create_tracing_session(client_state);
tracing::add_query(client_state.get_trace_state(), query);
tracing::begin(client_state.get_trace_state(), format("Alternator {}", op), client_state.get_client_address());
trace_state = create_tracing_session();
tracing::add_query(trace_state, query);
tracing::begin(trace_state, format("Alternator {}", op), client_state.get_client_address());
}
return trace_state;
}
future<> executor::start() {

@@ -46,26 +46,26 @@ public:
executor(service::storage_proxy& proxy, service::migration_manager& mm) : _proxy(proxy), _mm(mm) {}
future<json::json_return_type> create_table(client_state& client_state, std::string content);
future<json::json_return_type> describe_table(client_state& client_state, std::string content);
future<json::json_return_type> delete_table(client_state& client_state, std::string content);
future<json::json_return_type> put_item(client_state& client_state, std::string content);
future<json::json_return_type> get_item(client_state& client_state, std::string content);
future<json::json_return_type> delete_item(client_state& client_state, std::string content);
future<json::json_return_type> update_item(client_state& client_state, std::string content);
future<json::json_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> list_tables(client_state& client_state, std::string content);
future<json::json_return_type> scan(client_state& client_state, std::string content);
future<json::json_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> describe_endpoints(client_state& client_state, std::string content, std::string host_header);
future<json::json_return_type> batch_write_item(client_state& client_state, std::string content);
future<json::json_return_type> batch_get_item(client_state& client_state, std::string content);
future<json::json_return_type> query(client_state& client_state, std::string content);
future<json::json_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<> start();
future<> stop() { return make_ready_future<>(); }
future<> maybe_create_keyspace();
static void maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
};
}

@@ -113,6 +113,58 @@ void push_back(rjson::value& base_array, rjson::value&& item) {
}
bool single_value_comp::operator()(const rjson::value& r1, const rjson::value& r2) const {
auto r1_type = r1.GetType();
auto r2_type = r2.GetType();
// null is the smallest type and is comparable with every other type; nothing is less than null
if (r1_type == rjson::type::kNullType || r2_type == rjson::type::kNullType) {
return r1_type < r2_type;
}
// only null, true and false are comparable with each other; other types are not mutually comparable
if (r1_type != r2_type) {
if (r1_type > rjson::type::kTrueType || r2_type > rjson::type::kTrueType) {
throw rjson::error(format("Types are not comparable: {} {}", r1, r2));
}
}
switch (r1_type) {
case rjson::type::kNullType:
// fall-through
case rjson::type::kFalseType:
// fall-through
case rjson::type::kTrueType:
return r1_type < r2_type;
case rjson::type::kObjectType:
throw rjson::error("Object type comparison is not supported");
case rjson::type::kArrayType:
throw rjson::error("Array type comparison is not supported");
case rjson::type::kStringType: {
const size_t r1_len = r1.GetStringLength();
const size_t r2_len = r2.GetStringLength();
size_t len = std::min(r1_len, r2_len);
int result = std::strncmp(r1.GetString(), r2.GetString(), len);
return result < 0 || (result == 0 && r1_len < r2_len);
}
case rjson::type::kNumberType: {
if (r1.IsInt() && r2.IsInt()) {
return r1.GetInt() < r2.GetInt();
} else if (r1.IsUint() && r2.IsUint()) {
return r1.GetUint() < r2.GetUint();
} else if (r1.IsInt64() && r2.IsInt64()) {
return r1.GetInt64() < r2.GetInt64();
} else if (r1.IsUint64() && r2.IsUint64()) {
return r1.GetUint64() < r2.GetUint64();
} else {
// it's safe to call GetDouble() on any number type
return r1.GetDouble() < r2.GetDouble();
}
}
default:
return false;
}
}
} // end namespace rjson
std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {

@@ -152,6 +152,10 @@ void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type
// Throws if base_array is not a JSON array.
void push_back(rjson::value& base_array, rjson::value&& item);
struct single_value_comp {
bool operator()(const rjson::value& r1, const rjson::value& r2) const;
};
} // end namespace rjson
namespace std {

@@ -25,6 +25,7 @@
#include "error.hh"
#include "rapidjson/writer.h"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
static logging::logger slogger("alternator-serialization");
@@ -77,7 +78,7 @@ struct from_json_visitor {
}
// default
void operator()(const abstract_type& t) const {
bo.write(t.from_json_object(Json::Value(rjson::print(v)), cql_serialization_format::internal()));
bo.write(from_json_object(t, Json::Value(rjson::print(v)), cql_serialization_format::internal()));
}
};
@@ -107,7 +108,7 @@ struct to_json_visitor {
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };
void operator()(const decimal_type_impl& t) const {
auto s = decimal_type->to_json_string(bytes(bv));
auto s = to_json_string(*decimal_type, bytes(bv));
//FIXME(sarna): unnecessary copy
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));
}
@@ -194,7 +195,7 @@ rjson::value json_key_column_value(bytes_view cell, const column_definition& col
// FIXME: use specialized Alternator number type, not the more
// general "decimal_type". A dedicated type can be more efficient
// in storage space and in parsing speed.
auto s = decimal_type->to_json_string(bytes(cell));
auto s = to_json_string(*decimal_type, bytes(cell));
return rjson::from_string(s);
} else {
// We shouldn't get here, we shouldn't see such key columns.
@@ -245,4 +246,16 @@ big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
return big_decimal(it->value.GetString());
}
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return {"", nullptr};
}
auto it = v.MemberBegin();
const std::string it_key = it->name.GetString();
if (it_key != "SS" && it_key != "BS" && it_key != "NS") {
return {"", nullptr};
}
return std::make_pair(it_key, &(it->value));
}
}

@@ -63,4 +63,10 @@ clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);
// If v encodes a number (i.e., it is a {"N": [...]}), returns an object representing it. Otherwise,
// raises ValidationException with diagnostic.
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
// Check if a given JSON object encodes a set (i.e., it is a {"SS": [...]},
// or similarly "NS" or "BS") and if so return the set's type and a pointer
// to that set. If the object does not encode a set, the returned value is {"", nullptr}.
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);
}

@@ -215,6 +215,7 @@ future<> server::verify_signature(const request& req) {
}
future<json::json_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {
_executor.local()._stats.total_operations++;
sstring target = req->get_header(TARGET);
std::vector<std::string_view> split_target = split(target, '.');
//NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
@@ -231,9 +232,9 @@ future<json::json_return_type> server::handle_api_request(std::unique_ptr<reques
// We use unique_ptr because client_state cannot be moved or copied
return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()), [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {
client_state->set_raw_keyspace(executor::KEYSPACE_NAME);
executor::maybe_trace_query(*client_state, op, req->content);
tracing::trace(client_state->get_trace_state(), op);
return callback_it->second(_executor.local(), *client_state, std::move(req));
tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);
tracing::trace(trace_state, op);
return callback_it->second(_executor.local(), *client_state, trace_state, std::move(req)).finally([trace_state] {});
});
});
}
@@ -253,21 +254,21 @@ void server::set_routes(routes& r) {
server::server(seastar::sharded<executor>& e)
: _executor(e), _key_cache(1024, 1min, slogger), _enforce_authorization(false)
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) {
return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req)] { return e.create_table(client_state, req->content); }); }
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) {
return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req), trace_state = std::move(trace_state)] () mutable { return e.create_table(client_state, std::move(trace_state), req->content); }); }
},
{"DescribeTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.describe_table(client_state, req->content); }},
{"DeleteTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.delete_table(client_state, req->content); }},
{"PutItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.put_item(client_state, req->content); }},
{"UpdateItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.update_item(client_state, req->content); }},
{"GetItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.get_item(client_state, req->content); }},
{"DeleteItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.delete_item(client_state, req->content); }},
{"ListTables", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},
{"Scan", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.scan(client_state, req->content); }},
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, req->content); }},
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, req->content); }},
{"Query", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.query(client_state, req->content); }},
{"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_table(client_state, std::move(trace_state), req->content); }},
{"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_table(client_state, std::move(trace_state), req->content); }},
{"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.put_item(client_state, std::move(trace_state), req->content); }},
{"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.update_item(client_state, std::move(trace_state), req->content); }},
{"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.get_item(client_state, std::move(trace_state), req->content); }},
{"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_item(client_state, std::move(trace_state), req->content); }},
{"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},
{"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.scan(client_state, std::move(trace_state), req->content); }},
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, std::move(trace_state), req->content); }},
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, std::move(trace_state), req->content); }},
{"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.query(client_state, std::move(trace_state), req->content); }},
} {
}
@@ -300,9 +301,11 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:
slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);
}
} catch (...) {
slogger.warn("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());
throw;
std::throw_with_nested(std::runtime_error(
format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));
}
});
}

@@ -31,7 +31,7 @@
namespace alternator {
class server {
using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, std::unique_ptr<request>)>;
using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, tracing::trace_state_ptr, std::unique_ptr<request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
seastar::httpd::http_server_control _control;

@@ -13,7 +13,7 @@
{
"method":"GET",
"summary":"get row cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_row_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -35,7 +35,7 @@
"description":"row cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"get key cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_key_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -70,7 +70,7 @@
"description":"key cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -83,7 +83,7 @@
{
"method":"GET",
"summary":"get counter cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_counter_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -105,7 +105,7 @@
"description":"counter cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -118,7 +118,7 @@
{
"method":"GET",
"summary":"get row cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_row_cache_keys_to_save",
"produces":[
"application/json"
@@ -140,7 +140,7 @@
"description":"row cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -153,7 +153,7 @@
{
"method":"GET",
"summary":"get key cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_key_cache_keys_to_save",
"produces":[
"application/json"
@@ -175,7 +175,7 @@
"description":"key cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -188,7 +188,7 @@
{
"method":"GET",
"summary":"get counter cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_counter_cache_keys_to_save",
"produces":[
"application/json"
@@ -210,7 +210,7 @@
"description":"counter cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -448,7 +448,7 @@
{
"method": "GET",
"summary": "Get key entries",
"type": "int",
"type": "long",
"nickname": "get_key_entries",
"produces": [
"application/json"
@@ -568,7 +568,7 @@
{
"method": "GET",
"summary": "Get row entries",
"type": "int",
"type": "long",
"nickname": "get_row_entries",
"produces": [
"application/json"
@@ -688,7 +688,7 @@
{
"method": "GET",
"summary": "Get counter entries",
"type": "int",
"type": "long",
"nickname": "get_counter_entries",
"produces": [
"application/json"

@@ -121,7 +121,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -172,7 +172,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -223,7 +223,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -231,7 +231,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -544,7 +544,7 @@
"summary":"sstable count for each level. empty unless leveled compaction is used",
"type":"array",
"items":{
"type":"int"
"type": "long"
},
"nickname":"get_sstable_count_per_level",
"produces":[
@@ -636,7 +636,7 @@
"description":"Duration (in milliseconds) of monitoring operation",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -644,7 +644,7 @@
"description":"number of the top partitions to list",
"required":false,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -652,7 +652,7 @@
"description":"capacity of stream summary: determines amount of resources used in query processing",
"required":false,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -921,7 +921,7 @@
{
"method":"GET",
"summary":"Get memtable switch count",
"type":"int",
"type": "long",
"nickname":"get_memtable_switch_count",
"produces":[
"application/json"
@@ -945,7 +945,7 @@
{
"method":"GET",
"summary":"Get all memtable switch count",
"type":"int",
"type": "long",
"nickname":"get_all_memtable_switch_count",
"produces":[
"application/json"
@@ -1082,7 +1082,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"type": "long",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1235,7 +1235,7 @@
{
"method":"GET",
"summary":"Get all read latency",
"type":"int",
"type": "long",
"nickname":"get_all_read_latency",
"produces":[
"application/json"
@@ -1251,7 +1251,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"type": "long",
"nickname":"get_range_latency",
"produces":[
"application/json"
@@ -1275,7 +1275,7 @@
{
"method":"GET",
"summary":"Get all range latency",
"type":"int",
"type": "long",
"nickname":"get_all_range_latency",
"produces":[
"application/json"
@@ -1291,7 +1291,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"type": "long",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1444,7 +1444,7 @@
{
"method":"GET",
"summary":"Get all write latency",
"type":"int",
"type": "long",
"nickname":"get_all_write_latency",
"produces":[
"application/json"
@@ -1460,7 +1460,7 @@
{
"method":"GET",
"summary":"Get pending flushes",
"type":"int",
"type": "long",
"nickname":"get_pending_flushes",
"produces":[
"application/json"
@@ -1484,7 +1484,7 @@
{
"method":"GET",
"summary":"Get all pending flushes",
"type":"int",
"type": "long",
"nickname":"get_all_pending_flushes",
"produces":[
"application/json"
@@ -1500,7 +1500,7 @@
{
"method":"GET",
"summary":"Get pending compactions",
"type":"int",
"type": "long",
"nickname":"get_pending_compactions",
"produces":[
"application/json"
@@ -1524,7 +1524,7 @@
{
"method":"GET",
"summary":"Get all pending compactions",
"type":"int",
"type": "long",
"nickname":"get_all_pending_compactions",
"produces":[
"application/json"
@@ -1540,7 +1540,7 @@
{
"method":"GET",
"summary":"Get live ss table count",
"type":"int",
"type": "long",
"nickname":"get_live_ss_table_count",
"produces":[
"application/json"
@@ -1564,7 +1564,7 @@
{
"method":"GET",
"summary":"Get all live ss table count",
"type":"int",
"type": "long",
"nickname":"get_all_live_ss_table_count",
"produces":[
"application/json"
@@ -1580,7 +1580,7 @@
{
"method":"GET",
"summary":"Get live disk space used",
"type":"int",
"type": "long",
"nickname":"get_live_disk_space_used",
"produces":[
"application/json"
@@ -1604,7 +1604,7 @@
{
"method":"GET",
"summary":"Get all live disk space used",
"type":"int",
"type": "long",
"nickname":"get_all_live_disk_space_used",
"produces":[
"application/json"
@@ -1620,7 +1620,7 @@
{
"method":"GET",
"summary":"Get total disk space used",
"type":"int",
"type": "long",
"nickname":"get_total_disk_space_used",
"produces":[
"application/json"
@@ -1644,7 +1644,7 @@
{
"method":"GET",
"summary":"Get all total disk space used",
"type":"int",
"type": "long",
"nickname":"get_all_total_disk_space_used",
"produces":[
"application/json"
@@ -2100,7 +2100,7 @@
{
"method":"GET",
"summary":"Get speculative retries",
"type":"int",
"type": "long",
"nickname":"get_speculative_retries",
"produces":[
"application/json"
@@ -2124,7 +2124,7 @@
{
"method":"GET",
"summary":"Get all speculative retries",
"type":"int",
"type": "long",
"nickname":"get_all_speculative_retries",
"produces":[
"application/json"
@@ -2204,7 +2204,7 @@
{
"method":"GET",
"summary":"Get row cache hit out of range",
"type":"int",
"type": "long",
"nickname":"get_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2228,7 +2228,7 @@
{
"method":"GET",
"summary":"Get all row cache hit out of range",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2244,7 +2244,7 @@
{
"method":"GET",
"summary":"Get row cache hit",
"type":"int",
"type": "long",
"nickname":"get_row_cache_hit",
"produces":[
"application/json"
@@ -2268,7 +2268,7 @@
{
"method":"GET",
"summary":"Get all row cache hit",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_hit",
"produces":[
"application/json"
@@ -2284,7 +2284,7 @@
{
"method":"GET",
"summary":"Get row cache miss",
"type":"int",
"type": "long",
"nickname":"get_row_cache_miss",
"produces":[
"application/json"
@@ -2308,7 +2308,7 @@
{
"method":"GET",
"summary":"Get all row cache miss",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_miss",
"produces":[
"application/json"
@@ -2324,7 +2324,7 @@
{
"method":"GET",
"summary":"Get cas prepare",
"type":"int",
"type": "long",
"nickname":"get_cas_prepare",
"produces":[
"application/json"
@@ -2348,7 +2348,7 @@
{
"method":"GET",
"summary":"Get cas propose",
"type":"int",
"type": "long",
"nickname":"get_cas_propose",
"produces":[
"application/json"
@@ -2372,7 +2372,7 @@
{
"method":"GET",
"summary":"Get cas commit",
"type":"int",
"type": "long",
"nickname":"get_cas_commit",
"produces":[
"application/json"

@@ -118,7 +118,7 @@
{
"method": "GET",
"summary": "Get pending tasks",
"type": "int",
"type": "long",
"nickname": "get_pending_tasks",
"produces": [
"application/json"
@@ -181,7 +181,7 @@
{
"method": "GET",
"summary": "Get bytes compacted",
"type": "int",
"type": "long",
"nickname": "get_bytes_compacted",
"produces": [
"application/json"
@@ -197,7 +197,7 @@
"description":"A row merged information",
"properties":{
"key":{
"type":"int",
"type": "long",
"description":"The number of sstable"
},
"value":{

@@ -110,7 +110,7 @@
{
"method":"GET",
"summary":"Get count down endpoint",
"type":"int",
"type": "long",
"nickname":"get_down_endpoint_count",
"produces":[
"application/json"
@@ -126,7 +126,7 @@
{
"method":"GET",
"summary":"Get count up endpoint",
"type":"int",
"type": "long",
"nickname":"get_up_endpoint_count",
"produces":[
"application/json"
@@ -180,11 +180,11 @@
"description": "The endpoint address"
},
"generation": {
"type": "int",
"type": "long",
"description": "The heart beat generation"
},
"version": {
"type": "int",
"type": "long",
"description": "The heart beat version"
},
"update_time": {
@@ -209,7 +209,7 @@
"description": "Holds a version value for an application state",
"properties": {
"application_state": {
"type": "int",
"type": "long",
"description": "The application state enum index"
},
"value": {
@@ -217,7 +217,7 @@
"description": "The version value"
},
"version": {
"type": "int",
"type": "long",
"description": "The application state version"
}
}

@@ -75,7 +75,7 @@
{
"method":"GET",
"summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",
"type":"int",
"type": "long",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -99,7 +99,7 @@
{
"method":"GET",
"summary":"Get heart beat version for a node",
"type":"int",
"type": "long",
"nickname":"get_current_heart_beat_version",
"produces":[
"application/json"

@@ -99,7 +99,7 @@
{
"method": "GET",
"summary": "Get create hint count",
"type": "int",
"type": "long",
"nickname": "get_create_hint_count",
"produces": [
"application/json"
@@ -123,7 +123,7 @@
{
"method": "GET",
"summary": "Get not stored hints count",
"type": "int",
"type": "long",
"nickname": "get_not_stored_hints_count",
"produces": [
"application/json"

@@ -191,7 +191,7 @@
{
"method":"GET",
"summary":"Get the version number",
"type":"int",
"type": "long",
"nickname":"get_version",
"produces":[
"application/json"

@@ -105,7 +105,7 @@
{
"method":"GET",
"summary":"Get the max hint window",
"type":"int",
"type": "long",
"nickname":"get_max_hint_window",
"produces":[
"application/json"
@@ -128,7 +128,7 @@
"description":"max hint window in ms",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -141,7 +141,7 @@
{
"method":"GET",
"summary":"Get max hints in progress",
"type":"int",
"type": "long",
"nickname":"get_max_hints_in_progress",
"produces":[
"application/json"
@@ -164,7 +164,7 @@
"description":"max hints in progress",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -177,7 +177,7 @@
{
"method":"GET",
"summary":"get hints in progress",
"type":"int",
"type": "long",
"nickname":"get_hints_in_progress",
"produces":[
"application/json"
@@ -602,7 +602,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_write_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -632,7 +632,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_write_metrics_condition_not_met",
"produces": [
"application/json"
@@ -647,7 +647,7 @@
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_read_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -677,7 +677,7 @@
{
"method": "GET",
"summary": "Get read metrics",
"type": "int",
"type": "long",
"nickname": "get_read_metrics_timeouts",
"produces": [
"application/json"
@@ -692,7 +692,7 @@
{
"method": "GET",
"summary": "Get read metrics",
"type": "int",
"type": "long",
"nickname": "get_read_metrics_unavailables",
"produces": [
"application/json"
@@ -827,7 +827,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "int",
"type": "long",
"nickname": "get_range_metrics_timeouts",
"produces": [
"application/json"
@@ -842,7 +842,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "int",
"type": "long",
"nickname": "get_range_metrics_unavailables",
"produces": [
"application/json"
@@ -887,7 +887,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "int",
"type": "long",
"nickname": "get_write_metrics_timeouts",
"produces": [
"application/json"
@@ -902,7 +902,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "int",
"type": "long",
"nickname": "get_write_metrics_unavailables",
"produces": [
"application/json"
@@ -1008,7 +1008,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"type": "long",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1040,7 +1040,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"type": "long",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1072,7 +1072,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"type": "long",
"nickname":"get_range_latency",
"produces":[
"application/json"

@@ -458,7 +458,7 @@
{
"method":"GET",
"summary":"Return the generation value for this node.",
"type":"int",
"type": "long",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -646,7 +646,7 @@
{
"method":"POST",
"summary":"Trigger a cleanup of keys on a single keyspace",
"type":"int",
"type": "long",
"nickname":"force_keyspace_cleanup",
"produces":[
"application/json"
@@ -678,7 +678,7 @@
{
"method":"GET",
"summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",
"type":"int",
"type": "long",
"nickname":"scrub",
"produces":[
"application/json"
@@ -726,7 +726,7 @@
{
"method":"GET",
"summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",
"type":"int",
"type": "long",
"nickname":"upgrade_sstables",
"produces":[
"application/json"
@@ -800,7 +800,7 @@
"summary":"Return an array with the ids of the currently active repairs",
"type":"array",
"items":{
"type":"int"
"type": "long"
},
"nickname":"get_active_repair_async",
"produces":[
@@ -816,7 +816,7 @@
{
"method":"POST",
"summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",
"type":"int",
"type": "long",
"nickname":"repair_async",
"produces":[
"application/json"
@@ -947,7 +947,7 @@
"description":"The repair ID to check for status",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1277,18 +1277,18 @@
},
{
"name":"dynamic_update_interval",
"description":"integer, in ms (default 100)",
"description":"interval in ms (default 100)",
"required":false,
"allowMultiple":false,
"type":"integer",
"type":"long",
"paramType":"query"
},
{
"name":"dynamic_reset_interval",
"description":"integer, in ms (default 600,000)",
"description":"interval in ms (default 600,000)",
"required":false,
"allowMultiple":false,
"type":"integer",
"type":"long",
"paramType":"query"
},
{
@@ -1493,7 +1493,7 @@
"description":"Stream throughput",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1501,7 +1501,7 @@
{
"method":"GET",
"summary":"Get stream throughput mb per sec",
"type":"int",
"type": "long",
"nickname":"get_stream_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1517,7 +1517,7 @@
{
"method":"GET",
"summary":"get compaction throughput mb per sec",
"type":"int",
"type": "long",
"nickname":"get_compaction_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1539,7 +1539,7 @@
"description":"compaction throughput",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1943,7 +1943,7 @@
{
"method":"GET",
"summary":"Returns the threshold for warning of queries with many tombstones",
"type":"int",
"type": "long",
"nickname":"get_tombstone_warn_threshold",
"produces":[
"application/json"
@@ -1965,7 +1965,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1978,7 +1978,7 @@
{
"method":"GET",
"summary":"",
"type":"int",
"type": "long",
"nickname":"get_tombstone_failure_threshold",
"produces":[
"application/json"
@@ -2000,7 +2000,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2013,7 +2013,7 @@
{
"method":"GET",
"summary":"Returns the threshold for rejecting queries due to a large batch size",
"type":"int",
"type": "long",
"nickname":"get_batch_size_failure_threshold",
"produces":[
"application/json"
@@ -2035,7 +2035,7 @@
"description":"batch size debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2059,7 +2059,7 @@
"description":"throttle in kb",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2072,7 +2072,7 @@
{
"method":"GET",
"summary":"Get load",
"type":"int",
"type": "long",
"nickname":"get_metrics_load",
"produces":[
"application/json"
@@ -2088,7 +2088,7 @@
{
"method":"GET",
"summary":"Get exceptions",
"type":"int",
"type": "long",
"nickname":"get_exceptions",
"produces":[
"application/json"
@@ -2104,7 +2104,7 @@
{
"method":"GET",
"summary":"Get total hints in progress",
"type":"int",
"type": "long",
"nickname":"get_total_hints_in_progress",
"produces":[
"application/json"
@@ -2120,7 +2120,7 @@
{
"method":"GET",
"summary":"Get total hints",
"type":"int",
"type": "long",
"nickname":"get_total_hints",
"produces":[
"application/json"

@@ -32,7 +32,7 @@
{
"method":"GET",
"summary":"Get number of active outbound streams",
"type":"int",
"type": "long",
"nickname":"get_all_active_streams_outbound",
"produces":[
"application/json"
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"Get total incoming bytes",
"type":"int",
"type": "long",
"nickname":"get_total_incoming_bytes",
"produces":[
"application/json"
@@ -72,7 +72,7 @@
{
"method":"GET",
"summary":"Get all total incoming bytes",
"type":"int",
"type": "long",
"nickname":"get_all_total_incoming_bytes",
"produces":[
"application/json"
@@ -88,7 +88,7 @@
{
"method":"GET",
"summary":"Get total outgoing bytes",
"type":"int",
"type": "long",
"nickname":"get_total_outgoing_bytes",
"produces":[
"application/json"
@@ -112,7 +112,7 @@
{
"method":"GET",
"summary":"Get all total outgoing bytes",
"type":"int",
"type": "long",
"nickname":"get_all_total_outgoing_bytes",
"produces":[
"application/json"
@@ -154,7 +154,7 @@
"description":"The peer"
},
"session_index":{
"type":"int",
"type": "long",
"description":"The session index"
},
"connecting":{
@@ -211,7 +211,7 @@
"description":"The ID"
},
"files":{
"type":"int",
"type": "long",
"description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."
},
"total_size":{
@@ -242,7 +242,7 @@
"description":"The peer address"
},
"session_index":{
"type":"int",
"type": "long",
"description":"The session index"
},
"file_name":{

@@ -52,6 +52,21 @@
}
]
},
{
"path":"/system/uptime_ms",
"operations":[
{
"method":"GET",
"summary":"Get system uptime, in milliseconds",
"type":"long",
"nickname":"get_system_uptime",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/system/logger/{name}",
"operations":[

@@ -23,6 +23,8 @@
#include "service/storage_proxy.hh"
#include <seastar/http/httpd.hh>
namespace service { class load_meter; }
namespace api {
struct http_context {
@@ -31,9 +33,11 @@ struct http_context {
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
service::load_meter& lmeter;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
distributed<service::storage_proxy>& _sp,
service::load_meter& _lm)
: db(_db), sp(_sp), lmeter(_lm) {
}
};

@@ -27,6 +27,7 @@
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include "service/storage_service.hh"
#include "service/load_meter.hh"
#include "db/commitlog/commitlog.hh"
#include "gms/gossiper.hh"
#include "db/system_keyspace.hh"
@@ -55,26 +56,22 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {
throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");
}
static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
std::vector<ss::token_range> res;
for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
r.endpoint_details.push(ed);
static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
res.push_back(r);
r.endpoint_details.push(ed);
}
return res;
return r;
}
void set_storage_service(http_context& ctx, routes& r) {
@@ -176,13 +173,13 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
ss::describe_any_ring.set(r, [&ctx](const_req req) {
return describe_ring("");
ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));
});
ss::describe_ring.set(r, [&ctx](const_req req) {
auto keyspace = validate_keyspace(ctx, req.param);
return describe_ring(keyspace);
ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));
});
ss::get_host_id_map.set(r, [](const_req req) {
@@ -195,8 +192,8 @@ void set_storage_service(http_context& ctx, routes& r) {
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
});
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.lmeter.get_load_map().then([] (auto&& load_map) {
std::vector<ss::map_string_double> res;
for (auto i : load_map) {
ss::map_string_double val;
@@ -608,9 +605,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::join_ring.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().join_ring().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
return make_ready_future<json::json_return_type>(json_void());
});
ss::is_joined.set(r, [] (std::unique_ptr<request> req) {

@@ -30,6 +30,10 @@ namespace api {
namespace hs = httpd::system_json;
void set_system(http_context& ctx, routes& r) {
hs::get_system_uptime.set(r, [](const_req req) {
return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();
});
hs::get_all_logger_names.set(r, [](const_req req) {
return logging::logger_registry().get_all_logger_names();
});

@@ -21,6 +21,7 @@
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "counters.hh"
#include "types.hh"
/// LSA migrator for cells with irrelevant type
@@ -214,6 +215,61 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
+ imr_object_type::size_overhead + external_value_size;
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view& acv) {
if (acv.is_live()) {
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
acv.is_counter_update()
? "counter_update_value=" + to_sstring(acv.counter_update_value())
: to_hex(acv.value().linearize()),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell& ac) {
return os << atomic_cell_view(ac);
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {
auto& type = acvp._type;
auto& acv = acvp._cell;
if (acv.is_live()) {
std::ostringstream cell_value_string_builder;
if (type.is_counter()) {
if (acv.is_counter_update()) {
cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();
} else {
cell_value_string_builder << "shards: ";
counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {
cell_value_string_builder << ::join(", ", ccv.shards());
});
}
} else {
cell_value_string_builder << type.to_string(acv.value().linearize());
}
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
cell_value_string_builder.str(),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell::printer& acp) {
return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));
}
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {
if (!p._cell._data.get()) {
return os << "{ null atomic_cell_or_collection }";
@@ -223,9 +279,9 @@ std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::prin
if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {
os << "collection ";
auto cmv = p._cell.as_collection_mutation();
os << to_hex(cmv.data.linearize());
os << collection_mutation_view::printer(*p._cdef.type, cmv);
} else {
os << p._cell.as_atomic_cell(p._cdef);
os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));
}
return os << " }";
}

@@ -153,6 +153,14 @@ public:
}
friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
class printer {
const abstract_type& _type;
const atomic_cell_view& _cell;
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {
@@ -219,6 +227,12 @@ public:
static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);
friend class atomic_cell_or_collection;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
class printer : atomic_cell_view::printer {
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class column_definition;


@@ -33,6 +33,7 @@
#include "auth/resource.hh"
#include "seastarx.hh"
#include "exceptions/exceptions.hh"
namespace auth {
@@ -52,9 +53,9 @@ struct role_config_update final {
///
/// A logical argument error for a role-management operation.
///
class roles_argument_exception : public std::invalid_argument {
class roles_argument_exception : public exceptions::invalid_request_exception {
public:
using std::invalid_argument::invalid_argument;
using exceptions::invalid_request_exception::invalid_request_exception;
};
class role_already_exists : public roles_argument_exception {


@@ -39,7 +39,7 @@
#include "db/consistency_level_type.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "service/migration_listener.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
#include "database.hh"
@@ -114,14 +114,14 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_manager& mm,
::service::migration_notifier& mn,
std::unique_ptr<authorizer> z,
std::unique_ptr<authenticator> a,
std::unique_ptr<role_manager> r)
: _permissions_cache_config(std::move(c))
, _permissions_cache(nullptr)
, _qp(qp)
, _migration_manager(mm)
, _mnotifier(mn)
, _authorizer(std::move(z))
, _authenticator(std::move(a))
, _role_manager(std::move(r))
@@ -141,18 +141,19 @@ service::service(
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_notifier& mn,
::service::migration_manager& mm,
const service_config& sc)
: service(
std::move(c),
qp,
mm,
mn,
create_object<authorizer>(sc.authorizer_java_name, qp, mm),
create_object<authenticator>(sc.authenticator_java_name, qp, mm),
create_object<role_manager>(sc.role_manager_java_name, qp, mm)) {
}
future<> service::create_keyspace_if_missing() const {
future<> service::create_keyspace_if_missing(::service::migration_manager& mm) const {
auto& db = _qp.db();
if (!db.has_keyspace(meta::AUTH_KS)) {
@@ -166,15 +167,15 @@ future<> service::create_keyspace_if_missing() const {
// We use min_timestamp so that default keyspace metadata will lose to any manual adjustments.
// See issue #2129.
return _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);
return mm.announce_new_keyspace(ksm, api::min_timestamp, false);
}
return make_ready_future<>();
}
future<> service::start() {
return once_among_shards([this] {
return create_keyspace_if_missing();
future<> service::start(::service::migration_manager& mm) {
return once_among_shards([this, &mm] {
return create_keyspace_if_missing(mm);
}).then([this] {
return _role_manager->start().then([this] {
return when_all_succeed(_authorizer->start(), _authenticator->start());
@@ -183,7 +184,7 @@ future<> service::start() {
_permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
}).then([this] {
return once_among_shards([this] {
_migration_manager.register_listener(_migration_listener.get());
_mnotifier.register_listener(_migration_listener.get());
return make_ready_future<>();
});
});
@@ -192,9 +193,9 @@ future<> service::start() {
future<> service::stop() {
// Only one of the shards has the listener registered, but let's try to
// unregister on each one just to make sure.
_migration_manager.unregister_listener(_migration_listener.get());
return _permissions_cache->stop().then([this] {
return _mnotifier.unregister_listener(_migration_listener.get()).then([this] {
return _permissions_cache->stop();
}).then([this] {
return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());
});
}


@@ -28,6 +28,7 @@
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include <seastar/util/bool_class.hh>
#include <seastar/core/sharded.hh>
#include "auth/authenticator.hh"
#include "auth/authorizer.hh"
@@ -42,6 +43,7 @@ class query_processor;
namespace service {
class migration_manager;
class migration_notifier;
class migration_listener;
}
@@ -76,13 +78,15 @@ public:
///
/// All state associated with access-control is stored externally to any particular instance of this class.
///
class service final {
/// peering_sharded_service inheritance is needed to be able to access shard local authentication service
/// given an object from another shard. Used for bouncing lwt requests to correct shard.
class service final : public seastar::peering_sharded_service<service> {
permissions_cache_config _permissions_cache_config;
std::unique_ptr<permissions_cache> _permissions_cache;
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
::service::migration_notifier& _mnotifier;
std::unique_ptr<authorizer> _authorizer;
@@ -97,7 +101,7 @@ public:
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_manager&,
::service::migration_notifier&,
std::unique_ptr<authorizer>,
std::unique_ptr<authenticator>,
std::unique_ptr<role_manager>);
@@ -110,10 +114,11 @@ public:
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_notifier&,
::service::migration_manager&,
const service_config&);
future<> start();
future<> start(::service::migration_manager&);
future<> stop();
@@ -159,7 +164,7 @@ public:
private:
future<bool> has_existing_legacy_users() const;
future<> create_keyspace_if_missing() const;
future<> create_keyspace_if_missing(::service::migration_manager& mm) const;
};
future<bool> has_superuser(const service&, const authenticated_user&);

build_id.cc Normal file

@@ -0,0 +1,71 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
#include "build_id.hh"
#include <fmt/printf.h>
#include <link.h>
#include <seastar/core/align.hh>
#include <sstream>
using namespace seastar;
static const Elf64_Nhdr* get_nt_build_id(dl_phdr_info* info) {
auto base = info->dlpi_addr;
const auto* h = info->dlpi_phdr;
auto num_headers = info->dlpi_phnum;
for (int i = 0; i != num_headers; ++i, ++h) {
if (h->p_type != PT_NOTE) {
continue;
}
auto* p = reinterpret_cast<const char*>(base) + h->p_vaddr;
auto* e = p + h->p_memsz;
while (p != e) {
const auto* n = reinterpret_cast<const Elf64_Nhdr*>(p);
if (n->n_type == NT_GNU_BUILD_ID) {
return n;
}
p += sizeof(Elf64_Nhdr);
p += n->n_namesz;
p = align_up(p, 4);
p += n->n_descsz;
p = align_up(p, 4);
}
}
assert(0 && "no NT_GNU_BUILD_ID note");
}
static int callback(dl_phdr_info* info, size_t size, void* data) {
std::string& ret = *(std::string*)data;
std::ostringstream os;
// The first DSO is always the main program, which has an empty name.
assert(strlen(info->dlpi_name) == 0);
auto* n = get_nt_build_id(info);
auto* p = reinterpret_cast<const char*>(n);
p += sizeof(Elf64_Nhdr);
p += n->n_namesz;
p = align_up(p, 4);
const char* desc = p;
for (unsigned i = 0; i < n->n_descsz; ++i) {
fmt::fprintf(os, "%02x", (unsigned char)*(desc + i));
}
ret = os.str();
return 1;
}
std::string get_build_id() {
std::string ret;
int r = dl_iterate_phdr(callback, &ret);
assert(r == 1);
return ret;
}
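The note-walking loop in `get_nt_build_id` above depends on the ELF note layout: each note is an `Elf64_Nhdr` followed by its name and descriptor, each padded to 4-byte alignment. The same offset arithmetic can be sketched in isolation over a synthetic note buffer (the `nhdr` struct and `find_note_desc` helper below are illustrative stand-ins, not Scylla code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Illustrative stand-in for Elf64_Nhdr; the field layout matches <elf.h>.
struct nhdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

static size_t align4(size_t v) { return (v + 3) & ~size_t(3); }

// Walk a PT_NOTE payload and return the descriptor of the first note of
// type `wanted`, hex-encoded (as the callback above does for the build id);
// empty string if no such note exists.
inline std::string find_note_desc(const char* p, size_t len, uint32_t wanted) {
    const char* e = p + len;
    while (p + sizeof(nhdr) <= e) {
        nhdr n;
        std::memcpy(&n, p, sizeof n);
        // The descriptor starts after the header and the 4-byte-aligned name.
        const char* desc = p + sizeof(nhdr) + align4(n.n_namesz);
        if (n.n_type == wanted) {
            static const char digits[] = "0123456789abcdef";
            std::string hex;
            for (uint32_t i = 0; i < n.n_descsz; ++i) {
                unsigned char c = desc[i];
                hex += digits[c >> 4];
                hex += digits[c & 0xf];
            }
            return hex;
        }
        // Next note begins after the 4-byte-aligned descriptor.
        p = desc + align4(n.n_descsz);
    }
    return "";
}
```

For a buffer holding one note with name "GNU" (namesz 4) and a 2-byte descriptor of type `NT_GNU_BUILD_ID` (3), the descriptor lands at offset 16 and comes back hex-encoded.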

build_id.hh Normal file

@@ -0,0 +1,9 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
#pragma once
#include <string>
std::string get_build_id();


@@ -38,6 +38,7 @@ class bytes_ostream {
public:
using size_type = bytes::size_type;
using value_type = bytes::value_type;
using fragment_type = bytes_view;
static constexpr size_type max_chunk_size() { return 128 * 1024; }
private:
static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
@@ -93,6 +94,29 @@ public:
return _current != other._current;
}
};
using const_iterator = fragment_iterator;
class output_iterator {
public:
using iterator_category = std::output_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = bytes_ostream::value_type;
using pointer = bytes_ostream::value_type*;
using reference = bytes_ostream::value_type&;
friend class bytes_ostream;
private:
bytes_ostream* _ostream = nullptr;
private:
explicit output_iterator(bytes_ostream& os) : _ostream(&os) { }
public:
reference operator*() const { return *_ostream->write_place_holder(1); }
output_iterator& operator++() { return *this; }
output_iterator operator++(int) { return *this; }
};
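The `output_iterator` above is a single-pass output iterator whose `operator*` reserves one byte in the stream via `write_place_holder(1)`, so algorithms like `std::copy` can write straight into the fragmented buffer. A minimal sketch of the same pattern over a plain `std::string` (the `growing_buffer` class is a hypothetical stand-in for `bytes_ostream`):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical stand-in for bytes_ostream: each dereference of the iterator
// appends one byte to the buffer and yields a reference to it, mirroring
// write_place_holder(1).
class growing_buffer {
    std::string _data;
public:
    class output_iterator {
    public:
        using iterator_category = std::output_iterator_tag;
        using difference_type = std::ptrdiff_t;
        using value_type = char;
        using pointer = char*;
        using reference = char&;
    private:
        growing_buffer* _buf;
    public:
        explicit output_iterator(growing_buffer& b) : _buf(&b) {}
        reference operator*() const {
            _buf->_data.push_back(0); // reserve one byte ("place holder")
            return _buf->_data.back();
        }
        // Increment is a no-op, as in the original: dereferencing advances.
        output_iterator& operator++() { return *this; }
        output_iterator operator++(int) { return *this; }
    };
    output_iterator write_begin() { return output_iterator(*this); }
    const std::string& data() const { return _data; }
};
```

`std::copy(src.begin(), src.end(), buf.write_begin())` then appends every byte of `src` to the buffer.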
private:
inline size_type current_space_left() const {
if (!_current) {
@@ -289,6 +313,11 @@ public:
return _size;
}
// For the FragmentRange concept
size_type size_bytes() const {
return _size;
}
bool empty() const {
return _size == 0;
}
@@ -326,6 +355,8 @@ public:
fragment_iterator begin() const { return { _begin.get() }; }
fragment_iterator end() const { return { nullptr }; }
output_iterator write_begin() { return output_iterator(*this); }
boost::iterator_range<fragment_iterator> fragments() const {
return { begin(), end() };
}


@@ -35,6 +35,7 @@
#include "idl/uuid.dist.impl.hh"
#include "idl/keys.dist.impl.hh"
#include "idl/mutation.dist.impl.hh"
#include <iostream>
canonical_mutation::canonical_mutation(bytes data)
: _data(std::move(data))
@@ -89,3 +90,81 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
}
return m;
}
static sstring bytes_to_text(bytes_view bv) {
sstring ret(sstring::initialized_later(), bv.size());
std::copy_n(reinterpret_cast<const char*>(bv.data()), bv.size(), ret.data());
return ret;
}
std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm) {
auto in = ser::as_input_stream(cm._data);
auto mv = ser::deserialize(in, boost::type<ser::canonical_mutation_view>());
column_mapping mapping = mv.mapping();
auto partition_view = mutation_partition_view::from_view(mv.partition());
fmt::print(os, "{{canonical_mutation: ");
fmt::print(os, "table_id {} schema_version {} ", mv.table_id(), mv.schema_version());
fmt::print(os, "partition_key {} ", mv.key());
class printing_visitor : public mutation_partition_view_virtual_visitor {
std::ostream& _os;
const column_mapping& _cm;
bool _first = true;
bool _in_row = false;
private:
void print_separator() {
if (!_first) {
fmt::print(_os, ", ");
}
_first = false;
}
public:
printing_visitor(std::ostream& os, const column_mapping& cm) : _os(os), _cm(cm) {}
virtual void accept_partition_tombstone(tombstone t) override {
print_separator();
fmt::print(_os, "partition_tombstone {}", t);
}
virtual void accept_static_cell(column_id id, atomic_cell ac) override {
print_separator();
auto&& entry = _cm.static_column_at(id);
fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
}
virtual void accept_static_cell(column_id id, collection_mutation_view cmv) override {
print_separator();
auto&& entry = _cm.static_column_at(id);
fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
}
virtual void accept_row_tombstone(range_tombstone rt) override {
print_separator();
fmt::print(_os, "row tombstone {}", rt);
}
virtual void accept_row(position_in_partition_view pipv, row_tombstone rt, row_marker rm, is_dummy, is_continuous) override {
if (_in_row) {
fmt::print(_os, "}}, ");
}
fmt::print(_os, "{{row {} tombstone {} marker {}", pipv, rt, rm);
_in_row = true;
_first = false;
}
virtual void accept_row_cell(column_id id, atomic_cell ac) override {
print_separator();
auto&& entry = _cm.regular_column_at(id);
fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
}
virtual void accept_row_cell(column_id id, collection_mutation_view cmv) override {
print_separator();
auto&& entry = _cm.regular_column_at(id);
fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
}
void finalize() {
if (_in_row) {
fmt::print(_os, "}}");
}
}
};
printing_visitor pv(os, mapping);
partition_view.accept(mapping, pv);
pv.finalize();
fmt::print(os, "}}");
return os;
}
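The `printing_visitor` relies on two small pieces of state: `_first` drives `print_separator()` so items come out comma-separated, and `_in_row` tracks whether an opened row brace still needs closing, which `finalize()` does. That bookkeeping can be sketched in isolation (the `csv_printer` name is illustrative):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Illustrative sketch of printing_visitor's separator and row bookkeeping.
class csv_printer {
    std::ostringstream& _os;
    bool _first = true;
    bool _in_row = false;
    void print_separator() {
        if (!_first) {
            _os << ", ";
        }
        _first = false;
    }
public:
    explicit csv_printer(std::ostringstream& os) : _os(os) {}
    // A top-level or in-row item: separated by ", " from the previous one.
    void accept(const std::string& item) {
        print_separator();
        _os << item;
    }
    // A new row closes any previous row, then opens a fresh brace.
    void accept_row(const std::string& header) {
        if (_in_row) {
            _os << "}, ";
        }
        _os << "{" << header;
        _in_row = true;
        _first = false; // subsequent cells get a ", " separator
    }
    // Close the trailing row brace, if any.
    void finalize() {
        if (_in_row) {
            _os << "}";
        }
    }
};
```

Feeding it two rows of cells yields output shaped like the mutation printer's: `{row a, x, y}, {row b, z}`.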


@@ -26,6 +26,7 @@
#include "database_fwd.hh"
#include "mutation_partition_visitor.hh"
#include "mutation_partition_serializer.hh"
#include <iosfwd>
// Immutable mutation form which can be read using any schema version of the same table.
// Safe to access from other shards via const&.
@@ -52,4 +53,5 @@ public:
const bytes& representation() const { return _data; }
friend std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm);
};


@@ -22,6 +22,7 @@
#include <utility>
#include <algorithm>
#include <boost/range/irange.hpp>
#include <seastar/util/defer.hh>
#include <seastar/core/thread.hh>
@@ -33,19 +34,20 @@
#include "partition_slice_builder.hh"
#include "schema.hh"
#include "schema_builder.hh"
#include "service/migration_manager.hh"
#include "service/migration_listener.hh"
#include "service/storage_service.hh"
#include "types/tuple.hh"
#include "cql3/statements/select_statement.hh"
#include "cql3/multi_column_relation.hh"
#include "cql3/tuples.hh"
#include "log.hh"
#include "json.hh"
using locator::snitch_ptr;
using locator::token_metadata;
using locator::topology;
using seastar::sstring;
using service::migration_manager;
using service::migration_notifier;
using service::storage_proxy;
namespace std {
@@ -62,6 +64,196 @@ using namespace std::chrono_literals;
static logging::logger cdc_log("cdc");
namespace cdc {
static schema_ptr create_log_schema(const schema&, std::optional<utils::UUID> = {});
static schema_ptr create_stream_description_table_schema(const schema&, std::optional<utils::UUID> = {});
static future<> populate_desc(db_context ctx, const schema& s);
}
class cdc::cdc_service::impl : service::migration_listener::empty_listener {
friend cdc_service;
db_context _ctxt;
bool _stopped = false;
public:
impl(db_context ctxt)
: _ctxt(std::move(ctxt))
{
_ctxt._migration_notifier.register_listener(this);
}
~impl() {
assert(_stopped);
}
future<> stop() {
return _ctxt._migration_notifier.unregister_listener(this).then([this] {
_stopped = true;
});
}
void on_before_create_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
if (schema.cdc_options().enabled()) {
auto& db = _ctxt._proxy.get_db().local();
auto logname = log_name(schema.cf_name());
if (!db.has_schema(schema.ks_name(), logname)) {
// in seastar thread
auto log_schema = create_log_schema(schema);
auto stream_desc_schema = create_stream_description_table_schema(schema);
auto& keyspace = db.find_keyspace(schema.ks_name());
auto log_mut = db::schema_tables::make_create_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_create_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
}
void on_before_update_column_family(const schema& new_schema, const schema& old_schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
bool is_cdc = new_schema.cdc_options().enabled();
bool was_cdc = old_schema.cdc_options().enabled();
// We need to create or modify the log & stream schemas iff we changed the cdc status (was != is),
// or, if cdc is enabled now, unconditionally, since then any actual base schema changes
// (columns etc.) must be reflected in the log schema.
if (was_cdc || is_cdc) {
auto logname = log_name(old_schema.cf_name());
auto descname = desc_name(old_schema.cf_name());
auto& db = _ctxt._proxy.get_db().local();
auto& keyspace = db.find_keyspace(old_schema.ks_name());
auto log_schema = was_cdc ? db.find_column_family(old_schema.ks_name(), logname).schema() : nullptr;
auto stream_desc_schema = was_cdc ? db.find_column_family(old_schema.ks_name(), descname).schema() : nullptr;
if (!is_cdc) {
auto log_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
return;
}
auto new_log_schema = create_log_schema(new_schema, log_schema ? std::make_optional(log_schema->id()) : std::nullopt);
auto new_stream_desc_schema = create_stream_description_table_schema(new_schema, stream_desc_schema ? std::make_optional(stream_desc_schema->id()) : std::nullopt);
auto log_mut = log_schema
? db::schema_tables::make_update_table_mutations(keyspace.metadata(), log_schema, new_log_schema, timestamp, false)
: db::schema_tables::make_create_table_mutations(keyspace.metadata(), new_log_schema, timestamp)
;
auto stream_mut = stream_desc_schema
? db::schema_tables::make_update_table_mutations(keyspace.metadata(), stream_desc_schema, new_stream_desc_schema, timestamp, false)
: db::schema_tables::make_create_table_mutations(keyspace.metadata(), new_stream_desc_schema, timestamp)
;
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
void on_before_drop_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
if (schema.cdc_options().enabled()) {
auto logname = log_name(schema.cf_name());
auto descname = desc_name(schema.cf_name());
auto& db = _ctxt._proxy.get_db().local();
auto& keyspace = db.find_keyspace(schema.ks_name());
auto log_schema = db.find_column_family(schema.ks_name(), logname).schema();
auto stream_desc_schema = db.find_column_family(schema.ks_name(), descname).schema();
auto log_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {
// This callback is done on all shards. Only do the work once.
if (engine().cpu_id() != 0) {
return;
}
auto& db = _ctxt._proxy.get_db().local();
auto& cf = db.find_column_family(ks_name, cf_name);
auto schema = cf.schema();
if (schema->cdc_options().enabled()) {
populate_desc(_ctxt, *schema).get();
}
}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed) override {
on_create_column_family(ks_name, cf_name);
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {}
future<std::tuple<std::vector<mutation>, result_callback>> augment_mutation_call(
lowres_clock::time_point timeout,
std::vector<mutation>&& mutations
);
template<typename Iter>
future<> append_mutations(Iter i, Iter e, schema_ptr s, lowres_clock::time_point, std::vector<mutation>&);
};
cdc::cdc_service::cdc_service(service::storage_proxy& proxy)
: cdc_service(db_context::builder(proxy).build())
{}
cdc::cdc_service::cdc_service(db_context ctxt)
: _impl(std::make_unique<impl>(std::move(ctxt)))
{
_impl->_ctxt._proxy.set_cdc_service(this);
}
future<> cdc::cdc_service::stop() {
return _impl->stop();
}
cdc::cdc_service::~cdc_service() = default;
cdc::options::options(const std::map<sstring, sstring>& map) {
if (map.find("enabled") == std::end(map)) {
return;
}
for (auto& p : map) {
if (p.first == "enabled") {
_enabled = p.second == "true";
} else if (p.first == "preimage") {
_preimage = p.second == "true";
} else if (p.first == "postimage") {
_postimage = p.second == "true";
} else if (p.first == "ttl") {
_ttl = std::stoi(p.second);
} else {
throw exceptions::configuration_exception("Invalid CDC option: " + p.first);
}
}
}
std::map<sstring, sstring> cdc::options::to_map() const {
if (!_enabled) {
return {};
}
return {
{ "enabled", _enabled ? "true" : "false" },
{ "preimage", _preimage ? "true" : "false" },
{ "postimage", _postimage ? "true" : "false" },
{ "ttl", std::to_string(_ttl) },
};
}
sstring cdc::options::to_sstring() const {
return json::to_json(to_map());
}
bool cdc::options::operator==(const options& o) const {
return _enabled == o._enabled && _preimage == o._preimage && _postimage == o._postimage && _ttl == o._ttl;
}
bool cdc::options::operator!=(const options& o) const {
return !(*this == o);
}
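The `cdc::options` constructor above parses a string map: boolean flags come from `"true"`/`"false"`, `ttl` via `std::stoi`, an unknown key throws, and a map with no `"enabled"` key leaves the defaults in place. A minimal sketch of the same parsing with standard types only (`cdc_opts` and `parse_cdc_opts` are illustrative names, not Scylla API):

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Illustrative stand-in for cdc::options, using std::string and a plain
// struct instead of sstring; the ttl default is an assumption.
struct cdc_opts {
    bool enabled = false;
    bool preimage = false;
    bool postimage = false;
    int ttl = 86400;
};

inline cdc_opts parse_cdc_opts(const std::map<std::string, std::string>& map) {
    cdc_opts o;
    if (map.find("enabled") == map.end()) {
        return o; // mirror the original: no "enabled" key means defaults
    }
    for (auto& p : map) {
        if (p.first == "enabled") {
            o.enabled = p.second == "true";
        } else if (p.first == "preimage") {
            o.preimage = p.second == "true";
        } else if (p.first == "postimage") {
            o.postimage = p.second == "true";
        } else if (p.first == "ttl") {
            o.ttl = std::stoi(p.second);
        } else {
            throw std::invalid_argument("Invalid CDC option: " + p.first);
        }
    }
    return o;
}
```

`parse_cdc_opts({{"enabled", "true"}, {"ttl", "60"}})` yields an enabled options struct with a 60-second ttl, while an unrecognized key raises `std::invalid_argument`.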
namespace cdc {
using operation_native_type = std::underlying_type_t<operation>;
@@ -77,41 +269,8 @@ sstring desc_name(const sstring& table_name) {
return table_name + cdc_desc_suffix;
}
static future<>
remove_log(db_context ctx, const sstring& ks_name, const sstring& table_name) {
try {
return ctx._migration_manager.announce_column_family_drop(
ks_name, log_name(table_name), false);
} catch (exceptions::configuration_exception& e) {
// It's fine if the table does not exist.
return make_ready_future<>();
} catch (...) {
return make_exception_future<>(std::current_exception());
}
}
static future<>
remove_desc(db_context ctx, const sstring& ks_name, const sstring& table_name) {
try {
return ctx._migration_manager.announce_column_family_drop(
ks_name, desc_name(table_name), false);
} catch (exceptions::configuration_exception& e) {
// It's fine if the table does not exist.
return make_ready_future<>();
} catch (...) {
return make_exception_future<>(std::current_exception());
}
}
future<>
remove(db_context ctx, const sstring& ks_name, const sstring& table_name) {
return when_all(remove_log(ctx, ks_name, table_name),
remove_desc(ctx, ks_name, table_name)).discard_result();
}
static future<> setup_log(db_context ctx, const schema& s) {
static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID> uuid) {
schema_builder b(s.ks_name(), log_name(s.cf_name()));
b.set_default_time_to_live(gc_clock::duration{s.cdc_options().ttl()});
b.set_comment(sprint("CDC log for %s.%s", s.ks_name(), s.cf_name()));
b.with_column("stream_id", uuid_type, column_kind::partition_key);
b.with_column("time", timeuuid_type, column_kind::clustering_key);
@@ -131,17 +290,27 @@ static future<> setup_log(db_context ctx, const schema& s) {
add_columns(s.clustering_key_columns());
add_columns(s.static_columns(), true);
add_columns(s.regular_columns(), true);
return ctx._migration_manager.announce_new_column_family(b.build(), false);
if (uuid) {
b.set_uuid(*uuid);
}
return b.build();
}
static future<> setup_stream_description_table(db_context ctx, const schema& s) {
static schema_ptr create_stream_description_table_schema(const schema& s, std::optional<utils::UUID> uuid) {
schema_builder b(s.ks_name(), desc_name(s.cf_name()));
b.set_comment(sprint("CDC description for %s.%s", s.ks_name(), s.cf_name()));
b.with_column("node_ip", inet_addr_type, column_kind::partition_key);
b.with_column("shard_id", int32_type, column_kind::partition_key);
b.with_column("created_at", timestamp_type, column_kind::clustering_key);
b.with_column("stream_id", uuid_type);
return ctx._migration_manager.announce_new_column_family(b.build(), false);
if (uuid) {
b.set_uuid(*uuid);
}
return b.build();
}
// This function assumes setup_stream_description_table was called on |s| before calling this function.
@@ -201,22 +370,34 @@ static future<> populate_desc(db_context ctx, const schema& s) {
empty_service_permit());
}
future<> setup(db_context ctx, schema_ptr s) {
return seastar::async([ctx = std::move(ctx), s = std::move(s)] {
setup_log(ctx, *s).get();
auto log_guard = seastar::defer([&] { remove_log(ctx, s->ks_name(), s->cf_name()).get(); });
setup_stream_description_table(ctx, *s).get();
auto desc_guard = seastar::defer([&] { remove_desc(ctx, s->ks_name(), s->cf_name()).get(); });
populate_desc(ctx, *s).get();
desc_guard.cancel();
log_guard.cancel();
});
db_context::builder::builder(service::storage_proxy& proxy)
: _proxy(proxy)
{}
db_context::builder& db_context::builder::with_migration_notifier(service::migration_notifier& migration_notifier) {
_migration_notifier = migration_notifier;
return *this;
}
db_context::builder& db_context::builder::with_token_metadata(locator::token_metadata& token_metadata) {
_token_metadata = token_metadata;
return *this;
}
db_context::builder& db_context::builder::with_snitch(locator::snitch_ptr& snitch) {
_snitch = snitch;
return *this;
}
db_context::builder& db_context::builder::with_partitioner(dht::i_partitioner& partitioner) {
_partitioner = partitioner;
return *this;
}
db_context db_context::builder::build() {
return db_context{
_proxy,
_migration_manager ? _migration_manager->get() : service::get_local_migration_manager(),
_migration_notifier ? _migration_notifier->get() : service::get_local_storage_service().get_migration_notifier(),
_token_metadata ? _token_metadata->get() : service::get_local_storage_service().get_token_metadata(),
_snitch ? _snitch->get() : locator::i_endpoint_snitch::get_local_snitch_ptr(),
_partitioner ? _partitioner->get() : dht::global_partitioner()
@@ -234,6 +415,7 @@ private:
bytes _decomposed_time;
::shared_ptr<const transformer::streams_type> _streams;
const column_definition& _op_col;
ttl_opt _cdc_ttl_opt;
clustering_key set_pk_columns(const partition_key& pk, int batch_no, mutation& m) const {
const auto log_ck = clustering_key::from_exploded(
@@ -245,7 +427,8 @@ private:
auto cdef = m.schema()->get_column_definition(to_bytes("_" + column.name()));
auto value = atomic_cell::make_live(*column.type,
_time.timestamp(),
bytes_view(pk_value[pos]));
bytes_view(pk_value[pos]),
_cdc_ttl_opt);
m.set_cell(log_ck, *cdef, std::move(value));
++pos;
}
@@ -253,7 +436,7 @@ private:
}
void set_operation(const clustering_key& ck, operation op, mutation& m) const {
m.set_cell(ck, _op_col, atomic_cell::make_live(*_op_col.type, _time.timestamp(), _op_col.type->decompose(operation_native_type(op))));
m.set_cell(ck, _op_col, atomic_cell::make_live(*_op_col.type, _time.timestamp(), _op_col.type->decompose(operation_native_type(op)), _cdc_ttl_opt));
}
partition_key stream_id(const net::inet_address& ip, unsigned int shard_id) const {
@@ -272,7 +455,11 @@ public:
, _decomposed_time(timeuuid_type->decompose(_time))
, _streams(std::move(streams))
, _op_col(*_log_schema->get_column_definition(to_bytes("operation")))
{}
{
if (_schema->cdc_options().ttl()) {
_cdc_ttl_opt = std::chrono::seconds(_schema->cdc_options().ttl());
}
}
// TODO: is pre-image data based on a query enough? We only have actual column data. Do we need
// more details like tombstones/ttl? Probably not but keep in mind.
@@ -304,7 +491,8 @@ public:
auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
auto value = atomic_cell::make_live(*column.type,
_time.timestamp(),
bytes_view(exploded[pos]));
bytes_view(exploded[pos]),
_cdc_ttl_opt);
res.set_cell(log_ck, *cdef, std::move(value));
++pos;
}
@@ -360,11 +548,11 @@ public:
for (const auto& column : _schema->clustering_key_columns()) {
assert (pos < ck_value.size());
auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
res.set_cell(log_ck, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos])));
res.set_cell(log_ck, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos]), _cdc_ttl_opt));
if (pirow) {
assert(pirow->has(column.name_as_text()));
res.set_cell(*pikey, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos])));
res.set_cell(*pikey, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos]), _cdc_ttl_opt));
}
++pos;
@@ -393,7 +581,7 @@ public:
}
values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(op)));
res.set_cell(log_ck, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values)));
res.set_cell(log_ck, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values), _cdc_ttl_opt));
if (pirow && pirow->has(cdef.name_as_text())) {
values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(column_op::set)));
@@ -402,7 +590,7 @@ public:
assert(std::addressof(res.partition().clustered_row(*_log_schema, *pikey)) != std::addressof(res.partition().clustered_row(*_log_schema, log_ck)));
assert(pikey->explode() != log_ck.explode());
res.set_cell(*pikey, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values)));
res.set_cell(*pikey, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values), _cdc_ttl_opt));
}
} else {
cdc_log.warn("Non-atomic cell ignored {}.{}:{}", _schema->ks_name(), _schema->cf_name(), cdef.name_as_text());
@@ -426,7 +614,6 @@ public:
}
future<lw_shared_ptr<cql3::untyped_result_set>> pre_image_select(
service::storage_proxy& proxy,
service::client_state& client_state,
db::consistency_level cl,
const mutation& m)
@@ -474,10 +661,10 @@ public:
auto partition_slice = query::partition_slice(std::move(bounds), std::move(static_columns), std::move(regular_columns), selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(), partition_slice, query::max_partitions);
return proxy.query(_schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), empty_service_permit(), client_state)).then(
[this, partition_slice = std::move(partition_slice), selection = std::move(selection)] (service::storage_proxy::coordinator_query_result qr) -> lw_shared_ptr<cql3::untyped_result_set> {
return _ctx._proxy.query(_schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), empty_service_permit(), client_state)).then(
[s = _schema, partition_slice = std::move(partition_slice), selection = std::move(selection)] (service::storage_proxy::coordinator_query_result qr) -> lw_shared_ptr<cql3::untyped_result_set> {
cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());
query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *_schema, *selection));
query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *s, *selection));
auto result_set = builder.build();
if (!result_set || result_set->empty()) {
return {};
@@ -578,27 +765,71 @@ static future<::shared_ptr<transformer::streams_type>> get_streams(
});
}
future<std::vector<mutation>> append_log_mutations(
db_context ctx,
schema_ptr s,
service::storage_proxy::clock_type::time_point timeout,
service::query_state& qs,
std::vector<mutation> muts) {
auto mp = ::make_lw_shared<std::vector<mutation>>(std::move(muts));
template <typename Func>
future<std::vector<mutation>>
transform_mutations(std::vector<mutation>& muts, decltype(muts.size()) batch_size, Func&& f) {
return parallel_for_each(
boost::irange(static_cast<decltype(muts.size())>(0), muts.size(), batch_size),
std::move(f))
.then([&muts] () mutable { return std::move(muts); });
}
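`transform_mutations` walks `muts` in strides of `batch_size`: `boost::irange(0, size, batch_size)` yields only the first index of each batch, and the callback is expected to cover `[i, min(i + batch_size, size))`. The stride arithmetic, sketched without Boost or Seastar (`batch_starts` is an illustrative helper):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative equivalent of boost::irange(0, n, batch_size): the first
// index of each batch over a sequence of length n.
inline std::vector<size_t> batch_starts(size_t n, size_t batch_size) {
    std::vector<size_t> out;
    for (size_t i = 0; i < n; i += batch_size) {
        out.push_back(i);
    }
    return out;
}
```

At the call site below the batch size is 1, so every mutation index is visited individually.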
return get_streams(ctx, s->ks_name(), s->cf_name(), timeout, qs).then([ctx, s = std::move(s), mp, &qs](::shared_ptr<transformer::streams_type> streams) mutable {
mp->reserve(2 * mp->size());
auto trans = make_lw_shared<transformer>(ctx, s, std::move(streams));
auto i = mp->begin();
auto e = mp->end();
return parallel_for_each(i, e, [ctx, &qs, trans, mp](mutation& m) {
return trans->pre_image_select(ctx._proxy, qs.get_client_state(), db::consistency_level::LOCAL_QUORUM, m).then([trans, mp, &m](lw_shared_ptr<cql3::untyped_result_set> rs) {
mp->push_back(trans->transform(m, rs.get()));
} // namespace cdc
future<std::tuple<std::vector<mutation>, cdc::result_callback>>
cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations) {
// we do all this because in the case of batches, we can have mixed schemas.
auto e = mutations.end();
auto i = std::find_if(mutations.begin(), e, [](const mutation& m) {
return m.schema()->cdc_options().enabled();
});
if (i == e) {
return make_ready_future<std::tuple<std::vector<mutation>, cdc::result_callback>>(std::make_tuple(std::move(mutations), result_callback{}));
}
mutations.reserve(2 * mutations.size());
return do_with(std::move(mutations), service::query_state(service::client_state::for_internal_calls(), empty_service_permit()), [this, timeout, i](std::vector<mutation>& mutations, service::query_state& qs) {
return transform_mutations(mutations, 1, [this, &mutations, timeout, &qs] (int idx) {
auto& m = mutations[idx];
auto s = m.schema();
if (!s->cdc_options().enabled()) {
return make_ready_future<>();
}
// for batches/multiple mutations this is super inefficient. either partition the mutation set by schema
// and re-use streams, or probably better: add a cache so this lookup is a noop on second mutation
return get_streams(_ctxt, s->ks_name(), s->cf_name(), timeout, qs).then([this, s = std::move(s), &qs, &mutations, idx](::shared_ptr<transformer::streams_type> streams) mutable {
auto& m = mutations[idx]; // should not really be needed because of the reserve() above, but let's be conservative
transformer trans(_ctxt, s, streams);
if (!s->cdc_options().preimage()) {
mutations.emplace_back(trans.transform(m));
return make_ready_future<>();
}
// Note: a further improvement here would be to coalesce the pre-image selects into one
// iff a batch contains several modifications to the same table. On the other hand, batches
// are rare(?), so this is premature.
auto f = trans.pre_image_select(qs.get_client_state(), db::consistency_level::LOCAL_QUORUM, m);
return f.then([trans = std::move(trans), &mutations, idx] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
mutations.push_back(trans.transform(mutations[idx], rs.get()));
});
});
}).then([mp] {
return std::move(*mp);
}).then([](std::vector<mutation> mutations) {
return make_ready_future<std::tuple<std::vector<mutation>, cdc::result_callback>>(std::make_tuple(std::move(mutations), result_callback{}));
});
});
}
} // namespace cdc
bool cdc::cdc_service::needs_cdc_augmentation(const std::vector<mutation>& mutations) const {
return std::any_of(mutations.begin(), mutations.end(), [](const mutation& m) {
return m.schema()->cdc_options().enabled();
});
}
future<std::tuple<std::vector<mutation>, cdc::result_callback>>
cdc::cdc_service::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations) {
return _impl->augment_mutation_call(timeout, std::move(mutations));
}


@@ -33,8 +33,8 @@
#include <seastar/core/sstring.hh>
#include "exceptions/exceptions.hh"
#include "json.hh"
#include "timestamp.hh"
#include "cdc_options.hh"
class schema;
using schema_ptr = seastar::lw_shared_ptr<const schema>;
@@ -48,7 +48,7 @@ class token_metadata;
namespace service {
class migration_manager;
class migration_notifier;
class storage_proxy;
class query_state;
@@ -65,110 +65,63 @@ class partition_key;
namespace cdc {
class options final {
bool _enabled = false;
bool _preimage = false;
bool _postimage = false;
int _ttl = 86400; // 24h in seconds
class db_context;
// Callback to be invoked when the mutation finishes, to handle
// the post-image.
// TODO: decide what the parameters for this are to be.
using result_callback = std::function<future<>()>;
/// \brief CDC service, responsible for schema listeners
///
/// CDC service will listen for schema changes and iff CDC is enabled/changed
/// create/modify/delete corresponding log tables etc as part of the schema change.
///
class cdc_service {
class impl;
std::unique_ptr<impl> _impl;
public:
options() = default;
options(const std::map<sstring, sstring>& map) {
if (map.find("enabled") == std::end(map)) {
return;
}
future<> stop();
cdc_service(service::storage_proxy&);
cdc_service(db_context);
~cdc_service();
for (auto& p : map) {
if (p.first == "enabled") {
_enabled = p.second == "true";
} else if (p.first == "preimage") {
_preimage = p.second == "true";
} else if (p.first == "postimage") {
_postimage = p.second == "true";
} else if (p.first == "ttl") {
_ttl = std::stoi(p.second);
} else {
throw exceptions::configuration_exception("Invalid CDC option: " + p.first);
}
}
}
std::map<sstring, sstring> to_map() const {
if (!_enabled) {
return {};
}
return {
{ "enabled", _enabled ? "true" : "false" },
{ "preimage", _preimage ? "true" : "false" },
{ "postimage", _postimage ? "true" : "false" },
{ "ttl", std::to_string(_ttl) },
};
}
sstring to_sstring() const {
return json::to_json(to_map());
}
bool enabled() const { return _enabled; }
bool preimage() const { return _preimage; }
bool postimage() const { return _postimage; }
int ttl() const { return _ttl; }
bool operator==(const options& o) const {
return _enabled == o._enabled && _preimage == o._preimage && _postimage == o._postimage && _ttl == o._ttl;
}
bool operator!=(const options& o) const {
return !(*this == o);
}
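The option-map constructor above maps string keys onto typed fields, treats a missing "enabled" key as all-defaults, and rejects unknown keys. A standalone sketch of that parsing scheme (the type name and std exception below are stand-ins for the Scylla ones):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Standalone sketch of cdc::options parsing: an absent "enabled" key keeps
// all defaults; any unrecognized key is an error.
struct cdc_options_sketch {
    bool enabled = false;
    bool preimage = false;
    bool postimage = false;
    int ttl = 86400; // 24h in seconds

    explicit cdc_options_sketch(const std::map<std::string, std::string>& map) {
        if (map.find("enabled") == map.end()) {
            return; // no "enabled": leave every field at its default
        }
        for (const auto& [key, value] : map) {
            if (key == "enabled") {
                enabled = (value == "true");
            } else if (key == "preimage") {
                preimage = (value == "true");
            } else if (key == "postimage") {
                postimage = (value == "true");
            } else if (key == "ttl") {
                ttl = std::stoi(value);
            } else {
                throw std::invalid_argument("Invalid CDC option: " + key);
            }
        }
    }
};
```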
// If any of the mutations are cdc enabled, optionally selects preimage, and adds the
// appropriate augments to set the log entries.
// Iff post-image is enabled for any of these, a non-empty callback is also
// returned to be invoked post the mutation query.
future<std::tuple<std::vector<mutation>, result_callback>> augment_mutation_call(
lowres_clock::time_point timeout,
std::vector<mutation>&& mutations
);
bool needs_cdc_augmentation(const std::vector<mutation>&) const;
};
struct db_context final {
service::storage_proxy& _proxy;
service::migration_manager& _migration_manager;
service::migration_notifier& _migration_notifier;
locator::token_metadata& _token_metadata;
locator::snitch_ptr& _snitch;
dht::i_partitioner& _partitioner;
class builder final {
service::storage_proxy& _proxy;
std::optional<std::reference_wrapper<service::migration_manager>> _migration_manager;
std::optional<std::reference_wrapper<service::migration_notifier>> _migration_notifier;
std::optional<std::reference_wrapper<locator::token_metadata>> _token_metadata;
std::optional<std::reference_wrapper<locator::snitch_ptr>> _snitch;
std::optional<std::reference_wrapper<dht::i_partitioner>> _partitioner;
public:
builder(service::storage_proxy& proxy) : _proxy(proxy) { }
builder(service::storage_proxy& proxy);
builder& with_migration_manager(service::migration_manager& migration_manager) {
_migration_manager = migration_manager;
return *this;
}
builder& with_token_metadata(locator::token_metadata& token_metadata) {
_token_metadata = token_metadata;
return *this;
}
builder& with_snitch(locator::snitch_ptr& snitch) {
_snitch = snitch;
return *this;
}
builder& with_partitioner(dht::i_partitioner& partitioner) {
_partitioner = partitioner;
return *this;
}
builder& with_migration_notifier(service::migration_notifier& migration_notifier);
builder& with_token_metadata(locator::token_metadata& token_metadata);
builder& with_snitch(locator::snitch_ptr& snitch);
builder& with_partitioner(dht::i_partitioner& partitioner);
db_context build();
};
};
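db_context::builder above is a fluent builder over references: the mandatory storage_proxy goes in the constructor, optional components are attached via with_*() setters that return *this, and build() produces the final context. A reduced, self-contained sketch of the pattern (every type name below is a placeholder, and the "throw if unset" policy in build() is an assumption, not what Scylla necessarily does):

```cpp
#include <functional>
#include <optional>
#include <stdexcept>

struct proxy_stub {};
struct token_metadata_stub {};

struct context_stub {
    proxy_stub& proxy;
    token_metadata_stub& token_metadata;
};

class context_builder {
    proxy_stub& _proxy;
    std::optional<std::reference_wrapper<token_metadata_stub>> _token_metadata;
public:
    explicit context_builder(proxy_stub& p) : _proxy(p) {}

    context_builder& with_token_metadata(token_metadata_stub& tm) {
        _token_metadata = tm;
        return *this; // returning *this is what enables chained with_*() calls
    }

    context_stub build() {
        if (!_token_metadata) {
            throw std::logic_error("token_metadata not set");
        }
        return context_stub{_proxy, _token_metadata->get()};
    }
};
```

Used as `context_builder(p).with_token_metadata(tm).build()`; the real build() presumably falls back to defaults for components that were never set.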
/// \brief Sets up CDC related tables for a given table
///
/// This function not only creates CDC Log and CDC Description for a given table
/// but also populates CDC Description with a list of change streams.
///
/// \param[in] ctx object with references to database components
/// \param[in] schema schema of a table for which CDC tables are being created
seastar::future<> setup(db_context ctx, schema_ptr schema);
// cdc log table operation
enum class operation : int8_t {
// note: these values will eventually be read by a third party, probably not privy to this
@@ -182,52 +135,8 @@ enum class column_op : int8_t {
set = 0, del = 1, add = 2,
};
/// \brief Deletes CDC Log and CDC Description tables for a given table
///
/// This function cleans up all CDC related tables created for a given table.
/// At the moment, CDC Log and CDC Description are the only affected tables.
/// It's ok if some/all of them don't exist.
///
/// \param[in] ctx object with references to database components
/// \param[in] ks_name keyspace name of a table for which CDC tables are removed
/// \param[in] table_name name of a table for which CDC tables are removed
///
/// \pre This function works correctly no matter if CDC Log and/or CDC Description
/// exist.
seastar::future<>
remove(db_context ctx, const seastar::sstring& ks_name, const seastar::sstring& table_name);
seastar::sstring log_name(const seastar::sstring& table_name);
seastar::sstring desc_name(const seastar::sstring& table_name);
/// \brief For each mutation in the set appends related CDC Log mutation
///
/// This function should be called with a set of mutations of a table
/// with CDC enabled. Returned set of mutations contains all original mutations
/// and for each original mutation appends a mutation to CDC Log that reflects
/// the change.
///
/// \param[in] ctx object with references to database components
/// \param[in] s schema of a CDC enabled table which is being modified
/// \param[in] timeout period of time after which a request is considered timed out
/// \param[in] qs the state of the query that's being executed
/// \param[in] mutations set of changes of a CDC enabled table
///
/// \return set of mutations from input parameter with relevant CDC Log mutations appended
///
/// \pre CDC Log and CDC Description have to exist
/// \pre CDC Description has to be in sync with cluster topology
///
/// \note At the moment, cluster topology changes are not supported
/// so the assumption that CDC Description is in sync with cluster topology
/// is easy to enforce. When support for cluster topology changes is added,
/// it has to make sure the assumption still holds.
seastar::future<std::vector<mutation>> append_log_mutations(
db_context ctx,
schema_ptr s,
lowres_clock::time_point timeout,
service::query_state& qs,
std::vector<mutation> mutations);
} // namespace cdc

cdc/cdc_options.hh Normal file

@@ -0,0 +1,51 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <map>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace cdc {
class options final {
bool _enabled = false;
bool _preimage = false;
bool _postimage = false;
int _ttl = 86400; // 24h in seconds
public:
options() = default;
options(const std::map<sstring, sstring>& map);
std::map<sstring, sstring> to_map() const;
sstring to_sstring() const;
bool enabled() const { return _enabled; }
bool preimage() const { return _preimage; }
bool postimage() const { return _postimage; }
int ttl() const { return _ttl; }
bool operator==(const options& o) const;
bool operator!=(const options& o) const;
};
} // namespace cdc


@@ -32,8 +32,8 @@
collection_mutation::collection_mutation(const abstract_type& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator())) {}
collection_mutation::collection_mutation(const abstract_type& type, bytes_view v)
: _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator())) {}
collection_mutation::collection_mutation(const abstract_type& type, const bytes_ostream& data)
: _data(imr_object_type::make(data::cell::make_collection(fragment_range_view(data)), &type.imr_state().lsa_migrator())) {}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
@@ -55,51 +55,49 @@ collection_mutation_view atomic_cell_or_collection::as_collection_mutation() con
}
bool collection_mutation_view::is_empty() const {
return data.with_linearized([&] (bytes_view in) { // FIXME: we can guarantee that this is in the first fragment
auto has_tomb = read_simple<bool>(in);
return !has_tomb && read_simple<uint32_t>(in) == 0;
});
auto in = collection_mutation_input_stream(data);
auto has_tomb = in.read_trivial<bool>();
return !has_tomb && in.read_trivial<uint32_t>() == 0;
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, bytes_view&>)
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
static bool is_any_live(const atomic_cell_value_view& data, tombstone tomb, gc_clock::time_point now, F&& read_cell_type_info) {
return data.with_linearized([&] (bytes_view in) {
auto has_tomb = read_simple<bool>(in);
auto in = collection_mutation_input_stream(data);
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
auto ts = read_simple<api::timestamp_type>(in);
auto ttl = read_simple<gc_clock::duration::rep>(in);
auto ts = in.read_trivial<api::timestamp_type>();
auto ttl = in.read_trivial<gc_clock::duration::rep>();
tomb.apply(tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))});
}
auto nr = read_simple<uint32_t>(in);
auto nr = in.read_trivial<uint32_t>();
for (uint32_t i = 0; i != nr; ++i) {
auto& type_info = read_cell_type_info(in);
auto vsize = read_simple<uint32_t>(in);
auto value = atomic_cell_view::from_bytes(type_info, read_simple_bytes(in, vsize));
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
if (value.is_live(tomb, now, false)) {
return true;
}
}
return false;
});
}
bool collection_mutation_view::is_any_live(const abstract_type& type, tombstone tomb, gc_clock::time_point now) const {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return ::is_any_live(data, tomb, now, [&type_info] (bytes_view& in) -> const data::type_info& {
auto key_size = read_simple<uint32_t>(in);
in.remove_prefix(key_size);
return ::is_any_live(data, tomb, now, [&type_info] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
in.skip(key_size);
return type_info;
});
},
[&] (const user_type_impl& utype) {
return ::is_any_live(data, tomb, now, [&utype] (bytes_view& in) -> const data::type_info& {
auto key_size = read_simple<uint32_t>(in);
auto key = read_simple_bytes(in, key_size);
return ::is_any_live(data, tomb, now, [&utype] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
auto key = in.read(key_size);
return utype.type(deserialize_field_index(key))->imr_state().type_info();
});
},
@@ -110,26 +108,25 @@ bool collection_mutation_view::is_any_live(const abstract_type& type, tombstone
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, bytes_view&>)
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
static api::timestamp_type last_update(const atomic_cell_value_view& data, F&& read_cell_type_info) {
return data.with_linearized([&] (bytes_view in) {
auto in = collection_mutation_input_stream(data);
api::timestamp_type max = api::missing_timestamp;
auto has_tomb = read_simple<bool>(in);
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
max = std::max(max, read_simple<api::timestamp_type>(in));
(void)read_simple<gc_clock::duration::rep>(in);
max = std::max(max, in.read_trivial<api::timestamp_type>());
(void)in.read_trivial<gc_clock::duration::rep>();
}
auto nr = read_simple<uint32_t>(in);
auto nr = in.read_trivial<uint32_t>();
for (uint32_t i = 0; i != nr; ++i) {
auto& type_info = read_cell_type_info(in);
auto vsize = read_simple<uint32_t>(in);
auto value = atomic_cell_view::from_bytes(type_info, read_simple_bytes(in, vsize));
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
max = std::max(value.timestamp(), max);
}
return max;
});
}
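The readers above consume a flat serialized layout: a bool tombstone flag, optionally a timestamp and TTL, a uint32 cell count, then length-prefixed cells. A hypothetical flat-buffer cursor mirroring the `read_trivial`/`read`/`skip` interface (the real `collection_mutation_input_stream` is a `utils::linearizing_input_stream` over a possibly fragmented view; this sketch assumes one contiguous buffer and host byte order):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical cursor over a contiguous buffer, mirroring the
// collection_mutation_input_stream operations used above.
struct input_stream_sketch {
    const uint8_t* pos;
    const uint8_t* end;

    std::size_t remaining() const { return static_cast<std::size_t>(end - pos); }

    // memcpy a trivially copyable T out of the buffer and advance.
    template <typename T>
    T read_trivial() {
        if (sizeof(T) > remaining()) {
            throw std::out_of_range("short read");
        }
        T v;
        std::memcpy(&v, pos, sizeof(T));
        pos += sizeof(T);
        return v;
    }

    std::vector<uint8_t> read(std::size_t n) {
        if (n > remaining()) {
            throw std::out_of_range("short read");
        }
        std::vector<uint8_t> out(pos, pos + n);
        pos += n;
        return out;
    }

    void skip(std::size_t n) {
        if (n > remaining()) {
            throw std::out_of_range("short skip");
        }
        pos += n;
    }
};
```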
@@ -137,16 +134,16 @@ api::timestamp_type collection_mutation_view::last_update(const abstract_type& t
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return ::last_update(data, [&type_info] (bytes_view& in) -> const data::type_info& {
auto key_size = read_simple<uint32_t>(in);
in.remove_prefix(key_size);
return ::last_update(data, [&type_info] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
in.skip(key_size);
return type_info;
});
},
[&] (const user_type_impl& utype) {
return ::last_update(data, [&utype] (bytes_view& in) -> const data::type_info& {
auto key_size = read_simple<uint32_t>(in);
auto key = read_simple_bytes(in, key_size);
return ::last_update(data, [&utype] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
auto key = in.read(key_size);
return utype.type(deserialize_field_index(key))->imr_state().type_info();
});
},
@@ -156,6 +153,44 @@ api::timestamp_type collection_mutation_view::last_update(const abstract_type& t
));
}
std::ostream& operator<<(std::ostream& os, const collection_mutation_view::printer& cmvp) {
fmt::print(os, "{{collection_mutation_view ");
cmvp._cmv.with_deserialized(cmvp._type, [&os, &type = cmvp._type] (const collection_mutation_view_description& cmvd) {
bool first = true;
fmt::print(os, "tombstone {}", cmvd.tomb);
visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto&& key_type = ctype.name_comparator();
auto&& value_type = ctype.value_comparator();
for (auto&& [key, value] : cmvd.cells) {
if (!first) {
fmt::print(os, ", ");
}
fmt::print(os, "{}: {}", key_type->to_string(key), atomic_cell_view::printer(*value_type, value));
first = false;
}
},
[&] (const user_type_impl& utype) {
for (auto&& [raw_idx, value] : cmvd.cells) {
if (!first) {
fmt::print(os, ", ");
}
auto idx = deserialize_field_index(raw_idx);
fmt::print(os, "{}: {}", utype.field_name_as_string(idx), atomic_cell_view::printer(*utype.type(idx), value));
first = false;
}
},
[&] (const abstract_type& o) {
// Not throwing an exception in this likely-to-be-debug context
fmt::print(os, "attempted to pretty-print collection_mutation_view_description with type {}", o.name());
}
));
});
fmt::print(os, "}}");
return os;
}
collection_mutation_description
collection_mutation_view_description::materialize(const abstract_type& type) const {
collection_mutation_description m;
@@ -245,8 +280,9 @@ static collection_mutation serialize_collection_mutation(
if (tomb) {
size += sizeof(tomb.timestamp) + sizeof(tomb.deletion_time);
}
bytes ret(bytes::initialized_later(), size);
bytes::iterator out = ret.begin();
bytes_ostream ret;
ret.reserve(size);
auto out = ret.write_begin();
*out++ = bool(tomb);
if (tomb) {
write(out, tomb.timestamp);
@@ -385,19 +421,19 @@ collection_mutation difference(const abstract_type& type, collection_mutation_vi
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<std::pair<bytes_view, atomic_cell_view>, F, bytes_view&>)
GCC6_CONCEPT(requires std::is_invocable_r_v<std::pair<bytes_view, atomic_cell_view>, F, collection_mutation_input_stream&>)
static collection_mutation_view_description
deserialize_collection_mutation(bytes_view in, F&& read_kv) {
deserialize_collection_mutation(collection_mutation_input_stream& in, F&& read_kv) {
collection_mutation_view_description ret;
auto has_tomb = read_simple<bool>(in);
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
auto ts = read_simple<api::timestamp_type>(in);
auto ttl = read_simple<gc_clock::duration::rep>(in);
auto ts = in.read_trivial<api::timestamp_type>();
auto ttl = in.read_trivial<gc_clock::duration::rep>();
ret.tomb = tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))};
}
auto nr = read_simple<uint32_t>(in);
auto nr = in.read_trivial<uint32_t>();
ret.cells.reserve(nr);
for (uint32_t i = 0; i != nr; ++i) {
ret.cells.push_back(read_kv(in));
@@ -408,28 +444,28 @@ deserialize_collection_mutation(bytes_view in, F&& read_kv) {
}
collection_mutation_view_description
deserialize_collection_mutation(const abstract_type& type, bytes_view in) {
deserialize_collection_mutation(const abstract_type& type, collection_mutation_input_stream& in) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
// value_comparator(), ugh
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return deserialize_collection_mutation(in, [&type_info] (bytes_view& in) {
return deserialize_collection_mutation(in, [&type_info] (collection_mutation_input_stream& in) {
// FIXME: we could probably avoid the need for size
auto ksize = read_simple<uint32_t>(in);
auto key = read_simple_bytes(in, ksize);
auto vsize = read_simple<uint32_t>(in);
auto value = atomic_cell_view::from_bytes(type_info, read_simple_bytes(in, vsize));
auto ksize = in.read_trivial<uint32_t>();
auto key = in.read(ksize);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
return std::make_pair(key, value);
});
},
[&] (const user_type_impl& utype) {
return deserialize_collection_mutation(in, [&utype] (bytes_view& in) {
return deserialize_collection_mutation(in, [&utype] (collection_mutation_input_stream& in) {
// FIXME: we could probably avoid the need for size
auto ksize = read_simple<uint32_t>(in);
auto key = read_simple_bytes(in, ksize);
auto vsize = read_simple<uint32_t>(in);
auto ksize = in.read_trivial<uint32_t>();
auto key = in.read(ksize);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(
utype.type(deserialize_field_index(key))->imr_state().type_info(), read_simple_bytes(in, vsize));
utype.type(deserialize_field_index(key))->imr_state().type_info(), in.read(vsize));
return std::make_pair(key, value);
});
},


@@ -26,8 +26,12 @@
#include "gc_clock.hh"
#include "atomic_cell.hh"
#include "cql_serialization_format.hh"
#include "marshal_exception.hh"
#include "utils/linearizing_input_stream.hh"
#include <iosfwd>
class abstract_type;
class bytes_ostream;
class compaction_garbage_collector;
class row_tombstone;
@@ -66,10 +70,13 @@ struct collection_mutation_view_description {
collection_mutation serialize(const abstract_type&) const;
};
using collection_mutation_input_stream = utils::linearizing_input_stream<atomic_cell_value_view, marshal_exception>;
// Given a linearized collection_mutation_view, returns an auxiliary struct allowing the inspection of each cell.
// The struct is an observer of the data given by the collection_mutation_view and doesn't extend its lifetime.
// The struct is an observer of the data given by the collection_mutation_view and is only valid while the
// passed in `collection_mutation_input_stream` is alive.
// The function needs to be given the type of stored data to reconstruct the structural information.
collection_mutation_view_description deserialize_collection_mutation(const abstract_type&, bytes_view);
collection_mutation_view_description deserialize_collection_mutation(const abstract_type&, collection_mutation_input_stream&);
class collection_mutation_view {
public:
@@ -90,10 +97,18 @@ public:
// calls it on the corresponding description of `this`.
template <typename F>
inline decltype(auto) with_deserialized(const abstract_type& type, F f) const {
return data.with_linearized([&] (bytes_view bv) {
return f(deserialize_collection_mutation(type, std::move(bv)));
});
auto stream = collection_mutation_input_stream(data);
return f(deserialize_collection_mutation(type, stream));
}
class printer {
const abstract_type& _type;
const collection_mutation_view& _cmv;
public:
printer(const abstract_type& type, const collection_mutation_view& cmv)
: _type(type), _cmv(cmv) {}
friend std::ostream& operator<<(std::ostream& os, const printer& cmvp);
};
};
// A serialized mutation of a collection of cells.
@@ -112,7 +127,7 @@ public:
collection_mutation() {}
collection_mutation(const abstract_type&, collection_mutation_view);
collection_mutation(const abstract_type&, bytes_view);
collection_mutation(const abstract_type& type, const bytes_ostream& data);
operator collection_mutation_view() const;
};


@@ -74,8 +74,8 @@ private:
* <len(value1)><value1><len(value2)><value2>...<len(value_n)><value_n>
*
*/
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out) {
template<typename RangeOfSerializedComponents, typename CharOutputIterator>
static void serialize_value(RangeOfSerializedComponents&& values, CharOutputIterator& out) {
for (auto&& val : values) {
assert(val.size() <= std::numeric_limits<size_type>::max());
write<size_type>(out, size_type(val.size()));
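The layout documented above, `<len(v1)><v1><len(v2)><v2>…`, prefixes every component with its size_type length. A self-contained sketch of that encoding (big-endian 16-bit lengths are an assumption here, and the names are stand-ins for the templated Scylla helpers):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

using size_type = uint16_t;

// Write a 16-bit length big-endian (assumed wire convention for this sketch).
template <typename OutIt>
void write_len(OutIt& out, size_type v) {
    *out++ = static_cast<char>(v >> 8);
    *out++ = static_cast<char>(v & 0xff);
}

// Serialize a range of byte-like components as <len><bytes><len><bytes>...
template <typename Range, typename OutIt>
void serialize_components(const Range& values, OutIt& out) {
    for (const auto& val : values) {
        assert(val.size() <= std::numeric_limits<size_type>::max());
        write_len(out, static_cast<size_type>(val.size()));
        out = std::copy(val.begin(), val.end(), out);
    }
}
```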


@@ -248,15 +248,16 @@ private:
static size_t size(const data_value& val) {
return val.serialized_size();
}
template<typename Value, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
static void write_value(Value&& val, bytes::iterator& out) {
template<typename Value, typename CharOutputIterator, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
static void write_value(Value&& val, CharOutputIterator& out) {
out = std::copy(val.begin(), val.end(), out);
}
static void write_value(const data_value& val, bytes::iterator& out) {
template <typename CharOutputIterator>
static void write_value(const data_value& val, CharOutputIterator& out) {
val.serialize(out);
}
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out, bool is_compound) {
template<typename RangeOfSerializedComponents, typename CharOutputIterator>
static void serialize_value(RangeOfSerializedComponents&& values, CharOutputIterator& out, bool is_compound) {
if (!is_compound) {
auto it = values.begin();
write_value(std::forward<decltype(*it)>(*it), out);


@@ -92,14 +92,17 @@ struct duration_type_impl final : public concrete_type<cql_duration> {
struct timestamp_type_impl final : public simple_type_impl<db_clock::time_point> {
timestamp_type_impl();
static db_clock::time_point from_sstring(sstring_view s);
};
struct simple_date_type_impl final : public simple_type_impl<uint32_t> {
simple_date_type_impl();
static uint32_t from_sstring(sstring_view s);
};
struct time_type_impl final : public simple_type_impl<int64_t> {
time_type_impl();
static int64_t from_sstring(sstring_view s);
};
struct string_type_impl : public concrete_type<sstring> {
@@ -129,6 +132,7 @@ using timestamp_date_base_class = concrete_type<db_clock::time_point>;
struct timeuuid_type_impl final : public concrete_type<utils::UUID> {
timeuuid_type_impl();
static utils::UUID from_sstring(sstring_view s);
};
struct varint_type_impl final : public concrete_type<boost::multiprecision::cpp_int> {
@@ -137,10 +141,13 @@ struct varint_type_impl final : public concrete_type<boost::multiprecision::cpp_
struct inet_addr_type_impl final : public concrete_type<seastar::net::inet_address> {
inet_addr_type_impl();
static sstring to_sstring(const seastar::net::inet_address& addr);
static seastar::net::inet_address from_sstring(sstring_view s);
};
struct uuid_type_impl final : public concrete_type<utils::UUID> {
uuid_type_impl();
static utils::UUID from_sstring(sstring_view s);
};
template <typename Func> using visit_ret_type = std::invoke_result_t<Func, const ascii_type_impl&>;
@@ -241,3 +248,28 @@ static inline visit_ret_type<Func> visit(const abstract_type& t, Func&& f) {
}
__builtin_unreachable();
}
template <typename Func> struct data_value_visitor {
const void* v;
Func& f;
auto operator()(const empty_type_impl& t) { return f(t, v); }
auto operator()(const counter_type_impl& t) { return f(t, v); }
auto operator()(const reversed_type_impl& t) { return f(t, v); }
template <typename T> auto operator()(const T& t) {
return f(t, reinterpret_cast<const typename T::native_type*>(v));
}
};
// Given an abstract_type and a void pointer to an object of that
// type, call f with the runtime type of t and v casted to the
// corresponding native type.
// This takes an abstract_type and a void pointer instead of a
// data_value to support reversed_type_impl without requiring that
// each visitor create a new data_value just to recurse.
template <typename Func> inline auto visit(const abstract_type& t, const void* v, Func&& f) {
return ::visit(t, data_value_visitor<Func>{v, f});
}
template <typename Func> inline auto visit(const data_value& v, Func&& f) {
return ::visit(*v.type(), v._value, f);
}
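The visit() overloads above recover the concrete runtime type behind an abstract_type and hand `f` the value cast to the matching `native_type`. A toy standalone version of that double dispatch, using std::variant in place of the class hierarchy and the void-pointer reinterpret_cast (all names below are hypothetical):

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Each "type" carries its native C++ representation, like the concrete_type
// hierarchy above; visiting pairs the runtime tag with the typed value, as
// data_value_visitor does via reinterpret_cast on a void pointer.
struct int_type  { using native_type = int; };
struct text_type { using native_type = std::string; };

using any_value = std::variant<int, std::string>;

template <typename Func>
auto visit_value(const any_value& v, Func&& f) {
    return std::visit(
        [&](const auto& x) {
            using T = std::decay_t<decltype(x)>;
            if constexpr (std::is_same_v<T, int>) {
                return f(int_type{}, x);   // f sees the int tag and an int
            } else {
                return f(text_type{}, x);  // f sees the text tag and a string
            }
        },
        v);
}
```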


@@ -25,15 +25,19 @@
# multiple tokens per node, see http://cassandra.apache.org/doc/latest/operating
num_tokens: 256
# Directory where Scylla should store all its files, which are commitlog,
# data, hints, view_hints and saved_caches subdirectories. All of these
# subdirectories can be overridden by the respective options below.
# If unset, the value defaults to /var/lib/scylla
# workdir: /var/lib/scylla
# Directory where Scylla should store data on disk.
# If not set, the default directory is /var/lib/scylla/data.
data_file_directories:
- /var/lib/scylla/data
# data_file_directories:
# - /var/lib/scylla/data
# Directory where Scylla should store the commit log. When running on a magnetic HDD,
# this should be a separate spindle from the data directories.
# If not set, the default directory is /var/lib/scylla/commitlog.
commitlog_directory: /var/lib/scylla/commitlog
# commitlog_directory: /var/lib/scylla/commitlog
# commitlog_sync may be either "periodic" or "batch."
#
@@ -244,6 +248,7 @@ batch_size_fail_threshold_in_kb: 50
# experimental_features:
# - cdc
# - lwt
# - udf
# The directory where hints files are stored if hinted handoff is enabled.
# hints_directory: /var/lib/scylla/hints
@@ -262,24 +267,6 @@ batch_size_fail_threshold_in_kb: 50
# created until it has been seen alive and gone down again.
# max_hint_window_in_ms: 10800000 # 3 hours
# Maximum throttle in KBs per second, per delivery thread. This will be
# reduced proportionally to the number of nodes in the cluster. (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously.)
# hinted_handoff_throttle_in_kb: 1024
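The proportional rule described in the comment works out to max_rate / (node_count - 1) per delivery thread, since node_count - 1 peers may be delivering hints to a given node at once. A one-line sketch of that arithmetic (the function name is ours):

```cpp
// Per-thread hint delivery throttle: divide the configured maximum by the
// number of other nodes that could be delivering hints simultaneously.
int per_thread_throttle_kb(int max_kb, int node_count) {
    return node_count > 1 ? max_kb / (node_count - 1) : max_kb;
}
```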
# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
# max_hints_delivery_threads: 2
###################################################
## Not currently supported, reserved for future use
###################################################
# Maximum throttle in KBs per second, total. This will be
# reduced proportionally to the number of nodes in the cluster.
# batchlog_replay_throttle_in_kb: 1024
# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
@@ -307,120 +294,6 @@ batch_size_fail_threshold_in_kb: 50
#
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# Maximum size of the key cache in memory.
#
# Each key cache hit saves 1 seek and each row cache hit saves 2 seeks at the
# minimum, sometimes more. The key cache is fairly tiny for the amount of
# time it saves, so it's worthwhile to use it at large numbers.
# The row cache saves even more time, but must contain the entire row,
# so it is extremely space-intensive. It's best to only use the
# row cache if you have hot rows or static rows.
#
# NOTE: if you reduce the size, you may not get your hottest keys loaded on startup.
#
# Default value is empty to make it "auto" (min(5% of Heap (in MB), 100MB)). Set to 0 to disable key cache.
# key_cache_size_in_mb:
# Duration in seconds after which Scylla should
# save the key cache. Caches are saved to saved_caches_directory as
# specified in this configuration file.
#
# Saved caches greatly improve cold-start speeds, and are relatively cheap
# terms of I/O for the key cache. Row cache saving is much more expensive and
# has limited use.
#
# Default is 14400 or 4 hours.
# key_cache_save_period: 14400
# Number of keys from the key cache to save
# Disabled by default, meaning all keys are going to be saved
# key_cache_keys_to_save: 100
# Maximum size of the row cache in memory.
# NOTE: if you reduce the size, you may not get your hottest keys loaded on startup.
#
# Default value is 0, to disable row caching.
# row_cache_size_in_mb: 0
# Duration in seconds after which Scylla should
# save the row cache. Caches are saved to saved_caches_directory as specified
# in this configuration file.
#
# Saved caches greatly improve cold-start speeds, and are relatively cheap
# terms of I/O for the key cache. Row cache saving is much more expensive and
# has limited use.
#
# Default is 0 to disable saving the row cache.
# row_cache_save_period: 0
# Number of keys from the row cache to save
# Disabled by default, meaning all keys are going to be saved
# row_cache_keys_to_save: 100
# Maximum size of the counter cache in memory.
#
# Counter cache helps to reduce counter locks' contention for hot counter cells.
# In case of RF = 1 a counter cache hit will cause Scylla to skip the read before
# write entirely. With RF > 1 a counter cache hit will still help to reduce the duration
# of the lock hold, helping with hot counter cell updates, but will not allow skipping
# the read entirely. Only the local (clock, count) tuple of a counter cell is kept
# in memory, not the whole counter, so it's relatively cheap.
#
# NOTE: if you reduce the size, you may not get your hottest keys loaded on startup.
#
# Default value is empty to make it "auto" (min(2.5% of Heap (in MB), 50MB)). Set to 0 to disable counter cache.
# NOTE: if you perform counter deletes and rely on low gcgs, you should disable the counter cache.
# counter_cache_size_in_mb:
# Duration in seconds after which Scylla should
# save the counter cache (keys only). Caches are saved to saved_caches_directory as
# specified in this configuration file.
#
# Default is 7200 or 2 hours.
# counter_cache_save_period: 7200
# Number of keys from the counter cache to save
# Disabled by default, meaning all keys are going to be saved
# counter_cache_keys_to_save: 100
# The off-heap memory allocator. Affects storage engine metadata as
# well as caches. Experiments show that JEMalloc saves some memory
# compared to the native GCC allocator (i.e., JEMalloc is more
# fragmentation-resistant).
#
# Supported values are: NativeAllocator, JEMallocAllocator
#
# If you intend to use JEMallocAllocator you have to install JEMalloc as library and
# modify cassandra-env.sh as directed in the file.
#
# Defaults to NativeAllocator
# memory_allocator: NativeAllocator
# saved caches
# If not set, the default directory is /var/lib/scylla/saved_caches.
# saved_caches_directory: /var/lib/scylla/saved_caches
# For workloads with more data than can fit in memory, Scylla's
# bottleneck will be reads that need to fetch data from
# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
# order to allow the operations to enqueue low enough in the stack
# that the OS and drives can reorder them. Same applies to
# "concurrent_counter_writes", since counter writes read the current
# values before incrementing and writing them back.
#
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
# concurrent_reads: 32
# concurrent_writes: 32
# concurrent_counter_writes: 32
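The sizing rules of thumb above can be expressed as a small helper (an illustrative sketch only, not part of Scylla; `number_of_drives` and `number_of_cores` are assumed inputs):

```python
def suggested_concurrency(number_of_drives: int, number_of_cores: int) -> dict:
    # Reads (and counter writes, which read before writing back) should
    # queue deep enough for the OS and drives to reorder: 16 per drive.
    # Plain writes are almost never IO bound, so scale with cores: 8 per core.
    return {
        "concurrent_reads": 16 * number_of_drives,
        "concurrent_counter_writes": 16 * number_of_drives,
        "concurrent_writes": 8 * number_of_cores,
    }
```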
# Total memory to use for sstable-reading buffers. Defaults to
# the smaller of 1/4 of heap or 512MB.
# file_cache_size_in_mb: 512
# Total space to use for commitlogs.
#
# If space gets above this value (it will round up to the next nearest
@@ -432,28 +305,6 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# available for Scylla.
commitlog_total_space_in_mb: -1
# A fixed memory pool size in MB for SSTable index summaries. If left
# empty, this will default to 5% of the heap size. If the memory usage of
# all index summaries exceeds this limit, SSTables with low read rates will
# shrink their index summaries in order to meet this limit. However, this
# is a best-effort process. In extreme conditions Scylla may need to use
# more than this amount of memory.
# index_summary_capacity_in_mb:
# How frequently index summaries should be resampled. This is done
# periodically to redistribute memory from the fixed-size pool to sstables
# proportional to their recent read rates. Setting to -1 will disable this
# process, leaving existing index summaries at their current sampling level.
# index_summary_resize_interval_in_minutes: 60
# Whether to, when doing sequential writing, fsync() at intervals in
# order to force the operating system to flush the dirty
# buffers. Enable this to avoid sudden dirty buffer flushing from
# impacting read latencies. Almost always a good idea on SSDs; not
# necessarily on platters.
# trickle_fsync: false
# trickle_fsync_interval_in_kb: 10240
# TCP port, for commands and data
# For security reasons, you should not expose this port to the internet. Firewall it if needed.
# storage_port: 7000
@@ -466,91 +317,21 @@ commitlog_total_space_in_mb: -1
# listen_interface: eth0
# listen_interface_prefer_ipv6: false
# Internode authentication backend, implementing IInternodeAuthenticator;
# used to allow/disallow connections from peer nodes.
# internode_authenticator: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
# Whether to start the native transport server.
# Please note that the address on which the native transport is bound is the
# same as the rpc_address. The port however is different and specified below.
# start_native_transport: true
# The maximum threads for handling requests when the native transport is used.
# This is similar to rpc_max_threads though the default differs slightly (and
# there is no native_transport_min_threads, idle threads will always be stopped
# after 30 seconds).
# native_transport_max_threads: 128
#
# The maximum size of allowed frame. Frame (requests) larger than this will
# be rejected as invalid. The default is 256MB.
# native_transport_max_frame_size_in_mb: 256
# The maximum number of concurrent client connections.
# The default is -1, which means unlimited.
# native_transport_max_concurrent_connections: -1
# The maximum number of concurrent client connections per source ip.
# The default is -1, which means unlimited.
# native_transport_max_concurrent_connections_per_ip: -1
# Whether to start the thrift rpc server.
# start_rpc: true
# enable or disable keepalive on rpc/native connections
# rpc_keepalive: true
# Scylla provides two out-of-the-box options for the RPC Server:
#
# sync -> One thread per thrift connection. For a very large number of clients, memory
# will be your limiting factor. On a 64 bit JVM, 180KB is the minimum stack size
# per thread, and that will correspond to your use of virtual memory (but physical memory
# may be limited depending on use of stack space).
#
# hsha -> Stands for "half synchronous, half asynchronous." All thrift clients are handled
# asynchronously using a small number of threads that does not vary with the amount
# of thrift clients (and thus scales well to many clients). The rpc requests are still
# synchronous (one thread per active request). If hsha is selected then it is essential
# that rpc_max_threads is changed from the default value of unlimited.
#
# The default is sync because on Windows hsha is about 30% slower. On Linux,
# sync/hsha performance is about the same, with hsha of course using less memory.
#
# Alternatively, can provide your own RPC server by providing the fully-qualified class name
# of an o.a.c.t.TServerFactory that can create an instance of it.
# rpc_server_type: sync
# Uncomment rpc_min|max_thread to set request pool size limits.
#
# Regardless of your choice of RPC server (see above), the number of maximum requests in the
# RPC thread pool dictates how many concurrent requests are possible (but if you are using the sync
# RPC server, it also dictates the number of clients that can be connected at all).
#
# The default is unlimited and thus provides no protection against clients overwhelming the server. You are
# encouraged to set a maximum that makes sense for you in production, but do keep in mind that
# rpc_max_threads represents the maximum number of client requests this server may execute concurrently.
#
# rpc_min_threads: 16
# rpc_max_threads: 2048
# uncomment to set socket buffer sizes on rpc connections
# rpc_send_buff_size_in_bytes:
# rpc_recv_buff_size_in_bytes:
# Uncomment to set socket buffer size for internode communication
# Note that when setting this, the buffer size is limited by net.core.wmem_max
# and when it is not set, it is defined by net.ipv4.tcp_wmem
# See:
# /proc/sys/net/core/wmem_max
# /proc/sys/net/core/rmem_max
# /proc/sys/net/ipv4/tcp_wmem
# /proc/sys/net/ipv4/tcp_rmem
# and: man tcp
# internode_send_buff_size_in_bytes:
# internode_recv_buff_size_in_bytes:
# Frame size for thrift (maximum message length).
# thrift_framed_transport_size_in_mb: 15
# Set to true to have Scylla create a hard link to each sstable
# flushed or streamed locally in a backups/ subdirectory of the
# keyspace data. Removing these links is the operator's
@@ -593,30 +374,6 @@ commitlog_total_space_in_mb: -1
# column_index_size_in_kb: 64
# Number of simultaneous compactions to allow, NOT including
# validation "compactions" for anti-entropy repair. Simultaneous
# compactions can help preserve read performance in a mixed read/write
# workload, by mitigating the tendency of small sstables to accumulate
# during a single long-running compaction. The default is usually
# fine and if you experience problems with compaction running too
# slowly or too fast, you should look at
# compaction_throughput_mb_per_sec first.
#
# concurrent_compactors defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#concurrent_compactors: 1
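The default described above (the smaller of disks and cores, clamped to a minimum of 2 and a maximum of 8) can be sketched as (illustrative only, not Scylla's code):

```python
def default_concurrent_compactors(number_of_disks: int, number_of_cores: int) -> int:
    # Smaller of disks and cores, clamped to the [2, 8] range.
    return min(8, max(2, min(number_of_disks, number_of_cores)))
```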
# Throttles compaction to the given total throughput across the entire
# system. The faster you insert data, the faster you need to compact in
# order to keep the sstable count down, but in general, setting this to
# 16 to 32 times the rate you are inserting data is more than sufficient.
# Setting this to 0 disables throttling. Note that this accounts for all types
# of compaction, including validation compaction.
# compaction_throughput_mb_per_sec: 16
# Log a warning when writing partitions larger than this value
# compaction_large_partition_warning_threshold_mb: 1000
@@ -629,18 +386,6 @@ commitlog_total_space_in_mb: -1
# Log a warning when row number is larger than this value
# compaction_rows_count_warning_threshold: 100000
# When compacting, the replacement sstable(s) can be opened before they
# are completely written, and used in place of the prior sstables for
# any range that has been written. This helps to smoothly transfer reads
# between the sstables, reducing page cache churn and keeping hot rows hot
# sstable_preemptive_open_interval_in_mb: 50
# Throttles all streaming file transfer between the datacenters,
# this setting allows users to throttle inter dc stream throughput in addition
# to throttling all network stream traffic as configured with
# stream_throughput_outbound_megabits_per_sec
# inter_dc_stream_throughput_outbound_megabits_per_sec:
# How long the coordinator should wait for seq or index scans to complete
# range_request_timeout_in_ms: 10000
# How long the coordinator should wait for writes to complete
@@ -655,88 +400,23 @@ commitlog_total_space_in_mb: -1
# The default timeout for other, miscellaneous operations
# request_timeout_in_ms: 10000
# Enable operation timeout information exchange between nodes to accurately
# measure request timeouts. If disabled, replicas will assume that requests
# were forwarded to them instantly by the coordinator, which means that
# under overload conditions we will waste that much extra time processing
# already-timed-out requests.
#
# Warning: before enabling this property, make sure ntp is installed
# and the times are synchronized between the nodes.
# cross_node_timeout: false
# Enable socket timeout for streaming operation.
# When a timeout occurs during streaming, streaming is retried from the start
# of the current file. This _can_ involve re-streaming an important amount of
# data, so you should avoid setting the value too low.
# Default value is 0, which means streams never time out.
# streaming_socket_timeout_in_ms: 0
# controls how often to perform the more expensive part of host score
# calculation
# dynamic_snitch_update_interval_in_ms: 100
# controls how often to reset all host scores, allowing a bad host to
# possibly recover
# dynamic_snitch_reset_interval_in_ms: 600000
# if set greater than zero and read_repair_chance is < 1.0, this will allow
# 'pinning' of replicas to hosts in order to increase cache capacity.
# The badness threshold will control how much worse the pinned host has to be
# before the dynamic snitch will prefer other replicas over it. This is
# expressed as a double which represents a percentage. Thus, a value of
# 0.2 means Scylla would continue to prefer the static snitch values
# until the pinned host was 20% worse than the fastest.
# dynamic_snitch_badness_threshold: 0.1
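A hedged sketch of the threshold semantics (assuming latency-like scores where lower is better; `prefer_pinned` is a made-up name, not the snitch's API):

```python
def prefer_pinned(pinned_score: float, best_score: float,
                  badness_threshold: float = 0.1) -> bool:
    # Keep preferring the pinned replica until its score is more than
    # `badness_threshold` (a fraction, e.g. 0.2 for 20%) worse than the
    # fastest replica's score.
    return pinned_score <= best_score * (1 + badness_threshold)
```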
# request_scheduler -- Set this to a class that implements
# RequestScheduler, which will schedule incoming client requests
# according to the specific policy. This is useful for multi-tenancy
# with a single Scylla cluster.
# NOTE: This is specifically for requests from the client and does
# not affect inter node communication.
# org.apache.cassandra.scheduler.NoScheduler - No scheduling takes place
# org.apache.cassandra.scheduler.RoundRobinScheduler - Round robin of
# client requests to a node with a separate queue for each
# request_scheduler_id. The scheduler is further customized by
# request_scheduler_options as described below.
# request_scheduler: org.apache.cassandra.scheduler.NoScheduler
# Scheduler Options vary based on the type of scheduler
# NoScheduler - Has no options
# RoundRobin
# - throttle_limit -- The throttle_limit is the number of in-flight
# requests per client. Requests beyond
# that limit are queued up until
# running requests can complete.
# The value of 80 here is twice the number of
# concurrent_reads + concurrent_writes.
# - default_weight -- default_weight is optional and allows for
# overriding the default which is 1.
# - weights -- Weights are optional and will default to 1 or the
# overridden default_weight. The weight translates into how
# many requests are handled during each turn of the
# RoundRobin, based on the scheduler id.
#
# request_scheduler_options:
# throttle_limit: 80
# default_weight: 5
# weights:
# Keyspace1: 1
# Keyspace2: 5
# request_scheduler_id -- An identifier based on which to perform
# the request scheduling. Currently the only valid option is keyspace.
# request_scheduler_id: keyspace
# Enable or disable inter-node encryption.
# You must also generate keys and provide the appropriate key and trust store locations and passwords.
# No custom encryption options are currently enabled. The available options are:
#
# The available internode options are : all, none, dc, rack
# If set to dc scylla will encrypt the traffic between the DCs
# If set to rack scylla will encrypt the traffic between the racks
#
# SSL/TLS algorithm and ciphers used can be controlled by
# the priority_string parameter. Info on priority string
# syntax and values is available at:
# https://gnutls.org/manual/html_node/Priority-Strings.html
#
# The require_client_auth parameter allows you to
# restrict access to service based on certificate
# validation. Client must provide a certificate
# accepted by the used trust store to connect.
#
# server_encryption_options:
# internode_encryption: none
# certificate: conf/scylla.crt


@@ -144,8 +144,12 @@ def flag_supported(flag, compiler):
def gold_supported(compiler):
src_main = 'int main(int argc, char **argv) { return 0; }'
if try_compile_and_link(source=src_main, flags=['-fuse-ld=gold'], compiler=compiler):
return '-fuse-ld=gold'
link_flags = ['-fuse-ld=gold']
if try_compile_and_link(source=src_main, flags=link_flags, compiler=compiler):
threads_flag = '-Wl,--threads'
if try_compile_and_link(source=src_main, flags=link_flags + [threads_flag], compiler=compiler):
link_flags.append(threads_flag)
return ' '.join(link_flags)
else:
print('Note: gold not found; using default system linker')
return ''
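The incremental probing pattern used by `gold_supported` above relies on a compile-and-link check; a minimal stand-alone sketch of such a probe (hypothetical and simplified, not what configure.py actually does) might look like:

```python
import os
import subprocess
import tempfile

def try_compile_and_link(source: str, flags: list, compiler: str = "cc") -> bool:
    # Write the probe source to a temp file and see whether the compiler
    # can build and link it with the candidate flags.
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "probe.c")
        out = os.path.join(tmpdir, "probe")
        with open(src, "w") as f:
            f.write(source)
        try:
            result = subprocess.run([compiler, src, "-o", out] + flags,
                                    capture_output=True)
        except FileNotFoundError:
            # The compiler itself is missing: the probe fails.
            return False
        return result.returncode == 0
```

Flags that pass the probe are appended one by one, so an unsupported flag (such as `-Wl,--threads` on an older gold) is simply skipped.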
@@ -257,139 +261,142 @@ modes = {
}
scylla_tests = [
'tests/mutation_test',
'tests/mvcc_test',
'tests/mutation_fragment_test',
'tests/flat_mutation_reader_test',
'tests/schema_registry_test',
'tests/canonical_mutation_test',
'tests/range_test',
'tests/types_test',
'tests/keys_test',
'tests/partitioner_test',
'tests/frozen_mutation_test',
'tests/serialized_action_test',
'tests/hint_test',
'tests/clustering_ranges_walker_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
'tests/row_cache_alloc_stress',
'tests/perf_row_cache_update',
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/perf/perf_simple_query',
'tests/perf/perf_fast_forward',
'tests/perf/perf_cache_eviction',
'tests/cache_flat_mutation_reader_test',
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/perf/perf_sstable',
'tests/cdc_test',
'tests/cql_query_test',
'tests/user_types_test',
'tests/secondary_index_test',
'tests/json_cql_query_test',
'tests/filtering_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
'tests/mutation_query_test',
'tests/row_cache_test',
'tests/test-serialization',
'tests/broken_sstable_test',
'tests/sstable_test',
'tests/sstable_datafile_test',
'tests/sstable_3_x_test',
'tests/sstable_mutation_test',
'tests/sstable_resharding_test',
'tests/memtable_test',
'tests/commitlog_test',
'tests/cartesian_product_test',
'tests/hash_test',
'tests/map_difference_test',
'tests/message',
'tests/gossip',
'tests/gossip_test',
'tests/compound_test',
'tests/config_test',
'tests/gossiping_property_file_snitch_test',
'tests/ec2_snitch_test',
'tests/gce_snitch_test',
'tests/snitch_reset_test',
'tests/network_topology_strategy_test',
'tests/query_processor_test',
'tests/batchlog_manager_test',
'tests/bytes_ostream_test',
'tests/UUID_test',
'tests/murmur_hash_test',
'tests/allocation_strategy_test',
'tests/logalloc_test',
'tests/log_heap_test',
'tests/managed_vector_test',
'tests/crc_test',
'tests/checksum_utils_test',
'tests/flush_queue_test',
'tests/dynamic_bitset_test',
'tests/auth_test',
'tests/idl_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/database_test',
'tests/nonwrapping_range_test',
'tests/input_stream_test',
'tests/virtual_reader_test',
'tests/view_schema_test',
'tests/view_build_test',
'tests/view_complex_test',
'tests/counter_test',
'tests/cell_locker_test',
'tests/row_locker_test',
'tests/streaming_histogram_test',
'tests/duration_test',
'tests/vint_serialization_test',
'tests/continuous_data_consumer_test',
'tests/compress_test',
'tests/chunked_vector_test',
'tests/loading_cache_test',
'tests/castas_fcts_test',
'tests/big_decimal_test',
'tests/aggregate_fcts_test',
'tests/role_manager_test',
'tests/caching_options_test',
'tests/auth_resource_test',
'tests/cql_auth_query_test',
'tests/enum_set_test',
'tests/extensions_test',
'tests/cql_auth_syntax_test',
'tests/querier_cache',
'tests/limiting_data_source_test',
'tests/meta_test',
'tests/imr_test',
'tests/partition_data_test',
'tests/reusable_buffer_test',
'tests/mutation_writer_test',
'tests/observable_test',
'tests/transport_test',
'tests/fragmented_temporary_buffer_test',
'tests/json_test',
'tests/auth_passwords_test',
'tests/multishard_mutation_query_test',
'tests/top_k_test',
'tests/utf8_test',
'tests/small_vector_test',
'tests/data_listeners_test',
'tests/truncation_migration_test',
'tests/like_matcher_test',
'tests/enum_option_test',
'test/boost/UUID_test',
'test/boost/aggregate_fcts_test',
'test/boost/allocation_strategy_test',
'test/boost/anchorless_list_test',
'test/boost/auth_passwords_test',
'test/boost/auth_resource_test',
'test/boost/auth_test',
'test/boost/batchlog_manager_test',
'test/boost/big_decimal_test',
'test/boost/broken_sstable_test',
'test/boost/bytes_ostream_test',
'test/boost/cache_flat_mutation_reader_test',
'test/boost/caching_options_test',
'test/boost/canonical_mutation_test',
'test/boost/cartesian_product_test',
'test/boost/castas_fcts_test',
'test/boost/cdc_test',
'test/boost/cell_locker_test',
'test/boost/checksum_utils_test',
'test/boost/chunked_vector_test',
'test/boost/clustering_ranges_walker_test',
'test/boost/commitlog_test',
'test/boost/compound_test',
'test/boost/compress_test',
'test/boost/config_test',
'test/boost/continuous_data_consumer_test',
'test/boost/counter_test',
'test/boost/cql_auth_query_test',
'test/boost/cql_auth_syntax_test',
'test/boost/cql_query_test',
'test/boost/crc_test',
'test/boost/data_listeners_test',
'test/boost/database_test',
'test/boost/duration_test',
'test/boost/dynamic_bitset_test',
'test/boost/enum_option_test',
'test/boost/enum_set_test',
'test/boost/extensions_test',
'test/boost/filtering_test',
'test/boost/flat_mutation_reader_test',
'test/boost/flush_queue_test',
'test/boost/fragmented_temporary_buffer_test',
'test/boost/frozen_mutation_test',
'test/boost/gossip_test',
'test/boost/gossiping_property_file_snitch_test',
'test/boost/hash_test',
'test/boost/idl_test',
'test/boost/input_stream_test',
'test/boost/json_cql_query_test',
'test/boost/keys_test',
'test/boost/like_matcher_test',
'test/boost/limiting_data_source_test',
'test/boost/linearizing_input_stream_test',
'test/boost/loading_cache_test',
'test/boost/log_heap_test',
'test/boost/logalloc_test',
'test/boost/managed_vector_test',
'test/boost/map_difference_test',
'test/boost/memtable_test',
'test/boost/meta_test',
'test/boost/multishard_mutation_query_test',
'test/boost/murmur_hash_test',
'test/boost/mutation_fragment_test',
'test/boost/mutation_query_test',
'test/boost/mutation_reader_test',
'test/boost/mutation_test',
'test/boost/mutation_writer_test',
'test/boost/mvcc_test',
'test/boost/network_topology_strategy_test',
'test/boost/nonwrapping_range_test',
'test/boost/observable_test',
'test/boost/partitioner_test',
'test/boost/querier_cache_test',
'test/boost/query_processor_test',
'test/boost/range_test',
'test/boost/range_tombstone_list_test',
'test/boost/reusable_buffer_test',
'test/boost/role_manager_test',
'test/boost/row_cache_test',
'test/boost/schema_change_test',
'test/boost/schema_registry_test',
'test/boost/secondary_index_test',
'test/boost/serialization_test',
'test/boost/serialized_action_test',
'test/boost/small_vector_test',
'test/boost/snitch_reset_test',
'test/boost/sstable_3_x_test',
'test/boost/sstable_datafile_test',
'test/boost/sstable_mutation_test',
'test/boost/sstable_resharding_test',
'test/boost/sstable_test',
'test/boost/storage_proxy_test',
'test/boost/top_k_test',
'test/boost/transport_test',
'test/boost/truncation_migration_test',
'test/boost/types_test',
'test/boost/user_function_test',
'test/boost/user_types_test',
'test/boost/utf8_test',
'test/boost/view_build_test',
'test/boost/view_complex_test',
'test/boost/view_schema_test',
'test/boost/vint_serialization_test',
'test/boost/virtual_reader_test',
'test/manual/ec2_snitch_test',
'test/manual/gce_snitch_test',
'test/manual/gossip',
'test/manual/hint_test',
'test/manual/imr_test',
'test/manual/json_test',
'test/manual/message',
'test/manual/partition_data_test',
'test/manual/row_locker_test',
'test/manual/streaming_histogram_test',
'test/perf/perf_cache_eviction',
'test/perf/perf_cql_parser',
'test/perf/perf_fast_forward',
'test/perf/perf_hash',
'test/perf/perf_mutation',
'test/perf/perf_row_cache_update',
'test/perf/perf_simple_query',
'test/perf/perf_sstable',
'test/tools/cql_repl',
'test/unit/lsa_async_eviction_test',
'test/unit/lsa_sync_eviction_test',
'test/unit/memory_footprint_test',
'test/unit/row_cache_alloc_stress_test',
'test/unit/row_cache_stress_test',
]
perf_tests = [
'tests/perf/perf_mutation_readers',
'tests/perf/perf_checksum',
'tests/perf/perf_mutation_fragment',
'tests/perf/perf_idl',
'tests/perf/perf_vint',
'test/perf/perf_mutation_readers',
'test/perf/perf_checksum',
'test/perf/perf_mutation_fragment',
'test/perf/perf_idl',
'test/perf/perf_vint',
]
apps = [
@@ -432,8 +439,6 @@ arg_parser.add_argument('--dpdk-target', action='store', dest='dpdk_target', def
help='Path to DPDK SDK target location (e.g. <DPDK SDK dir>/x86_64-native-linuxapp-gcc)')
arg_parser.add_argument('--debuginfo', action='store', dest='debuginfo', type=int, default=1,
help='Enable(1)/disable(0)compiler debug information generation')
arg_parser.add_argument('--compress-exec-debuginfo', action='store', dest='compress_exec_debuginfo', type=int, default=1,
help='Enable(1)/disable(0) debug information compression in executables')
arg_parser.add_argument('--static-stdc++', dest='staticcxx', action='store_true',
help='Link libgcc and libstdc++ statically')
arg_parser.add_argument('--static-thrift', dest='staticthrift', action='store_true',
@@ -456,6 +461,8 @@ arg_parser.add_argument('--enable-alloc-failure-injector', dest='alloc_failure_i
help='enable allocation failure injection')
arg_parser.add_argument('--with-antlr3', dest='antlr3_exec', action='store', default=None,
help='path to antlr3 executable')
arg_parser.add_argument('--with-ragel', dest='ragel_exec', action='store', default='ragel',
help='path to ragel executable')
args = arg_parser.parse_args()
defines = ['XXH_PRIVATE_API',
@@ -470,6 +477,7 @@ scylla_core = (['database.cc',
'table.cc',
'atomic_cell.cc',
'collection_mutation.cc',
'connection_notifier.cc',
'hashers.cc',
'schema.cc',
'frozen_schema.cc',
@@ -489,6 +497,7 @@ scylla_core = (['database.cc',
'utils/buffer_input_stream.cc',
'utils/limiting_data_source.cc',
'utils/updateable_value.cc',
'utils/directories.cc',
'mutation_partition.cc',
'mutation_partition_view.cc',
'mutation_partition_serializer.cc',
@@ -509,6 +518,7 @@ scylla_core = (['database.cc',
'sstables/partition.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/size_tiered_compaction_strategy.cc',
'sstables/leveled_compaction_strategy.cc',
'sstables/compaction_manager.cc',
'sstables/integrity_checked_file_impl.cc',
@@ -519,6 +529,7 @@ scylla_core = (['database.cc',
'transport/server.cc',
'transport/messages/result_message.cc',
'cdc/cdc.cc',
'cql3/type_json.cc',
'cql3/abstract_marker.cc',
'cql3/attributes.cc',
'cql3/cf_name.cc',
@@ -530,7 +541,9 @@ scylla_core = (['database.cc',
'cql3/sets.cc',
'cql3/tuples.cc',
'cql3/maps.cc',
'cql3/functions/user_function.cc',
'cql3/functions/functions.cc',
'cql3/functions/aggregate_fcts.cc',
'cql3/functions/castas_fcts.cc',
'cql3/statements/cf_prop_defs.cc',
'cql3/statements/cf_statement.cc',
@@ -539,13 +552,16 @@ scylla_core = (['database.cc',
'cql3/statements/create_table_statement.cc',
'cql3/statements/create_view_statement.cc',
'cql3/statements/create_type_statement.cc',
'cql3/statements/create_function_statement.cc',
'cql3/statements/drop_index_statement.cc',
'cql3/statements/drop_keyspace_statement.cc',
'cql3/statements/drop_table_statement.cc',
'cql3/statements/drop_view_statement.cc',
'cql3/statements/drop_type_statement.cc',
'cql3/statements/drop_function_statement.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
'cql3/statements/function_statement.cc',
'cql3/statements/modification_statement.cc',
'cql3/statements/cas_request.cc',
'cql3/statements/parsed_statement.cc',
@@ -745,6 +761,7 @@ scylla_core = (['database.cc',
'utils/ascii.cc',
'utils/like_matcher.cc',
'mutation_writer/timestamp_based_splitting_writer.cc',
'lua.cc',
] + [Antlr3Grammar('cql3/Cql.g')] + [Thrift('interface/cassandra.thrift', 'Cassandra')]
)
@@ -797,6 +814,21 @@ alternator = [
'alternator/auth.cc',
]
redis = [
'redis/service.cc',
'redis/server.cc',
'redis/query_processor.cc',
'redis/protocol_parser.rl',
'redis/keyspace_utils.cc',
'redis/options.cc',
'redis/stats.cc',
'redis/mutation_utils.cc',
'redis/query_utils.cc',
'redis/abstract_command.cc',
'redis/command_factory.cc',
'redis/commands.cc',
]
idls = ['idl/gossip_digest.idl.hh',
'idl/uuid.idl.hh',
'idl/range.idl.hh',
@@ -828,72 +860,73 @@ idls = ['idl/gossip_digest.idl.hh',
headers = find_headers('.', excluded_dirs=['idl', 'build', 'seastar', '.git'])
scylla_tests_generic_dependencies = [
'tests/cql_test_env.cc',
'tests/test_services.cc',
'test/lib/cql_test_env.cc',
'test/lib/test_services.cc',
]
scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependencies + [
'tests/cql_assertions.cc',
'tests/result_set_assertions.cc',
'tests/mutation_source_test.cc',
'tests/data_model.cc',
'tests/exception_utils.cc',
'tests/random_schema.cc',
'test/lib/cql_assertions.cc',
'test/lib/result_set_assertions.cc',
'test/lib/mutation_source_test.cc',
'test/lib/data_model.cc',
'test/lib/exception_utils.cc',
'test/lib/random_schema.cc',
]
deps = {
'scylla': idls + ['main.cc', 'release.cc'] + scylla_core + api + alternator,
'scylla': idls + ['main.cc', 'release.cc', 'build_id.cc'] + scylla_core + api + alternator + redis,
}
pure_boost_tests = set([
'tests/map_difference_test',
'tests/keys_test',
'tests/compound_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/nonwrapping_range_test',
'tests/test-serialization',
'tests/range_test',
'tests/crc_test',
'tests/checksum_utils_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
'tests/cartesian_product_test',
'tests/streaming_histogram_test',
'tests/duration_test',
'tests/vint_serialization_test',
'tests/compress_test',
'tests/chunked_vector_test',
'tests/big_decimal_test',
'tests/caching_options_test',
'tests/auth_resource_test',
'tests/enum_set_test',
'tests/cql_auth_syntax_test',
'tests/meta_test',
'tests/observable_test',
'tests/json_test',
'tests/auth_passwords_test',
'tests/top_k_test',
'tests/small_vector_test',
'tests/like_matcher_test',
'tests/enum_option_test',
'test/boost/anchorless_list_test',
'test/boost/auth_passwords_test',
'test/boost/auth_resource_test',
'test/boost/big_decimal_test',
'test/boost/caching_options_test',
'test/boost/cartesian_product_test',
'test/boost/checksum_utils_test',
'test/boost/chunked_vector_test',
'test/boost/compound_test',
'test/boost/compress_test',
'test/boost/cql_auth_syntax_test',
'test/boost/crc_test',
'test/boost/duration_test',
'test/boost/dynamic_bitset_test',
'test/boost/enum_option_test',
'test/boost/enum_set_test',
'test/boost/idl_test',
'test/boost/keys_test',
'test/boost/like_matcher_test',
'test/boost/linearizing_input_stream_test',
'test/boost/map_difference_test',
'test/boost/meta_test',
'test/boost/nonwrapping_range_test',
'test/boost/observable_test',
'test/boost/range_test',
'test/boost/range_tombstone_list_test',
'test/boost/serialization_test',
'test/boost/small_vector_test',
'test/boost/top_k_test',
'test/boost/vint_serialization_test',
'test/manual/json_test',
'test/manual/streaming_histogram_test',
])
tests_not_using_seastar_test_framework = set([
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
'tests/row_cache_alloc_stress',
'tests/perf_row_cache_update',
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/message',
'tests/perf/perf_cache_eviction',
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/gossip',
'tests/perf/perf_sstable',
'tests/small_vector_test',
'test/boost/small_vector_test',
'test/manual/gossip',
'test/manual/message',
'test/perf/perf_cache_eviction',
'test/perf/perf_cql_parser',
'test/perf/perf_hash',
'test/perf/perf_mutation',
'test/perf/perf_row_cache_update',
'test/perf/perf_sstable',
'test/unit/lsa_async_eviction_test',
'test/unit/lsa_sync_eviction_test',
'test/unit/memory_footprint_test',
'test/unit/row_cache_alloc_stress_test',
'test/unit/row_cache_stress_test',
]) | pure_boost_tests
for t in tests_not_using_seastar_test_framework:
@@ -914,28 +947,29 @@ perf_tests_seastar_deps = [
for t in perf_tests:
deps[t] = [t + '.cc'] + scylla_tests_dependencies + perf_tests_seastar_deps
deps['tests/sstable_test'] += ['tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
deps['tests/sstable_datafile_test'] += ['tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
deps['tests/mutation_reader_test'] += ['tests/sstable_utils.cc']
deps['test/boost/sstable_test'] += ['test/lib/sstable_utils.cc', 'test/lib/normalizing_reader.cc']
deps['test/boost/sstable_datafile_test'] += ['test/lib/sstable_utils.cc', 'test/lib/normalizing_reader.cc']
deps['test/boost/mutation_reader_test'] += ['test/lib/sstable_utils.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/input_stream_test'] = ['tests/input_stream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'hashers.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/log_heap_test'] = ['tests/log_heap_test.cc']
deps['tests/anchorless_list_test'] = ['tests/anchorless_list_test.cc']
deps['tests/perf/perf_fast_forward'] += ['release.cc']
deps['tests/perf/perf_simple_query'] += ['release.cc']
deps['tests/meta_test'] = ['tests/meta_test.cc']
deps['tests/imr_test'] = ['tests/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/reusable_buffer_test'] = ['tests/reusable_buffer_test.cc']
deps['tests/utf8_test'] = ['utils/utf8.cc', 'tests/utf8_test.cc']
deps['tests/small_vector_test'] = ['tests/small_vector_test.cc']
deps['tests/multishard_mutation_query_test'] += ['tests/test_table.cc']
deps['tests/vint_serialization_test'] = ['tests/vint_serialization_test.cc', 'vint-serialization.cc', 'bytes.cc']
deps['test/boost/bytes_ostream_test'] = ['test/boost/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
deps['test/boost/UUID_test'] = ['utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'hashers.cc']
deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
deps['test/boost/anchorless_list_test'] = ['test/boost/anchorless_list_test.cc']
deps['test/perf/perf_fast_forward'] += ['release.cc']
deps['test/perf/perf_simple_query'] += ['release.cc']
deps['test/boost/meta_test'] = ['test/boost/meta_test.cc']
deps['test/manual/imr_test'] = ['test/manual/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/reusable_buffer_test'] = ['test/boost/reusable_buffer_test.cc']
deps['test/boost/utf8_test'] = ['utils/utf8.cc', 'test/boost/utf8_test.cc']
deps['test/boost/small_vector_test'] = ['test/boost/small_vector_test.cc']
deps['test/boost/multishard_mutation_query_test'] += ['test/boost/test_table.cc']
deps['test/boost/vint_serialization_test'] = ['test/boost/vint_serialization_test.cc', 'vint-serialization.cc', 'bytes.cc']
deps['test/boost/linearizing_input_stream_test'] = ['test/boost/linearizing_input_stream_test.cc']
deps['tests/duration_test'] += ['tests/exception_utils.cc']
deps['test/boost/duration_test'] += ['test/lib/exception_utils.cc']
deps['utils/gz/gen_crc_combine_table'] = ['utils/gz/gen_crc_combine_table.cc']
@@ -978,9 +1012,13 @@ modes['release']['cxx_ld_flags'] += ' ' + ' '.join(optimization_flags)
gold_linker_flag = gold_supported(compiler=args.cxx)
dbgflag = '-g' if args.debuginfo else ''
dbgflag = '-g -gz' if args.debuginfo else ''
tests_link_rule = 'link' if args.tests_debuginfo else 'link_stripped'
# Strip if debuginfo is disabled, otherwise we end up with partial
# debug info from the libraries we static link with
regular_link_rule = 'link' if args.debuginfo else 'link_stripped'
if args.so:
args.pie = '-shared'
args.fpie = '-fpic'
@@ -997,6 +1035,10 @@ else:
optional_packages = [['libsystemd', 'libsystemd-daemon']]
pkgs = []
# Lua can be provided by lua53 package on Debian-like
# systems and by Lua on others.
pkgs.append('lua53' if have_pkg('lua53') else 'lua')
def setup_first_pkg_of_list(pkglist):
# The HAVE_pkg symbol is taken from the first alternative
@@ -1087,12 +1129,6 @@ scylla_release = file.read().strip()
extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""
# We never compress debug info in debug mode
modes['debug']['cxxflags'] += ' -gz'
# We compress it by default in release mode
flag_dest = 'cxx_ld_flags' if args.compress_exec_debuginfo else 'cxxflags'
modes['release'][flag_dest] += ' -gz'
for m in ['debug', 'release', 'sanitize']:
modes[m]['cxxflags'] += ' ' + dbgflag
@@ -1233,6 +1269,11 @@ if args.antlr3_exec:
else:
antlr3_exec = "antlr3"
if args.ragel_exec:
ragel_exec = args.ragel_exec
else:
ragel_exec = "ragel"
for mode in build_modes:
configure_zstd(outdir, mode)
@@ -1249,6 +1290,7 @@ with open(buildfile_tmp, 'w') as f:
cxx = {cxx}
cxxflags = {user_cflags} {warnings} {defines}
ldflags = {gold_linker_flag} {user_ldflags}
ldflags_build = {gold_linker_flag}
libs = {libs}
pool link_pool
depth = {link_pool_depth}
@@ -1267,6 +1309,11 @@ with open(buildfile_tmp, 'w') as f:
command = {ninja} -C $subdir $target
restat = 1
description = NINJA $out
rule ragel
# sed away a bug in ragel 7 that emits some extraneous _nfa* variables
# (the $$ is collapsed to a single one by ninja)
command = {ragel_exec} -G2 -o $out $in && sed -i -e '1h;2,$$H;$$!d;g' -re 's/static const char _nfa[^;]*;//g' $out
description = RAGEL $out
rule run
command = $in > $out
description = GEN $out
@@ -1286,7 +1333,7 @@ with open(buildfile_tmp, 'w') as f:
libs_{mode} = -l{fmt_lib}
seastar_libs_{mode} = {seastar_libs}
rule cxx.{mode}
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags -c -o $out $in
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in
description = CXX $out
depfile = $out.d
rule link.{mode}
@@ -1297,6 +1344,10 @@ with open(buildfile_tmp, 'w') as f:
command = $cxx $ld_flags_{mode} -s $ldflags -o $out $in $libs $libs_{mode}
description = LINK (stripped) $out
pool = link_pool
rule link_build.{mode}
command = $cxx $ld_flags_{mode} $ldflags_build -o $out $in $libs $libs_{mode}
description = LINK (build) $out
pool = link_pool
rule ar.{mode}
command = rm -f $out; ar cr $out $in; ranlib $out
description = AR $out
@@ -1331,8 +1382,10 @@ with open(buildfile_tmp, 'w') as f:
swaggers = {}
serializers = {}
thrifts = set()
ragels = {}
antlr3_grammars = set()
seastar_dep = 'build/{}/seastar/libseastar.a'.format(mode)
seastar_testing_dep = 'build/{}/seastar/libseastar_testing.a'.format(mode)
for binary in build_artifacts:
if binary in other:
continue
@@ -1356,7 +1409,7 @@ with open(buildfile_tmp, 'w') as f:
'zstd/lib/libzstd.a',
]])
objs.append('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.o')
if binary.startswith('tests/'):
if binary.startswith('test/'):
local_libs = '$seastar_libs_{} $libs'.format(mode)
if binary in pure_boost_tests:
local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
@@ -1370,12 +1423,12 @@ with open(buildfile_tmp, 'w') as f:
# So we strip the tests by default; The user can very
# quickly re-link the test unstripped by adding a "_g"
# to the test name, e.g., "ninja build/release/testname_g"
f.write('build $builddir/{}/{}: {}.{} {} | {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs), seastar_dep))
f.write('build $builddir/{}/{}: {}.{} {} | {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
f.write(' libs = {}\n'.format(local_libs))
f.write('build $builddir/{}/{}_g: link.{} {} | {}\n'.format(mode, binary, mode, str.join(' ', objs), seastar_dep))
f.write('build $builddir/{}/{}_g: {}.{} {} | {} {}\n'.format(mode, binary, regular_link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
f.write(' libs = {}\n'.format(local_libs))
else:
f.write('build $builddir/{}/{}: link.{} {} | {}\n'.format(mode, binary, mode, str.join(' ', objs), seastar_dep))
f.write('build $builddir/{}/{}: {}.{} {} | {}\n'.format(mode, binary, regular_link_rule, mode, str.join(' ', objs), seastar_dep))
if has_thrift:
f.write(' libs = {} {} $seastar_libs_{} $libs\n'.format(thrift_libs, maybe_static(args.staticboost, '-lboost_system'), mode))
for src in srcs:
@@ -1388,6 +1441,9 @@ with open(buildfile_tmp, 'w') as f:
elif src.endswith('.json'):
hh = '$builddir/' + mode + '/gen/' + src + '.hh'
swaggers[hh] = src
elif src.endswith('.rl'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
ragels[hh] = src
elif src.endswith('.thrift'):
thrifts.add(src)
elif src.endswith('.g'):
@@ -1398,7 +1454,7 @@ with open(buildfile_tmp, 'w') as f:
compiles['$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'] = 'utils/gz/gen_crc_combine_table.cc'
f.write('build {}: run {}\n'.format('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.cc',
'$builddir/' + mode + '/utils/gz/gen_crc_combine_table'))
f.write('build {}: link.{} {}\n'.format('$builddir/' + mode + '/utils/gz/gen_crc_combine_table', mode,
f.write('build {}: link_build.{} {}\n'.format('$builddir/' + mode + '/utils/gz/gen_crc_combine_table', mode,
'$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'))
f.write(' libs = $seastar_libs_{}\n'.format(mode))
f.write(
@@ -1416,6 +1472,7 @@ with open(buildfile_tmp, 'w') as f:
gen_headers += g.headers('$builddir/{}/gen'.format(mode))
gen_headers += list(swaggers.keys())
gen_headers += list(serializers.keys())
gen_headers += list(ragels.keys())
gen_headers_dep = ' '.join(gen_headers)
for obj in compiles:
@@ -1429,6 +1486,9 @@ with open(buildfile_tmp, 'w') as f:
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh, src))
for hh in ragels:
src = ragels[hh]
f.write('build {}: ragel {}\n'.format(hh, src))
for thrift in thrifts:
outs = ' '.join(thrift.generated('$builddir/{}/gen'.format(mode)))
f.write('build {}: thrift.{} {}\n'.format(outs, mode, thrift.source))
@@ -1442,9 +1502,12 @@ with open(buildfile_tmp, 'w') as f:
for cc in grammar.sources('$builddir/{}/gen'.format(mode)):
obj = cc.replace('.cpp', '.o')
f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
if cc.endswith('Parser.cpp') and has_sanitize_address_use_after_scope:
# Parsers end up using huge amounts of stack space and overflowing their stack
f.write(' obj_cxxflags = -fno-sanitize-address-use-after-scope\n')
if cc.endswith('Parser.cpp'):
# Unoptimized parsers end up using huge amounts of stack space and overflowing their stack
flags = '-O1'
if has_sanitize_address_use_after_scope:
flags += ' -fno-sanitize-address-use-after-scope'
f.write(' obj_cxxflags = %s\n' % flags)
for hh in headers:
f.write('build $builddir/{mode}/{hh}.o: checkhh.{mode} {hh} || {gen_headers_dep}\n'.format(
mode=mode, hh=hh, gen_headers_dep=gen_headers_dep))
@@ -1453,7 +1516,12 @@ with open(buildfile_tmp, 'w') as f:
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar seastar_testing\n'.format(**locals()))
f.write(' target = seastar\n'.format(**locals()))
f.write('build build/{mode}/seastar/libseastar_testing.a: ninja\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar_testing\n'.format(**locals()))
f.write('build build/{mode}/seastar/apps/iotune/iotune: ninja\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
@@ -1481,7 +1549,7 @@ with open(buildfile_tmp, 'w') as f:
rule configure
command = {python} configure.py $configure_args
generator = 1
build build.ninja: configure | configure.py
build build.ninja: configure | configure.py SCYLLA-VERSION-GEN
rule cscope
command = find -name '*.[chS]' -o -name "*.cc" -o -name "*.hh" | cscope -bq -i-
description = CSCOPE
@@ -1490,6 +1558,10 @@ with open(buildfile_tmp, 'w') as f:
command = rm -rf build
description = CLEAN
build clean: clean
rule mode_list
command = echo {modes_list}
description = List configured modes
build mode_list: mode_list
default {modes_list}
''').format(modes_list=' '.join(default_modes), **globals()))
f.write(textwrap.dedent('''\

connection_notifier.cc Normal file

@@ -0,0 +1,71 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "connection_notifier.hh"
#include "db/query_context.hh"
#include "cql3/constants.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <stdexcept>
namespace db::system_keyspace {
extern const char *const CLIENTS;
}
static sstring to_string(client_type ct) {
switch (ct) {
case client_type::cql: return "cql";
case client_type::thrift: return "thrift";
case client_type::alternator: return "alternator";
default: throw std::runtime_error("Invalid client_type");
}
}
future<> notify_new_client(client_data cd) {
// FIXME: consider prepared statement
const static sstring req
= format("INSERT INTO system.{} (address, port, client_type, shard_id, protocol_version, username) "
"VALUES (?, ?, ?, ?, ?, ?);", db::system_keyspace::CLIENTS);
return db::execute_cql(req,
std::move(cd.ip), cd.port, to_string(cd.ct), cd.shard_id,
cd.protocol_version.has_value() ? data_value(*cd.protocol_version) : data_value::make_null(int32_type),
cd.username.value_or("anonymous")).discard_result();
}
future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port) {
// FIXME: consider prepared statement
const static sstring req
= format("DELETE FROM system.{} where address=? AND port=? AND client_type=?;",
db::system_keyspace::CLIENTS);
return db::execute_cql(req, addr.addr(), port, to_string(ct)).discard_result();
}
future<> clear_clientlist() {
auto& db_local = service::get_storage_proxy().local().get_db().local();
return db_local.truncate(
db_local.find_keyspace(db::system_keyspace_name()),
db_local.find_column_family(db::system_keyspace_name(),
db::system_keyspace::CLIENTS),
[] { return make_ready_future<db_clock::time_point>(db_clock::now()); },
false /* with_snapshot */);
}

connection_notifier.hh Normal file

@@ -0,0 +1,57 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "gms/inet_address.hh"
#include <seastar/core/sstring.hh>
#include <optional>
enum class client_type {
cql = 0,
thrift,
alternator,
};
// Representation of a row in `system.clients'. std::optionals are for nullable cells.
struct client_data {
gms::inet_address ip;
int32_t port;
client_type ct;
int32_t shard_id; /// ID of server-side shard which is processing the connection.
// `optional' column means that it's nullable (possibly because it's
// unimplemented yet). If you want to fill ("implement") any of them,
// remember to update the query in `notify_new_client()'.
std::optional<sstring> connection_stage;
std::optional<sstring> driver_name;
std::optional<sstring> driver_version;
std::optional<sstring> hostname;
std::optional<int32_t> protocol_version;
std::optional<sstring> ssl_cipher_suite;
std::optional<bool> ssl_enabled;
std::optional<sstring> ssl_protocol;
std::optional<sstring> username;
};
future<> notify_new_client(client_data cd);
future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port);
future<> clear_clientlist();


@@ -43,12 +43,14 @@ options {
#include "cql3/statements/create_table_statement.hh"
#include "cql3/statements/create_view_statement.hh"
#include "cql3/statements/create_type_statement.hh"
#include "cql3/statements/create_function_statement.hh"
#include "cql3/statements/drop_type_statement.hh"
#include "cql3/statements/alter_type_statement.hh"
#include "cql3/statements/property_definitions.hh"
#include "cql3/statements/drop_index_statement.hh"
#include "cql3/statements/drop_table_statement.hh"
#include "cql3/statements/drop_view_statement.hh"
#include "cql3/statements/drop_function_statement.hh"
#include "cql3/statements/truncate_statement.hh"
#include "cql3/statements/raw/update_statement.hh"
#include "cql3/statements/raw/insert_statement.hh"
@@ -243,10 +245,14 @@ struct uninitialized {
return res;
}
bool convert_boolean_literal(std::string_view s) {
std::string lower_s(s.size(), '\0');
sstring to_lower(std::string_view s) {
sstring lower_s(s.size(), '\0');
std::transform(s.cbegin(), s.cend(), lower_s.begin(), &::tolower);
return lower_s == "true";
return lower_s;
}
bool convert_boolean_literal(std::string_view s) {
return to_lower(s) == "true";
}
void add_raw_update(std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>,::shared_ptr<cql3::operation::raw_update>>>& operations,
@@ -348,9 +354,9 @@ cqlStatement returns [shared_ptr<raw::parsed_statement> stmt]
| st25=createTypeStatement { $stmt = st25; }
| st26=alterTypeStatement { $stmt = st26; }
| st27=dropTypeStatement { $stmt = st27; }
#if 0
| st28=createFunctionStatement { $stmt = st28; }
| st29=dropFunctionStatement { $stmt = st29; }
#if 0
| st30=createAggregateStatement { $stmt = st30; }
| st31=dropAggregateStatement { $stmt = st31; }
#endif
@@ -686,54 +692,56 @@ dropAggregateStatement returns [DropAggregateStatement expr]
)?
{ $expr = new DropAggregateStatement(fn, argsTypes, argsPresent, ifExists); }
;
#endif
createFunctionStatement returns [CreateFunctionStatement expr]
createFunctionStatement returns [shared_ptr<cql3::statements::create_function_statement> expr]
@init {
boolean orReplace = false;
boolean ifNotExists = false;
bool or_replace = false;
bool if_not_exists = false;
boolean deterministic = true;
List<ColumnIdentifier> argsNames = new ArrayList<>();
List<CQL3Type.Raw> argsTypes = new ArrayList<>();
std::vector<shared_ptr<cql3::column_identifier>> arg_names;
std::vector<shared_ptr<cql3_type::raw>> arg_types;
bool called_on_null_input = false;
}
: K_CREATE (K_OR K_REPLACE { orReplace = true; })?
((K_NON { deterministic = false; })? K_DETERMINISTIC)?
K_FUNCTION
(K_IF K_NOT K_EXISTS { ifNotExists = true; })?
: K_CREATE
// "OR REPLACE" and "IF NOT EXISTS" cannot be used together
((K_OR K_REPLACE { or_replace = true; } K_FUNCTION)
| (K_FUNCTION K_IF K_NOT K_EXISTS { if_not_exists = true; })
| K_FUNCTION)
fn=functionName
'('
(
k=ident v=comparatorType { argsNames.add(k); argsTypes.add(v); }
( ',' k=ident v=comparatorType { argsNames.add(k); argsTypes.add(v); } )*
k=ident v=comparatorType { arg_names.push_back(k); arg_types.push_back(v); }
( ',' k=ident v=comparatorType { arg_names.push_back(k); arg_types.push_back(v); } )*
)?
')'
( (K_RETURNS K_NULL) | (K_CALLED { called_on_null_input = true; })) K_ON K_NULL K_INPUT
K_RETURNS rt = comparatorType
K_LANGUAGE language = IDENT
K_AS body = STRING_LITERAL
{ $expr = new CreateFunctionStatement(fn, $language.text.toLowerCase(), $body.text, deterministic, argsNames, argsTypes, rt, orReplace, ifNotExists); }
{ $expr = ::make_shared<cql3::statements::create_function_statement>(std::move(fn), to_lower($language.text), $body.text, std::move(arg_names), std::move(arg_types), std::move(rt), called_on_null_input, or_replace, if_not_exists); }
;
dropFunctionStatement returns [DropFunctionStatement expr]
dropFunctionStatement returns [shared_ptr<cql3::statements::drop_function_statement> expr]
@init {
boolean ifExists = false;
List<CQL3Type.Raw> argsTypes = new ArrayList<>();
boolean argsPresent = false;
bool if_exists = false;
std::vector<shared_ptr<cql3_type::raw>> arg_types;
bool args_present = false;
}
: K_DROP K_FUNCTION
(K_IF K_EXISTS { ifExists = true; } )?
(K_IF K_EXISTS { if_exists = true; } )?
fn=functionName
(
'('
(
v=comparatorType { argsTypes.add(v); }
( ',' v=comparatorType { argsTypes.add(v); } )*
v=comparatorType { arg_types.push_back(v); }
( ',' v=comparatorType { arg_types.push_back(v); } )*
)?
')'
{ argsPresent = true; }
{ args_present = true; }
)?
{ $expr = new DropFunctionStatement(fn, argsTypes, argsPresent, ifExists); }
{ $expr = ::make_shared<cql3::statements::drop_function_statement>(std::move(fn), std::move(arg_types), args_present, if_exists); }
;
#endif
/**
* CREATE KEYSPACE [IF NOT EXISTS] <KEYSPACE> WITH attr1 = value1 AND attr2 = value2;
@@ -1743,8 +1751,8 @@ basic_unreserved_keyword returns [sstring str]
| K_INITCOND
| K_RETURNS
| K_LANGUAGE
| K_NON
| K_DETERMINISTIC
| K_CALLED
| K_INPUT
| K_JSON
| K_CACHE
| K_BYPASS
@@ -1883,11 +1891,11 @@ K_STYPE: S T Y P E;
K_FINALFUNC: F I N A L F U N C;
K_INITCOND: I N I T C O N D;
K_RETURNS: R E T U R N S;
K_CALLED: C A L L E D;
K_INPUT: I N P U T;
K_LANGUAGE: L A N G U A G E;
K_NON: N O N;
K_OR: O R;
K_REPLACE: R E P L A C E;
K_DETERMINISTIC: D E T E R M I N I S T I C;
K_JSON: J S O N;
K_DEFAULT: D E F A U L T;
K_UNSET: U N S E T;


@@ -55,7 +55,7 @@ abstract_marker::abstract_marker(int32_t bind_index, ::shared_ptr<column_specifi
, _receiver{std::move(receiver)}
{ }
void abstract_marker::collect_marker_specification(::shared_ptr<variable_specifications> bound_names) {
void abstract_marker::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
bound_names->add(_bind_index, _receiver);
}


@@ -57,7 +57,7 @@ protected:
public:
abstract_marker(int32_t bind_index, ::shared_ptr<column_specification>&& receiver);
virtual void collect_marker_specification(::shared_ptr<variable_specifications> bound_names) override;
virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) override;
virtual bool contains_bind_marker() const override;


@@ -120,7 +120,7 @@ int32_t attributes::get_time_to_live(const query_options& options) {
return ttl;
}
void attributes::collect_marker_specification(::shared_ptr<variable_specifications> bound_names) {
void attributes::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
if (_timestamp) {
_timestamp->collect_marker_specification(bound_names);
}


@@ -69,7 +69,7 @@ public:
int32_t get_time_to_live(const query_options& options);
void collect_marker_specification(::shared_ptr<variable_specifications> bound_names);
void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names);
class raw {
public:


@@ -114,7 +114,7 @@ uint32_t read_and_check_list_index(const cql3::raw_value_view& key) {
namespace cql3 {
bool
column_condition::uses_function(const sstring& ks_name, const sstring& function_name) {
column_condition::uses_function(const sstring& ks_name, const sstring& function_name) const {
if (bool(_collection_element) && _collection_element->uses_function(ks_name, function_name)) {
return true;
}
@@ -131,7 +131,7 @@ column_condition::uses_function(const sstring& ks_name, const sstring& function_
return false;
}
void column_condition::collect_marker_specificaton(::shared_ptr<variable_specifications> bound_names) {
void column_condition::collect_marker_specificaton(lw_shared_ptr<variable_specifications> bound_names) {
if (_collection_element) {
_collection_element->collect_marker_specification(bound_names);
}


@@ -85,9 +85,9 @@ public:
* @param boundNames the list of column specification where to collect the
* bind variables of this term in.
*/
void collect_marker_specificaton(::shared_ptr<variable_specifications> bound_names);
void collect_marker_specificaton(lw_shared_ptr<variable_specifications> bound_names);
bool uses_function(const sstring& ks_name, const sstring& function_name);
bool uses_function(const sstring& ks_name, const sstring& function_name) const;
// Retrieve parameter marker values, if any, find the appropriate collection
// element if the cell is a collection and an element access is used in the expression,


@@ -31,13 +31,48 @@
#include "types/map.hh"
#include "types/set.hh"
#include "types/list.hh"
#include "concrete_types.hh"
namespace cql3 {
static cql3_type::kind get_cql3_kind(const abstract_type& t) {
struct visitor {
cql3_type::kind operator()(const ascii_type_impl&) { return cql3_type::kind::ASCII; }
cql3_type::kind operator()(const byte_type_impl&) { return cql3_type::kind::TINYINT; }
cql3_type::kind operator()(const bytes_type_impl&) { return cql3_type::kind::BLOB; }
cql3_type::kind operator()(const boolean_type_impl&) { return cql3_type::kind::BOOLEAN; }
cql3_type::kind operator()(const counter_type_impl&) { return cql3_type::kind::COUNTER; }
cql3_type::kind operator()(const decimal_type_impl&) { return cql3_type::kind::DECIMAL; }
cql3_type::kind operator()(const double_type_impl&) { return cql3_type::kind::DOUBLE; }
cql3_type::kind operator()(const duration_type_impl&) { return cql3_type::kind::DURATION; }
cql3_type::kind operator()(const empty_type_impl&) { return cql3_type::kind::EMPTY; }
cql3_type::kind operator()(const float_type_impl&) { return cql3_type::kind::FLOAT; }
cql3_type::kind operator()(const inet_addr_type_impl&) { return cql3_type::kind::INET; }
cql3_type::kind operator()(const int32_type_impl&) { return cql3_type::kind::INT; }
cql3_type::kind operator()(const long_type_impl&) { return cql3_type::kind::BIGINT; }
cql3_type::kind operator()(const short_type_impl&) { return cql3_type::kind::SMALLINT; }
cql3_type::kind operator()(const simple_date_type_impl&) { return cql3_type::kind::DATE; }
cql3_type::kind operator()(const utf8_type_impl&) { return cql3_type::kind::TEXT; }
cql3_type::kind operator()(const time_type_impl&) { return cql3_type::kind::TIME; }
cql3_type::kind operator()(const timestamp_date_base_class&) { return cql3_type::kind::TIMESTAMP; }
cql3_type::kind operator()(const timeuuid_type_impl&) { return cql3_type::kind::TIMEUUID; }
cql3_type::kind operator()(const uuid_type_impl&) { return cql3_type::kind::UUID; }
cql3_type::kind operator()(const varint_type_impl&) { return cql3_type::kind::VARINT; }
cql3_type::kind operator()(const reversed_type_impl& r) { return get_cql3_kind(*r.underlying_type()); }
cql3_type::kind operator()(const tuple_type_impl&) { assert(0 && "no kind for this type"); }
cql3_type::kind operator()(const collection_type_impl&) { assert(0 && "no kind for this type"); }
};
return visit(t, visitor{});
}
cql3_type::kind_enum_set::prepared cql3_type::get_kind() const {
return kind_enum_set::prepare(get_cql3_kind(*_type));
}
cql3_type cql3_type::raw::prepare(database& db, const sstring& keyspace) {
try {
auto&& ks = db.find_keyspace(keyspace);
return prepare_internal(keyspace, *ks.metadata()->user_types());
return prepare_internal(keyspace, ks.metadata()->user_types());
} catch (no_such_keyspace& nsk) {
throw exceptions::invalid_request_exception("Unknown keyspace " + keyspace);
}
@@ -66,7 +101,7 @@ public:
virtual cql3_type prepare(database& db, const sstring& keyspace) {
return _type;
}
cql3_type prepare_internal(const sstring&, user_types_metadata&) override {
cql3_type prepare_internal(const sstring&, const user_types_metadata&) override {
return _type;
}
@@ -123,7 +158,7 @@ public:
return true;
}
virtual cql3_type prepare_internal(const sstring& keyspace, user_types_metadata& user_types) override {
virtual cql3_type prepare_internal(const sstring& keyspace, const user_types_metadata& user_types) override {
assert(_values); // "Got null values type for a collection";
if (!is_frozen() && _values->supports_freezing() && !_values->is_frozen()) {
@@ -190,7 +225,7 @@ public:
_frozen = true;
}
virtual cql3_type prepare_internal(const sstring& keyspace, user_types_metadata& user_types) override {
virtual cql3_type prepare_internal(const sstring& keyspace, const user_types_metadata& user_types) override {
if (_name.has_keyspace()) {
// The provided keyspace is the one of the current statement this is part of. If it's different from the keyspace of
// the UTName, we reject since we want to limit user types to their own keyspace (see #6643)
@@ -249,7 +284,7 @@ public:
}
_frozen = true;
}
virtual cql3_type prepare_internal(const sstring& keyspace, user_types_metadata& user_types) override {
virtual cql3_type prepare_internal(const sstring& keyspace, const user_types_metadata& user_types) override {
if (!is_frozen()) {
freeze();
}
@@ -395,14 +430,42 @@ operator<<(std::ostream& os, const cql3_type::raw& r) {
namespace util {
sstring maybe_quote(const sstring& identifier) {
static const std::regex unquoted_identifier_re("[a-z][a-z0-9_]*");
if (std::regex_match(identifier.begin(), identifier.end(), unquoted_identifier_re)) {
const auto* p = identifier.begin();
const auto* ep = identifier.end();
// quote empty string
if (__builtin_expect(p == ep, false)) {
return "\"\"";
}
// string needs no quoting if it matches [a-z][a-z0-9_]*
// quotes ('"') in the string are doubled
bool need_quotes;
bool has_quotes;
auto c = *p;
if ('a' <= c && c <= 'z') {
need_quotes = false;
has_quotes = false;
} else {
need_quotes = true;
has_quotes = (c == '"');
}
while ((++p != ep) && !has_quotes) {
c = *p;
if (!(('a' <= c && c <= 'z') || ('0' <= c && c <= '9') || (c == '_'))) {
need_quotes = true;
has_quotes = (c == '"');
}
}
if (!need_quotes) {
return identifier;
}
if (!has_quotes) {
return make_sstring("\"", identifier, "\"");
}
static const std::regex double_quote_re("\"");
std::string result = identifier;
std::regex_replace(result, double_quote_re, "\"\"");
return '"' + result + '"';
return '"' + std::regex_replace(identifier.c_str(), double_quote_re, "\"\"") + '"';
}
}
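The quoting rules implemented above (leave `[a-z][a-z0-9_]*` identifiers alone, otherwise wrap in double quotes and double any embedded quotes) can be sketched standalone in plain C++. This is a minimal illustration using `std::string` in place of `sstring`, not the Scylla implementation:

```cpp
#include <cassert>
#include <string>

// Quote a CQL identifier unless it already matches [a-z][a-z0-9_]*.
// Embedded double quotes are doubled, mirroring CQL escaping rules.
std::string maybe_quote_sketch(const std::string& id) {
    bool need_quotes = id.empty(); // the empty string must be quoted
    for (size_t i = 0; i < id.size(); ++i) {
        char c = id[i];
        bool ok = ('a' <= c && c <= 'z')
               || (i > 0 && (('0' <= c && c <= '9') || c == '_'));
        if (!ok) {
            need_quotes = true;
        }
    }
    if (!need_quotes) {
        return id;
    }
    std::string out = "\"";
    for (char c : id) {
        out += c;
        if (c == '"') {
            out += c; // double embedded quotes
        }
    }
    out += '"';
    return out;
}
```

For example, `maybe_quote_sketch("foo")` returns `foo` unchanged, while `Foo`, the empty string, and strings containing `"` come back quoted. The single linear scan is what avoids the `std::regex` stack overflow on long inputs that motivated the patch.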


@@ -81,7 +81,7 @@ public:
virtual bool references_user_type(const sstring&) const;
virtual std::optional<sstring> keyspace() const;
virtual void freeze();
virtual cql3_type prepare_internal(const sstring& keyspace, user_types_metadata&) = 0;
virtual cql3_type prepare_internal(const sstring& keyspace, const user_types_metadata&) = 0;
virtual cql3_type prepare(database& db, const sstring& keyspace);
static shared_ptr<raw> from(cql3_type type);
static shared_ptr<raw> user_type(ut_name name);
@@ -103,6 +103,33 @@ private:
}
public:
enum class kind : int8_t {
ASCII, BIGINT, BLOB, BOOLEAN, COUNTER, DECIMAL, DOUBLE, EMPTY, FLOAT, INT, SMALLINT, TINYINT, INET, TEXT, TIMESTAMP, UUID, VARINT, TIMEUUID, DATE, TIME, DURATION
};
using kind_enum = super_enum<kind,
kind::ASCII,
kind::BIGINT,
kind::BLOB,
kind::BOOLEAN,
kind::COUNTER,
kind::DECIMAL,
kind::DOUBLE,
kind::EMPTY,
kind::FLOAT,
kind::INET,
kind::INT,
kind::SMALLINT,
kind::TINYINT,
kind::TEXT,
kind::TIMESTAMP,
kind::UUID,
kind::VARINT,
kind::TIMEUUID,
kind::DATE,
kind::TIME,
kind::DURATION>;
using kind_enum_set = enum_set<kind_enum>;
static thread_local cql3_type ascii;
static thread_local cql3_type bigint;
static thread_local cql3_type blob;
@@ -127,9 +154,7 @@ public:
static const std::vector<cql3_type>& values();
public:
using kind = abstract_type::cql3_kind;
using kind_enum_set = abstract_type::cql3_kind_enum_set;
kind_enum_set::prepared get_kind() const { return _type->get_cql3_kind(); }
kind_enum_set::prepared get_kind() const;
};
inline bool operator==(const cql3_type& a, const cql3_type& b) {


@@ -72,14 +72,14 @@ public:
timeout_config_selector get_timeout_config_selector() const { return _timeout_config_selector; }
virtual uint32_t get_bound_terms() = 0;
virtual uint32_t get_bound_terms() const = 0;
/**
* Perform any access verification necessary for the statement.
*
* @param state the current client state
*/
virtual future<> check_access(const service::client_state& state) = 0;
virtual future<> check_access(const service::client_state& state) const = 0;
/**
* Perform additional validation required by the statement.
@@ -87,7 +87,7 @@ public:
*
* @param state the current client state
*/
virtual void validate(service::storage_proxy& proxy, const service::client_state& state) = 0;
virtual void validate(service::storage_proxy& proxy, const service::client_state& state) const = 0;
/**
* Execute the statement and return the resulting result or null if there is no result.
@@ -96,7 +96,7 @@ public:
* @param options options for this query (consistency, variables, pageSize, ...)
*/
virtual future<::shared_ptr<cql_transport::messages::result_message>>
execute(service::storage_proxy& proxy, service::query_state& state, const query_options& options) = 0;
execute(service::storage_proxy& proxy, service::query_state& state, const query_options& options) const = 0;
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const = 0;


@@ -55,12 +55,12 @@ class error_collector : public error_listener<RecognizerType, ExceptionBaseType>
/**
* The offset of the first token of the snippet.
*/
static const int32_t FIRST_TOKEN_OFFSET = 10;
static constexpr int32_t FIRST_TOKEN_OFFSET = 10;
/**
* The offset of the last token of the snippet.
*/
static const int32_t LAST_TOKEN_OFFSET = 2;
static constexpr int32_t LAST_TOKEN_OFFSET = 2;
/**
* The CQL query.


@@ -48,6 +48,10 @@
#include <iosfwd>
#include <boost/functional/hash.hpp>
namespace std {
std::ostream& operator<<(std::ostream& os, const std::vector<data_type>& arg_types);
}
namespace cql3 {
namespace functions {
@@ -66,6 +70,9 @@ protected:
}
public:
virtual bool requires_thread() const;
virtual const function_name& name() const override {
return _name;
}
@@ -84,15 +91,15 @@ public:
&& _return_type == x._return_type;
}
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) override {
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override {
return _name.keyspace == ks_name && _name.name == function_name;
}
virtual bool has_reference_to(function& f) override {
virtual bool has_reference_to(function& f) const override {
return false;
}
virtual sstring column_name(const std::vector<sstring>& column_names) override {
virtual sstring column_name(const std::vector<sstring>& column_names) const override {
return format("{}({})", _name, join(", ", column_names));
}
@@ -103,12 +110,7 @@ inline
void
abstract_function::print(std::ostream& os) const {
os << _name << " : (";
for (size_t i = 0; i < _arg_types.size(); ++i) {
if (i > 0) {
os << ", ";
}
os << _arg_types[i]->as_cql3_type().to_string();
}
os << _arg_types;
os << ") -> " << _return_type->as_cql3_type().to_string();
}


@@ -0,0 +1,612 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2019 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "utils/big_decimal.hh"
#include "aggregate_fcts.hh"
#include "functions.hh"
#include "native_aggregate_function.hh"
#include "exceptions/exceptions.hh"
using namespace cql3;
using namespace functions;
using namespace aggregate_fcts;
namespace {
class impl_count_function : public aggregate_function::aggregate {
int64_t _count;
public:
virtual void reset() override {
_count = 0;
}
virtual opt_bytes compute(cql_serialization_format sf) override {
return long_type->decompose(_count);
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
++_count;
}
};
class count_rows_function final : public native_aggregate_function {
public:
count_rows_function() : native_aggregate_function(COUNT_ROWS_FUNCTION_NAME, long_type, {}) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_count_function>();
}
virtual sstring column_name(const std::vector<sstring>& column_names) const override {
return "count";
}
};
// We need a wider accumulator for sum and average,
// since summing the inputs can overflow the input type
template <typename T>
struct accumulator_for;
template <typename NarrowType, typename AccType>
static NarrowType checking_narrow(AccType acc) {
NarrowType ret = static_cast<NarrowType>(acc);
if (static_cast<AccType>(ret) != acc) {
throw exceptions::overflow_error_exception("Sum overflow. Values should be casted to a wider type.");
}
return ret;
}
template <>
struct accumulator_for<int8_t> {
using type = __int128;
static int8_t narrow(type acc) {
return checking_narrow<int8_t>(acc);
}
};
template <>
struct accumulator_for<int16_t> {
using type = __int128;
static int16_t narrow(type acc) {
return checking_narrow<int16_t>(acc);
}
};
template <>
struct accumulator_for<int32_t> {
using type = __int128;
static int32_t narrow(type acc) {
return checking_narrow<int32_t>(acc);
}
};
template <>
struct accumulator_for<int64_t> {
using type = __int128;
static int64_t narrow(type acc) {
return checking_narrow<int64_t>(acc);
}
};
template <>
struct accumulator_for<float> {
using type = float;
static auto narrow(type acc) {
return acc;
}
};
template <>
struct accumulator_for<double> {
using type = double;
static auto narrow(type acc) {
return acc;
}
};
template <>
struct accumulator_for<boost::multiprecision::cpp_int> {
using type = boost::multiprecision::cpp_int;
static auto narrow(type acc) {
return acc;
}
};
template <>
struct accumulator_for<big_decimal> {
using type = big_decimal;
static auto narrow(type acc) {
return acc;
}
};
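The `accumulator_for`/`checking_narrow` idea above (sum in a wider type, then narrow back and reject the result if the round-trip changes the value) can be demonstrated without any Scylla headers. A sketch using `int64_t` as the wide type and `std::overflow_error` in place of the Scylla exception type:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Narrow a wide accumulator back to the input type, throwing if the
// round-trip changes the value (i.e. the sum overflowed NarrowType).
template <typename NarrowType, typename AccType>
NarrowType checking_narrow(AccType acc) {
    NarrowType ret = static_cast<NarrowType>(acc);
    if (static_cast<AccType>(ret) != acc) {
        throw std::overflow_error(
            "Sum overflow. Values should be casted to a wider type.");
    }
    return ret;
}
```

For example, `checking_narrow<int8_t>(int64_t{100})` succeeds, while `checking_narrow<int8_t>(int64_t{200})` throws, since 200 does not survive the round-trip through `int8_t`.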
template <typename Type>
class impl_sum_function_for final : public aggregate_function::aggregate {
using accumulator_type = typename accumulator_for<Type>::type;
accumulator_type _sum{};
public:
virtual void reset() override {
_sum = {};
}
virtual opt_bytes compute(cql_serialization_format sf) override {
return data_type_for<Type>()->decompose(accumulator_for<Type>::narrow(_sum));
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
_sum += value_cast<Type>(data_type_for<Type>()->deserialize(*values[0]));
}
};
template <typename Type>
class sum_function_for final : public native_aggregate_function {
public:
sum_function_for() : native_aggregate_function("sum", data_type_for<Type>(), { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_sum_function_for<Type>>();
}
};
template <typename Type>
static
shared_ptr<aggregate_function>
make_sum_function() {
return make_shared<sum_function_for<Type>>();
}
template <typename Type>
class impl_div_for_avg {
public:
static Type div(const typename accumulator_for<Type>::type& x, const int64_t y) {
return x/y;
}
};
template <>
class impl_div_for_avg<big_decimal> {
public:
static big_decimal div(const big_decimal& x, const int64_t y) {
return x.div(y, big_decimal::rounding_mode::HALF_EVEN);
}
};
template <typename Type>
class impl_avg_function_for final : public aggregate_function::aggregate {
typename accumulator_for<Type>::type _sum{};
int64_t _count = 0;
public:
virtual void reset() override {
_sum = {};
_count = 0;
}
virtual opt_bytes compute(cql_serialization_format sf) override {
Type ret{};
if (_count) {
ret = impl_div_for_avg<Type>::div(_sum, _count);
}
return data_type_for<Type>()->decompose(ret);
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
++_count;
_sum += value_cast<Type>(data_type_for<Type>()->deserialize(*values[0]));
}
};
template <typename Type>
class avg_function_for final : public native_aggregate_function {
public:
avg_function_for() : native_aggregate_function("avg", data_type_for<Type>(), { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_avg_function_for<Type>>();
}
};
template <typename Type>
static
shared_ptr<aggregate_function>
make_avg_function() {
return make_shared<avg_function_for<Type>>();
}
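The avg implementation above keeps a wide running sum plus a count, skips nulls, and divides only at the end (returning a zero value for empty input, as `compute()` does). A minimal standalone sketch of that shape for `int32_t` inputs, with nulls already filtered out by the caller:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Average of int32_t inputs: accumulate in int64_t so the intermediate
// total cannot overflow the input type; divide only once at the end.
int32_t avg_int32(const std::vector<int32_t>& values) {
    int64_t sum = 0;
    int64_t count = 0;
    for (int32_t v : values) {
        sum += v;
        ++count;
    }
    // Empty input yields the type's zero value, matching compute() above.
    return count ? static_cast<int32_t>(sum / count) : 0;
}
```

Averaging two values of 2,000,000,000 would overflow an `int32_t` running sum, but works here because the intermediate total lives in `int64_t`.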
template <typename T>
struct aggregate_type_for {
using type = T;
};
template<>
struct aggregate_type_for<ascii_native_type> {
using type = ascii_native_type::primary_type;
};
template<>
struct aggregate_type_for<simple_date_native_type> {
using type = simple_date_native_type::primary_type;
};
template<>
struct aggregate_type_for<timeuuid_native_type> {
using type = timeuuid_native_type::primary_type;
};
template<>
struct aggregate_type_for<time_native_type> {
using type = time_native_type::primary_type;
};
template <typename Type>
const Type& max_wrapper(const Type& t1, const Type& t2) {
using std::max;
return max(t1, t2);
}
inline const net::inet_address& max_wrapper(const net::inet_address& t1, const net::inet_address& t2) {
using family = seastar::net::inet_address::family;
const size_t len =
(t1.in_family() == family::INET || t2.in_family() == family::INET)
? sizeof(::in_addr) : sizeof(::in6_addr);
return std::memcmp(t1.data(), t2.data(), len) >= 0 ? t1 : t2;
}
template <typename Type>
class impl_max_function_for final : public aggregate_function::aggregate {
std::optional<typename aggregate_type_for<Type>::type> _max{};
public:
virtual void reset() override {
_max = {};
}
virtual opt_bytes compute(cql_serialization_format sf) override {
if (!_max) {
return {};
}
return data_type_for<Type>()->decompose(data_value(Type{*_max}));
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
auto val = value_cast<typename aggregate_type_for<Type>::type>(data_type_for<Type>()->deserialize(*values[0]));
if (!_max) {
_max = val;
} else {
_max = max_wrapper(*_max, val);
}
}
};
/// The same as `impl_max_function_for' but without knowledge of `Type'.
class impl_max_dynamic_function final : public aggregate_function::aggregate {
opt_bytes _max;
public:
virtual void reset() override {
_max = {};
}
virtual opt_bytes compute(cql_serialization_format sf) override {
return _max.value_or(bytes{});
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
const auto val = *values[0];
if (!_max || *_max < val) {
_max = val;
}
}
};
template <typename Type>
class max_function_for final : public native_aggregate_function {
public:
max_function_for() : native_aggregate_function("max", data_type_for<Type>(), { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_max_function_for<Type>>();
}
};
class max_dynamic_function final : public native_aggregate_function {
public:
max_dynamic_function(data_type io_type) : native_aggregate_function("max", io_type, { io_type }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_max_dynamic_function>();
}
};
/**
* Creates a MAX function for the specified type.
*
* @param inputType the function input and output type
* @return a MAX function for the specified type.
*/
template <typename Type>
static
shared_ptr<aggregate_function>
make_max_function() {
return make_shared<max_function_for<Type>>();
}
template <typename Type>
const Type& min_wrapper(const Type& t1, const Type& t2) {
using std::min;
return min(t1, t2);
}
inline const net::inet_address& min_wrapper(const net::inet_address& t1, const net::inet_address& t2) {
using family = seastar::net::inet_address::family;
const size_t len =
(t1.in_family() == family::INET || t2.in_family() == family::INET)
? sizeof(::in_addr) : sizeof(::in6_addr);
return std::memcmp(t1.data(), t2.data(), len) <= 0 ? t1 : t2;
}
template <typename Type>
class impl_min_function_for final : public aggregate_function::aggregate {
std::optional<typename aggregate_type_for<Type>::type> _min{};
public:
virtual void reset() override {
_min = {};
}
virtual opt_bytes compute(cql_serialization_format sf) override {
if (!_min) {
return {};
}
return data_type_for<Type>()->decompose(data_value(Type{*_min}));
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
auto val = value_cast<typename aggregate_type_for<Type>::type>(data_type_for<Type>()->deserialize(*values[0]));
if (!_min) {
_min = val;
} else {
_min = min_wrapper(*_min, val);
}
}
};
/// The same as `impl_min_function_for' but without knowledge of `Type'.
class impl_min_dynamic_function final : public aggregate_function::aggregate {
opt_bytes _min;
public:
virtual void reset() override {
_min = {};
}
virtual opt_bytes compute(cql_serialization_format sf) override {
return _min.value_or(bytes{});
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
const auto val = *values[0];
if (!_min || val < *_min) {
_min = val;
}
}
};
template <typename Type>
class min_function_for final : public native_aggregate_function {
public:
min_function_for() : native_aggregate_function("min", data_type_for<Type>(), { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_min_function_for<Type>>();
}
};
class min_dynamic_function final : public native_aggregate_function {
public:
min_dynamic_function(data_type io_type) : native_aggregate_function("min", io_type, { io_type }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_min_dynamic_function>();
}
};
/**
* Creates a MIN function for the specified type.
*
* @param inputType the function input and output type
* @return a MIN function for the specified type.
*/
template <typename Type>
static
shared_ptr<aggregate_function>
make_min_function() {
return make_shared<min_function_for<Type>>();
}
template <typename Type>
class impl_count_function_for final : public aggregate_function::aggregate {
int64_t _count = 0;
public:
virtual void reset() override {
_count = 0;
}
virtual opt_bytes compute(cql_serialization_format sf) override {
return long_type->decompose(_count);
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
++_count;
}
};
template <typename Type>
class count_function_for final : public native_aggregate_function {
public:
count_function_for() : native_aggregate_function("count", long_type, { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_count_function_for<Type>>();
}
};
/**
* Creates a COUNT function for the specified type.
*
* @param inputType the function input type
* @return a COUNT function for the specified type.
*/
template <typename Type>
static shared_ptr<aggregate_function> make_count_function() {
return make_shared<count_function_for<Type>>();
}
}
shared_ptr<aggregate_function>
aggregate_fcts::make_count_rows_function() {
return make_shared<count_rows_function>();
}
shared_ptr<aggregate_function>
aggregate_fcts::make_max_dynamic_function(data_type io_type) {
return make_shared<max_dynamic_function>(io_type);
}
shared_ptr<aggregate_function>
aggregate_fcts::make_min_dynamic_function(data_type io_type) {
return make_shared<min_dynamic_function>(io_type);
}
void cql3::functions::add_agg_functions(declared_t& funcs) {
auto declare = [&funcs] (shared_ptr<function> f) { funcs.emplace(f->name(), f); };
declare(make_count_function<int8_t>());
declare(make_max_function<int8_t>());
declare(make_min_function<int8_t>());
declare(make_count_function<int16_t>());
declare(make_max_function<int16_t>());
declare(make_min_function<int16_t>());
declare(make_count_function<int32_t>());
declare(make_max_function<int32_t>());
declare(make_min_function<int32_t>());
declare(make_count_function<int64_t>());
declare(make_max_function<int64_t>());
declare(make_min_function<int64_t>());
declare(make_count_function<boost::multiprecision::cpp_int>());
declare(make_max_function<boost::multiprecision::cpp_int>());
declare(make_min_function<boost::multiprecision::cpp_int>());
declare(make_count_function<big_decimal>());
declare(make_max_function<big_decimal>());
declare(make_min_function<big_decimal>());
declare(make_count_function<float>());
declare(make_max_function<float>());
declare(make_min_function<float>());
declare(make_count_function<double>());
declare(make_max_function<double>());
declare(make_min_function<double>());
declare(make_count_function<sstring>());
declare(make_max_function<sstring>());
declare(make_min_function<sstring>());
declare(make_count_function<ascii_native_type>());
declare(make_max_function<ascii_native_type>());
declare(make_min_function<ascii_native_type>());
declare(make_count_function<simple_date_native_type>());
declare(make_max_function<simple_date_native_type>());
declare(make_min_function<simple_date_native_type>());
declare(make_count_function<db_clock::time_point>());
declare(make_max_function<db_clock::time_point>());
declare(make_min_function<db_clock::time_point>());
declare(make_count_function<timeuuid_native_type>());
declare(make_max_function<timeuuid_native_type>());
declare(make_min_function<timeuuid_native_type>());
declare(make_count_function<time_native_type>());
declare(make_max_function<time_native_type>());
declare(make_min_function<time_native_type>());
declare(make_count_function<utils::UUID>());
declare(make_max_function<utils::UUID>());
declare(make_min_function<utils::UUID>());
declare(make_count_function<bytes>());
declare(make_max_function<bytes>());
declare(make_min_function<bytes>());
declare(make_count_function<bool>());
declare(make_max_function<bool>());
declare(make_min_function<bool>());
declare(make_count_function<net::inet_address>());
declare(make_max_function<net::inet_address>());
declare(make_min_function<net::inet_address>());
// FIXME: more count/min/max
declare(make_sum_function<int8_t>());
declare(make_sum_function<int16_t>());
declare(make_sum_function<int32_t>());
declare(make_sum_function<int64_t>());
declare(make_sum_function<float>());
declare(make_sum_function<double>());
declare(make_sum_function<boost::multiprecision::cpp_int>());
declare(make_sum_function<big_decimal>());
declare(make_avg_function<int8_t>());
declare(make_avg_function<int16_t>());
declare(make_avg_function<int32_t>());
declare(make_avg_function<int64_t>());
declare(make_avg_function<float>());
declare(make_avg_function<double>());
declare(make_avg_function<boost::multiprecision::cpp_int>());
declare(make_avg_function<big_decimal>());
}
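The registration pattern in `add_agg_functions` above (a `declare` lambda that emplaces each function into a container keyed by name, so overloads of the same name share a key) can be sketched with standard containers. The `function` struct and `declared_t` alias here are illustrative stand-ins, not the Scylla declarations:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for cql3::functions::function: only the name matters here.
struct function {
    std::string name;
};

// Multimap keyed by function name; overloads share a key.
using declared_t = std::unordered_multimap<std::string, std::shared_ptr<function>>;

void add_functions(declared_t& funcs) {
    auto declare = [&funcs](std::shared_ptr<function> f) {
        funcs.emplace(f->name, f);
    };
    declare(std::make_shared<function>(function{"max"}));
    declare(std::make_shared<function>(function{"max"})); // another overload, same key
    declare(std::make_shared<function>(function{"count"}));
}
```

Lookup by name then returns all overloads at once, which is why a multimap (rather than a map) is the natural container for this registry.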


@@ -41,348 +41,28 @@
#pragma once
#include "utils/big_decimal.hh"
#include "aggregate_function.hh"
#include "native_aggregate_function.hh"
namespace cql3 {
namespace functions {
/**
* Factory methods for aggregate functions.
*/
/// Factory methods for aggregate functions.
namespace aggregate_fcts {
class impl_count_function : public aggregate_function::aggregate {
int64_t _count;
public:
virtual void reset() override {
_count = 0;
}
virtual opt_bytes compute(cql_serialization_format sf) override {
return long_type->decompose(_count);
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
++_count;
}
};
static const sstring COUNT_ROWS_FUNCTION_NAME = "countRows";
class count_rows_function final : public native_aggregate_function {
public:
count_rows_function() : native_aggregate_function(COUNT_ROWS_FUNCTION_NAME, long_type, {}) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_count_function>();
}
virtual sstring column_name(const std::vector<sstring>& column_names) override {
return "count";
}
};
/**
* The function used to count the number of rows of a result set. This function is called when COUNT(*) or COUNT(1)
* is specified.
*/
inline
/// The function used to count the number of rows of a result set. This function is called when COUNT(*) or COUNT(1)
/// is specified.
shared_ptr<aggregate_function>
make_count_rows_function() {
return make_shared<count_rows_function>();
}
make_count_rows_function();
template <typename Type>
class impl_sum_function_for final : public aggregate_function::aggregate {
Type _sum{};
public:
virtual void reset() override {
_sum = {};
}
virtual opt_bytes compute(cql_serialization_format sf) override {
return data_type_for<Type>()->decompose(_sum);
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
_sum += value_cast<Type>(data_type_for<Type>()->deserialize(*values[0]));
}
};
template <typename Type>
class sum_function_for final : public native_aggregate_function {
public:
sum_function_for() : native_aggregate_function("sum", data_type_for<Type>(), { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_sum_function_for<Type>>();
}
};
template <typename Type>
inline
/// The same as `make_max_function()' but with type provided in runtime.
shared_ptr<aggregate_function>
make_sum_function() {
return make_shared<sum_function_for<Type>>();
}
make_max_dynamic_function(data_type io_type);
template <typename Type>
class impl_div_for_avg {
public:
static Type div(const Type& x, const int64_t y) {
return x/y;
}
};
template <>
class impl_div_for_avg<big_decimal> {
public:
static big_decimal div(const big_decimal& x, const int64_t y) {
return x.div(y, big_decimal::rounding_mode::HALF_EVEN);
}
};
// We need a wider accumulator for average, since summing the inputs can overflow
// the input type
template <typename T>
struct accumulator_for;
template <>
struct accumulator_for<int8_t> {
using type = __int128;
};
template <>
struct accumulator_for<int16_t> {
using type = __int128;
};
template <>
struct accumulator_for<int32_t> {
using type = __int128;
};
template <>
struct accumulator_for<int64_t> {
using type = __int128;
};
template <>
struct accumulator_for<float> {
using type = float;
};
template <>
struct accumulator_for<double> {
using type = double;
};
template <>
struct accumulator_for<boost::multiprecision::cpp_int> {
using type = boost::multiprecision::cpp_int;
};
template <>
struct accumulator_for<big_decimal> {
using type = big_decimal;
};
template <typename Type>
class impl_avg_function_for final : public aggregate_function::aggregate {
typename accumulator_for<Type>::type _sum{};
int64_t _count = 0;
public:
virtual void reset() override {
_sum = {};
_count = 0;
}
virtual opt_bytes compute(cql_serialization_format sf) override {
Type ret{};
if (_count) {
ret = impl_div_for_avg<Type>::div(_sum, _count);
}
return data_type_for<Type>()->decompose(ret);
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
++_count;
_sum += value_cast<Type>(data_type_for<Type>()->deserialize(*values[0]));
}
};
template <typename Type>
class avg_function_for final : public native_aggregate_function {
public:
avg_function_for() : native_aggregate_function("avg", data_type_for<Type>(), { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_avg_function_for<Type>>();
}
};
template <typename Type>
inline
/// The same as `make_min_function()' but with type provided in runtime.
shared_ptr<aggregate_function>
make_avg_function() {
return make_shared<avg_function_for<Type>>();
}
template <typename T>
struct aggregate_type_for {
using type = T;
};
template<>
struct aggregate_type_for<ascii_native_type> {
using type = ascii_native_type::primary_type;
};
template<>
struct aggregate_type_for<simple_date_native_type> {
using type = simple_date_native_type::primary_type;
};
template<>
struct aggregate_type_for<timeuuid_native_type> {
using type = timeuuid_native_type::primary_type;
};
template <typename Type>
class impl_max_function_for final : public aggregate_function::aggregate {
std::optional<typename aggregate_type_for<Type>::type> _max{};
public:
virtual void reset() override {
_max = {};
}
virtual opt_bytes compute(cql_serialization_format sf) override {
if (!_max) {
return {};
}
return data_type_for<Type>()->decompose(data_value(Type{*_max}));
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
auto val = value_cast<typename aggregate_type_for<Type>::type>(data_type_for<Type>()->deserialize(*values[0]));
if (!_max) {
_max = val;
} else {
_max = std::max(*_max, val);
}
}
};
template <typename Type>
class max_function_for final : public native_aggregate_function {
public:
max_function_for() : native_aggregate_function("max", data_type_for<Type>(), { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_max_function_for<Type>>();
}
};
/**
* Creates a MAX function for the specified type.
*
* @param inputType the function input and output type
* @return a MAX function for the specified type.
*/
template <typename Type>
shared_ptr<aggregate_function>
make_max_function() {
return make_shared<max_function_for<Type>>();
}
template <typename Type>
class impl_min_function_for final : public aggregate_function::aggregate {
std::optional<typename aggregate_type_for<Type>::type> _min{};
public:
virtual void reset() override {
_min = {};
}
virtual opt_bytes compute(cql_serialization_format sf) override {
if (!_min) {
return {};
}
return data_type_for<Type>()->decompose(data_value(Type{*_min}));
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
auto val = value_cast<typename aggregate_type_for<Type>::type>(data_type_for<Type>()->deserialize(*values[0]));
if (!_min) {
_min = val;
} else {
_min = std::min(*_min, val);
}
}
};
template <typename Type>
class min_function_for final : public native_aggregate_function {
public:
min_function_for() : native_aggregate_function("min", data_type_for<Type>(), { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_min_function_for<Type>>();
}
};
/**
* Creates a MIN function for the specified type.
*
* @param inputType the function input and output type
* @return a MIN function for the specified type.
*/
template <typename Type>
shared_ptr<aggregate_function>
make_min_function() {
return make_shared<min_function_for<Type>>();
}
template <typename Type>
class impl_count_function_for final : public aggregate_function::aggregate {
int64_t _count = 0;
public:
virtual void reset() override {
_count = 0;
}
virtual opt_bytes compute(cql_serialization_format sf) override {
return long_type->decompose(_count);
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
return;
}
++_count;
}
};
template <typename Type>
class count_function_for final : public native_aggregate_function {
public:
count_function_for() : native_aggregate_function("count", long_type, { data_type_for<Type>() }) {}
virtual std::unique_ptr<aggregate> new_aggregate() override {
return std::make_unique<impl_count_function_for<Type>>();
}
};
/**
* Creates a COUNT function for the specified type.
*
* @param inputType the function input type
* @return a COUNT function for the specified type.
*/
template <typename Type>
shared_ptr<aggregate_function>
make_count_function() {
return make_shared<count_function_for<Type>>();
}
make_min_dynamic_function(data_type io_type);
}
}
}


@@ -44,6 +44,7 @@
#include "cql3/functions/function.hh"
#include "cql3/functions/scalar_function.hh"
#include "cql3/cql3_type.hh"
#include "cql3/type_json.hh"
#include "bytes_ostream.hh"
#include "types.hh"
@@ -73,6 +74,8 @@ public:
: _selector_names(std::move(selector_names)), _selector_types(std::move(selector_types)) {
}
virtual bool requires_thread() const;
virtual bytes_opt execute(cql_serialization_format sf, const std::vector<bytes_opt>& parameters) override {
bytes_ostream encoded_row;
encoded_row.write("{", 1);
@@ -90,7 +93,7 @@ public:
encoded_row.write("\\\"", 2);
}
encoded_row.write("\": ", 3);
sstring row_sstring = _selector_types[i]->to_json_string(parameters[i]);
sstring row_sstring = to_json_string(*_selector_types[i], parameters[i]);
encoded_row.write(row_sstring.c_str(), row_sstring.size());
}
encoded_row.write("}", 1);
@@ -110,15 +113,15 @@ public:
return utf8_type;
}
virtual bool is_pure() override {
virtual bool is_pure() const override {
return true;
}
virtual bool is_native() override {
virtual bool is_native() const override {
return true;
}
virtual bool is_aggregate() override {
virtual bool is_aggregate() const override {
// Aggregates of aggregates are currently not supported, but JSON handles them
return false;
}
@@ -137,15 +140,15 @@ public:
os << ") -> " << utf8_type->as_cql3_type().to_string();
}
-virtual bool uses_function(const sstring& ks_name, const sstring& function_name) override {
+virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override {
return false;
}
-virtual bool has_reference_to(function& f) override {
+virtual bool has_reference_to(function& f) const override {
return false;
}
-virtual sstring column_name(const std::vector<sstring>& column_names) override {
+virtual sstring column_name(const std::vector<sstring>& column_names) const override {
return "[json]";
}


@@ -20,7 +20,11 @@
*/
#include "castas_fcts.hh"
#include "concrete_types.hh"
#include "utils/UUID_gen.hh"
#include "cql3/functions/native_scalar_function.hh"
#include "utils/date.h"
#include <boost/date_time/posix_time/posix_time.hpp>
namespace cql3 {
namespace functions {
@@ -30,7 +34,7 @@ namespace {
using bytes_opt = std::optional<bytes>;
class castas_function_for : public cql3::functions::native_scalar_function {
-castas_fctn _func;
+cql3::functions::castas_fctn _func;
public:
castas_function_for(data_type to_type,
data_type from_type,
@@ -38,7 +42,7 @@ public:
: native_scalar_function("castas" + to_type->as_cql3_type().to_string(), to_type, {from_type})
, _func(func) {
}
-virtual bool is_pure() override {
+virtual bool is_pure() const override {
return true;
}
virtual void print(std::ostream& os) const override {
@@ -64,6 +68,289 @@ shared_ptr<function> make_castas_function(data_type to_type, data_type from_type
} /* Anonymous Namespace */
/*
* Support for CAST(. AS .) functions.
*/
namespace {
using bytes_opt = std::optional<bytes>;
template<typename ToType, typename FromType>
std::function<data_value(data_value)> make_castas_fctn_simple() {
return [](data_value from) -> data_value {
auto val_from = value_cast<FromType>(from);
return static_cast<ToType>(val_from);
};
}
template<typename ToType>
std::function<data_value(data_value)> make_castas_fctn_from_decimal_to_float() {
return [](data_value from) -> data_value {
auto val_from = value_cast<big_decimal>(from);
boost::multiprecision::cpp_int ten(10);
boost::multiprecision::cpp_rational r = val_from.unscaled_value();
r /= boost::multiprecision::pow(ten, val_from.scale());
return static_cast<ToType>(r);
};
}
static boost::multiprecision::cpp_int from_decimal_to_cppint(const data_value& from) {
const auto& val_from = value_cast<big_decimal>(from);
boost::multiprecision::cpp_int ten(10);
return val_from.unscaled_value() / boost::multiprecision::pow(ten, val_from.scale());
}
template<typename ToType>
std::function<data_value(data_value)> make_castas_fctn_from_varint_to_integer() {
return [](data_value from) -> data_value {
const auto& varint = value_cast<boost::multiprecision::cpp_int>(from);
return static_cast<ToType>(from_varint_to_integer(varint));
};
}
template<typename ToType>
std::function<data_value(data_value)> make_castas_fctn_from_decimal_to_integer() {
return [](data_value from) -> data_value {
auto varint = from_decimal_to_cppint(from);
return static_cast<ToType>(from_varint_to_integer(varint));
};
}
std::function<data_value(data_value)> make_castas_fctn_from_decimal_to_varint() {
return [](data_value from) -> data_value {
return from_decimal_to_cppint(from);
};
}
template<typename FromType>
std::function<data_value(data_value)> make_castas_fctn_from_integer_to_decimal() {
return [](data_value from) -> data_value {
auto val_from = value_cast<FromType>(from);
return big_decimal(1, 10*static_cast<boost::multiprecision::cpp_int>(val_from));
};
}
template<typename FromType>
std::function<data_value(data_value)> make_castas_fctn_from_float_to_decimal() {
return [](data_value from) -> data_value {
auto val_from = value_cast<FromType>(from);
return big_decimal(boost::lexical_cast<std::string>(val_from));
};
}
template<typename FromType>
std::function<data_value(data_value)> make_castas_fctn_to_string() {
return [](data_value from) -> data_value {
return to_sstring(value_cast<FromType>(from));
};
}
std::function<data_value(data_value)> make_castas_fctn_from_varint_to_string() {
return [](data_value from) -> data_value {
return to_sstring(value_cast<boost::multiprecision::cpp_int>(from).str());
};
}
std::function<data_value(data_value)> make_castas_fctn_from_decimal_to_string() {
return [](data_value from) -> data_value {
return value_cast<big_decimal>(from).to_string();
};
}
db_clock::time_point millis_to_time_point(const int64_t millis) {
return db_clock::time_point{std::chrono::milliseconds(millis)};
}
simple_date_native_type time_point_to_date(const db_clock::time_point& tp) {
const auto epoch = boost::posix_time::from_time_t(0);
auto timestamp = tp.time_since_epoch().count();
auto time = boost::posix_time::from_time_t(0) + boost::posix_time::milliseconds(timestamp);
const auto diff = time.date() - epoch.date();
return simple_date_native_type{uint32_t(diff.days() + (1UL<<31))};
}
db_clock::time_point date_to_time_point(const uint32_t date) {
const auto epoch = boost::posix_time::from_time_t(0);
const auto target_date = epoch + boost::gregorian::days(int64_t(date) - (1UL<<31));
boost::posix_time::time_duration duration = target_date - epoch;
const auto millis = std::chrono::milliseconds(duration.total_milliseconds());
return db_clock::time_point(std::chrono::duration_cast<db_clock::duration>(millis));
}
std::function<data_value(data_value)> make_castas_fctn_from_timestamp_to_date() {
return [](data_value from) -> data_value {
const auto val_from = value_cast<db_clock::time_point>(from);
return time_point_to_date(val_from);
};
}
std::function<data_value(data_value)> make_castas_fctn_from_date_to_timestamp() {
return [](data_value from) -> data_value {
const auto val_from = value_cast<uint32_t>(from);
return date_to_time_point(val_from);
};
}
std::function<data_value(data_value)> make_castas_fctn_from_timeuuid_to_timestamp() {
return [](data_value from) -> data_value {
const auto val_from = value_cast<utils::UUID>(from);
return db_clock::time_point{db_clock::duration{utils::UUID_gen::unix_timestamp(val_from)}};
};
}
std::function<data_value(data_value)> make_castas_fctn_from_timeuuid_to_date() {
return [](data_value from) -> data_value {
const auto val_from = value_cast<utils::UUID>(from);
return time_point_to_date(millis_to_time_point(utils::UUID_gen::unix_timestamp(val_from)));
};
}
static std::function<data_value(data_value)> make_castas_fctn_from_dv_to_string() {
return [](data_value from) -> data_value {
return from.type()->to_string_impl(from);
};
}
// FIXME: Add conversions for counters, after they are fully implemented...
// Map <ToType, FromType> -> castas_fctn
using castas_fctn_key = std::pair<data_type, data_type>;
struct castas_fctn_hash {
std::size_t operator()(const castas_fctn_key& x) const noexcept {
return boost::hash_value(x);
}
};
using castas_fctns_map = std::unordered_map<castas_fctn_key, castas_fctn, castas_fctn_hash>;
// List of supported castas functions...
thread_local castas_fctns_map castas_fctns {
{ {byte_type, byte_type}, make_castas_fctn_simple<int8_t, int8_t>() },
{ {byte_type, short_type}, make_castas_fctn_simple<int8_t, int16_t>() },
{ {byte_type, int32_type}, make_castas_fctn_simple<int8_t, int32_t>() },
{ {byte_type, long_type}, make_castas_fctn_simple<int8_t, int64_t>() },
{ {byte_type, float_type}, make_castas_fctn_simple<int8_t, float>() },
{ {byte_type, double_type}, make_castas_fctn_simple<int8_t, double>() },
{ {byte_type, varint_type}, make_castas_fctn_from_varint_to_integer<int8_t>() },
{ {byte_type, decimal_type}, make_castas_fctn_from_decimal_to_integer<int8_t>() },
{ {short_type, byte_type}, make_castas_fctn_simple<int16_t, int8_t>() },
{ {short_type, short_type}, make_castas_fctn_simple<int16_t, int16_t>() },
{ {short_type, int32_type}, make_castas_fctn_simple<int16_t, int32_t>() },
{ {short_type, long_type}, make_castas_fctn_simple<int16_t, int64_t>() },
{ {short_type, float_type}, make_castas_fctn_simple<int16_t, float>() },
{ {short_type, double_type}, make_castas_fctn_simple<int16_t, double>() },
{ {short_type, varint_type}, make_castas_fctn_from_varint_to_integer<int16_t>() },
{ {short_type, decimal_type}, make_castas_fctn_from_decimal_to_integer<int16_t>() },
{ {int32_type, byte_type}, make_castas_fctn_simple<int32_t, int8_t>() },
{ {int32_type, short_type}, make_castas_fctn_simple<int32_t, int16_t>() },
{ {int32_type, int32_type}, make_castas_fctn_simple<int32_t, int32_t>() },
{ {int32_type, long_type}, make_castas_fctn_simple<int32_t, int64_t>() },
{ {int32_type, float_type}, make_castas_fctn_simple<int32_t, float>() },
{ {int32_type, double_type}, make_castas_fctn_simple<int32_t, double>() },
{ {int32_type, varint_type}, make_castas_fctn_from_varint_to_integer<int32_t>() },
{ {int32_type, decimal_type}, make_castas_fctn_from_decimal_to_integer<int32_t>() },
{ {long_type, byte_type}, make_castas_fctn_simple<int64_t, int8_t>() },
{ {long_type, short_type}, make_castas_fctn_simple<int64_t, int16_t>() },
{ {long_type, int32_type}, make_castas_fctn_simple<int64_t, int32_t>() },
{ {long_type, long_type}, make_castas_fctn_simple<int64_t, int64_t>() },
{ {long_type, float_type}, make_castas_fctn_simple<int64_t, float>() },
{ {long_type, double_type}, make_castas_fctn_simple<int64_t, double>() },
{ {long_type, varint_type}, make_castas_fctn_from_varint_to_integer<int64_t>() },
{ {long_type, decimal_type}, make_castas_fctn_from_decimal_to_integer<int64_t>() },
{ {float_type, byte_type}, make_castas_fctn_simple<float, int8_t>() },
{ {float_type, short_type}, make_castas_fctn_simple<float, int16_t>() },
{ {float_type, int32_type}, make_castas_fctn_simple<float, int32_t>() },
{ {float_type, long_type}, make_castas_fctn_simple<float, int64_t>() },
{ {float_type, float_type}, make_castas_fctn_simple<float, float>() },
{ {float_type, double_type}, make_castas_fctn_simple<float, double>() },
{ {float_type, varint_type}, make_castas_fctn_simple<float, boost::multiprecision::cpp_int>() },
{ {float_type, decimal_type}, make_castas_fctn_from_decimal_to_float<float>() },
{ {double_type, byte_type}, make_castas_fctn_simple<double, int8_t>() },
{ {double_type, short_type}, make_castas_fctn_simple<double, int16_t>() },
{ {double_type, int32_type}, make_castas_fctn_simple<double, int32_t>() },
{ {double_type, long_type}, make_castas_fctn_simple<double, int64_t>() },
{ {double_type, float_type}, make_castas_fctn_simple<double, float>() },
{ {double_type, double_type}, make_castas_fctn_simple<double, double>() },
{ {double_type, varint_type}, make_castas_fctn_simple<double, boost::multiprecision::cpp_int>() },
{ {double_type, decimal_type}, make_castas_fctn_from_decimal_to_float<double>() },
{ {varint_type, byte_type}, make_castas_fctn_simple<boost::multiprecision::cpp_int, int8_t>() },
{ {varint_type, short_type}, make_castas_fctn_simple<boost::multiprecision::cpp_int, int16_t>() },
{ {varint_type, int32_type}, make_castas_fctn_simple<boost::multiprecision::cpp_int, int32_t>() },
{ {varint_type, long_type}, make_castas_fctn_simple<boost::multiprecision::cpp_int, int64_t>() },
{ {varint_type, float_type}, make_castas_fctn_simple<boost::multiprecision::cpp_int, float>() },
{ {varint_type, double_type}, make_castas_fctn_simple<boost::multiprecision::cpp_int, double>() },
{ {varint_type, varint_type}, make_castas_fctn_simple<boost::multiprecision::cpp_int, boost::multiprecision::cpp_int>() },
{ {varint_type, decimal_type}, make_castas_fctn_from_decimal_to_varint() },
{ {decimal_type, byte_type}, make_castas_fctn_from_integer_to_decimal<int8_t>() },
{ {decimal_type, short_type}, make_castas_fctn_from_integer_to_decimal<int16_t>() },
{ {decimal_type, int32_type}, make_castas_fctn_from_integer_to_decimal<int32_t>() },
{ {decimal_type, long_type}, make_castas_fctn_from_integer_to_decimal<int64_t>() },
{ {decimal_type, float_type}, make_castas_fctn_from_float_to_decimal<float>() },
{ {decimal_type, double_type}, make_castas_fctn_from_float_to_decimal<double>() },
{ {decimal_type, varint_type}, make_castas_fctn_from_integer_to_decimal<boost::multiprecision::cpp_int>() },
{ {decimal_type, decimal_type}, make_castas_fctn_simple<big_decimal, big_decimal>() },
{ {ascii_type, byte_type}, make_castas_fctn_to_string<int8_t>() },
{ {ascii_type, short_type}, make_castas_fctn_to_string<int16_t>() },
{ {ascii_type, int32_type}, make_castas_fctn_to_string<int32_t>() },
{ {ascii_type, long_type}, make_castas_fctn_to_string<int64_t>() },
{ {ascii_type, float_type}, make_castas_fctn_to_string<float>() },
{ {ascii_type, double_type}, make_castas_fctn_to_string<double>() },
{ {ascii_type, varint_type}, make_castas_fctn_from_varint_to_string() },
{ {ascii_type, decimal_type}, make_castas_fctn_from_decimal_to_string() },
{ {utf8_type, byte_type}, make_castas_fctn_to_string<int8_t>() },
{ {utf8_type, short_type}, make_castas_fctn_to_string<int16_t>() },
{ {utf8_type, int32_type}, make_castas_fctn_to_string<int32_t>() },
{ {utf8_type, long_type}, make_castas_fctn_to_string<int64_t>() },
{ {utf8_type, float_type}, make_castas_fctn_to_string<float>() },
{ {utf8_type, double_type}, make_castas_fctn_to_string<double>() },
{ {utf8_type, varint_type}, make_castas_fctn_from_varint_to_string() },
{ {utf8_type, decimal_type}, make_castas_fctn_from_decimal_to_string() },
{ {simple_date_type, timestamp_type}, make_castas_fctn_from_timestamp_to_date() },
{ {simple_date_type, timeuuid_type}, make_castas_fctn_from_timeuuid_to_date() },
{ {timestamp_type, simple_date_type}, make_castas_fctn_from_date_to_timestamp() },
{ {timestamp_type, timeuuid_type}, make_castas_fctn_from_timeuuid_to_timestamp() },
{ {ascii_type, timestamp_type}, make_castas_fctn_from_dv_to_string() },
{ {ascii_type, simple_date_type}, make_castas_fctn_from_dv_to_string() },
{ {ascii_type, time_type}, make_castas_fctn_from_dv_to_string() },
{ {ascii_type, timeuuid_type}, make_castas_fctn_from_dv_to_string() },
{ {ascii_type, uuid_type}, make_castas_fctn_from_dv_to_string() },
{ {ascii_type, boolean_type}, make_castas_fctn_from_dv_to_string() },
{ {ascii_type, inet_addr_type}, make_castas_fctn_from_dv_to_string() },
{ {ascii_type, ascii_type}, make_castas_fctn_simple<sstring, sstring>() },
{ {utf8_type, timestamp_type}, make_castas_fctn_from_dv_to_string() },
{ {utf8_type, simple_date_type}, make_castas_fctn_from_dv_to_string() },
{ {utf8_type, time_type}, make_castas_fctn_from_dv_to_string() },
{ {utf8_type, timeuuid_type}, make_castas_fctn_from_dv_to_string() },
{ {utf8_type, uuid_type}, make_castas_fctn_from_dv_to_string() },
{ {utf8_type, boolean_type}, make_castas_fctn_from_dv_to_string() },
{ {utf8_type, inet_addr_type}, make_castas_fctn_from_dv_to_string() },
{ {utf8_type, ascii_type}, make_castas_fctn_simple<sstring, sstring>() },
{ {utf8_type, utf8_type}, make_castas_fctn_simple<sstring, sstring>() },
};
} /* Anonymous Namespace */
castas_fctn get_castas_fctn(data_type to_type, data_type from_type) {
auto it_candidate = castas_fctns.find(castas_fctn_key{to_type, from_type});
if (it_candidate == castas_fctns.end()) {
throw exceptions::invalid_request_exception(format("{} cannot be cast to {}", from_type->name(), to_type->name()));
}
return it_candidate->second;
}
shared_ptr<function> castas_functions::get(data_type to_type, const std::vector<shared_ptr<cql3::selection::selector>>& provided_args, schema_ptr s) {
if (provided_args.size() != 1) {
throw exceptions::invalid_request_exception("Invalid CAST expression");


@@ -54,6 +54,14 @@
namespace cql3 {
namespace functions {
/*
* Support for CAST(. AS .) functions.
*/
using castas_fctn = std::function<data_value(data_value)>;
castas_fctn get_castas_fctn(data_type to_type, data_type from_type);
class castas_functions {
public:
static shared_ptr<function> get(data_type to_type, const std::vector<shared_ptr<cql3::selection::selector>>& provided_args, schema_ptr s);


@@ -62,25 +62,27 @@ public:
*
* @return <code>true</code> if the function is a pure function, <code>false</code> otherwise.
*/
-virtual bool is_pure() = 0;
+virtual bool is_pure() const = 0;
/**
* Checks whether the function is a native/hard coded one or not.
*
* @return <code>true</code> if the function is a native/hard coded one, <code>false</code> otherwise.
*/
-virtual bool is_native() = 0;
+virtual bool is_native() const = 0;
+virtual bool requires_thread() const = 0;
/**
* Checks whether the function is an aggregate function or not.
*
* @return <code>true</code> if the function is an aggregate function, <code>false</code> otherwise.
*/
-virtual bool is_aggregate() = 0;
+virtual bool is_aggregate() const = 0;
virtual void print(std::ostream& os) const = 0;
-virtual bool uses_function(const sstring& ks_name, const sstring& function_name) = 0;
-virtual bool has_reference_to(function& f) = 0;
+virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const = 0;
+virtual bool has_reference_to(function& f) const = 0;
/**
* Returns the name of the function to use within a ResultSet.
@@ -88,7 +90,7 @@ public:
* @param column_names the names of the columns used to call the function
* @return the name of the function to use within a ResultSet
*/
-virtual sstring column_name(const std::vector<sstring>& column_names) = 0;
+virtual sstring column_name(const std::vector<sstring>& column_names) const = 0;
friend class function_call;
friend std::ostream& operator<<(std::ostream& os, const function& f);


@@ -57,7 +57,7 @@ public:
: _fun(std::move(fun)), _terms(std::move(terms)) {
}
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override;
-virtual void collect_marker_specification(shared_ptr<variable_specifications> bound_names) override;
+virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) override;
virtual shared_ptr<terminal> bind(const query_options& options) override;
virtual cql3::raw_value_view bind_and_get(const query_options& options) override;
private:


@@ -28,18 +28,42 @@
#include "cql3/lists.hh"
#include "cql3/constants.hh"
#include "cql3/user_types.hh"
#include "cql3/type_json.hh"
#include "database.hh"
#include "types/map.hh"
#include "types/set.hh"
#include "types/list.hh"
#include "types/user.hh"
#include "concrete_types.hh"
#include "as_json_function.hh"
namespace std {
std::ostream& operator<<(std::ostream& os, const std::vector<data_type>& arg_types) {
for (size_t i = 0; i < arg_types.size(); ++i) {
if (i > 0) {
os << ", ";
}
os << arg_types[i]->as_cql3_type().to_string();
}
return os;
}
}
namespace cql3 {
namespace functions {
static logging::logger log("cql3_fuctions");
bool abstract_function::requires_thread() const { return false; }
bool as_json_function::requires_thread() const { return false; }
thread_local std::unordered_multimap<function_name, shared_ptr<function>> functions::_declared = init();
void functions::clear_functions() {
functions::_declared = init();
}
std::unordered_multimap<function_name, shared_ptr<function>>
functions::init() {
std::unordered_multimap<function_name, shared_ptr<function>> ret;
@@ -78,90 +102,10 @@ functions::init() {
declare(make_to_blob_function(type.get_type()));
declare(make_from_blob_function(type.get_type()));
}
-declare(aggregate_fcts::make_count_function<int8_t>());
-declare(aggregate_fcts::make_max_function<int8_t>());
-declare(aggregate_fcts::make_min_function<int8_t>());
-declare(aggregate_fcts::make_count_function<int16_t>());
-declare(aggregate_fcts::make_max_function<int16_t>());
-declare(aggregate_fcts::make_min_function<int16_t>());
-declare(aggregate_fcts::make_count_function<int32_t>());
-declare(aggregate_fcts::make_max_function<int32_t>());
-declare(aggregate_fcts::make_min_function<int32_t>());
-declare(aggregate_fcts::make_count_function<int64_t>());
-declare(aggregate_fcts::make_max_function<int64_t>());
-declare(aggregate_fcts::make_min_function<int64_t>());
-declare(aggregate_fcts::make_count_function<boost::multiprecision::cpp_int>());
-declare(aggregate_fcts::make_max_function<boost::multiprecision::cpp_int>());
-declare(aggregate_fcts::make_min_function<boost::multiprecision::cpp_int>());
-declare(aggregate_fcts::make_count_function<big_decimal>());
-declare(aggregate_fcts::make_max_function<big_decimal>());
-declare(aggregate_fcts::make_min_function<big_decimal>());
-declare(aggregate_fcts::make_count_function<float>());
-declare(aggregate_fcts::make_max_function<float>());
-declare(aggregate_fcts::make_min_function<float>());
-declare(aggregate_fcts::make_count_function<double>());
-declare(aggregate_fcts::make_max_function<double>());
-declare(aggregate_fcts::make_min_function<double>());
-declare(aggregate_fcts::make_count_function<sstring>());
-declare(aggregate_fcts::make_max_function<sstring>());
-declare(aggregate_fcts::make_min_function<sstring>());
-declare(aggregate_fcts::make_count_function<ascii_native_type>());
-declare(aggregate_fcts::make_max_function<ascii_native_type>());
-declare(aggregate_fcts::make_min_function<ascii_native_type>());
-declare(aggregate_fcts::make_count_function<simple_date_native_type>());
-declare(aggregate_fcts::make_max_function<simple_date_native_type>());
-declare(aggregate_fcts::make_min_function<simple_date_native_type>());
-declare(aggregate_fcts::make_count_function<db_clock::time_point>());
-declare(aggregate_fcts::make_max_function<db_clock::time_point>());
-declare(aggregate_fcts::make_min_function<db_clock::time_point>());
-declare(aggregate_fcts::make_count_function<timeuuid_native_type>());
-declare(aggregate_fcts::make_max_function<timeuuid_native_type>());
-declare(aggregate_fcts::make_min_function<timeuuid_native_type>());
-declare(aggregate_fcts::make_count_function<utils::UUID>());
-declare(aggregate_fcts::make_max_function<utils::UUID>());
-declare(aggregate_fcts::make_min_function<utils::UUID>());
-declare(aggregate_fcts::make_count_function<bytes>());
-declare(aggregate_fcts::make_max_function<bytes>());
-declare(aggregate_fcts::make_min_function<bytes>());
-declare(aggregate_fcts::make_count_function<bool>());
-declare(aggregate_fcts::make_max_function<bool>());
-declare(aggregate_fcts::make_min_function<bool>());
-// FIXME: more count/min/max
declare(make_varchar_as_blob_fct());
declare(make_blob_as_varchar_fct());
-declare(aggregate_fcts::make_sum_function<int8_t>());
-declare(aggregate_fcts::make_sum_function<int16_t>());
-declare(aggregate_fcts::make_sum_function<int32_t>());
-declare(aggregate_fcts::make_sum_function<int64_t>());
-declare(aggregate_fcts::make_sum_function<float>());
-declare(aggregate_fcts::make_sum_function<double>());
-declare(aggregate_fcts::make_sum_function<boost::multiprecision::cpp_int>());
-declare(aggregate_fcts::make_sum_function<big_decimal>());
-declare(aggregate_fcts::make_avg_function<int8_t>());
-declare(aggregate_fcts::make_avg_function<int16_t>());
-declare(aggregate_fcts::make_avg_function<int32_t>());
-declare(aggregate_fcts::make_avg_function<int64_t>());
-declare(aggregate_fcts::make_avg_function<float>());
-declare(aggregate_fcts::make_avg_function<double>());
-declare(aggregate_fcts::make_avg_function<boost::multiprecision::cpp_int>());
-declare(aggregate_fcts::make_avg_function<big_decimal>());
+add_agg_functions(ret);
// also needed for smp:
#if 0
@@ -170,6 +114,33 @@ functions::init() {
return ret;
}
void functions::add_function(shared_ptr<function> func) {
if (find(func->name(), func->arg_types())) {
throw std::logic_error(format("duplicated function {}", func));
}
_declared.emplace(func->name(), func);
}
template <typename F>
void functions::with_udf_iter(const function_name& name, const std::vector<data_type>& arg_types, F&& f) {
auto i = find_iter(name, arg_types);
if (i == _declared.end() || i->second->is_native()) {
log.error("attempted to remove or alter non existent user defined function {}({})", name, arg_types);
return;
}
f(i);
}
void functions::replace_function(shared_ptr<function> func) {
with_udf_iter(func->name(), func->arg_types(), [func] (functions::declared_t::iterator i) {
i->second = std::move(func);
});
}
void functions::remove_function(const function_name& name, const std::vector<data_type>& arg_types) {
with_udf_iter(name, arg_types, [] (functions::declared_t::iterator i) { _declared.erase(i); });
}
shared_ptr<column_specification>
functions::make_arg_spec(const sstring& receiver_ks, const sstring& receiver_cf,
const function& fun, size_t i) {
@@ -191,7 +162,7 @@ shared_ptr<function>
make_to_json_function(data_type t) {
return make_native_scalar_function<true>("tojson", utf8_type, {t},
[t](cql_serialization_format sf, const std::vector<bytes_opt>& parameters) -> bytes_opt {
-return utf8_type->decompose(t->to_json_string(parameters[0]));
+return utf8_type->decompose(to_json_string(*t, parameters[0]));
});
}
@@ -203,7 +174,7 @@ make_from_json_function(database& db, const sstring& keyspace, data_type t) {
Json::Value json_value = json::to_json_value(utf8_type->to_string(parameters[0].value()));
bytes_opt parsed_json_value;
if (!json_value.isNull()) {
-parsed_json_value.emplace(t->from_json_object(json_value, sf));
+parsed_json_value.emplace(from_json_object(*t, json_value, sf));
}
return parsed_json_value;
});
@@ -221,6 +192,8 @@ functions::get(database& db,
static const function_name TOKEN_FUNCTION_NAME = function_name::native_function("token");
static const function_name TO_JSON_FUNCTION_NAME = function_name::native_function("tojson");
static const function_name FROM_JSON_FUNCTION_NAME = function_name::native_function("fromjson");
static const function_name MIN_FUNCTION_NAME = function_name::native_function("min");
static const function_name MAX_FUNCTION_NAME = function_name::native_function("max");
if (name.has_keyspace()
? name == TOKEN_FUNCTION_NAME
@@ -253,6 +226,40 @@ functions::get(database& db,
return make_from_json_function(db, keyspace, receiver->type);
}
if (name.has_keyspace()
? name == MIN_FUNCTION_NAME
: name.name == MIN_FUNCTION_NAME.name) {
if (provided_args.size() != 1) {
throw exceptions::invalid_request_exception("min() operates on 1 argument at a time");
}
selection::selector *sp = dynamic_cast<selection::selector*>(provided_args[0].get());
if (!sp) {
throw exceptions::invalid_request_exception("min() is only valid in SELECT clause");
}
const data_type arg_type = sp->get_type();
if (arg_type->is_collection() || arg_type->is_tuple() || arg_type->is_user_type()) {
// `min()' function is created on demand for arguments of compound types.
return aggregate_fcts::make_min_dynamic_function(arg_type);
}
}
if (name.has_keyspace()
? name == MAX_FUNCTION_NAME
: name.name == MAX_FUNCTION_NAME.name) {
if (provided_args.size() != 1) {
throw exceptions::invalid_request_exception("max() operates on 1 argument at a time");
}
selection::selector *sp = dynamic_cast<selection::selector*>(provided_args[0].get());
if (!sp) {
throw exceptions::invalid_request_exception("max() is only valid in SELECT clause");
}
const data_type arg_type = sp->get_type();
if (arg_type->is_collection() || arg_type->is_tuple() || arg_type->is_user_type()) {
// `max()' function is created on demand for arguments of compound types.
return aggregate_fcts::make_max_dynamic_function(arg_type);
}
}
std::vector<shared_ptr<function>> candidates;
auto&& add_declared = [&] (function_name fn) {
auto&& fns = _declared.equal_range(fn);
@@ -310,23 +317,30 @@ functions::get(database& db,
return std::move(compatibles[0]);
}
-std::vector<shared_ptr<function>>
+boost::iterator_range<functions::declared_t::iterator>
functions::find(const function_name& name) {
-auto range = _declared.equal_range(name);
-std::vector<shared_ptr<function>> ret;
-for (auto i = range.first; i != range.second; ++i) {
-ret.push_back(i->second);
+assert(name.has_keyspace()); // : "function name not fully qualified";
+auto pair = _declared.equal_range(name);
+return boost::make_iterator_range(pair.first, pair.second);
}
+functions::declared_t::iterator
+functions::find_iter(const function_name& name, const std::vector<data_type>& arg_types) {
+auto range = find(name);
+auto i = std::find_if(range.begin(), range.end(), [&] (const std::pair<const function_name, shared_ptr<function>>& d) {
+return type_equals(d.second->arg_types(), arg_types);
+});
+if (i == range.end()) {
+return _declared.end();
+}
-return ret;
+return i;
}
shared_ptr<function>
functions::find(const function_name& name, const std::vector<data_type>& arg_types) {
-assert(name.has_keyspace()); // : "function name not fully qualified";
-for (auto&& f : find(name)) {
-if (type_equals(f->arg_types(), arg_types)) {
-return f;
-}
+auto i = find_iter(name, arg_types);
+if (i != _declared.end()) {
+return i->second;
}
return {};
}
@@ -396,15 +410,7 @@ functions::match_arguments(database& db, const sstring& keyspace,
bool
functions::type_equals(const std::vector<data_type>& t1, const std::vector<data_type>& t2) {
-#if 0
-if (t1.size() != t2.size())
-return false;
-for (int i = 0; i < t1.size(); i ++)
-if (!typeEquals(t1.get(i), t2.get(i)))
-return false;
-return true;
-#endif
-abort();
+return t1 == t2;
}
bool
@@ -413,7 +419,7 @@ function_call::uses_function(const sstring& ks_name, const sstring& function_nam
}
void
-function_call::collect_marker_specification(shared_ptr<variable_specifications> bound_names) {
+function_call::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
for (auto&& t : _terms) {
t->collect_marker_specification(bound_names);
}


@@ -58,9 +58,12 @@
namespace cql3 {
namespace functions {
using declared_t = std::unordered_multimap<function_name, shared_ptr<function>>;
void add_agg_functions(declared_t& funcs);
class functions {
-static thread_local std::unordered_multimap<function_name, shared_ptr<function>> _declared;
+using declared_t = cql3::functions::declared_t;
+static thread_local declared_t _declared;
private:
static std::unordered_multimap<function_name, shared_ptr<function>> init();
public:
@@ -86,9 +89,17 @@ public:
const std::vector<shared_ptr<assignment_testable>> args(std::begin(provided_args), std::end(provided_args));
return get(db, keyspace, name, args, receiver_ks, receiver_cf, receiver);
}
-static std::vector<shared_ptr<function>> find(const function_name& name);
+static boost::iterator_range<declared_t::iterator> find(const function_name& name);
static declared_t::iterator find_iter(const function_name& name, const std::vector<data_type>& arg_types);
static shared_ptr<function> find(const function_name& name, const std::vector<data_type>& arg_types);
static void clear_functions();
static void add_function(shared_ptr<function>);
static void replace_function(shared_ptr<function>);
static void remove_function(const function_name& name, const std::vector<data_type>& arg_types);
private:
template <typename F>
static void with_udf_iter(const function_name& name, const std::vector<data_type>& arg_types, F&& f);
// This method and matchArguments are somewhat duplicate, but this method allows us to provide more precise errors in the common
// case where there is no override for a given function. This is thus probably worth the minor code duplication.
static void validate_types(database& db,
@@ -102,50 +113,6 @@ private:
const std::vector<shared_ptr<assignment_testable>>& provided_args,
const sstring& receiver_ks,
const sstring& receiver_cf);
-#if 0
-// This is *not* thread safe but is only called in SchemaTables that is synchronized.
-public static void addFunction(AbstractFunction fun)
-{
-// We shouldn't get there unless that function don't exist
-assert find(fun.name(), fun.argTypes()) == null;
-declare(fun);
-}
-// Same remarks than for addFunction
-public static void removeFunction(FunctionName name, List<AbstractType<?>> argsTypes)
-{
-Function old = find(name, argsTypes);
-assert old != null && !old.isNative();
-declared.remove(old.name(), old);
-}
-// Same remarks than for addFunction
-public static void replaceFunction(AbstractFunction fun)
-{
-removeFunction(fun.name(), fun.argTypes());
-addFunction(fun);
-}
-public static List<Function> getReferencesTo(Function old)
-{
-List<Function> references = new ArrayList<>();
-for (Function function : declared.values())
-if (function.hasReferenceTo(old))
-references.add(function);
-return references;
-}
-public static Collection<Function> all()
-{
-return declared.values();
-}
-public static boolean typeEquals(AbstractType<?> t1, AbstractType<?> t2)
-{
-return t1.asCQL3Type().toString().equals(t2.asCQL3Type().toString());
-}
-#endif
static bool type_equals(const std::vector<data_type>& t1, const std::vector<data_type>& t2);


@@ -59,7 +59,7 @@ protected:
}
public:
virtual bool is_aggregate() override final {
virtual bool is_aggregate() const override final {
return true;
}
};


@@ -58,11 +58,11 @@ protected:
public:
// Most of our functions are pure, the other ones should override this
virtual bool is_pure() override {
virtual bool is_pure() const override {
return true;
}
virtual bool is_native() override {
virtual bool is_native() const override {
return true;
}
};


@@ -58,7 +58,7 @@ protected:
}
public:
virtual bool is_aggregate() override {
virtual bool is_aggregate() const override {
return false;
}
};
@@ -74,7 +74,7 @@ public:
: native_scalar_function(std::move(name), std::move(return_type), std::move(arg_types))
, _func(std::forward<Func>(func)) {
}
virtual bool is_pure() override {
virtual bool is_pure() const override {
return Pure;
}
virtual bytes_opt execute(cql_serialization_format sf, const std::vector<bytes_opt>& parameters) override {


@@ -41,6 +41,7 @@
#pragma once
#include "castas_fcts.hh"
#include "native_scalar_function.hh"
#include "utils/UUID_gen.hh"
#include <boost/uuid/uuid.hpp>
@@ -61,16 +62,6 @@ make_now_fct() {
});
}
static int64_t get_valid_timestamp(const data_value& ts_obj) {
auto ts = value_cast<db_clock::time_point>(ts_obj);
int64_t ms = ts.time_since_epoch().count();
auto nanos_since = utils::UUID_gen::make_nanos_since(ms);
if (!utils::UUID_gen::is_valid_nanos_since(nanos_since)) {
throw exceptions::server_exception(format("{}: timestamp is out of range. Must be in milliseconds since epoch", ms));
}
return ms;
}
inline
shared_ptr<function>
make_min_timeuuid_fct() {
@@ -84,7 +75,8 @@ make_min_timeuuid_fct() {
if (ts_obj.is_null()) {
return {};
}
auto uuid = utils::UUID_gen::min_time_UUID(get_valid_timestamp(ts_obj));
auto ts = value_cast<db_clock::time_point>(ts_obj);
auto uuid = utils::UUID_gen::min_time_UUID(ts.time_since_epoch().count());
return {timeuuid_type->decompose(uuid)};
});
}
@@ -94,6 +86,7 @@ shared_ptr<function>
make_max_timeuuid_fct() {
return make_native_scalar_function<true>("maxtimeuuid", timeuuid_type, { timestamp_type },
[] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
// FIXME: should values be a vector<optional<bytes>>?
auto& bb = values[0];
if (!bb) {
return {};
@@ -102,22 +95,12 @@ make_max_timeuuid_fct() {
if (ts_obj.is_null()) {
return {};
}
auto uuid = utils::UUID_gen::max_time_UUID(get_valid_timestamp(ts_obj));
auto ts = value_cast<db_clock::time_point>(ts_obj);
auto uuid = utils::UUID_gen::max_time_UUID(ts.time_since_epoch().count());
return {timeuuid_type->decompose(uuid)};
});
}
inline utils::UUID get_valid_timeuuid(bytes raw) {
if (!utils::UUID_gen::is_valid_UUID(raw)) {
throw exceptions::server_exception(format("invalid timeuuid: size={}", raw.size()));
}
auto uuid = utils::UUID_gen::get_UUID(raw);
if (!uuid.is_timestamp()) {
throw exceptions::server_exception(format("{}: Not a timeuuid: version={}", uuid, uuid.version()));
}
return uuid;
}
inline
shared_ptr<function>
make_date_of_fct() {
@@ -128,7 +111,7 @@ make_date_of_fct() {
if (!bb) {
return {};
}
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb))));
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb))));
return {timestamp_type->decompose(ts)};
});
}
@@ -143,7 +126,7 @@ make_unix_timestamp_of_fct() {
if (!bb) {
return {};
}
return {long_type->decompose(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb)))};
return {long_type->decompose(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb)))};
});
}
@@ -194,7 +177,7 @@ make_timeuuidtodate_fct() {
if (!bb) {
return {};
}
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb))));
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb))));
auto to_simple_date = get_castas_fctn(simple_date_type, timestamp_type);
return {simple_date_type->decompose(to_simple_date(ts))};
});
@@ -229,7 +212,7 @@ make_timeuuidtotimestamp_fct() {
if (!bb) {
return {};
}
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb))));
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb))));
return {timestamp_type->decompose(ts)};
});
}
@@ -263,12 +246,14 @@ make_timeuuidtounixtimestamp_fct() {
if (!bb) {
return {};
}
return {long_type->decompose(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb)))};
return {long_type->decompose(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb)))};
});
}
inline bytes time_point_to_long(const data_value& v) {
return data_value(get_valid_timestamp(v)).serialize();
auto since_epoch = value_cast<db_clock::time_point>(v).time_since_epoch();
int64_t ms = std::chrono::duration_cast<std::chrono::milliseconds>(since_epoch).count();
return serialized(ms);
}
inline


@@ -0,0 +1,63 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "user_function.hh"
#include "lua.hh"
namespace cql3 {
namespace functions {
user_function::user_function(function_name name, std::vector<data_type> arg_types, std::vector<sstring> arg_names,
sstring body, sstring language, data_type return_type, bool called_on_null_input, sstring bitcode,
lua::runtime_config cfg)
: abstract_function(std::move(name), std::move(arg_types), std::move(return_type)),
_arg_names(std::move(arg_names)), _body(std::move(body)), _language(std::move(language)),
_called_on_null_input(called_on_null_input), _bitcode(std::move(bitcode)),
_cfg(std::move(cfg)) {}
bool user_function::is_pure() const { return true; }
bool user_function::is_native() const { return false; }
bool user_function::is_aggregate() const { return false; }
bool user_function::requires_thread() const { return true; }
bytes_opt user_function::execute(cql_serialization_format sf, const std::vector<bytes_opt>& parameters) {
const auto& types = arg_types();
if (parameters.size() != types.size()) {
throw std::logic_error("Wrong number of parameters");
}
std::vector<data_value> values;
values.reserve(parameters.size());
for (int i = 0, n = types.size(); i != n; ++i) {
const data_type& type = types[i];
const bytes_opt& bytes = parameters[i];
if (!bytes && !_called_on_null_input) {
return std::nullopt;
}
values.push_back(bytes ? type->deserialize(*bytes) : data_value::make_null(type));
}
return lua::run_script(lua::bitcode_view{_bitcode}, values, return_type(), _cfg).get0();
}
}
}


@@ -0,0 +1,70 @@
/*
* Copyright (C) 2019 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "abstract_function.hh"
#include "scalar_function.hh"
#include "lua.hh"
namespace cql3 {
namespace functions {
class user_function final : public abstract_function, public scalar_function {
std::vector<sstring> _arg_names;
sstring _body;
sstring _language;
bool _called_on_null_input;
sstring _bitcode;
// FIXME: We should not need a copy in each function. It is here
// because user_function::execute is only passed the
// cql_serialization_format and the runtime arguments. We could
// avoid it by having a runtime->execute(user_function) instead,
// but that is a large refactoring. We could also store a
// lua_runtime in a thread_local variable, but that is one extra
// global.
lua::runtime_config _cfg;
public:
user_function(function_name name, std::vector<data_type> arg_types, std::vector<sstring> arg_names, sstring body,
sstring language, data_type return_type, bool called_on_null_input, sstring bitcode,
lua::runtime_config cfg);
const std::vector<sstring>& arg_names() const { return _arg_names; }
const sstring& body() const { return _body; }
const sstring& language() const { return _language; }
bool called_on_null_input() const { return _called_on_null_input; }
virtual bool is_pure() const override;
virtual bool is_native() const override;
virtual bool is_aggregate() const override;
virtual bool requires_thread() const override;
virtual bytes_opt execute(cql_serialization_format sf, const std::vector<bytes_opt>& parameters) override;
};
}
}


@@ -202,7 +202,7 @@ lists::delayed_value::contains_bind_marker() const {
}
void
lists::delayed_value::collect_marker_specification(shared_ptr<variable_specifications> bound_names) {
lists::delayed_value::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
}
shared_ptr<terminal>
@@ -244,7 +244,7 @@ lists::marker::bind(const query_options& options) {
}
}
constexpr const db_clock::time_point lists::precision_time::REFERENCE_TIME;
constexpr db_clock::time_point lists::precision_time::REFERENCE_TIME;
thread_local lists::precision_time lists::precision_time::_last = {db_clock::time_point::max(), 0};
lists::precision_time
@@ -280,12 +280,12 @@ lists::setter::execute(mutation& m, const clustering_key_prefix& prefix, const u
}
bool
lists::setter_by_index::requires_read() {
lists::setter_by_index::requires_read() const {
return true;
}
void
lists::setter_by_index::collect_marker_specification(shared_ptr<variable_specifications> bound_names) {
lists::setter_by_index::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
operation::collect_marker_specification(bound_names);
_idx->collect_marker_specification(std::move(bound_names));
}
@@ -337,7 +337,7 @@ lists::setter_by_index::execute(mutation& m, const clustering_key_prefix& prefix
}
bool
lists::setter_by_uuid::requires_read() {
lists::setter_by_uuid::requires_read() const {
return false;
}
@@ -437,7 +437,7 @@ lists::prepender::execute(mutation& m, const clustering_key_prefix& prefix, cons
}
bool
lists::discarder::requires_read() {
lists::discarder::requires_read() const {
return true;
}
@@ -490,7 +490,7 @@ lists::discarder::execute(mutation& m, const clustering_key_prefix& prefix, cons
}
bool
lists::discarder_by_index::requires_read() {
lists::discarder_by_index::requires_read() const {
return true;
}


@@ -104,7 +104,7 @@ public:
: _elements(std::move(elements)) {
}
virtual bool contains_bind_marker() const override;
virtual void collect_marker_specification(shared_ptr<variable_specifications> bound_names);
virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names);
virtual shared_ptr<terminal> bind(const query_options& options) override;
};
@@ -158,8 +158,8 @@ public:
setter_by_index(const column_definition& column, shared_ptr<term> idx, shared_ptr<term> t)
: operation(column, std::move(t)), _idx(std::move(idx)) {
}
virtual bool requires_read() override;
virtual void collect_marker_specification(shared_ptr<variable_specifications> bound_names);
virtual bool requires_read() const override;
virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names);
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override;
};
@@ -168,7 +168,7 @@ public:
setter_by_uuid(const column_definition& column, shared_ptr<term> idx, shared_ptr<term> t)
: setter_by_index(column, std::move(idx), std::move(t)) {
}
virtual bool requires_read() override;
virtual bool requires_read() const override;
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override;
};
@@ -195,7 +195,7 @@ public:
discarder(const column_definition& column, shared_ptr<term> t)
: operation(column, std::move(t)) {
}
virtual bool requires_read() override;
virtual bool requires_read() const override;
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override;
};
@@ -204,7 +204,7 @@ public:
discarder_by_index(const column_definition& column, shared_ptr<term> idx)
: operation(column, std::move(idx)) {
}
virtual bool requires_read() override;
virtual bool requires_read() const override;
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params);
};
};


@@ -218,7 +218,7 @@ maps::delayed_value::contains_bind_marker() const {
}
void
maps::delayed_value::collect_marker_specification(shared_ptr<variable_specifications> bound_names) {
maps::delayed_value::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
}
shared_ptr<terminal>
@@ -293,7 +293,7 @@ maps::setter::execute(mutation& m, const clustering_key_prefix& row_key, const u
}
void
maps::setter_by_key::collect_marker_specification(shared_ptr<variable_specifications> bound_names) {
maps::setter_by_key::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
operation::collect_marker_specification(bound_names);
_k->collect_marker_specification(bound_names);
}


@@ -98,7 +98,7 @@ public:
: _comparator(std::move(comparator)), _elements(std::move(elements)) {
}
virtual bool contains_bind_marker() const override;
virtual void collect_marker_specification(shared_ptr<variable_specifications> bound_names) override;
virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) override;
shared_ptr<terminal> bind(const query_options& options);
};
@@ -126,7 +126,7 @@ public:
setter_by_key(const column_definition& column, shared_ptr<term> k, shared_ptr<term> t)
: operation(column, std::move(t)), _k(std::move(k)) {
}
virtual void collect_marker_specification(shared_ptr<variable_specifications> bound_names) override;
virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) override;
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override;
};


@@ -138,7 +138,7 @@ public:
protected:
virtual shared_ptr<restrictions::restriction> new_EQ_restriction(database& db, schema_ptr schema,
shared_ptr<variable_specifications> bound_names) override {
lw_shared_ptr<variable_specifications> bound_names) override {
auto rs = receivers(db, schema);
std::vector<::shared_ptr<column_specification>> col_specs(rs.size());
std::transform(rs.begin(), rs.end(), col_specs.begin(), [] (auto cs) {
@@ -149,7 +149,7 @@ protected:
}
virtual shared_ptr<restrictions::restriction> new_IN_restriction(database& db, schema_ptr schema,
shared_ptr<variable_specifications> bound_names) override {
lw_shared_ptr<variable_specifications> bound_names) override {
auto rs = receivers(db, schema);
std::vector<::shared_ptr<column_specification>> col_specs(rs.size());
std::transform(rs.begin(), rs.end(), col_specs.begin(), [] (auto cs) {
@@ -172,7 +172,7 @@ protected:
}
virtual shared_ptr<restrictions::restriction> new_slice_restriction(database& db, schema_ptr schema,
shared_ptr<variable_specifications> bound_names,
lw_shared_ptr<variable_specifications> bound_names,
statements::bound bound, bool inclusive) override {
auto rs = receivers(db, schema);
std::vector<::shared_ptr<column_specification>> col_specs(rs.size());
@@ -184,12 +184,12 @@ protected:
}
virtual shared_ptr<restrictions::restriction> new_contains_restriction(database& db, schema_ptr schema,
shared_ptr<variable_specifications> bound_names, bool is_key) override {
lw_shared_ptr<variable_specifications> bound_names, bool is_key) override {
throw exceptions::invalid_request_exception(format("{} cannot be used for Multi-column relations", get_operator()));
}
virtual ::shared_ptr<restrictions::restriction> new_LIKE_restriction(
database& db, schema_ptr schema, ::shared_ptr<variable_specifications> bound_names) override {
database& db, schema_ptr schema, lw_shared_ptr<variable_specifications> bound_names) override {
throw exceptions::invalid_request_exception("LIKE cannot be used for Multi-column relations");
}
@@ -202,7 +202,7 @@ protected:
virtual shared_ptr<term> to_term(const std::vector<shared_ptr<column_specification>>& receivers,
::shared_ptr<term::raw> raw, database& db, const sstring& keyspace,
::shared_ptr<variable_specifications> bound_names) override {
lw_shared_ptr<variable_specifications> bound_names) override {
auto as_multi_column_raw = dynamic_pointer_cast<term::multi_column_raw>(raw);
auto t = as_multi_column_raw->prepare(db, keyspace, receivers);
t->collect_marker_specification(bound_names);


@@ -115,7 +115,7 @@ public:
* @return whether the operation requires a read of the previous value to be executed
* (only lists setterByIdx, discard and discardByIdx require that).
*/
virtual bool requires_read() {
virtual bool requires_read() const {
return false;
}
@@ -125,7 +125,7 @@ public:
* @param bound_names the list of column specifications in which to collect the
* bind variables of this term.
*/
virtual void collect_marker_specification(::shared_ptr<variable_specifications> bound_names) {
virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
if (_t) {
_t->collect_marker_specification(bound_names);
}


@@ -237,6 +237,10 @@ public:
return _names;
}
const std::vector<cql3::raw_value_view>& get_values() const noexcept {
return _value_views;
}
const cql_config& get_cql_config() const {
return _cql_config;
}


@@ -85,10 +85,11 @@ public:
}
};
query_processor::query_processor(service::storage_proxy& proxy, database& db, query_processor::memory_config mcfg)
query_processor::query_processor(service::storage_proxy& proxy, database& db, service::migration_notifier& mn, query_processor::memory_config mcfg)
: _migration_subscriber{std::make_unique<migration_subscriber>(this)}
, _proxy(proxy)
, _db(db)
, _mnotifier(mn)
, _internal_state(new internal_state())
, _prepared_cache(prep_cache_log, mcfg.prepared_statment_cache_size)
, _authorized_prepared_cache(std::min(std::chrono::milliseconds(_db.get_config().permissions_validity_in_ms()),
@@ -96,14 +97,22 @@ query_processor::query_processor(service::storage_proxy& proxy, database& db, qu
std::chrono::milliseconds(_db.get_config().permissions_update_interval_in_ms()),
mcfg.authorized_prepared_cache_size, authorized_prepared_statements_cache_log) {
namespace sm = seastar::metrics;
namespace stm = statements;
using clevel = db::consistency_level;
sm::label cl_label("consistency_level");
sm::label who_label("who"); // Who queried system tables
const auto user_who_label_instance = who_label("user");
const auto internal_who_label_instance = who_label("internal");
sm::label ks_label("ks");
const auto system_ks_label_instance = ks_label("system");
std::vector<sm::metric_definition> qp_group;
qp_group.push_back(sm::make_derive(
"statements_prepared",
_stats.prepare_invocations,
sm::description("Counts a total number of parsed CQL requests.")));
sm::description("Counts the total number of parsed CQL requests.")));
for (auto cl = size_t(clevel::MIN_VALUE); cl <= size_t(clevel::MAX_VALUE); ++cl) {
qp_group.push_back(
sm::make_derive(
@@ -123,97 +132,219 @@ query_processor::query_processor(service::storage_proxy& proxy, database& db, qu
{
sm::make_derive(
"reads",
_cql_stats.statements[size_t(statement_type::SELECT)],
sm::description("Counts a total number of CQL read requests.")),
sm::description("Counts the total number of CQL SELECT requests."),
[this] {
// Reads fall into `cond_selector::NO_CONDITIONS' pigeonhole
return _cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::SELECT)
+ _cql_stats.query_cnt(source_selector::USER, ks_selector::NONSYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::SELECT)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::SELECT)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::NONSYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::SELECT);
}),
sm::make_derive(
"inserts",
_cql_stats.statements[size_t(statement_type::INSERT)],
sm::description("Counts a total number of CQL INSERT requests without conditions."),
{non_cas_label_instance}),
sm::description("Counts the total number of CQL INSERT requests with/without conditions."),
{non_cas_label_instance},
[this] {
return _cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::INSERT)
+ _cql_stats.query_cnt(source_selector::USER, ks_selector::NONSYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::INSERT)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::INSERT)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::NONSYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::INSERT);
}),
sm::make_derive(
"inserts",
_cql_stats.cas_statements[size_t(statement_type::INSERT)],
sm::description("Counts a total number of CQL INSERT requests with conditions."),
{cas_label_instance}),
sm::description("Counts the total number of CQL INSERT requests with/without conditions."),
{cas_label_instance},
[this] {
return _cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::INSERT)
+ _cql_stats.query_cnt(source_selector::USER, ks_selector::NONSYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::INSERT)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::INSERT)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::NONSYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::INSERT);
}),
sm::make_derive(
"updates",
_cql_stats.statements[size_t(statement_type::UPDATE)],
sm::description("Counts a total number of CQL UPDATE requests without conditions."),
{non_cas_label_instance}),
sm::description("Counts the total number of CQL UPDATE requests with/without conditions."),
{non_cas_label_instance},
[this] {
return _cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::UPDATE)
+ _cql_stats.query_cnt(source_selector::USER, ks_selector::NONSYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::UPDATE)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::UPDATE)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::NONSYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::UPDATE);
}),
sm::make_derive(
"updates",
_cql_stats.cas_statements[size_t(statement_type::UPDATE)],
sm::description("Counts a total number of CQL UPDATE requests with conditions."),
{cas_label_instance}),
sm::description("Counts the total number of CQL UPDATE requests with/without conditions."),
{cas_label_instance},
[this] {
return _cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::UPDATE)
+ _cql_stats.query_cnt(source_selector::USER, ks_selector::NONSYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::UPDATE)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::UPDATE)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::NONSYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::UPDATE);
}),
sm::make_derive(
"deletes",
_cql_stats.statements[size_t(statement_type::DELETE)],
sm::description("Counts a total number of CQL DELETE requests without conditions."),
{non_cas_label_instance}),
sm::description("Counts the total number of CQL DELETE requests with/without conditions."),
{non_cas_label_instance},
[this] {
return _cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::DELETE)
+ _cql_stats.query_cnt(source_selector::USER, ks_selector::NONSYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::DELETE)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::DELETE)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::NONSYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::DELETE);
}),
sm::make_derive(
"deletes",
_cql_stats.cas_statements[size_t(statement_type::DELETE)],
sm::description("Counts a total number of CQL DELETE requests with conditions."),
{cas_label_instance}),
sm::description("Counts the total number of CQL DELETE requests with/without conditions."),
{cas_label_instance},
[this] {
return _cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::DELETE)
+ _cql_stats.query_cnt(source_selector::USER, ks_selector::NONSYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::DELETE)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::DELETE)
+ _cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::NONSYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::DELETE);
}),
sm::make_derive(
"reads_per_ks",
// Reads fall into `cond_selector::NO_CONDITIONS' pigeonhole
_cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::SELECT),
sm::description("Counts the number of CQL SELECT requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{user_who_label_instance, system_ks_label_instance}),
sm::make_derive(
"reads_per_ks",
_cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::SELECT),
sm::description("Counts the number of CQL SELECT requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{internal_who_label_instance, system_ks_label_instance}),
sm::make_derive(
"inserts_per_ks",
_cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::INSERT),
sm::description("Counts the number of CQL INSERT requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)."),
{user_who_label_instance, system_ks_label_instance, non_cas_label_instance}),
sm::make_derive(
"inserts_per_ks",
_cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::INSERT),
sm::description("Counts the number of CQL INSERT requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)."),
{internal_who_label_instance, system_ks_label_instance, non_cas_label_instance}),
sm::make_derive(
"inserts_per_ks",
_cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::INSERT),
sm::description("Counts the number of CQL INSERT requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)."),
{user_who_label_instance, system_ks_label_instance, cas_label_instance}),
sm::make_derive(
"inserts_per_ks",
_cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::INSERT),
sm::description("Counts the number of CQL INSERT requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)."),
{internal_who_label_instance, system_ks_label_instance, cas_label_instance}),
sm::make_derive(
"updates_per_ks",
_cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::UPDATE),
sm::description("Counts the number of CQL UPDATE requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{user_who_label_instance, system_ks_label_instance, non_cas_label_instance}),
sm::make_derive(
"updates_per_ks",
_cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::UPDATE),
sm::description("Counts the number of CQL UPDATE requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{internal_who_label_instance, system_ks_label_instance, non_cas_label_instance}),
sm::make_derive(
"updates_per_ks",
_cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::UPDATE),
sm::description("Counts the number of CQL UPDATE requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{user_who_label_instance, system_ks_label_instance, cas_label_instance}),
sm::make_derive(
"updates_per_ks",
_cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::UPDATE),
sm::description("Counts the number of CQL UPDATE requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{internal_who_label_instance, system_ks_label_instance, cas_label_instance}),
sm::make_derive(
"deletes_per_ks",
_cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::DELETE),
sm::description("Counts the number of CQL DELETE requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{user_who_label_instance, system_ks_label_instance, non_cas_label_instance}),
sm::make_derive(
"deletes_per_ks",
_cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::NO_CONDITIONS, stm::statement_type::DELETE),
sm::description("Counts the number of CQL DELETE requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{internal_who_label_instance, system_ks_label_instance, non_cas_label_instance}),
sm::make_derive(
"deletes_per_ks",
_cql_stats.query_cnt(source_selector::USER, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::DELETE),
sm::description("Counts the number of CQL DELETE requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{user_who_label_instance, system_ks_label_instance, cas_label_instance}),
sm::make_derive(
"deletes_per_ks",
_cql_stats.query_cnt(source_selector::INTERNAL, ks_selector::SYSTEM, cond_selector::WITH_CONDITIONS, stm::statement_type::DELETE),
sm::description("Counts the number of CQL DELETE requests executed on particular keyspaces. "
"Label `who' indicates where the reqs come from (clients or DB internals)"),
{internal_who_label_instance, system_ks_label_instance, cas_label_instance}),
sm::make_derive(
"batches",
_cql_stats.batches,
-sm::description("Counts a total number of CQL BATCH requests without conditions."),
+sm::description("Counts the total number of CQL BATCH requests without conditions."),
{non_cas_label_instance}),
sm::make_derive(
"batches",
_cql_stats.cas_batches,
-sm::description("Counts a total number of CQL BATCH requests with conditions."),
+sm::description("Counts the total number of CQL BATCH requests with conditions."),
{cas_label_instance}),
sm::make_derive(
"statements_in_batches",
_cql_stats.statements_in_batches,
-sm::description("Counts a total number of sub-statements in CQL BATCH requests without conditions."),
+sm::description("Counts the total number of sub-statements in CQL BATCH requests without conditions."),
{non_cas_label_instance}),
sm::make_derive(
"statements_in_batches",
_cql_stats.statements_in_cas_batches,
-sm::description("Counts a total number of sub-statements in CQL BATCH requests with conditions."),
+sm::description("Counts the total number of sub-statements in CQL BATCH requests with conditions."),
{cas_label_instance}),
sm::make_derive(
"batches_pure_logged",
_cql_stats.batches_pure_logged,
sm::description(
-"Counts a total number of LOGGED batches that were executed as LOGGED batches.")),
+"Counts the total number of LOGGED batches that were executed as LOGGED batches.")),
sm::make_derive(
"batches_pure_unlogged",
_cql_stats.batches_pure_unlogged,
sm::description(
-"Counts a total number of UNLOGGED batches that were executed as UNLOGGED "
+"Counts the total number of UNLOGGED batches that were executed as UNLOGGED "
"batches.")),
sm::make_derive(
"batches_unlogged_from_logged",
_cql_stats.batches_unlogged_from_logged,
-sm::description("Counts a total number of LOGGED batches that were executed as UNLOGGED "
+sm::description("Counts the total number of LOGGED batches that were executed as UNLOGGED "
"batches.")),
sm::make_derive(
"rows_read",
_cql_stats.rows_read,
-sm::description("Counts a total number of rows read during CQL requests.")),
+sm::description("Counts the total number of rows read during CQL requests.")),
sm::make_derive(
"prepared_cache_evictions",
[] { return prepared_statements_cache::shard_stats().prepared_cache_evictions; },
-sm::description("Counts a number of prepared statements cache entries evictions.")),
+sm::description("Counts the number of prepared statements cache entries evictions.")),
sm::make_gauge(
"prepared_cache_size",
@@ -228,58 +359,63 @@ query_processor::query_processor(service::storage_proxy& proxy, database& db, qu
sm::make_derive(
"secondary_index_creates",
_cql_stats.secondary_index_creates,
-sm::description("Counts a total number of CQL CREATE INDEX requests.")),
+sm::description("Counts the total number of CQL CREATE INDEX requests.")),
sm::make_derive(
"secondary_index_drops",
_cql_stats.secondary_index_drops,
-sm::description("Counts a total number of CQL DROP INDEX requests.")),
+sm::description("Counts the total number of CQL DROP INDEX requests.")),
// secondary_index_reads total count is also included in all cql reads
sm::make_derive(
"secondary_index_reads",
_cql_stats.secondary_index_reads,
-sm::description("Counts a total number of CQL read requests performed using secondary indexes.")),
+sm::description("Counts the total number of CQL read requests performed using secondary indexes.")),
// secondary_index_rows_read total count is also included in all cql rows read
sm::make_derive(
"secondary_index_rows_read",
_cql_stats.secondary_index_rows_read,
-sm::description("Counts a total number of rows read during CQL requests performed using secondary indexes.")),
+sm::description("Counts the total number of rows read during CQL requests performed using secondary indexes.")),
// read requests that required ALLOW FILTERING
sm::make_derive(
"filtered_read_requests",
_cql_stats.filtered_reads,
-sm::description("Counts a total number of CQL read requests that required ALLOW FILTERING. See filtered_rows_read_total to compare how many rows needed to be filtered.")),
+sm::description("Counts the total number of CQL read requests that required ALLOW FILTERING. See filtered_rows_read_total to compare how many rows needed to be filtered.")),
// rows read with filtering enabled (because ALLOW FILTERING was required)
sm::make_derive(
"filtered_rows_read_total",
_cql_stats.filtered_rows_read_total,
-sm::description("Counts a total number of rows read during CQL requests that required ALLOW FILTERING. See filtered_rows_matched_total and filtered_rows_dropped_total for information how accurate filtering queries are.")),
+sm::description("Counts the total number of rows read during CQL requests that required ALLOW FILTERING. See filtered_rows_matched_total and filtered_rows_dropped_total for information how accurate filtering queries are.")),
// rows read with filtering enabled and accepted by the filter
sm::make_derive(
"filtered_rows_matched_total",
_cql_stats.filtered_rows_matched_total,
-sm::description("Counts a number of rows read during CQL requests that required ALLOW FILTERING and accepted by the filter. Number similar to filtered_rows_read_total indicates that filtering is accurate.")),
+sm::description("Counts the number of rows read during CQL requests that required ALLOW FILTERING and accepted by the filter. Number similar to filtered_rows_read_total indicates that filtering is accurate.")),
// rows read with filtering enabled and rejected by the filter
sm::make_derive(
"filtered_rows_dropped_total",
[this]() {return _cql_stats.filtered_rows_read_total - _cql_stats.filtered_rows_matched_total;},
-sm::description("Counts a number of rows read during CQL requests that required ALLOW FILTERING and dropped by the filter. Number similar to filtered_rows_read_total indicates that filtering is not accurate and might cause performance degradation.")),
+sm::description("Counts the number of rows read during CQL requests that required ALLOW FILTERING and dropped by the filter. Number similar to filtered_rows_read_total indicates that filtering is not accurate and might cause performance degradation.")),
sm::make_derive(
"select_bypass_caches",
_cql_stats.select_bypass_caches,
sm::description("Counts the number of SELECT statements with BYPASS CACHE option.")),
sm::make_derive(
"authorized_prepared_statements_cache_evictions",
[] { return authorized_prepared_statements_cache::shard_stats().authorized_prepared_statements_cache_evictions; },
-sm::description("Counts a number of authenticated prepared statements cache entries evictions.")),
+sm::description("Counts the number of authenticated prepared statements cache entries evictions.")),
sm::make_gauge(
"authorized_prepared_statements_cache_size",
[this] { return _authorized_prepared_cache.size(); },
-sm::description("A number of entries in the authenticated prepared statements cache.")),
+sm::description("Number of entries in the authenticated prepared statements cache.")),
sm::make_gauge(
"user_prepared_auth_cache_footprint",
@@ -289,24 +425,34 @@ query_processor::query_processor(service::storage_proxy& proxy, database& db, qu
sm::make_counter(
"reverse_queries",
_cql_stats.reverse_queries,
-sm::description("Counts number of CQL SELECT requests with ORDER BY DESC.")),
+sm::description("Counts the number of CQL SELECT requests with reverse ORDER BY order.")),
sm::make_counter(
"unpaged_select_queries",
-_cql_stats.unpaged_select_queries,
-sm::description("Counts number of unpaged CQL SELECT requests.")),
+[this] {
+    return _cql_stats.unpaged_select_queries(ks_selector::NONSYSTEM)
+        + _cql_stats.unpaged_select_queries(ks_selector::SYSTEM);
+},
+sm::description("Counts the total number of unpaged CQL SELECT requests.")),
sm::make_counter(
"unpaged_select_queries_per_ks",
_cql_stats.unpaged_select_queries(ks_selector::SYSTEM),
sm::description("Counts the number of unpaged CQL SELECT requests against particular keyspaces."),
{system_ks_label_instance})
});
-service::get_local_migration_manager().register_listener(_migration_subscriber.get());
+_mnotifier.register_listener(_migration_subscriber.get());
}
query_processor::~query_processor() {
}
future<> query_processor::stop() {
-service::get_local_migration_manager().unregister_listener(_migration_subscriber.get());
-return _authorized_prepared_cache.stop().finally([this] { return _prepared_cache.stop(); });
+return _mnotifier.unregister_listener(_migration_subscriber.get()).then([this] {
+    return _authorized_prepared_cache.stop().finally([this] { return _prepared_cache.stop(); });
+});
}
future<::shared_ptr<result_message>>
@@ -484,7 +630,7 @@ query_options query_processor::make_internal_options(
const std::initializer_list<data_value>& values,
db::consistency_level cl,
const timeout_config& timeout_config,
-int32_t page_size) {
+int32_t page_size) const {
if (p->bound_names.size() != values.size()) {
throw std::invalid_argument(
format("Invalid number of values. Expecting {:d} but got {:d}", p->bound_names.size(), values.size()));

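The metric registrations above attach label instances (`who`, per-keyspace, CAS vs non-CAS) to the same metric name, so one logical counter fans out into several labeled series. A minimal sketch of that idea, using a plain `std::map` in place of seastar's metrics registry (names here are illustrative, not Scylla's API):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy labeled-counter registry: one metric name, many (who, ks) label
// combinations, each with its own counter value.
class metric_registry {
    std::map<std::string, uint64_t> counters_;

    static std::string key(const std::string& name, const std::string& who, const std::string& ks) {
        return name + "{who=" + who + ",ks=" + ks + "}";
    }
public:
    void add(const std::string& name, const std::string& who, const std::string& ks, uint64_t delta) {
        counters_[key(name, who, ks)] += delta;
    }
    uint64_t get(const std::string& name, const std::string& who, const std::string& ks) const {
        auto it = counters_.find(key(name, who, ks));
        return it == counters_.end() ? 0 : it->second;
    }
};
```

This mirrors why `deletes_per_ks` appears four times in the diff: the name is shared, and only the label-instance set (`user`/`internal` × conditions) distinguishes the series.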

@@ -57,7 +57,7 @@
#include "cql3/untyped_result_set.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
-#include "service/migration_manager.hh"
+#include "service/migration_listener.hh"
#include "service/query_state.hh"
#include "transport/messages/result_message.hh"
@@ -109,6 +109,7 @@ private:
std::unique_ptr<migration_subscriber> _migration_subscriber;
service::storage_proxy& _proxy;
database& _db;
service::migration_notifier& _mnotifier;
struct stats {
uint64_t prepare_invocations = 0;
@@ -142,7 +143,7 @@ public:
static ::shared_ptr<statements::raw::parsed_statement> parse_statement(const std::string_view& query);
-query_processor(service::storage_proxy& proxy, database& db, memory_config mcfg);
+query_processor(service::storage_proxy& proxy, database& db, service::migration_notifier& mn, memory_config mcfg);
~query_processor();
@@ -158,15 +159,15 @@ public:
return _cql_stats;
}
-statements::prepared_statement::checked_weak_ptr get_prepared(const auth::authenticated_user* user_ptr, const prepared_cache_key_type& key) {
-if (user_ptr) {
-auto it = _authorized_prepared_cache.find(*user_ptr, key);
+statements::prepared_statement::checked_weak_ptr get_prepared(const std::optional<auth::authenticated_user>& user, const prepared_cache_key_type& key) {
+if (user) {
+auto it = _authorized_prepared_cache.find(*user, key);
if (it != _authorized_prepared_cache.end()) {
try {
return it->get()->checked_weak_from_this();
} catch (seastar::checked_ptr_is_null_exception&) {
// If the prepared statement got invalidated - remove the corresponding authorized_prepared_statements_cache entry as well.
-_authorized_prepared_cache.remove(*user_ptr, key);
+_authorized_prepared_cache.remove(*user, key);
}
}
}
@@ -325,7 +326,7 @@ private:
const std::initializer_list<data_value>&,
db::consistency_level,
const timeout_config& timeout_config,
-int32_t page_size = -1);
+int32_t page_size = -1) const;
future<::shared_ptr<cql_transport::messages::result_message>>
process_authorized_statement(const ::shared_ptr<cql_statement> statement, service::query_state& query_state, const query_options& options);

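The `get_prepared` change replaces a nullable raw pointer (`const auth::authenticated_user*`) with `const std::optional<auth::authenticated_user>&`, making "no authenticated user" explicit in the type. A simplified sketch of the lookup pattern, with the cache and value types stubbed out (the real code uses an LRU cache and `checked_weak_ptr`):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Stand-ins for the real types (assumptions for illustration only).
using user = std::string;
using cache_key = std::string;
using prepared = std::string;

std::map<std::pair<user, cache_key>, prepared> auth_cache;

// An anonymous request carries std::nullopt instead of a null pointer,
// so the "no user" case cannot be confused with a dangling pointer.
const prepared* get_prepared(const std::optional<user>& u, const cache_key& key) {
    if (u) {
        auto it = auth_cache.find({*u, key});
        if (it != auth_cache.end()) {
            return &it->second;
        }
    }
    return nullptr;
}
```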

@@ -139,7 +139,7 @@ public:
* @return the <code>Restriction</code> corresponding to this <code>Relation</code>
* @throws InvalidRequestException if this <code>Relation</code> is not valid
*/
-virtual ::shared_ptr<restrictions::restriction> to_restriction(database& db, schema_ptr schema, ::shared_ptr<variable_specifications> bound_names) final {
+virtual ::shared_ptr<restrictions::restriction> to_restriction(database& db, schema_ptr schema, lw_shared_ptr<variable_specifications> bound_names) final {
if (_relation_type == operator_type::EQ) {
return new_EQ_restriction(db, schema, bound_names);
} else if (_relation_type == operator_type::LT) {
@@ -182,7 +182,7 @@ public:
* @throws InvalidRequestException if the relation cannot be converted into an EQ restriction.
*/
virtual ::shared_ptr<restrictions::restriction> new_EQ_restriction(database& db, schema_ptr schema,
-::shared_ptr<variable_specifications> bound_names) = 0;
+lw_shared_ptr<variable_specifications> bound_names) = 0;
/**
* Creates a new IN restriction instance.
@@ -193,7 +193,7 @@ public:
* @throws InvalidRequestException if the relation cannot be converted into an IN restriction.
*/
virtual ::shared_ptr<restrictions::restriction> new_IN_restriction(database& db, schema_ptr schema,
-::shared_ptr<variable_specifications> bound_names) = 0;
+lw_shared_ptr<variable_specifications> bound_names) = 0;
/**
* Creates a new Slice restriction instance.
@@ -206,7 +206,7 @@ public:
* @throws InvalidRequestException if the <code>Relation</code> is not valid
*/
virtual ::shared_ptr<restrictions::restriction> new_slice_restriction(database& db, schema_ptr schema,
-::shared_ptr<variable_specifications> bound_names,
+lw_shared_ptr<variable_specifications> bound_names,
statements::bound bound,
bool inclusive) = 0;
@@ -220,13 +220,13 @@ public:
* @throws InvalidRequestException if the <code>Relation</code> is not valid
*/
virtual ::shared_ptr<restrictions::restriction> new_contains_restriction(database& db, schema_ptr schema,
-::shared_ptr<variable_specifications> bound_names, bool isKey) = 0;
+lw_shared_ptr<variable_specifications> bound_names, bool isKey) = 0;
/**
* Creates a new LIKE restriction instance.
*/
virtual ::shared_ptr<restrictions::restriction> new_LIKE_restriction(database& db, schema_ptr schema,
-::shared_ptr<variable_specifications> bound_names) = 0;
+lw_shared_ptr<variable_specifications> bound_names) = 0;
/**
* Renames an identifier in this Relation, if applicable.
@@ -253,7 +253,7 @@ protected:
::shared_ptr<term::raw> raw,
database& db,
const sstring& keyspace,
-::shared_ptr<variable_specifications> boundNames) = 0;
+lw_shared_ptr<variable_specifications> boundNames) = 0;
/**
* Converts the specified <code>Raw</code> terms into a <code>Term</code>s.
@@ -269,7 +269,7 @@ protected:
const std::vector<::shared_ptr<term::raw>>& raws,
database& db,
const sstring& keyspace,
-::shared_ptr<variable_specifications> boundNames) {
+lw_shared_ptr<variable_specifications> boundNames) {
std::vector<::shared_ptr<term>> terms;
for (auto&& r : raws) {
terms.emplace_back(to_term(receivers, r, db, keyspace, boundNames));


@@ -176,7 +176,7 @@ statement_restrictions::statement_restrictions(database& db,
schema_ptr schema,
statements::statement_type type,
const std::vector<::shared_ptr<relation>>& where_clause,
-::shared_ptr<variable_specifications> bound_names,
+lw_shared_ptr<variable_specifications> bound_names,
bool selects_only_static_columns,
bool select_a_collection,
bool for_view,

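The `::shared_ptr` to `lw_shared_ptr` swap in `relation.hh` and `statement_restrictions` reflects that `variable_specifications` never crosses shards, so a non-atomic reference count suffices. A minimal single-threaded sketch of the idea (greatly simplified; Seastar's real `lw_shared_ptr` is intrusive and richer):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Toy lightweight shared pointer: the refcount is a plain size_t, so copies
// avoid the atomic increments std::shared_ptr pays for cross-thread safety.
template <typename T>
class lw_shared {
    struct box { T value; std::size_t refs; };
    box* b_ = nullptr;
public:
    explicit lw_shared(T v) : b_(new box{std::move(v), 1}) {}
    lw_shared(const lw_shared& o) : b_(o.b_) { if (b_) ++b_->refs; }
    lw_shared& operator=(const lw_shared&) = delete; // kept minimal for the sketch
    ~lw_shared() { if (b_ && --b_->refs == 0) delete b_; }
    T& operator*() const { return b_->value; }
    std::size_t use_count() const { return b_ ? b_->refs : 0; }
};
```

In shard-per-core code like Scylla's, this is pure overhead reduction: the ownership semantics are unchanged, only the synchronization cost disappears.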