Commit Graph

2818 Commits

Author SHA1 Message Date
Nadav Har'El
3d0bd523b5 Merge 'CQL3: fromJson out of range integer cause as error' from Jadw1
Passing an integer which exceeds the corresponding type's bounds to
`fromJson()` was causing silent overflow, e.g. inserting
`fromJson('2147483648')` into an `int` column stored `-2147483648`.

Now, this will cause marshal_exception. All integer types are tested against their bounds.

Tests referring to issue https://github.com/scylladb/scylla/issues/7914 in `test/cql-pytest/cassandra_tests/validation/entities/json_test.py` won't pass because the expected error messages differ from the thrown ones. I was wondering what the message should be, because the expected messages in the tests aren't consistent, for instance:
- bigint overflow expects `Expected a bigint value, but got a` message
- short overflow expects `Unable to make short from` message

For now the message is `Value {} out of bound`.
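
As an illustration only, the bounds check described above might look like the following sketch. The helper name `checked_narrow` and the use of `std::out_of_range` are assumptions made for this example; the actual change throws marshal_exception inside Scylla's JSON type handling.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (not Scylla's code): narrow a parsed 64-bit integer to
// the target integer type T, rejecting values outside T's bounds instead of
// letting them wrap around silently.
template <typename T>
T checked_narrow(int64_t value) {
    if (value < std::numeric_limits<T>::min() ||
        value > std::numeric_limits<T>::max()) {
        // Stand-in for marshal_exception; the text mirrors "Value {} out of bound".
        throw std::out_of_range("Value " + std::to_string(value) + " out of bound");
    }
    return static_cast<T>(value);
}
```

With this sketch, `checked_narrow<int32_t>(2147483648LL)` throws instead of yielding `-2147483648`.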

Fixes: https://github.com/scylladb/scylla/issues/7914

Closes #10145

* github.com:scylladb/scylla:
  CQL3/pytest: Updating test_json
  CQL3: fromJson out of range integer cause as error
2022-03-03 13:46:16 +02:00
Jadw1
742efc4992 CQL3/pytest: Updating test_json
Added test for bigint overflow.
2022-03-02 15:36:09 +01:00
Raphael S. Carvalho
2dba0670ad compaction: Fix time_window_backlog_tracker::replace_sstables()
Introduced in commit: ddd693c6d7

We're not emplacing newer windows in the tracker, causing
std::out_of_range when replacing sstables for windows.

Let's fix the logic and add a unit test to cover this.
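
The failure mode can be pictured with a plain `std::map`, as in the sketch below. The types and the function name are hypothetical and much simpler than the real tracker; the point is only that a lookup with `at()` throws std::out_of_range for a window that was never emplaced, while `try_emplace()` creates it on demand.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a per-window tracker; it only stores sstable ids.
struct window_tracker_sketch {
    std::vector<int> sstables;
};

using windows_map = std::map<int64_t, window_tracker_sketch>;

// Register a new sstable under its time window. try_emplace() creates the
// entry when the window is not tracked yet, whereas windows.at(window) would
// throw std::out_of_range for such a newer window.
void add_sstable(windows_map& windows, int64_t window, int sstable_id) {
    windows.try_emplace(window).first->second.sstables.push_back(sstable_id);
}
```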

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220301194944.95096-1-raphaelsc@scylladb.com>
2022-03-02 10:08:40 +02:00
Nadav Har'El
7cf2e5ee5c Merge 'directory_lister: drop abort method and simplify close semantics' from Benny Halevy
This series contains:
- lister: move to utils
  - tidy up the clutter in the root dir
Based on Avi's feedback to `[PATCH 1/1] utils: directory_lister: close: always abort queue` that was sent to the mailing list:
  - directory_lister: drop abort method
  - lister: do not require get after close to fail
- test: lister_test: test_directory_lister_close simplify indentation
  - cosmetic cleanup

Closes #10142

* github.com:scylladb/scylla:
  test: lister_test: test_directory_lister_close simplify indentation
  lister: do not require get after close to fail
  directory_lister: drop abort method
  lister: move to utils
2022-03-01 16:23:47 +02:00
Botond Dénes
cfa3910509 Merge 'Memtable - scanning and flush readers now implement flat_mutation_reader_v2::impl' from Michael Livshin
This PR consists of two changes.

The first fixes the flat_mutation_reader and flat_mutation_reader_v2, so that they can be destructed without being closed (if no action has been initiated). This has been discussed in the referenced issue.

The second one changes scanning and flush readers so that they implement the second version of the API.

It also contains unit test fixes, dealing with flat mutation reader assertions (where the v1 asserter failed to consume range tombstones intelligently enough in some flows) and several sstable_3_x tests (where sstables that contain range tombstones were expected to be byte-by-byte equivalent to a reference, aside from semantic validation).

Fixes #9065.

Closes #9669

* github.com:scylladb/scylla:
  flat_reader_assertions: do not accumulate out-of-range tombstones
  flat_reader_assertions: refactor resetting accumulated tombstone lists
  flat_mutation_reader_test: fix "test_flat_mutation_reader_consume_single_partition"
  memtable::make_flush_reader(): return flat_mutation_reader_v2
  memtable::make_flat_reader(): return flat_mutation_reader_v2
  flat_mutation_reader_v2: add consume_partitions()
  introduce the MutationConsumer concept
  mutation_source: clone shortcut constructors for flat_mutation_reader_v2
  flat_mutation_reader_v2: add delegating_reader_v2
  memtable: upgrade scanning_reader and flush_reader to v2
  flat_mutation_reader: allow destructing readers which are not closed and didn't initiate any IO.
  tests: stop comparing sstables with range tombstones to C* reference
  tests: flat_reader_assertions: improve range tombstone checking
2022-02-28 17:23:20 +02:00
Michael Livshin
fb6c79015a flat_reader_assertions: do not accumulate out-of-range tombstones
Also remove the incorrect difference in range tombstone checking
behavior between `produces_range_tombstone()` and `produces(const
range_tombstone&)` by having both turn on checking.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-28 17:11:54 +02:00
Michael Livshin
9fa4d9a2bb flat_reader_assertions: refactor resetting accumulated tombstone lists
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-28 17:11:54 +02:00
Michael Livshin
2221aeff0e flat_mutation_reader_test: fix "test_flat_mutation_reader_consume_single_partition"
Since `flat_reader_assertions::produces(const range_tombstone&,...)`
records the range tombstone for checking, be sure to explicitly pass
in a clustering range that does not extend beyond the mock-read part
of the mutation.

Also (provisionally) change the assertion method to accept clustering
ranges.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-28 17:11:54 +02:00
Michael Livshin
9bacce4359 memtable::make_flat_reader(): return flat_mutation_reader_v2
This is just a facade change.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-28 17:11:54 +02:00
Michał Radwański
9ada63a9cb flat_mutation_reader: allow destructing readers which are not closed and didn't initiate any IO.
In functions such as upgrade_to_v2 (excerpt below), if the constructor
of transforming_reader throws, r needs to be destroyed, however it
hasn't been closed. However, if a reader didn't start any operations, it
is safe to destruct such a reader. This issue can potentially manifest
itself in many more readers and might be hard to track down. This commit
adds a bool indicating whether a close is anticipated, thus avoiding
errors in the destructor.

Code excerpt:
flat_mutation_reader_v2 upgrade_to_v2(flat_mutation_reader r) {
    class transforming_reader : public flat_mutation_reader_v2::impl {
        // ...
    };
    return make_flat_mutation_reader_v2<transforming_reader>(std::move(r));
}
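
A minimal sketch of the flag idea follows. The class and member names are assumptions for illustration and do not reflect the real reader implementation.

```cpp
#include <cassert>

// Hypothetical miniature of a reader; the member names are assumptions.
class reader_sketch {
    bool _close_required = false;  // set once any operation has been started
public:
    // Pretend to start some work; from now on close() must be called.
    void fill_buffer() { _close_required = true; }
    // Releases whatever was started.
    void close() { _close_required = false; }
    bool close_required() const { return _close_required; }
    ~reader_sketch() {
        // Destroying without close() is fine only if nothing was started.
        assert(!_close_required);
    }
};
```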

Fixes #9065.
2022-02-28 17:11:54 +02:00
Michael Livshin
67c3c31a6e tests: stop comparing sstables with range tombstones to C* reference
As flat mutation reader {up,down}grades get added to the write path,
comparing range-tombstone-containing (at least) sstables byte-by-byte
to a reference is starting to seem like a fool's errand.

* When a flat mutation reader is {up,down}graded, information may get
  lost while splitting range tombstones.  Making those splits reversible
  should in theory be possible but would surely make {up,down}graders
  slower and more complex, and may also entail adding
  information to the in-memory representation of range tombstones and
  range tombstone changes.  Such investment for the sake of 7 unit tests
  does not seem wise, given that the plan is to get rid of reader
  {up,down}grade logic once the move to flat mutation reader v2 is
  completed.

* All affected tests also validate their written sstables
  semantically.

* At least some of the offending reference sstables are not
  "canonical" wrt range tombstones to begin with -- they contain range
  tombstones that overlap with clustering rows.  The fact that Scylla
  does not "canonicalize" those in some way seems purely incidental.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-28 17:11:54 +02:00
Michael Livshin
2337d48b41 tests: flat_reader_assertions: improve range tombstone checking
`produces_range_tombstone()` is smart enough to not just try to read
one range tombstone from the input and compare it to the passed
reference, but to read as many range tombstones as the reader is
looking at (including none) using `may_produce_tombstones()` and
record those appropriately.

When `produces(const schema&, const mutation_fragment&)` is passed a
range tombstone as the second argument, it does not do anything
special -- it just reads one fragment, disregards it (!), and applies
its second argument to both "expected" and "encountered" range
tombstone lists.  The right thing here is to use the same logic as
`produces_range_tombstone()`; upcoming memtable-related reader
changes (which result in more split range tombstones) cause some unit
tests to fail without fixing this.

Refactor the relevant logic into a private method (`apply_rt()`) and
use that in both places.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-28 17:11:54 +02:00
Nadav Har'El
b650ff5808 test/cql-pytest: test another corner-case of scientific-notation integers
In a previous patch, we added a test for the case of Scylla trying to
assign the JSON value 1e6 into an integer - which should be allowed
because 1e6 is indeed a whole number, in the range of int.

We already fixed that in commit efe7456f0a,
but this patch adds another test which demonstrates that an even more
esoteric problem remains:

If we are reading a JSON value into a bigint (CQL's 64-bit integer),
*and* if the number is between 2^53 and 2^63-1 *and* if the number
is written using scientific notation, e.g., 922337203685477580.7e1
(which is 2^63-1), then the bigint is set incorrectly, with some
digits being lost. The problem is that RapidJSON reads this integer
into the "double" type, which only keeps 53 significant bits.
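
The precision loss can be reproduced without any JSON library. The sketch below assumes IEEE-754 doubles; the function name is hypothetical. It checks whether a 64-bit integer survives a round trip through `double`.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if v can pass through a double without losing digits.
// The name is hypothetical; this only demonstrates the 53-bit limit.
bool survives_double_round_trip(int64_t v) {
    double d = static_cast<double>(v);
    // 2^63 is not representable in int64_t, so converting it back would be
    // undefined behavior; treat that case as a lost value.
    if (d >= 9223372036854775808.0) {
        return false;
    }
    return static_cast<int64_t>(d) == v;
}
```

Values up to 2^53 pass this check, while 2^53 + 1 and 2^63 - 1 do not.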

Because this is an open issue (#10137), the test included here is
marked as expected failure (xfail). The test is also known to
fail in Cassandra - which doesn't allow scientific notation for
JSON integers at all despite the JSON standard - so the test is
also marked "cassandra_bug".

Refs #10137

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-28 13:52:56 +02:00
Benny Halevy
9c89c2df37 test: lister_test: test_directory_lister_close simplify indentation
There's no need anymore for an indented block
to destroy the directory_lister since the other
sub-case was deleted.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-28 13:00:03 +02:00
Benny Halevy
41d097ef47 lister: do not require get after close to fail
Currently, the lister test expected get() to
always fail after close(), but it unexpectedly
succeeded if get() was never called before close,
as seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4587/artifact/testlog/x86_64_debug/lister_test.test_directory_lister_close.4001.log
```
random-seed=1475104835
Generated 719 dir entries
Getting 565 dir entries
Closing directory_lister
Getting 0 dir entries
Closing directory_lister
test/boost/lister_test.cc(190): fatal error: in "test_directory_lister_close": exception std::exception expected but not raised
```

This change relaxes this requirement to keep
close() simple, based on Avi's feedback:

> The user should call close(), and not do it while get() is running, and
> that's it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-28 12:59:08 +02:00
Benny Halevy
00327bfae3 directory_lister: drop abort method
Based on Avi's feedback:
> We generally have a public abort() only if we depend on an external
> event (like data from a tcp socket) that we don't control. But here
> there are no such external events. So why have a public abort() at all?

If needed in the future, we can consider adding
get(abort_source&) to allow aborting get() via
an external event.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-28 12:52:47 +02:00
Benny Halevy
ebbbf1e687 lister: move to utils
There's nothing specific to scylla in the lister
classes, they could (and maybe should) be part of
the seastar library.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-28 12:36:03 +02:00
Tomasz Grabiec
7719f4cd91 Merge "Group 0 discovery: persist and restore peers" from Kamil
We add a `peers()` method to `discovery` which returns the peers
discovered until now (including seeds). The caller of functions which
return an output -- `tick` or `request` -- is responsible for persisting
`peers()` before returning the output of `tick`/`request` (e.g. before
sending the response produced by `request` back). The user of
`discovery` is also responsible for restoring previously persisted peers
when constructing `discovery` again after a restart (e.g. if we
previously crashed in the middle of the algorithm).

The `persistent_discovery` class is a wrapper around `discovery` which
does exactly that.

For storage we use a simple local table.

A simple bugfix is also included in the first patch.

* kbr/discovery-persist-v3:
  service: raft: raft_group0: persist discovered peers and restore on restart
  db: system_keyspace: introduce discovery table
  service: raft: discovery: rename `get_output` to `tick`
  service: raft: discovery: stop returning peer_list from `request` after becoming leader
2022-02-25 17:23:08 +01:00
Avi Kivity
ff2cd72766 Merge 'utils: cached_file: Fix alloc-dealloc mismatch during eviction' from Tomasz Grabiec
cached_page::on_evicted() is invoked in the LSA allocator context, set in the
reclaimer callback installed by the cache_tracker. However,
cached_pages are allocated in the standard allocator context (note:
page content is allocated inside LSA via lsa_buffer). The LSA region
will happily deallocate these, thinking that these are large
objects which were delegated to the standard allocator. But the
_non_lsa_memory_in_use metric will underflow. When it underflows
enough, shard_segment_pool.total_memory() will become 0 and memory
reclamation will stop doing anything, leading to apparent OOM.

The fix is to switch to the standard allocator context inside
cached_page::on_evicted(). evict_range() was also given the same
treatment as a precaution; it is currently invoked only in the
standard allocator context.

The series also adds two safety checks to LSA to catch such problems earlier.

Fixes #10056

\cc @slivne @bhalevy

Closes #10130

* github.com:scylladb/scylla:
  lsa: Abort when trying to free a standard allocator object not allocated through the region
  lsa: Abort when _non_lsa_memory_in_use goes negative
  tests: utils: cached_file: Validate occupancy after eviction
  test: sstable_partition_index_cache_test: Fix alloc-dealloc mismatch
  utils: cached_file: Fix alloc-dealloc mismatch during eviction
2022-02-25 18:19:04 +02:00
Botond Dénes
d8833de3bb Merge "Redefine Compaction Backlog to tame compaction aggressiveness" From Raphael S. Carvalho
"
Problem statement
=================
Today, compaction can act much more aggressively than it really has to, because
the strategy and its definition of backlog are completely decoupled.

The backlog definition for size-tiered, which is inherited by all
strategies (e.g.: LCS L0, TWCS' windows), is built on the assumption that the
world must reach the state of zero amplification. But that's unrealistic and
goes against the intended amplification defined by the compaction strategy.
For example, size tiered is a write oriented strategy which allows for extra
space amplification for compaction to keep up with the high write rate.

It can be seen today, in many deployments, that compaction shares are either
close to 1000, or even stuck at 1000, even though there's nothing to be done,
i.e. the compaction strategy is completely satisfied, for example when there's
a single sstable per tier.
This means that whenever a new compaction job kicks in, it will act much more
aggressively because of the high shares, caused by false backlog of the existing
tables. This translates into higher P99 latencies and reduced throughput.

Solution
========
This problem can be fixed, as proposed in the document "Fixing compaction
aggressiveness due to suboptimal definition of zero backlog by controller" [1],
by removing backlog of tiers that don't have to be compacted now, like a tier
that has a single file. That's about coupling the strategy goal with the
backlog definition. So once strategy becomes satisfied, so will the controller.
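
The idea can be pictured with a toy model. This is not Scylla's backlog formula: the function name and the rule "a tier with fewer than two files contributes nothing, any other tier contributes its total size" are assumptions chosen only to show how a satisfied strategy yields zero backlog.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A tier is modeled as the list of sizes (in bytes) of its sstables.
using tier = std::vector<uint64_t>;

// Toy backlog: tiers holding a single file need no compaction now and
// contribute nothing; other tiers contribute their total size.
// This is NOT Scylla's formula, only an illustration of the idea.
uint64_t toy_backlog(const std::vector<tier>& tiers) {
    uint64_t backlog = 0;
    for (const auto& t : tiers) {
        if (t.size() < 2) {
            continue;
        }
        for (uint64_t size : t) {
            backlog += size;
        }
    }
    return backlog;
}
```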

Low-efficiency compaction, like compacting 2 files only or cross-tier, only
happens when system is under little load and can proceed at a slower pace.
Once efficient jobs show up, ongoing compactions, even if inefficient, will get
more shares (as efficient jobs add to the backlog) so compaction won't fall
behind.

With this approach, throughput and latency are improved as CPU time is no longer
stolen (unnecessarily) from the foreground requests.

[1]: https://docs.google.com/document/d/1EQnXXGWg6z7VAwI4u8AaUX1vFduClaf6WOMt2wem5oQ

Results
=======
Test sequentially populates 3 tables and then runs a mixed workload on them,
where disk:memory ratio (usage) reaches ~30:1 at the peak.

Please find graphs here:
https://user-images.githubusercontent.com/1409139/153687219-32368a35-ac63-461b-a362-64dbe8449a00.png

1) Patched version started at ~01:30
2) On population phase, throughput increase and lower P99 write latency can be
clearly observed.
3) On mixed phase, throughput increase and lower P99 write and read latency can
also be clearly observed.
4) Compaction CPU time sometimes reaches ~100% because of the delay between each
loader.
5) On unpatched version, it can be seen that backlog keeps growing even
though strategies become satisfied, so compaction is using much more CPU time
in comparison. Patched version correctly clears the backlog.

Can also be found at:
github.com/raphaelsc/scylla.git compaction-controller-v5

tests: UNIT(dev, debug).
"

* 'compaction-controller-v5' of https://github.com/raphaelsc/scylla:
  tests: Add compaction controller test
  test/lib/sstable_utils: Set bytes_on_disk for fake SSTables
  compaction/size_tiered_backlog_tracker.hh: Use unsigned type for inflight component
  compaction: Redefine compaction backlog to tame compaction aggressiveness
  compaction_backlog_tracker: Batch changes through a new replacement interface
  table: Disable backlog tracker when stopping table
  compaction_backlog_tracker: make disable() public
  compaction_backlog_tracker: Clear tracker state when disabled
  compaction: Add normalized backlog metric
  compaction: make size_tiered_compaction_strategy static
2022-02-25 09:21:08 +02:00
Nadav Har'El
d1b4cbfbc3 test/cql-pytest: add reproducer for LWT bug with static-column conditions
This patch adds a reproducing test for issue #10081. That issue is about
a conditional (LWT) UPDATE operation that chose a non-existent row via WHERE,
and its condition refers to both static and regular columns: In that case,
the code incorrectly assumes that because it didn't read any row, all columns
are null - and forgets that the static column is *not* null.

The test, test_lwt.py::test_lwt_missing_row_with_static
passes on Cassandra but fails on Scylla, so is marked xfail.

Refs #10081

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220215215243.660087-1-nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Nadav Har'El
49a8164fb7 alternator: add configurable scan period to TTL expiration
Before this patch, the experimental TTL (expiration time) feature in
Alternator scans tables for expiration in a tight loop - starting the
next scan one second after the previous one completed.

In this patch we introduce a new configuration option,
alternator_ttl_period_in_seconds, which determines how frequently
to start the scan. The default is 24 hours - meaning that the next
scan is started 24 hours after the previous one started.

The tests (test/alternator/run) change this configuration back to one
second, so that expiration tests finish as quickly as possible.

Please note that the scan is *not* slowed down to fill this 24 hours -
if it finishes in one hour, it will then sleep for 23 hours. Additional
work would be needed to slow down the scan to not finish too quickly.
One idea not yet implemented is to move the expiration service from
the "maintenance" scheduling group which it uses today to a new
scheduling group, and modifying the number of shares that this group
gets.
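
The sleep arithmetic described above can be sketched as follows. The function name is hypothetical, and starting the next scan immediately when a scan overruns the period is an assumption of this sketch, not something the patch states.

```cpp
#include <cassert>
#include <chrono>

// How long to sleep after a scan so that the next one starts `period` after
// the previous one started. Starting immediately when a scan overruns the
// period is an assumption of this sketch.
std::chrono::seconds sleep_before_next_scan(std::chrono::seconds period,
                                            std::chrono::seconds scan_duration) {
    if (scan_duration >= period) {
        return std::chrono::seconds(0);
    }
    return period - scan_duration;
}
```

For a 24-hour period and a scan that took one hour, this yields a 23-hour sleep.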

Another thing worth noting about the configurable period (which defaults
to 24 hours) is that when TTL is enabled on an Alternator table, it can
take that amount of time until its scan starts and items start expiring
from it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Tomasz Grabiec
ca09a72597 tests: utils: cached_file: Validate occupancy after eviction
Reproducer for #10056

Catches alloc-dealloc mismatch leading to the underflow of
_non_lsa_memory_in_use.
2022-02-25 01:42:15 +01:00
Tomasz Grabiec
b0d5bb334c test: sstable_partition_index_cache_test: Fix alloc-dealloc mismatch
The test was allocating entries in the standard allocator, but they
are evicted in the LSA allocator context.

Fix by allocating under LSA.
2022-02-25 01:42:15 +01:00
Raphael S. Carvalho
2a7939ee4d tests: Add compaction controller test
There's no automated test for the controller; it's time to have one.
Let's start with a basic one that verifies the assumption that
perfectly compacted tiers should produce 0 backlog.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 18:57:45 -03:00
Raphael S. Carvalho
96cfe7d530 test/lib/sstable_utils: Set bytes_on_disk for fake SSTables
Not precise, as bytes_on_disk accounts for all components, but good enough
for testing purposes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 18:57:45 -03:00
Raphael S. Carvalho
ddd693c6d7 compaction_backlog_tracker: Batch changes through a new replacement interface
This new interface allows table to communicate multiple changes in the
SSTable set with a single call, which is useful on compaction completion
for example.
With this new interface, the size tiered backlog tracker will be able to
know when compaction completed, which will allow it to recompute tiers
and their backlog contribution, if any. Without it, tiered tracker
would have to recompute tiers for every change, which would be terribly
expensive.
The old remove/add interfaces are being removed in favor of the new one.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 15:34:16 -03:00
Avi Kivity
cbba80914d memtable: move to replica module and namespace
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.

Memtables are also used outside the replica, in two places:
 - in some virtual tables; this is also in some way inside the replica,
   (virtual readers are installed at the replica level, not the
   coordinator), so I don't consider it a layering violation
 - in many sstable unit tests, as a convenient way to create sstables
   with known input. This is a layering violation.

We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).

Test: unit (dev)

Closes #10120
2022-02-23 09:05:16 +02:00
Avi Kivity
75fb45df1b Merge 'Propagate CQL coordinator timeouts and failures for reads' from Piotr Dulikowski
This PR changes the read coordinator logic so that read timeout and read failure exceptions are propagated without throwing on the coordinator side.

This PR is only concerned with exceptions which were originally thrown by the coordinator (in read resolvers). Exceptions propagated through RPC and RPC timeouts will still throw, although those exceptions will be caught and converted into exceptions-as-values by read resolvers.

This is a continuation of work started in #10014.

Results of `perf_simple_query --smp 1 --operations-per-shard 1000000` (read workload), compared with merge base (10880fb0a7):

```
BEFORE:
125085.13 tps ( 80.2 allocs/op,  12.2 tasks/op,   49010 insns/op)
125645.88 tps ( 80.2 allocs/op,  12.2 tasks/op,   49008 insns/op)
126148.85 tps ( 80.2 allocs/op,  12.2 tasks/op,   49005 insns/op)
126044.40 tps ( 80.2 allocs/op,  12.2 tasks/op,   49005 insns/op)
125799.75 tps ( 80.2 allocs/op,  12.2 tasks/op,   49003 insns/op)

AFTER:
127557.21 tps ( 80.2 allocs/op,  12.2 tasks/op,   49197 insns/op)
127835.98 tps ( 80.2 allocs/op,  12.2 tasks/op,   49198 insns/op)
127749.81 tps ( 80.2 allocs/op,  12.2 tasks/op,   49202 insns/op)
128941.17 tps ( 80.2 allocs/op,  12.2 tasks/op,   49192 insns/op)
129276.15 tps ( 80.2 allocs/op,  12.2 tasks/op,   49182 insns/op)
```

The PR does not introduce additional allocations on the read happy-path. The number of instructions used grows by about 200 insns/op. The increase in TPS is probably just a measurement error.

Closes #10092

* github.com:scylladb/scylla:
  indexed_table_select_statement: return some exceptions as exception messages
  result_combinators: add result_wrap_unpack
  select_statement: return exceptions as errors in execute_without_checking_exception_message
  select_statement: return exceptions without throwing in do_execute
  select_statement: implement execute_without_checking_exception_message
  select_statement: introduce helpers for working with failed results
  query_pager: resultify relevant methods
  storage_proxy: resultify (do_)query
  storage_proxy: resultify query_singular
  storage_proxy: propagate failed results through query_partition_key_range
  storage_proxy: resultify query_partition_key_range_concurrent
  storage_proxy: modify handle_read_error to also handle exception containers
  abstract_read_executor: return result from execute()
  abstract_read_executor: return and handle result from has_cl()
  storage_proxy: resultify handling errors from read-repair
  abstract_read_executor::reconcile: resultise handling of data_resolver->done()
  abstract_read_executor::execute: resultify handling of data_resolver->done()
  result_combinators: add result_discard_value
  abstract_read_executor: resultify _result_promise
  abstract_read_executor: return result from done()
  abstract_read_resolver: fail promises by passing exception as value
  abstract_read_resolver: resultify promises
  exceptions: make it possible to return read_{timeout,failure}_exception as value
  result_try: add as_inner/clone_inner to handle types
  result_try: relax ConvertWithTo constraint
  exception_container: switch impl to std::shared_ptr and make copyable
  result_loop: add result_repeat
  result_loop: add result_do_until
  result_loop: add result_map_reduce
  utils/result: add utilities for checking/creating rebindable results
2022-02-22 20:58:25 +03:00
Nadav Har'El
eec39e1258 Merge 'api: keyspace_scrub: validate params' from Benny Halevy
Refs #10087

Add validation of all params for the keyspace_scrub api.
The validation method is generic and should be used by all apis eventually,
but I'm leaving that as follow-up work.

While at it, fixed the exception types thrown on invalid `scrub_mode` or `quarantine_mode` values from `std::runtime_error` to `httpd::bad_param_exception` so as to generate the `bad_request` http status.

And added unit tests to verify that, and the handling of an unknown parameter.
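
A generic check for unknown parameters might look like the sketch below. The function name and signature are assumptions, and `std::invalid_argument` stands in for `httpd::bad_param_exception` so that the example compiles without Seastar.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical validator: rejects any request parameter that is not in the
// allowed set. std::invalid_argument stands in for httpd::bad_param_exception.
void validate_params(const std::map<std::string, std::string>& params,
                     const std::set<std::string>& allowed) {
    for (const auto& kv : params) {
        if (allowed.count(kv.first) == 0) {
            throw std::invalid_argument("Unknown parameter: " + kv.first);
        }
    }
}
```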

Test: unit(dev)
DTest: nodetool_additional_test.py::TestNodetool::{test_scrub_with_one_node_expect_data_loss,test_scrub_with_multi_nodes_expect_data_rebuild,test_scrub_sstable_with_invalid_fragment,test_scrub_ks_sstable_with_invalid_fragment,test_scrub_segregate_sstable_with_invalid_fragment,test_scrub_segregate_ks_sstable_with_invalid_fragment}

Closes #10090

* github.com:scylladb/scylla:
  api: storage_service: scrub: validate parameters
  api: storage_service: refactor parse_tables
  api: storage_service: refactor validate_keyspace
  test: rest_api: add test_storage_service_keyspace_scrub tests
  api: storage_service: scrub: throw httpd::bad_param_exception for invalid param values
2022-02-22 20:58:25 +03:00
Nadav Har'El
364bd00136 test/cql-pytest: confirm that table names cannot include non-Latin letters
In CQL table names must be composed only of letters, digits, or underscores,
but some Cassandra documentation is unclear whether these "letters" refer only
to the Latin alphabet, or maybe UTF-8 names composed of letters in other
alphabets should be allowed too.

This patch adds a test that confirms that both Scylla and Cassandra only
accept the Latin alphabet in table names, and for example UTF-8 names
with French or Hebrew letters are rejected.
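
The rule the test confirms can be written down as a small predicate. This is a sketch, not the parser's code: the function name is hypothetical, rejecting the empty string is an assumption, and length limits are not modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical check mirroring the rule above: only ASCII (Latin) letters,
// digits and underscores are accepted. Rejecting the empty string is an
// assumption of this sketch; length limits are not modeled.
bool is_valid_table_name_sketch(const std::string& name) {
    if (name.empty()) {
        return false;
    }
    for (unsigned char c : name) {
        const bool is_letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool is_digit = (c >= '0' && c <= '9');
        if (!is_letter && !is_digit && c != '_') {
            return false;
        }
    }
    return true;
}
```

Every byte of a UTF-8 encoded non-Latin letter falls outside the accepted ranges, so such names are rejected.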

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220222134220.972413-1-nyh@scylladb.com>
2022-02-22 20:58:25 +03:00
Nadav Har'El
1a940a1003 test/cql-pytest: remove "xfail" mark from scientific-notation tests that now pass
After issue #10100 was fixed, the two tests reproducing it now pass,
so remove their "xfail" marker.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220222131809.970592-1-nyh@scylladb.com>
2022-02-22 20:58:25 +03:00
Nadav Har'El
be84a8def3 Merge 'Allow integers in scientific format in INSERT JSON ' from Piotr Grabowski
Add support for specifying integers in scientific format (for example 1.234e8) in the INSERT JSON statement:

```
INSERT INTO table JSON '{"int_column": 1e7}';
```

Before the JSON parsing library was switched to RapidJSON from JsonCpp, this statement used to work correctly, because JsonCpp transparently casts double to integer value.

Inserting a floating-point number ending with .0 is allowed, as the fractional part is zero. Non-zero fractional part (for example 12.34) is disallowed. A new test is added to test all those behaviors.
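
The acceptance rule can be sketched as below for an `int` column. The helper is hypothetical and operates on a `double` that a JSON parser has already produced; it is not the type_json code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>

// Hypothetical helper: accept a JSON number for an `int` column only when it
// is a whole number within the 32-bit range.
std::optional<int32_t> json_number_to_int(double d) {
    if (!std::isfinite(d) || std::trunc(d) != d) {
        return std::nullopt;  // NaN, infinity, or non-zero fractional part
    }
    if (d < -2147483648.0 || d > 2147483647.0) {
        return std::nullopt;  // outside the int range
    }
    return static_cast<int32_t>(d);
}
```

Under this sketch, 1e7 and 123.0 are accepted, while 12.34 is rejected.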

This behavior differs from Cassandra, which disallows those types of numbers (1e7, 123.0 and 12.34). However, some users rely on that behavior, and the JSON specification itself does not distinguish between floating-point numbers and integer numbers (only a single "number" type is defined).

This PR also fixes two minor issues I noticed while looking at the code: wrong blob validation and missing `IsString()` checks that could result in assertion error.

Fixes #10100
Fixes #10114
Fixes #10115

Closes #10101

* github.com:scylladb/scylla:
  type_json: support integers in scientific format
  type_json: add missing IsString() checks
  type_json: fix wrong blob JSON validation
2022-02-22 20:58:25 +03:00
Botond Dénes
3aa05f7f03 Merge "Make system.clients table virtual" from Pavel Emelyanov
"
The table lists connected clients. For this the clients are
stored in real table when they connect, update their statuses
when needed and remove^w tombstone themselves when they
disconnect. On start the whole table is cleared.

This looks weird. Here's another approach (inspired by the
hackathon project) that makes this table a pure virtual one.
The schema is preserved, and so is the data returned.

The benefits of doing it virtual are

- no on-disk updates while processing clients
- no potentially failing updates on non-failing disconnect
- less usage of the global qctx thing
- fewer calls to global storage_proxy
- simpler support for thrift and alternator clients (today's
  table implementation doesn't track them)
- the need to make virtual tables reg/unreg dynamic

branch: https://github.com/xemul/scylla/tree/br-clients-virtual-table-4
tests: manual(dev), unit(dev)

The manual test used 80-shards node and 1M connections from
1k different IP addresses.
"

* 'br-clients-virtual-table-4' of https://github.com/xemul/scylla:
  test: Add cql-pytest sanity test for system.clients table
  client_data: Sanitize connection_notifier
  transport: Indentation fix after previous patch
  code: Remove old on-disk version of system.clients table
  system_keyspace: Add clients_v virtual table
  protocol_server: Add get_client_data call
  transport: Track client state for real
  transport: Add stringifiers to client_data class
  generic_server: Gentle iterator
  generic_server: Type alias
  docs: Add system.clients description
2022-02-22 20:58:25 +03:00
Piotr Dulikowski
091b20019b result_combinators: add result_wrap_unpack
Adds a helper combinator utils::result_wrap_unpack which, in contrast to
utils::result_wrap, uses futurize_apply instead of futurize_invoke to
call the wrapped callable.

In short, if utils::result_wrap is used to adapt code like this:

    f.then([] {})
      ->
    f_result.then(utils::result_wrap([] {}))

Then utils::result_wrap_unpack works for the following case:

    f.then_unpack([] (arg1, arg2) {})
      ->
    f_result.then(utils::result_wrap_unpack([] (arg1, arg2) {}))
2022-02-22 16:25:21 +01:00
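The difference between the two calling conventions in the result_wrap_unpack commit above can be illustrated without Seastar. The sketch below uses std::invoke and std::apply as stand-ins for futurize_invoke and futurize_apply; the names call_invoke and call_apply are hypothetical, and the real helpers additionally wrap the call in futures and results.

```cpp
#include <functional>
#include <tuple>
#include <utility>

// Stand-in for the futurize_invoke style: arguments are passed one by one.
template <typename Func, typename... Args>
auto call_invoke(Func&& f, Args&&... args) {
    return std::invoke(std::forward<Func>(f), std::forward<Args>(args)...);
}

// Stand-in for the futurize_apply style: arguments arrive packed in a tuple
// and are unpacked before the callable is called, which is what a
// then_unpack-like continuation needs.
template <typename Func, typename Tuple>
auto call_apply(Func&& f, Tuple&& args) {
    return std::apply(std::forward<Func>(f), std::forward<Tuple>(args));
}
```

With a two-argument callable, call_invoke(f, 1, 2) and call_apply(f, std::make_tuple(1, 2)) reach the same call; only the shape of the incoming arguments differs.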
Piotr Dulikowski
7afea88dfc result_loop: add result_repeat
Adds a result-aware counterpart to seastar::repeat. The new function
is not based on seastar::repeat, but rather is a rewrite of the
original (using a coroutine instead of an open-coded task). The main
consequence of using a coroutine is that exceptions from AsyncAction
need to be thrown once more.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
32cbc89779 result_loop: add result_do_until
Adds a result-aware counterpart to seastar::do_until. The new function
is not based on seastar::do_until, but rather is a rewrite of the
original (using a coroutine instead of an open-coded task). The main
consequence of using a coroutine is that exceptions from StopCondition
or AsyncAction need to be thrown once more.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
4f0a98a829 result_loop: add result_map_reduce
Adds result-aware counterparts to all seastar::map_reduce overloads.

Fortunately, it was possible to implement the functions by basing them
on seastar::map_reduce and get the same number of allocations. The only
exception happens when reducer::get() returns a non-ready future, which
doesn't seem to happen on the read coordinator path.
2022-02-22 16:08:52 +01:00
Piotr Grabowski
efe7456f0a type_json: support integers in scientific format
Add support for specifying integers in scientific format (for example
1.234e8) in INSERT JSON statement:

INSERT INTO table JSON '{"int_column": 1e7}';

Inserting a floating-point number ending with .0 is allowed, as
the fractional part is zero. Non-zero fractional part (for example
12.34) is disallowed. A new test is added to test all those behaviors.

Before the JSON parsing library was switched to RapidJSON from JsonCpp,
this statement used to work correctly, because JsonCpp transparently
casts double to integer value.

This behavior differs from Cassandra, which disallows those types of
numbers (1e7, 123.0 and 12.34).

Fix typo in if condition: "if (value.GetUint64())" to
"if (value.IsUint64())".

Fixes #10100
2022-02-22 12:55:38 +01:00
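The rule described in the type_json commit above (accept 1e7 and 123.0 for an integer column, reject 12.34) can be sketched as follows. This is a minimal illustration, not Scylla's code: the helper name json_double_to_int32 is hypothetical, and it assumes the JSON number has already been parsed into a double.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: converts a JSON number held as a double to int32_t.
// Returns std::nullopt when the value has a non-zero fractional part,
// is not finite, or does not fit in the target type.
std::optional<int32_t> json_double_to_int32(double d) {
    if (!std::isfinite(d)) {
        return std::nullopt;
    }
    double int_part = 0.0;
    if (std::modf(d, &int_part) != 0.0) {
        return std::nullopt; // e.g. 12.34 has a non-zero fractional part
    }
    // Every int32_t value is exactly representable as a double,
    // so these bounds comparisons are exact.
    if (int_part < static_cast<double>(std::numeric_limits<int32_t>::min()) ||
        int_part > static_cast<double>(std::numeric_limits<int32_t>::max())) {
        return std::nullopt;
    }
    return static_cast<int32_t>(int_part);
}
```

A caller would turn the std::nullopt case into an error such as marshal_exception; the sketch leaves that choice out.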
Piotr Grabowski
649ab70936 type_json: add missing IsString() checks
Add missing IsString() checks to parsing date, time, uuid and inet
types by introducing validated_to_string_view function which checks
whether the value is of string type and otherwise throws 
marshal_exception. 

Without this check, a malformed input to those types would result in a
nasty ServerError with a RapidJSON assertion instead of a marshal_exception
with detail about the problem.

Add new tests checking passing non-string values for those types.

Fixes #10115
2022-02-21 16:58:13 +01:00
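The idea behind validated_to_string_view in the commit above can be sketched in isolation. Everything here is an assumption made for illustration: json_value is a std::variant stand-in for a RapidJSON value, marshal_exception is a local stand-in type, and the signature and error text are not taken from Scylla's sources.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical stand-in for a parsed JSON value (the real code uses RapidJSON).
using json_value = std::variant<std::monostate, bool, double, std::string>;

// Local stand-in for Scylla's marshal_exception.
struct marshal_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Returns a view of the string held by `v`, or throws marshal_exception when
// `v` is not a string, instead of tripping an assertion inside the JSON
// library. The returned view is valid only while `v` is alive.
std::string_view validated_to_string_view(const json_value& v, const std::string& type_name) {
    if (const auto* s = std::get_if<std::string>(&v)) {
        return *s;
    }
    throw marshal_exception("Expected a string for type " + type_name);
}
```

Parsers for date, time, uuid and inet values would call such a helper first and only then interpret the returned text.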
Piotr Grabowski
f8b67c9bd1 type_json: fix wrong blob JSON validation
Fixes wrong condition for validating whether a JSON string representing
blob value is valid. Previously, strings such as "6" or "0392fa" would
pass the validation, even though they are too short or don't start with
"0x". Add those test cases to json_cql_query_test.cc.

Fixes #10114
2022-02-21 16:58:12 +01:00
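A minimal sketch of the kind of blob-string check the commit above describes. The function name is_valid_blob_string is hypothetical, and the requirement of an even number of hex digits after "0x" is an assumption of this sketch rather than something stated in the commit message.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical validator: a blob literal must start with "0x" and be followed
// by an even number of hexadecimal digits (assumption: two digits per byte).
bool is_valid_blob_string(std::string_view s) {
    if (s.size() < 2 || s[0] != '0' || s[1] != 'x') {
        return false; // too short, or missing the "0x" prefix
    }
    std::string_view digits = s.substr(2);
    if (digits.size() % 2 != 0) {
        return false;
    }
    for (unsigned char c : digits) {
        if (!std::isxdigit(c)) {
            return false;
        }
    }
    return true;
}
```

Under these rules both "6" and "0392fa" from the commit message are rejected, while "0x0392fa" is accepted.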
Nadav Har'El
7181a6757a test/cql-pytest: add a couple of tests for static columns
This patch adds two tests for two interesting edge cases in the behavior
of static columns in Scylla. We already have a lot of tests for static
columns in other frameworks (C++ unit tests, cql and dtest), but the two
cases here are issues where specifically we weren't sure how Cassandra
behaves in those cases - and this can most easily be checked in the
test/cql-pytest framework.

The first test, test_static_not_selected, is a reproducer for issue #10091.
This issue was reported by a user @aohotnik, who was surprised by the
fact that Scylla returns empty values, instead of nothing, when selecting
regular columns of a non-existent row if the partition has a static
column set. The test demonstrates a difference between Scylla and
Cassandra, so it is marked "xfail" - it passes on Cassandra and fails on
Scylla. If later we decide that both Scylla's and Cassandra's behaviours
are reasonable and both can be considered "correct", we can change this
test to accept Scylla's result as well and it will begin to pass.

The second test, test_missing_row_with_static, shows that SELECT of a
non-existent row returns nothing - even if the partition has a static
column. The behavior in this case is identical in Scylla and Cassandra,
so this test passes. This contrasts with the analogous situation in LWT
UPDATE from issue #10081, where the IF condition is expected to see the
static column value.

Refs #10081
Refs #10091

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220220120418.831540-1-nyh@scylladb.com>
2022-02-21 16:04:57 +02:00
Avi Kivity
adc08d0ab9 Merge "Drop v1 input support for mutation compactor" from Botond
"
Currently the mutation compactor supports v1 and v2 input and has a v1
output. The next step is to add a v2 output but this would lead to a
full conversion matrix which we want to avoid. So in preparation we drop
the v1 input support. Most inputs were already v2, but there were some
notable exceptions: tests, the compacting reader and the multishard
query code. The former two were simple mechanical updates, but the latter
required some further work because it turned out the v2 version of
evictable reader wasn't used yet and thus it managed to hide some bugs
and dropped features. While at it, we migrate all evictable and
multishard reader users to the v2 variant of the respective readers and
drop the v1 variant completely.
With this the road is open to a v2 compactor output and therefore to a
v2 sstable writer.

Tests: unit(dev, release), dtest(paging_additional_test.py)
"

* 'compact-mutation-v2-only-input/v5' of https://github.com/denesb/scylla:
  test/lib/test_utils: return OK from check() variants
  repair/row_level: use evictable reader v2
  db/view/view_updating_consumer: migrate to v2
  test/boost/mutation_reader_test: add v2 specific evictable reader tests
  test: migrate to evictable reader v2 and multishard combining reader v2
  compact_mutation: drop support for v1 input
  test: pass v2 input to mutation_compaction
  test/boost/mutation_test: simplify test_compaction_data_stream_split test
  mutation_partition: do_compact(): do drop row tombstones covered by higher order tombstones
  multishard_mutation_query: migrate to v2
  mutation_fragment_v2: range_tombstone_change: add memory_usage()
  evictable_reader_v2: terminate active range tombstones on reader recreation
  evictable_reader_v2: restore handling of non-monotonically increasing positions
  evictable_reader_v2: simplify handling of reader recreation
  mutation: counter_write_query: use v2 reader
  mutation: migrate consume() to v2
  mutation_fragment_v2,flat_mutation_reader_v2: mirror v1 concept organization
  mutation_reader: compacting_reader: require a v2 input reader
  db/view/view_builder: use v2 reader
  test/lib/flat_mutation_reader_assertions: adjust has_monotonic_positions() to v2 spec
2022-02-21 14:32:55 +02:00
Botond Dénes
841b982e51 test/lib/test_utils: return OK from check() variants
The various require() and check() methods in test_utils.hh were
introduced to replace BOOST_REQUIRE() and BOOST_CHECK() respectively in
multi-shard concurrent tests, specifically those in
tests/boost/multishard_mutation_query_test.cc.
This was done literally, just replacing BOOST_REQUIRE() with require()
and BOOST_CHECK() with check(). The problem is that check() is missing a
feature BOOST_CHECK() had: while BOOST_CHECK() doesn't cause an
immediate test failure, just logging an error if the condition fails, it
remembers this failure and will fail the test in the end. check() did
not have this feature and this caused potential errors to just be logged
while the test could still pass fine, causing false-positive test
passes. This patch fixes this by returning a [[nodiscard]] bool from the
check() methods. The caller can & these together over all calls to
check() methods and manually fail the test in the end. We choose this
method over a hidden global (like BOOST_CHECK() does) for simplicity's
sake.
2022-02-21 12:29:25 +02:00
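The pattern described in the test_utils commit above, a check() that logs and returns its result so the caller can combine results with &, might look roughly like this. The signature of check() and the run_checks() wrapper are hypothetical simplifications, not the contents of test_utils.hh.

```cpp
#include <iostream>
#include <string_view>

// Hypothetical simplified check(): logs a failure but does not abort.
// [[nodiscard]] makes the compiler warn if the caller ignores the result.
[[nodiscard]] bool check(bool condition, std::string_view what) {
    if (!condition) {
        std::cerr << "check failed: " << what << '\n';
    }
    return condition;
}

// Combines the results of several checks. Bitwise &= does not short-circuit,
// so every check runs and logs even after an earlier one has failed.
bool run_checks(int value) {
    bool ok = true;
    ok &= check(value > 0, "value > 0");
    ok &= check(value % 2 == 0, "value is even");
    return ok;
}
```

The test would then fail once at the end if the combined result is false, for example by passing it to require().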
Botond Dénes
05c48ee0cc db/view/view_updating_consumer: migrate to v2
Not a completely mechanical transition. The consumer has to generate its
mutation via a mutation_rebuilder_v2 as mutation fragment v2 cannot be
applied to mutations directly yet.
2022-02-21 12:29:24 +02:00
Botond Dénes
014a23bf2a test/boost/mutation_reader_test: add v2 specific evictable reader tests
One is a reincarnation of the recently removed
test_multishard_combining_reader_non_strictly_monotonic_positions. The
latter was actually targeting the evictable reader but through the
multishard reader, probably for historic reasons (evictable reader was
part of the multishard reader family).
The other one checks that active range tombstone changes are properly
terminated when the partition ends abruptly after recreating the reader.
2022-02-21 12:29:24 +02:00
Botond Dénes
e3c618beba test: migrate to evictable reader v2 and multishard combining reader v2
All reads are now using the v2 version of these readers, test them
instead of the old v1.
2022-02-21 12:29:24 +02:00
Botond Dénes
f1e9e3b3b7 compact_mutation: drop support for v1 input 2022-02-21 12:29:24 +02:00
Botond Dénes
284ed9154f test: pass v2 input to mutation_compaction 2022-02-21 12:29:24 +02:00
Botond Dénes
dec4e5659b test/boost/mutation_test: simplify test_compaction_data_stream_split test
This test has very elaborate infrastructure essentially duplicating
mutation, mutation::apply() and mutation::operator==. Drop all this
extra code and use mutations directly instead. This makes migrating the
test to v2 easier.
2022-02-21 12:29:24 +02:00