Commit Graph

30290 Commits

Author SHA1 Message Date
Jan Ciolek
a5bcd4f7f2 cql3: expr: Add subscript struct
Add a struct called subscript, which will be used in expression
variant to represent subscripted values e.g col[x], val[sub].
It will replace the sub field of column_value.
Having a separate struct in AST for this purpose
is cleaner and allows to express subscripting
values other than column_value.

It is not added to the expression variant yet, because
that would require immediately implementing all visitors.

The following commits will implement individual visitors
and then subscript will finally be added to expression.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 01:32:59 +01:00
Tomasz Grabiec
7719f4cd91 Merge "Group 0 discovery: persist and restore peers" from Kamil
We add a `peers()` method to `discovery` which returns the peers
discovered until now (including seeds). The caller of functions which
return an output -- `tick` or `request` -- is responsible for persisting
`peers()` before returning the output of `tick`/`request` (e.g. before
sending the response produced by `request` back). The user of
`discovery` is also responsible for restoring previously persisted peers
when constructing `discovery` again after a restart (e.g. if we
previously crashed in the middle of the algorithm).

The `persistent_discovery` class is a wrapper around `discovery` which
does exactly that.

For storage we use a simple local table.

A simple bugfix is also included in the first patch.

* kbr/discovery-persist-v3:
  service: raft: raft_group0: persist discovered peers and restore on restart
  db: system_keyspace: introduce discovery table
  service: raft: discovery: rename `get_output` to `tick`
  service: raft: discovery: stop returning peer_list from `request` after becoming leader
2022-02-25 17:23:08 +01:00
Avi Kivity
ff2cd72766 Merge 'utils: cached_file: Fix alloc-dealloc mismatch during eviction' from Tomasz Grabiec
cached_page::on_evicted() is invoked in the LSA allocator context, set in the
reclaimer callback installed by the cache_tracker. However,
cached_pages are allocated in the standard allocator context (note:
page content is allocated inside LSA via lsa_buffer). The LSA region
will happily deallocate these, thinking that they these are large
objects which were delegated to the standard allocator. But the
_non_lsa_memory_in_use metric will underflow. When it underflows
enough, shard_segment_pool.total_memory() will become 0 and memory
reclamation will stop doing anything, leading to apparent OOM.

The fix is to switch to the standard allocator context inside
cached_page::on_evicted(). evict_range() was also given the same
treatment as a precaution, it currently is only invoked in the
standard allocator context.

The series also adds two safety checks to LSA to catch such problems earlier.

Fixes #10056

\cc @slivne @bhalevy

Closes #10130

* github.com:scylladb/scylla:
  lsa: Abort when trying to free a standard allocator object not allocated through the region
  lsa: Abort when _non_lsa_memory_in_use goes negative
  tests: utils: cached_file: Validate occupancy after eviction
  test: sstable_partition_index_cache_test: Fix alloc-dealloc mismatch
  utils: cached_file: Fix alloc-dealloc mismatch during eviction
2022-02-25 18:19:04 +02:00
Botond Dénes
d8833de3bb Merge "Redefine Compaction Backlog to tame compaction aggressiveness" From Raphael S. Carvalho
"
Problem statement
=================
Today, compaction can act much more aggressive than it really has to, because
the strategy and its definition of backlog are completely decoupled.

The backlog definition for size-tiered, which is inherited by all
strategies (e.g.: LCS L0, TWCS' windows), is built on the assumption that the
world must reach the state of zero amplification. But that's unrealistic and
goes against the intent amplification defined by the compaction strategy.
For example, size tiered is a write oriented strategy which allows for extra
space amplification for compaction to keep up with the high write rate.

It can be seen today, in many deployments, that compaction shares is either
close to 1000, or even stuck at 1000, even though there's nothing to be done,
i.e. the compaction strategy is completely satisfied.
When there's a single sstable per tier, for example.
This means that whenever a new compaction job kicks in, it will act much more
aggressive because of the high shares, caused by false backlog of the existing
tables. This translates into higher P99 latencies and reduced throughput.

Solution
========
This problem can be fixed, as proposed in the document "Fixing compaction
aggressiveness due to suboptimal definition of zero backlog by controller" [1],
by removing backlog of tiers that don't have to be compacted now, like a tier
that has a single file. That's about coupling the strategy goal with the
backlog definition. So once strategy becomes satisfied, so will the controller.

Low-efficiency compaction, like compacting 2 files only or cross-tier, only
happens when system is under little load and can proceed at a slower pace.
Once efficient jobs show up, ongoing compactions, even if inefficient, will get
more shares (as efficient jobs add to the backlog) so compaction won't fall
behind.

With this approach, throughput and latency is improved as cpu time is no longer
stolen (unnecessarily) from the foreground requests.

[1]: https://docs.google.com/document/d/1EQnXXGWg6z7VAwI4u8AaUX1vFduClaf6WOMt2wem5oQ

Results
=======
Test sequentially populates 3 tables and then run a mixed workload on them,
where disk:memory ratio (usage) reaches ~30:1 at the peak.

Please find graphs here:
https://user-images.githubusercontent.com/1409139/153687219-32368a35-ac63-461b-a362-64dbe8449a00.png

1) Patched version started at ~01:30
2) On population phase, throughput increase and lower P99 write latency can be
clearly observed.
3) On mixed phase, throughput increase and lower P99 write and read latency can
also be clearly observed.
4) Compaction CPU time sometimes reach ~100% because of the delay between each
loader.
5) On unpatched version, it can be seen that backlog keeps growing even when
though strategies become satisfied, so compaction is using much more CPU time
in comparison. Patched version correctly clears the backlog.

Can also be found at:
github.com/raphaelsc/scylla.git compaction-controller-v5

tests: UNIT(dev, debug).
"

* 'compaction-controller-v5' of https://github.com/raphaelsc/scylla:
  tests: Add compaction controller test
  test/lib/sstable_utils: Set bytes_on_disk for fake SSTables
  compaction/size_tiered_backlog_tracker.hh: Use unsigned type for inflight component
  compaction: Redefine compaction backlog to tame compaction aggressiveness
  compaction_backlog_tracker: Batch changes through a new replacement interface
  table: Disable backlog tracker when stopping table
  compaction_backlog_tracker: make disable() public
  compaction_backlog_tracker: Clear tracker state when disabled
  compaction: Add normalized backlog metric
  compaction: make size_tiered_compaction_strategy static
2022-02-25 09:21:08 +02:00
Pavel Emelyanov
40078a6f8c types.hh: Nitpick on <=> usage
tri_compare_opt can avoid casting bool to int for spaceshipping
int - int <=> 0 looks nicer and shorter as int <=> int
data_type::compare from serialized_tri_compare already returns strong_ordering

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220224125556.13138-1-xemul@scylladb.com>
2022-02-25 07:26:11 +02:00
Nadav Har'El
c26230943b alternator ttl: add metrics
This patch adds metrics to the Alternator TTL feature (aka the "expiration
service").

I put these metrics deliberately in their own object in ttl.{hh,cc}, and
also with their own prefix ("expiration_*") - and *not* together with the
rest of the Alternator metrics (alternator/stats.{hh,cc}). This is
because later we may want to use the expiration service not only in
Alternator but also in CQL - to support per-item expiration with CDC
events also in CQL. So the implementation of this feature should not be
too tangled with that of Alternator.

The patch currently adds four metrics, and opens the path to easily add
more in the future. The metrics added now are:

1. scylla_expiration_scan_passes:  The number of scan passes over the
       entire table. We expect this to grow by 1 every
       alternator_ttl_period_in_seconds seconds.

2. scylla_expiration_scan_table: The number of table scans. In each scan
       pass, we scan all the tables that have the Alternator TTL feature
       enabled. Each scan of each table is counted by this counter.

3. scylla_expiration_items_deleted: Counts the number of items that
       the expiration service expired (deleted). Please remember that
       each item is considered for expiration - and then expired - on
       only one node, so each expired item is counted only once - not
       RF times.

4. scylla_expiration_secondary_ranges_scanned: If this counter is
       incremented, it means this node took over some other node's
       expiration scanning duties while the other node was down.

This patch also includes a couple of unrelated comment fixes.

I tested the new metrics manually - they aren't yet tested by the
Alternator test suite because I couldn't make up my mind if such
tests would belong in test_ttl.py or test_metrics.py :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220224092419.1132655-1-nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Asias He
ec59f7a079 repair: Do not flush hints and batchlog if tombstone_gc_mode is not repair
The flush of hints and batchlog are needed only for the table with
tombstone_gc_mode set to repair mode. We should skip the flush if the
tombstone_gc_mode is not repair mode.

Fixes #10004

Closes #10124
2022-02-25 07:26:11 +02:00
Nadav Har'El
d1b4cbfbc3 test/cql-pytest: add reproducer for LWT bug with static-column conditions
This patch adds a reproducing test for issue #10081. That issue is about
a conditional (LWT) UPDATE operation that chose a non-existent row via WHERE,
and its condition refers to both static and regular columns: In that case,
the code incorrectly assumes that because it didn't read any row, all columns
are null - and forgets that the static column is *not* null.

The test, test_lwt.py::test_lwt_missing_row_with_static
passes on Cassandra but fails on Scylla, so is marked xfail.

Refs #10081

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220215215243.660087-1-nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Avi Kivity
8f2bc838af Update seastar submodule
* seastar ea6a6820ed...2849a8a8ba (1):
  > Merge "make semaphore and shared_promise abortable" from Gleb

include fixup from Gleb added.
2022-02-25 07:26:11 +02:00
Benny Halevy
e2894bc762 compaction_manager: task: use plain UUID
Now that a null uuid is defined to be logically false
there's no need to use an optional UUID.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-25 07:26:11 +02:00
Nadav Har'El
db7b11cfc4 alternator: make TTL expiration scanner bypass cache
The background scan for expired Alternator items (the TTL feature)
should bypass the cache to avoid poluting it with the entire content
of the table being scanned.

I tested that the flag added in this patch really works by adding a printout
to the code in table.cc which creates the reader. Although we do have a
metric for uses of BYPASS CACHE, unfortunately this metric counts usage
of "BYPASS CACHE" in CQL statements - and not does not account the low-
level calls that we use in the ttl scanner.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Nadav Har'El
e06b5d9306 alternator: updated compatibility.md about TTL feature
The document docs/alternator/compatibility.md suggested that Alternator
does not support the TTL feature at all. The real situation is more
optimistic - this feature is supported, but as experimental feature.
So let's update compatibility.md with the real status of this feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Nadav Har'El
49a8164fb7 alternator: add configurable scan period to TTL expiration
Before this patch, the experimental TTL (expiration time) feature in
Alternator scans tables for expiration in a tight loop - starting the
next scan one second after the previous one completed.

In this patch we introduce a new configuration option,
alternator_ttl_period_in_seconds, which determines how frequently
to start the scan. The default is 24 hours - meaning that the next
scan is started 24 hours after the previous one started.

The tests (test/alternator/run) change this configuration back to one
second, so that expiration tests finish as quickly as possible.

Please note that the scan is *not* slowed down to fill this 24 hours -
if it finishes in one hour, it will then sleep for 23 hours. Additional
work would be needed to slow down the scan to not finish too quickly.
One idea not yet implemented is to move the expiration service from
the "maintenance" scheduling group which it uses today to a new
scheduling group, and modifying the number of shares that this group
gets.

Another thing worth noting about the configurable period (which defaults
to 24 hours) is that when TTL is enabled on an Alternator table, it can
take that amount of time until its scan starts and items start expiring
from it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Tomasz Grabiec
1d75a8c843 lsa: Abort when trying to free a standard allocator object not
allocated through the region

It indicates alloc-dealloc mismatch, and can cause other problems in
the systems like unable to reclaim memory. We want to catch this at
the deallocation site to be able to quickly indentify the offender.

Misbehavior of this sort can cause fake OOMs due to underflow of
_non_lsa_memory_in_use. When it underflows enough,
shard_segment_pool.total_memory() will become 0 and memory reclamation
will stop doing anything.

Refs #10056
2022-02-25 01:42:15 +01:00
Tomasz Grabiec
9dd4153c16 lsa: Abort when _non_lsa_memory_in_use goes negative
It indicates alloc-dealloc mismatch, and can cause other problems in
the systems like unable to reclaim memory. Catch early.

Refs #10056
2022-02-25 01:42:15 +01:00
Tomasz Grabiec
ca09a72597 tests: utils: cached_file: Validate occupancy after eviction
Reproducer for #10056

Catches alloc-dealloc mismatch leading to the underflow of
_non_lsa_memory_in_use.
2022-02-25 01:42:15 +01:00
Tomasz Grabiec
b0d5bb334c test: sstable_partition_index_cache_test: Fix alloc-dealloc mismatch
The test was allocating entries in the standard allocator, but they
are evicted in the LSA allocator context.

Fix by allocating under LSA.
2022-02-25 01:42:15 +01:00
Raphael S. Carvalho
2a7939ee4d tests: Add compaction controller test
There's no automated test for controller, it's time to have one.
Let's start with a basic one that verifies the assumption that
perfectly compacted tiers should produce 0 backlog.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 18:57:45 -03:00
Raphael S. Carvalho
96cfe7d530 test/lib/sstable_utils: Set bytes_on_disk for fake SSTables
Not precise, as bytes_on_disk accounts for all components, but good enough
for testing purposes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 18:57:45 -03:00
Raphael S. Carvalho
a8caa67937 compaction/size_tiered_backlog_tracker.hh: Use unsigned type for inflight component
For describing data size, we use unsigned types.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 18:57:45 -03:00
Raphael S. Carvalho
1d9f53c881 compaction: Redefine compaction backlog to tame compaction aggressiveness
Today, compaction can act much more aggressive than it really has to, because
the strategy and its definition of backlog are completely decoupled.

The backlog definition for size-tiered, which is inherited by all
strategies (e.g.: LCS L0, TWCS' windows), is built on the assumption that the
world must reach the state of zero amplification. But that's unrealistic and
goes against the intent amplification defined by the compaction strategy.
For example, size tiered is a write oriented strategy which allows for extra
space amplification for compaction to keep up with the high write rate.

It can be seen today, in many deployments, that compaction shares is either
close to 1000, or even stuck at 1000, even though there's nothing to be done,
i.e. the compaction strategy is completely satisfied.
When there's a single sstable per tier, for example.
This means that whenever a new compaction job kicks in, it will act much more
aggressive because of the high shares, caused by false backlog of the existing
tables. This translates into higher P99 latencies and reduced throughput.

Solution
========
This problem can be fixed, as proposed in the document "Fixing compaction
aggressiveness due to suboptimal definition of zero backlog by controller" [1],
by removing backlog of tiers that don't have to be compacted now, like a tier
that has a single file. That's about coupling the strategy goal with the
backlog definition. So once strategy becomes satisfied, so will the controller.

Low-efficiency compaction, like compacting 2 files only or cross-tier, only
happens when system is under little load and can proceed at a slower pace.
Once efficient jobs show up, ongoing compactions, even if inefficient, will get
more shares (as efficient jobs add to the backlog) so compaction won't fall
behind.

With this approach, throughput and latency is improved as cpu time is no longer
stolen (unnecessarily) from the foreground requests.

[1]: https://docs.google.com/document/d/1EQnXXGWg6z7VAwI4u8AaUX1vFduClaf6WOMt2wem5oQ

Fixes #4588.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 18:57:38 -03:00
Raphael S. Carvalho
ddd693c6d7 compaction_backlog_tracker: Batch changes through a new replacement interface
This new interface allows table to communicate multiple changes in the
SSTable set with a single call, which is useful on compaction completion
for example.
With this new interface, the size tiered backlog tracker will be able to
know when compaction completed, which will allow it to recompute tiers
and their backlog contribution, if any. Without it, tiered tracker
would have to recompute tiers for every change, which would be terribly
expensive.
The old remove/add interface are being removed in favor of the new one.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 15:34:16 -03:00
Raphael S. Carvalho
84d843697b table: Disable backlog tracker when stopping table
Backlog tracker is managed by compaction strategy, and we'd like to
have it disabled in table::stop(), to make sure that all state is
cleared. For example, a reference to a shared sstable, in the
tracker implementation, could prevent the sstable manager from being
stopped as it relies on all sstables managed by it being closed
first. By calling tracker's disable() method, table::stop() will
guarantee that state is cleared by completion.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 13:41:05 -03:00
Raphael S. Carvalho
26350c8591 compaction_backlog_tracker: make disable() public
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 13:40:50 -03:00
Raphael S. Carvalho
c15e055612 compaction_backlog_tracker: Clear tracker state when disabled
If the tracker is disabled, we never get to access the underlying
implementation anymore. It makes sense to clear _impl on
disable(). So table::stop() can call its backlog tracker's disable
method, clearing all its state. This is important for clean
shutdown, as any sstable in tracker state may cause sstable
manager to hang when being stopped.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 13:40:39 -03:00
Raphael S. Carvalho
a70ce7ecb3 compaction: Add normalized backlog metric
Normalized backlog metric is important for understanding the controller
behavior as the controller acts on normalized backlog for yielding an
output, not the raw backlog value in bytes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 13:40:33 -03:00
Raphael S. Carvalho
89eb563c94 compaction: make size_tiered_compaction_strategy static
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-02-24 13:40:29 -03:00
Tomasz Grabiec
e68cf55514 utils: cached_file: Fix alloc-dealloc mismatch during eviction
on_evicted() is invoked in the LSA allocator context, set in the
reclaimer callback instaled by the cache_tracker. However,
cached_pages are allocated in the standard allocator context (note:
page content is allocated inside LSA via lsa_buffer). The LSA region
will happilly deallocate these, thinking that they these are large
objects which were delegated to the standard allocator. But the
_non_lsa_memory_in_use metric will underflow. When it underflows
enough, shard_segment_pool.total_memory() will become 0 and memory
reclamation will stop doing anything, leading to apparent OOM.

The fix is to switch to the standard allocator context inside
cached_page::on_evicted(). evict_range() was also given the same
treatment as a precaution, it currently is only invoked in the
standard allocator context.

Fixes #10056
2022-02-23 18:38:05 +01:00
Asias He
680195564d repair: Unify repair uuid report in the log
More and more places are using the repair[uuid]: format for logging
repair jobs with the uuid. Convert more places to use the new format to
unify the log format.

This makes it easier to grep a specific repair job in the log.

Closes #10125
2022-02-23 09:13:12 +02:00
Avi Kivity
cbba80914d memtable: move to replica module and namespace
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.

Memtables are also used outside the replica, in two places:
 - in some virtual tables; this is also in some way inside the replica,
   (virtual readers are installed at the replica level, not the
   cooordinator), so I don't consider it a layering violation
 - in many sstable unit tests, as a convenient way to create sstables
   with known input. This is a layering violation.

We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).

Test: unit (dev)

Closes #10120
2022-02-23 09:05:16 +02:00
Avi Kivity
5d4213e1b8 Update seastar submodule
* seastar c18cc5dc68...ea6a6820ed (7):
  > Merge 'json/formatter: Escape strings' from Juliusz Stasiewicz
Fixes #9061.
  > Merge "Export IO rate-limiter tokens metrics" from Pavel E
  > Merge "Fix block device configuration and RWF_NOWAIT support" from Pavel E
  > code: Sanitize fs magic values usage
  > Fix build error with c++17
  > coroutine: introduce coroutine::switch_to
  > Broader catching of bpo command-line parsing errors in app-template
2022-02-22 20:58:25 +03:00
Avi Kivity
75fb45df1b Merge 'Propagate CQL coordinator timeouts and failures for reads' from Piotr Dulikowski
This PR propagates the read coordinator logic so that read timeout and read failure exceptions are propagated without throwing on the coordinator side.

This PR is only concerned with exceptions which were originally thrown by the coordinator (in read resolvers). Exceptions propagated through RPC and RPC timeouts will still throw, although those exceptions will be caught and converted into exceptions-as-values by read resolvers.

This is a continuation of work started in #10014.

Results of `perf_simple_query --smp 1 --operations-per-shard 1000000` (read workload), compared with merge base (10880fb0a7):

```
BEFORE:
125085.13 tps ( 80.2 allocs/op,  12.2 tasks/op,   49010 insns/op)
125645.88 tps ( 80.2 allocs/op,  12.2 tasks/op,   49008 insns/op)
126148.85 tps ( 80.2 allocs/op,  12.2 tasks/op,   49005 insns/op)
126044.40 tps ( 80.2 allocs/op,  12.2 tasks/op,   49005 insns/op)
125799.75 tps ( 80.2 allocs/op,  12.2 tasks/op,   49003 insns/op)

AFTER:
127557.21 tps ( 80.2 allocs/op,  12.2 tasks/op,   49197 insns/op)
127835.98 tps ( 80.2 allocs/op,  12.2 tasks/op,   49198 insns/op)
127749.81 tps ( 80.2 allocs/op,  12.2 tasks/op,   49202 insns/op)
128941.17 tps ( 80.2 allocs/op,  12.2 tasks/op,   49192 insns/op)
129276.15 tps ( 80.2 allocs/op,  12.2 tasks/op,   49182 insns/op)
```

The PR does not introduce additional allocations on the read happy-path. The number of instructions used grows by about 200 insns/op. The increase in TPS is probably just a measurement error.

Closes #10092

* github.com:scylladb/scylla:
  indexed_table_select_statement: return some exceptions as exception messages
  result_combinators: add result_wrap_unpack
  select_statement: return exceptions as errors in execute_without_checking_exception_message
  select_statement: return exceptions without throwing in do_execute
  select_statement: implement execute_without_checking_exception_message
  select_statement: introduce helpers for working with failed results
  query_pager: resultify relevant methods
  storage_proxy: resultify (do_)query
  storage_proxy: resultify query_singular
  storage_proxy: propagate failed results through query_partition_key_range
  storage_proxy: resultify query_partition_key_range_concurrent
  storage_proxy: modify handle_read_error to also handle exception containers
  abstract_read_executor: return result from execute()
  abstract_read_executor: return and handle result from has_cl()
  storage_proxy: resultify handling errors from read-repair
  abstract_read_executor::reconcile: resultise handling of data_resolver->done()
  abstract_read_executor::execute: resultify handling of data_resolver->done()
  result_combinators: add result_discard_value
  abstract_read_executor: resultify _result_promise
  abstract_read_executor: return result from done()
  abstract_read_resolver: fail promises by passing exception as value
  abstract_read_resolver: resultify promises
  exceptions: make it possible to return read_{timeout,failure}_exception as value
  result_try: add as_inner/clone_inner to handle types
  result_try: relax ConvertWithTo constraint
  exception_container: switch impl to std::shared_ptr and make copyable
  result_loop: add result_repeat
  result_loop: add result_do_until
  result_loop: add result_map_reduce
  utils/result: add utilities for checking/creating rebindable results
2022-02-22 20:58:25 +03:00
Nadav Har'El
eec39e1258 Merge 'api: keyspace_scrub: validate params' from Benny Halevy
Refs #10087

Add validation of all params for the keyspace_scrub api.
The validation method is generic and should be used by all apis eventually,
but I'm leaving that as follow-up work.

While at it, fixed the exception types thrown on invalid `scrub_mode` or `quarantine_mode` values from `std::runtime_error` to `httpd::bad_param_exception` so to generate the `bad_request` http status.

And added unit tests to verify that, and the handling of an unknown parameter.

Test: unit(dev)
DTest: nodetool_additional_test.py::TestNodetool::{test_scrub_with_one_node_expect_data_loss,test_scrub_with_multi_nodes_expect_data_rebuild,test_scrub_sstable_with_invalid_fragment,test_scrub_ks_sstable_with_invalid_fragment,test_scrub_segregate_sstable_with_invalid_fragment,test_scrub_segregate_ks_sstable_with_invalid_fragment}

Closes #10090

* github.com:scylladb/scylla:
  api: storage_service: scrub: validate parameters
  api: storage_service: refactor parse_tables
  api: storage_service: refactor validate_keyspace
  test: rest_api: add test_storage_service_keyspace_scrub tests
  api: storage_service: scrub: throw httpd::bad_param_exception for invalid param values
2022-02-22 20:58:25 +03:00
Nadav Har'El
364bd00136 test/cql-pytest: confirm that table names cannot include non-Latin letters
In CQL table names must be composed only of letters, digits, or underscores,
but some Cassandra documentation is unclear whether these "letters" refer only
to the Latin alphabet, or maybe UTF-8 names composed of letters in other
alphabets should be allowed too.

This patch adds a test that confirms that both Scylla and Cassandra only
accept the Latin alphabet in table names, and for example UTF-8 names
with French or Hebrew letters are rejected.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220222134220.972413-1-nyh@scylladb.com>
2022-02-22 20:58:25 +03:00
Nadav Har'El
1a940a1003 test/cql-pytest: remove "xfail" mark from scientific-notation tests that now pass
After issue #10100 was fixed, the two tests reproducing it now pass,
so remove their "xfail" marker.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220222131809.970592-1-nyh@scylladb.com>
2022-02-22 20:58:25 +03:00
Nadav Har'El
be84a8def3 Merge 'Allow integers in scientific format in INSERT JSON ' from Piotr Grabowski
Add support for specifing integers in scientific format (for example 1.234e8) in INSERT JSON statement:

```
INSERT INTO table JSON '{"int_column": 1e7}';
```

Before the JSON parsing library was switched to RapidJSON from JsonCpp, this statement used to work correctly, because JsonCpp transparently casts double to integer value.

Inserting a floating-point number ending with .0 is allowed, as the fractional part is zero. Non-zero fractional part (for example 12.34) is disallowed. A new test is added to test all those behaviors.

This behavior differs from Cassandra, which disallows those types of numbers (1e7, 123.0 and 12.34), however some users rely on that behavior and JSON specification itself does not distinct between floating-point numbers and integer numbers (only a single "number" type is defined).

This PR also fixes two minor issues I noticed while looking at the code: wrong blob validation and missing `IsString()` checks that could result in assertion error.

Fixes #10100
Fixes #10114
Fixes #10115

Closes #10101

* github.com:scylladb/scylla:
  type_json: support integers in scientific format
  type_json: add missing IsString() checks
  type_json: fix wrong blob JSON validation
2022-02-22 20:58:25 +03:00
Botond Dénes
3aa05f7f03 Merge "Make system.clients table virtual" from Pavel Emelyanov
"
The table lists connected clients. For this the clients are
stored in real table when they connect, update their statuses
when needed and remove^w tombstone themselves when they
disconnect. On start the whole table is cleared.

This looks weird. Here's another approach (inspired by the
hackathon project) that makes this table a pure virtual one.
The schema is preserved so is the data returned.

The benefits of doing it virtual are

- no on-disk updates while processing clients
- no potentially failing updates on non-failing disconnect
- less usage of the global qctx thing
- less calls to global storage_proxy
- simpler support for thrift and alternator clients (today's
  table implementation doesn't track them)
- the need to make virtual tables reg/unreg dynamic

branch: https://github.com/xemul/scylla/tree/br-clients-virtual-table-4
tests: manual(dev), unit(dev)

The manual test used 80-shards node and 1M connections from
1k different IP addresses.
"

* 'br-clients-virtual-table-4' of https://github.com/xemul/scylla:
  test: Add cql-pytest sanity test for system.clients table
  client_data: Sanitize connection_notifier
  transport: Indentation fix after previous patch
  code: Remove old on-disk version of system.clients table
  system_keyspace: Add clients_v virtual table
  protocol_server: Add get_client_data call
  transport: Track client state for real
  transport: Add stringifiers to client_data class
  generic_server: Gentle iterator
  generic_server: Type alias
  docs: Add system.clients description
2022-02-22 20:58:25 +03:00
Piotr Dulikowski
ddf049738d indexed_table_select_statement: return some exceptions as exception messages
Adjusts the indexed_table_select_statement so that it uses the
result-aware methods in storage_proxy and propagates failed results as
result_message::exception.
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
091b20019b result_combinators: add result_wrap_unpack
Adds a helper combinator utils::result_wrap_unpack which, in contrast to
utils::result_wrap, uses futurize_apply instead of futurize_invoke to
call the wrapped callable.

In short, if utils::result_wrap is used to adapt code like this:

    f.then([] {})
      ->
    f_result.then(utils::result_wrap([] {}))

Then utils::result_wrap_unpack works for the following case:

    f.then_unpack([] (arg1, arg2) {})
      ->
    f_result.then(utils::result_wrap_unpack([] (arg1, arg2) {}))
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
c5bcfee28f select_statement: return exceptions as errors in execute_without_checking_exception_message
Modifies the remaining logic of execute_without... (apart from the
do_execute call) so that the result-aware versions of storage_proxy's
methods are called and failed results are converted to
result_message::exception.
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
5106c60cd0 select_statement: return exceptions without throwing in do_execute
Modifies do_execute so that it uses the result-aware versions of the
query_pager's methods and returns them as result_message::exception.
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
3a4d3f3175 select_statement: implement execute_without_checking_exception_message
The select_statement will be able to propagate coordinator failures
without throwing, so it's important to override the default
implementations of execute and excecute_without... so that the first
calls the latter and not the other way around.
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
df7668797b select_statement: introduce helpers for working with failed results
Adds:

- Includes for result-related helper methods (to be used in later
  commits),
- Alias for coordinator_result,
- The wrap_result_to_error_message function - a bit similar to
  utils::result_wrap. Adapts a callable T -> shared_ptr<result_message>
  to take result<T> -> shared_ptr<result_message>. If the result is
  failed, it converts it into result_message::exception and returns.
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
c96c8e4813 query_pager: resultify relevant methods
Now, the relevant methods of all query pagers properly propagate failed
results.
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
e5922e650e storage_proxy: resultify (do_)query
Adjusts do_query so that it propagates and returns failed results. The
query_result method is added which is result-aware, and the old query
method was changed to call query_result.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
e39c5b6eba storage_proxy: resultify query_singular
Now, query_singular propagates and returns failed results without
rethrowing them.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
2f5f746ae2 storage_proxy: propagate failed results through query_partition_key_range
Now, query_partition_key_range propagates the failed result from
query_partition_key_range_concurrent.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
608032b2b5 storage_proxy: resultify query_partition_key_range_concurrent
Now, query_partition_key_range_concurrent propagates and returns
exceptions as values, if possible.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
10923d9d58 storage_proxy: modify handle_read_error to also handle exception containers
Now, storage_proxy::handle_read_error can work with both exception
containers and exception_ptrs.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
89fe804a1a abstract_read_executor: return result from execute() 2022-02-22 16:08:52 +01:00