Commit Graph

27585 Commits

Author SHA1 Message Date
Avi Kivity
b07a0867b3 cql3: expression: introduce nested_expression class
The exression type cannot be a member of a struct that is an
element of the expression variant. This is because it would then
be required to contain itself. So introduce a nested_expression
type to indirectly hold an expression, but keep the value semantics
we expect from expressions: it is copyable and a copy has separate
identity and storage.

In fact binary_operator had to resort to this trick, so it's converted
to nested_expression in the next patch.
2021-07-27 20:08:21 +03:00
Avi Kivity
8a518e9c78 Convert column_identifier_raw's use as selectable to expressions
Introduce unresolved_identifer as an unprepared counterpart to column_value.
column_identifier_raw no longer inherits from selectable::raw, but
methods for now to reduce churn.
2021-07-27 20:08:15 +03:00
Avi Kivity
d3e8c05bed make column_identifier::raw forward declarable
Otherwise we run into a #include loop when try to have an expression
with column_identifier::raw: expression.hh -> column_identifier.hh
-> selectable.hh -> expression.hh.
2021-07-27 20:00:48 +03:00
Avi Kivity
0e30a78573 cql3: introduce selectable::with_expression::raw
Prepare to migrate selectable::raw sub-classes to expressions by
creating a bridge betweet the two types. with_expression::raw
is a selectable::raw and implements all its methods (right now,
trivially), and its contents is an expression. The methods are
implemented using the usual visitor pattern.
2021-07-27 20:00:48 +03:00
Avi Kivity
df4d77e857 table: simplify generate_and_propagate_view_updates exception handling
We have both try/catch and handle_exception() to ignore exceptions.
Try/catch is enough, so remove handle_exception().

Closes #9011
2021-07-27 14:08:30 +02:00
Avi Kivity
f86e65b4e7 Merge "Fix quadratic behavior in memtable/row_cache with lots of range tombstones" from Tomasz
"
This series fixes two issues which cause very poor efficiency of reads
when there is a lot of range tombstones per live row in a partition.

The first issue is in the row_cache reader. Before the patch, all range
tombstones up to the next row were copied into a vector, and then put
into the buffer until it's full. This would get quadratic if there is
much more range tombstones than fit in a buffer.

The fix is to avoid the accumulation of all tombstones in the vector
and invoke the callback instead, which stops the iteration as soon as
the buffer is full.

Fixes #2581.

The second, similar issue was in the memtable reader.

Tests:

  - unit (dev)
  - perf_row_cache_update (release)
"

* tag 'no-quadratic-rt-in-reads-v1' of github.com:tgrabiec/scylla:
  test: perf_row_cache_update: Uncomment test case for lots of range tombstones
  row_cache: Consume range tombstones incrementally
  partition_snapshot_reader: Avoid quadratic behavior with lots of range tombstones
  tests: mvcc: Relax monotonicity check
  range_tombstone_stream: Introduce peek_next()
2021-07-27 14:39:13 +03:00
Avi Kivity
05d22d27a8 Merge "Cut repair->storage-service link" from Pavel E
"
It exists in the node-ops handler which is registered by repair code,
but is handled by storage service. Probably, the whole node-ops handler
should instead be moved into repair, but this looks like rather huge
rework. So instead -- put the node-ops verb registration inside the
storage-service.

This removes some more calls for global storage service instance and
allows slight optimization of node-ops cross-shards calls.

tests: unit(dev), start-stop
"

* 'br-remove-storage-service-from-nodeops' of https://github.com/xemul/scylla:
  storage_service: Replace globals with locals
  storage_service: Remove one extra hop of node-ops handler
  storage_service: Fix indentation after previous patch
  storage_service: Move cross-shard hop up the stack
  repair: Drop empty verbs reg/unreg methods
  repair, storage_service: Move nodeops reg/unreg to storage service
  repair: Coroutinize row-level start/stop
2021-07-27 13:27:27 +03:00
Takuya ASADA
fdc786b451 install.sh: add supervisor support
Bring supervisor support from dist/docker to install.sh, make it
installable from relocatable package.
This enables to use supervisor with nonroot / offline environment,
and also make relocatable package able to run in Docker environment.

Related #8849

Closes #8918
2021-07-27 12:51:29 +03:00
Takuya ASADA
42fd73d033 scylla_setup: add RAID5 support
This supports optional RAID5 support on scylla_setup.

Fixes #9076

Closes #9093
2021-07-27 12:49:29 +03:00
Avi Kivity
2cca461652 Merge 'sstables: merge row consumer interfaces with implementations' from Wojciech Mitros
This patch follows #9002, further reducing the complexity of the sstable readers.
The split between row consumer interfaces and implementations has been first added in 2015, and there is no reason to create new implementations anymore. By merging those classes, we achieve a sizeable reduction in sstable reader length and complexity.
Refs #7952
Tests: unit(dev)

Closes #9073

* github.com:scylladb/scylla:
  sstables: merge row_consumer into mp_row_consumer_k_l
  sstables: move kl row_consumer
  sstables: merge consumer_m into mp_row_consumer_m
  sstables: move mp_row_consumer_m
2021-07-27 12:23:29 +03:00
Benny Halevy
424c53d5b1 mutation_fragment_stream_validator: disambiguate schema member definition
gcc 10.3.1 complains that:
```
./mutation_fragment_stream_validator.hh:39:21: error: declaration of ‘const schema& mutation_fragment_stream_validator::schema() const’ changes meaning of ‘schema’ [-fpermissive]
   39 |     const ::schema& schema() const { return _schema; }
      |                     ^~~~~~
```

Defining the _schama member as `::schema` rather than just `schema`
calms the compiler down.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210727073941.1999909-1-bhalevy@scylladb.com>
2021-07-27 11:55:42 +03:00
Nadav Har'El
8030461a2c cql-pytest: translate Cassandra's misc. type tests
This is a translation of Cassandra's CQL unit test source file
validation/entities/TypeTest.java into our our cql-pytest framework.

This is a tiny test file, with only four test which apparently didn't
find their place in other source files. All four tests pass on Cassandra,
and all but one pass on Scylla - the test marked xfail discovered one
previously-unknown incompatibility with Cassandra:

Refs #9082: DROP TYPE IF EXISTS shouldn't fail on non-existent keyspace

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210726140934.1479443-1-nyh@scylladb.com>
2021-07-27 08:28:16 +03:00
Tomasz Grabiec
7578cef0a4 test: perf_row_cache_update: Uncomment test case for lots of range tombstones 2021-07-26 21:38:00 +02:00
Gleb Natapov
56d0f711e8 serialize: allow use non copyable types with std::variant
Message-Id: <20210720120935.710549-3-gleb@scylladb.com>
2021-07-26 19:09:19 +03:00
Gleb Natapov
63025a75b2 serialize: allow use non copyable types with std::optional
Message-Id: <20210720120935.710549-2-gleb@scylladb.com>
2021-07-26 19:09:19 +03:00
Avi Kivity
8a80e455fb sstables: keys: convert trichotomic comparisons to std::strong_ordering
Prevent accidental conversions to bool from yielding the wrong results.
Unprepared users (that converted to bool, or assigned to int) are adjusted.

Ref #1449

Test: unit (dev)

Closes #9088
2021-07-26 19:09:19 +03:00
Nadav Har'El
d3a715e0ff Update seastar submodule
* seastar 93d053cd...ce3cc268 (4):
  > doc: update coroutine exception paragraph with make_exception
  > coroutine: add make_exception helper
  > coroutine: use std::move for forwarding exception_ptr
  > doc: tutorial: document direct exception propagation

With the new throw-less coroutine exception support, we can modify
some of Scylla's new coroutine code to generate exceptions a bit more
efficiently, without actually thowing an exception.
2021-07-26 19:09:19 +03:00
Tomasz Grabiec
2d18360157 row_cache: Consume range tombstones incrementally
Before the patch, all range tombstones up to the next row were copied
into a vector, and then put into the buffer until it's full. This
would get quadratic if there is much more range tombstones than fit in
a buffer.

The fix is to avoid the accumulation of all tombstones in the vector
and invoke the callback instead, which stops the iteartion as soon as
the buffer is full.

Fixes #2581.
2021-07-26 17:48:05 +02:00
Tomasz Grabiec
e74c3c885e partition_snapshot_reader: Avoid quadratic behavior with lots of range tombstones
next_range_tombstone() was populating _rt_stream on each invocation
from the current iterator ranges in _range_tombstones. If there is a
lot of range tombstones, all would be put into _rt_stream. One problem
is that this can cause a reactor stall. Fix by more incremental
approach where we populate _rt_stream with minimal amount on each
invocation of next_range_tombstone().

Another problem is that this can get quadratic. The iterators in
_range_tombstones are advanced, but if lsa invalidates them across
calls they can revert back to the front since they go back to
_last_rt, which is the last consumed range tombstone, and if the
buffer fills up, not all tombstones from _rt_stream could be
consumed. The new code doesn't have this problem because everything
which is produced out of the iterators in _range_tombstones is
produced only once. What we put into _rt_stream is consumed first
before we try to feed the _rt_stream with more data.
2021-07-26 17:48:05 +02:00
Tomasz Grabiec
0d7b3f9463 tests: mvcc: Relax monotonicity check
Consecutive range tombstones can have the same position. They will, in
one of the test cases, after the range tombstone merger in
partition_snapshot_flat_reader no longer uses range_tombstone_list to
merge data form multiple versions, which deoverlaps, but rather merges
the streams corresponding to each version, which interleaves range
tombstones from different versions.
2021-07-26 17:27:03 +02:00
Tomasz Grabiec
91868cf0cd range_tombstone_stream: Introduce peek_next() 2021-07-26 13:33:34 +02:00
Pavel Emelyanov
11a2709f10 storage_service: Replace globals with locals
The node-ops verb handler is the lambda of storage-service and it
can stop using global storage service instance for no extra charge.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-26 14:21:30 +03:00
Pavel Emelyanov
6e56671d9e storage_service: Remove one extra hop of node-ops handler
It's now clear that the verb handler goes to some "random"
shard, then immediatelly switches to shard-0 and then does
the handling. Avoid the extra hop and go to shard-0 right
at once.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-26 14:21:30 +03:00
Pavel Emelyanov
b6315d3af7 storage_service: Fix indentation after previous patch
And, while at it, s/ss/this/g and drop the ss variable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-26 14:21:30 +03:00
Pavel Emelyanov
f5fad311cf storage_service: Move cross-shard hop up the stack
The storage_service::node_ops_cmd_handler runs inside a huge
invoke_on(0, ...) lambda. Make it be called on shard-0. This
is the preparation for next two patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-26 14:21:30 +03:00
Pavel Emelyanov
eb55c252c9 repair: Drop empty verbs reg/unreg methods
Those in repair.cc's are now noops, so remove them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-26 14:21:30 +03:00
Pavel Emelyanov
a09586a237 repair, storage_service: Move nodeops reg/unreg to storage service
The storage service is the verb sender, so it must be the verb
registrator. Another goal of this patch is to allow removal of
repair -> storage_service dependency.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-26 14:21:21 +03:00
Pavel Emelyanov
18397a5e0a repair: Coroutinize row-level start/stop
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-26 14:21:21 +03:00
Piotr Jastrzebski
90a607e844 api: use proper type to reduce partition count
Partition count is of a type size_t but we use std::plus<int>
to reduce values of partition count in various column families.
This patch changes the argument of std::plus to the right type.
Using std::plus<int> for size_t compiles but does not work as expected.
For example plus<int>(2147483648LL, 1LL) = -2147483647 while the code
would probably want 2147483649.

Fixes #9090

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>

Closes #9074
2021-07-26 11:53:06 +03:00
Nadav Har'El
b503ec36c2 cql-pytest: translate Cassandra's tests for tuples
This is a translation of Cassandra's CQL unit test source file
validation/entities/TupleTypeTest.java into our our cql-pytest framework.

This test file checks has a few tests on various features of tuples.
Unfortunately, some of the tests could not be easily translated into
Python so were left commented out: Some tests try to send invalid input
to the server which the Python driver "helpfully" forbids; Two tests
used an external testing library "QuickTheories" and are the only two
tests in the Cassandra test suite to use this library - so it's not
a worthwhile to translate it to Python.

11 tests remain, all of them pass on Cassandra, and just one fails on
Scylla (so marked xfail for now), reproducing one known issue:

Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Actually, += is not supposed to be supported on tuple columns anyway, but
should print the appropriate error - not the syntax error we get now as
the "+=" feature is not supported at all.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210722201900.1442391-1-nyh@scylladb.com>
2021-07-26 08:20:12 +03:00
Benny Halevy
8674746fdd flat_mutation_reader: detach_buffer: mark as noexcept
Since detach_buffer is used before closing and
destroying the reader, we want to mark it as noexcept
to simply the caller error handling.

Currently, although it does construct a new circular_buffer,
none of the constructors used may throw.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210617114240.1294501-2-bhalevy@scylladb.com>
2021-07-25 12:02:27 +03:00
Benny Halevy
0e31cdf367 flat_mutation_reader: detach_buffer: clarify buffer constructor
detach_buffer exchanges the current _buffer with
a new buffer constructed using the circular_buffer(Alloc)
constructor. The compiler implicitly constructs a
tracking_allocator(reader_permit) and passes it
to the circular_buffer constructor.

This patch just makes that explicit so it would be
clearer to the reader what's going on here.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210617114240.1294501-1-bhalevy@scylladb.com>
2021-07-25 11:59:37 +03:00
Pavel Solodovnikov
bcbcc18aa1 raft: raft_sys_table_storage: fix broken load_snapshot and load_term_and_vote
Loading snapshot id and term + vote involve selecting static
fields from the "system.raft" table, constrained by a given
group id.

The code incorrectly assumes that, for example,
`SELECT snapshot_id FROM raft WHERE group_id=?` in
`load_snapshot` always returns  only one row.
This is not true, since this will return a row
for each (pk, ck) combination, which is (group_id, index)
for "system.raft" table.

The same applies for the `load_term_and_vote`, which selects
static `vote_term` and `vote` from "system.raft".

This results in a crash at node startup when there is
a non-empty raft log containing more than one entry
for a given `group_id`.

Restrict the selection to always return one row by applying
`LIMIT 1` clause.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210723183232.742083-1-pa.solodovnikov@scylladb.com>
2021-07-25 02:01:34 +02:00
Nadav Har'El
ec5e4c338b cql: fix undefined behavior in timestamp verification
Commit 2150c0f7a2 proposed by issue #5619
added a limitation that USING TIMESTAMP cannot be more than 3 days into
the future. But the actual code used to check it,

     timestamp - now > MAX_DIFFERENCE

only makes sense for *positive* timestamps. For negative timestamps,
which are allowed in Cassandra, the difference "timestamp - now" might
overflow the signed integer and the result is undefined - leading to the
undefined-behavior sanitizer to complain as reported in issue #8895.
Beyond the sanitizer, in practice, on my test setup, the timestamp -2^63+1
causes such overflow, which causes the above if() to make the nonsensical
statement that the timestamp is more than 3 days into the future.

This patch assumes that negative timestamps of any magnitude are still
allowed (as they are in Cassandra), and fixes the above if() to only
check timestamps which are in the future (timestamp > now).

We also add a cql-pytest test for negative timestamps, passing on both
Cassandra and Scylla (after this patch - it failed before, and also
reported sanitizer errors in the debug build).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210621141255.309485-1-nyh@scylladb.com>
2021-07-24 11:01:08 +03:00
Tomasz Grabiec
b044db863f Merge 'db/virtual_table: Streaming tables for large data + describe_ring example table' from Juliusz Stasiewicz
This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data.

This PR was created by @jul-stas and @StarostaGit

Closes #8961

* github.com:scylladb/scylla:
  test/boost: run_mutation_source_tests on streaming virtual table
  system_keyspace: Introduce describe_ring table as virtual_table
  storage_service: Pass the reference down to system_keyspace
  endpoint_details: store `_host` as `gms::inet_address`
  queue_reader: implement next_partition()
  virtual_tables: Introduce streaming_virtual_table
  flat_mutation_reader: Add a new filtering reader factory method
2021-07-23 18:05:51 +02:00
Gleb Natapov
f0047bd749 raft: apply snapshots in applier_fiber
We want to serialize snapshot application with command application
otherwise a command may be applied after a snapshot that already contains
the result of its application (it is not necessary a problem since the
raft by itself does not guaranty apply-once semantics, but better to
prevent it when possible). This also moves all interactions with user's
state machine into one place.

Message-Id: <YPltCmBAGUQnpW7r@scylladb.com>
2021-07-23 18:05:38 +02:00
Avi Kivity
aaf35b5ac2 Merge "Remove storage-service from transport (and a bit more)" from Pavel E
"
The cql-server -> storage-service dependency comes from the server's
event_notifier which (un)subscribes on the lifecycle events that come
from the storage service. To break this link the same trick as with
migration manager notifications is used -- the notification engine
is split out of the storage service and then is pushed directly into
both -- the listeners (to (un)subscribe) and the storage service (to
notify).

tests: unit(dev), dtest(simple_boot_shutdown, dev)
       manual({ start/stop,
                with/without started transport,
	        nodetool enable-/disablebinary
	      } in various combinations, dev)
"

* 'br-remove-storage-service-from-transport' of https://github.com/xemul/scylla:
  transport.controller: Brushup cql_server declarations
  code: Remove storage-service header from irrelevant places
  storage_service: Remove (unlifecycle) subscribe methods
  transport: Use local notifier to (un)subscribe server
  transport: Keep lifecycle notifier sharded reference
  main: Use local lifecycle notifier to (un)subscribe listeners
  main, tests: Push notifier through storage service
  storage_service: Move notification core into dedicated class
  storage_service: Split lifecycle notification code
  transport, generic_server: Remove no longer used functionality
  transport: (Un)Subscribe cql_server::event_notifier from controller
  tests: Remove storage service from manual gossiper test
2021-07-22 19:27:45 +03:00
Pavel Emelyanov
b1bb00a95c transport.controller: Brushup cql_server declarations
The controller code sits in the cql_transport namespace and
can omit its mentionings. Also the seastar::distributed<>
is replaced with modern seastar::sharded<> while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:50:57 +03:00
Pavel Emelyanov
c39f04fa6f code: Remove storage-service header from irrelevant places
Some .cc files over the code include the storage service
for no real need. Drop the header and include (in some)
what's really needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:50:19 +03:00
Pavel Emelyanov
e711bfbb7e storage_service: Remove (unlifecycle) subscribe methods
All the listeners now use main-local notifier instance directly
and these methods become unused.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:49:35 +03:00
Pavel Emelyanov
65b1bb8302 transport: Use local notifier to (un)subscribe server
Now the controller has the lifecycle notifier reference and
can stop using storage service to manage the subscription.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:48:58 +03:00
Pavel Emelyanov
5f99eeb35e transport: Keep lifecycle notifier sharded reference
It's needed to (un)subscribe server on it (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:48:20 +03:00
Pavel Emelyanov
2a30cb1664 main: Use local lifecycle notifier to (un)subscribe listeners
The storage proxy and sl-manager get subscribed on lifecycle
events with the help of storage service. Now when the notifier
lives in main() they can use it directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:47:15 +03:00
Pavel Emelyanov
8248bc9e33 main, tests: Push notifier through storage service
Now it's time to move the lifecycle notifier from storage
service to the main's scope. Next patches will remove the
$lifecycle-subscriber -> storage_service dependency.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:45:51 +03:00
Pavel Emelyanov
6b3b01d9a6 storage_service: Move notification core into dedicated class
Introduce the endpoint_lifecycle_notifier class that's in
charge of keeping track of subscribers and notifying them.
The subscribers will thus be able to set and unset their
subscription without the need to mess with storage service
at all.

The storage_service for now keeps the notifier on board, but
this is going to change in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:44:02 +03:00
Pavel Emelyanov
7e8a032013 storage_service: Split lifecycle notification code
This prepares the ground for moving the notification engine
into own class like it was done for migration_notifier some
time ago.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:43:14 +03:00
Pavel Emelyanov
c7b0b25494 transport, generic_server: Remove no longer used functionality
After subscription management was moved onto controller level
a bunch of code can be dropped:

- passing migration notifier beyond controller
- event_notifier's _stopped bit
- event_notifier .stop() method
- event_notifier empty constructor and destrictor
- generic_server's on_stop virtual method

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:41:32 +03:00
Pavel Emelyanov
1acef41626 transport: (Un)Subscribe cql_server::event_notifier from controller
There's a migration notifier that's carried through cql_server
_just_ to let event-notifier (un)subscribe on it. Also there's
a call for global storage-service in there which will need to
be replaced with yet another pass-through argument which is not
great.

It's easier to establish this subscription outside of cql_server
like it's currently done for proxy and sl-manager. In case of
cql_server the "outside" is the controller.

This patch just moves the subscription management from cql_server
to controller, next two patches will make more use of this change.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:37:23 +03:00
Pavel Emelyanov
b57fb0aa9a tests: Remove storage service from manual gossiper test
It's not needed there, gossiper starts and works without it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:36:28 +03:00
Yaron Kaikov
a004b1da30 scylla_util:add AWS arm based instance to supported list
Today we have a Scylla AMI image based on x86 archituctre only.
Following the work we did in https://github.com/scylladb/scylla-machine-image/pull/153 we can build
ARM based AMI image

Let's add ARM based instance to supported list

Closes #9064
2021-07-22 15:48:29 +03:00