The expression type cannot be a member of a struct that is an
element of the expression variant. This is because it would then
be required to contain itself. So introduce a nested_expression
type to indirectly hold an expression, but keep the value semantics
we expect from expressions: it is copyable and a copy has separate
identity and storage.
In fact binary_operator had to resort to this trick, so it's converted
to nested_expression in the next patch.
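The indirection described above can be sketched as follows. This is a minimal illustration and not the code from the patch: the names constant, negation and evaluate are hypothetical, and holding the nested expression in a std::unique_ptr with a deep-copying copy constructor is an assumption about one way to get value semantics.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>

struct expression;  // still incomplete here, so it cannot be held by value

// Hypothetical stand-in for nested_expression: holds an expression
// indirectly, but copies deeply so every copy has its own storage.
// (Copying a moved-from object is not supported in this sketch.)
class nested_expression {
    std::unique_ptr<expression> _e;
public:
    explicit nested_expression(expression e);
    nested_expression(const nested_expression& o);
    nested_expression(nested_expression&&) noexcept;
    nested_expression& operator=(const nested_expression& o);
    nested_expression& operator=(nested_expression&&) noexcept;
    ~nested_expression();
    expression& operator*();
    const expression& operator*() const;
};

// Hypothetical variant elements; negation refers back to expression.
struct constant { int value; };
struct negation { nested_expression operand; };

struct expression {
    std::variant<constant, negation> v;
};

// Member definitions come after expression is a complete type.
nested_expression::nested_expression(expression e)
    : _e(std::make_unique<expression>(std::move(e))) {}
nested_expression::nested_expression(const nested_expression& o)
    : _e(std::make_unique<expression>(*o._e)) {}
nested_expression::nested_expression(nested_expression&&) noexcept = default;
nested_expression& nested_expression::operator=(const nested_expression& o) {
    if (this != &o) {
        _e = std::make_unique<expression>(*o._e);
    }
    return *this;
}
nested_expression& nested_expression::operator=(nested_expression&&) noexcept = default;
nested_expression::~nested_expression() = default;
expression& nested_expression::operator*() { return *_e; }
const expression& nested_expression::operator*() const { return *_e; }

// Tiny recursive evaluator, only to exercise the structure.
int evaluate(const expression& e) {
    if (auto* c = std::get_if<constant>(&e.v)) {
        return c->value;
    }
    return -evaluate(*std::get<negation>(e.v).operand);
}
```

Copying an expression that contains a negation copies the operand as well, so modifying the copy leaves the original untouched.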
Introduce unresolved_identifier as an unprepared counterpart to column_value.
column_identifier_raw no longer inherits from selectable::raw, but
keeps its methods for now to reduce churn.
Otherwise we run into a #include loop when we try to have an expression
with column_identifier::raw: expression.hh -> column_identifier.hh
-> selectable.hh -> expression.hh.
Prepare to migrate selectable::raw sub-classes to expressions by
creating a bridge between the two types. with_expression::raw
is a selectable::raw and implements all its methods (right now,
trivially), and its content is an expression. The methods are
implemented using the usual visitor pattern.
"
This series fixes two issues which cause very poor efficiency of reads
when there are a lot of range tombstones per live row in a partition.
The first issue is in the row_cache reader. Before the patch, all range
tombstones up to the next row were copied into a vector, and then put
into the buffer until it's full. This would get quadratic if there are
many more range tombstones than fit in a buffer.
The fix is to avoid the accumulation of all tombstones in the vector
and invoke the callback instead, which stops the iteration as soon as
the buffer is full.
Fixes #2581.
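The callback-based approach can be sketched as below. This is an illustration under simplifying assumptions, not the row_cache code: tombstones are plain ints, and stop_iteration, consume_until_full and fill_buffer are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

enum class stop_iteration { no, yes };

// Visits tombstones in order starting at `start` and stops as soon as the
// consumer asks to. Returns the position to resume from on the next call.
std::size_t consume_until_full(const std::vector<int>& tombstones, std::size_t start,
        const std::function<stop_iteration(int)>& consumer) {
    std::size_t i = start;
    while (i < tombstones.size()) {
        if (consumer(tombstones[i++]) == stop_iteration::yes) {
            break;
        }
    }
    return i;
}

// Fills `buffer` up to `capacity` without first collecting every remaining
// tombstone into a temporary vector.
std::size_t fill_buffer(const std::vector<int>& tombstones, std::size_t start,
        std::vector<int>& buffer, std::size_t capacity) {
    if (buffer.size() >= capacity) {
        return start;  // already full, nothing is consumed
    }
    return consume_until_full(tombstones, start, [&](int rt) {
        buffer.push_back(rt);
        return buffer.size() >= capacity ? stop_iteration::yes : stop_iteration::no;
    });
}
```

Each call touches only as many tombstones as fit in the buffer, so the total work across all buffer fills is linear in the number of tombstones.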
The second, similar issue was in the memtable reader.
Tests:
- unit (dev)
- perf_row_cache_update (release)
"
* tag 'no-quadratic-rt-in-reads-v1' of github.com:tgrabiec/scylla:
test: perf_row_cache_update: Uncomment test case for lots of range tombstones
row_cache: Consume range tombstones incrementally
partition_snapshot_reader: Avoid quadratic behavior with lots of range tombstones
tests: mvcc: Relax monotonicity check
range_tombstone_stream: Introduce peek_next()
"
It exists in the node-ops handler which is registered by repair code,
but is handled by storage service. Probably, the whole node-ops handler
should instead be moved into repair, but this looks like rather huge
rework. So instead -- put the node-ops verb registration inside the
storage-service.
This removes some more calls for global storage service instance and
allows slight optimization of node-ops cross-shards calls.
tests: unit(dev), start-stop
"
* 'br-remove-storage-service-from-nodeops' of https://github.com/xemul/scylla:
storage_service: Replace globals with locals
storage_service: Remove one extra hop of node-ops handler
storage_service: Fix indentation after previous patch
storage_service: Move cross-shard hop up the stack
repair: Drop empty verbs reg/unreg methods
repair, storage_service: Move nodeops reg/unreg to storage service
repair: Coroutinize row-level start/stop
Bring supervisor support from dist/docker to install.sh, make it
installable from relocatable package.
This makes it possible to use supervisor in nonroot / offline environments,
and also makes the relocatable package able to run in a Docker environment.
Related #8849
Closes #8918
This patch follows #9002, further reducing the complexity of the sstable readers.
The split between row consumer interfaces and implementations was first added in 2015, and there is no reason to create new implementations anymore. By merging those classes, we achieve a sizeable reduction in sstable reader length and complexity.
Refs #7952
Tests: unit(dev)
Closes #9073
* github.com:scylladb/scylla:
sstables: merge row_consumer into mp_row_consumer_k_l
sstables: move kl row_consumer
sstables: merge consumer_m into mp_row_consumer_m
sstables: move mp_row_consumer_m
This is a translation of Cassandra's CQL unit test source file
validation/entities/TypeTest.java into our cql-pytest framework.
This is a tiny test file, with only four tests which apparently didn't
find their place in other source files. All four tests pass on Cassandra,
and all but one pass on Scylla - the test marked xfail discovered one
previously-unknown incompatibility with Cassandra:
Refs #9082: DROP TYPE IF EXISTS shouldn't fail on non-existent keyspace
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210726140934.1479443-1-nyh@scylladb.com>
Prevent accidental conversions to bool from yielding the wrong results.
Unprepared users (that converted to bool, or assigned to int) are adjusted.
Ref #1449
Test: unit (dev)
Closes #9088
* seastar 93d053cd...ce3cc268 (4):
> doc: update coroutine exception paragraph with make_exception
> coroutine: add make_exception helper
> coroutine: use std::move for forwarding exception_ptr
> doc: tutorial: document direct exception propagation
With the new throw-less coroutine exception support, we can modify
some of Scylla's new coroutine code to generate exceptions a bit more
efficiently, without actually throwing an exception.
Before the patch, all range tombstones up to the next row were copied
into a vector, and then put into the buffer until it's full. This
would get quadratic if there are many more range tombstones than fit in
a buffer.
The fix is to avoid the accumulation of all tombstones in the vector
and invoke the callback instead, which stops the iteration as soon as
the buffer is full.
Fixes #2581.
next_range_tombstone() was populating _rt_stream on each invocation
from the current iterator ranges in _range_tombstones. If there is a
lot of range tombstones, all would be put into _rt_stream. One problem
is that this can cause a reactor stall. Fix with a more incremental
approach where we populate _rt_stream with a minimal amount on each
invocation of next_range_tombstone().
Another problem is that this can get quadratic. The iterators in
_range_tombstones are advanced, but if lsa invalidates them across
calls they can revert back to the front since they go back to
_last_rt, which is the last consumed range tombstone, and if the
buffer fills up, not all tombstones from _rt_stream could be
consumed. The new code doesn't have this problem because everything
which is produced out of the iterators in _range_tombstones is
produced only once. What we put into _rt_stream is consumed first
before we try to feed the _rt_stream with more data.
Consecutive range tombstones can have the same position. They will, in
one of the test cases, after the range tombstone merger in
partition_snapshot_flat_reader no longer uses range_tombstone_list to
merge data from multiple versions, which deoverlaps, but rather merges
the streams corresponding to each version, which interleaves range
tombstones from different versions.
The node-ops verb handler is the lambda of storage-service and it
can stop using global storage service instance for no extra charge.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now clear that the verb handler goes to some "random"
shard, then immediately switches to shard-0 and then does
the handling. Avoid the extra hop and go to shard-0 right
at once.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage_service::node_ops_cmd_handler runs inside a huge
invoke_on(0, ...) lambda. Make it be called on shard-0. This
is the preparation for next two patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage service is the verb sender, so it must be the verb
registrar. Another goal of this patch is to allow removal of
repair -> storage_service dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Partition count is of type size_t but we use std::plus<int>
to reduce values of partition count in various column families.
This patch changes the argument of std::plus to the right type.
Using std::plus<int> for size_t compiles but does not work as expected.
For example std::plus<int>()(2147483648LL, 1LL) yields -2147483647 on
typical platforms, while the code would probably want 2147483649.
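A small sketch of the difference, assuming a platform with 32-bit int and 64-bit size_t. The helper names add_as_int and add_as_size_t are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// std::plus<int> converts both operands to int before adding, so values
// above INT_MAX are narrowed first (modulo 2^32 on typical platforms,
// and guaranteed since C++20).
long long add_as_int(std::size_t a, std::size_t b) {
    return std::plus<int>()(a, b);
}

// Using the operand type itself keeps the full range of size_t.
std::size_t add_as_size_t(std::size_t a, std::size_t b) {
    return std::plus<std::size_t>()(a, b);
}
```

With a = 2147483648 and b = 1, the first helper returns -2147483647 and the second returns 2147483649.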
Fixes #9090
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Closes #9074
This is a translation of Cassandra's CQL unit test source file
validation/entities/TupleTypeTest.java into our cql-pytest framework.
This test file has a few tests of various features of tuples.
Unfortunately, some of the tests could not be easily translated into
Python so were left commented out: Some tests try to send invalid input
to the server which the Python driver "helpfully" forbids; Two tests
used an external testing library "QuickTheories" and are the only two
tests in the Cassandra test suite to use this library - so it's not
worthwhile to translate it to Python.
11 tests remain, all of them pass on Cassandra, and just one fails on
Scylla (so marked xfail for now), reproducing one known issue:
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Actually, += is not supposed to be supported on tuple columns anyway, but
should print the appropriate error - not the syntax error we get now as
the "+=" feature is not supported at all.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210722201900.1442391-1-nyh@scylladb.com>
Since detach_buffer is used before closing and
destroying the reader, we want to mark it as noexcept
to simplify the caller's error handling.
Currently, although it does construct a new circular_buffer,
none of the constructors used may throw.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210617114240.1294501-2-bhalevy@scylladb.com>
detach_buffer exchanges the current _buffer with
a new buffer constructed using the circular_buffer(Alloc)
constructor. The compiler implicitly constructs a
tracking_allocator(reader_permit) and passes it
to the circular_buffer constructor.
This patch just makes that explicit so it would be
clearer to the reader what's going on here.
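A rough sketch of the pattern, using std::vector and std::allocator as stand-ins for circular_buffer and tracking_allocator (both substitutions are assumptions made only to keep the example self-contained). The class name reader_sketch is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class reader_sketch {
    using allocator_type = std::allocator<int>;
    using buffer_type = std::vector<int, allocator_type>;
    buffer_type _buffer;
public:
    void push(int v) { _buffer.push_back(v); }
    bool empty() const noexcept { return _buffer.empty(); }

    // Swaps in a fresh, empty buffer and hands the old one to the caller.
    // The allocator is constructed explicitly rather than relying on an
    // implicit conversion. For std::vector with std::allocator, neither
    // the allocator-taking constructor nor the moves can throw, so the
    // function can be marked noexcept.
    buffer_type detach_buffer() noexcept {
        return std::exchange(_buffer, buffer_type(allocator_type()));
    }
};
```

After detach_buffer() returns, the reader holds an empty buffer and the caller owns the previous contents.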
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210617114240.1294501-1-bhalevy@scylladb.com>
Loading snapshot id and term + vote involves selecting static
fields from the "system.raft" table, constrained by a given
group id.
The code incorrectly assumes that, for example,
`SELECT snapshot_id FROM raft WHERE group_id=?` in
`load_snapshot` always returns only one row.
This is not true, since this will return a row
for each (pk, ck) combination, which is (group_id, index)
for "system.raft" table.
The same applies for the `load_term_and_vote`, which selects
static `vote_term` and `vote` from "system.raft".
This results in a crash at node startup when there is
a non-empty raft log containing more than one entry
for a given `group_id`.
Restrict the selection to always return one row by applying
`LIMIT 1` clause.
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210723183232.742083-1-pa.solodovnikov@scylladb.com>
Commit 2150c0f7a2 proposed by issue #5619
added a limitation that USING TIMESTAMP cannot be more than 3 days into
the future. But the actual code used to check it,
timestamp - now > MAX_DIFFERENCE
only makes sense for *positive* timestamps. For negative timestamps,
which are allowed in Cassandra, the difference "timestamp - now" might
overflow the signed integer and the result is undefined - leading the
undefined-behavior sanitizer to complain as reported in issue #8895.
Beyond the sanitizer, in practice, on my test setup, the timestamp -2^63+1
causes such overflow, which causes the above if() to make the nonsensical
statement that the timestamp is more than 3 days into the future.
This patch assumes that negative timestamps of any magnitude are still
allowed (as they are in Cassandra), and fixes the above if() to only
check timestamps which are in the future (timestamp > now).
We also add a cql-pytest test for negative timestamps, passing on both
Cassandra and Scylla (after this patch - it failed before, and also
reported sanitizer errors in the debug build).
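The corrected check can be sketched as below. This is an illustration, not the code from the patch: the function and constant names are hypothetical, timestamps are assumed to be in microseconds, and "now" is assumed to be non-negative so that the subtraction cannot overflow once timestamp > now holds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: 3 days expressed in microseconds.
constexpr std::int64_t max_difference_us = 3LL * 24 * 60 * 60 * 1000 * 1000;

// The subtraction is evaluated only when timestamp_us > now_us. With
// now_us >= 0 (assumed), the difference then lies in [1, INT64_MAX], so
// the signed overflow hit by very negative timestamps cannot occur.
bool is_too_far_in_future(std::int64_t timestamp_us, std::int64_t now_us) {
    return timestamp_us > now_us && timestamp_us - now_us > max_difference_us;
}
```

Negative timestamps of any magnitude short-circuit on the first comparison and are accepted.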
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210621141255.309485-1-nyh@scylladb.com>
This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data.
This PR was created by @jul-stas and @StarostaGit
Closes #8961
* github.com:scylladb/scylla:
test/boost: run_mutation_source_tests on streaming virtual table
system_keyspace: Introduce describe_ring table as virtual_table
storage_service: Pass the reference down to system_keyspace
endpoint_details: store `_host` as `gms::inet_address`
queue_reader: implement next_partition()
virtual_tables: Introduce streaming_virtual_table
flat_mutation_reader: Add a new filtering reader factory method
We want to serialize snapshot application with command application
otherwise a command may be applied after a snapshot that already contains
the result of its application (it is not necessarily a problem since
raft by itself does not guarantee apply-once semantics, but better to
prevent it when possible). This also moves all interactions with user's
state machine into one place.
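One way to picture this serialization is a single queue that carries both kinds of work and is drained by one loop. The sketch below rests on that assumption and is not the raft code: command_sketch, snapshot_sketch and applier_sketch are hypothetical names, and a vector of strings stands in for the user's state machine.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct command_sketch { int index; };
struct snapshot_sketch { int last_included_index; };

class applier_sketch {
    std::deque<std::variant<command_sketch, snapshot_sketch>> _queue;
    std::vector<std::string> _log;  // stand-in for the user's state machine
public:
    void enqueue(command_sketch c) { _queue.emplace_back(c); }
    void enqueue(snapshot_sketch s) { _queue.emplace_back(s); }

    // The only place that touches the state machine: items are handled
    // strictly in the order in which they were enqueued.
    void drain() {
        while (!_queue.empty()) {
            auto item = std::move(_queue.front());
            _queue.pop_front();
            if (auto* c = std::get_if<command_sketch>(&item)) {
                _log.push_back("cmd:" + std::to_string(c->index));
            } else {
                const auto& s = std::get<snapshot_sketch>(item);
                _log.push_back("snap:" + std::to_string(s.last_included_index));
            }
        }
    }

    const std::vector<std::string>& log() const { return _log; }
};
```

Because commands and snapshots share one queue, a snapshot can never be applied concurrently with, or reordered against, the commands around it.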
Message-Id: <YPltCmBAGUQnpW7r@scylladb.com>
"
The cql-server -> storage-service dependency comes from the server's
event_notifier which (un)subscribes on the lifecycle events that come
from the storage service. To break this link the same trick as with
migration manager notifications is used -- the notification engine
is split out of the storage service and then is pushed directly into
both -- the listeners (to (un)subscribe) and the storage service (to
notify).
tests: unit(dev), dtest(simple_boot_shutdown, dev)
manual({ start/stop,
with/without started transport,
nodetool enable-/disablebinary
} in various combinations, dev)
"
* 'br-remove-storage-service-from-transport' of https://github.com/xemul/scylla:
transport.controller: Brushup cql_server declarations
code: Remove storage-service header from irrelevant places
storage_service: Remove (unlifecycle) subscribe methods
transport: Use local notifier to (un)subscribe server
transport: Keep lifecycle notifier sharded reference
main: Use local lifecycle notifier to (un)subscribe listeners
main, tests: Push notifier through storage service
storage_service: Move notification core into dedicated class
storage_service: Split lifecycle notification code
transport, generic_server: Remove no longer used functionality
transport: (Un)Subscribe cql_server::event_notifier from controller
tests: Remove storage service from manual gossiper test
The controller code sits in the cql_transport namespace and
can omit mentioning it. Also the seastar::distributed<>
is replaced with modern seastar::sharded<> while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some .cc files across the code include the storage service
header without really needing it. Drop the header and include
(in some of them) what's really needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the controller has the lifecycle notifier reference and
can stop using storage service to manage the subscription.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage proxy and sl-manager get subscribed on lifecycle
events with the help of storage service. Now when the notifier
lives in main() they can use it directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it's time to move the lifecycle notifier from storage
service to the main's scope. Next patches will remove the
$lifecycle-subscriber -> storage_service dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Introduce the endpoint_lifecycle_notifier class that's in
charge of keeping track of subscribers and notifying them.
The subscribers will thus be able to set and unset their
subscription without the need to mess with storage service
at all.
The storage_service for now keeps the notifier on board, but
this is going to change in the next patch.
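The shape of such a notifier can be sketched as follows. This is a simplified, single-threaded illustration and not the endpoint_lifecycle_notifier from the patch: all names are hypothetical, endpoints are plain strings, and only two event kinds are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subscriber interface.
class lifecycle_subscriber_sketch {
public:
    virtual ~lifecycle_subscriber_sketch() = default;
    virtual void on_up(const std::string& endpoint) = 0;
    virtual void on_down(const std::string& endpoint) = 0;
};

// Keeps track of subscribers and notifies them. Subscribers talk to this
// class directly and need no reference to the storage service.
class lifecycle_notifier_sketch {
    std::vector<lifecycle_subscriber_sketch*> _subscribers;
public:
    void register_subscriber(lifecycle_subscriber_sketch* s) {
        _subscribers.push_back(s);
    }
    void unregister_subscriber(lifecycle_subscriber_sketch* s) {
        _subscribers.erase(
            std::remove(_subscribers.begin(), _subscribers.end(), s),
            _subscribers.end());
    }
    void notify_up(const std::string& endpoint) {
        for (auto* s : _subscribers) { s->on_up(endpoint); }
    }
    void notify_down(const std::string& endpoint) {
        for (auto* s : _subscribers) { s->on_down(endpoint); }
    }
};

// Example subscriber that only counts the events it receives.
class counting_subscriber : public lifecycle_subscriber_sketch {
public:
    int ups = 0;
    int downs = 0;
    void on_up(const std::string&) override { ++ups; }
    void on_down(const std::string&) override { ++downs; }
};
```

The party that detects lifecycle events calls notify_up() / notify_down(), while listeners only ever see the notifier.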
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This prepares the ground for moving the notification engine
into its own class like it was done for migration_notifier some
time ago.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After subscription management was moved onto controller level
a bunch of code can be dropped:
- passing migration notifier beyond controller
- event_notifier's _stopped bit
- event_notifier .stop() method
- event_notifier empty constructor and destructor
- generic_server's on_stop virtual method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a migration notifier that's carried through cql_server
_just_ to let event-notifier (un)subscribe on it. Also there's
a call for global storage-service in there which will need to
be replaced with yet another pass-through argument which is not
great.
It's easier to establish this subscription outside of cql_server
like it's currently done for proxy and sl-manager. In case of
cql_server the "outside" is the controller.
This patch just moves the subscription management from cql_server
to controller, next two patches will make more use of this change.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>