Since detach_buffer is used before closing and
destroying the reader, we want to mark it as noexcept
to simply the caller error handling.
Currently, although it does construct a new circular_buffer,
none of the constructors used may throw.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210617114240.1294501-2-bhalevy@scylladb.com>
detach_buffer exchanges the current _buffer with
a new buffer constructed using the circular_buffer(Alloc)
constructor. The compiler implicitly constructs a
tracking_allocator(reader_permit) and passes it
to the circular_buffer constructor.
This patch just makes that explicit so it would be
clearer to the reader what's going on here.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210617114240.1294501-1-bhalevy@scylladb.com>
Loading snapshot id and term + vote involve selecting static
fields from the "system.raft" table, constrained by a given
group id.
The code incorrectly assumes that, for example,
`SELECT snapshot_id FROM raft WHERE group_id=?` in
`load_snapshot` always returns only one row.
This is not true, since this will return a row
for each (pk, ck) combination, which is (group_id, index)
for "system.raft" table.
The same applies for the `load_term_and_vote`, which selects
static `vote_term` and `vote` from "system.raft".
This results in a crash at node startup when there is
a non-empty raft log containing more than one entry
for a given `group_id`.
Restrict the selection to always return one row by applying
`LIMIT 1` clause.
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210723183232.742083-1-pa.solodovnikov@scylladb.com>
Commit 2150c0f7a2 proposed by issue #5619
added a limitation that USING TIMESTAMP cannot be more than 3 days into
the future. But the actual code used to check it,
timestamp - now > MAX_DIFFERENCE
only makes sense for *positive* timestamps. For negative timestamps,
which are allowed in Cassandra, the difference "timestamp - now" might
overflow the signed integer and the result is undefined - leading to the
undefined-behavior sanitizer to complain as reported in issue #8895.
Beyond the sanitizer, in practice, on my test setup, the timestamp -2^63+1
causes such overflow, which causes the above if() to make the nonsensical
statement that the timestamp is more than 3 days into the future.
This patch assumes that negative timestamps of any magnitude are still
allowed (as they are in Cassandra), and fixes the above if() to only
check timestamps which are in the future (timestamp > now).
We also add a cql-pytest test for negative timestamps, passing on both
Cassandra and Scylla (after this patch - it failed before, and also
reported sanitizer errors in the debug build).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210621141255.309485-1-nyh@scylladb.com>
This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data.
This PR was created by @jul-stas and @StarostaGit
Closes#8961
* github.com:scylladb/scylla:
test/boost: run_mutation_source_tests on streaming virtual table
system_keyspace: Introduce describe_ring table as virtual_table
storage_service: Pass the reference down to system_keyspace
endpoint_details: store `_host` as `gms::inet_address`
queue_reader: implement next_partition()
virtual_tables: Introduce streaming_virtual_table
flat_mutation_reader: Add a new filtering reader factory method
We want to serialize snapshot application with command application
otherwise a command may be applied after a snapshot that already contains
the result of its application (it is not necessary a problem since the
raft by itself does not guaranty apply-once semantics, but better to
prevent it when possible). This also moves all interactions with user's
state machine into one place.
Message-Id: <YPltCmBAGUQnpW7r@scylladb.com>
"
The cql-server -> storage-service dependency comes from the server's
event_notifier which (un)subscribes on the lifecycle events that come
from the storage service. To break this link the same trick as with
migration manager notifications is used -- the notification engine
is split out of the storage service and then is pushed directly into
both -- the listeners (to (un)subscribe) and the storage service (to
notify).
tests: unit(dev), dtest(simple_boot_shutdown, dev)
manual({ start/stop,
with/without started transport,
nodetool enable-/disablebinary
} in various combinations, dev)
"
* 'br-remove-storage-service-from-transport' of https://github.com/xemul/scylla:
transport.controller: Brushup cql_server declarations
code: Remove storage-service header from irrelevant places
storage_service: Remove (unlifecycle) subscribe methods
transport: Use local notifier to (un)subscribe server
transport: Keep lifecycle notifier sharded reference
main: Use local lifecycle notifier to (un)subscribe listeners
main, tests: Push notifier through storage service
storage_service: Move notification core into dedicated class
storage_service: Split lifecycle notification code
transport, generic_server: Remove no longer used functionality
transport: (Un)Subscribe cql_server::event_notifier from controller
tests: Remove storage service from manual gossiper test
The controller code sits in the cql_transport namespace and
can omit its mentionings. Also the seastar::distributed<>
is replaced with modern seastar::sharded<> while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some .cc files over the code include the storage service
for no real need. Drop the header and include (in some)
what's really needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the controller has the lifecycle notifier reference and
can stop using storage service to manage the subscription.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage proxy and sl-manager get subscribed on lifecycle
events with the help of storage service. Now when the notifier
lives in main() they can use it directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it's time to move the lifecycle notifier from storage
service to the main's scope. Next patches will remove the
$lifecycle-subscriber -> storage_service dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Introduce the endpoint_lifecycle_notifier class that's in
charge of keeping track of subscribers and notifying them.
The subscribers will thus be able to set and unset their
subscription without the need to mess with storage service
at all.
The storage_service for now keeps the notifier on board, but
this is going to change in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This prepares the ground for moving the notification engine
into own class like it was done for migration_notifier some
time ago.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After subscription management was moved onto controller level
a bunch of code can be dropped:
- passing migration notifier beyond controller
- event_notifier's _stopped bit
- event_notifier .stop() method
- event_notifier empty constructor and destrictor
- generic_server's on_stop virtual method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a migration notifier that's carried through cql_server
_just_ to let event-notifier (un)subscribe on it. Also there's
a call for global storage-service in there which will need to
be replaced with yet another pass-through argument which is not
great.
It's easier to establish this subscription outside of cql_server
like it's currently done for proxy and sl-manager. In case of
cql_server the "outside" is the controller.
This patch just moves the subscription management from cql_server
to controller, next two patches will make more use of this change.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This PR contains the parts relevant to batchlog_manager stop in #8998 without adding a gate to the storage_proxy for synchronization with on-going queries in storage_proxy::drain_on_shutdown.
As explained in #9009, we see that the batchlog_manager isn't stopped if scylla shuts down during startup, e.g. when waiting for gossip to settle, since currently the batchlog_manager is stopped only from `storage_service::do_drain`, while `storage_service::drain_on_shutdown` deferred shutdown is installed only later on:
222ef17305/main.cc (L1419-L1421)Fixes#9009
Test: unit(dev)
DTest: compact_storage_tests.py:TestCompactStorage.wide_row_test paging_test:TestPagingDatasetChanges.test_cell_TTL_expiry_during_paging update_cluster_layout_tests:TestUpdateClusterLayout.simple_add_new_node_while_adding_info_{1,2}_test (dev)
Closes#9010
* github.com:scylladb/scylla:
main: add deferred stop of batchlog_manager
batchlog_manager: refactor drain out of stop
batchlog_manager: stop: break _sem on shard 0
batchlog_manager: stop: use abort_source to abort batchlog_replay_loop
batchlog_manager: do_batch_log_replay: hold _gate
* seastar 388ee307...93d053cd (5):
> doc: tutorial: document seastar::coroutine::all()
> doc: tutorial: nest "exceptions in coroutines" under "coroutines"
> coroutine: add a way of propagating exceptions without throwing
> input_stream: Fix read_exactly(n) incorrectly skipping data
> coroutines: introduce all() template for waiting for multiple futures
The class name `coroutine` became problematic since seastar
introduced it as a namespace for coroutine helpers.
To avoid a clash, the class from scylla is wrapped in a separate
namespace.
Without this patch, Seastar submodule update fails to compile.
Message-Id: <6cb91455a7ac3793bc78d161e2cb4174cf6a1606.1626949573.git.sarna@scylladb.com>
This is a rewrite of an old PR: #7582
`TOKEN()` restrictions don't work properly when a query uses an index.
For example this returns both rows:
```cql
CREATE TABLE t(pk int, ck int, v int, PRIMARY KEY(pk, ck));
CREATE INDEX ON t(v);
INSERT INTO t (pk, ck, v) VALUES (0, 0, 0);
INSERT INTO t (pk, ck, v) VALUES (1, 0, 0);
SELECT token(pk), pk, ck, v FROM t WHERE v = 0 AND token(pk) = token(0) ALLOW FILTERING;
```
This functionality is supported on both old and new indexes. In old
indexes the type of the token column was `blob`. This causes problems,
because `blob` representation of tokens is ordered differently. Tokens
represented as blobs are ordered like this:
```
0, 1, 2, 3, 4, 5, ..., bigint_max, bigint_min, ...., -5, -4, -3, -2, -1
```
Because of that clustering range for `token()` restrictions needs to be
translated to two clustering ranges on the `blob` column.
To create old indexes disable the feature called:
`CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX` or run scylla version from branch
[`cvybhu/si-token2-old-index`](https://github.com/cvybhu/scylla/commits/si-token2-old-index)
I'm not sure if it's possible to create automatic tests with old
indexes. I ran `dev-test` manually on the `si-token2-old-index` branch,
and the only tests that failed were the ones testing row ordering. Rows
should be ordered by `token`, but because in old indexes the token is
represented as a `blob` this ordering breaks. This is a known issue
(#7443), that has been fixed by introducing new indexes.
To sum up:
* `token()` restrictions are fixed on both new and old indexes.
* When using old indexes, the rows are not properly ordered by token.
* With new indexes the rows are properly ordered by token.
Fixes#7043Closes#9067
* github.com:scylladb/scylla:
tests: add secondary index tests with TOKEN clause
secondary_index_test: extract test data
secondary_index: Fix TOKEN() restrictions in indexed SELECTs
expression: Add replace_token function
Add tests of SELECTs with TOKEN clauses on tables with secondary
indexes (both global and local).
test_select_with_token_range_cases checks all possible token range
combinations (inclusive/exclusive/infinity start/end) on tables without
index, with local or with global index.
test_select_with_token_range_filtering checks whether TOKEN restrictions
combined with column restrictions work properly. As different code paths
are taken if index is created on clustering key (first or non-first) or
non-primary-key column, the tests checks scenarios when index is created
on different columns.
Extract test data to a separate variables, allowing it to be easily
reused by other tests. The tokens are hard-coded, because calculating
their value brought too much complexity to this code.
When using an index, restrictions like token(p) <= x were ignored.
Because of this a query like this would select all rows where r = 0:
SELECT * FROM tab WHERE r = 0 and token(p) > 0;
Adds proper handling of token restrictions to queries that use indexes.
Old indexes represented token as a blob, which complicates
clustering bounds. Special code is included, which translates
token clustering bounds to blob clustering bounds.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Today, table relies on row_cache::invalidate() serialization for
concurrent sstable list updates to produce correct results.
That's very error prone because table is relying on an implementation
detail of invalidate() to get things right.
Instead, let's make table itself take care of serialization on
concurrent updates.
To achieve that, sstable_list_builder is introduced. Only one
builder can be alive for a given table, so serialization is guaranteed
as long as the builder is kept alive throughout the update procedure.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210721001716.210281-1-raphaelsc@scylladb.com>
That function is dangerously used by distributed loader, as the latter
was responsible for invalidating cache for new sstable.
load_sstable() is an unsafe alternative to
add_sstable_and_update_cache() that should never have been used by
the outside world. Instead, let's kill it and make loader use
the safe alternative instead.
This will also make it easier to make sure that all concurrent updates
to sstable set are properly serialized.
Additionally, this may potentially reduce the amount of data evicted
from the cache, when the sstables being imported have a narrow range,
like high level sstables imported from a LCS table. Unlikely but
possible.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210721131949.26899-1-raphaelsc@scylladb.com>
Adds replace_token function which takes an expression
and replaces all left hand side occurences of token()
with the given column definition.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When a leader moves to a follower state it aborts all requests that are
waiting on an admission semaphore with not_a_leader exception. But
currently it specifies itself as a new leader since abortion happens
before the fsm state changes to a follower. The patch fixes this by
destroying leader state after fsm state already changed to be a
follower.
Message-Id: <YPbI++0z5ZPV9pKb@scylladb.com>
* seastar ef320940...388ee307 (4):
> Merge 'Add a stall analyser tool' from Benny Halevy
> compat: implement coroutine_handle<void> for <experimental/coroutine> header
> Merge "Make app_template::run noexcept" from Pavel E
> perftune.py: make RPS CPU set to be a full CPU set
The stall analyser tool was requested by the SCT team to help make
sense of Scylla's stall reports and find more stall bugs!
Stop the batchlog manager using a deferred
action in main to make sure it is stopped after
its start() method has been called, also
if we bail out of main early due to exception.
Change the bm.stop() calls in storage_service
to just stop the replay loop using drain().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
drain() aborts the replay loop fiber
and returns its future.
It's grabbing _gate so stop() will wait on it.
The intention is to call stop_replay_loop from
storage_service::decommission and do_drain rather
than stop, so we can stop the batchlog manager once,
using a deferred action in main.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Harden start/stop by using an abort_source to abort from
the replay loop.
Extract the loop into batchlog_replay_loop() coroutine,
with the _stop abourt source as a stop condition,
plus use it for sleep_abortable to be able to promptly
stop while sleeping.
start() stores batchlog_replay_loop's future in a newly added
_started member, which is waited on in stop() to synchronize
with the start process at any stage.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So we can wait on do_batch_log_replay on stop().
Note that do_batch_log_replay is called both from
batchlog_replay_loop and from the storage_service.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In an upcoming commit I will add "system.describe_ring" table which uses
endpoint's inet address as a part of CK and, therefore, needs to keep them
sorted with `inet_addr_type::less`.
Introduced in d72b91053b.
If region was not compactible, for example because it has dense
segments, we would keep evicting even though the target for reclaimed
segments was met. In the worst case we may have to evict whole cache.
Refs #9038 (unlikely to be the cause though)
Message-Id: <20210720104039.463662-1-tgrabiec@scylladb.com>
Related issues: scylladb/sphinx-scylladb-theme#87
All the variables related to the multiversion extension are now defined in conf.py instead of using the GitHub Actions file.
How to test this PR
Run make multiversionpreview on docs folder. When you open https://0.0.0.0:5500, the browser should render the documentation site.
Closes#7957