--online-discard option defined as string parameter since it doesn't
specify "action=", but has default value in boolean (default=True).
It breaks "provisioning in a similar environment" since the code
supposed boolean value should be "action='store_true'" but it's not.
We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.
Fixes#11700Closes#11831
(cherry picked from commit acc408c976)
Regular INSERT statements with null values for primary key
components are rejected by Scylla since #9286 and #9314.
Batch statements missed a similar check, this patch
fixes it.
Fixes: #12060
(cherry picked from commit 7730c4718e)
Traditionally in Scylla and in Cassandra, an empty partition key is mapped
to minimum_token() instead of the empty key's usual hash function (0).
The reasons for this are unknown (to me), but one possibility is that
having one known key that maps to the minimal token is useful for
various iterations.
In murmur3_partitioner.cc we have two variants of the token calculation
function - the first is get_token(bytes_view) and the second is
get_token(schema, partition_key_view). The first includes that empty-
key special case, but the second was missing this special case!
As Kamil first noted in #9352, the second variant is used when looking
up partitions in the index file - so if a partition with an empty-string
key is saved under one token, it will be looked up under a different
token and not found. I reproduced exactly this problem when fixing
issues #9364 and #9375 (empty-string keys in materialized views and
indexes) - where a partition with an empty key was visible in a
full-table scan but couldn't be found by looking up its key because of
the wrong index lookup.
I also tried an alternative fix - changing both implementations to return
minimum_token (and not 0) for the empty key. But this is undesirable -
minimum_token is not supposed to be a valid token, so the tokenizer and
sharder may not return a valid replica or shard for it, so we shouldn't
store data under such token. We also have have code (such as an increasing-
key sanity check in the flat mutation reader) which assumes that
no real key in the data can be minimum_token, and our plan is to start
allowing data with an empty key (at least for materialized views).
This patch does not risk a backward-incompatible disk format changes
for two reasons:
1. In the current Scylla, there was no valid case where an empty partition
key may appear. CQL and Thrift forbid such keys, and materialized-views
and indexes also (incorrectly - see #9364, #9375) drop such rows.
2. Although Cassandra *does* allow empty partition keys, they is only
allowed in materialized views and indexes - and we don't support reading
materialized views generated by Cassandra (the user must re-generate
them in Scylla).
When #9364 and #9375 will be fixed by the next patch, empty partition keys
will start appearing in Scylla (in materialized views and in the
materialized view backing a secondary index), and this fix will become
important.
Fixes#9352
Refs #9364
Refs #9375
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit bc4d0fd5ad)
Fixes wrong condition for validating whether a JSON string representing
blob value is valid. Previously, strings such as "6" or "0392fa" would
pass the validation, even though they are too short or don't start with
"0x". Add those test cases to json_cql_query_test.cc.
Fixes#10114
(cherry picked from commit f8b67c9bd1)
When the mutation compactor has all the rows it needs for a page, it
saves the decision to stop in a member flag: _stop.
For single partition queries, the mutation compactor is kept alive
across pages and so it has a method, start_new_page() to reset its state
for the next page. This method didn't clear the _stop flag. This meant
that the value set at the end of the previous could cause the new page
and subsequently the entire query to be stopped prematurely.
This can happen if the new page starts with a row that is covered by a
higher level tombstone and is completely empty after compaction.
Reset the _stop flag in start_new_page() to prevent this.
This commit also adds a unit test which reproduces the bug.
Fixes: #12361Closes#12384
(cherry picked from commit b0d95948e1)
Due to an oversight, the local index cache isn't evicted gently
when _upper_bound existed. This is a source of reactor stalls.
Fix that.
Fixes#12271Closes#12364
(cherry picked from commit d9269abf5b)
This includes merges 396d9e6a46 and 2c021affd1
Things that got changed here:
1. All the node_ops_... stuff in storage_service was coroutinized after 5.0, so in this merge the changes were de-coroutinized back
2. Had to cherry-pick molding for UUID (69fcc053bb and 489e50ef3a)
3. tracker::is_aborted() was added after 5.0, it caused minor context conflict
4. watchdog interval was changed, also caused minor context conflict
refs: #10284Closes#12335
* github.com:scylladb/scylladb:
repair: use sharded abort_source to abort repair_info
repair: node_ops_info: add start and stop methods
storage_service: node_ops_abort_thread: abort all node ops on shutdown
storage_service: node_ops_abort_thread: co_return only after printing log message
storage_service: node_ops_meta_data: add start and stop methods
repair: node_ops_info: prevent accidental copy
repair: Remove ops_uuid
repair: Remove abort_repair_node_ops() altogether
repair: Subscribe on node_ops_info::as abortion
repair: Keep abort source on node_ops_info
repair: Pass node_ops_info arg to do_sync_data_using_repair()
repair: Mark repair_info::abort() noexcept
node_ops: Remove _aborted bit
node_ops: Simplify construction of node_ops_metadata
main: Fix message about repair service starting
utils: uuid: make operator bool explicit
utils: uuid: add null_uuid
Currently we use a single shared_ptr<abort_source>
that can't be copied across shards.
Instead, use a sharded<abort_source> in node_ops_info so that each
repair_info instance will use an (optional) abort_source*
on its own shard.
Added respective start and stop methodsm plus a local_abort_source
getter to get the shard-local abort_source (if available).
Fixes#11826
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prepare for adding a sharded<abort_source> member.
Wire start/stop in storage_service::node_ops_meta_data.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A later patch adds a sharded<abort_source> to node_ops_info.
On shutdown, we must orderly stop it, so use node_ops_abort_thread
shutdown path (where node_ops_singal_abort is called will a nullopt)
to abort (and stop) all outstanding node_ops by passing
a null_uuid to node_ops_abort, and let it iterate over all
node ops to abort and stop them.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently the function co_returns if (!uuid_opt)
so the log info message indicating it's stopped
is not printed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Delete node_ops_info copy and move constructors before
we add a sharded<abort_source> member for the per-shard repairs
in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It used to be used to abort repair_info by the corresponding node-ops
uuid, but this code is no longer there, so it's good to drop the uuid as
well
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When node_ops_meta_data aborts it also kicks repair to find and abort
all relevant repair_infos. Now it can be simplified by subscribing
repair_meta on the abort source and aborting it without explicit kick
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will need to subscribe on node_ops_meta_data's abort source
inside repair code, so keep the pointer on node_ops_info too. At the
same time, the node_ops_info::abort becomes obsolete, because the same
check can be performed via the abort_source->abort_requested()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will need to know more than the ops_uuid. The needed info
is (well -- will be) sitting on node_ops_info, so pass it along
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will call it inside abort_source subscription callback which
requires the calling code to be noexcept
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A short cleanup "while at it" -- the node_ops_meta_data doesn't need to
carry dedicated _aborted boolean -- the abort source that sets it is
available instantly
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The problematic scenario this patch fixes might happen due to
unfortunate serialization of locks/unlocks between lock_pk and lock_ck,
as follows:
1. lock_pk acquires an exclusive lock on the partition.
2.a lock_ck attempts to acquire shared lock on the partition
and any lock on the row. both cases currently use a fiber
returning a future<rwlock::holder>.
2.b since the partition is locked, the lock_partition times out
returning an exceptional future. lock_row has no such problem
and succeeds, returning a future holding a rwlock::holder,
pointing to the row lock.
3.a the lock_holder previously returned by lock_pk is destroyed,
calling `row_locker::unlock`
3.b row_locker::unlock sees that the partition is not locked
and erases it, including the row locks it contains.
4.a when_all_succeeds continuation in lock_ck runs. Since
the lock_partition future failed, it destroyes both futures.
4.b the lock_row future is destroyed with the rwlock::holder value.
4.c ~holder attempts to return the semaphore units to the row rwlock,
but the latter was already destroyed in 3.b above.
Acquiring the partition lock and row lock in parallel
doesn't help anything, but it complicates error handling
as seen above,
This patch serializes acquiring the row lock in lock_ck
after locking the partition to prevent the above race.
This way, erasing the unlocked partition is never expected
to happen while any of its rows locks is held.
Fixes#12168
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#12208
(cherry picked from commit 5007ded2c1)
Contains fixes requested in the issue (and some tiny extras), together with analysis why they don't affect the users (see commit messages).
Fixes [ #11800](https://github.com/scylladb/scylladb/issues/11800)
Closes#11926
* github.com:scylladb/scylladb:
alternator: add maybe_quote to secondary indexes 'where' condition
test/alternator: correct xfail reason for test_gsi_backfill_empty_string
test/alternator: correct indentation in test_lsi_describe
alternator: fix wrong 'where' condition for GSI range key
(cherry picked from commit ce7c1a6c52)
Currently, the `_reader` member is explicitly
initialized with the result of the call to `make_reader`.
And `make_reader`, as a side effect, assigns a value
to the `_reader_handle` member.
Since C++ initializes class members sequentially,
in the order they are defined, the assignment to `_reader_handle`
in `make_reader()` happens before `_reader_handle` is initialized.
This patch fixes that by changing the definition order,
and consequently, the member initialization order
in the constructor so that `_reader_handle` will be (default-)initialized
before the call to `make_reader()`, avoiding the undefined behavior.
Fixes#10882
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#10883
(cherry picked from commit 9c231ad0ce)
When we write to a materialized view, we need to know some information
defined in the base table such as the columns in its schema. We have
a "view_info" object that tracks each view and its base.
This view_info object has a couple of mutable attributes which are
used to lazily-calculate and cache the SELECT statement needed to
read from the base table. If the base-table schema ever changes -
and the code calls set_base_info() at that point - we need to forget
this cached statement. If we don't (as before this patch), the SELECT
will use the wrong schema and writes will no longer work.
This patch also includes a reproducing test that failed before this
patch, and passes afterwords. The test creates a base table with a
view that has a non-trivial SELECT (it has a filter on one of the
base-regular columns), makes a benign modification to the base table
(just a silly addition of a comment), and then tries to write to the
view - and before this patch it fails.
Fixes#10026Fixes#11542
(cherry picked from commit 2f2f01b045)
The view builder builds the views from a given base table in
view_builder::batch_size batches of rows. After processing this many
rows, it suspends so the view builder can switch to building views for
other base tables in the name of fairness. When resuming the build step
for a given base table, it reuses the reader used previously (also
serving the role of a snapshot, pinning sstables read from). The
compactor however is created anew. As the reader can be in the middle of
a partition, the view builder injects a partition start into the
compactor to prime it for continuing the partition. This however only
included the partition-key, crucially missing any active tombstones:
partition tombstone or -- since the v2 transition -- active range
tombstone. This can result in base rows covered by either of this to be
resurrected and the view builder to generate view updates for them.
This patch solves this by using the detach-state mechanism of the
compactor which was explicitly developed for situations like this (in
the range scan code) -- resuming a read with the readers kept but the
compactor recreated.
Also included are two test cases reproducing the problem, one with a
range tombstone, the other with a partition tombstone.
Fixes: #11668Closes#11671
(cherry picked from commit 5621cdd7f9)
PR #9314 fixed a similar issue with regular insert statements
but missed the LWT code path.
It's expected behaviour of
modification_statement::create_clustering_ranges to return an
empty range in this case, since possible_lhs_values it
uses explicitly returns empty_value_set if it evaluates rhs
to null, and it has a comment about it (All NULL
comparisons fail; no column values match.) On the other hand,
all components of the primary key are required to be set,
this is checked at the prepare phase, in
modification_statement::process_where_clause. So the only
problem was modification_statement::execute_with_condition
was not expecting an empty clustering_range in case of
a null clustering key.
Fixes: #11954
(cherry picked from commit 0d443dfd16)
When filtering with multi column restriction present all other restrictions were ignored.
So a query like:
`SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;`
would ignore the restriction `regular_col = 0`.
This was caused by a bug in the filtering code:
2779a171fc/cql3/selection/selection.cc (L433-L449)
When multi column restrictions were detected, the code checked if they are satisfied and returned immediately.
This is fixed by returning only when these restrictions are not satisfied. When they are satisfied the other restrictions are checked as well to ensure all of them are satisfied.
This code was introduced back in 2019, when fixing #3574.
Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct.
Fixes: #6200Fixes: #12014Closes#12031
* github.com:scylladb/scylladb:
cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works
boost/restrictions-test: uncomment part of the test that passes now
cql-pytest: enable test for filtering combined multi column and regular column restrictions
cql3: don't ignore other restrictions when a multi column restriction is present during filtering
(cherry picked from commit 2d2034ea28)
Closes#12086
When stopping the read, the multishard reader will dismantle the
compaction state, pushing back (unpopping) the currently processed
partition's header to its originating reader. This ensures that if the
reader stops in the middle of a partition, on the next page the
partition-header is re-emitted as the compactor (and everything
downstream from it) expects.
It can happen however that there is nothing more for the current
partition in the reader and the next fragment is another partition.
Since we only push back the partition header (without a partition-end)
this can result in two partitions being emitted without being separated
by a partition end.
We could just add the missing partition-end when needed but it is
pointless, if the partition has no more data, just drop the header, we
won't need it on the next page.
The missing partition-end can generate an "IDL frame truncated" message
as it ends up causing the query result writer to create a corrupt
partition entry.
Fixes: https://github.com/scylladb/scylladb/issues/9482Closes#11912
* github.com:scylladb/scylladb:
test/cql-pytest: add regression test for "IDL frame truncated" error
mutation_compactor: detach_state(): make it no-op if partition was exhausted
Wrong access to an uninitialized token instead of the actual
generated string caused the parser to crash, this wasn't
detected by the ANTLR3 compiler because all the temporary
variables defined in the ANTLR3 statements are global in the
generated code. This essentialy caused a null dereference.
Tests: 1. The fixed issue scenario from github.
2. Unit tests in release mode.
Fixes#11774
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190612133151.20609-1-eliransin@scylladb.com>
Closes#11777
(cherry picked from commit ab7429b77d)
cql3::util::maybe_quote() is a utility function formatting an identifier
name (table name, column name, etc.) that needs to be embedded in a CQL
statement - and might require quoting if it contains non-alphanumeric
characters, uppercase characters, or a CQL keyword.
maybe_quote() made an effort to only quote the identifier name if neccessary,
e.g., a lowercase name usually does not need quoting. But lowercase names
that are CQL keywords - e.g., to or where - cannot be used as identifiers
without quoting. This can cause problems for code that wants to generate
CQL statements, such as the materialized-view problem in issue #9450 - where
a user had a column called "to" and wanted to create a materialized view
for it.
So in this patch we fix maybe_quote() to recognize invalid identifiers by
using the CQL parser, and quote them. This will quote reserved keywords,
but not so-called unreserved keywords, which *are* allowed as identifiers
and don't need quoting. This addition slows down maybe_quote(), but
maybe_quote() is anyway only used in heavy operations which need to
generate CQL.
This patch also adds two tests that reproduce the bug and verify its
fix:
1. Add to the low-level maybe_quote() test (a C++ unit test) also tests
that maybe_quote() quotes reserved keywords like "to", but doesn't
quote unreserved keywords like "int".
2. Add a test reproducing issue #9450 - creating a materialized view
whose key column is a keyword. This new test passes on Cassandra,
failed on Scylla before this patch, and passes after this patch.
It is worth noting that maybe_quote() now has a "forward compatiblity"
problem: If we save CQL statements generated by maybe_quote(), and a
future version introduces a new reserved keyword, the parser of the
future version may not be able to parse the saved CQL statement that
was generated with the old mayb_quote() and didn't quote what is now
a keyword. This problem can be solved in two ways:
1. Try hard not to introduced new reserved keywords. Instead, introduce
unreserved keywords. We've been doing this even before recognizing
this maybe_quote() future-compatibility problem.
2. In the next patch we will introduce quote() - which unconditionally
quotes identifier names, even if lowercase. These quoted names will
be uglier for lowercase names - but will be safe from future
introduction of new keywords. So we can consider switching some or
all uses of maybe_quote() to quote().
Fixes#9450
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220118161217.231811-1-nyh@scylladb.com>
(cherry picked from commit 5d2f694a90)
The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.
Fixes#11470Closes#11693
(cherry picked from commit 636e14cc77)
The problem was incompatibility with cassandra, which accepts bool
as a string in `fromJson()` UDF. The difference between Cassandra and
Scylla now is Scylla accepts whitespaces around word in string,
Cassandra don't. Both are case insensitive.
Fixes: #7915
(cherry picked from commit 1902dbc9ff)
EC2 instance metadata service can be busy, ret's retry to connect with
interval, just like we do in scylla-machine-image.
Fixes#10250
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes#11688
(cherry picked from commit 6b246dc119)
(cherry picked from commit e2809674d2)
detach_state() allows the user to resume a compaction process later,
without having to keep the compactor object alive. This happens by
generating and returning the mutation fragments the user has to re-feed
to a newly constructed compactor to bring it into the exact same state
the current compactor was at the point of stopping the compaction.
This state includes the partition-header (partition-start and static-row
if any) and the currently active range tombstone.
Detaching the state is pointless however when the compaction was stopped
such that the currently compacted partition was completely exhausted.
Allowing the state to be detached in this case seems benign but it
caused a subtle bug in the main user of this feature: the partition
range scan algorithm, where the fragments included in the detached state
were pushed back into the reader which produced them. If the partition
happened to be exhausted -- meaning the next fragment in the reader was
a partition-start or EOS -- this resulted in the partition being
re-emitted later without a partition-end, resulting in corrupt
query-result being generated, in turn resulting in an obscure "IDL frame
truncated" error.
This patch solves this seemingly benign but sinister bug by making the
return value of `detach_state()` an std::optional and returning a
disengaged optional when the partition was exhausted.
(cherry picked from commit 70b4158ce0)
As described in issue #11801, we saw in Alternator when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted.
In this series fix this bug (it was a bug in our materialized views implementation) and add a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it).
Fixes#11801Closes#11808
* github.com:scylladb/scylladb:
test/alternator: add test for issue 11801
MV: fix handling of view update which reassign the same key value
materialized views: inline used-once and confusing function, replace_entry()
(cherry picked from commit e981bd4f21)
`raft_group0` does not own the source and is not responsible for calling
`request_abort`. The source comes from top-level `stop_signal` (see
main.cc) and that's where it's aborted.
Fixes#10668.
Closes#10678
(cherry picked from commit ef7643d504)
When being stopped compaction manager may step on ENOSPC. This is not a
reason to fail stopping process with abort, better to warn this fact in
logs and proceed as if nothing happened
refs: #11245
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Make it the future-returning method and setup the _stop_future in its
only caller. Makes next patch much simpler
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>