Compare commits

...

72 Commits

Author SHA1 Message Date
Yaron Kaikov
6c0825e2a6 release: prepare for 4.6.11 2022-11-28 15:45:26 +02:00
Nadav Har'El
db3dd3bdf6 Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek
When filtering with a multi column restriction present, all other restrictions were ignored.
So a query like:
`SELECT * FROM tbl WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;`
would ignore the restriction `regular_col = 0`.

This was caused by a bug in the filtering code:
2779a171fc/cql3/selection/selection.cc (L433-L449)

When multi column restrictions were detected, the code checked whether they were satisfied and returned immediately.
This is fixed by returning early only when these restrictions are not satisfied. When they are satisfied, the other restrictions are checked as well, to ensure all of them hold.

This code was introduced back in 2019, when fixing #3574.
Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct.
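The corrected control flow can be illustrated with a small Python sketch (the names are illustrative only; Scylla's actual filtering code is the C++ in cql3/selection/selection.cc linked above):

```python
def row_matches(row, multi_column_restriction, other_restrictions):
    """Return True only when the row satisfies ALL restrictions.

    The buggy version returned as soon as the multi column restriction
    was satisfied, silently skipping other_restrictions.
    """
    if multi_column_restriction is not None and not multi_column_restriction(row):
        return False  # the fix: return early only on *failure*
    # Fall through: the remaining restrictions must also hold.
    return all(r(row) for r in other_restrictions)

# Mimics: ... AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING
multi = lambda row: (row["ck1"], row["ck2"]) < (0, 0)
others = [lambda row: row["regular_col"] == 0]
```

With the fix, a row passing the multi column restriction but failing `regular_col = 0` is correctly rejected.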

Fixes: #6200
Fixes: #12014

Closes #12031

* github.com:scylladb/scylladb:
  cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works
  boost/restrictions-test: uncomment part of the test that passes now
  cql-pytest: enable test for filtering combined multi column and regular column restrictions
  cql3: don't ignore other restrictions when a multi column restriction is present during filtering

(cherry picked from commit 2d2034ea28)

Closes #12086
2022-11-27 00:15:04 +02:00
Pavel Emelyanov
4ad24180f5 Merge '[branch-4.6] multishard_mutation_query: don't unpop partition header of spent partition ' from Botond Dénes
When stopping the read, the multishard reader will dismantle the
compaction state, pushing back (unpopping) the currently processed
partition's header to its originating reader. This ensures that if the
reader stops in the middle of a partition, on the next page the
partition-header is re-emitted as the compactor (and everything
downstream from it) expects.
It can happen, however, that there is nothing more for the current
partition in the reader and the next fragment belongs to another
partition. Since we only push back the partition header (without a
partition-end), this can result in two partitions being emitted without
a partition-end separating them.
We could add the missing partition-end when needed, but that is
pointless: if the partition has no more data, we can simply drop the
header, since we won't need it on the next page.

The missing partition-end can generate an "IDL frame truncated" message
as it ends up causing the query result writer to create a corrupt
partition entry.

Fixes: https://github.com/scylladb/scylladb/issues/9482

Closes #11914

* github.com:scylladb/scylladb:
  test/cql-pytest: add regression test for "IDL frame truncated" error
  mutation_compactor: detach_state(): make it no-op if partition was exhausted
  treewide: fix headers
2022-11-16 11:52:51 +03:00
Anna Mikhlin
755c7eeb6a release: prepare for 4.6.10 2022-11-14 10:30:20 +02:00
Eliran Sinvani
8914ca8c58 cql: Fix crash upon use of the word empty for service level name
Wrong access to an uninitialized token instead of the actual
generated string caused the parser to crash. This wasn't
detected by the ANTLR3 compiler because all the temporary
variables defined in the ANTLR3 statements are global in the
generated code. This essentially caused a null dereference.

Tests: 1. The fixed issue scenario from github.
       2. Unit tests in release mode.

Fixes #11774

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190612133151.20609-1-eliransin@scylladb.com>

Closes #11777

(cherry picked from commit ab7429b77d)
2022-11-10 20:43:44 +02:00
Botond Dénes
e82e4bbed3 test/cql-pytest: add regression test for "IDL frame truncated" error
(cherry picked from commit 11af489e84)
2022-11-07 16:51:14 +02:00
Botond Dénes
f9c457778e mutation_compactor: detach_state(): make it no-op if partition was exhausted
detach_state() allows the user to resume a compaction process later,
without having to keep the compactor object alive. This happens by
generating and returning the mutation fragments the user has to re-feed
to a newly constructed compactor to bring it into the exact same state
the current compactor was at the point of stopping the compaction.
This state includes the partition-header (partition-start and static-row
if any) and the currently active range tombstone.
Detaching the state is pointless however when the compaction was stopped
such that the currently compacted partition was completely exhausted.
Allowing the state to be detached in this case seems benign but it
caused a subtle bug in the main user of this feature: the partition
range scan algorithm, where the fragments included in the detached state
were pushed back into the reader which produced them. If the partition
happened to be exhausted -- meaning the next fragment in the reader was
a partition-start or EOS -- this resulted in the partition being
re-emitted later without a partition-end, resulting in corrupt
query-result being generated, in turn resulting in an obscure "IDL frame
truncated" error.

This patch solves this seemingly benign but sinister bug by making the
return value of `detach_state()` an std::optional and returning a
disengaged optional when the partition was exhausted.
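The shape of the fix can be sketched in Python (hypothetical names and simplified fragment types; the real `detach_state()` is C++ returning an std::optional):

```python
from typing import List, Optional

def detach_state(partition_header: List[str],
                 active_range_tombstone: Optional[str],
                 partition_exhausted: bool) -> Optional[List[str]]:
    """Return the fragments a newly constructed compactor must be fed to
    resume from the same state, or None (a "disengaged optional") when
    the current partition was fully exhausted and nothing may be re-fed."""
    if partition_exhausted:
        return None  # the fix: detach no state for a spent partition
    state = list(partition_header)  # partition-start (+ static row, if any)
    if active_range_tombstone is not None:
        state.append(active_range_tombstone)
    return state
```

The caller (the partition range scan) then pushes nothing back into the reader when `None` is returned, so no header is re-emitted without its partition-end.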

(cherry picked from commit 70b4158ce0)
2022-11-07 16:51:14 +02:00
Botond Dénes
8315a7b164 treewide: fix headers
To fix CI.
2022-11-07 16:51:14 +02:00
Nadav Har'El
291ca8db60 cql3: fix cql3::util::maybe_quote() for keywords
cql3::util::maybe_quote() is a utility function formatting an identifier
name (table name, column name, etc.) that needs to be embedded in a CQL
statement - and might require quoting if it contains non-alphanumeric
characters, uppercase characters, or a CQL keyword.

maybe_quote() made an effort to only quote the identifier name if necessary,
e.g., a lowercase name usually does not need quoting. But lowercase names
that are CQL keywords - e.g., "to" or "where" - cannot be used as identifiers
without quoting. This can cause problems for code that wants to generate
CQL statements, such as the materialized-view problem in issue #9450 - where
a user had a column called "to" and wanted to create a materialized view
for it.

So in this patch we fix maybe_quote() to recognize invalid identifiers by
using the CQL parser, and quote them. This will quote reserved keywords,
but not so-called unreserved keywords, which *are* allowed as identifiers
and don't need quoting. This addition slows down maybe_quote(), but
maybe_quote() is anyway only used in heavy operations which need to
generate CQL.
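The quoting rule can be sketched in Python (the keyword sets below are tiny illustrative subsets, not Scylla's real keyword lists, and the real check uses the CQL parser rather than a regex):

```python
import re

# Illustrative subset of *reserved* keywords, which are not valid identifiers.
RESERVED_KEYWORDS = {"to", "where", "select", "from"}
# Unreserved keywords ("int", "ttl", ...) ARE allowed as identifiers,
# so they are deliberately absent from the set above.

def maybe_quote(identifier: str) -> str:
    """Quote an identifier only when CQL requires it: non-lowercase or
    non-alphanumeric characters, or a reserved keyword."""
    if (re.fullmatch(r"[a-z][a-z0-9_]*", identifier)
            and identifier not in RESERVED_KEYWORDS):
        return identifier
    return '"' + identifier.replace('"', '""') + '"'
```

So `maybe_quote("to")` yields a quoted name while `maybe_quote("int")` and plain lowercase names pass through unchanged.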

This patch also adds two tests that reproduce the bug and verify its
fix:

1. Add to the low-level maybe_quote() test (a C++ unit test) also tests
   that maybe_quote() quotes reserved keywords like "to", but doesn't
   quote unreserved keywords like "int".

2. Add a test reproducing issue #9450 - creating a materialized view
   whose key column is a keyword. This new test passes on Cassandra,
   failed on Scylla before this patch, and passes after this patch.

It is worth noting that maybe_quote() now has a "forward compatibility"
problem: If we save CQL statements generated by maybe_quote(), and a
future version introduces a new reserved keyword, the parser of the
future version may not be able to parse the saved CQL statement that
was generated with the old maybe_quote() and didn't quote what is now
a keyword. This problem can be solved in two ways:

1. Try hard not to introduce new reserved keywords. Instead, introduce
   unreserved keywords. We've been doing this even before recognizing
   this maybe_quote() future-compatibility problem.

2. In the next patch we will introduce quote() - which unconditionally
   quotes identifier names, even if lowercase. These quoted names will
   be uglier for lowercase names - but will be safe from future
   introduction of new keywords. So we can consider switching some or
   all uses of maybe_quote() to quote().

Fixes #9450

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220118161217.231811-1-nyh@scylladb.com>
(cherry picked from commit 5d2f694a90)
2022-11-07 10:38:10 +02:00
Jadw1
4da5fbaa24 CQL3: fromJson accepts string as bool
The problem was an incompatibility with Cassandra, which accepts a bool
as a string in the `fromJson()` function. The remaining difference between
Cassandra and Scylla is that Scylla accepts whitespace around the word
in the string, while Cassandra does not. Both are case insensitive.
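The accepted behavior can be sketched in Python (a simplified model of the parsing logic, not the actual C++ code):

```python
def json_string_to_bool(s: str) -> bool:
    """Parse "true"/"false" case-insensitively, tolerating surrounding
    whitespace (which Scylla accepts and Cassandra rejects)."""
    word = s.strip().lower()
    if word == "true":
        return True
    if word == "false":
        return False
    raise ValueError(f"cannot parse {s!r} as a boolean")
```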

Fixes: #7915
(cherry picked from commit 1902dbc9ff)
2022-11-07 10:38:10 +02:00
Takuya ASADA
fc16664d81 locator::ec2_snitch: Retry HTTP request to EC2 instance metadata service
The EC2 instance metadata service can be busy; let's retry the
connection with an interval, just like we do in scylla-machine-image.
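A retry loop of this shape can be sketched in Python (attempt count and interval are illustrative; the real snitch code is C++):

```python
import time

def fetch_with_retry(fetch, attempts=5, interval=1.0):
    """Call fetch() up to `attempts` times, sleeping `interval` seconds
    between tries, since the metadata service may be transiently busy."""
    last_error = None
    for i in range(attempts):
        try:
            return fetch()
        except ConnectionError as e:
            last_error = e
            if i + 1 < attempts:
                time.sleep(interval)
    raise last_error
```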

Fixes #10250

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #11688

(cherry picked from commit 6b246dc119)
(cherry picked from commit e2809674d2)
2022-11-06 15:43:58 +02:00
Botond Dénes
80bea5341e Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El
As described in issue #11801, we saw cases in Alternator where, when a GSI has both partition and sort keys which were non-key attributes in the base table, updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted.

In this series we fix this bug (it was a bug in our materialized views implementation) and add a reproducing test (plus a few more tests for similar situations, which worked before the patch and continue to work after it).

Fixes #11801

Closes #11808

* github.com:scylladb/scylladb:
  test/alternator: add test for issue 11801
  MV: fix handling of view update which reassign the same key value
  materialized views: inline used-once and confusing function, replace_entry()

(cherry picked from commit e981bd4f21)
2022-11-01 13:31:51 +02:00
Botond Dénes
6ecc772b56 mutation_partition: deletable_row::apply(shadowable_tombstone): remove redundant maybe_shadow()
Shadowing is already checked by the underlying row_tombstone::apply().
This redundant check was introduced by a previous fix to #9483
(6a76e12768). The rest of that patch is
good.

Refs: #9483
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115091513.181233-1-bdenes@scylladb.com>
(cherry picked from commit b136746040)
2022-10-16 11:53:04 +03:00
Benny Halevy
0b2e951954 range_tombstone_list: insert_from: correct rev.update range_tombstone in not overlapping case
The 2nd std::move(start) looks like a typo
in fe2fa3f20d.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220404124741.1775076-1-bhalevy@scylladb.com>
(cherry picked from commit 2d80057617)
2022-10-14 12:29:56 +02:00
Pavel Emelyanov
f2a738497f compaction_manager: Swallow ENOSPCs in ::stop()
When being stopped, the compaction manager may run into ENOSPC. This is
not a reason to fail the stopping process with an abort; it is better to
warn about it in the logs and proceed as if nothing happened.

refs: #11245

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 16:02:33 +03:00
Pavel Emelyanov
badf7c816f exceptions: Mark storage_io_error::code() with noexcept
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 16:02:32 +03:00
Pavel Emelyanov
bfb86f2c78 compaction_manager: Shuffle really_do_stop()
Make it a future-returning method and set up the _stop_future in its
only caller. This makes the next patch much simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 16:02:31 +03:00
Beni Peled
18e7a46038 release: prepare for 4.6.9 2022-10-09 08:54:33 +03:00
Nadav Har'El
cbcfa31e51 cql: validate bloom_filter_fp_chance up-front
Scylla's Bloom filter implementation has a minimal false-positive rate
that it can support (6.71e-5). When setting bloom_filter_fp_chance any
lower than that, the compute_bloom_spec() function, which writes the bloom
filter, throws an exception. However, this is too late - it only happens
while flushing the memtable to disk, and a failure at that point causes
Scylla to crash.

Instead, we should refuse the table creation with the unsupported
bloom_filter_fp_chance. This is also what Cassandra did six years ago -
see CASSANDRA-11920.
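The up-front check can be sketched in Python (the exact bounds and error text are an assumption for illustration; only the 6.71e-5 minimum comes from the commit message):

```python
# Minimum false-positive rate Scylla's bloom filter implementation supports.
MIN_SUPPORTED_FP_CHANCE = 6.71e-5

def validate_bloom_filter_fp_chance(fp_chance: float) -> None:
    """Reject unsupported values at CREATE/ALTER TABLE time instead of
    crashing later, during the memtable flush."""
    if not (MIN_SUPPORTED_FP_CHANCE <= fp_chance <= 1.0):
        raise ValueError(
            f"bloom_filter_fp_chance {fp_chance} is out of the supported "
            f"range [{MIN_SUPPORTED_FP_CHANCE}, 1.0]")
```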

This patch also includes a regression test, which crashes Scylla before
this patch but passes after the patch (and also passes on Cassandra).

Fixes #11524.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11576

(cherry picked from commit 4c93a694b7)
2022-10-04 16:23:25 +03:00
Nadav Har'El
5ee69ff3a9 alternator: return ProvisionedThroughput in DescribeTable
DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing
mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable
operation must return a ProvisionedThroughput structure, listing both
ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not
stated in some DynamoDB documentation but is explicitly mentioned in
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html
Empirically, DynamoDB also returns ProvisionedThroughput with zeros
even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this.

The ProvisionedThroughput structure being missing was a problem for
applications like DynamoDB connectors for Spark, if they implicitly
assume that ProvisionedThroughput is returned by DescribeTable, and
fail (as described in issue #11222) if it's outright missing.

So this patch adds the missing ProvisionedThroughput structure, and
the xfailing test starts to pass.

Note that this patch doesn't change the fact that attempting to set
a table to PROVISIONED billing mode is ignored: DescribeTable continues
to always return PAY_PER_REQUEST as the billing mode and zero as the
provisioned capacities.
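The relevant piece of the DescribeTable response, after the fix, looks roughly like this (a minimal sketch of the JSON shape; the full response contains many more fields):

```python
def describe_table_extras() -> dict:
    """DescribeTable always reports PAY_PER_REQUEST, and must still
    include ProvisionedThroughput with both capacities at 0."""
    return {
        "BillingModeSummary": {"BillingMode": "PAY_PER_REQUEST"},
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 0,
            "WriteCapacityUnits": 0,
        },
    }
```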

Fixes #11222

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11298

(cherry picked from commit 941c719a23)
2022-10-03 14:29:22 +03:00
Tomasz Grabiec
949103d22a test: lib: random_mutation_generator: Don't generate mutations with marker uncompacted with shadowable tombstone
The generator first set the marker and then applied tombstones.

The marker was set like this:

  row.marker() = random_row_marker();

Later, when shadowable tombstones were applied, they were compacted
with the marker as expected.

However, the key for the row was chosen randomly in each iteration
from a set of multiple keys, so there was a possibility of a key clash
with an earlier row. This could overwrite the marker without applying
any tombstones (tombstone application is conditional on a random
choice).

This could generate rows with markers uncompacted with shadowable tombstones.

This broke row_cache_test::test_concurrent_reads_and_eviction on
comparison between the expected and the read mutations. The latter were
compacted because they went through an extra merge path, which compacts
the row.

Fix by making sure there are no key clashes.

Closes #11663

(cherry picked from commit 5268f0f837)
2022-10-03 09:00:28 +03:00
Botond Dénes
549cb60f4c sstables: crawling mx-reader: make on_out_of_clustering_range() no-op
Said method currently emits a partition-end. This method is only called
when the last fragment in the stream is a range tombstone change with a
position after all clustered rows. The problem is that
consume_partition_end() is also called unconditionally, resulting in two
partition-end fragments being emitted. The fix is simple: make this
method a no-op, there is nothing to do there.

Also add two tests: one targeted to this bug and another one testing the
crawling reader with random mutations generated for random schema.

Fixes: #11421

Closes #11422

(cherry picked from commit be9d1c4df4)
2022-09-30 17:56:58 +03:00
Botond Dénes
37633c5576 test/lib/random_schema: add a simpler overload for fixed partition count
Some tests want to generate a fixed number of random partitions; make
their life easier.

(cherry picked from commit 98f3d516a2)

Ref #11421 (prerequisite)
2022-09-30 17:56:10 +03:00
Michael Livshin
abd9f43fa7 batchlog_manager: warn when a batch fails to replay
Warn only for reasons other than "no such KS", i.e. when the failure is
presumed transient and the batch in question is not deleted from the
batchlog and will be retried in the future.

(Would info be more appropriate here than warning?)

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #10556

Fixes #10636

(cherry picked from commit 00ed4ac74c)
2022-09-29 12:13:21 +03:00
Raphael S. Carvalho
d41d4db5c0 compaction: Make cleanup withstand better disk pressure scenario
It's not uncommon for cleanup to be issued against an entire keyspace,
which may be composed of tons of tables. To increase the chances of
success when low on space, cleanup will now start from the smaller
tables first, so that the bigger tables will have more space available,
once they're reached, to satisfy their space requirement.

parallel_for_each() is dropped; it wasn't needed, given that the manager
performs per-shard serialization of cleanup jobs.
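The ordering idea can be sketched in Python (hypothetical table records; the real code sorts sstable-backed tables in C++):

```python
def cleanup_order(tables):
    """Order cleanup candidates smallest-first, so that by the time the
    big tables are reached, more disk space has been reclaimed for them."""
    return sorted(tables, key=lambda t: t["size"])
```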

Refs #9504.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130133712.64517-1-raphaelsc@scylladb.com>
(cherry picked from commit 0d5ac845e1)
2022-09-29 10:15:29 +03:00
Michał Radwański
c500043a78 flat_mutation_reader: allow destructing readers which are not closed and didn't initiate any IO.
In functions such as upgrade_to_v2 (excerpt below), if the constructor
of transforming_reader throws, r needs to be destroyed even though it
hasn't been closed. However, if a reader didn't initiate any operations,
it is safe to destruct it. This issue can potentially manifest itself in
many more readers and might be hard to track down. This commit adds a
bool indicating whether a close is anticipated, thus avoiding errors in
the destructor.

Code excerpt:
flat_mutation_reader_v2 upgrade_to_v2(flat_mutation_reader r) {
    class transforming_reader : public flat_mutation_reader_v2::impl {
        // ...
    };
    return make_flat_mutation_reader_v2<transforming_reader>(std::move(r));
}
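The destructor-safety rule can be sketched in Python (illustrative class and method names; the real readers are C++ and the check lives in their destructor):

```python
class SketchReader:
    """A reader may be destroyed without close() only if it never
    initiated any operation (e.g. its owner's constructor threw)."""
    def __init__(self):
        self._io_started = False
        self._closed = False

    def fill_buffer(self):
        self._io_started = True  # any operation makes close() mandatory

    def close(self):
        self._closed = True

    def safe_to_destroy(self) -> bool:
        return self._closed or not self._io_started
```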

Fixes #9065.

(cherry picked from commit 9ada63a9cb)
2022-09-29 09:40:07 +03:00
Pavel Emelyanov
af4752a526 messaging_service: Fix gossiper verb group
When configuring tcp-nodelay unconditionally, the messaging service
assumes the gossiper uses group index 1, though that changed some time
ago and those verbs now belong to group 0.

fixes: #11465

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 2c74062962)
2022-09-19 10:32:49 +03:00
Anna Mikhlin
0aa9a8c266 release: prepare for 4.6.8 2022-09-19 09:30:09 +03:00
Michał Chojnowski
85fd6ab377 sstables: add a flag for disabling long-term index caching
Long-term index caching in the global cache, as introduced in 4.6, is a major
pessimization for workloads where accesses to the index are (spatially) sparse.
We want a way to disable it for the affected workloads.

There is already infrastructure in place for disabling it for BYPASS CACHE
queries. One way of solving the issue is hijacking that infrastructure.

This patch adds a global flag (and a corresponding CLI option) which controls
index caching. Setting the flag to `false` causes all index reads to behave
like they would in BYPASS CACHE queries.

Consequences of this choice:

- The per-SSTable partition_index_cache is unused. Every index_reader has
  its own, and they die together. Independent reads can no longer reuse the
  work of other reads which hit the same index pages. This is not crucial,
  since partition accesses have no (natural) spatial locality. Note that
  the original reason for partition_index_cache -- the ability to share
  reads for the lower and upper bound of the query -- is unaffected.
- The per-SSTable cached_file is unused. Every index_reader has its own
  (uncached) input stream from the index file, and every
  bsearch_clustered_cursor has its own cached_file, which dies together with
  the cursor. Note that the cursor still can perform its binary search with
  caching. However, it won't be able to reuse the file pages read by
  index_reader. In particular, if the promoted index is small, and fits inside
  the same file page as its index_entry, that page will be re-read.
  It can also happen that index_reader will read the same index file page
  multiple times. When the summary is so dense that multiple index pages fit in
  one index file page, advancing the upper bound, which reads the next index
  page, will read the same index file page. Since summary:disk ratio is 1:2000,
  this is expected to happen for partitions with size greater than 2000
  partition keys.

Fixes #11202

(cherry picked from commit cdb3e71045)
2022-09-18 13:30:28 +03:00
Beni Peled
7c79c513d1 release: prepare for 4.6.7 2022-09-07 11:17:55 +03:00
Karol Baryła
9a8e73f0c3 transport/server.cc: Return correct size of decompressed lz4 buffer
An incorrect size is returned from the function, which could lead to
crashes or undefined behavior. Fix by erroring out in these cases.

Fixes #11476

(cherry picked from commit 1c2eef384d)
2022-09-07 10:58:54 +03:00
Benny Halevy
fac0443200 snapshot-ctl: run_snapshot_modify_operation: reject views and secondary index using the schema
Detecting a secondary index by checking for a dot
in the table name is wrong, as tables generated by Alternator
may contain a dot in their name.

Instead, detect both materialized views and secondary indexes
using the schema()->is_view() method.

Fixes #10526

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit aa127a2dbb)
2022-09-06 17:56:30 +03:00
Piotr Sarna
6bcfef2cfa cql3: fix misleading error message for service level timeouts
The error message incorrectly stated that the timeout value cannot
be longer than 24h, but it can - the actual restriction is that the
value cannot be expressed in units like days or months. This was done
in order to significantly simplify the parsing routines (and because
timeouts counted in days are not expected to be common).
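The actual restriction can be sketched in Python (an illustrative subset of CQL duration units, not Scylla's real parser):

```python
import re

# Days and larger units are deliberately absent to keep parsing simple.
UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}

def parse_timeout_ms(text: str) -> int:
    """Parse a duration like '30s' or '25h' into milliseconds, rejecting
    units such as days or months."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(
            f"{text!r}: timeout must use ms/s/m/h "
            "(units like days or months are not supported)")
    return int(match.group(1)) * UNIT_MS[match.group(2)]
```

Note that a value longer than 24h is fine, as long as it is expressed in hours or smaller units.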

Fixes #10286

Closes #10294

(cherry picked from commit 85e95a8cc3)
2022-09-01 20:34:22 +03:00
Juliusz Stasiewicz
d2c67a2429 cdc/check_and_repair_cdc_streams: ignore LEFT endpoints
When `check_and_repair_cdc_streams` encountered a node with status LEFT, Scylla
would throw. This behavior is fixed so that LEFT nodes are simply ignored.

Fixes #9771

Closes #9778

(cherry picked from commit 351f142791)
2022-09-01 15:44:35 +03:00
Avi Kivity
d6c2f228e7 Merge 'row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy' from Tomasz Grabiec
Scenario:

cache = [
    row(pos=2, continuous=false),
    row(pos=after(2), dummy=true)
]

Scanning read starts, starts populating [-inf, before(2)] from sstables.

row(pos=2) is evicted.

cache = [
    row(pos=after(2), dummy=true)
]

Scanning read finishes reading from sstables.

Refreshes cache cursor via
partition_snapshot_row_cursor::maybe_refresh(), which calls
partition_snapshot_row_cursor::advance_to() because iterators are
invalidated. This advances the cursor to
after(2). no_clustering_row_between(2, after(2)) returns true, so
advance_to() returns true, and maybe_refresh() returns true. This is
interpreted by the cache reader as "the cursor has not moved forward",
so it marks the range as complete, without emitting the row with
pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads
will also miss the row.

The bug is in advance_to(), which is using
no_clustering_row_between(a, b) to determine its result, which by
definition excludes the starting key.

Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction
with reduced key range in the random_mutation_generator (1024 -> 16).

Fixes #11239

Closes #11240

* github.com:scylladb/scylladb:
  test: mvcc: Fix illegal use of maybe_refresh()
  tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range()
  tests: row_cache_test: Introduce one_shot mode to throttle
  row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy
2022-08-11 19:19:30 +02:00
Yaron Kaikov
a1b1df2074 release: prepare for 4.6.6 2022-08-07 16:24:51 +03:00
Avi Kivity
14e13ecbd4 Merge 'Backport: Fix map subscript crashes when map or subscript is null' from Nadav Har'El
This is a backport of https://github.com/scylladb/scylla/pull/10420 to branch 5.0.
Branch 5.0 had somewhat different code in this expression area, so the backport was not automatic, but it was nevertheless fairly straightforward - just copy the exact same checking code to its right place, and keep the exact same tests to see that we indeed fixed the bug.

Refs #10535.

The original cover letter from https://github.com/scylladb/scylla/pull/10420:

In the filtering expression "WHERE m[?] = 2", our implementation was buggy when either the map, or the subscript, was NULL (and also when the latter was an UNSET_VALUE). Our code ended up dereferencing null objects, yielding bizarre errors when we were lucky, or crashes when we were less lucky - see examples of both in issues https://github.com/scylladb/scylla/issues/10361, https://github.com/scylladb/scylla/issues/10399, https://github.com/scylladb/scylla/pull/10401. The existing test test_null.py::test_map_subscript_null reproduced all these bugs sporadically.

In this series we improve the test to reproduce the separate bugs separately, and also reproduce additional problems (like the UNSET_VALUE). We then define both m[NULL] and NULL[2] to result in NULL instead of the existing undefined (and buggy, and crashing) behavior. This new definition is consistent with our usual SQL-inspired tradition that NULL "wins" in expressions - e.g., NULL < 2 is also defined as resulting in NULL.

However, this decision differs from Cassandra, where m[NULL] is considered an error but NULL[2] is allowed. We believe that making m[NULL] be a NULL instead of an error is more consistent, and moreover - necessary if we ever want to support more complicated expressions like m[a], where the column a can be NULL for some rows and non-NULL for others, and it doesn't make sense to return an "invalid query" error in the middle of the scan.
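The chosen semantics can be sketched in Python, modeling CQL NULL as None (a simplified model, not the expression-evaluation C++ code):

```python
NULL = None  # stand-in for CQL NULL

def subscript(map_value, key):
    """NULL "wins": m[NULL] and NULL[k] both evaluate to NULL instead of
    raising an error, like NULL < 2 evaluating to NULL."""
    if map_value is NULL or key is NULL:
        return NULL
    return map_value.get(key, NULL)  # a missing key also yields NULL
```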

Fixes https://github.com/scylladb/scylla/issues/10361
Fixes https://github.com/scylladb/scylla/issues/10399
Fixes https://github.com/scylladb/scylla/pull/10401

Closes #11142

* github.com:scylladb/scylla:
  test/cql-pytest: reproducer for CONTAINS NULL bug
  expressions: don't dereference invalid map subscript in filter
  expressions: fix invalid dereference in map subscript evaluation
  test/cql-pytest: improve tests for map subscripts and nulls

(cherry picked from commit 23a34d7e42)
2022-07-31 15:44:00 +03:00
Benny Halevy
b8740bde6e multishard_mutation_query: do_query: stop ctx if lookup_readers fails
lookup_readers might fail after populating some readers,
and those had better be closed before returning the exception.

Fixes #10351

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10425

(cherry picked from commit 055141fc2e)
2022-07-25 14:52:58 +03:00
Benny Halevy
1b23f8d038 sstables: time_series_sstable_set: insert: make exception safe
Need to erase the shared sstable from _sstables
if insertion to _sstables_reversed fails.
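The exception-safety pattern can be sketched in Python (hypothetical container names mirroring the commit message; the real code manipulates C++ sstable sets):

```python
def insert_into_both(sstables, sstables_reversed, sst):
    """Keep the two containers consistent: if the second insertion
    throws, roll back the first one before propagating."""
    sstables.append(sst)
    try:
        sstables_reversed.insert(0, sst)  # may throw, e.g. on allocation
    except Exception:
        sstables.remove(sst)  # undo the first insertion
        raise
```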

Fixes #10787

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit cd68b04fbf)
2022-07-25 14:22:08 +03:00
Tomasz Grabiec
05a228e4c5 memtable: Fix missing range tombstones during reads under certain rare conditions
There is a bug introduced in e74c3c8 (4.6.0) which makes the memtable
reader skip a range tombstone for a certain pattern of deletions
and under a certain sequence of events.

_rt_stream contains the result of deoverlapping range tombstones which
had the same position, which were sipped from all the versions. The
result of deoverlapping may produce a range tombstone which starts
later, at the same position as a more recent tombstone which has not
been sipped from the partition version yet. If we consume the old
range tombstone from _rt_stream and then refresh the iterators, the
refresh will skip over the newer tombstone.

The fix is to drop the logic which drains _rt_stream so that
_rt_stream is always merged with partition versions.

For the problem to trigger, there have to be multiple MVCC versions
(at least 2) which contain deletions of the following form:

[a, c] @ t0
[a, b) @ t1, [b, d] @ t2

c > b

The proper sequence for such versions is (assuming d > c):

[a, b) @ t1,
[b, d] @ t2

Due to the bug, the reader will produce:

[a, b) @ t1,
[b, c] @ t0

The reader also needs to be preempted right before processing [b, d] @
t2 and iterators need to get invalidated so that
lsa_partition_reader::do_refresh_state() is called and it skips over
[b, d] @ t2. Otherwise, the reader will emit [b, d] @ t2 later. If it
does emit the proper range tombstone, it's possible that it will violate
fragment order in the stream if _rt_stream accumulated remainders
(possible with 3 MVCC versions).

The problem goes away once MVCC versions merge.

Fixes #10913
Fixes #10830

Closes #10914

(cherry picked from commit a6aef60b93)

[avi: backport prerequisite position_range_to_clustering_range() too]
2022-07-19 19:27:15 +03:00
Yaron Kaikov
2ec293ab0e release: prepare for 4.6.5 2022-07-19 16:02:46 +03:00
Pavel Emelyanov
b60f14601e azure_snitch: Do nothing on non-io-cpu
All snitch drivers are supposed to snitch info on some shard and
replicate the dc/rack info across the others. All but the Azure one
really do so. The Azure one gets dc/rack on all shards, which is
excessive but not terrible; but when all shards start to replicate
their data to all the others, this may lead to use-after-frees.

fixes: #10494

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit c6d0bc87d0)
2022-07-17 14:22:29 +03:00
Raphael S. Carvalho
284dd21ef7 compaction_manager: Fix race when selecting sstables for rewrite operations
Rewrite operations are scrub, cleanup and upgrade.

A race can happen because 'selection of sstables' and 'mark sstables as
compacting' are decoupled, so any deferring point in between can lead
to a parallel compaction picking the same files. After commit 2cf0c4bbf,
files are marked as compacting before rewrite starts, but that didn't
take into account commit c84217ad, which moved retrieval of
candidates to a deferring thread, before rewrite_sstables() is even
called.

Scrub isn't affected by this because it uses a coarse grained approach
where whole operation is run with compaction disabled, which isn't good
because regular compaction cannot run until its completion.

From now on, selection of files and marking them as compacting will
be serialized by running them with compaction disabled.

Now cleanup will also retrieve sstables with compaction disabled,
meaning it will no longer leave uncleaned files behind, which is
important to avoid data resurrection if node regains ownership of
data in uncleaned files.

Fixes #8168.
Refs #8155.

[backport notes:
- minor conflict around run_with_compaction_disabled()
- bumped into our old friend
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95111,
so I had to use std::ref() on local copy of lambda
- with the yielding part of candidate retrieval now happening in
rewrite_sstables(), task registration is moved to after run_with_
compaction_disabled() call, so the latter won't incorrectly try
to stop the task that called it, which triggers an assert in
debug mode.
]

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129133107.53011-1-raphaelsc@scylladb.com>
(cherry picked from commit 80a1ebf0f3)

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #10963
2022-07-13 18:45:36 +03:00
Pavel Emelyanov
8b52f1d6e7 view: Fix trace-state pointer use after move
It's moved into .mutate_locally(), but it is captured and used in its
continuation. It happens to work only because a moved-from pointer looks
like nullptr, and all the tracing code checks that it is non-null.

tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1266/
       (CI job failed on post-actions thus it's red)

Fixes #11015

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220711134152.30346-1-xemul@scylladb.com>
(cherry picked from commit 5526738794)
2022-07-12 14:21:11 +03:00
Piotr Sarna
157951f756 view: exclude using static columns in the view filter
The code that applied view filtering (i.e. a condition placed
on a view column, e.g. "WHERE v = 42") erroneously used a wildcard
selection, which also assumes that static columns are needed
if the base table contains any such columns.
The filtering code currently assumes that no such columns are fetched,
so the selection is amended to ask only for regular columns
(primary key columns are sent anyway, because they are enabled
via slice options, so there is no need to ask for them explicitly).

Fixes #10851

Closes #10855

(cherry picked from commit bc3a635c42)
2022-07-11 17:07:22 +03:00
Juliusz Stasiewicz
4f643ed4a5 cdc: check_and_repair_cdc_streams: regenerate if too many streams are present
If the number of streams exceeds the number of token ranges,
it indicates that spurious streams from decommissioned
nodes are present.

In such a situation, simply regenerate.

Fixes #9772

Closes #9780

(cherry picked from commit ea46439858)
2022-07-07 18:53:14 +02:00
Avi Kivity
b598629b7f messaging: do isolate default tenants
In 10dd08c9 ("messaging_service: supply and interpret rpc isolation_cookies",
4.2), we added a mechanism to perform rpc calls in remote scheduling groups
based on the connection identity (rather than the verb), so that
connection processing itself can run in the correct group (not just
verb processing), and so that one verb can run in different groups according
to need.

In 16d8cdadc ("messaging_service: introduce the tenant concept", 4.2), we
changed the way isolation cookies are sent:

 scheduling_group
 messaging_service::scheduling_group_for_verb(messaging_verb verb) const {
     return _scheduling_info_for_connection_index[get_rpc_client_idx(verb)].sched_group;
@@ -665,11 +694,14 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge
     if (must_compress) {
         opts.compressor_factory = &compressor_factory;
     }
     opts.tcp_nodelay = must_tcp_nodelay;
     opts.reuseaddr = true;
-    opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie;
+    // We send cookies only for non-default statement tenant clients.
+    if (idx > 3) {
+        opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie;
+    }

This effectively disables the mechanism for the default tenant. As a
result, some verbs will be executed in whatever group the messaging
service listener was started in. This used to be the main group,
but in 554ab03 ("main: Run init_server and join_cluster inside
maintenance scheduling group", 4.5), it was changed to the maintenance
group. As a result, normal reads/writes now compete with maintenance
operations, raising their latency significantly.

Fix by sending the isolation cookie for all connections. With this,
a 2-node cassandra-stress load sees its 99th percentile latency increase
by just 3ms during repair, compared to 10ms+ before.

Fixes #9505.

Closes #10673

(cherry picked from commit c83393e819)
2022-07-05 13:42:10 +03:00
Nadav Har'El
43f82047b9 Merge 'types: fix is_string for reversed types' from Piotr Sarna
Checking whether a type is a string is subtly broken for reversed types:
such types are not recognized as strings, even though they are.
As a result, if somebody creates a column with DESC order and then
tries to use the LIKE operator on it, the query fails because the type
is not recognized as a string.

Fixes #10183

Closes #10181

* github.com:scylladb/scylla:
  test: add a case for LIKE operator on a descending order column
  types: fix is_string for reversed types

(cherry picked from commit 733672fc54)
2022-07-03 17:59:56 +03:00
Benny Halevy
ec3c07de6e compaction_manager: perform_offstrategy: run_offstrategy_compaction in maintenance scheduling group
It was assumed that offstrategy compaction is always triggered by streaming/repair,
where it would inherit the caller's scheduling group.

However, offstrategy compaction is also triggered by a timer via table::_off_strategy_trigger,
and the expiration of this timer does not inherit anything from streaming/repair.

Also, since d309a86, offstrategy compaction
may be triggered by the API, where it runs in the default scheduling group.

The bottom line is that the compaction manager needs to explicitly perform offstrategy compaction
in the maintenance scheduling group similar to `perform_sstable_scrub_validate_mode`.

Fixes #10151

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220302084821.2239706-1-bhalevy@scylladb.com>
(cherry picked from commit 0764e511bb)
2022-07-03 14:30:54 +03:00
Takuya ASADA
82572e8cfe scylla_coredump_setup: support new format of Storage field
The Storage field of "coredumpctl info" changed in systemd v248: it
appends "(present)" at the end of the line when the coredump file is
available.

Fixes #10669

Closes #10714

(cherry picked from commit ad2344a864)
2022-07-03 13:55:25 +03:00
Nadav Har'El
2b9ed79c6f alternator: forbid empty AttributesToGet
In DynamoDB one can retrieve only a subset of the attributes using the
AttributesToGet or ProjectionExpression parameters to read requests.
Neither allows an empty list of attributes; if you don't want any
attributes, you should use Select=COUNT instead.

Currently we correctly refuse an empty ProjectionExpression - and have
a test for it:
test_projection_expression.py::test_projection_expression_toplevel_syntax

However, Alternator is missing the same empty-forbidding logic for
AttributesToGet. An empty AttributesToGet is currently allowed and
effectively says "retrieve everything", which is unexpected.

So this patch adds the missing logic, and the missing test (actually
two tests for the same thing - one using GetItem and the other Query).

Fixes #10332

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-1-nyh@scylladb.com>
(cherry picked from commit 9c1ebdceea)
2022-07-03 13:36:02 +03:00
Avi Kivity
ab0b6fd372 Update seastar submodule (json crash in describe_ring)
* seastar 7a430a0830...8b2c13b346 (1):
  > Merge 'stream_range_as_array: always close output stream' from Benny Halevy

Fixes #10592.
2022-06-08 16:49:53 +03:00
Nadav Har'El
12f1718ef4 alternator: allow DescribeTimeToLive even without TTL enabled
We still consider the TTL support in Alternator to be experimental, so we
don't want to allow a user to enable TTL on a table without turning on a
"--experimental-features" flag. However, there is no reason not to allow
the DescribeTimeToLive call when this experimental flag is off - this call
would simply reply with the truth - that the TTL feature is disabled for
the table!

This is important for client code (such as the Terraform module
described in issue #10660) which uses DescribeTimeToLive for
information, even when it never intends to actually enable TTL.

The patch is trivial - we simply remove the flag check in
DescribeTimeToLive, the code works just as before.

After this patch, the following test now works on Scylla without
experimental flags turned on:

    test/alternator/run test_ttl.py::test_describe_ttl_without_ttl

Refs #10660

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 8ecf1e306f)
2022-05-30 20:40:34 +03:00
Tomasz Grabiec
322dfe8403 sstable: partition_index_cache: Fix abort on bad_alloc during page loading
When entry loading fails and there is another request blocked on the
same page, an attempt to erase the failed entry will abort, because
that would violate entry_ptr guarantees, which are supposed to keep
the entry alive.

The fix in 92727ac36c was incomplete: it only helped in the case of a
single loader. This patch takes a more general approach by relaxing
the assert.

The assert manifested like this:

scylla: ./sstables/partition_index_cache.hh:71: sstables::partition_index_cache::entry::~entry(): Assertion `!is_referenced()' failed.

Fixes #10617

Closes #10653

(cherry picked from commit f87274f66a)
2022-05-30 13:00:46 +03:00
Beni Peled
11f008e8fd release: prepare for 4.6.4 2022-05-16 15:20:35 +03:00
Benny Halevy
fd7314a362 table: clear: serialize with ongoing flush
Take all flush permits to serialize with any
ongoing flushes and to prevent further flushes
during table::clear, in particular while calling
discard_completed_segments for every table and
clearing the memtables in clear_and_add.

Fixes #10423

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit aae532a96b)
2022-05-15 13:43:43 +03:00
Raphael S. Carvalho
d27468f078 compaction: LCS: don't write to disengaged optional on compaction completion
Dtest triggers the problem by:
1) creating table with LCS
2) disabling regular compaction
3) writing a few sstables
4) running maintenance compaction, e.g. cleanup

Once the maintenance compaction completes, the disengaged optional
_last_compacted_keys triggers an exception in notify_completion().

_last_compacted_keys is used by regular compaction for its round-robin
file-picking policy. It stores the last compacted key for each level,
so it's irrelevant for any other compaction type.

Regular compaction is responsible for initializing it when it first
runs to pick files. But with regular compaction disabled,
notify_completion() finds it uninitialized, resulting in
bad_optional_access.

To fix this, the procedure is skipped if _last_compacted_keys is
disengaged. Regular compaction, once re-enabled, will be able to
fill _last_compacted_keys by looking at metadata of the files.

compaction_test.py::TestCompaction::test_disable_autocompaction_doesnt_
block_user_initiated_compactions[CLEANUP-LeveledCompactionStrategy]
now passes.

Fixes #10378.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #10508

(cherry picked from commit 8e99d3912e)
2022-05-15 13:20:30 +03:00
Juliusz Stasiewicz
74ef1ee961 CQL: Replace assert by exception on invalid auth opcode
One user observed this assertion failure, but it's an extremely rare
event. The root cause - the interleaving of processing of STARTUP and
OPTIONS messages - is still there, but it's now harmless enough to
leave as is.

Fixes #10487

Closes #10503

(cherry picked from commit 603dd72f9e)
2022-05-10 14:03:03 +02:00
Benny Halevy
07549d159c compaction: time_window_compaction_strategy: reset estimated_remaining_tasks when running out of candidates
_estimated_remaining_tasks gets updated via get_next_non_expired_sstables ->
get_compaction_candidates, but otherwise, if we return early from
get_sstables_for_compaction, it does not get updated and may go out of sync.

Refs #10418
(to be closed when the fix reaches branch-4.6)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10419

(cherry picked from commit 01f41630a5)
2022-05-09 09:36:22 +03:00
Eliran Sinvani
189bbcd82d prepared_statements: Invalidate batch statement too
It seems that batch prepared statements always return false from
depends_on; this in turn makes the removal criterion for the
prepared statements cache always false, so the queries are
never evicted.
Here we change the function to return the true state, meaning it
returns true if one of the sub-queries depends on the keyspace
and/or column family.

Fixes #10129

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
(cherry picked from commit 4eb0398457)
2022-05-08 12:33:00 +03:00
Eliran Sinvani
70e6921125 cql3 statements: Change dependency test API to better express its purpose

CQL statements used to have two API functions, depends_on_keyspace and
depends_on_column_family. The former took only a table name as a
parameter, which makes no sense: there can be multiple tables with the
same name, each in a different keyspace, and it doesn't make sense to
generalize the test, i.e. to ask "does a statement depend on any table
named XXX?"
In this change we unify the two calls into one, depends_on, which takes
a keyspace name and optionally also a table name. That way, every
logical dependency test that makes sense is supported by a single API
call.

(cherry picked from commit bf50dbd35b)

Ref #10129
2022-05-08 12:32:41 +03:00
Calle Wilund
e314158708 cdc: Ensure columns removed from log table are registered as dropped
If we are redefining the log table, we need to ensure that any dropped
columns are registered in the "dropped_columns" table; otherwise,
clients will not be able to read data older than now.
Includes a unit test.

Should probably be backported to all CDC enabled versions.

Fixes #10473
Closes #10474

(cherry picked from commit 78350a7e1b)
2022-05-05 11:34:56 +02:00
Tomasz Grabiec
46586532c9 loading_cache: Make invalidation take immediate effect
There are two issues with current implementation of remove/remove_if:

  1) If it happens concurrently with get_ptr(), the latter may still
  populate the cache using a value obtained from before remove() was
  called. remove() is used to invalidate caches, e.g. the prepared
  statements cache, and the expected semantic is that values
  calculated before remove() should not be present in the cache
  after invalidation.

  2) As long as there is any active pointer to the cached value
  (obtained by get_ptr()), the old value from before remove() will
  still be accessible and returned by get_ptr(). This can make remove()
  have no effect indefinitely if there is persistent use of the cache.

One of the user-perceived effects of this bug is that some prepared
statements may not get invalidated after a schema change and may still
use the old schema (until the next invalidation). If the schema change
modified a UDT, this can cause statement execution failures: the CQL
coordinator will try to interpret bound values using the old set of
fields. If the driver uses the new schema, the coordinator will fail
to process the value with the following exception:

  User Defined Type value contained too many fields (expected 5, got 6)

The patch fixes the problem by making remove()/remove_if() erase old
entries from _loading_values immediately.

The predicate-based remove_if() variant also has to invalidate values
which are concurrently loading to be safe, since the predicate cannot
be evaluated on values which are not ready. This may invalidate some
values unnecessarily, but I think that's fine.

Fixes #10117

Message-Id: <20220309135902.261734-1-tgrabiec@scylladb.com>
(cherry picked from commit 8fa704972f)
2022-05-04 15:38:11 +03:00
Avi Kivity
0114244363 Merge 'replica/database: drop_column_family(): properly cleanup stale querier cache entries' from Botond Dénes
Said method has to evict all querier cache entries belonging to the to-be-dropped table. This was already the case, but there was a window in which new entries could sneak in, causing a stale reference to the table to be dereferenced later when they were evicted due to TTL. This window is now closed: the entries are evicted after the method has waited for all ongoing operations on the table to stop.

Fixes: #10450

Closes #10451

* github.com:scylladb/scylla:
  replica/database: drop_column_family(): drop querier cache entries after waiting for ops
  replica/database: finish coroutinizing drop_column_family()
  replica/database: make remove(const column_family&) private

(cherry picked from commit 7f1e368e92)
2022-05-01 17:11:52 +03:00
Avi Kivity
f154c8b719 Update tools/java submodule (bad IPv6 addresses in nodetool)
* tools/java 05ec511bbb...46744a92ff (1):
  > CASSANDRA-17581 fix NodeProbe: Malformed IPv6 address at index

Fixes #10442
2022-04-28 11:35:09 +03:00
Beni Peled
8bf149fdd6 release: prepare for 4.6.3 2022-04-14 14:16:52 +03:00
Tomasz Grabiec
0265d56173 utils/chunked_managed_vector: Fix sigsegv during reserve()
Fixes the case of make_room() invoked with last_chunk_capacity_deficit
but _size not in the last reserved chunk.

Found during code review, no user impact.

Fixes #10364.

Message-Id: <20220411224741.644113-1-tgrabiec@scylladb.com>
(cherry picked from commit 0c365818c3)
2022-04-13 10:29:30 +03:00
Tomasz Grabiec
e50452ba43 utils/chunked_vector: Fix sigsegv during reserve()
Fixes the case of make_room() invoked with last_chunk_capacity_deficit
but _size not in the last reserved chunk.

Found during code review, no known user impact.

Fixes #10363.

Message-Id: <20220411222605.641614-1-tgrabiec@scylladb.com>
(cherry picked from commit 01eeb33c6e)

[avi: make max_chunk_capacity() public for backport]
2022-04-13 10:29:03 +03:00
Avi Kivity
a205f644cb transport: return correct error codes when downgrading v4 {WRITE,READ}_FAILURE to {WRITE,READ}_TIMEOUT
Protocol v4 added WRITE_FAILURE and READ_FAILURE. When running under v3
we downgrade these exceptions to WRITE_TIMEOUT and READ_TIMEOUT (since
the client won't understand the v4 errors), but we still send the new
error codes. This causes the client to become confused.

Fix by updating the error codes.

A better fix is to move the error code from the constructor parameter
list and hard-code it in the constructor, but that is left for a follow-up
after this minimal fix.

Fixes #5610.

Closes #10362

(cherry picked from commit 987e6533d2)
2022-04-13 09:49:02 +03:00
Tomasz Grabiec
f136b5b950 utils/chunked_managed_vector: Fix corruption in case there is more than one chunk
If reserve() allocates more than one chunk, push_back() should not
work with the last chunk. This can result in items being pushed to the
wrong chunk, breaking internal invariants.

Also, pop_back() should not work with the last chunk. This breaks when
there is more than one chunk.

Currently, the container is only used in the sstable partition index
cache.

Manifests as crashes in the sstable reader when touching sstables that
have partition index pages with more than 1638 partition entries.
Introduced in 78e5b9fd85 (4.6.0)

Fixes #10290

Message-Id: <20220407174023.527059-1-tgrabiec@scylladb.com>
(cherry picked from commit 41fe01ecff)
2022-04-08 10:53:52 +03:00
Takuya ASADA
69a1325884 docker: enable --log-to-stdout, which was mistakenly disabled
Since our Docker image moved to Ubuntu, we have mistakenly copied
dist/docker/etc/sysconfig/scylla-server to /etc/sysconfig, which is not
used on Ubuntu (it should be /etc/default).
So /etc/default/scylla-server is just the default configuration of the
scylla-server .deb package, with --log-to-stdout set to 0, same as a
normal installation.

We don't want to keep the duplicated configuration file anyway,
so let's drop dist/docker/etc/sysconfig/scylla-server and configure
/etc/default/scylla-server in build_docker.sh.

Fixes #10270

Closes #10280

(cherry picked from commit bdefea7c82)
2022-04-07 12:13:35 +03:00
Avi Kivity
ab153c9b94 Update seastar submodule (logger deadlock with large messages)
* seastar 34e58f9995...94a462d94b (2):
  > log: Fix silencer to be shard-local and logger-global
  > log: Silence logger when logging

Fixes #10336.
2022-04-05 19:43:49 +03:00
117 changed files with 1715 additions and 312 deletions

View File

@@ -60,7 +60,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=4.6.2
VERSION=4.6.11
if test -f version
then

View File

@@ -415,6 +415,11 @@ future<executor::request_return_type> executor::describe_table(client_state& cli
rjson::add(table_description, "BillingModeSummary", rjson::empty_object());
rjson::add(table_description["BillingModeSummary"], "BillingMode", "PAY_PER_REQUEST");
rjson::add(table_description["BillingModeSummary"], "LastUpdateToPayPerRequestDateTime", rjson::value(creation_date_seconds));
// In PAY_PER_REQUEST billing mode, provisioned capacity should return 0
rjson::add(table_description, "ProvisionedThroughput", rjson::empty_object());
rjson::add(table_description["ProvisionedThroughput"], "ReadCapacityUnits", 0);
rjson::add(table_description["ProvisionedThroughput"], "WriteCapacityUnits", 0);
rjson::add(table_description["ProvisionedThroughput"], "NumberOfDecreasesToday", 0);
std::unordered_map<std::string,std::string> key_attribute_types;
// Add base table's KeySchema and collect types for AttributeDefinitions:
@@ -2078,6 +2083,9 @@ static attrs_to_get calculate_attrs_to_get(const rjson::value& req, std::unorder
for (auto it = attributes_to_get.Begin(); it != attributes_to_get.End(); ++it) {
attribute_path_map_add("AttributesToGet", ret, it->GetString());
}
if (ret.empty()) {
throw api_error::validation("Empty AttributesToGet is not allowed. Consider using Select=COUNT instead.");
}
return ret;
} else if (has_projection_expression) {
const rjson::value& projection_expression = req["ProjectionExpression"];

View File

@@ -94,10 +94,7 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
}
future<executor::request_return_type> executor::describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {
_stats.api_operations.update_time_to_live++;
if (!_proxy.get_db().local().features().cluster_supports_alternator_ttl()) {
co_return api_error::unknown_operation("DescribeTimeToLive not yet supported. Experimental support is available if the 'alternator_ttl' experimental feature is enabled on all nodes.");
}
_stats.api_operations.describe_time_to_live++;
schema_ptr schema = get_table(_proxy, request);
std::map<sstring, sstring> tags_map = get_tags_of_table(schema);
rjson::value desc = rjson::empty_object();

View File

@@ -604,15 +604,21 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
return make_exception_future<json::json_return_type>(
std::runtime_error("Can not perform cleanup operation when topology changes"));
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return parallel_for_each(column_families_vec, [&cm, &db] (column_family* cf) {
return cm.perform_cleanup(db, cf);
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) -> future<> {
auto table_ids = boost::copy_range<std::vector<utils::UUID>>(column_families | boost::adaptors::transformed([&] (auto& table_name) {
return db.find_uuid(keyspace, table_name);
}));
// cleanup smaller tables first, to increase chances of success if low on space.
std::ranges::sort(table_ids, std::less<>(), [&] (const utils::UUID& id) {
return db.find_column_family(id).get_stats().live_disk_space_used;
});
auto& cm = db.get_compaction_manager();
// as a table can be dropped during loop below, let's find it before issuing the cleanup request.
for (auto& id : table_ids) {
table& t = db.find_column_family(id);
co_await cm.perform_cleanup(db, &t);
}
co_return;
}).then([]{
return make_ready_future<json::json_return_type>(0);
});

View File

@@ -765,8 +765,12 @@ future<> generation_service::check_and_repair_cdc_streams() {
std::optional<cdc::generation_id> latest = _gen_id;
const auto& endpoint_states = _gossiper.get_endpoint_states();
for (const auto& [addr, state] : endpoint_states) {
if (!_gossiper.is_normal(addr)) {
throw std::runtime_error(format("All nodes must be in NORMAL state while performing check_and_repair_cdc_streams"
if (_gossiper.is_left(addr)) {
cdc_log.info("check_and_repair_cdc_streams ignored node {} because it is in LEFT state", addr);
continue;
}
if (!_gossiper.is_normal(addr)) {
throw std::runtime_error(format("All nodes must be in NORMAL or LEFT state while performing check_and_repair_cdc_streams"
" ({} is in state {})", addr, _gossiper.get_gossip_status(state)));
}
@@ -830,6 +834,11 @@ future<> generation_service::check_and_repair_cdc_streams() {
latest, db_clock::now());
should_regenerate = true;
} else {
if (tmptr->sorted_tokens().size() != gen->entries().size()) {
// We probably have garbage streams from old generations
cdc_log.info("Generation size does not match the token ring, regenerating");
should_regenerate = true;
} else {
std::unordered_set<dht::token> gen_ends;
for (const auto& entry : gen->entries()) {
gen_ends.insert(entry.token_range_end);
@@ -841,6 +850,7 @@ future<> generation_service::check_and_repair_cdc_streams() {
break;
}
}
}
}
}

View File

@@ -73,7 +73,7 @@ using namespace std::chrono_literals;
logging::logger cdc_log("cdc");
namespace cdc {
static schema_ptr create_log_schema(const schema&, std::optional<utils::UUID> = {});
static schema_ptr create_log_schema(const schema&, std::optional<utils::UUID> = {}, schema_ptr = nullptr);
}
static constexpr auto cdc_group_name = "cdc";
@@ -220,7 +220,7 @@ public:
return;
}
auto new_log_schema = create_log_schema(new_schema, log_schema ? std::make_optional(log_schema->id()) : std::nullopt);
auto new_log_schema = create_log_schema(new_schema, log_schema ? std::make_optional(log_schema->id()) : std::nullopt, log_schema);
auto log_mut = log_schema
? db::schema_tables::make_update_table_mutations(db, keyspace.metadata(), log_schema, new_log_schema, timestamp, false)
@@ -503,7 +503,7 @@ bytes log_data_column_deleted_elements_name_bytes(const bytes& column_name) {
return to_bytes(cdc_deleted_elements_column_prefix) + column_name;
}
static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID> uuid) {
static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID> uuid, schema_ptr old) {
schema_builder b(s.ks_name(), log_name(s.cf_name()));
b.with_partitioner("com.scylladb.dht.CDCPartitioner");
b.set_compaction_strategy(sstables::compaction_strategy_type::time_window);
@@ -590,6 +590,20 @@ static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID>
b.set_uuid(*uuid);
}
/**
* #10473 - if we are redefining the log table, we need to ensure any dropped
* columns are registered in "dropped_columns" table, otherwise clients will not
* be able to read data older than now.
*/
if (old) {
// not super efficient, but we don't do this often.
for (auto& col : old->all_columns()) {
if (!b.has_column({col.name(), col.name_as_text() })) {
b.without_column(col.name_as_text(), col.type, api::new_timestamp());
}
}
}
return b.build();
}

View File

@@ -527,16 +527,11 @@ future<> compaction_manager::stop() {
}
}
void compaction_manager::really_do_stop() {
if (_state == state::none || _state == state::stopped) {
return;
}
_state = state::stopped;
future<> compaction_manager::really_do_stop() {
cmlog.info("Asked to stop");
// Reset the metrics registry
_metrics.clear();
_stop_future.emplace(stop_ongoing_compactions("shutdown").then([this] () mutable {
return stop_ongoing_compactions("shutdown").then([this] () mutable {
reevaluate_postponed_compactions();
return std::move(_waiting_reevalution);
}).then([this] {
@@ -544,12 +539,34 @@ void compaction_manager::really_do_stop() {
_compaction_submission_timer.cancel();
cmlog.info("Stopped");
return _compaction_controller.shutdown();
}));
});
}
template <typename Ex>
requires std::is_base_of_v<std::exception, Ex> &&
requires (const Ex& ex) {
{ ex.code() } noexcept -> std::same_as<const std::error_code&>;
}
auto swallow_enospc(const Ex& ex) noexcept {
if (ex.code().value() != ENOSPC) {
return make_exception_future<>(std::make_exception_ptr(ex));
}
cmlog.warn("Got ENOSPC on stop, ignoring...");
return make_ready_future<>();
}
void compaction_manager::do_stop() noexcept {
if (_state == state::none || _state == state::stopped) {
return;
}
try {
really_do_stop();
_state = state::stopped;
_stop_future = really_do_stop()
.handle_exception_type([] (const std::system_error& ex) { return swallow_enospc(ex); })
.handle_exception_type([] (const storage_io_error& ex) { return swallow_enospc(ex); })
;
} catch (...) {
try {
cmlog.error("Failed to stop the manager: {}", std::current_exception());
@@ -681,6 +698,7 @@ void compaction_manager::submit_offstrategy(column_family* cf) {
_stats.active_tasks++;
task->setup_new_compaction();
return with_scheduling_group(_maintenance_sg.cpu, [this, task, cf] {
return cf->run_offstrategy_compaction(task->compaction_data).then_wrapped([this, task] (future<> f) mutable {
_stats.active_tasks--;
task->finish_compaction();
@@ -703,6 +721,7 @@ void compaction_manager::submit_offstrategy(column_family* cf) {
_tasks.remove(task);
return make_ready_future<stop_iteration>(stop_iteration::yes);
});
});
});
});
});
@@ -719,9 +738,20 @@ inline bool compaction_manager::check_for_cleanup(column_family* cf) {
future<> compaction_manager::rewrite_sstables(column_family* cf, sstables::compaction_type_options options, get_candidates_func get_func, can_purge_tombstones can_purge) {
auto task = make_lw_shared<compaction_manager::task>(cf, options.type());
_tasks.push_back(task);
auto sstables = std::make_unique<std::vector<sstables::shared_sstable>>(get_func(*cf));
std::unique_ptr<std::vector<sstables::shared_sstable>> sstables;
lw_shared_ptr<compacting_sstable_registration> compacting;
// since we might potentially have ongoing compactions, and we
// must ensure that all sstables created before we run are included
// in the re-write, we need to barrier out any previously running
// compaction.
auto get_and_register_candidates_func = [this, &sstables, &compacting, &get_func] () mutable -> future<> {
sstables = std::make_unique<std::vector<sstables::shared_sstable>>(co_await get_func());
compacting = make_lw_shared<compacting_sstable_registration>(this, *sstables);
};
co_await cf->run_with_compaction_disabled(std::ref(get_and_register_candidates_func));
// sort sstables by size in descending order, such that the smallest files will be rewritten first
// (as sstable to be rewritten is popped off from the back of container), so rewrite will have higher
// chance to succeed when the biggest files are reached.
@@ -729,10 +759,11 @@ future<> compaction_manager::rewrite_sstables(column_family* cf, sstables::compa
return a->data_size() > b->data_size();
});
auto compacting = make_lw_shared<compacting_sstable_registration>(this, *sstables);
auto sstables_ptr = sstables.get();
_stats.pending_tasks += sstables->size();
_tasks.push_back(task);
task->compaction_done = do_until([this, sstables_ptr, task] { return sstables_ptr->empty() || !can_proceed(task); },
[this, task, options, sstables_ptr, compacting, can_purge] () mutable {
auto sst = sstables_ptr->back();
@@ -789,7 +820,7 @@ future<> compaction_manager::rewrite_sstables(column_family* cf, sstables::compa
_tasks.remove(task);
});
return task->compaction_done.get_future().then([task] {});
co_return co_await task->compaction_done.get_future();
}
future<> compaction_manager::perform_sstable_scrub_validate_mode(column_family* cf) {
@@ -871,31 +902,29 @@ future<> compaction_manager::perform_cleanup(database& db, column_family* cf) {
return make_exception_future<>(std::runtime_error(format("cleanup request failed: there is an ongoing cleanup on {}.{}",
cf->schema()->ks_name(), cf->schema()->cf_name())));
}
return seastar::async([this, cf, &db] {
// FIXME: indentation
auto sorted_owned_ranges = db.get_keyspace_local_ranges(cf->schema()->ks_name());
auto get_sstables = [this, &db, cf, sorted_owned_ranges] () -> future<std::vector<sstables::shared_sstable>> {
return seastar::async([this, &db, cf, sorted_owned_ranges = std::move(sorted_owned_ranges)] {
auto schema = cf->schema();
auto sorted_owned_ranges = db.get_keyspace_local_ranges(schema->ks_name());
auto sstables = std::vector<sstables::shared_sstable>{};
const auto candidates = get_candidates(*cf);
std::copy_if(candidates.begin(), candidates.end(), std::back_inserter(sstables), [&sorted_owned_ranges, schema] (const sstables::shared_sstable& sst) {
seastar::thread::maybe_yield();
return sorted_owned_ranges.empty() || needs_cleanup(sst, sorted_owned_ranges, schema);
});
return std::tuple<dht::token_range_vector, std::vector<sstables::shared_sstable>>(sorted_owned_ranges, sstables);
}).then_unpack([this, cf, &db] (dht::token_range_vector owned_ranges, std::vector<sstables::shared_sstable> sstables) {
return rewrite_sstables(cf, sstables::compaction_type_options::make_cleanup(std::move(owned_ranges)),
[sstables = std::move(sstables)] (const table&) { return sstables; });
return sstables;
});
};
return rewrite_sstables(cf, sstables::compaction_type_options::make_cleanup(std::move(sorted_owned_ranges)), std::move(get_sstables));
}
// Submit a column family to be upgraded and wait for its termination.
future<> compaction_manager::perform_sstable_upgrade(database& db, column_family* cf, bool exclude_current_version) {
using shared_sstables = std::vector<sstables::shared_sstable>;
return do_with(shared_sstables{}, [this, &db, cf, exclude_current_version](shared_sstables& tables) {
// since we might potentially have ongoing compactions, and we
// must ensure that all sstables created before we run are included
// in the re-write, we need to barrier out any previously running
// compaction.
return cf->run_with_compaction_disabled([this, cf, &tables, exclude_current_version] {
auto get_sstables = [this, &db, cf, exclude_current_version] {
// FIXME: indentation
std::vector<sstables::shared_sstable> tables;
auto last_version = cf->get_sstables_manager().get_highest_supported_format();
for (auto& sst : get_candidates(*cf)) {
@@ -906,21 +935,17 @@ future<> compaction_manager::perform_sstable_upgrade(database& db, column_family
tables.emplace_back(sst);
}
}
return make_ready_future<>();
}).then([&db, cf] {
return db.get_keyspace_local_ranges(cf->schema()->ks_name());
}).then([this, &db, cf, &tables] (dht::token_range_vector owned_ranges) {
// doing a "cleanup" is about as compacting as we need
// to be, provided we get to decide the tables to process,
// and ignoring any existing operations.
// Note that we potentially could be doing multiple
// upgrades here in parallel, but that is really the users
// problem.
return rewrite_sstables(cf, sstables::compaction_type_options::make_upgrade(std::move(owned_ranges)), [&](auto&) mutable {
return std::exchange(tables, {});
});
});
});
return make_ready_future<std::vector<sstables::shared_sstable>>(tables);
};
// doing a "cleanup" is about as compacting as we need
// to be, provided we get to decide the tables to process,
// and ignoring any existing operations.
// Note that we potentially could be doing multiple
// upgrades here in parallel, but that is really the users
// problem.
return rewrite_sstables(cf, sstables::compaction_type_options::make_upgrade(db.get_keyspace_local_ranges(cf->schema()->ks_name())), std::move(get_sstables));
}
// Submit a column family to be scrubbed and wait for its termination.
@@ -928,14 +953,10 @@ future<> compaction_manager::perform_sstable_scrub(column_family* cf, sstables::
if (scrub_mode == sstables::compaction_type_options::scrub::mode::validate) {
return perform_sstable_scrub_validate_mode(cf);
}
// since we might potentially have ongoing compactions, and we
// must ensure that all sstables created before we run are scrubbed,
// we need to barrier out any previously running compaction.
return cf->run_with_compaction_disabled([this, cf, scrub_mode] {
return rewrite_sstables(cf, sstables::compaction_type_options::make_scrub(scrub_mode), [this] (const table& cf) {
return get_candidates(cf);
// FIXME: indentation
return rewrite_sstables(cf, sstables::compaction_type_options::make_scrub(scrub_mode), [this, cf] {
return make_ready_future<std::vector<sstables::shared_sstable>>(get_candidates(*cf));
}, can_purge_tombstones::no);
});
}
future<> compaction_manager::remove(column_family* cf) {

View File

@@ -178,7 +178,7 @@ private:
maintenance_scheduling_group _maintenance_sg;
size_t _available_memory;
using get_candidates_func = std::function<std::vector<sstables::shared_sstable>(const column_family&)>;
using get_candidates_func = std::function<future<std::vector<sstables::shared_sstable>>()>;
class can_purge_tombstones_tag;
using can_purge_tombstones = bool_class<can_purge_tombstones_tag>;
@@ -209,7 +209,7 @@ public:
// Stop all fibers, without waiting. Safe to be called multiple times.
void do_stop() noexcept;
void really_do_stop();
future<> really_do_stop();
// Submit a column family to be compacted.
void submit(column_family* cf);

View File

@@ -80,7 +80,11 @@ compaction_descriptor leveled_compaction_strategy::get_major_compaction_job(colu
}
void leveled_compaction_strategy::notify_completion(const std::vector<shared_sstable>& removed, const std::vector<shared_sstable>& added) {
if (removed.empty() || added.empty()) {
// All the update here is only relevant for regular compaction's round-robin picking policy, and if
// last_compacted_keys wasn't generated by regular, it means regular is disabled since last restart,
// therefore we can skip the updates here until regular runs for the first time. Once it runs,
// it will be able to generate last_compacted_keys correctly by looking at metadata of files.
if (removed.empty() || added.empty() || !_last_compacted_keys) {
return;
}
auto min_level = std::numeric_limits<uint32_t>::max();

View File

@@ -225,6 +225,7 @@ time_window_compaction_strategy::get_sstables_for_compaction(column_family& cf,
auto gc_before = gc_clock::now() - cf.schema()->gc_grace_seconds();
if (candidates.empty()) {
_estimated_remaining_tasks = 0;
return compaction_descriptor();
}

View File

@@ -1403,7 +1403,7 @@ serviceLevelOrRoleName returns [sstring name]
std::transform($name.begin(), $name.end(), $name.begin(), ::tolower); }
| t=STRING_LITERAL { $name = sstring($t.text); }
| t=QUOTED_NAME { $name = sstring($t.text); }
| k=unreserved_keyword { $name = sstring($t.text);
| k=unreserved_keyword { $name = k;
std::transform($name.begin(), $name.end(), $name.begin(), ::tolower);}
| QMARK {add_recognition_error("Bind variables cannot be used for service levels or role names");}
;

View File

@@ -25,6 +25,7 @@
#include "cql3_type.hh"
#include "cql3/util.hh"
#include "exceptions/exceptions.hh"
#include "ut_name.hh"
#include "database.hh"
#include "user_types_metadata.hh"
@@ -448,7 +449,20 @@ sstring maybe_quote(const sstring& identifier) {
}
if (!need_quotes) {
return identifier;
// A seemingly valid identifier matching [a-z][a-z0-9_]* may still
// need quoting if it is a CQL keyword, e.g., "to" (see issue #9450).
// While our parser Cql.g has different production rules for different
// types of identifiers (column names, table names, etc.), all of
// these behave identically for alphanumeric strings: they exclude
// many keywords but allow keywords listed as "unreserved keywords".
// So we can use any of them, for example cident.
try {
cql3::util::do_with_parser(identifier, std::mem_fn(&cql3_parser::CqlParser::cident));
return identifier;
} catch(exceptions::syntax_exception&) {
// This alphanumeric string is not a valid identifier, so fall
// through to have it quoted:
}
}
if (num_quotes == 0) {
return make_sstring("\"", identifier, "\"");

View File

@@ -109,9 +109,7 @@ public:
virtual seastar::future<seastar::shared_ptr<cql_transport::messages::result_message>>
execute(query_processor& qp, service::query_state& state, const query_options& options) const = 0;
virtual bool depends_on_keyspace(const seastar::sstring& ks_name) const = 0;
virtual bool depends_on_column_family(const seastar::sstring& cf_name) const = 0;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const = 0;
virtual seastar::shared_ptr<const metadata> get_result_metadata() const = 0;

View File

@@ -123,10 +123,38 @@ managed_bytes_opt get_value(const column_value& col, const column_value_eval_bag
format("Column definition {} does not match any column in the query selection",
cdef->name_as_text()));
}
const auto deserialized = cdef->type->deserialize(managed_bytes_view(*data.other_columns[index]));
const managed_bytes_opt& serialized = data.other_columns[index];
if (!serialized) {
// For null[i] we return null.
return std::nullopt;
}
const auto deserialized = cdef->type->deserialize(managed_bytes_view(*serialized));
const auto& data_map = value_cast<map_type_impl::native_type>(deserialized);
const auto key = evaluate_to_raw_view(col.sub, options);
auto&& key_type = col_type->name_comparator();
if (key.is_null()) {
// For m[null] return null.
// This is different from Cassandra - which treats m[null]
// as an invalid request error. But m[null] -> null is more
// consistent with our usual null treatment (e.g., both
// null[2] and null < 2 return null). It will also allow us
// to support non-constant subscripts (e.g., m[a]) where "a"
// may be null in some rows and non-null in others, and it's
// not an error.
return std::nullopt;
}
if (key.is_unset_value()) {
// An m[?] with ? bound to UNSET_VALUE is an invalid query.
// We could have detected it earlier while binding, but since
// we currently don't, we must protect the following code
// which can't work with an UNSET_VALUE. Note that the
// placement of this check here means that in an empty table,
// where we never need to evaluate the filter expression, this
// error will not be detected.
throw exceptions::invalid_request_exception(
format("Unsupported unset map key for column {}",
cdef->name_as_text()));
}
const auto found = key.with_linearized([&] (bytes_view key_bv) {
using entry = std::pair<data_value, data_value>;
return std::find_if(data_map.cbegin(), data_map.cend(), [&] (const entry& element) {
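The null/unset handling added in this hunk can be sketched as a plain map-subscript function; `UNSET` below is an illustrative stand-in for CQL's UNSET_VALUE marker, not Scylla's actual representation:

```python
UNSET = object()  # illustrative stand-in for CQL's UNSET_VALUE

def subscript(map_value, key):
    # null[i] -> null: subscripting a null map yields null.
    if map_value is None:
        return None
    # m[null] -> null. Cassandra raises an invalid-request error here,
    # but returning null is more consistent with the usual null treatment
    # (null < 2 is also null), per the comment in the hunk above.
    if key is None:
        return None
    # m[?] with ? bound to UNSET_VALUE is an invalid request.
    if key is UNSET:
        raise ValueError("Unsupported unset map key")
    # Otherwise look the key up; a missing key yields null.
    return map_value.get(key)
```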

View File

@@ -970,7 +970,7 @@ bool query_processor::migration_subscriber::should_invalidate(
sstring ks_name,
std::optional<sstring> cf_name,
::shared_ptr<cql_statement> statement) {
return statement->depends_on_keyspace(ks_name) && (!cf_name || statement->depends_on_column_family(*cf_name));
return statement->depends_on(ks_name, cf_name);
}
future<> query_processor::query_internal(

View File

@@ -450,11 +450,16 @@ bool result_set_builder::restrictions_filter::do_filter(const selection& selecti
}
auto clustering_columns_restrictions = _restrictions->get_clustering_columns_restrictions();
if (dynamic_pointer_cast<cql3::restrictions::multi_column_restriction>(clustering_columns_restrictions)) {
bool has_multi_col_clustering_restrictions =
dynamic_pointer_cast<cql3::restrictions::multi_column_restriction>(clustering_columns_restrictions) != nullptr;
if (has_multi_col_clustering_restrictions) {
clustering_key_prefix ckey = clustering_key_prefix::from_exploded(clustering_key);
return expr::is_satisfied_by(
bool multi_col_clustering_satisfied = expr::is_satisfied_by(
clustering_columns_restrictions->expression,
partition_key, clustering_key, static_row, row, selection, _options);
if (!multi_col_clustering_satisfied) {
return false;
}
}
auto static_row_iterator = static_row.iterator();
@@ -502,6 +507,13 @@ bool result_set_builder::restrictions_filter::do_filter(const selection& selecti
if (_skip_ck_restrictions) {
continue;
}
if (has_multi_col_clustering_restrictions) {
// Mixing multi column and single column restrictions on clustering
// key columns is forbidden.
// Since there are multi column restrictions we have to skip
// evaluating single column restrictions or we will get an error.
continue;
}
auto clustering_key_restrictions_map = _restrictions->get_single_column_clustering_key_restrictions();
auto restr_it = clustering_key_restrictions_map.find(cdef);
if (restr_it == clustering_key_restrictions_map.end()) {
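The filtering fix in this hunk changes the early return: a multi-column clustering restriction now short-circuits only when it is *not* satisfied, and single-column clustering restrictions are skipped when a multi-column one is present (mixing the two is forbidden). A control-flow sketch, with restrictions reduced to plain predicates (this is an illustration of the fixed logic, not Scylla's actual filtering code):

```python
def do_filter(row, multi_col_restriction, other_restrictions):
    # Before the fix, a satisfied multi-column restriction returned True
    # immediately, ignoring other_restrictions (issues #6200, #12014).
    if multi_col_restriction is not None:
        if not multi_col_restriction(row):
            return False  # short-circuit only on failure
    # When satisfied, the remaining (e.g. regular-column) restrictions
    # must also hold.
    return all(r(row) for r in other_restrictions)
```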

View File

@@ -46,13 +46,7 @@ uint32_t cql3::statements::authentication_statement::get_bound_terms() const {
return 0;
}
bool cql3::statements::authentication_statement::depends_on_keyspace(
const sstring& ks_name) const {
return false;
}
bool cql3::statements::authentication_statement::depends_on_column_family(
const sstring& cf_name) const {
bool cql3::statements::authentication_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const {
return false;
}

View File

@@ -55,9 +55,7 @@ public:
uint32_t get_bound_terms() const override;
bool depends_on_keyspace(const sstring& ks_name) const override;
bool depends_on_column_family(const sstring& cf_name) const override;
bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
future<> check_access(service::storage_proxy& proxy, const service::client_state& state) const override;

View File

@@ -48,13 +48,7 @@ uint32_t cql3::statements::authorization_statement::get_bound_terms() const {
return 0;
}
bool cql3::statements::authorization_statement::depends_on_keyspace(
const sstring& ks_name) const {
return false;
}
bool cql3::statements::authorization_statement::depends_on_column_family(
const sstring& cf_name) const {
bool cql3::statements::authorization_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const {
return false;
}

View File

@@ -59,9 +59,7 @@ public:
uint32_t get_bound_terms() const override;
bool depends_on_keyspace(const sstring& ks_name) const override;
bool depends_on_column_family(const sstring& cf_name) const override;
bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
future<> check_access(service::storage_proxy& proxy, const service::client_state& state) const override;

View File

@@ -98,14 +98,9 @@ batch_statement::batch_statement(type type_,
{
}
bool batch_statement::depends_on_keyspace(const sstring& ks_name) const
bool batch_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const
{
return false;
}
bool batch_statement::depends_on_column_family(const sstring& cf_name) const
{
return false;
return boost::algorithm::any_of(_statements, [&ks_name, &cf_name] (auto&& s) { return s.statement->depends_on(ks_name, cf_name); });
}
uint32_t batch_statement::get_bound_terms() const

View File

@@ -115,9 +115,7 @@ public:
std::unique_ptr<attributes> attrs,
cql_stats& stats);
virtual bool depends_on_keyspace(const sstring& ks_name) const override;
virtual bool depends_on_column_family(const sstring& cf_name) const override;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
virtual uint32_t get_bound_terms() const override;

View File

@@ -46,6 +46,7 @@
#include "cdc/cdc_extension.hh"
#include "gms/feature.hh"
#include "gms/feature_service.hh"
#include "utils/bloom_calculations.hh"
#include <boost/algorithm/string/predicate.hpp>
@@ -168,6 +169,16 @@ void cf_prop_defs::validate(const database& db, const schema::extensions_map& sc
throw exceptions::configuration_exception(KW_MAX_INDEX_INTERVAL + " must be greater than " + KW_MIN_INDEX_INTERVAL);
}
if (get_simple(KW_BF_FP_CHANCE)) {
double bloom_filter_fp_chance = get_double(KW_BF_FP_CHANCE, 0/*not used*/);
double min_bloom_filter_fp_chance = utils::bloom_calculations::min_supported_bloom_filter_fp_chance();
if (bloom_filter_fp_chance <= min_bloom_filter_fp_chance || bloom_filter_fp_chance > 1.0) {
throw exceptions::configuration_exception(format(
"{} must be larger than {} and less than or equal to 1.0 (got {})",
KW_BF_FP_CHANCE, min_bloom_filter_fp_chance, bloom_filter_fp_chance));
}
}
speculative_retry::from_sstring(get_string(KW_SPECULATIVE_RETRY, speculative_retry(speculative_retry::type::NONE, 0).to_sstring()));
}
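The bounds check added above can be sketched in a few lines; `MIN_FP_CHANCE` below is an illustrative placeholder for `utils::bloom_calculations::min_supported_bloom_filter_fp_chance()`, not its real value:

```python
# Illustrative placeholder for min_supported_bloom_filter_fp_chance().
MIN_FP_CHANCE = 0.01

def validate_bf_fp_chance(fp_chance: float) -> None:
    # Must be strictly greater than the minimum and at most 1.0,
    # mirroring: fp_chance <= min || fp_chance > 1.0 -> reject.
    if fp_chance <= MIN_FP_CHANCE or fp_chance > 1.0:
        raise ValueError(
            f"bloom_filter_fp_chance must be larger than {MIN_FP_CHANCE} "
            f"and less than or equal to 1.0 (got {fp_chance})")
```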

View File

@@ -571,12 +571,8 @@ modification_statement::validate(service::storage_proxy&, const service::client_
}
}
bool modification_statement::depends_on_keyspace(const sstring& ks_name) const {
return keyspace() == ks_name;
}
bool modification_statement::depends_on_column_family(const sstring& cf_name) const {
return column_family() == cf_name;
bool modification_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const {
return keyspace() == ks_name && (!cf_name || column_family() == *cf_name);
}
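The two old predicates are merged into one: a statement depends on (ks, cf) when the keyspace matches and, if a table name is given, it matches too. A sketch of the combined predicate:

```python
from typing import Optional

def depends_on(stmt_ks: str, stmt_cf: str,
               ks_name: str, cf_name: Optional[str]) -> bool:
    # Mirrors: keyspace() == ks_name && (!cf_name || column_family() == *cf_name)
    # A missing cf_name means "invalidate everything in this keyspace".
    return stmt_ks == ks_name and (cf_name is None or stmt_cf == cf_name)
```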
void modification_statement::add_operation(::shared_ptr<operation> op) {

View File

@@ -165,9 +165,7 @@ public:
// Validate before execute, using client state and current schema
void validate(service::storage_proxy&, const service::client_state& state) const override;
virtual bool depends_on_keyspace(const sstring& ks_name) const override;
virtual bool depends_on_column_family(const sstring& cf_name) const override;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
void add_operation(::shared_ptr<operation> op);

View File

@@ -67,12 +67,7 @@ future<> schema_altering_statement::grant_permissions_to_creator(const service::
return make_ready_future<>();
}
bool schema_altering_statement::depends_on_keyspace(const sstring& ks_name) const
{
return false;
}
bool schema_altering_statement::depends_on_column_family(const sstring& cf_name) const
bool schema_altering_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const
{
return false;
}

View File

@@ -79,9 +79,7 @@ protected:
*/
virtual future<> grant_permissions_to_creator(const service::client_state&) const;
virtual bool depends_on_keyspace(const sstring& ks_name) const override;
virtual bool depends_on_column_family(const sstring& cf_name) const override;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
virtual uint32_t get_bound_terms() const override;

View File

@@ -194,12 +194,8 @@ void select_statement::validate(service::storage_proxy&, const service::client_s
// Nothing to do, all validation has been done by raw_statement::prepare()
}
bool select_statement::depends_on_keyspace(const sstring& ks_name) const {
return keyspace() == ks_name;
}
bool select_statement::depends_on_column_family(const sstring& cf_name) const {
return column_family() == cf_name;
bool select_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const {
return keyspace() == ks_name && (!cf_name || column_family() == *cf_name);
}
const sstring& select_statement::keyspace() const {

View File

@@ -127,8 +127,7 @@ public:
virtual uint32_t get_bound_terms() const override;
virtual future<> check_access(service::storage_proxy& proxy, const service::client_state& state) const override;
virtual void validate(service::storage_proxy&, const service::client_state& state) const override;
virtual bool depends_on_keyspace(const sstring& ks_name) const override;
virtual bool depends_on_column_family(const sstring& cf_name) const override;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
virtual future<::shared_ptr<cql_transport::messages::result_message>> execute(query_processor& qp,
service::query_state& state, const query_options& options) const override;

View File

@@ -30,13 +30,7 @@ uint32_t service_level_statement::get_bound_terms() const {
return 0;
}
bool service_level_statement::depends_on_keyspace(
const sstring &ks_name) const {
return false;
}
bool service_level_statement::depends_on_column_family(
const sstring &cf_name) const {
bool service_level_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const {
return false;
}

View File

@@ -56,9 +56,7 @@ public:
uint32_t get_bound_terms() const override;
bool depends_on_keyspace(const sstring& ks_name) const override;
bool depends_on_column_family(const sstring& cf_name) const override;
bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
future<> check_access(service::storage_proxy& sp, const service::client_state& state) const override;

View File

@@ -43,7 +43,7 @@ void sl_prop_defs::validate() {
data_value v = duration_type->deserialize(duration_type->from_string(*repr));
cql_duration duration = static_pointer_cast<const duration_type_impl>(duration_type)->from_value(v);
if (duration.months || duration.days) {
throw exceptions::invalid_request_exception("Timeout values cannot be longer than 24h");
throw exceptions::invalid_request_exception("Timeout values cannot be expressed in days/months");
}
if (duration.nanoseconds % 1'000'000 != 0) {
throw exceptions::invalid_request_exception("Timeout values must be expressed in millisecond granularity");
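The validation rejects durations that carry a days or months component and requires whole milliseconds. A sketch over a (months, days, nanoseconds) triple standing in for `cql_duration`:

```python
def validate_timeout(months: int, days: int, nanoseconds: int) -> None:
    # Days and months have variable length, so they are rejected outright.
    if months or days:
        raise ValueError("Timeout values cannot be expressed in days/months")
    # 1 ms = 1'000'000 ns; any sub-millisecond remainder is rejected.
    if nanoseconds % 1_000_000 != 0:
        raise ValueError(
            "Timeout values must be expressed in millisecond granularity")
```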

View File

@@ -67,12 +67,7 @@ std::unique_ptr<prepared_statement> truncate_statement::prepare(database& db,cql
return std::make_unique<prepared_statement>(::make_shared<truncate_statement>(*this));
}
bool truncate_statement::depends_on_keyspace(const sstring& ks_name) const
{
return false;
}
bool truncate_statement::depends_on_column_family(const sstring& cf_name) const
bool truncate_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const
{
return false;
}

View File

@@ -58,9 +58,7 @@ public:
virtual std::unique_ptr<prepared_statement> prepare(database& db, cql_stats& stats) override;
virtual bool depends_on_keyspace(const sstring& ks_name) const override;
virtual bool depends_on_column_family(const sstring& cf_name) const override;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
virtual future<> check_access(service::storage_proxy& proxy, const service::client_state& state) const override;

View File

@@ -74,12 +74,7 @@ std::unique_ptr<prepared_statement> use_statement::prepare(database& db, cql_sta
}
bool use_statement::depends_on_keyspace(const sstring& ks_name) const
{
return false;
}
bool use_statement::depends_on_column_family(const sstring& cf_name) const
bool use_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const
{
return false;
}

View File

@@ -59,9 +59,7 @@ public:
virtual uint32_t get_bound_terms() const override;
virtual bool depends_on_keyspace(const seastar::sstring& ks_name) const override;
virtual bool depends_on_column_family(const seastar::sstring& cf_name) const override;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
virtual seastar::future<> check_access(service::storage_proxy& proxy, const service::client_state& state) const override;

View File

@@ -31,6 +31,8 @@
#include "types/listlike_partial_deserializing_iterator.hh"
#include "utils/managed_bytes.hh"
#include "exceptions/exceptions.hh"
#include <boost/algorithm/string/trim_all.hpp>
#include <boost/algorithm/string.hpp>
static inline bool is_control_char(char c) {
return c >= 0 && c <= 0x1F;
@@ -212,6 +214,17 @@ struct from_json_object_visitor {
}
bytes operator()(const boolean_type_impl& t) {
if (!value.IsBool()) {
if (value.IsString()) {
std::string str(rjson::to_string_view(value));
boost::trim_all(str);
boost::to_lower(str);
if (str == "true") {
return t.decompose(true);
} else if (str == "false") {
return t.decompose(false);
}
}
throw marshal_exception(format("Invalid JSON object {}", value));
}
return t.decompose(value.GetBool());
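The new branch accepts JSON string values for boolean columns after trimming and lower-casing. A sketch (Python's `strip()` stands in for `boost::trim_all`, which also collapses inner whitespace; that difference doesn't matter for "true"/"false"):

```python
def bool_from_json(value):
    # Native JSON booleans are accepted directly.
    if isinstance(value, bool):
        return value
    # New behavior: accept "true"/"false" strings, trimmed and
    # case-insensitive, mirroring the boost::trim_all + to_lower path.
    if isinstance(value, str):
        s = value.strip().lower()
        if s == "true":
            return True
        if s == "false":
            return False
    raise ValueError(f"Invalid JSON object {value!r}")
```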

View File

@@ -87,6 +87,13 @@ std::unique_ptr<cql3::statements::raw::select_statement> build_select_statement(
/// forbids non-alpha-numeric characters in identifier names.
/// Quoting involves wrapping the string in double-quotes ("). A double-quote
/// character itself is quoted by doubling it.
/// maybe_quote() also quotes reserved CQL keywords (e.g., "to", "where")
/// but doesn't quote *unreserved* keywords (like ttl, int or as).
/// Note that this means that if new reserved keywords are added to the
/// parser, a saved output of maybe_quote() may no longer be parsable by
/// the parser. To avoid this forward-compatibility issue, use quote() instead
/// of maybe_quote() - to unconditionally quote an identifier even if it is
/// lowercase and not (yet) a keyword.
sstring maybe_quote(const sstring& s);
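The quoting rule described in the comment can be sketched as follows; the keyword check is reduced to a small illustrative set, whereas the real implementation consults the CQL parser (`CqlParser::cident`) rather than a fixed list:

```python
import re

# Illustrative stand-in for the reserved-keyword check done via the parser.
RESERVED_KEYWORDS = {"to", "where", "select"}

def maybe_quote(identifier: str) -> str:
    # Valid unquoted identifiers match [a-z][a-z0-9_]* and are not
    # reserved keywords (unreserved keywords like "ttl" stay unquoted).
    if re.fullmatch(r"[a-z][a-z0-9_]*", identifier) and \
            identifier not in RESERVED_KEYWORDS:
        return identifier
    # Quote: wrap in double-quotes, doubling any embedded double-quote.
    return '"' + identifier.replace('"', '""') + '"'
```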
// Check whether timestamp is not too far in the future as this probably

View File

@@ -926,10 +926,9 @@ bool database::update_column_family(schema_ptr new_schema) {
return columns_changed;
}
future<> database::remove(const column_family& cf) noexcept {
void database::remove(const table& cf) noexcept {
auto s = cf.schema();
auto& ks = find_keyspace(s->ks_name());
co_await _querier_cache.evict_all_for_table(s->id());
_column_families.erase(s->id());
ks.metadata()->remove_column_family(s);
_ks_cf_to_uuid.erase(std::make_pair(s->ks_name(), s->cf_name()));
@@ -946,13 +945,20 @@ future<> database::drop_column_family(const sstring& ks_name, const sstring& cf_
auto& ks = find_keyspace(ks_name);
auto uuid = find_uuid(ks_name, cf_name);
auto cf = _column_families.at(uuid);
co_await remove(*cf);
remove(*cf);
cf->clear_views();
co_return co_await cf->await_pending_ops().then([this, &ks, cf, tsf = std::move(tsf), snapshot] {
return truncate(ks, *cf, std::move(tsf), snapshot).finally([this, cf] {
return cf->stop();
});
}).finally([cf] {});
co_await cf->await_pending_ops();
co_await _querier_cache.evict_all_for_table(cf->schema()->id());
std::exception_ptr ex;
try {
co_await truncate(ks, *cf, std::move(tsf), snapshot);
} catch (...) {
ex = std::current_exception();
}
co_await cf->stop();
if (ex) {
std::rethrow_exception(std::move(ex));
}
}
const utils::UUID& database::find_uuid(std::string_view ks, std::string_view cf) const {

View File

@@ -1384,6 +1384,7 @@ private:
Future update_write_metrics(Future&& f);
void update_write_metrics_for_timed_out_write();
future<> create_keyspace(const lw_shared_ptr<keyspace_metadata>&, bool is_bootstrap, system_keyspace system);
void remove(const table&) noexcept;
public:
static utils::UUID empty_version;
@@ -1582,7 +1583,6 @@ public:
bool update_column_family(schema_ptr s);
future<> drop_column_family(const sstring& ks_name, const sstring& cf_name, timestamp_func, bool with_snapshot = true);
future<> remove(const column_family&) noexcept;
const logalloc::region_group& dirty_memory_region_group() const {
return _dirty_memory_manager.region_group();

View File

@@ -39,6 +39,7 @@
*/
#include <chrono>
#include <exception>
#include <seastar/core/future-util.hh>
#include <seastar/core/do_with.hh>
#include <seastar/core/semaphore.hh>
@@ -306,6 +307,7 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
} catch (no_such_keyspace& ex) {
// should probably ignore and drop the batch
} catch (...) {
blogger.warn("Replay failed (will retry): {}", std::current_exception());
// timeout, overload etc.
// Do _not_ remove the batch, assuming we got a node write error.
// Since we don't have hints (which origin is satisfied with),

View File

@@ -860,6 +860,8 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"Flush tables in the system_schema keyspace after schema modification. This is required for crash recovery, but slows down tests and can be disabled for them")
, restrict_replication_simplestrategy(this, "restrict_replication_simplestrategy", liveness::LiveUpdate, value_status::Used, db::tri_mode_restriction_t::mode::FALSE, "Controls whether to disable SimpleStrategy replication. Can be true, false, or warn.")
, restrict_dtcs(this, "restrict_dtcs", liveness::LiveUpdate, value_status::Used, db::tri_mode_restriction_t::mode::WARN, "Controls whether to prevent setting DateTieredCompactionStrategy. Can be true, false, or warn.")
, cache_index_pages(this, "cache_index_pages", liveness::LiveUpdate, value_status::Used, true,
"Keep SSTable index pages in the global cache after a SSTable read. Expected to improve performance for workloads with big partitions, but may degrade performance for workloads with small partitions.")
, default_log_level(this, "default_log_level", value_status::Used)
, logger_log_level(this, "logger_log_level", value_status::Used)
, log_to_stdout(this, "log_to_stdout", value_status::Used)

View File

@@ -372,6 +372,8 @@ public:
named_value<tri_mode_restriction> restrict_replication_simplestrategy;
named_value<tri_mode_restriction> restrict_dtcs;
named_value<bool> cache_index_pages;
seastar::logging_settings logging_settings(const boost::program_options::variables_map&) const;
const db::extensions& extensions() const;

View File

@@ -119,8 +119,9 @@ future<> snapshot_ctl::take_column_family_snapshot(sstring ks_name, std::vector<
return check_snapshot_not_exist(ks_name, tag, tables).then([this, ks_name, tables, tag, sf] {
return do_with(std::vector<sstring>(std::move(tables)),[this, ks_name, tag, sf](const std::vector<sstring>& tables) {
return do_for_each(tables, [ks_name, tag, sf, this] (const sstring& table_name) {
if (table_name.find(".") != sstring::npos) {
throw std::invalid_argument("Cannot take a snapshot of a secondary index by itself. Run snapshot on the table that owns the index.");
auto& cf = _db.local().find_column_family(ks_name, table_name);
if (cf.schema()->is_view()) {
throw std::invalid_argument("Do not take a snapshot of a materialized view or a secondary index by itself. Run snapshot on the base table instead.");
}
return _db.invoke_on_all([ks_name, table_name, tag, sf] (database &db) {
auto& cf = db.find_column_family(ks_name, table_name);

View File

@@ -350,7 +350,11 @@ public:
view_filter_checking_visitor(const schema& base, const view_info& view)
: _base(base)
, _view(view)
, _selection(cql3::selection::selection::wildcard(_base.shared_from_this()))
, _selection(cql3::selection::selection::for_columns(_base.shared_from_this(),
boost::copy_range<std::vector<const column_definition*>>(
_base.regular_columns() | boost::adaptors::transformed([] (const column_definition& cdef) { return &cdef; }))
)
)
{}
void accept_new_partition(const partition_key& key, uint64_t row_count) {
@@ -887,13 +891,18 @@ void view_updates::generate_update(
bool same_row = true;
for (auto col_id : col_ids) {
auto* after = update.cells().find_cell(col_id);
// Note: multi-cell columns can't be part of the primary key.
auto& cdef = _base->regular_column_at(col_id);
if (existing) {
auto* before = existing->cells().find_cell(col_id);
// Note that this cell is necessarily atomic, because col_ids are
// view key columns, and keys must be atomic.
if (before && before->as_atomic_cell(cdef).is_live()) {
if (after && after->as_atomic_cell(cdef).is_live()) {
auto cmp = compare_atomic_cell_for_merge(before->as_atomic_cell(cdef), after->as_atomic_cell(cdef));
// We need to compare just the values of the keys, not
// metadata like the timestamp. This is because below,
// if the old and new view row have the same key, we need
// to be sure to reach the update_entry() case.
auto cmp = compare_unsigned(before->as_atomic_cell(cdef).value(), after->as_atomic_cell(cdef).value());
if (cmp != 0) {
same_row = false;
}
@@ -913,7 +922,13 @@ void view_updates::generate_update(
if (same_row) {
update_entry(base_key, update, *existing, now);
} else {
replace_entry(base_key, update, *existing, now);
// This code doesn't work if the old and new view row have the
// same key, because if they do we get both data and tombstone
// for the same timestamp (now) and the tombstone wins. This
// is why we need the "same_row" case above - it's not just a
// performance optimization.
delete_old_entry(base_key, *existing, update, now);
create_entry(base_key, update, now);
}
} else {
delete_old_entry(base_key, *existing, update, now);
@@ -1320,7 +1335,7 @@ future<> mutate_MV(
auto mut_ptr = remote_endpoints.empty() ? std::make_unique<frozen_mutation>(std::move(mut.fm)) : std::make_unique<frozen_mutation>(mut.fm);
tracing::trace(tr_state, "Locally applying view update for {}.{}; base token = {}; view token = {}",
mut.s->ks_name(), mut.s->cf_name(), base_token, view_token);
local_view_update = service::get_local_storage_proxy().mutate_locally(mut.s, *mut_ptr, std::move(tr_state), db::commitlog::force_sync::no).then_wrapped(
local_view_update = service::get_local_storage_proxy().mutate_locally(mut.s, *mut_ptr, tr_state, db::commitlog::force_sync::no).then_wrapped(
[s = mut.s, &stats, &cf_stats, tr_state, base_token, view_token, my_address, mut_ptr = std::move(mut_ptr),
units = sem_units.split(sem_units.count())] (future<>&& f) {
--stats.writes;

View File

@@ -164,10 +164,7 @@ private:
void delete_old_entry(const partition_key& base_key, const clustering_row& existing, const clustering_row& update, gc_clock::time_point now);
void do_delete_old_entry(const partition_key& base_key, const clustering_row& existing, const clustering_row& update, gc_clock::time_point now);
void update_entry(const partition_key& base_key, const clustering_row& update, const clustering_row& existing, gc_clock::time_point now);
void replace_entry(const partition_key& base_key, const clustering_row& update, const clustering_row& existing, gc_clock::time_point now) {
create_entry(base_key, update, now);
delete_old_entry(base_key, existing, update, now);
}
void update_entry_for_computed_column(const partition_key& base_key, const clustering_row& update, const std::optional<clustering_row>& existing, gc_clock::time_point now);
};
class view_update_builder {


@@ -215,6 +215,12 @@ public:
});
}
future<flush_permit> get_all_flush_permits() {
return get_units(_background_work_flush_serializer, _max_background_work).then([this] (auto&& units) {
return this->get_flush_permit(std::move(units));
});
}
bool has_extraneous_flushes_requested() const {
return _extraneous_flushes > 0;
}


@@ -127,10 +127,14 @@ WantedBy=multi-user.target
# - Storage: /path/to/file (inaccessible)
# - Storage: /path/to/file
#
# After systemd-v248, available coredump file output changed like this:
# - Storage: /path/to/file (present)
# We need to support both versions.
#
# reference: https://github.com/systemd/systemd/commit/47f50642075a7a215c9f7b600599cbfee81a2913
corefail = False
res = re.findall(r'Storage: (.*)$', coreinfo, flags=re.MULTILINE)
res = re.findall(r'Storage: (\S+)(?: \(.+\))?$', coreinfo, flags=re.MULTILINE)
# v232 or later
if res:
corepath = res[0]
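To illustrate the regex change above, here is a minimal standalone sketch (the sample `Storage:` lines are made up for illustration) showing that the old pattern swallows the `(present)` suffix that systemd v248+ appends, while the new pattern captures only the path in both output formats:

```python
import re

# Made-up sample coredumpctl output lines for both systemd versions.
v232_line = "Storage: /var/lib/systemd/coredump/core.scylla.1000.zst"
v248_line = "Storage: /var/lib/systemd/coredump/core.scylla.1000.zst (present)"

old_pattern = r'Storage: (.*)$'                 # greedy: keeps any "(...)" suffix
new_pattern = r'Storage: (\S+)(?: \(.+\))?$'    # path only; suffix optional

# Old pattern on v248 output yields a bogus path ending in " (present)".
old_result = re.findall(old_pattern, v248_line, flags=re.MULTILINE)
# New pattern extracts just the path from either format.
new_v232 = re.findall(new_pattern, v232_line, flags=re.MULTILINE)
new_v248 = re.findall(new_pattern, v248_line, flags=re.MULTILINE)
```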


@@ -101,6 +101,7 @@ run bash -ec "cat /scylla_bashrc >> /etc/bash.bashrc"
run mkdir -p /etc/supervisor.conf.d
run mkdir -p /var/log/scylla
run chown -R scylla:scylla /var/lib/scylla
run sed -i -e 's/^SCYLLA_ARGS=".*"$/SCYLLA_ARGS="--log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix"/' /etc/default/scylla-server
run mkdir -p /opt/scylladb/supervisor
run touch /opt/scylladb/SCYLLA-CONTAINER-FILE


@@ -1,41 +0,0 @@
# choose following mode: virtio, dpdk, posix
NETWORK_MODE=posix
# tap device name(virtio)
TAP=tap0
# bridge device name (virtio)
BRIDGE=virbr0
# ethernet device name
IFNAME=eth0
# setup NIC's and disks' interrupts, RPS, XPS, nomerges and I/O scheduler (posix)
SET_NIC_AND_DISKS=no
# ethernet device driver (dpdk)
ETHDRV=
# ethernet device PCI ID (dpdk)
ETHPCIID=
# number of hugepages
NR_HUGEPAGES=64
# user for process (must be root for dpdk)
USER=scylla
# group for process
GROUP=scylla
# scylla home dir
SCYLLA_HOME=/var/lib/scylla
# scylla config dir
SCYLLA_CONF=/etc/scylla
# scylla arguments
SCYLLA_ARGS="--log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix"
# setup as AMI instance
AMI=no


@@ -45,7 +45,7 @@
logging::logger fmr_logger("flat_mutation_reader");
flat_mutation_reader& flat_mutation_reader::operator=(flat_mutation_reader&& o) noexcept {
if (_impl) {
if (_impl && _impl->is_close_required()) {
impl* ip = _impl.get();
// Abort to enforce calling close() before readers are closed
// to prevent leaks and potential use-after-free due to background
@@ -58,7 +58,7 @@ flat_mutation_reader& flat_mutation_reader::operator=(flat_mutation_reader&& o)
}
flat_mutation_reader::~flat_mutation_reader() {
if (_impl) {
if (_impl && _impl->is_close_required()) {
impl* ip = _impl.get();
// Abort to enforce calling close() before readers are closed
// to prevent leaks and potential use-after-free due to background
@@ -1344,7 +1344,7 @@ void mutation_fragment_stream_validating_filter::on_end_of_stream() {
}
flat_mutation_reader_v2& flat_mutation_reader_v2::operator=(flat_mutation_reader_v2&& o) noexcept {
if (_impl) {
if (_impl && _impl->is_close_required()) {
impl* ip = _impl.get();
// Abort to enforce calling close() before readers are closed
// to prevent leaks and potential use-after-free due to background
@@ -1357,7 +1357,7 @@ flat_mutation_reader_v2& flat_mutation_reader_v2::operator=(flat_mutation_reader
}
flat_mutation_reader_v2::~flat_mutation_reader_v2() {
if (_impl) {
if (_impl && _impl->is_close_required()) {
impl* ip = _impl.get();
// Abort to enforce calling close() before readers are closed
// to prevent leaks and potential use-after-free due to background


@@ -142,6 +142,7 @@ public:
private:
tracked_buffer _buffer;
size_t _buffer_size = 0;
bool _close_required = false;
protected:
size_t max_buffer_size_in_bytes = default_max_buffer_size_in_bytes();
bool _end_of_stream = false;
@@ -175,6 +176,8 @@ public:
bool is_end_of_stream() const { return _end_of_stream; }
bool is_buffer_empty() const { return _buffer.empty(); }
bool is_buffer_full() const { return _buffer_size >= max_buffer_size_in_bytes; }
bool is_close_required() const { return _close_required; }
void set_close_required() { _close_required = true; }
static constexpr size_t default_max_buffer_size_in_bytes() { return 8 * 1024; }
mutation_fragment pop_mutation_fragment() {
@@ -506,9 +509,15 @@ public:
//
// Can be used to skip over entire partitions if interleaved with
// `operator()()` calls.
future<> next_partition() { return _impl->next_partition(); }
future<> next_partition() {
_impl->set_close_required();
return _impl->next_partition();
}
future<> fill_buffer() { return _impl->fill_buffer(); }
future<> fill_buffer() {
_impl->set_close_required();
return _impl->fill_buffer();
}
// Changes the range of partitions to pr. The range can only be moved
// forwards. pr.begin() needs to be larger than pr.end() of the previously used range.
@@ -517,6 +526,7 @@ public:
// pr needs to be valid until the reader is destroyed or fast_forward_to()
// is called again.
future<> fast_forward_to(const dht::partition_range& pr) {
_impl->set_close_required();
return _impl->fast_forward_to(pr);
}
// Skips to a later range of rows.
@@ -546,6 +556,7 @@ public:
// In particular one must first enter a partition by fetching a `partition_start`
// fragment before calling `fast_forward_to`.
future<> fast_forward_to(position_range cr) {
_impl->set_close_required();
return _impl->fast_forward_to(std::move(cr));
}
// Closes the reader.


@@ -177,6 +177,7 @@ public:
private:
tracked_buffer _buffer;
size_t _buffer_size = 0;
bool _close_required = false;
protected:
size_t max_buffer_size_in_bytes = default_max_buffer_size_in_bytes();
@@ -216,6 +217,8 @@ public:
bool is_end_of_stream() const { return _end_of_stream; }
bool is_buffer_empty() const { return _buffer.empty(); }
bool is_buffer_full() const { return _buffer_size >= max_buffer_size_in_bytes; }
bool is_close_required() const { return _close_required; }
void set_close_required() { _close_required = true; }
static constexpr size_t default_max_buffer_size_in_bytes() { return 8 * 1024; }
mutation_fragment_v2 pop_mutation_fragment() {
@@ -547,9 +550,15 @@ public:
//
// Can be used to skip over entire partitions if interleaved with
// `operator()()` calls.
future<> next_partition() { return _impl->next_partition(); }
future<> next_partition() {
_impl->set_close_required();
return _impl->next_partition();
}
future<> fill_buffer() { return _impl->fill_buffer(); }
future<> fill_buffer() {
_impl->set_close_required();
return _impl->fill_buffer();
}
// Changes the range of partitions to pr. The range can only be moved
// forwards. pr.begin() needs to be larger than pr.end() of the previously used range.
@@ -558,6 +567,7 @@ public:
// pr needs to be valid until the reader is destroyed or fast_forward_to()
// is called again.
future<> fast_forward_to(const dht::partition_range& pr) {
_impl->set_close_required();
return _impl->fast_forward_to(pr);
}
// Skips to a later range of rows.
@@ -587,6 +597,7 @@ public:
// In particular one must first enter a partition by fetching a `partition_start`
// fragment before calling `fast_forward_to`.
future<> fast_forward_to(position_range cr) {
_impl->set_close_required();
return _impl->fast_forward_to(std::move(cr));
}
// Closes the reader.


@@ -1672,6 +1672,10 @@ bool gossiper::is_normal(const inet_address& endpoint) const {
return get_gossip_status(endpoint) == sstring(versioned_value::STATUS_NORMAL);
}
bool gossiper::is_left(const inet_address& endpoint) const {
return get_gossip_status(endpoint) == sstring(versioned_value::STATUS_LEFT);
}
bool gossiper::is_normal_ring_member(const inet_address& endpoint) const {
auto status = get_gossip_status(endpoint);
return status == sstring(versioned_value::STATUS_NORMAL) || status == sstring(versioned_value::SHUTDOWN);


@@ -571,6 +571,7 @@ public:
bool is_seed(const inet_address& endpoint) const;
bool is_shutdown(const inet_address& endpoint) const;
bool is_normal(const inet_address& endpoint) const;
bool is_left(const inet_address& endpoint) const;
// Check if a node is in NORMAL or SHUTDOWN status which means the node is
// part of the token ring from the gossip point of view and operates in
// normal status or was in normal status but is shutdown.


@@ -61,6 +61,10 @@ azure_snitch::azure_snitch(const sstring& fname, unsigned io_cpuid) : production
}
future<> azure_snitch::load_config() {
if (this_shard_id() != io_cpu_id()) {
co_return;
}
sstring region = co_await azure_api_call(REGION_NAME_QUERY_PATH);
sstring azure_zone = co_await azure_api_call(ZONE_NAME_QUERY_PATH);


@@ -1,5 +1,7 @@
#include "locator/ec2_snitch.hh"
#include <seastar/core/seastar.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/do_with.hh>
#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>
@@ -67,6 +69,30 @@ future<> ec2_snitch::start() {
}
future<sstring> ec2_snitch::aws_api_call(sstring addr, uint16_t port, sstring cmd) {
return do_with(int(0), [this, addr, port, cmd] (int& i) {
return repeat_until_value([this, addr, port, cmd, &i]() -> future<std::optional<sstring>> {
++i;
return aws_api_call_once(addr, port, cmd).then([] (auto res) {
return make_ready_future<std::optional<sstring>>(std::move(res));
}).handle_exception([&i] (auto ep) {
try {
std::rethrow_exception(ep);
} catch (const std::system_error &e) {
logger().error(e.what());
if (i >= AWS_API_CALL_RETRIES - 1) {
logger().error("Maximum number of retries exceeded");
throw e;
}
}
return sleep(AWS_API_CALL_RETRY_INTERVAL).then([] {
return make_ready_future<std::optional<sstring>>(std::nullopt);
});
});
});
});
}
future<sstring> ec2_snitch::aws_api_call_once(sstring addr, uint16_t port, sstring cmd) {
return connect(socket_address(inet_address{addr}, port))
.then([this, addr, cmd] (connected_socket fd) {
_sd = std::move(fd);
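The diff above wraps the EC2 metadata call in a bounded retry loop: up to `AWS_API_CALL_RETRIES` attempts, sleeping `AWS_API_CALL_RETRY_INTERVAL` between them, and re-raising the last error once the budget is spent. A minimal Python sketch of the same policy (the function and parameter names here are ours, not Scylla's):

```python
import time

def call_with_retries(call, retries=5, interval=5.0):
    """Invoke `call` until it succeeds, sleeping `interval` seconds
    between attempts; re-raise the last error once `retries` attempts
    have been made (mirroring AWS_API_CALL_RETRIES)."""
    for attempt in range(1, retries + 1):
        try:
            return call()
        except OSError:
            if attempt >= retries:
                raise  # maximum number of retries exceeded
            time.sleep(interval)
```

As in the C++ version, only the final failure propagates; transient errors merely delay the next attempt.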


@@ -29,6 +29,8 @@ public:
static constexpr const char* ZONE_NAME_QUERY_REQ = "/latest/meta-data/placement/availability-zone";
static constexpr const char* AWS_QUERY_SERVER_ADDR = "169.254.169.254";
static constexpr uint16_t AWS_QUERY_SERVER_PORT = 80;
static constexpr int AWS_API_CALL_RETRIES = 5;
static constexpr auto AWS_API_CALL_RETRY_INTERVAL = std::chrono::seconds{5};
ec2_snitch(const sstring& fname = "", unsigned io_cpu_id = 0);
virtual future<> start() override;
@@ -45,5 +47,6 @@ private:
output_stream<char> _out;
http_response_parser _parser;
sstring _zone_req;
future<sstring> aws_api_call_once(sstring addr, uint16_t port, const sstring cmd);
};
} // namespace locator


@@ -562,6 +562,12 @@ int main(int ac, char** av) {
cfg->broadcast_to_all_shards().get();
// We pass this piece of config through a global as a temporary hack.
// See the comment at the definition of sstables::global_cache_index_pages.
smp::invoke_on_all([&cfg] {
sstables::global_cache_index_pages = cfg->cache_index_pages.operator utils::updateable_value<bool>();
}).get();
::sighup_handler sighup_handler(opts, *cfg);
auto stop_sighup_handler = defer_verbose_shutdown("sighup", [&] {
sighup_handler.stop().get();


@@ -30,6 +30,7 @@
#include <seastar/core/io_priority_class.hh>
class memtable;
class reader_permit;
class flat_mutation_reader;
namespace sstables {


@@ -442,6 +442,8 @@ static constexpr unsigned do_get_rpc_client_idx(messaging_verb verb) {
case messaging_verb::GOSSIP_ECHO:
case messaging_verb::GOSSIP_GET_ENDPOINT_STATES:
case messaging_verb::GET_SCHEMA_VERSION:
// ATTN -- if moving GOSSIP_ verbs elsewhere, mind updating the tcp_nodelay
// setting in get_rpc_client(), which assumes gossiper verbs live in idx 0
return 0;
case messaging_verb::PREPARE_MESSAGE:
case messaging_verb::PREPARE_DONE_MESSAGE:
@@ -689,7 +691,7 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge
}();
auto must_tcp_nodelay = [&] {
if (idx == 1) {
if (idx == 0) {
return true; // gossip
}
if (_cfg.tcp_nodelay == tcp_nodelay_what::local) {
@@ -710,10 +712,7 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge
}
opts.tcp_nodelay = must_tcp_nodelay;
opts.reuseaddr = true;
// We send cookies only for non-default statement tenant clients.
if (idx > 3) {
opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie;
}
opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie;
auto client = must_encrypt ?
::make_shared<rpc_protocol_client_wrapper>(_rpc->protocol(), std::move(opts),


@@ -283,8 +283,8 @@ public:
future<> lookup_readers(db::timeout_clock::time_point timeout);
future<> save_readers(flat_mutation_reader::tracked_buffer unconsumed_buffer, detached_compaction_state compaction_state,
std::optional<clustering_key_prefix> last_ckey);
future<> save_readers(flat_mutation_reader::tracked_buffer unconsumed_buffer, std::optional<detached_compaction_state> compaction_state,
dht::decorated_key last_pkey, std::optional<clustering_key_prefix> last_ckey);
future<> stop();
};
@@ -583,19 +583,22 @@ future<> read_context::lookup_readers(db::timeout_clock::time_point timeout) {
});
}
future<> read_context::save_readers(flat_mutation_reader::tracked_buffer unconsumed_buffer, detached_compaction_state compaction_state,
std::optional<clustering_key_prefix> last_ckey) {
future<> read_context::save_readers(flat_mutation_reader::tracked_buffer unconsumed_buffer, std::optional<detached_compaction_state> compaction_state,
dht::decorated_key last_pkey, std::optional<clustering_key_prefix> last_ckey) {
if (_cmd.query_uuid == utils::UUID{}) {
return make_ready_future<>();
}
auto last_pkey = compaction_state.partition_start.key();
const auto cb_stats = dismantle_combined_buffer(std::move(unconsumed_buffer), last_pkey);
tracing::trace(_trace_state, "Dismantled combined buffer: {}", cb_stats);
const auto cs_stats = dismantle_compaction_state(std::move(compaction_state));
tracing::trace(_trace_state, "Dismantled compaction state: {}", cs_stats);
auto cs_stats = dismantle_buffer_stats{};
if (compaction_state) {
cs_stats = dismantle_compaction_state(std::move(*compaction_state));
tracing::trace(_trace_state, "Dismantled compaction state: {}", cs_stats);
} else {
tracing::trace(_trace_state, "No compaction state to dismantle, partition exhausted", cs_stats);
}
return do_with(std::move(last_pkey), std::move(last_ckey), [this] (const dht::decorated_key& last_pkey,
const std::optional<clustering_key_prefix>& last_ckey) {
@@ -694,16 +697,18 @@ future<typename ResultBuilder::result_type> do_query(
ResultBuilder&& result_builder) {
auto ctx = seastar::make_shared<read_context>(db, s, cmd, ranges, trace_state, timeout);
co_await ctx->lookup_readers(timeout);
std::exception_ptr ex;
try {
co_await ctx->lookup_readers(timeout);
auto [last_ckey, result, unconsumed_buffer, compaction_state] = co_await read_page<ResultBuilder>(ctx, s, cmd, ranges, trace_state,
std::move(result_builder));
if (compaction_state->are_limits_reached() || result.is_short_read()) {
co_await ctx->save_readers(std::move(unconsumed_buffer), std::move(*compaction_state).detach_state(), std::move(last_ckey));
// Must happen before the call to `detach_state()` below.
auto last_pkey = *compaction_state->current_partition();
co_await ctx->save_readers(std::move(unconsumed_buffer), std::move(*compaction_state).detach_state(), std::move(last_pkey), std::move(last_ckey));
}
co_await ctx->stop();


@@ -175,6 +175,9 @@ class compact_mutation_state {
std::unique_ptr<mutation_compactor_garbage_collector> _collector;
compaction_stats _stats;
// Remember if we requested to stop mid-partition.
stop_iteration _stop = stop_iteration::no;
private:
static constexpr bool only_live() {
return OnlyLive == emit_only_live_rows::yes;
@@ -270,6 +273,7 @@ public:
}
void consume_new_partition(const dht::decorated_key& dk) {
_stop = stop_iteration::no;
auto& pk = dk.key();
_dk = &dk;
_return_static_content_on_partition_with_no_rows =
@@ -323,9 +327,9 @@ public:
_static_row_live = is_live;
if (is_live || (!only_live() && !sr.empty())) {
partition_is_not_empty(consumer);
return consumer.consume(std::move(sr), current_tombstone, is_live);
_stop = consumer.consume(std::move(sr), current_tombstone, is_live);
}
return stop_iteration::no;
return _stop;
}
template <typename Consumer, typename GCConsumer>
@@ -370,23 +374,22 @@ public:
if (only_live() && is_live) {
partition_is_not_empty(consumer);
auto stop = consumer.consume(std::move(cr), t, true);
_stop = consumer.consume(std::move(cr), t, true);
if (++_rows_in_current_partition == _current_partition_limit) {
return stop_iteration::yes;
_stop = stop_iteration::yes;
}
return stop;
return _stop;
} else if (!only_live()) {
auto stop = stop_iteration::no;
if (!cr.empty()) {
partition_is_not_empty(consumer);
stop = consumer.consume(std::move(cr), t, is_live);
_stop = consumer.consume(std::move(cr), t, is_live);
}
if (!sstable_compaction() && is_live && ++_rows_in_current_partition == _current_partition_limit) {
return stop_iteration::yes;
_stop = stop_iteration::yes;
}
return stop;
return _stop;
}
return stop_iteration::no;
return _stop;
}
template <typename Consumer, typename GCConsumer>
@@ -398,13 +401,13 @@ public:
if (rt.tomb > _range_tombstones.get_partition_tombstone()) {
if (can_purge_tombstone(rt.tomb)) {
partition_is_not_empty_for_gc_consumer(gc_consumer);
return gc_consumer.consume(std::move(rt));
_stop = gc_consumer.consume(std::move(rt));
} else {
partition_is_not_empty(consumer);
return consumer.consume(std::move(rt));
_stop = consumer.consume(std::move(rt));
}
}
return stop_iteration::no;
return _stop;
}
template <typename Consumer, typename GCConsumer>
@@ -492,9 +495,24 @@ public:
/// compactor will result in the new compactor being in the same state *this
/// is (given the same outside parameters of course). Practically this
/// allows the compaction state to be stored in the compacted reader.
detached_compaction_state detach_state() && {
/// If the currently compacted partition is exhausted a disengaged optional
/// is returned -- in this case there is no state to detach.
std::optional<detached_compaction_state> detach_state() && {
// If we exhausted the partition, there is no need to detach-restore the
// compaction state.
// We exhausted the partition if `consume_partition_end()` was called
// without us requesting the consumption to stop (remembered in _stop)
// from one of the consume() overloads.
// The consume algorithm calls `consume_partition_end()` in two cases:
// * on a partition-end fragment
// * consume() requested to stop
// In the latter case, the partition is not exhausted. Even if the next
// fragment to process is a partition-end, it will not be consumed.
if (!_stop) {
return {};
}
partition_start ps(std::move(_last_dk), _range_tombstones.get_partition_tombstone());
return {std::move(ps), std::move(_last_static_row), std::move(_range_tombstones).range_tombstones()};
return detached_compaction_state{std::move(ps), std::move(_last_static_row), std::move(_range_tombstones).range_tombstones()};
}
const compaction_stats& stats() const { return _stats; }


@@ -843,7 +843,6 @@ public:
void apply(shadowable_tombstone deleted_at) {
_deleted_at.apply(deleted_at, _marker);
maybe_shadow();
}
void apply(row_tombstone deleted_at) {


@@ -305,14 +305,23 @@ class partition_snapshot_flat_reader : public flat_mutation_reader::impl, public
const std::optional<position_in_partition>& last_row,
const std::optional<position_in_partition>& last_rts,
position_in_partition_view pos) {
if (!_rt_stream.empty()) {
return _rt_stream.get_next(std::move(pos));
}
return in_alloc_section([&] () -> mutation_fragment_opt {
maybe_refresh_state(ck_range_snapshot, last_row, last_rts);
position_in_partition::less_compare rt_less(_query_schema);
// The while below moves range tombstones from partition versions
// into _rt_stream, just enough to produce the next range tombstone
// The main goal behind moving to _rt_stream is to deoverlap range tombstones
// which have the same starting position. This is not in order to satisfy
// flat_mutation_reader stream requirements, the reader can emit range tombstones
// which have the same position incrementally. This is to guarantee forward
// progress in the case iterators get invalidated and maybe_refresh_state()
// above needs to restore them. It does so using last_rts, which tracks
// the position of the last emitted range tombstone. All range tombstones
// with positions <= than last_rts are skipped on refresh. To make progress,
// we need to make sure that all range tombstones with duplicated positions
// are emitted before maybe_refresh_state().
while (has_more_range_tombstones()
&& !rt_less(pos, peek_range_tombstone().position())
&& (_rt_stream.empty() || !rt_less(_rt_stream.peek_next().position(), peek_range_tombstone().position()))) {


@@ -325,7 +325,7 @@ public:
// When throws, the cursor is invalidated and its position is not changed.
bool advance_to(position_in_partition_view lower_bound) {
prepare_heap(lower_bound);
bool found = no_clustering_row_between(_schema, lower_bound, _heap[0].it->position());
bool found = no_clustering_row_between_weak(_schema, lower_bound, _heap[0].it->position());
recreate_current_row();
return found;
}


@@ -575,6 +575,20 @@ bool no_clustering_row_between(const schema& s, position_in_partition_view a, po
}
}
// Returns true if and only if there can't be any clustering_row with position >= a and < b.
// It is assumed that a <= b.
inline
bool no_clustering_row_between_weak(const schema& s, position_in_partition_view a, position_in_partition_view b) {
clustering_key_prefix::equality eq(s);
if (a.has_key() && b.has_key()) {
return eq(a.key(), b.key())
&& (a.get_bound_weight() == bound_weight::after_all_prefixed
|| b.get_bound_weight() != bound_weight::after_all_prefixed);
} else {
return !a.has_key() && !b.has_key();
}
}
// Includes all position_in_partition objects "p" for which: start <= p < end
// And only those.
class position_range {
@@ -659,3 +673,9 @@ inline
bool position_range::is_all_clustered_rows(const schema& s) const {
return _start.is_before_all_clustered_rows(s) && _end.is_after_all_clustered_rows(s);
}
// Assumes that the bounds of `r` are of 'clustered' type
// and that `r` is non-empty (the left bound is smaller than the right bound).
//
// If `r` does not contain any keys, returns nullopt.
std::optional<query::clustering_range> position_range_to_clustering_range(const position_range& r, const schema&);


@@ -379,3 +379,52 @@ foreign_ptr<lw_shared_ptr<query::result>> result_merger::get() {
}
}
std::optional<query::clustering_range> position_range_to_clustering_range(const position_range& r, const schema& s) {
assert(r.start().get_type() == partition_region::clustered);
assert(r.end().get_type() == partition_region::clustered);
if (r.start().has_key() && r.end().has_key()
&& clustering_key_prefix::equality(s)(r.start().key(), r.end().key())) {
assert(r.start().get_bound_weight() != r.end().get_bound_weight());
if (r.end().get_bound_weight() == bound_weight::after_all_prefixed
&& r.start().get_bound_weight() != bound_weight::after_all_prefixed) {
// [before x, after x) and [for x, after x) get converted to [x, x].
return query::clustering_range::make_singular(r.start().key());
}
// [before x, for x) does not contain any keys.
return std::nullopt;
}
// position_range -> clustering_range
// (recall that position_ranges are always left-closed, right opened):
// [before x, ...), [for x, ...) -> [x, ...
// [after x, ...) -> (x, ...
// [..., before x), [..., for x) -> ..., x)
// [..., after x) -> ..., x]
auto to_bound = [&s] (const position_in_partition& p, bool left) -> std::optional<query::clustering_range::bound> {
if (p.is_before_all_clustered_rows(s)) {
assert(left);
return {};
}
if (p.is_after_all_clustered_rows(s)) {
assert(!left);
return {};
}
assert(p.has_key());
auto bw = p.get_bound_weight();
bool inclusive = left
? bw != bound_weight::after_all_prefixed
: bw == bound_weight::after_all_prefixed;
return query::clustering_range::bound{p.key(), inclusive};
};
return query::clustering_range{to_bound(r.start(), true), to_bound(r.end(), false)};
}
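The bound-weight-to-inclusivity rules inside `to_bound` can be modeled in isolation; the enum below is a hypothetical Python stand-in for Scylla's `bound_weight`, with the same left/right asymmetry that follows from position_ranges being left-closed and right-open:

```python
from enum import Enum

class BoundWeight(Enum):
    BEFORE_ALL_PREFIXED = -1   # position "before x"
    EQUAL = 0                  # position "for x"
    AFTER_ALL_PREFIXED = 1     # position "after x"

def bound_inclusive(weight, left):
    """Whether a clustering-range bound derived from a position with the
    given weight includes the key; mirrors the ternary in to_bound()."""
    if left:
        # [before x, ...) and [for x, ...) start at x; [after x, ...) excludes it.
        return weight != BoundWeight.AFTER_ALL_PREFIXED
    # [..., after x) covers x; [..., before x) and [..., for x) stop short of it.
    return weight == BoundWeight.AFTER_ALL_PREFIXED
```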


@@ -109,7 +109,7 @@ void range_tombstone_list::insert_from(const schema& s,
if (cmp(end, it->position()) < 0) {
// not overlapping
if (it->tombstone().tomb == tomb && cmp(end, it->position()) == 0) {
rev.update(it, {std::move(start), std::move(start), tomb});
rev.update(it, {std::move(start), std::move(end), tomb});
} else {
auto rt = construct_range_tombstone_entry(std::move(start), std::move(end), tomb);
rev.insert(it, *rt);


@@ -22,6 +22,7 @@
#pragma once
#include "mutation_fragment.hh"
#include "mutation_fragment_v2.hh"
#include "converting_mutation_partition_applier.hh"
// A StreamedMutationTransformer which transforms the stream to a different schema

Submodule seastar updated: 34e58f9995...6217d6ff4e


@@ -1155,7 +1155,7 @@ private:
}
index_reader& get_index_reader() {
if (!_index_reader) {
auto caching = use_caching(!_slice.options.contains(query::partition_slice::option::bypass_cache));
auto caching = use_caching(global_cache_index_pages && !_slice.options.contains(query::partition_slice::option::bypass_cache));
_index_reader = std::make_unique<index_reader>(_sst, _consumer.permit(), _consumer.io_priority(),
_consumer.trace_state(), caching);
}


@@ -1308,7 +1308,7 @@ private:
}
index_reader& get_index_reader() {
if (!_index_reader) {
auto caching = use_caching(!_slice.options.contains(query::partition_slice::option::bypass_cache));
auto caching = use_caching(global_cache_index_pages && !_slice.options.contains(query::partition_slice::option::bypass_cache));
_index_reader = std::make_unique<index_reader>(_sst, _consumer.permit(), _consumer.io_priority(),
_consumer.trace_state(), caching);
}
@@ -1745,9 +1745,7 @@ public:
_monitor.on_read_started(_context->reader_position());
}
public:
void on_out_of_clustering_range() override {
push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, partition_end()));
}
void on_out_of_clustering_range() override { }
virtual future<> fast_forward_to(const dht::partition_range& pr) override {
on_internal_error(sstlog, "mx_crawling_sstable_mutation_reader: doesn't support fast_forward_to(const dht::partition_range&)");
}


@@ -68,7 +68,12 @@ private:
entry(entry&&) noexcept = default;
~entry() {
assert(!is_referenced());
if (is_referenced()) {
// Live entry_ptr should keep the entry alive, except when the entry failed on loading.
// In that case, entry_ptr holders are not supposed to use the pointer, so it's safe
// to nullify those entry_ptrs.
assert(!ready());
}
}
void on_evicted() noexcept override;


@@ -400,10 +400,15 @@ void time_series_sstable_set::for_each_sstable(std::function<void(const shared_s
// O(log n)
void time_series_sstable_set::insert(shared_sstable sst) {
try {
auto min_pos = sst->min_position();
auto max_pos_reversed = sst->max_position().reversed();
_sstables->emplace(std::move(min_pos), sst);
_sstables_reversed->emplace(std::move(max_pos_reversed), std::move(sst));
} catch (...) {
erase(sst);
throw;
}
}
// O(n) worst case, but should be close to O(log n) most of the time


@@ -94,6 +94,18 @@ thread_local disk_error_signal_type sstable_write_error;
namespace sstables {
// The below flag governs the mode of index file page caching used by the index
// reader.
//
// If set to true, the reader will read and/or populate a common global cache,
// which shares its capacity with the row cache. If false, the reader will use
// BYPASS CACHE semantics for index caching.
//
// This flag is intended to be a temporary hack. The goal is to eventually
// solve index caching problems via a smart cache replacement policy.
//
thread_local utils::updateable_value<bool> global_cache_index_pages(false);
logging::logger sstlog("sstable");
// Because this is a noop and won't hold any state, it is better to use a global than a


@@ -61,6 +61,7 @@
#include "sstables/open_info.hh"
#include "query-request.hh"
#include "mutation_fragment_stream_validator.hh"
#include "utils/updateable_value.hh"
#include <seastar/util/optimized_optional.hh>
@@ -70,6 +71,8 @@ class cached_file;
namespace sstables {
extern thread_local utils::updateable_value<bool> global_cache_index_pages;
namespace mc {
class writer;
}


@@ -1493,13 +1493,14 @@ bool table::can_flush() const {
}
future<> table::clear() {
auto permits = co_await _config.dirty_memory_manager->get_all_flush_permits();
if (_commitlog) {
for (auto& t : *_memtables) {
_commitlog->discard_completed_segments(_schema->id(), t->get_and_discard_rp_set());
}
}
_memtables->clear_and_add();
return _cache.invalidate(row_cache::external_updater([] { /* There is no underlying mutation source */ }));
co_await _cache.invalidate(row_cache::external_updater([] { /* There is no underlying mutation source */ }));
}
// NOTE: does not need to be futurized, but might eventually, depending on


@@ -107,9 +107,9 @@ def test_describe_table_size(test_table):
# Test the ProvisionedThroughput attribute returned by DescribeTable.
# This is a very partial test: Our test table is configured without
# provisioned throughput, so obviously it will not have interesting settings
# for it. DynamoDB returns zeros for some of the attributes, even though
# the documentation suggests missing values should have been fine too.
@pytest.mark.xfail(reason="DescribeTable does not return provisioned throughput")
# for it. But DynamoDB documents that zeros be returned for WriteCapacityUnits
# and ReadCapacityUnits, and does this in practice as well - and some
# applications assume these numbers are always there (even if 0).
def test_describe_table_provisioned_throughput(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['ProvisionedThroughput']['NumberOfDecreasesToday'] == 0


@@ -438,6 +438,126 @@ def test_gsi_update_second_regular_base_column(test_table_gsi_3):
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
# Test reproducing issue #11801: In issue #5006 we noticed that in the special
# case of a GSI with two non-key attributes as keys (test_table_gsi_3),
# an update of the second attribute forgot to delete the old row. We fixed
# that bug, but a bug remained for updates which update the value to the *same*
# value - in that case the old row shouldn't be deleted, but we did - as
# noticed in issue #11801.
def test_11801(test_table_gsi_3):
    p = random_string()
    a = random_string()
    b = random_string()
    item = {'p': p, 'a': a, 'b': b, 'd': random_string()}
    test_table_gsi_3.put_item(Item=item)
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
    # Update the attribute 'b' to the same value b that it already had.
    # This shouldn't change anything in the base table or in the GSI.
    test_table_gsi_3.update_item(Key={'p': p}, AttributeUpdates={'b': {'Value': b, 'Action': 'PUT'}})
    assert item == test_table_gsi_3.get_item(Key={'p': p}, ConsistentRead=True)['Item']
    # In issue #11801, the following assertion failed (the view row was
    # deleted and nothing matched the query).
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
    # Above we checked that setting 'b' to the same value didn't remove
    # the old GSI row. But the same update may actually modify the GSI row
    # (e.g., an unrelated attribute d) - check this modification took place:
    item['d'] = random_string()
    test_table_gsi_3.update_item(Key={'p': p},
        AttributeUpdates={'b': {'Value': b, 'Action': 'PUT'},
                          'd': {'Value': item['d'], 'Action': 'PUT'}})
    assert item == test_table_gsi_3.get_item(Key={'p': p}, ConsistentRead=True)['Item']
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
# This test is the same as test_11801, but updating the first attribute (a)
# instead of the second (b). This test didn't fail, showing that issue #11801
# is - like #5006 - specific to the case of updating the second attribute.
def test_11801_variant1(test_table_gsi_3):
    p = random_string()
    a = random_string()
    b = random_string()
    d = random_string()
    item = {'p': p, 'a': a, 'b': b, 'd': d}
    test_table_gsi_3.put_item(Item=item)
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
    test_table_gsi_3.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': a, 'Action': 'PUT'}})
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
# This test is the same as test_11801, but updates b to a different value
# (newb) instead of to the same one. This test didn't fail, showing that
# issue #11801 is specific to updates to the same value. This test basically
# reproduces the already-fixed #5006 (we also have another test above which
# reproduces that issue - test_gsi_update_second_regular_base_column())
def test_11801_variant2(test_table_gsi_3):
    p = random_string()
    a = random_string()
    b = random_string()
    item = {'p': p, 'a': a, 'b': b, 'd': random_string()}
    test_table_gsi_3.put_item(Item=item)
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
    newb = random_string()
    item['b'] = newb
    test_table_gsi_3.update_item(Key={'p': p}, AttributeUpdates={'b': {'Value': newb, 'Action': 'PUT'}})
    assert_index_query(test_table_gsi_3, 'hello', [],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [newb], 'ComparisonOperator': 'EQ'}})
# This test is the same as test_11801, but uses a different table schema
# (test_table_gsi_5) where there is only one new key column in the view (x).
# This test passed, showing that issue #11801 was specific to the special
# case of a view with two new key columns (test_table_gsi_3).
def test_11801_variant3(test_table_gsi_5):
    p = random_string()
    c = random_string()
    x = random_string()
    item = {'p': p, 'c': c, 'x': x, 'd': random_string()}
    test_table_gsi_5.put_item(Item=item)
    assert_index_query(test_table_gsi_5, 'hello', [item],
        KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
                       'x': {'AttributeValueList': [x], 'ComparisonOperator': 'EQ'}})
    test_table_gsi_5.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'x': {'Value': x, 'Action': 'PUT'}})
    assert item == test_table_gsi_5.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
    assert_index_query(test_table_gsi_5, 'hello', [item],
        KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
                       'x': {'AttributeValueList': [x], 'ComparisonOperator': 'EQ'}})
# Another test similar to test_11801, but instead of updating a view key
# column to the same value it already has, simply don't update it at all
# (and just modify some other regular column). This test passed, showing
# that issue #11801 is specific to the case of updating a view key column
# to the same value it already had.
def test_11801_variant4(test_table_gsi_3):
    p = random_string()
    a = random_string()
    b = random_string()
    item = {'p': p, 'a': a, 'b': b, 'd': random_string()}
    test_table_gsi_3.put_item(Item=item)
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
    # An update that doesn't change the GSI keys (a or b), just a regular
    # column d.
    item['d'] = random_string()
    test_table_gsi_3.update_item(Key={'p': p}, AttributeUpdates={'d': {'Value': item['d'], 'Action': 'PUT'}})
    assert item == test_table_gsi_3.get_item(Key={'p': p}, ConsistentRead=True)['Item']
    assert_index_query(test_table_gsi_3, 'hello', [item],
        KeyConditions={'a': {'AttributeValueList': [a], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [b], 'ComparisonOperator': 'EQ'}})
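The rule these tests probe can be sketched in plain Python. This is an illustrative model only - the helper `view_row_ops`, its arguments, and the op tuples are hypothetical, not Scylla's actual materialized-view code: the old view row is deleted only when the update really changes a view key, which is exactly the behavior issue #11801 broke for same-value overwrites.

```python
def view_row_ops(old_item, new_item, view_keys):
    """Hypothetical model of view maintenance for one base-row update.

    Returns the operations to apply to the view: put the new view row
    (when all view keys are present), and delete the old view row only
    when the view key actually changed. Issue #11801 was a spurious
    delete in the "key unchanged" case.
    """
    old_key = tuple(old_item.get(k) for k in view_keys)
    new_key = tuple(new_item.get(k) for k in view_keys)
    ops = []
    if None not in new_key:
        ops.append(('put', new_key))
    if None not in old_key and old_key != new_key:
        ops.append(('delete', old_key))
    return ops

# Same-value overwrite of 'b': only a put, no delete of the old view row,
# matching test_11801's expectation.
ops = view_row_ops({'a': 'x', 'b': 'y', 'd': 1},
                   {'a': 'x', 'b': 'y', 'd': 2}, ('a', 'b'))
```

With a genuinely changed key (as in test_11801_variant2), the same function emits both a put for the new key and a delete for the old one.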
# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
# This is the same feature we already tested in test_gsi_missing_attribute()


@@ -374,6 +374,14 @@ def test_getitem_attributes_to_get_duplicate(dynamodb, test_table):
    with pytest.raises(ClientError, match='ValidationException.*Duplicate'):
        test_table.get_item(Key={'p': p, 'c': c}, AttributesToGet=['a', 'a'], ConsistentRead=True)
# Verify that it is forbidden to ask for an empty AttributesToGet
# Reproduces issue #10332.
def test_getitem_attributes_to_get_empty(dynamodb, test_table):
    p = random_string()
    c = random_string()
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.get_item(Key={'p': p, 'c': c}, AttributesToGet=[], ConsistentRead=True)
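The server-side check these tests expect can be sketched as follows - a minimal illustration, assuming the behavior the tests encode (the function name and exception are hypothetical, not Alternator's real validation code):

```python
def validate_attributes_to_get(attrs):
    """Illustrative check: AttributesToGet, when present, must be a
    non-empty list of unique attribute names (the behavior exercised by
    the empty-list test above, issue #10332, and the duplicate test)."""
    if attrs is None:
        return  # parameter omitted - nothing to validate
    if len(attrs) == 0:
        raise ValueError('ValidationException: AttributesToGet must not be empty')
    if len(set(attrs)) != len(attrs):
        raise ValueError('ValidationException: duplicate attribute name')
```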
# Basic test for DeleteItem, with hash key only
def test_delete_item_hash(test_table_s):
    p = random_string()


@@ -170,6 +170,13 @@ def test_query_attributes_to_get(dynamodb, test_table):
    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
    assert multiset(expected_items) == multiset(got_items)
# Verify that it is forbidden to ask for an empty AttributesToGet
# Reproduces issue #10332.
def test_query_attributes_to_get_empty(dynamodb, test_table):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException'):
        full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, AttributesToGet=[])
# Test that in a table with both hash key and sort key, which keys we can
# Query by: We can Query by the hash key, by a combination of both hash and
# sort keys, but *cannot* query by just the sort key, and obviously not


@@ -25,6 +25,8 @@
#include <deque>
#include <random>
#include "utils/lsa/chunked_managed_vector.hh"
#include "utils/managed_ref.hh"
#include "test/lib/log.hh"
#include <boost/range/algorithm/sort.hpp>
#include <boost/range/algorithm/equal.hpp>
@@ -216,3 +218,106 @@ SEASTAR_TEST_CASE(tests_reserve_partial) {
    });
    return make_ready_future<>();
}
SEASTAR_TEST_CASE(test_clear_and_release) {
    region region;
    allocating_section as;
    with_allocator(region.allocator(), [&] {
        lsa::chunked_managed_vector<managed_ref<uint64_t>> v;
        for (uint64_t i = 1; i < 4000; ++i) {
            as(region, [&] {
                v.emplace_back(make_managed<uint64_t>(i));
            });
        }
        v.clear_and_release();
    });
    return make_ready_future<>();
}
SEASTAR_TEST_CASE(test_chunk_reserve) {
    region region;
    allocating_section as;
    for (auto conf :
        { // std::make_pair(reserve size, push count)
            std::make_pair(0, 4000),
            std::make_pair(100, 4000),
            std::make_pair(200, 4000),
            std::make_pair(1000, 4000),
            std::make_pair(2000, 4000),
            std::make_pair(3000, 4000),
            std::make_pair(5000, 4000),
            std::make_pair(500, 8000),
            std::make_pair(1000, 8000),
            std::make_pair(2000, 8000),
            std::make_pair(8000, 500),
        })
    {
        with_allocator(region.allocator(), [&] {
            auto [reserve_size, push_count] = conf;
            testlog.info("Testing reserve({}), {}x emplace_back()", reserve_size, push_count);
            lsa::chunked_managed_vector<managed_ref<uint64_t>> v;
            v.reserve(reserve_size);
            uint64_t seed = rand();
            for (uint64_t i = 0; i < push_count; ++i) {
                as(region, [&] {
                    v.emplace_back(make_managed<uint64_t>(seed + i));
                    BOOST_REQUIRE(**v.begin() == seed);
                });
            }
            auto v_it = v.begin();
            for (uint64_t i = 0; i < push_count; ++i) {
                BOOST_REQUIRE(**v_it++ == seed + i);
            }
            v.clear_and_release();
        });
    }
    return make_ready_future<>();
}
// Tests the case of make_room() invoked with last_chunk_capacity_deficit but _size not in
// the last reserved chunk.
SEASTAR_TEST_CASE(test_shrinking_and_expansion_involving_chunk_boundary) {
    region region;
    allocating_section as;
    with_allocator(region.allocator(), [&] {
        lsa::chunked_managed_vector<managed_ref<uint64_t>> v;
        // Fill two chunks
        v.reserve(2000);
        for (uint64_t i = 0; i < 2000; ++i) {
            as(region, [&] {
                v.emplace_back(make_managed<uint64_t>(i));
            });
        }
        // Make the last chunk smaller than max size to trigger the last_chunk_capacity_deficit path in make_room()
        v.shrink_to_fit();
        // Leave the last chunk reserved but empty
        for (uint64_t i = 0; i < 1000; ++i) {
            v.pop_back();
        }
        // Try to reserve more than the currently reserved capacity and trigger the last_chunk_capacity_deficit path
        // with _size not in the last chunk. Should not sigsegv.
        v.reserve(8000);
        for (uint64_t i = 0; i < 2000; ++i) {
            as(region, [&] {
                v.emplace_back(make_managed<uint64_t>(i));
            });
        }
        v.clear_and_release();
    });
    return make_ready_future<>();
}


@@ -191,3 +191,32 @@ BOOST_AUTO_TEST_CASE(tests_reserve_partial) {
        BOOST_REQUIRE_EQUAL(v.capacity(), orig_size);
    }
}
// Tests the case of make_room() invoked with last_chunk_capacity_deficit but _size not in
// the last reserved chunk.
BOOST_AUTO_TEST_CASE(test_shrinking_and_expansion_involving_chunk_boundary) {
    using vector_type = utils::chunked_vector<std::unique_ptr<uint64_t>>;
    vector_type v;
    // Fill two chunks
    v.reserve(vector_type::max_chunk_capacity() * 3 / 2);
    for (uint64_t i = 0; i < vector_type::max_chunk_capacity() * 3 / 2; ++i) {
        v.emplace_back(std::make_unique<uint64_t>(i));
    }
    // Make the last chunk smaller than max size to trigger the last_chunk_capacity_deficit path in make_room()
    v.shrink_to_fit();
    // Leave the last chunk reserved but empty
    for (uint64_t i = 0; i < vector_type::max_chunk_capacity(); ++i) {
        v.pop_back();
    }
    // Try to reserve more than the currently reserved capacity and trigger the last_chunk_capacity_deficit path
    // with _size not in the last chunk. Should not sigsegv.
    v.reserve(vector_type::max_chunk_capacity() * 4);
    for (uint64_t i = 0; i < vector_type::max_chunk_capacity() * 2; ++i) {
        v.emplace_back(std::make_unique<uint64_t>(i));
    }
}
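The boundary condition the two tests above exercise comes down to simple chunk arithmetic, sketched here in Python. `CHUNK` stands in for `max_chunk_capacity()`; this illustrates the geometry of the scenario, not the C++ implementation:

```python
CHUNK = 8  # stand-in for vector_type::max_chunk_capacity()

def chunks_for(capacity):
    # number of chunks backing a reservation of `capacity` elements
    return (capacity + CHUNK - 1) // CHUNK

def last_chunk_capacity(capacity):
    # capacity of the final chunk; shrink_to_fit() can leave it partial,
    # i.e. with a "capacity deficit" relative to CHUNK
    rem = capacity % CHUNK
    return rem if rem else CHUNK

# The tests' scenario: reserve 1.5 chunks, so the last chunk is partial;
# after popping a full chunk's worth of elements, _size lands in chunk 0,
# not in the deficient last chunk - the case make_room() mishandled.
capacity = CHUNK * 3 // 2   # two chunks, the second only half-capacity
size = capacity - CHUNK     # element count after the pop_back() loop
```

A later `reserve()` larger than `capacity` must then expand the partial last chunk even though `_size` is not inside it.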


@@ -32,7 +32,7 @@
#include "cql3/util.hh"
//
-// Test basic CQL string quoting
+// Test basic CQL identifier quoting
//
BOOST_AUTO_TEST_CASE(maybe_quote) {
    std::string s(65536, 'x');
@@ -67,6 +67,16 @@ BOOST_AUTO_TEST_CASE(maybe_quote) {
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("\"\""), "\"\"\"\"\"\"");
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("\"hell0\""), "\"\"\"hell0\"\"\"");
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("hello \"my\" world"), "\"hello \"\"my\"\" world\"");
    // Reproducer for issue #9450. Reserved keywords like "to" or "where"
    // need quoting, but unreserved keywords like "ttl", "int" or "as"
    // do not.
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("to"), "\"to\"");
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("where"), "\"where\"");
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("ttl"), "ttl");
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("int"), "int");
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("as"), "as");
    BOOST_REQUIRE_EQUAL(cql3::util::maybe_quote("ttl hi"), "\"ttl hi\"");
}
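The quoting rule being tested can be sketched as follows. The keyword set here is a small illustrative subset of the CQL grammar's reserved-keyword list, not the full table, and the function is a model of `cql3::util::maybe_quote`, not its real implementation:

```python
import re

RESERVED = {'to', 'where', 'select', 'from'}  # illustrative subset only

def maybe_quote(ident):
    """Sketch of the rule: a plain lower-case identifier that is not a
    reserved keyword needs no quoting; anything else (reserved keywords,
    upper case, spaces, embedded quotes) is wrapped in double quotes,
    with embedded double quotes doubled."""
    if re.fullmatch(r'[a-z][a-z0-9_]*', ident) and ident not in RESERVED:
        return ident
    return '"' + ident.replace('"', '""') + '"'
```

Unreserved keywords like `ttl`, `int`, or `as` match the plain-identifier pattern and pass through unquoted, which is exactly what the assertions above check.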
//


@@ -784,3 +784,38 @@ SEASTAR_TEST_CASE(upgrade_sstables) {
        }).get();
    });
}
SEASTAR_TEST_CASE(database_drop_column_family_clears_querier_cache) {
    return do_with_cql_env_thread([] (cql_test_env& e) {
        e.execute_cql("create table ks.cf (k text, v int, primary key (k));").get();
        auto& db = e.local_db();
        const auto ts = db_clock::now();
        auto& tbl = db.find_column_family("ks", "cf");
        auto op = std::optional(tbl.read_in_progress());
        auto s = tbl.schema();
        auto q = query::data_querier(
                tbl.as_mutation_source(),
                tbl.schema(),
                database_test(db).get_user_read_concurrency_semaphore().make_tracking_only_permit(s.get(), "test", db::no_timeout),
                query::full_partition_range,
                s->full_slice(),
                default_priority_class(),
                nullptr);
        auto f = e.db().invoke_on_all([ts] (database& db) {
            return db.drop_column_family("ks", "cf", [ts] { return make_ready_future<db_clock::time_point>(ts); });
        });
        // we add a querier to the querier cache while the drop is ongoing
        auto& qc = db.get_querier_cache();
        qc.insert(utils::make_random_uuid(), std::move(q), nullptr);
        BOOST_REQUIRE_EQUAL(qc.get_stats().population, 1);
        op.reset(); // this should allow the drop to finish
        f.get();
        // the drop should have cleaned up all entries belonging to that table
        BOOST_REQUIRE_EQUAL(qc.get_stats().population, 0);
    });
}


@@ -941,3 +941,28 @@ SEASTAR_THREAD_TEST_CASE(test_reverse_reader_is_mutation_source) {
    };
    run_mutation_source_tests(populate);
}
SEASTAR_THREAD_TEST_CASE(test_allow_reader_early_destruction) {
    struct test_reader_impl : public flat_mutation_reader::impl {
        using flat_mutation_reader::impl::impl;
        virtual future<> fill_buffer() override { return make_ready_future<>(); }
        virtual future<> next_partition() override { return make_ready_future<>(); }
        virtual future<> fast_forward_to(const dht::partition_range&) override { return make_ready_future<>(); }
        virtual future<> fast_forward_to(position_range) override { return make_ready_future<>(); }
        virtual future<> close() noexcept override { return make_ready_future<>(); };
    };
    struct test_reader_v2_impl : public flat_mutation_reader_v2::impl {
        using flat_mutation_reader_v2::impl::impl;
        virtual future<> fill_buffer() override { return make_ready_future<>(); }
        virtual future<> next_partition() override { return make_ready_future<>(); }
        virtual future<> fast_forward_to(const dht::partition_range&) override { return make_ready_future<>(); }
        virtual future<> fast_forward_to(position_range) override { return make_ready_future<>(); }
        virtual future<> close() noexcept override { return make_ready_future<>(); };
    };
    simple_schema s;
    tests::reader_concurrency_semaphore_wrapper semaphore;
    // These readers are not closed, but didn't start any operations, so it's safe for them to be destroyed.
    auto reader = make_flat_mutation_reader<test_reader_impl>(s.schema(), semaphore.make_permit());
    auto reader_v2 = make_flat_mutation_reader_v2<test_reader_v2_impl>(s.schema(), semaphore.make_permit());
}


@@ -391,3 +391,87 @@ SEASTAR_TEST_CASE(test_loading_cache_reload_during_eviction) {
        BOOST_REQUIRE_EQUAL(loading_cache.size(), 1);
    });
}
SEASTAR_THREAD_TEST_CASE(test_loading_cache_remove_leaves_no_old_entries_behind) {
    using namespace std::chrono;
    load_count = 0;
    auto load_v1 = [] (auto key) { return make_ready_future<sstring>("v1"); };
    auto load_v2 = [] (auto key) { return make_ready_future<sstring>("v2"); };
    auto load_v3 = [] (auto key) { return make_ready_future<sstring>("v3"); };
    {
        utils::loading_cache<int, sstring> loading_cache(num_loaders, 100s, testlog);
        auto stop_cache_reload = seastar::defer([&loading_cache] { loading_cache.stop().get(); });
        //
        // Test remove() concurrent with loading
        //
        auto f = loading_cache.get_ptr(0, [&](auto key) {
            return later().then([&] {
                return load_v1(key);
            });
        });
        loading_cache.remove(0);
        BOOST_REQUIRE_EQUAL(loading_cache.find(0), nullptr);
        BOOST_REQUIRE_EQUAL(loading_cache.size(), 0);
        auto ptr1 = f.get0();
        BOOST_REQUIRE_EQUAL(*ptr1, "v1");
        BOOST_REQUIRE_EQUAL(loading_cache.find(0), nullptr);
        BOOST_REQUIRE_EQUAL(loading_cache.size(), 0);
        ptr1 = loading_cache.get_ptr(0, load_v2).get0();
        loading_cache.remove(0);
        BOOST_REQUIRE_EQUAL(*ptr1, "v2");
        //
        // Test that live ptr1, removed from cache, does not prevent reload of new value
        //
        auto ptr2 = loading_cache.get_ptr(0, load_v3).get0();
        ptr1 = nullptr;
        BOOST_REQUIRE_EQUAL(*ptr2, "v3");
    }
    // Test remove_if()
    {
        utils::loading_cache<int, sstring> loading_cache(num_loaders, 100s, testlog);
        auto stop_cache_reload = seastar::defer([&loading_cache] { loading_cache.stop().get(); });
        //
        // Test remove_if() concurrent with loading
        //
        auto f = loading_cache.get_ptr(0, [&](auto key) {
            return later().then([&] {
                return load_v1(key);
            });
        });
        loading_cache.remove_if([] (auto&& v) { return v == "v1"; });
        BOOST_REQUIRE_EQUAL(loading_cache.find(0), nullptr);
        BOOST_REQUIRE_EQUAL(loading_cache.size(), 0);
        auto ptr1 = f.get0();
        BOOST_REQUIRE_EQUAL(*ptr1, "v1");
        BOOST_REQUIRE_EQUAL(loading_cache.find(0), nullptr);
        BOOST_REQUIRE_EQUAL(loading_cache.size(), 0);
        ptr1 = loading_cache.get_ptr(0, load_v2).get0();
        loading_cache.remove_if([] (auto&& v) { return v == "v2"; });
        BOOST_REQUIRE_EQUAL(*ptr1, "v2");
        //
        // Test that live ptr1, removed from cache, does not prevent reload of new value
        //
        auto ptr2 = loading_cache.get_ptr(0, load_v3).get0();
        ptr1 = nullptr;
        BOOST_REQUIRE_EQUAL(*ptr2, "v3");
        ptr2 = nullptr;
    }
}
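The invariant this test checks - a remove() racing with an in-flight load must hand the loaded value to the waiting caller without letting it repopulate the cache - can be modeled with a per-key generation counter. This is a toy sketch; the class, counter, and method names are illustrative, not the `loading_cache` implementation:

```python
class GenerationCache:
    """Toy model: get_or_load inserts the loaded value only if no
    remove() happened for that key while the load was in flight."""
    def __init__(self):
        self._map = {}
        self._gen = {}

    def get_or_load(self, key, loader):
        if key in self._map:
            return self._map[key]
        gen = self._gen.get(key, 0)
        value = loader(key)               # the "in flight" load
        if self._gen.get(key, 0) == gen:  # no remove() raced us
            self._map[key] = value
        return value                      # caller gets the value either way

    def remove(self, key):
        self._map.pop(key, None)
        self._gen[key] = self._gen.get(key, 0) + 1

cache = GenerationCache()
# Simulate a remove() arriving while the load is still running:
v = cache.get_or_load(0, lambda k: (cache.remove(k), 'v1')[1])
```

The caller still receives "v1", but the cache stays empty for key 0, mirroring the `find(0) == nullptr` assertions above.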


@@ -39,6 +39,9 @@
#include "test/lib/random_utils.hh"
#include "test/lib/log.hh"
#include "test/lib/reader_concurrency_semaphore.hh"
#include "test/lib/simple_schema.hh"
#include "test/lib/make_random_string.hh"
#include "utils/error_injection.hh"
static api::timestamp_type next_timestamp() {
    static thread_local api::timestamp_type next_timestamp = 1;
@@ -528,6 +531,74 @@ SEASTAR_TEST_CASE(test_exception_safety_of_single_partition_reads) {
    });
}
SEASTAR_THREAD_TEST_CASE(test_tombstone_merging_with_multiple_versions) {
    tests::reader_concurrency_semaphore_wrapper semaphore;
    simple_schema ss;
    auto s = ss.schema();
    auto mt = make_lw_shared<memtable>(ss.schema());
    auto pk = ss.make_pkey(0);
    auto pr = dht::partition_range::make_singular(pk);
    auto t0 = ss.new_tombstone();
    auto t1 = ss.new_tombstone();
    auto t2 = ss.new_tombstone();
    auto t3 = ss.new_tombstone();
    mutation m1(s, pk);
    ss.delete_range(m1, *position_range_to_clustering_range(position_range(
            position_in_partition::before_key(ss.make_ckey(0)),
            position_in_partition::for_key(ss.make_ckey(3))), *s), t1);
    ss.add_row(m1, ss.make_ckey(0), "v");
    ss.add_row(m1, ss.make_ckey(1), "v");
    // Fill so that rd1 stays in the partition snapshot
    int n_rows = 1000;
    auto v = make_random_string(512);
    for (int i = 0; i < n_rows; ++i) {
        ss.add_row(m1, ss.make_ckey(i), v);
    }
    mutation m2(s, pk);
    ss.delete_range(m2, *position_range_to_clustering_range(position_range(
            position_in_partition::before_key(ss.make_ckey(0)),
            position_in_partition::before_key(ss.make_ckey(1))), *s), t2);
    ss.delete_range(m2, *position_range_to_clustering_range(position_range(
            position_in_partition::before_key(ss.make_ckey(1)),
            position_in_partition::for_key(ss.make_ckey(3))), *s), t3);
    mutation m3(s, pk);
    ss.delete_range(m3, *position_range_to_clustering_range(position_range(
            position_in_partition::before_key(ss.make_ckey(0)),
            position_in_partition::for_key(ss.make_ckey(4))), *s), t0);
    mt->apply(m1);
    auto rd1 = mt->make_flat_reader(s, semaphore.make_permit(), pr, s->full_slice(), default_priority_class(),
            nullptr, streamed_mutation::forwarding::no, mutation_reader::forwarding::no);
    auto close_rd1 = defer([&] { rd1.close().get(); });
    rd1.fill_buffer().get();
    BOOST_REQUIRE(!rd1.is_end_of_stream()); // rd1 must keep the m1 version alive
    mt->apply(m2);
    auto rd2 = mt->make_flat_reader(s, semaphore.make_permit(), pr, s->full_slice(), default_priority_class(),
            nullptr, streamed_mutation::forwarding::no, mutation_reader::forwarding::no);
    auto close_r2 = defer([&] { rd2.close().get(); });
    rd2.fill_buffer().get();
    BOOST_REQUIRE(!rd2.is_end_of_stream()); // rd2 must keep the m2 version alive
    mt->apply(m3);
    assert_that(mt->make_flat_reader(s, semaphore.make_permit(), pr))
        .has_monotonic_positions();
    assert_that(mt->make_flat_reader(s, semaphore.make_permit(), pr))
        .produces(m1 + m2 + m3);
}
SEASTAR_TEST_CASE(test_hash_is_cached) {
    return seastar::async([] {
        auto s = schema_builder("ks", "cf")


@@ -560,7 +560,7 @@ SEASTAR_TEST_CASE(test_apply_to_incomplete_respects_continuity) {
static mutation_partition read_using_cursor(partition_snapshot& snap) {
    tests::reader_concurrency_semaphore_wrapper semaphore;
    partition_snapshot_row_cursor cur(*snap.schema(), snap);
-    cur.maybe_refresh();
+    cur.advance_to(position_in_partition::before_all_clustered_rows());
    auto mp = read_partition_from(*snap.schema(), cur);
    for (auto&& rt : snap.range_tombstones()) {
        mp.apply_delete(*snap.schema(), rt);


@@ -763,9 +763,8 @@ SEASTAR_THREAD_TEST_CASE(multi_col_in) {
    cquery_nofail(e, "insert into t(pk,ck1,ck2,r) values (4,13,23,'a')");
    require_rows(e, "select pk from t where (ck1,ck2) in ((13,23)) allow filtering", {{I(3)}, {I(4)}});
    require_rows(e, "select pk from t where (ck1) in ((13),(33),(44)) allow filtering", {{I(3)}, {I(4)}});
-    // TODO: uncomment when #6200 is fixed.
-    // require_rows(e, "select pk from t where (ck1,ck2) in ((13,23)) and r='a' allow filtering",
-    //              {{I(4), I(13), F(23), T("a")}});
+    require_rows(e, "select pk from t where (ck1,ck2) in ((13,23)) and r='a' allow filtering",
+                 {{I(4), I(13), F(23), T("a")}});
    cquery_nofail(e, "delete from t where pk=4");
    require_rows(e, "select pk from t where (ck1,ck2) in ((13,23)) allow filtering", {{I(3)}});
    auto stmt = e.prepare("select ck1 from t where (ck1,ck2) in ? allow filtering").get0();


@@ -1242,9 +1242,13 @@ SEASTAR_TEST_CASE(test_update_failure) {
class throttle {
    unsigned _block_counter = 0;
    promise<> _p; // valid when _block_counter != 0, resolves when it goes down to 0
    std::optional<promise<>> _entered;
    bool _one_shot;
public:
    // one_shot means that only the first enter() after block() will block.
    throttle(bool one_shot = false) : _one_shot(one_shot) {}
    future<> enter() {
-        if (_block_counter) {
+        if (_block_counter && (!_one_shot || _entered)) {
            promise<> p1;
            promise<> p2;
@@ -1256,16 +1260,21 @@ public:
                p3.set_value();
            });
            _p = std::move(p2);
            if (_entered) {
                _entered->set_value();
                _entered.reset();
            }
            return f1;
        } else {
            return make_ready_future<>();
        }
    }
-    void block() {
+    future<> block() {
        ++_block_counter;
        _p = promise<>();
+        _entered = promise<>();
+        return _entered->get_future();
    }
    void unblock() {
void unblock() {
@@ -1410,7 +1419,7 @@ SEASTAR_TEST_CASE(test_cache_population_and_update_race) {
        mt2->apply(m);
    }
-    thr.block();
+    auto f = thr.block();
    auto m0_range = dht::partition_range::make_singular(ring[0].ring_position());
    auto rd1 = cache.make_reader(s, semaphore.make_permit(), m0_range);
@@ -1421,6 +1430,7 @@ SEASTAR_TEST_CASE(test_cache_population_and_update_race) {
    rd2.set_max_buffer_size(1);
    auto rd2_fill_buffer = rd2.fill_buffer();
+    f.get();
    sleep(10ms).get();
    // This update should miss on all partitions
@@ -1548,12 +1558,13 @@ SEASTAR_TEST_CASE(test_cache_population_and_clear_race) {
        mt2->apply(m);
    }
-    thr.block();
+    auto f = thr.block();
    auto rd1 = cache.make_reader(s, semaphore.make_permit());
    rd1.set_max_buffer_size(1);
    auto rd1_fill_buffer = rd1.fill_buffer();
+    f.get();
    sleep(10ms).get();
    // This update should miss on all partitions
@@ -3777,3 +3788,81 @@ SEASTAR_TEST_CASE(test_scans_erase_dummies) {
        BOOST_REQUIRE_EQUAL(tracker.get_stats().rows, 2);
    });
}
SEASTAR_TEST_CASE(test_eviction_of_upper_bound_of_population_range) {
    return seastar::async([] {
        simple_schema s;
        tests::reader_concurrency_semaphore_wrapper semaphore;
        auto cache_mt = make_lw_shared<memtable>(s.schema());
        auto pkey = s.make_pkey("pk");
        mutation m1(s.schema(), pkey);
        s.add_row(m1, s.make_ckey(1), "v1");
        s.add_row(m1, s.make_ckey(2), "v2");
        cache_mt->apply(m1);
        cache_tracker tracker;
        throttle thr(true);
        auto cache_source = make_decorated_snapshot_source(snapshot_source([&] { return cache_mt->as_data_source(); }),
                [&] (mutation_source src) {
                    return throttled_mutation_source(thr, std::move(src));
                });
        row_cache cache(s.schema(), cache_source, tracker);
        auto pr = dht::partition_range::make_singular(pkey);
        auto read = [&] (int start, int end) {
            auto slice = partition_slice_builder(*s.schema())
                    .with_range(query::clustering_range::make(s.make_ckey(start), s.make_ckey(end)))
                    .build();
            auto rd = cache.make_reader(s.schema(), semaphore.make_permit(), pr, slice);
            auto close_rd = deferred_close(rd);
            auto m_cache = read_mutation_from_flat_mutation_reader(rd).get0();
            close_rd.close_now();
            rd = cache_mt->make_flat_reader(s.schema(), semaphore.make_permit(), pr, slice);
            auto close_rd2 = deferred_close(rd);
            auto m_mt = read_mutation_from_flat_mutation_reader(rd).get0();
            BOOST_REQUIRE(m_mt);
            assert_that(m_cache).has_mutation().is_equal_to(*m_mt);
        };
        // populate [2]
        {
            auto slice = partition_slice_builder(*s.schema())
                    .with_range(query::clustering_range::make_singular(s.make_ckey(2)))
                    .build();
            assert_that(cache.make_reader(s.schema(), semaphore.make_permit(), pr, slice))
                .has_monotonic_positions();
        }
        auto arrived = thr.block();
        // Read [0, 2]
        auto f = seastar::async([&] {
            read(0, 2);
        });
        arrived.get();
        // populate (2, 3]
        {
            auto slice = partition_slice_builder(*s.schema())
                    .with_range(query::clustering_range::make(query::clustering_range::bound(s.make_ckey(2), false),
                                                              query::clustering_range::bound(s.make_ckey(3), true)))
                    .build();
            assert_that(cache.make_reader(s.schema(), semaphore.make_permit(), pr, slice))
                .has_monotonic_positions();
        }
        testlog.trace("Evicting");
        evict_one_row(tracker); // Evicts before(0)
        evict_one_row(tracker); // Evicts ck(2)
        testlog.trace("Unblocking");
        thr.unblock();
        f.get();
        read(0, 3);
    });
}
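The one_shot throttle this test relies on can be modeled synchronously in Python. This is a sketch of the control flow only - Seastar's version exchanges futures, while here `enter()` just reports whether the caller would block, and `arrived` stands in for the future `block()` returns:

```python
class Throttle:
    """Synchronous model of the test's one_shot throttle: block() arms
    the throttle and exposes an 'arrived' flag that the first enter()
    sets; with one_shot=True only that first enter() blocks, later ones
    pass straight through."""
    def __init__(self, one_shot=False):
        self.blocked = False
        self.one_shot = one_shot
        self.waiting_for_enter = False
        self.arrived = False

    def block(self):
        self.blocked = True
        self.waiting_for_enter = True
        self.arrived = False

    def enter(self):
        # Returns True if the caller would have to wait.
        if self.blocked and (not self.one_shot or self.waiting_for_enter):
            if self.waiting_for_enter:
                self.arrived = True          # resolves block()'s "future"
                self.waiting_for_enter = False
            return True
        return False

    def unblock(self):
        self.blocked = False
```

The test uses exactly this handshake: it waits on `block()`'s result to know the first cache miss has reached the underlying source, evicts rows while that read is parked, then unblocks.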


@@ -72,6 +72,7 @@
#include "test/lib/reader_concurrency_semaphore.hh"
#include "test/lib/sstable_utils.hh"
#include "test/lib/random_utils.hh"
#include "test/lib/random_schema.hh"
namespace fs = std::filesystem;
@@ -3003,3 +3004,58 @@ SEASTAR_TEST_CASE(sstable_reader_with_timeout) {
        });
    });
}
SEASTAR_TEST_CASE(test_crawling_reader_out_of_range_last_range_tombstone_change) {
    return test_env::do_with_async([] (test_env& env) {
        simple_schema table;
        auto mut = table.new_mutation("pk0");
        auto ckeys = table.make_ckeys(4);
        table.add_row(mut, ckeys[0], "v0");
        table.add_row(mut, ckeys[1], "v1");
        table.add_row(mut, ckeys[2], "v2");
        using bound = query::clustering_range::bound;
        table.delete_range(mut, query::clustering_range::make(bound{ckeys[3], true}, bound{clustering_key::make_empty(), true}), tombstone(1, gc_clock::now()));
        auto tmp = tmpdir();
        auto sst_gen = [&env, &table, &tmp] () {
            return env.make_sstable(table.schema(), tmp.path().string(), 1, sstables::get_highest_sstable_version(), big);
        };
        auto sst = make_sstable_containing(sst_gen, {mut});
        assert_that(sst->make_crawling_reader(table.schema(), env.make_reader_permit())).has_monotonic_positions();
    });
}
SEASTAR_TEST_CASE(test_crawling_reader_random_schema_random_mutations) {
    return test_env::do_with_async([this] (test_env& env) {
        auto random_spec = tests::make_random_schema_specification(
                get_name(),
                std::uniform_int_distribution<size_t>(1, 4),
                std::uniform_int_distribution<size_t>(2, 4),
                std::uniform_int_distribution<size_t>(2, 8),
                std::uniform_int_distribution<size_t>(2, 8));
        auto random_schema = tests::random_schema{tests::random::get_int<uint32_t>(), *random_spec};
        auto schema = random_schema.schema();
        testlog.info("Random schema:\n{}", random_schema.cql());
        const auto muts = tests::generate_random_mutations(random_schema, 20).get();
        auto tmp = tmpdir();
        auto sst_gen = [&env, schema, &tmp] () {
            return env.make_sstable(schema, tmp.path().string(), 1, sstables::get_highest_sstable_version(), big);
        };
        auto sst = make_sstable_containing(sst_gen, muts);
        {
            auto rd = assert_that(sst->make_crawling_reader(schema, env.make_reader_permit()));
            for (const auto& mut : muts) {
                rd.produces(mut);
            }
        }
        assert_that(sst->make_crawling_reader(schema, env.make_reader_permit())).has_monotonic_positions();
    });
}


@@ -49,10 +49,18 @@ static void add_entry(logalloc::region& r,
static partition_index_page make_page0(logalloc::region& r, simple_schema& s) {
    partition_index_page page;
    auto destroy_page = defer([&] {
        with_allocator(r.allocator(), [&] {
            auto p = std::move(page);
        });
    });
    add_entry(r, *s.schema(), page, s.make_pkey(0).key(), 0);
    add_entry(r, *s.schema(), page, s.make_pkey(1).key(), 1);
    add_entry(r, *s.schema(), page, s.make_pkey(2).key(), 2);
    add_entry(r, *s.schema(), page, s.make_pkey(3).key(), 3);
    destroy_page.cancel();
    return page;
}
@@ -143,6 +151,47 @@ SEASTAR_THREAD_TEST_CASE(test_caching) {
    }
}
template <typename T>
static future<> ignore_result(future<T>&& f) {
    return f.then_wrapped([] (auto&& f) {
        try {
            f.get();
        } catch (...) {
            // expected, silence warnings about ignored failed futures
        }
    });
}
SEASTAR_THREAD_TEST_CASE(test_exception_while_loading) {
    ::lru lru;
    simple_schema s;
    logalloc::region r;
    partition_index_cache cache(lru, r);
    auto clear_lru = defer([&] {
        with_allocator(r.allocator(), [&] {
            lru.evict_all();
        });
    });
    auto page0_loader = [&] (partition_index_cache::key_type k) {
        return later().then([&] {
            return make_page0(r, s);
        });
    };
    memory::with_allocation_failures([&] {
        cache.evict_gently().get();
        auto f0 = ignore_result(cache.get_or_load(0, page0_loader));
        auto f1 = ignore_result(cache.get_or_load(0, page0_loader));
        f0.get();
        f1.get();
    });
    auto ptr = cache.get_or_load(0, page0_loader).get0();
    has_page0(ptr);
}
SEASTAR_THREAD_TEST_CASE(test_auto_clear) {
    ::lru lru;
    simple_schema s;


@@ -19,6 +19,7 @@ from cassandra.cluster import ConsistencyLevel
from cassandra.query import SimpleStatement
from util import new_test_table
from nodetool import flush
def test_cdc_log_entries_use_cdc_streams(scylla_only, cql, test_keyspace):
    '''Test that the stream IDs chosen for CDC log entries come from the CDC generation
@@ -44,3 +45,16 @@ def test_cdc_log_entries_use_cdc_streams(scylla_only, cql, test_keyspace):
    assert(log_stream_ids.issubset(stream_ids))
# Test for #10473 - reading logs (from sstable) after dropping
# column in base.
def test_cdc_alter_table_drop_column(scylla_only, cql, test_keyspace):
    schema = "pk int primary key, v int"
    extra = " with cdc = {'enabled': true}"
    with new_test_table(cql, test_keyspace, schema, extra) as table:
        cql.execute(f"insert into {table} (pk, v) values (0, 0)")
        cql.execute(f"insert into {table} (pk, v) values (1, null)")
        flush(cql, table)
        flush(cql, table + "_scylla_cdc_log")
        cql.execute(f"alter table {table} drop v")
        cql.execute(f"select * from {table}_scylla_cdc_log")


@@ -24,9 +24,11 @@
# is or isn't necessary.
import pytest
import re
from util import new_test_table
from cassandra.protocol import InvalidRequest
from cassandra.connection import DRIVER_NAME, DRIVER_VERSION
from cassandra.query import UNSET_VALUE
# When filtering for "x > 0" or "x < 0", rows with an unset value for x
# should not match the filter.
@@ -141,3 +143,118 @@ def test_index_with_in_relation(scylla_only, cql, test_keyspace):
            cql.execute(f"insert into {table} (p,c,v) values ({p}, {c}, {v})")
        res = cql.execute(f"select * from {table} where p in (0,1) and v = False ALLOW FILTERING")
        assert set(res) == set([(0,1,False),(0,3,False),(1,1,False),(1,3,False)])
# Test that LIKE operator works fine as a filter when the filtered column
# has descending order. Regression test for issue #10183, when it was incorrectly
# rejected as a "non-string" column.
def test_filter_like_on_desc_column(cql, test_keyspace):
    with new_test_table(cql, test_keyspace, "a int, b text, primary key(a, b)",
                        extra="with clustering order by (b desc)") as table:
        cql.execute(f"INSERT INTO {table} (a, b) VALUES (1, 'one')")
        res = cql.execute(f"SELECT b FROM {table} WHERE b LIKE '%%%' ALLOW FILTERING")
        assert res.one().b == "one"
# Test that IN restrictions are supported with filtering and return the
# correct results.
# We mark this test "cassandra_bug" because Cassandra could support this
# feature but doesn't yet: It reports "IN predicates on non-primary-key
# columns (v) is not yet supported" when v is a regular column, or "IN
# restrictions are not supported when the query involves filtering" on
# partition-key columns p1 or p2. By the way, it does support IN restrictions
# on a clustering-key column.
def test_filtering_with_in_relation(cql, test_keyspace, cassandra_bug):
    schema = 'p1 int, p2 int, c int, v int, primary key ((p1, p2),c)'
    with new_test_table(cql, test_keyspace, schema) as table:
        cql.execute(f"INSERT INTO {table} (p1, p2, c, v) VALUES (1, 2, 3, 4)")
        cql.execute(f"INSERT INTO {table} (p1, p2, c, v) VALUES (2, 3, 4, 5)")
        cql.execute(f"INSERT INTO {table} (p1, p2, c, v) VALUES (3, 4, 5, 6)")
        cql.execute(f"INSERT INTO {table} (p1, p2, c, v) VALUES (4, 5, 6, 7)")
        res = cql.execute(f"select * from {table} where p1 in (2,4) ALLOW FILTERING")
        assert set(res) == set([(2,3,4,5), (4,5,6,7)])
        res = cql.execute(f"select * from {table} where p2 in (2,4) ALLOW FILTERING")
        assert set(res) == set([(1,2,3,4), (3,4,5,6)])
        res = cql.execute(f"select * from {table} where c in (3,5) ALLOW FILTERING")
        assert set(res) == set([(1,2,3,4), (3,4,5,6)])
        res = cql.execute(f"select * from {table} where v in (5,7) ALLOW FILTERING")
        assert set(res) == set([(2,3,4,5), (4,5,6,7)])
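The filtering semantics being verified here reduce to a simple set comprehension: keep rows whose value in the restricted column is among the listed values. A minimal Python model of this (illustrative only, using the same four rows the test inserts):

```python
# The four rows inserted by the test above, as (p1, p2, c, v) tuples:
rows = [(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6), (4, 5, 6, 7)]

def filter_in(rows, col_index, values):
    # Model of ALLOW FILTERING with an IN restriction: keep rows whose
    # value in the given column is among the listed values.
    return {r for r in rows if r[col_index] in values}

# Same expectations as the CQL queries above:
assert filter_in(rows, 0, (2, 4)) == {(2, 3, 4, 5), (4, 5, 6, 7)}   # p1 in (2,4)
assert filter_in(rows, 1, (2, 4)) == {(1, 2, 3, 4), (3, 4, 5, 6)}   # p2 in (2,4)
assert filter_in(rows, 3, (5, 7)) == {(2, 3, 4, 5), (4, 5, 6, 7)}   # v in (5,7)
```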
# Test that subscripts in expressions work as expected. They should only work
# on map columns, and must have the correct type. Test that they also work
# as expected for null or unset subscripts.
# Cassandra considers the null subscript 'm[null]' to be an invalid request.
# In Scylla we decided to do it differently (we think better): m[null] is simply
# a null, so the filter 'WHERE m[null] = 2' is not an error - it just doesn't
# match anything. This is more consistent with our usual null handling
# (null[2] and null < 2 are both defined as returning null), and will also
# allow us in the future to support non-constant subscript - for example m[a]
# where the column a can be null for some rows and non-null for other rows.
# Because we decided that our behavior is better than Cassandra's, this test
# fails on Cassandra and is marked with cassandra_bug.
# This test is a superset of test_null.py::test_map_subscript_null which
# tests only the special case of a null subscript.
# Reproduces #10361
def test_filtering_with_subscript(cql, test_keyspace, cassandra_bug):
    with new_test_table(cql, test_keyspace,
            "p int, m1 map<int, int>, m2 map<text, text>, s set<int>, PRIMARY KEY (p)") as table:
        # Check for *errors* in subscript expressions - such as wrong type or
        # null - with an empty table. This will force the implementation to
        # check for these errors before actually evaluating the filter
        # expression - because there will be no rows to filter.
        # A subscript is not allowed on a non-map column (in this case, a set)
        with pytest.raises(InvalidRequest, match='cannot be used as a map'):
            cql.execute(f"SELECT p FROM {table} WHERE s[2] = 3 ALLOW FILTERING")
        # Passing a value of the wrong type as the subscript is not allowed
        with pytest.raises(InvalidRequest, match=re.escape('key(m1)')):
            cql.execute(f"select p from {table} where m1['black'] = 2 ALLOW FILTERING")
        with pytest.raises(InvalidRequest, match=re.escape('key(m2)')):
            cql.execute(f"select p from {table} where m2[1] = 2 ALLOW FILTERING")
        # See discussion of m1[null] above. Reproduces #10361, and fails
        # on Cassandra (Cassandra deliberately returns an error here -
        # an InvalidRequest with "Unsupported null map key for column m1").
        assert list(cql.execute(f"select p from {table} where m1[null] = 2 ALLOW FILTERING")) == []
        assert list(cql.execute(f"select p from {table} where m2[null] = 'hi' ALLOW FILTERING")) == []
        # Similar to the above checks, but using a prepared statement. We can't
        # cause the driver to send the wrong type to a bound variable, so we
        # can't check that case unfortunately, but we have a new UNSET_VALUE
        # case.
        stmt = cql.prepare(f"select p from {table} where m1[?] = 2 ALLOW FILTERING")
        assert list(cql.execute(stmt, [None])) == []
        # The expression m1[UNSET_VALUE] should be an error, but because the
        # table is empty, we do not actually need to evaluate the expression
        # and the error might not be caught. So this check is commented out
        # here; we repeat it below, after we add some data, to ensure that the
        # expression does need to be evaluated.
        #with pytest.raises(InvalidRequest, match='Unsupported unset map key for column m1'):
        #    cql.execute(stmt, [UNSET_VALUE])
        # Finally, check for successful filtering with subscripts. For that we
        # need to add some data:
        cql.execute("INSERT INTO "+table+" (p, m1, m2) VALUES (1, {1:2, 3:4}, {'dog':'cat', 'hi':'hello'})")
        cql.execute("INSERT INTO "+table+" (p, m1, m2) VALUES (2, {2:3, 4:5}, {'man':'woman', 'black':'white'})")
        res = cql.execute(f"select p from {table} where m1[1] = 2 ALLOW FILTERING")
        assert list(res) == [(1,)]
        res = cql.execute(f"select p from {table} where m2['black'] = 'white' ALLOW FILTERING")
        assert list(res) == [(2,)]
        res = cql.execute(stmt, [1])
        assert list(res) == [(1,)]
        # Try again the null-key request (reproduces #10361) that we did
        # earlier when there was no data in the table. Now there is, and
        # the scan brings up several rows, so it may exercise different code
        # paths.
        assert list(cql.execute(f"select p from {table} where m1[null] = 2 ALLOW FILTERING")) == []
        with pytest.raises(InvalidRequest, match='Unsupported unset map key for column m1'):
            cql.execute(stmt, [UNSET_VALUE])
# Beyond the tests of map subscript expressions above, also test what happens
# when the expression is fine (e.g., m[2] = 3) but the *data* itself is null.
# We used to have a bug there where we attempted to incorrectly deserialize
# this null and got marshaling errors or even crashes - see issue #10417.
# This test reproduces #10417, but not always - run with "--count" to
# reproduce failures.
def test_filtering_null_map_with_subscript(cql, test_keyspace):
    schema = 'p text primary key, m map<int, int>'
    with new_test_table(cql, test_keyspace, schema) as table:
        cql.execute(f"INSERT INTO {table} (p) VALUES ('dog')")
        assert list(cql.execute(f"SELECT p FROM {table} WHERE m[2] = 3 ALLOW FILTERING")) == []
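The null-subscript semantics these tests describe (m[null] is null, a null map is null, and a null never matches a filter) can be sketched in a few lines of Python. This is only an illustrative model of the behavior described in the comments above, not ScyllaDB code:

```python
def subscript(m, key):
    # If the map or the key is null (None here), the whole subscript
    # expression evaluates to null instead of raising an error.
    if m is None or key is None:
        return None
    return m.get(key)

def matches(m, key, value):
    # A filter 'WHERE m[key] = value' only matches when the subscript is
    # non-null and equal; null never compares equal to anything.
    v = subscript(m, key)
    return v is not None and v == value

assert matches({1: 2}, 1, 2)           # m1[1] = 2 matches
assert not matches({1: 2}, None, 2)    # m1[null] matches nothing
assert not matches(None, 1, 2)         # null map matches nothing (cf. #10417)
```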


@@ -98,3 +98,51 @@ def test_mv_empty_string_partition_key(cql, test_keyspace):
            # because Cassandra forbids an empty partition key on select
            with pytest.raises(InvalidRequest, match='Key may not be empty'):
                cql.execute(f"SELECT * FROM {mv} WHERE v=''")
# Refs #10851. The code used to create a wildcard selection for all columns,
# which erroneously also includes static columns if such are present in the
# base table. Currently views only operate on regular columns and the filtering
# code assumes that. TODO: once we implement static column support for materialized
# views, this test case will be a nice regression test to ensure that everything still
# works if the static columns are *not* used in the view.
def test_filter_with_unused_static_column(cql, test_keyspace):
    schema = 'p int, c int, v int, s int static, primary key (p,c)'
    with new_test_table(cql, test_keyspace, schema) as table:
        with new_materialized_view(cql, table, select='p,c,v', pk='p,c,v', where='p IS NOT NULL and c IS NOT NULL and v = 44') as mv:
            cql.execute(f"INSERT INTO {table} (p,c,v) VALUES (42,43,44)")
            cql.execute(f"INSERT INTO {table} (p,c,v) VALUES (1,2,3)")
            assert list(cql.execute(f"SELECT * FROM {mv}")) == [(42, 43, 44)]
# Reproducer for issue #9450 - when a view's key column name is a (quoted)
# keyword, writes used to fail because they generated internally broken CQL
# with the column name not quoted.
def test_mv_quoted_column_names(cql, test_keyspace):
    for colname in ['"dog"', '"Dog"', 'DOG', '"to"', 'int']:
        with new_test_table(cql, test_keyspace, f'p int primary key, {colname} int') as table:
            with new_materialized_view(cql, table, '*', f'{colname}, p', f'{colname} is not null and p is not null') as mv:
                cql.execute(f'INSERT INTO {table} (p, {colname}) values (1, 2)')
                # Validate that not only did the write not fail, it actually
                # wrote the right thing to the view. NOTE: on a single-node
                # Scylla, view update is synchronous so we can just read and
                # don't need to wait or retry.
                assert list(cql.execute(f'SELECT * from {mv}')) == [(2, 1)]
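The underlying bug class in #9450 - generating CQL with an unquoted column name - comes down to identifier quoting rules: names that are not plain lower-case identifiers, or that collide with keywords, must be double-quoted. A hypothetical sketch of such a quoting helper (the `RESERVED` set here is a tiny illustrative subset, not the real CQL keyword list, and `maybe_quote` is not a function from this codebase):

```python
import re

# Tiny illustrative subset of CQL reserved words - a real implementation
# would use the full keyword list from the CQL grammar.
RESERVED = {'to', 'select', 'from', 'where'}

def maybe_quote(name):
    # A plain lower-case identifier that is not a reserved word can be
    # emitted as-is; anything else (mixed case, keywords, odd characters)
    # must be double-quoted, with embedded double quotes doubled.
    if re.fullmatch('[a-z][a-z0-9_]*', name) and name not in RESERVED:
        return name
    return '"' + name.replace('"', '""') + '"'

assert maybe_quote('dog') == 'dog'
assert maybe_quote('Dog') == '"Dog"'
assert maybe_quote('to') == '"to"'
```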
# Same as test_mv_quoted_column_names above (reproducing issue #9450), just
# check *view building* - i.e., pre-existing data in the base table that
# needs to be copied to the view. The view building cannot return an error
# to the user, but can fail to write the desired data into the view.
def test_mv_quoted_column_names_build(cql, test_keyspace):
    for colname in ['"dog"', '"Dog"', 'DOG', '"to"', 'int']:
        with new_test_table(cql, test_keyspace, f'p int primary key, {colname} int') as table:
            cql.execute(f'INSERT INTO {table} (p, {colname}) values (1, 2)')
            with new_materialized_view(cql, table, '*', f'{colname}, p', f'{colname} is not null and p is not null') as mv:
                # When Scylla's view builder fails as it did in issue #9450,
                # there is no way to tell this state apart from a view build
                # that simply hasn't completed (besides looking at the logs,
                # which we don't). This means, unfortunately, that a failure
                # of this test is slow - it needs to wait for a timeout.
                start_time = time.time()
                while time.time() < start_time + 30:
                    if list(cql.execute(f'SELECT * from {mv}')) == [(2, 1)]:
                        break
                assert list(cql.execute(f'SELECT * from {mv}')) == [(2, 1)]
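The poll-until-timeout loop above is a common shape in eventually-consistent tests and can be factored into a reusable helper. A minimal sketch - `eventually` is a hypothetical name, not a helper that exists in this test suite's util.py:

```python
import time

def eventually(predicate, timeout=30, interval=0.1):
    # Poll until the predicate holds or the timeout expires, then return
    # the final truth value so the caller can simply assert on it.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())

assert eventually(lambda: True, timeout=1)
assert not eventually(lambda: False, timeout=0.2)
```

With such a helper the view-build wait would read as a single assertion, keeping the 30-second timeout in one place.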


@@ -28,7 +28,7 @@ from util import unique_name, random_string, new_test_table
 @pytest.fixture(scope="module")
 def table1(cql, test_keyspace):
     table = test_keyspace + "." + unique_name()
-    cql.execute(f"CREATE TABLE {table} (p text, c text, v text, primary key (p, c))")
+    cql.execute(f"CREATE TABLE {table} (p text, c text, v text, i int, s set<int>, m map<int, int>, primary key (p, c))")
     yield table
     cql.execute("DROP TABLE " + table)
@@ -188,3 +188,65 @@ def test_empty_string_key2(cql, test_keyspace):
        cql.execute(f"INSERT INTO {table} (p1,p2,c,v) VALUES ('', '', '', 'cat')")
        cql.execute(f"INSERT INTO {table} (p1,p2,c,v) VALUES ('x', 'y', 'z', 'dog')")
        assert list(cql.execute(f"SELECT v FROM {table} WHERE p1='' AND p2='' AND c=''")) == [('cat',)]
# Cassandra considers the null subscript 'm[null]' to be an invalid request.
# In Scylla we decided to do it differently (we think better): m[null] is simply
# a null, so the filter 'WHERE m[null] = 3' is not an error - it just doesn't
# match anything. This is more consistent with our usual null handling (null[2]
# and null < 2 are both defined as returning null), and will also allow us
# in the future to support non-constant subscripts - for example m[a] where
# the column a can be null for some rows and non-null for other rows.
# Before we implemented the above decision, we had multiple bugs in this case,
# resulting in bizarre errors and even crashes (see #10361, #10399 and #10417).
#
# Because this test uses a shared table (table1), depending on how it's
# run it sometimes sees an empty table and sometimes a table with data
# (and null values for the map m...), so this test mixes several different
# concerns and problems. The same problems are better covered separately
# by test_filtering.py::test_filtering_with_subscript and
# test_filtering.py::test_filtering_null_map_with_subscript, so this test
# should eventually be deleted.
def test_map_subscript_null(cql, table1, cassandra_bug):
    assert list(cql.execute(f"SELECT p FROM {table1} WHERE m[null] = 3 ALLOW FILTERING")) == []
    assert list(cql.execute(cql.prepare(f"SELECT p FROM {table1} WHERE m[?] = 3 ALLOW FILTERING"), [None])) == []
# Similarly, CONTAINS restriction with NULL should also match nothing.
# Reproduces #10359.
@pytest.mark.xfail(reason="Issue #10359")
def test_filtering_contains_null(cassandra_bug, cql, table1):
    p = unique_key_string()
    cql.execute(f"INSERT INTO {table1} (p,c,s) VALUES ('{p}', '1', {{1, 2}})")
    cql.execute(f"INSERT INTO {table1} (p,c,s) VALUES ('{p}', '2', {{3, 4}})")
    cql.execute(f"INSERT INTO {table1} (p,c) VALUES ('{p}', '3')")
    assert list(cql.execute(f"SELECT c FROM {table1} WHERE p='{p}' AND s CONTAINS NULL ALLOW FILTERING")) == []
# Similarly, CONTAINS KEY restriction with NULL should also match nothing.
# Reproduces #10359.
@pytest.mark.xfail(reason="Issue #10359")
def test_filtering_contains_key_null(cassandra_bug, cql, table1):
    p = unique_key_string()
    cql.execute(f"INSERT INTO {table1} (p,c,m) VALUES ('{p}', '1', {{1: 2}})")
    cql.execute(f"INSERT INTO {table1} (p,c,m) VALUES ('{p}', '2', {{3: 4}})")
    cql.execute(f"INSERT INTO {table1} (p,c) VALUES ('{p}', '3')")
    assert list(cql.execute(f"SELECT c FROM {table1} WHERE p='{p}' AND m CONTAINS KEY NULL ALLOW FILTERING")) == []
# The above tests test_filtering_eq_null and test_filtering_inequality_null
# have WHERE x=NULL or x>NULL where "x" is a regular column. Such a
# comparison requires ALLOW FILTERING for non-NULL parameters, so we also
# require it for NULL. Unlike the previous tests, this one also passes on
# Cassandra.
def test_filtering_null_comparison_no_filtering(cql, table1):
    with pytest.raises(InvalidRequest, match='ALLOW FILTERING'):
        cql.execute(f"SELECT c FROM {table1} WHERE p='x' AND i=NULL")
    with pytest.raises(InvalidRequest, match='ALLOW FILTERING'):
        cql.execute(f"SELECT c FROM {table1} WHERE p='x' AND i>NULL")
    with pytest.raises(InvalidRequest, match='ALLOW FILTERING'):
        cql.execute(f"SELECT c FROM {table1} WHERE p='x' AND i>=NULL")
    with pytest.raises(InvalidRequest, match='ALLOW FILTERING'):
        cql.execute(f"SELECT c FROM {table1} WHERE p='x' AND i<NULL")
    with pytest.raises(InvalidRequest, match='ALLOW FILTERING'):
        cql.execute(f"SELECT c FROM {table1} WHERE p='x' AND i<=NULL")
    with pytest.raises(InvalidRequest, match='ALLOW FILTERING'):
        cql.execute(f"SELECT c FROM {table1} WHERE p='x' AND s CONTAINS NULL")
    with pytest.raises(InvalidRequest, match='ALLOW FILTERING'):
        cql.execute(f"SELECT c FROM {table1} WHERE p='x' AND m CONTAINS KEY NULL")
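The null-comparison semantics these tests rely on follow three-valued logic: any comparison with null evaluates to null, and a filter keeps a row only when its condition is strictly true. A small Python model of that rule (illustrative only, with None standing in for null):

```python
def cql_compare(op, x, y):
    # Any comparison involving null (None here) evaluates to null; a
    # filter keeps a row only when the comparison is strictly True.
    if x is None or y is None:
        return None
    return {'=': x == y, '>': x > y, '>=': x >= y,
            '<': x < y, '<=': x <= y}[op]

assert cql_compare('=', 1, 1) is True
assert cql_compare('>', 1, None) is None
# Filtering with a null operand therefore matches no rows:
rows = [1, 2, 3]
assert [r for r in rows if cql_compare('>', r, None) is True] == []
```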


@@ -0,0 +1,64 @@
# Copyright 2022-present ScyllaDB
#
# SPDX-License-Identifier: AGPL-3.0-or-later
#############################################################################
# Tests for scanning SELECT requests (which read many rows and/or many
# partitions).
# We have a separate test file test_filtering.py for scans which also involve
# filtering, and test_allow_filtering.py for checking when "ALLOW FILTERING"
# is needed in a scan. test_secondary_index.py also contains tests for scanning
# using a secondary index.
#############################################################################
import pytest
from util import new_test_table
from cassandra.query import SimpleStatement
# Regression test for #9482
def test_scan_ending_with_static_row(cql, test_keyspace):
    with new_test_table(cql, test_keyspace, "pk int, ck int, s int STATIC, v int, PRIMARY KEY (pk, ck)") as table:
        stmt = cql.prepare(f"UPDATE {table} SET s = ? WHERE pk = ?")
        for pk in range(100):
            cql.execute(stmt, (0, pk))
        statement = SimpleStatement(f"SELECT * FROM {table}", fetch_size=10)
        # This will trigger an error in either processing or building the query
        # results. The success criterion for this test is the query finishing
        # without errors.
        res = list(cql.execute(statement))
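The fetch_size=10 in the statement above makes the scan paged: the server returns rows ten at a time, and #9482 was triggered at a page boundary. A toy model of paged fetching (illustrative only, not driver code):

```python
def paged(rows, fetch_size):
    # The server returns rows in pages of at most fetch_size; the client
    # keeps requesting pages until the result set is exhausted. A page
    # boundary falling on a partition with only a static row was the
    # trigger for #9482.
    for i in range(0, len(rows), fetch_size):
        yield rows[i:i + fetch_size]

pages = list(paged(list(range(25)), 10))
assert [len(p) for p in pages] == [10, 10, 5]
assert [x for p in pages for x in p] == list(range(25))
```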
# Test that if we have multi-column restrictions on the clustering key
# and additional filtering on regular columns, both restrictions are obeyed.
# Reproduces #6200.
def test_multi_column_restrictions_and_filtering(cql, test_keyspace):
    with new_test_table(cql, test_keyspace, "p int, c1 int, c2 int, r int, PRIMARY KEY (p, c1, c2)") as table:
        stmt = cql.prepare(f"INSERT INTO {table} (p, c1, c2, r) VALUES (1, ?, ?, ?)")
        for i in range(2):
            for j in range(2):
                cql.execute(stmt, [i, j, j])
        assert list(cql.execute(f"SELECT c1,c2,r FROM {table} WHERE p=1 AND (c1, c2) = (0,1)")) == [(0,1,1)]
        # Since in that result r=1, adding "AND r=1" should return the same
        # result, and adding "AND r=0" should return nothing.
        assert list(cql.execute(f"SELECT c1,c2,r FROM {table} WHERE p=1 AND (c1, c2) = (0,1) AND r=1 ALLOW FILTERING")) == [(0,1,1)]
        # Reproduces #6200:
        assert list(cql.execute(f"SELECT c1,c2,r FROM {table} WHERE p=1 AND (c1, c2) = (0,1) AND r=0 ALLOW FILTERING")) == []
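The essence of the #6200/#12014 fix described in the merge message is that a row must satisfy the multi-column clustering restriction *and* every other restriction, whereas the buggy filtering code returned a match as soon as the multi-column check alone passed. A Python model of the corrected logic (illustrative only, using (c1, c2, r) rows like the test above):

```python
def row_matches(row, ck_restriction, other_restrictions):
    # Correct logic: the multi-column clustering restriction AND every
    # other restriction must hold. The buggy code effectively returned a
    # match as soon as the multi-column check alone passed.
    c1, c2, r = row
    if not ck_restriction((c1, c2)):
        return False
    return all(pred(row) for pred in other_restrictions)

rows = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
ck = lambda t: t == (0, 1)
assert [r for r in rows if row_matches(r, ck, [])] == [(0, 1, 1)]
# Adding "AND r=0" must now filter that row out:
assert [r for r in rows if row_matches(r, ck, [lambda row: row[2] == 0])] == []
```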
# Test that if we have range multi-column restrictions on the clustering key
# and additional filtering on regular columns, both restrictions are obeyed.
# Similar to test_multi_column_restrictions_and_filtering, but uses a range
# restriction on the clustering key columns.
# Reproduces #12014, the code is taken from a reproducer provided by a user.
def test_multi_column_range_restrictions_and_filtering(cql, test_keyspace):
    with new_test_table(cql, test_keyspace, "pk int, ts timestamp, id int, processed boolean, PRIMARY KEY (pk, ts, id)") as table:
        cql.execute(f"INSERT INTO {table} (pk, ts, id, processed) VALUES (0, currentTimestamp(), 0, true)")
        cql.execute(f"INSERT INTO {table} (pk, ts, id, processed) VALUES (0, currentTimestamp(), 1, true)")
        cql.execute(f"INSERT INTO {table} (pk, ts, id, processed) VALUES (0, currentTimestamp(), 2, false)")
        cql.execute(f"INSERT INTO {table} (pk, ts, id, processed) VALUES (0, currentTimestamp(), 3, false)")
        # This select doesn't use multi-column restrictions; the result shouldn't change when it does.
        rows1 = list(cql.execute(f"SELECT id, processed FROM {table} WHERE pk = 0 AND ts >= 0 AND processed = false ALLOW FILTERING"))
        assert rows1 == [(2, False), (3, False)]
        # Reproduces #12014
        rows2 = list(cql.execute(f"SELECT id, processed FROM {table} WHERE pk = 0 AND (ts, id) >= (0, 0) AND processed = false ALLOW FILTERING"))
        assert rows1 == rows2


@@ -102,6 +102,19 @@ def new_materialized_view(cql, table, select, pk, where):
    finally:
        cql.execute(f"DROP MATERIALIZED VIEW {mv}")
# A utility function for creating a new temporary secondary index on
# an existing table.
@contextmanager
def new_secondary_index(cql, table, column, name='', extra=''):
    keyspace = table.split('.')[0]
    if not name:
        name = unique_name()
    cql.execute(f"CREATE INDEX {name} ON {table} ({column}) {extra}")
    try:
        yield f"{keyspace}.{name}"
    finally:
        cql.execute(f"DROP INDEX {keyspace}.{name}")

def project(column_name_string, rows):
    """Returns a list of column values from each of the rows."""
    return [getattr(r, column_name_string) for r in rows]
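The create / try / yield / finally / drop shape used by new_secondary_index (and the other helpers in this file) guarantees cleanup even when the test body raises. A self-contained demonstration of that pattern, with a hypothetical `new_resource` standing in for a helper that talks to a database:

```python
from contextlib import contextmanager

executed = []

@contextmanager
def new_resource(name):
    # Same create / try / yield / finally / drop shape as the helper above:
    # the drop runs even if the body of the with-statement raises.
    executed.append(f"CREATE {name}")
    try:
        yield name
    finally:
        executed.append(f"DROP {name}")

with new_resource("idx1") as r:
    executed.append(f"USE {r}")

assert executed == ["CREATE idx1", "USE idx1", "DROP idx1"]
```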
