Compare commits

..

162 Commits

Author SHA1 Message Date
Beni Peled
5c5a9633ea release: prepare for 5.1.7 2023-03-12 08:25:57 +02:00
Jan Ciolek
53b6e720f6 cql3: preserve binary_operator.order in search_and_replace
There was a bug in `expr::search_and_replace`.
It doesn't preserve the `order` field of binary_operator.

`order` field is used to mark relations created
using the SCYLLA_CLUSTERING_BOUND.
It is a CQL feature used for internal queries inside Scylla.
It means that we should handle the restriction as a raw
clustering bound, not as an expression in the CQL language.

Losing the SCYLLA_CLUSTERING_BOUND marker could cause issues,
the database could end up selecting the wrong clustering ranges.

Fixes: #13055

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13056

(cherry picked from commit aa604bd935)
2023-03-09 12:52:50 +02:00
Botond Dénes
46d6145b37 sstables/sstable: validate_checksums(): force-check EOF
EOF is only guarateed to be set if one tried to read past the end of the
file. So when checking for EOF, also try to read some more. This
should force the EOF flag into a correct value. We can then check that
the read yielded 0 bytes.
This should ensure that `validate_checksums()` will not falsely declare
the validation to have failed.

Fixes: #11190

Closes #12696

(cherry picked from commit 693c22595a)
2023-03-09 12:31:00 +02:00
Wojciech Mitros
772ac59299 functions: initialize aggregates on scylla start
Currently, UDAs can't be reused if Scylla has been
restarted since they have been created. This is
caused by the missing initialization of saved
UDAs that should have inserted them to the
cql3::functions::functions::_declared map, that
should store all (user-)created functions and
aggregates.

This patch adds the missing implementation in a way
that's analogous to the method of inserting UDF to
the _declared map.

Fixes #11309

(cherry picked from commit e558c7d988)
2023-03-09 12:21:07 +02:00
Tomasz Grabiec
1f334e48b2 row_cache: Destroy coroutine under region's allocator
The reason is alloc-dealloc mismatch of position_in_partition objects
allocated by cursors inside coroutine object stored in the update
variable in row_cache::do_update()

It is allocated under cache region, but in case of exception it will
be destroyed under the standard allocator. If update is successful, it
will be cleared under region allocator, so there is not problem in the
normal case.

Fixes #12068

Closes #12233

(cherry picked from commit 992a73a861)
2023-03-08 20:53:24 +02:00
Gleb Natapov
389050e421 lwt: do not destroy capture in upgrade_if_needed lambda since the lambda is used more then once
If on the first call the capture is destroyed the second call may crash.

Fixes: #12958

Message-Id: <Y/sks73Sb35F+PsC@scylladb.com>
(cherry picked from commit 1ce7ad1ee6)
2023-03-08 18:52:01 +02:00
Anna Stuchlik
2152b765a6 doc: Update the documentation landing page
This commit makes the following changes to the docs landing page:

- Adds the ScyllaDB enterprise docs as one of three tiles.

- Modifies the three tiles to reflect the three flavors of ScyllaDB.

- Moves the "New to ScyllaDB? Start here!" under the page title.

- Renames "Our Products" to "Other Products" to list the products other
  than ScyllaDB itself. In addtition, the boxes are enlarged from to
  large-4 to look better.

The major purpose of this commit is to expose the ScyllaDB
documentation.

docs: fix the link
(cherry picked from commit 27bb8c2302)

Closes #13086
2023-03-06 14:19:30 +02:00
Pavel Emelyanov
d43b6db152 azure_snitch: Handle empty zone returned from IMDS
Azure metadata API may return empty zone sometimes. If that happens
shard-0 gets empty string as its rack, but propagates UNKNOWN_RACK to
other shards.

Empty zones response should be handled regardless.

refs: #12185

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12274

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-02 09:16:02 +03:00
Pavel Emelyanov
a514e60e65 snitch: Check http response codes to be OK
Several snitch drivers make http requests to get
region/dc/zone/rack/whatever from the cloud provider. They blindly rely
on the response being successfull and read response body to parse the
data they need from.

That's not nice, add checks for requests finish with http OK statuses.

refs: #12185

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12287

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-02 09:16:02 +03:00
Anna Stuchlik
6eb70caba9 doc: fixes https://github.com/scylladb/scylladb/issues/12954, adds the minimal version from which the 2021.1-to-2022.1 upgrade is supported for Ubuntu, Debian, and image
Closes #12974

(cherry picked from commit 91b611209f)
2023-02-28 13:03:05 +02:00
Botond Dénes
dd094f1230 types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory
Check the first fragment before dereferencing it, the fragment might be
empty, in which case move to the next one.
Found by running range scan tests with random schema and random data.

Fixes: #12821
Fixes: #12823
Fixes: #12708

Closes #12824

(cherry picked from commit ef548e654d)
2023-02-23 22:38:24 +02:00
Yaron Kaikov
530600a646 release: prepare for 5.1.6 2023-02-23 14:28:29 +02:00
Kefu Chai
c8e5f8c66b tools/schema_loader: do not return ref to a local variable
we should never return a reference to local variable.
so in this change, a reference to a static variable is returned
instead. this should address following warning from Clang 17:

```
/home/kefu/dev/scylladb/tools/schema_loader.cc:146:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address]
        return {};
               ^~
```

Fixes #12875
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12876

(cherry picked from commit 6eab8720c4)
2023-02-22 22:02:51 +02:00
Gleb Natapov' via ScyllaDB development
ca7a13cad2 lwt: upgrade stored mutations to the latest schema during prepare
Currently they are upgraded during learn on a replica. The are two
problems with this.  First the column mapping may not exist on a replica
if it missed this particular schema (because it was down for instance)
and the mapping history is not part of the schema. In this case "Failed
to look up column mapping for schema version" will be thrown. Second lwt
request coordinator may not have the schema for the mutation as well
(because it was freed from the registry already) and when a replica
tries to retrieve the schema from the coordinator the retrieval will fail
causing the whole request to fail with "Schema version XXXX not found"

Both of those problems can be fixed by upgrading stored mutations
during prepare on a node it is stored at. To upgrade the mutation its
column mapping is needed and it is guarantied that it will be present
at the node the mutation is stored at since it is pre-request to store
it that the corresponded schema is available. After that the mutation
is processed using latest schema that will be available on all nodes.

Fixes #10770

Message-Id: <Y7/ifraPJghCWTsq@scylladb.com>
(cherry picked from commit 15ebd59071)
2023-02-22 21:58:30 +02:00
Tomasz Grabiec
d401551020 db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order
trim_clustering_row_ranges_to() is broken for non-full keys in reverse
mode. It will trim the range to
position_in_partition_view::after_key(full_key) instead of
position_in_partition_view::before_key(key), hence it will include the
key in the resulting range rather than exclude it.

Fixes #12180
Refs #1446

(cherry picked from commit 536c0ab194)
2023-02-22 21:52:46 +02:00
Tomasz Grabiec
d952bf4035 types: Fix comparison of frozen sets with empty values
A frozen set can be part of the clustering key, and with compact
storage, the corresponding key component can have an empty value.

Comparison was not prepared for this, the iterator attempts to
deserialize the item count and will fail if the value is empty.

Fixes #12242

(cherry picked from commit 232ce699ab)
2023-02-22 21:44:37 +02:00
Michał Chojnowski
b346136e98 utils: config_file: fix handling of workdir,W in the YAML file
Option names given in db/config.cc are handled for the command line by passing
them to boost::program_options, and by YAML by comparing them with YAML
keys.
boost::program_options has logic for understanding the
long_name,short_name syntax, so for a "workdir,W" option both --workdir and -W
worked, as intended. But our YAML config parsing doesn't have this logic
and expected "workdir,W" verbatim, which is obviously not intended. Fix that.

Fixes #7478
Fixes #9500
Fixes #11503

Closes #11506

(cherry picked from commit af7ace3926)
2023-02-22 21:33:04 +02:00
Takuya ASADA
87e267213d scylla_coredump_setup: fix coredump timeout settings
We currently configure only TimeoutStartSec, but probably it's not
enough to prevent coredump timeout, since TimeoutStartSec is maximum
waiting time for service startup, and there is another directive to
specify maximum service running time (RuntimeMaxSec).

To fix the problem, we should specify RunTimeMaxSec and TimeoutSec (it
configures both TimeoutStartSec and TimeoutStopSec).

Fixes #5430

Closes #12757

(cherry picked from commit bf27fdeaa2)
2023-02-19 21:13:59 +02:00
Botond Dénes
5bda9356d5 Merge 'doc: fix the service name from "scylla-enterprise-server" "to "scylla-server"' from Anna Stuchlik
Related https://github.com/scylladb/scylladb/issues/12658.

This issue fixes the bug in the upgrade guides for the released versions.

Closes #12679

* github.com:scylladb/scylladb:
  doc: fix the service name in the upgrade guide for patch releases versions 2022
  doc: fix the service name in the upgrade guide from 2021.1 to 2022.1

(cherry picked from commit 325246ab2a)
2023-02-17 12:20:26 +02:00
Botond Dénes
f9fe48ad89 Merge 'Backport compaction-backlog-tracker fixes to branch-5.1' from Raphael "Raph" Carvalho
Both patches are important to fix inefficiencies when updating the backlog tracker, which can manifest as a reactor stall, on a special event like schema change.

A simple conflict was resolved in the first patch, since master has compaction groups. It was very easy to resolve.

Regression since 1d9f53c881, which is present in 5.1 onwards. So probably it merits a backport to 5.2 too.

Closes #12769

* github.com:scylladb/scylladb:
  compaction: Fix inefficiency when updating LCS backlog tracker
  table: Fix quadratic behavior when inserting sstables into tracker on schema change
2023-02-15 07:26:00 +02:00
Raphael S. Carvalho
0c9a0faf0d compaction: Fix inefficiency when updating LCS backlog tracker
LCS backlog tracker uses STCS tracker for L0. Turns out LCS tracker
is calling STCS tracker's replace_sstables() with empty arguments
even when higher levels (> 0) *only* had sstables replaced.
This unnecessary call to STCS tracker will cause it to recompute
the L0 backlog, yielding the same value as before.

As LCS has a fragment size of 0.16G on higher levels, we may be
updating the tracker multiple times during incremental compaction,
which operates on SSTables on higher levels.

Inefficiency is fixed by only updating the STCS tracker if any
L0 sstable is being added or removed from the table.

This may be fixing a quadratic behavior during boot or refresh,
as new sstables are loaded one by one.
Higher levels have a substantial higher number of sstables,
therefore updating STCS tracker only when level 0 changes, reduces
significantly the number of times L0 backlog is recomputed.

Refs #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12676

(cherry picked from commit 1b2140e416)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-07 12:45:34 -03:00
Raphael S. Carvalho
47dcfd866c table: Fix quadratic behavior when inserting sstables into tracker on schema change
Each time backlog tracker is informed about a new or old sstable, it
will recompute the static part of backlog which complexity is
proportional to the total number of sstables.
On schema change, we're calling backlog_tracker::replace_sstables()
for each existing sstable, therefore it produces O(N ^ 2) complexity.

Fixes #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12593

(cherry picked from commit 87ee547120)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-07 12:43:30 -03:00
Beni Peled
5c9ecd5604 release: prepare for 5.1.5 2023-02-06 14:47:49 +02:00
Anna Stuchlik
1a37b85d14 docs: fix the option name from compaction to compression on the Data Definition page
Fixes the option name in the "Other table options" table on the Data Definition page.

Fixes #12334

Closes #12382

(cherry picked from commit ea7e23bf92)
2023-02-05 20:06:31 +02:00
Botond Dénes
5070ddb723 sstables: track decompressed buffers
Convert decompressed temporary buffers into tracked buffers just before
returning them to the upper layer. This ensures these buffers are known
to the reader concurrency semaphore and it has an accurate view of the
actual memory consumption of reads.

Fixes: #12448

Closes #12454

(cherry picked from commit c4688563e3)
2023-02-05 20:06:31 +02:00
Tomasz Grabiec
7480af58e5 row_cache: Fix violation of the "oldest version are evicted first" when evicting last dummy
Consider the following MVCC state of a partition:

   v2: ==== <7> [entry2] ==== <9> ===== <last dummy>
   v1: ================================ <last dummy> [entry1]

Where === means a continuous range and --- means a discontinuous range.

After two LRU items are evicted (entry1 and entry2), we will end up with:

   v2: ---------------------- <9> ===== <last dummy>
   v1: ================================ <last dummy> [entry1]

This will cause readers to incorrectly think there are no rows before
entry <9>, because the range is continuous in v1, and continuity of a
snapshot is a union of continuous intervals in all versions. The
cursor will see the interval before <9> as continuous and the reader
will produce no rows.

This is only temporary, because current MVCC merging rules are such
that the flag on the latest entry wins, so we'll end up with this once
v1 is no longer needed:

   v2: ---------------------- <9> ===== <last dummy>

...and the reader will go to sstables to fetch the evicted rows before
entry <9>, as expected.

The bug is in rows_entry::on_evicted(), which treats the last dummy
entry in a special way, and doesn't evict it, and doesn't clear the
continuity by omission.

The situation is not easy to trigger because it requires certain
eviction pattern concurrent with multiple reads of the same partition
in different versions, so across memtable flushes.

Closes #12452

(cherry-picked from commit f97268d8f2)

Fixes #12451.
2023-02-05 20:06:31 +02:00
Raphael S. Carvalho
43d46a241f compaction: LCS: don't reshape all levels if only a single breaks disjointness
LCS reshape is compacting all levels if a single one breaks
disjointness. That's unnecessary work because rewriting that single
level is enough to restore disjointness. If multiple levels break
disjointness, they'll each be reshaped in its own iteration, so
reducing operation time for each step and disk space requirement,
as input files can be released incrementally.
Incremental compaction is not applied to reshape yet, so we need to
avoid "major compaction", to avoid the space overhead.
But space overhead is not the only problem, the inefficiency, when
deciding what to reshape when overlapping is detected, motivated
this patch.

Fixes #12495.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12496

(cherry picked from commit f2f839b9cc)
2023-02-05 20:06:31 +02:00
Wojciech Mitros
3807020a7b forward_service: prevent heap use-after-free of forward_aggregates
Currently, we create `forward_aggregates` inside a function that
returns the result of a future lambda that captures these aggregates
by reference. As a result, the aggregates may be destructed before
the lambda finishes, resulting in a heap use-after-free.

To prolong the lifetime of these aggregates, we cannot use a move
capture, because the lambda is wrapped in a with_thread_if_needed()
call on these aggregates. Instead, we fix this by wrapping the
entire return statement in a do_with().

Fixes #12528

Closes #12533

(cherry picked from commit 5f45b32bfa)
2023-02-05 20:06:31 +02:00
Botond Dénes
d6fb20f30e types: is_tuple(): handle reverse types
Currently reverse types match the default case (false), even though they
might be wrapping a tuple type. One user-visible effect of this is that
a schema, which has a reversed<frozen<UDT>> clustering key component,
will have this component incorrectly represented in the schema cql dump:
the UDT will loose the frozen attribute. When attempting to recreate
this schema based on the dump, it will fail as the only frozen UDTs are
allowed in primary key components.

Fixes: #12576

Closes #12579

(cherry picked from commit ebc100f74f)
2023-02-05 20:06:31 +02:00
Calle Wilund
33c20eebe6 alterator::streams: Sort tables in list_streams to ensure no duplicates
Fixes #12601 (maybe?)

Sort the set of tables on ID. This should ensure we never
generate duplicates in a paged listing here. Can obviously miss things if they
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.

(cherry picked from commit da8adb4d26)
2023-02-05 20:06:28 +02:00
Benny Halevy
f3a6af663d view: row_lock: lock_ck: find or construct row_lock under partition lock
Since we're potentially searching the row_lock in parallel to acquiring
the read_lock on the partition, we're racing with row_locker::unlock
that may erase the _row_locks entry for the same clustering key, since
there is no lock to protect it up until the partition lock has been
acquired and the lock_partition future is resolved.

This change moves the code to search for or allocate the row lock
_after_ the partition lock has been acquired to make sure we're
synchronously starting the read/write lock function on it, without
yielding, to prevent this use-after-free.

This adds an allocation for copying the clustering key in advance
even if a row_lock entry already exists, that wasn't needed before.
It only us slows down (a bit) when there is contention and the lock
already existed when we want to go locking. In the fast path there
is no contention and then the code already had to create the lock
and copy the key. In any case, the penalty of copying the key once
is tiny compared to the rest of the work that view updates are doing.

This is required on top of 5007ded2c1 as
seen in https://github.com/scylladb/scylladb/issues/12632
which is closely related to #12168 but demonstrates a different race
causing use-after-free.

Fixes #12632

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4b5e324ecb)
2023-02-05 17:38:29 +02:00
Anna Stuchlik
1685af9829 docs: fixes https://github.com/scylladb/scylladb/issues/12654, update the links to the Download Center
Closes #12655

(cherry picked from commit 64cc4c8515)
2023-02-05 17:20:45 +02:00
Anna Stuchlik
bb880c7658 doc: fixes https://github.com/scylladb/scylladb/issues/12672, fix the redirects to the Cloud docs
Closes #12673

(cherry picked from commit 2be131da83)
2023-02-05 17:17:46 +02:00
Kefu Chai
f952d397e8 cql3/selection: construct string_view using char* not size
before this change, we construct a sstring from a comma statement,
which evaluates to the return value of `name.size()`, but what we
expect is `sstring(const char*, size_t)`.

in this change

* instead of passing the size of the string_view,
  both its address and size are used
* `std::string_view` is constructed instead of sstring, for better
  performance, as we don't need to perform a deep copy

the issue is reported by GCC-13:

```
In file included from cql3/selection/selectable.cc:11:
cql3/selection/field_selector.hh:83:60: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
                                                           ^~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12666

(cherry picked from commit 186ceea009)

Fixes #12739.

(cherry picked from commit b588b19620)
2023-02-05 13:51:22 +02:00
Michał Chojnowski
5e88421360 commitlog: fix total_size_on_disk accounting after segment file removal
Currently, segment file removal first calls `f.remove_file()` and
does `total_size_on_disk -= f.known_size()` later.
However, `remove_file()` resets `known_size` to 0, so in effect
the freed space in not accounted for.

`total_size_on_disk` is not just a metric. It is also responsible
for deciding whether a segment should be recycled -- it is recycled
only if `total_size_on_disk - known_size < max_disk_size`.
Therefore this bug has dire performance consequences:
if `total_size_on_disk - known_size` ever exceeds `max_disk_size`,
the recycling of commitlog segments will stop permanently, because
`total_size_on_disk - known_size` will never go back below
`max_disk_size` due to the accounting bug. All new segments from this
point will be allocated from scratch.

The bug was uncovered by a QA performance test. It isn't easy to trigger --
it took the test 7 hours of constant high load to step into it.
However, the fact that the effect is permanent, and degrades the
performance of the cluster silently, makes the bug potentially quite severe.

The bug can be easily spotted with Prometheus as infinitely rising
`commitlog_total_size_on_disk` on the affected shards.

Fixes #12645

Closes #12646

(cherry picked from commit fa7e904cd6)
2023-02-01 21:54:52 +02:00
Kamil Braun
1945102ca0 docs: fix problems with Raft documentation
Fix some problems in the documentation, e.g. it is not possible to
enable Raft in an existing cluster in 5.0, but the documentation claimed
that it is.

(cherry picked from commit 1cc68b262e)

Cherry-pick note: the original commit added a lot of new stuff like
describing the Raft upgrade procedure, but also fixed problems with the
existing documentation. In this backport we include only the latter.

Closes #12582
2023-01-24 13:35:24 +02:00
Anna Mikhlin
be3f6f8c7b release: prepare for 5.1.4 2023-01-22 15:29:48 +02:00
Nadav Har'El
94735f63a3 Merge 'doc: add the upgrade guide for ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/12315

This PR adds the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2.
Instead of adding separate guides per platform, I've merged the information to create one platform-agnostic guide, similar to what we did for [OSS->OSS](https://docs.scylladb.com/stable/upgrade/upgrade-opensource/upgrade-guide-from-5.0-to-5.1/) and [Enterprise->Enterprise ](https://github.com/scylladb/scylladb/pull/12339)guides.

Closes #12450

* github.com:scylladb/scylladb:
  doc: add the new upgrade guide to the toctree and fix its name
  docs: add the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2

(cherry picked from commit 7192283172)
2023-01-20 14:29:59 +01:00
Michał Sala
b0d28919c0 forward_service: fix timeout support in parallel aggregates
`forward_request` verb carried information about timeouts using
`lowres_clock::time_point` (that came from local steady clock
`seastar::lowres_clock`). The time point was produced on one node and
later compared against other node `lowres_clock`. That behavior
was wrong (`lowres_clock::time_point`s produced with different
`lowres_clock`s cannot be compared) and could lead to delayed or
premature timeout.

To fix this issue, `lowres_clock::time_point` was replaced with
`lowres_system_clock::time_point` in `forward_request` verb.
Representation to which both time point types serialize is the same
(64-bit integer denoting the count of elapsed nanoseconds), so it was
possible to do an in-place switch of those types using logic suggested
by @avikivity:
    - using steady_clock is just broken, so we aren't taking anything
        from users by breaking it further
    - once all nodes are upgraded, it magically starts to work

Closes #12529

(cherry picked from commit bbbe12af43)

Fixes #12458
2023-01-18 14:09:37 +02:00
Anna Mikhlin
addc4666d5 release: prepare for 5.1.3 2023-01-12 15:51:01 +02:00
Botond Dénes
a14ffbd5e2 Merge 'Backport 5.1 cleanup compaction flush memtable' from Benny Halevy
This a backport of 9fa1783892 (#11902) to branch-5.1

Flush the memtable before cleaning up the table so not to leave any disowned tokens in the memtable
as they might be resurrected if left in the memtable.

Refs #1239

Closes #12490

* github.com:scylladb/scylladb:
  table: perform_cleanup_compaction: flush memtable
  table: add perform_cleanup_compaction
  api: storage_service: add logging for compaction operations et al
2023-01-11 08:03:35 +02:00
Benny Halevy
ea56ecace0 table: perform_cleanup_compaction: flush memtable
We don't explicitly cleanup the memtable, while
it might hold tokens disowned by the current node.

Flush the memtable before performing cleanup compaction
to make sure all tokens in the memtable are cleaned up.

Note that non-owned ranges are invalidate in the cache
in compaction_group::update_main_sstable_list_on_compaction_completion
using desc.ranges_for_cache_invalidation.

\Fixes #1239

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit eb3a94e2bc)
2023-01-11 07:55:43 +02:00
Benny Halevy
fe8d8f97e2 table: add perform_cleanup_compaction
Move the integration with compaction_manager
from the api layer to the tabel class so
it can also make sure the memtable is cleaned up in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit fc278be6c4)
2023-01-11 07:51:29 +02:00
Benny Halevy
44e920cbb0 api: storage_service: add logging for compaction operations et al
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from 85523c45c0)
2023-01-11 07:46:09 +02:00
Michał Chojnowski
b114551d53 configure: don't reduce parsers' optimization level to 1 in release
The line modified in this patch was supposed to increase the
optimization levels of parsers in debug mode to 1, because they
were too slow otherwise. But as a side effect, it also reduced the
optimization level in release mode to 1. This is not a problem
for the CQL frontend, because statement preparation is not
performance-sensitive, but it is a serious performance problem
for Alternator, where it lies in the hot path.

Fix this by only applying the -O1 to debug modes.

Fixes #12463

Closes #12460

(cherry picked from commit 08b3a9c786)
2023-01-08 01:34:56 +02:00
Nadav Har'El
099145fe9a materialized view: fix bug in some large modifications to base partitions
Sometimes a single modification to a base partition requires updates to
a large number of view rows. A common example is deletion of a base
partition containing many rows. A large BATCH is also possible.

To avoid large allocations, we split the large amount of work into
batch of 100 (max_rows_for_view_updates) rows each. The existing code
assumed an empty result from one of these batches meant that we are
done. But this assumption was incorrect: There are several cases when
a base-table update may not need a view update to be generated (see
can_skip_view_updates()) so if all 100 rows in a batch were skipped,
the view update stopped prematurely. This patch includes two tests
showing when this bug can happen - one test using a partition deletion
with a USING TIMESTAMP causing the deletion to not affect the first
100 rows, and a second test using a specially-crafed large BATCH.
These use cases are fairly esoteric, but in fact hit a user in the
wild, which led to the discovery of this bug.

The fix is fairly simple: To detect when build_some() is done it is no
longer enough to check if it returned zero view-update rows; Rather,
it explicitly returns whether or not it is done as an std::optional.

The patch includes several tests for this bug, which pass on Cassandra,
failed on Scylla before this patch, and pass with this patch.

Fixes #12297.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12305

(cherry picked from commit 92d03be37b)
2023-01-04 10:05:18 +02:00
Avi Kivity
0bdbce90f4 Merge 'reader_concurrency_semaphore: fix waiter/inactive race' from Botond Dénes
We recently (in 7fbad8de87) made sure all admission paths can trigger the eviction of inactive reads. As reader eviction happens in the background, a mechanism was added to make sure only a single eviction fiber was running at any given time. This mechanism however had a preemption point between stopping the fiber and releasing the evict lock. This gave an opportunity for either new waiters or inactive readers to be added, without the fiber acting on it. Since it still held onto the lock, it also prevented from other eviction fibers to start. This could create a situation where the semaphore could admit new reads by evicting inactive ones, but it still has waiters. Since an empty waitlist is also an admission criteria, once one waiter is wrongly added, many more can accumulate.
This series fixes this by ensuring the lock is released in the instant the fiber decides there is no more work to do.
It also fixes the assert failure on recursive eviction and adds a detection to the inactive/waiter contradiction.

Fixes: #11923
Refs: #11770

Closes #12026

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: do_wait_admission(): detect admission-waiter anomaly
  reader_concurrency_semaphore: evict_readers_in_the_background(): eliminate blind spot
  reader_concurrency_semaphore: do_detach_inactive_read(): do a complete detach

(cherry picked from commit 15ee8cfc05)
2023-01-03 16:45:51 +02:00
Botond Dénes
8ffdb8546b reader_concurrency_semaphore: unify admission logic across all paths
The semaphore currently has two admission paths: the
obtain_permit()/with_permit() methods which admits permits on user
request (the front door) and the maybe_admit_waiters() which admits
permits based on internal events like memory resource being returned
(the back door). The two paths used their own admission conditions
and naturally this means that they diverged in time. Notably,
maybe_admit_waiters() did not look at inactive readers assuming that if
there are waiters there cannot be inactive readers. This is not true
however since we merged the execution-stage into the semaphore. Waiters
can queue up even when there are inactive reads and thus
maybe_admit_waiters() has to consider evicting some of them to see if
this would allow for admitting new reads.
To avoid such divergence in the future, the admission logic was moved
into a new method can_admit_read() which is now shared between the two
method families. This method now checks for the possibility of evicting
inactive readers as well.
The admission logic was tuned slightly to only consider evicting
inactive readers if there is a real possibility that this will result
in admissions: notably, before this patch, resource availability was
checked before stalls were (used permits == blocked permits), so we
could evict readers even if this couldn't help.
Because now eviction can be started from maybe_admit_waiters(), which is
also downstream from eviction, we added a flag to avoid recursive
evict -> maybe admit -> evict ... loops.

Fixes: #11770

Closes #11784

(cherry picked from commit 7fbad8de87)
2023-01-03 16:45:17 +02:00
Takuya ASADA
db382697f1 scylla_setup: fix incorrect type definition on --online-discard option
--online-discard option defined as string parameter since it doesn't
specify "action=", but has default value in boolean (default=True).
It breaks "provisioning in a similar environment" since the code
supposed boolean value should be "action='store_true'" but it's not.

We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.

Fixes #11700

Closes #11831

(cherry picked from commit acc408c976)
2022-12-28 20:44:02 +02:00
Petr Gusev
3c02e5d263 cql: batch statement, inserting a row with a null key column should be forbidden
Regular INSERT statements with null values for primary key
components are rejected by Scylla since #9286 and #9314.
Batch statements missed a similar check, this patch
fixes it.

Fixes: #12060
(cherry picked from commit 7730c4718e)
2022-12-28 18:15:40 +02:00
Anna Mikhlin
4c0f7ea098 release: prepare for 5.1.2 2022-12-25 20:53:22 +02:00
Botond Dénes
c14a0340ca mutation_compactor: reset stop flag on page start
When the mutation compactor has all the rows it needs for a page, it
saves the decision to stop in a member flag: _stop.
For single partition queries, the mutation compactor is kept alive
across pages and so it has a method, start_new_page() to reset its state
for the next page. This method didn't clear the _stop flag. This meant
that the value set at the end of the previous could cause the new page
and subsequently the entire query to be stopped prematurely.
This can happen if the new page starts with a row that is covered by a
higher level tombstone and is completely empty after compaction.
Reset the _stop flag in start_new_page() to prevent this.

This commit also adds a unit test which reproduces the bug.

Fixes: #12361

Closes #12384

(cherry picked from commit b0d95948e1)
2022-12-25 09:45:30 +02:00
Botond Dénes
aa523141f9 Merge 'Backport Alternator TTL tests' from Nadav Har'El
This series backports several patches which add or enable tests for  Alternator TTL. The series does not touch the code - just tests.
The goal of backporting more tests is to get the code - which is already in branch 5.1 - tested. It wasn't a good idea to backport code without backporting the tests for it.

Closes #12200
Fixes #11374

* github.com:scylladb/scylladb:
  test/alternator: increase timeout on TTL tests
  test/alternator: fix timeout in flaky test test_ttl_stats
  test/alternator: test Alternator TTL metrics
  test/alternator: skip fewer Alternator TTL tests
2022-12-22 09:51:46 +02:00
Michał Chojnowski
86240d6344 sstables: index_reader: always evict the local cache gently
Due to an oversight, the local index cache isn't evicted gently
when _upper_bound existed. This is a source of reactor stalls.
Fix that.

Fixes #12271

Closes #12364

(cherry picked from commit d9269abf5b)
2022-12-21 13:42:54 +02:00
Nadav Har'El
95a94a2687 Merge 'doc: fix the CQL version in the Interfaces table' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-doc-issues/issues/816
Fix https://github.com/scylladb/scylla-docs/issues/1613

This PR fixes the CQL version in the Interfaces page, so that it is the same as in other places across the docs and in sync with the version reported by the ScyllaDB (see https://github.com/scylladb/scylla-doc-issues/issues/816#issuecomment-1173878487).

To make sure the same CQL version is used across the docs, we should use the `|cql-version| `variable rather than hardcode the version number on several pages.
The variable is specified in the conf.py file:
```
rst_prolog = """
.. |cql-version| replace:: 3.3.1
"""
```

Closes #11320

* github.com:scylladb/scylladb:
  doc: add the Cassandra version on which the tools are based
  doc: fix the version number
  doc: update the Enterprise version where the ME format was introduced
  doc: add the ME format to the Cassandar Compatibility page
  doc: replace Scylla with ScyllaDB
  doc: rewrite the Interfaces table to the new format to include more information about CQL support
  doc: remove the CQL version from pages other than Cassandra compatibility
  doc: fix the CQL version in the Interfaces table

(cherry picked from commit ee606a5d52)
2022-12-21 09:51:14 +02:00
Benny Halevy
9173a3d808 view: row_lock: lock_ck: serialize partition and row locking
The problematic scenario this patch fixes might happen due to
unfortunate serialization of locks/unlocks between lock_pk and lock_ck,
as follows:

    1. lock_pk acquires an exclusive lock on the partition.
    2.a lock_ck attempts to acquire shared lock on the partition
        and any lock on the row. both cases currently use a fiber
        returning a future<rwlock::holder>.
    2.b since the partition is locked, the lock_partition times out
        returning an exceptional future.  lock_row has no such problem
        and succeeds, returning a future holding a rwlock::holder,
        pointing to the row lock.
    3.a the lock_holder previously returned by lock_pk is destroyed,
        calling `row_locker::unlock`
    3.b row_locker::unlock sees that the partition is not locked
        and erases it, including the row locks it contains.
    4.a when_all_succeeds continuation in lock_ck runs.  Since
        the lock_partition future failed, it destroyes both futures.
    4.b the lock_row future is destroyed with the rwlock::holder value.
    4.c ~holder attempts to return the semaphore units to the row rwlock,
        but the latter was already destroyed in 3.b above.

Acquiring the partition lock and row lock in parallel
doesn't help anything, but it complicates error handling
as seen above,

This patch serializes acquiring the row lock in lock_ck
after locking the partition to prevent the above race.

This way, erasing the unlocked partition is never expected
to happen while any of its rows locks is held.

Fixes #12168

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12208

(cherry picked from commit 5007ded2c1)
2022-12-13 14:51:44 +02:00
Botond Dénes
7942041b95 Merge 'doc: update the 5.1 upgrade guide with the mode-related information' from Anna Stuchlik
This PR adds the link to the KB article about updating the mode after the upgrade to the 5.1 upgrade guide.
In addition, I have:
- updated the KB article to include the versions affected by that change.
- fixed the broken link to the page about metric updates (it is not related to the KB article, but I fixed it in the same PR to limit the number of PRs that need to be backported).

Related: https://github.com/scylladb/scylladb/pull/11122

Closes #12148

* github.com:scylladb/scylladb:
  doc: update the releases in the KB about updating the mode after upgrade
  doc: fix the broken link in the 5.1 upgrade guide
  doc: add the link to the 5.1-related KB article to the 5.1 upgrade guide

(cherry picked from commit 897b501ba3)
2022-12-09 07:27:50 +02:00
Anna Mikhlin
1cfedc5b59 release: prepare for 5.1.1 2022-12-08 09:41:33 +02:00
Botond Dénes
606ed61263 Merge '[branch 5.1 backport] doc: fix the notes on the OS Support by Platform and Version page' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11783.

Closes #12229

* github.com:scylladb/scylladb:
  doc: replace Scylla with ScyllaDB
  doc: add a comment to remove in future versions any information that refers to previous releases
  doc: rewrite the notes to improve clarity
  doc: remove the reperitions from the notes
2022-12-07 14:10:14 +02:00
Anna Stuchlik
796e4c39f8 doc: replace Scylla with ScyllaDB
(cherry picked from commit 09b0e3f63e)
2022-12-07 13:01:34 +01:00
Anna Stuchlik
960434f784 doc: add a comment to remove in future versions any information that refers to previous releases
(cherry picked from commit 9e2b7e81d3)
2022-12-07 12:55:56 +01:00
Anna Stuchlik
05aed0417a doc: rewrite the notes to improve clarity
(cherry picked from commit fc0308fe30)
2022-12-07 12:55:17 +01:00
Anna Stuchlik
9a1fc200e1 doc: remove the reperitions from the notes
(cherry picked from commit 1bd0bc00b3)
2022-12-07 12:54:36 +01:00
Tomasz Grabiec
c7e9bbc377 Merge 'raft: server: handle aborts when waiting for config entry to commit' from Kamil Braun
Changing configuration involves two entries in the log: a 'joint
configuration entry' and a 'non-joint configuration entry'. We use
`wait_for_entry` to wait on the joint one. To wait on the non-joint one,
we use a separate promise field in `server`. This promise wasn't
connected to the `abort_source` passed into `set_configuration`.

The call could get stuck if the server got removed from the
configuration and lost leadership after committing the joint entry but
before committing the non-joint one, waiting on the promise. Aborting
wouldn't help. Fix this by subscribing to the `abort_source` in
resolving the promise exceptionally.

Furthermore, make sure that two `set_configuration` calls don't step on
each other's toes by one setting the other's promise. To do that, reset
the promise field at the end of `set_configuration` and check that it's
not engaged at the beginning.

Fixes #11288.

Closes #11325

* github.com:scylladb/scylladb:
  test: raft: randomized_nemesis_test: additional logging
  raft: server: handle aborts when waiting for config entry to commit

(cherry picked from commit 83850e247a)
2022-12-06 17:12:32 +01:00
Tomasz Grabiec
a78dac7ae9 Merge 'raft: server: drop waiters in applier_fiber instead of io_fiber' from Kamil Braun
When `io_fiber` fetched a batch with a configuration that does not
contain this node, it would send the entries committed in this batch to
`applier_fiber` and proceed by any remaining entry dropping waiters (if
the node was no longer a leader).

If there were waiters for entries committed in this batch, it could
either happen that `applier_fiber` received and processed those entries
first, notifying the waiters that the entries were committed and/or
applied, or it could happen that `io_fiber` reaches the dropping waiters
code first, causing the waiters to be resolved with
`commit_status_unknown`.

The second scenario is undesirable. For example, when a follower tries
to remove the current leader from the configuration using
`modify_config`, if the second scenario happens, the follower will get
`commit_status_unknown` - this can happen even though there are no node
or network failures. In particular, this caused
`randomized_nemesis_test.remove_leader_with_forwarding_finishes` to fail
from time to time.

Fix it by serializing the notifying and dropping of waiters in a single
fiber - `applier_fiber`. We decided to move all management of waiters
into `applier_fiber`, because most of that management was already there
(there was already one `drop_waiters` call, and two `notify_waiters`
calls). Now, when `io_fiber` observes that we've been removed from the
config and no longer a leader, instead of dropping waiters, it sends a
message to `applier_fiber`. `applier_fiber` will drop waiters when
receiving that message.

Improve an existing test to reproduce this scenario more frequently.

Fixes #11235.

Closes #11308

* github.com:scylladb/scylladb:
  test: raft: randomized_nemesis_test: more chaos in `remove_leader_with_forwarding_finishes`
  raft: server: drop waiters in `applier_fiber` instead of `io_fiber`
  raft: server: use `visit` instead of `holds_alternative`+`get`

(cherry picked from commit 9c4e32d2e2)
2022-12-06 17:12:03 +01:00
Nadav Har'El
0debb419f7 Merge 'alternator: fix wrong 'where' condition for GSI range key' from Marcin Maliszkiewicz
Contains fixes requested in the issue (and some tiny extras), together with analysis why they don't affect the users (see commit messages).

Fixes [ #11800](https://github.com/scylladb/scylladb/issues/11800)

Closes #11926

* github.com:scylladb/scylladb:
  alternator: add maybe_quote to secondary indexes 'where' condition
  test/alternator: correct xfail reason for test_gsi_backfill_empty_string
  test/alternator: correct indentation in test_lsi_describe
  alternator: fix wrong 'where' condition for GSI range key

(cherry picked from commit ce7c1a6c52)
2022-12-05 20:18:39 +02:00
Nadav Har'El
0cfb950569 cql: fix column-name aliases in SELECT JSON
The SELECT JSON statement, just like SELECT, allows the user to rename
selected columns using an "AS" specification. E.g., "SELECT JSON v AS foo".
This specification was not honored: We simply forgot to look at the
alias in SELECT JSON's implementation (we did it correctly in regular
SELECT). So this patch fixes this bug.

We had two tests in cassandra_tests/validation/entities/json_test.py
that reproduced this bug. The checks in those tests now pass, but these
two tests still continue to fail after this patch because of two other
unrelated bugs that were discovered by the same tests. So in this patch
I also add a new test just for this specific issue - to serve as a
regression test.

Fixes #8078

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12123

(cherry picked from commit c5121cf273)
2022-12-05 20:12:44 +02:00
Nadav Har'El
eebe77b5b8 materialized views: fix view writes after base table schema change
When we write to a materialized view, we need to know some information
defined in the base table such as the columns in its schema. We have
a "view_info" object that tracks each view and its base.

This view_info object has a couple of mutable attributes which are
used to lazily-calculate and cache the SELECT statement needed to
read from the base table. If the base-table schema ever changes -
and the code calls set_base_info() at that point - we need to forget
this cached statement. If we don't (as before this patch), the SELECT
will use the wrong schema and writes will no longer work.

This patch also includes a reproducing test that failed before this
patch, and passes afterwords. The test creates a base table with a
view that has a non-trivial SELECT (it has a filter on one of the
base-regular columns), makes a benign modification to the base table
(just a silly addition of a comment), and then tries to write to the
view - and before this patch it fails.

Fixes #10026
Fixes #11542

(cherry picked from commit 2f2f01b045)
2022-12-05 20:09:15 +02:00
Nadav Har'El
aa206a6b6a test/alternator: increase timeout on TTL tests
Some of the tests in test/alternator/test_ttl.py need an expiration scan
pass to complete and expire items. In development builds on developer
machines, this usually takes less than a second (our scanning period is
set to half a second). However, in debug builds on Jenkins each scan
often takes up to 100 (!) seconds (this is the record we've seen so far).
This is why we set the tests' timeout to 120.

But recently we saw another test run failing. I think the problem is
that in some case, we need not one, but *two* scanning passes to
complete before the timeout: It is possible that the test writes an
item right after the current scan passed it, so it doesn't get expired,
and then we a second scan at a random position, possibly making that
item we mention one of the last items to be considered - so in total
we need to wait for two scanning periods, not one, for the item to
expire.

So this patch increases the timeout from 120 seconds to 240 seconds -
more than twice the highest scanning time we ever saw (100 seconds).

Note that this timeout is just a timeout, it's not the typical test
run time: The test can finish much more quickly, as little as one
second, if items expire quickly on a fast build and machine.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12106

(cherry picked from commit 6bc3075bbd)
2022-12-05 14:21:22 +02:00
Nadav Har'El
9baf72b049 test/alternator: fix timeout in flaky test test_ttl_stats
The test `test_metrics.py::test_ttl_stats` tests the metrics associated
with Alternator TTL expiration events. It normally finishes in less than a
second (the TTL scanning is configured to run every 0.5 seconds), so we
arbitrarily set a 60 second timeout for this test to allow for extremely
slow test machines. But in some extreme cases even this was not enough -
in one case we measured the TTL scan to take 63 seconds.

So in this patch we increase the timeout in this test from 60 seconds
to 120 seconds. We already did the same change in other Alternator TTL
tests in the past - in commit 746c4bd.

Fixes #11695

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11696

(cherry picked from commit 3a30fbd56c)
2022-12-05 14:21:22 +02:00
Nadav Har'El
8e62405117 test/alternator: test Alternator TTL metrics
This patch adds a test for the metrics generated by the background
expiration thread run for Alternator's TTL feature.

We test three of the four metrics: scylla_expiration_scan_passes,
scylla_expiration_scan_table and scylla_expiration_items_deleted.
The fourth metric, scylla_expiration_secondary_ranges_scanned, counts the
number of times that this node took over another node's expiration duty.
so requires a multi-node cluster to test, and we can't test it in the
single-node cluster test framework.

To see TTL expiration in action this test may need to wait up to the
setting of alternator_ttl_period_in_seconds. For a setting of 1
second (the default set by test/alternator/run), this means this
test can take up to 1 second to run. If alternator_ttl_period_in_seconds
is set higher, the test is skipped unless --runveryslow is requested.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 297109f6ee)
2022-12-05 14:21:16 +02:00
Nadav Har'El
15421e45a0 test/alternator: skip fewer Alternator TTL tests
Most of the Alternator TTL tests are extremely slow on DynamoDB because
item expiration may be delayed up to 24 hours (!), and in practice for
10 to 30 minutes. Because of this, we marked most of these tests
with the "veryslow" mark, causing them to be skipped by default - unless
pytest is given the "--runveryslow" option.

The result was that the TTL tests were not run in the normal test runs,
which can allow regressions to be introduced (luckily, this hasn't happened).

However, this "veryslow" mark was excessive. Many of the tests are very
slow only on DynamoDB, but aren't very slow on Scylla. In particular,
many of the tests involve waiting for an item to expire, something that
happens after the configurable alternator_ttl_period_in_seconds, which
is just one second in our tests.

So in this patch, we remove the "veryslow" mark from 6 tests of Alternator TTL
tests, and instead use two new fixtures - waits_for_expiration and
veryslow_on_aws - to only skip the test when running on DynamoDB or
when alternator_ttl_period_in_seconds is high - but in our usual test
environment they will not get skipped.

Because 5 of these 6 tests wait for an item to expire, they take one
second each and this patch adds 5 seconds to the Alternator test
runtime. This is unfortunate (it's more than 25% of the total Alternator
test runtime!) but not a disaster, and we plan to reduce this 5 second
time futher in the following patch, but decreasing the TTL scanning
period even further.

This patch also increases the timeout of several of these tests, to 120
seconds from the previous 10 seconds. As mentioned above, normally,
these tests should always finish in alternator_ttl_period_in_seconds
(1 second) with a single scan taking less than 0.2 seconds, but in
extreme cases of debug builds on overloaded test machines, we saw even
60 seconds being passed, so let's increase the maximum. I also needed
to make the sleep time between retries smaller, not a function of the
new (unrealistic) timeout.

4 more tests remain "veryslow" (and won't run by default) because they
are take 5-10 seconds each (e.g., a test which waits to see that an item
does *not* get expired, and a test involving writing a lot of data).
We should reconsider this in the future - to perhaps run these tests in
our normal test runs - but even for now, the 6 extra tests that we
start running are a much better protection against regressions than what
we had until now.

Fixes #11374

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

x

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 746c4bd9eb)
2022-12-05 13:07:16 +02:00
Benny Halevy
cf6bcffc1b configure: add --perf-tests-debuginfo option
Provides separate control over debuginfo for perf tests
since enabling --tests-debuginfo affects both today
causing the Jenkins archives of perf tests binaries to
inflate considerably.

Refs https://github.com/scylladb/scylla-pkg/issues/3060

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 48021f3ceb)

Fixes #12191
2022-12-04 17:19:59 +02:00
Petr Gusev
a50b7f3d6a modification_statement: fix LWT insert crash if clustering key is null
PR #9314 fixed a similar issue with regular insert statements
but missed the LWT code path.

It's expected behaviour of
modification_statement::create_clustering_ranges to return an
empty range in this case, since possible_lhs_values it
uses explicitly returns empty_value_set if it evaluates rhs
to null, and it has a comment about it (All NULL
comparisons fail; no column values match.) On the other hand,
all components of the primary key are required to be set,
this is checked at the prepare phase, in
modification_statement::process_where_clause. So the only
problem was modification_statement::execute_with_condition
was not expecting an empty clustering_range in case of
a null clustering key.

Fixes: #11954
(cherry picked from commit 0d443dfd16)
2022-12-04 15:45:44 +02:00
Pavel Emelyanov
0e06025487 distributed_loader: Use coroutine::lambda in sleeping coroutine
According to seastar/doc/lambda-coroutine-fiasco.md lambda that
co_awaits once loses its capture frame. In distrobuted_loader
code there's at least one of that kind.

fixes: #12175

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12170

(cherry picked from commit 71179ff5ab)
2022-12-04 15:45:44 +02:00
Avi Kivity
820b79d56e Update seastar submodule (coroutine lambda fiasco)
* seastar 3aa91b4d2d...328edb2bce (1):
  > coroutine: explain and mitigate the lambda coroutine fiasco

Ref #12175
2022-12-04 15:45:31 +02:00
Botond Dénes
fde4a6e92d Merge 'doc: document the procedure for updating the mode after upgrade' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-docs/issues/4126

Closes #11122

* github.com:scylladb/scylladb:
  doc: add info about the time-consuming step due to resharding
  doc: add the new KB to the toctree
  doc: doc: add a KB about updating the mode in perftune.yaml after upgrade

(cherry picked from commit e9fec761a2)
2022-11-30 08:49:59 +02:00
Nadav Har'El
34b8d4306c Merge 'doc: add the links to the per-partition rate limit extension ' from Anna Stuchlik
Release 5.1. introduced a new CQL extension that applies to the CREATE TABLE and ALTER TABLE statements. The ScyllaDB-specific extensions are described on a separate page, so the CREATE TABLE and ALTER TABLE should include links to that page and section.

Note: CQL extensions are described with Markdown, while the Data Definition page is RST. Currently, there's no way to link from an RST page to an MD subsection (using a section heading or anchor), so a URL is used as a temporary solution.

Related: https://github.com/scylladb/scylladb/pull/9810

Closes #12070

* github.com:scylladb/scylladb:
  doc: move the info about per-partition rate limit for the ALTER TABLE statemet from the paragraph to the list
  doc: add the links to the per-partition rate limit extention to the CREATE TABLE and ALTER TABLE sections

(cherry picked from commit 6e9f739f19)
2022-11-28 08:52:28 +02:00
Yaron Kaikov
69e8fb997c release: prepare for 5.1.0 2022-11-27 14:36:37 +02:00
Botond Dénes
fbfd91e02f Merge '[branch 5.1 backport] doc: add ScyllaDB image upgrade guides for patch releases' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11460.

Closes #12079

* github.com:scylladb/scylladb:
  doc: update the commands to upgrade the ScyllaDB image
  doc: fix the filename in the index to resolve the warnings and fix the link
  doc: apply feedback by adding she step fo load the new repo and fixing the links
  doc: fix the version name in file upgrade-guide-from-2021.1-to-2022.1-image.rst
  doc: rename the upgrade-image file to upgrade-image-opensource and update all the links to that file
  doc: update the Enterprise guide to include the Enterprise-onlyimage file
  doc: update the image files
  doc: split the upgrade-image file to separate files for Open Source and Enterprise
  doc: clarify the alternative upgrade procedures for the ScyllaDB image
  doc: add the upgrade guide for ScyllaDB Image from 2022.x.y. to 2022.x.z
  doc: add the upgrade guide for ScyllaDB Image from 5.x.y. to 5.x.z
2022-11-25 13:27:21 +02:00
Anna Stuchlik
4c19b48495 doc: update the commands to upgrade the ScyllaDB image
(cherry picked from commit db75adaf9a)
2022-11-25 11:10:33 +01:00
Anna Stuchlik
9deca4250f doc: fix the filename in the index to resolve the warnings and fix the link
(cherry picked from commit e5c9f3c8a2)
2022-11-25 11:09:58 +01:00
Anna Stuchlik
97ab2a4eb3 doc: apply feedback by adding she step fo load the new repo and fixing the links
(cherry picked from commit 338b45303a)
2022-11-25 11:08:49 +01:00
Anna Stuchlik
16a941db3b doc: fix the version name in file upgrade-guide-from-2021.1-to-2022.1-image.rst
(cherry picked from commit 54d6d8b8cc)
2022-11-25 11:08:11 +01:00
Anna Stuchlik
8f60a464a7 doc: rename the upgrade-image file to upgrade-image-opensource and update all the links to that file
(cherry picked from commit 6ccc838740)
2022-11-25 11:07:21 +01:00
Anna Stuchlik
1e72f9cb5e doc: update the Enterprise guide to include the Enterprise-onlyimage file
(cherry picked from commit 22317f8085)
2022-11-25 11:06:40 +01:00
Anna Stuchlik
d91da87313 doc: update the image files
(cherry picked from commit 593f987bb2)
2022-11-25 11:05:51 +01:00
Anna Stuchlik
c73d59c1cb doc: split the upgrade-image file to separate files for Open Source and Enterprise
(cherry picked from commit 42224dd129)
2022-11-25 11:04:13 +01:00
Anna Stuchlik
9a6c0a89a0 doc: clarify the alternative upgrade procedures for the ScyllaDB image
(cherry picked from commit 64a527e1d3)
2022-11-25 11:01:27 +01:00
Anna Stuchlik
5b7dd00b14 doc: add the upgrade guide for ScyllaDB Image from 2022.x.y. to 2022.x.z
(cherry picked from commit 5136d7e6d7)
2022-11-25 11:00:32 +01:00
Anna Stuchlik
237df3b935 doc: add the upgrade guide for ScyllaDB Image from 5.x.y. to 5.x.z
(cherry picked from commit f1ef6a181e)
2022-11-25 10:59:43 +01:00
Botond Dénes
993d0371d9 Merge '[branch 5.1 backport] doc: remove the Operator documentation pages from the core ScyllaDB docs' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11154.

Closes #12061

* github.com:scylladb/scylladb:
  add the redirect to the Operator
  doc: remove the Operator docs from the core documentation
2022-11-25 07:49:10 +02:00
Anna Stuchlik
04167eba68 add the redirect to the Operator
(cherry picked from commit 792d1412d6)
2022-11-24 18:57:09 +01:00
Anna Stuchlik
ddf8eaba04 doc: remove the Operator docs from the core documentation
(cherry picked from commit 966c3423ad)
2022-11-24 18:54:10 +01:00
Botond Dénes
d5e5d27929 Merge '[branch 5.1 backport] doc: add the upgrade guide from 5.0 to 2022.1' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11108.

Closes #12063

* github.com:scylladb/scylladb:
  doc: apply feedback about scylla-enterprise-machine-image
  doc: update the note about installing scylla-enterprise-machine-image
  update the info about installing scylla-enterprise-machine-image during upgrade
  doc: add the requirement to install scylla-enterprise-machine-image if the previous version was installed with an image
  doc: update the info about metrics in 2022.1 compared to 5.0
  doc: minor formatting and language fixes
  doc: add the new guide to the toctree
  doc: add the upgrade guide from 5.0 to 2022.1
2022-11-23 14:53:13 +02:00
Anna Stuchlik
bb69ece13d doc: update the note about installing scylla-enterprise-machine-image
(cherry picked from commit 9fe7aa5c9a)
2022-11-23 14:53:13 +02:00
Anna Stuchlik
d4268863cd update the info about installing scylla-enterprise-machine-image during upgrade
(cherry picked from commit 4b0ec11136)
2022-11-23 14:53:13 +02:00
Anna Stuchlik
8b05c67226 doc: add the requirement to install scylla-enterprise-machine-image if the previous version was installed with an image
(cherry picked from commit da7f6cdec4)
2022-11-23 14:53:13 +02:00
Anna Stuchlik
3afc58de7d doc: add the upgrade guide from 5.0 to 2022.1
(cherry picked from commit c9b0c6fdbf)
2022-11-23 13:08:17 +01:00
Anna Stuchlik
496696140b docs: backport upgrade guide improvements from #11577 to 5.1
PR #11577 added the 5.0->5.1 upgrade guide. At the same time, it
improved some of the common `.rst` files that were using in other
upgrade guides; e.g. the `docs/upgrade/_common/upgrade-guide-v4-rpm.rst`
file is used in the 4.6->5.0 upgrade guide.

The 5.0->5.1 upgrade guide was then refactored. The refactored version
was already backported to the 5.1 branch (#12034). But we should still
backport the improvements done in #11577. This commit contains these
improvements.

(cherry picked from commit 2513497f9a)

Closes #12055
2022-11-23 13:51:54 +02:00
Anna Stuchlik
3a070380a7 doc: update the links to Manager and Operator
Closes #11196

(cherry picked from commit 532aa6e655)
2022-11-23 13:47:58 +02:00
Botond Dénes
7b551d8ce4 Merge '[branch 5.1 backport] doc: update the OS support for versions 2022.1 and 2022.2' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11461.

Closes #12044

* github.com:scylladb/scylladb:
  doc: remove support for Debian 9 from versions 2022.1 and 2022.2
  doc: remove support for Ubuntu 16.04 from versions 2022.1 and 2022.2
  backport 11461 doc: add support for Debian 11 to versions 2022.1 and 2022.2
2022-11-22 08:30:04 +02:00
Anna Stuchlik
2018b8fcfd doc: remove support for Debian 9 from versions 2022.1 and 2022.2
(cherry picked from commit 4c7aa5181e)
2022-11-22 08:30:04 +02:00
Anna Stuchlik
20b5aa938e doc: remove support for Ubuntu 16.04 from versions 2022.1 and 2022.2
(cherry picked from commit dfc7203139)
2022-11-22 08:30:04 +02:00
Anna Stuchlik
f519580252 doc: add support for Debian 11 to versions 2022.1 and 2022.2
(cherry picked from commit dd4979ffa8)
2022-11-22 08:30:00 +02:00
Takuya ASADA
46c0a1cc0a scylla_raid_setup: run uuidpath existance check only after mount failed
We added UUID device file existance check on #11399, we expect UUID
device file is created before checking, and we wait for the creation by
"udevadm settle" after "mkfs.xfs".

However, we actually getting error which says UUID device file missing,
it probably means "udevadm settle" doesn't guarantee the device file created,
on some condition.

To avoid the error, use var-lib-scylla.mount to wait for UUID device
file is ready, and run the file existance check when the service is
failed.

Fixes #11617

Closes #11666

(cherry picked from commit a938b009ca)
2022-11-21 21:26:48 +02:00
Takuya ASADA
a822282fde scylla_raid_setup: prevent mount failed for /var/lib/scylla
Just like 4a8ed4c, we also need to wait for udev event completion to
create /dev/disk/by-uuid/$UUID for newly formatted disk, to mount the
disk just after formatting.

Fixes #11359

(cherry picked from commit 8835a34ab6)
2022-11-21 21:26:48 +02:00
Takuya ASADA
bc5af9fdea scylla_raid_setup: check uuid and device path are valid
Added code to check make sure uuid and uuid based device path are valid.

(cherry picked from commit 40134efee4)

Ref #11617, #11359 (prerequisite).
2022-11-21 21:26:11 +02:00
Nadav Har'El
c8bb147f84 Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek
When filtering with multi column restriction present all other restrictions were ignored.
So a query like:
`SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;`
would ignore the restriction `regular_col = 0`.

This was caused by a bug in the filtering code:
2779a171fc/cql3/selection/selection.cc (L433-L449)

When multi column restrictions were detected, the code checked if they are satisfied and returned immediately.
This is fixed by returning only when these restrictions are not satisfied. When they are satisfied the other restrictions are checked as well to ensure all of them are satisfied.

This code was introduced back in 2019, when fixing #3574.
Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct.

Fixes: #6200
Fixes: #12014

Closes #12031

* github.com:scylladb/scylladb:
  cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works
  boost/restrictions-test: uncomment part of the test that passes now
  cql-pytest: enable test for filtering combined multi column and regular column restrictions
  cql3: don't ignore other restrictions when a multi column restriction is present during filtering

(cherry picked from commit 2d2034ea28)
2022-11-21 14:02:33 +02:00
Kamil Braun
dc92ec4c8b docs: a single 5.0 -> 5.1 upgrade guide
There were 4 different pages for upgrading Scylla 5.0 to 5.1 (and the
same is true for other version pairs, but I digress) for different
environments:
- "ScyllaDB Image for EC2, GCP, and Azure"
- Ubuntu
- Debian
- RHEL/CentOS

THe Ubuntu and Debian pages used a common template:
```
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst
```
with different variable substitutions.

The "Image" page used a similar template, with some extra content in the
middle:
```
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-image-opensource.rst
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst
```

The RHEL/CentOS page used a different template:
```
.. include:: /upgrade/_common/upgrade-guide-v4-rpm.rst
```

This was an unmaintainable mess. Most of the content was "the same" for
each of these options. The only content that must actually be different
is the part with package installation instructions (e.g. calls to `yum`
vs `apt-get`). The rest of the content was logically the same - the
differences were mistakes, typos, and updates/fixes to the text that
were made in some of these docs but not others.

In this commit I prepare a single page that covers the upgrade and
rollback procedures for each of these options. The section dependent on
the system was implemented using Sphinx Tabs.

I also fixed and changed some parts:

- In the "Gracefully stop the node" section:
Ubuntu/Debian/Images pages had:

```rst
.. code:: sh

   sudo service scylla-server stop
```

RHEL/CentOS pages had:
```rst
.. code:: sh

.. include:: /rst_include/scylla-commands-stop-index.rst
```

the stop-index file contained this:
```rst
.. tabs::

   .. group-tab:: Supported OS

      .. code-block:: shell

         sudo systemctl stop scylla-server

   .. group-tab:: Docker

      .. code-block:: shell

         docker exec -it some-scylla supervisorctl stop scylla

      (without stopping *some-scylla* container)
```

So the RHEL/CentOS version had two tabs: one for Scylla installed
directly on the system, one for Scylla running in Docker - which is
interesting, because nothing anywhere else in the upgrade documents
mentions Docker.  Furthermore, the RHEL/CentOS version used `systemctl`
while the ubuntu/debian/images version used `service` to stop/start
scylla-server.  Both work on modern systems.

The Docker option is completely out of place - the rest of the upgrade
procedure does not mention Docker. So I decided it doesn't make sense to
include it. Docker documentation could be added later if we actually
decide to write upgrade documentation when using Docker...  Between
`systemctl` and `service` I went with `service` as it's a bit
higher-level.

- Similar change for "Start the node" section, and corresponding
  stop/start sections in the Rollback procedure.

- To reuse text for Ubuntu and Debian, when referencing "ScyllaDB deb
  repo" in the Debian/Ubuntu tabs, I provide two separate links: to
  Debian and Ubuntu repos.

- the link to rollback procedure in the RPM guide (in 'Download and
  install the new release' section) pointed to rollback procedure from
  3.0 to 3.1 guide... Fixed to point to the current page's rollback
  procedure.

- in the rollback procedure steps summary, the RPM version missed the
  "Restore system tables" step.

- in the rollback procedure, the repository links were pointing to the
  new versions, while they should point to the old versions.

There are some other pre-existing problems I noticed that need fixing:

- EC2/GCP/Azure option has no corresponding coverage in the rollback
  section (Download and install the old release) as it has in the
  upgrade section. There is no guide for rolling back 3rd party and OS
  packages, only Scylla. I left a TODO in a comment.
- the repository links assume certain Debian and Ubuntu versions (Debian
  10 and Ubuntu 20), but there are more available options (e.g. Ubuntu
  22). Not sure how to deal with this problem. Maybe a separate section
  with links? Or just a generic link without choice of platform/version?

Closes #11891

(cherry picked from commit 0c7ff0d2cb)

Backport notes:
Funnily, the 5.1 branch did not have the upgrade guide to 5.1 at all. It
was only in `master`. So the backport does not remove files, only adds
new ones.
I also had to add:
- an additional link in the upgrade-opensource index to the 5.1 upgrade
  page (it was already in upstream `master` when the cherry-picked commit
  was added)
- the list of new metrics, which was also completely missing in
  branch-5.1.

Closes #12034
2022-11-21 13:58:41 +02:00
Tzach Livyatan
8f7e3275a2 Update Alternator Markdown file to use automatic link notation
Closes #11335

(cherry picked from commit 8fc58300ea)
2022-11-21 09:56:10 +02:00
Yaron Kaikov
40a1905a2d release: prepare for 5.1.0-rc5 2022-11-19 13:41:28 +02:00
Avi Kivity
4e2c436222 Merge 'doc: add the upgrade guide from 5.0 to 2022.1 on Ubuntu 20.04' from Anna Stuchlik
Ubuntu 22.04 is supported by both ScyllaDB Open Source 5.0 and Enterprise 2022.1.

Closes #11227

* github.com:scylladb/scylladb:
  doc: add the redirects from Ubuntu version specific to version generic pages
  doc: remove version-speific content for Ubuntu and add the generic page to the toctree
  doc: rename the file to include Ubuntu
  doc: remove the version number from the document and add the link to Supported Versions
  doc: add a generic page for Ubuntu
  doc: add the upgrade guide from 5.0 to 2022.1 on Ubuntu 2022.1

(cherry picked from commit d4c986e4fa)
2022-11-18 17:06:00 +02:00
Botond Dénes
68be369f93 Merge 'doc: add the upgrade guide for ScyllaDB image from 2021.1 to 2022.1' from Anna Stuchlik
This PR is related to  https://github.com/scylladb/scylla-docs/issues/4124 and https://github.com/scylladb/scylla-docs/issues/4123.

**New Enterprise Upgrade Guide from 2021.1 to 2022.2**
I've added the upgrade guide for ScyllaDB Enterprise image. In consists of 3 files:
/upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p1.rst
upgrade/_common/upgrade-image.rst
 /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p2.rst

**Modified Enterprise Upgrade Guides 2021.1 to 2022.2**
I've modified the existing guides for Ubuntu and Debian to use the same files as above, but exclude the image-related information:
/upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p1.rst + /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p2.rst = /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian.rst

To make things simpler and remove duplication, I've replaced the guides for Ubuntu 18 and 20 with a generic Ubuntu guide.

**Modified Enterprise Upgrade Guides from 4.6 to 5.0**
These guides included a bug: they included the image-related information (about updating OS packages), because a file that includes that information was included by mistake. What's worse, it was duplicated. After the includes were removed, image-related information is no longer included in the Ubuntu and Debian guides (this fixes https://github.com/scylladb/scylla-docs/issues/4123).

I've modified the index file to be in sync with the updates.

Closes #11285

* github.com:scylladb/scylladb:
  doc: reorganize the content to list the recommended way of upgrading the image first
  doc: update the image upgrade guide for ScyllaDB image to include the location of the manifest file
  doc: fix the upgrade guides for Ubuntu and Debian by removing image-related information
  doc: update the guides for Ubuntu and Debian to remove image information and the OS version number
  doc: add the upgrade guide for ScyllaDB image from 2021.1 to 2022.1

(cherry picked from commit dca351c2a6)
2022-11-18 17:05:14 +02:00
Botond Dénes
0f7adb5f47 Merge 'doc: change the tool names to "Scylla SStable" and "Scylla Types"' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/11393

- Rename the tool names across the docs.
- Update the examples to replace `scylla-sstable` and `scylla-types` with `scylla sstable` and `scylla types`, respectively.

Closes #11432

* github.com:scylladb/scylladb:
  doc: update the tool names in the toctree and reference pages
  doc: rename the scylla-types tool as Scylla Types
  doc: rename the scylla-sstable tool as Scylla SStable

(cherry picked from commit 2c46c24608)
2022-11-18 17:04:37 +02:00
Avi Kivity
82dc8357ef Merge 'Docs: document how scylla-sstable obtains its schema' from Botond Dénes
This is a very important aspect of the tool that was completely missing from the document before. Also add a comparison with SStableDump.
Fixes: https://github.com/scylladb/scylladb/issues/11363

Closes #11390

* github.com:scylladb/scylladb:
  docs: scylla-sstable.rst: add comparison with SStableDump
  docs: scylla-sstable.rst: add section about providing the schema

(cherry picked from commit 2ab5cbd841)
2022-11-18 17:01:17 +02:00
Anna Stuchlik
12a58957e2 doc: fix the upgrade version in the upgrade guide for RHEL and CentOS
Closes #11477

(cherry picked from commit 0dee507c48)
2022-11-18 16:59:44 +02:00
Botond Dénes
3423ad6e38 Merge 'doc: update the default SStable format' from Anna Stuchlik
The purpose of this PR is to update the information about the default SStable format.
It

Closes #11431

* github.com:scylladb/scylladb:
  doc: simplify the information about default formats in different versions
  doc: update the SSTables 3.0 Statistics File Format to add the UUID host_id option of the ME format
  doc: add the information regarding the ME format to the SSTables 3.0 Data File Format page
  doc: fix additional information regarding the ME format on the SStable 3.x page
  doc: add the ME format to the table
  add a comment to remove the information when the documentation is versioned (in 5.1)
  doc: replace Scylla with ScyllaDB
  doc: fix the formatting and language in the updated section
  doc: fix the default SStable format

(cherry picked from commit a0392bc1eb)
2022-11-18 16:58:14 +02:00
Anna Stuchlik
64001719fa doc: remove the section about updating OS packages during upgrade from upgrade guides for Ubunut and Debian (from 4.5 to 4.6)
Closes #11629

(cherry picked from commit c5285bcb14)
2022-11-18 16:56:29 +02:00
AdamStawarz
cc3d368bc8 Update tombstones-flush.rst
change syntax:

nodetool compact <keyspace>.<mytable>;
to
nodetool compact <keyspace> <mytable>;

Closes #11904

(cherry picked from commit 6bc455ebea)
2022-11-18 16:52:06 +02:00
Botond Dénes
d957b0044b Merge 'doc: improve the documentation landing page ' from Anna Stuchlik
This PR introduces the following changes to the documentation landing page:

- The " New to ScyllaDB? Start here!" box is added.
- The "Connect your application to Scylla" box is removed.
- Some wording has been improved.
- "Scylla" has been replaced with "ScyllaDB".

Closes #11896

* github.com:scylladb/scylladb:
  Update docs/index.rst
  doc: replace Scylla with ScyllaDB on the landing page
  doc: improve the wording on the landing page
  doc: add the link to the ScyllaDB Basics page to the documentation landing page

(cherry picked from commit 2b572d94f5)
2022-11-18 16:51:26 +02:00
Botond Dénes
d4ed67bd47 Merge 'doc: cql-extensions.md: improve description of synchronous views' from Nadav Har'El
It was pointed out to me that our description of the synchronous_updates
materialized-view option does not make it clear enough what is the
default setting, or why a user might want to use this option.

This patch changes the description to (I hope) better address these
issues.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11404

* github.com:scylladb/scylladb:
  doc: cql-extensions.md: replace "Scylla" by "ScyllaDB"
  doc: cql-extensions.md: improve description of synchronous views

(cherry picked from commit b9fc504fb2)
2022-11-18 16:44:38 +02:00
Nadav Har'El
0cd6341cae Merge 'doc: document user defined functions (UDFs)' from Anna Stuchlik
This PR is V2 of the[ PR created by @psarna.](https://github.com/scylladb/scylladb/pull/11560).
I have:
- copied the content.
- applied the suggestions left by @nyh.
- made minor improvements, such as replacing "Scylla" with "ScyllaDB", fixing punctuation, and fixing the RST syntax.

Fixes https://github.com/scylladb/scylladb/issues/11378

Closes #11984

* github.com:scylladb/scylladb:
  doc: label user-defined functions as Experimental
  doc: restore the note for the Count function (removed by mistatke)
  doc: document user defined functions (UDFs)

(cherry picked from commit 7cbb0b98bb)
2022-11-18 16:43:53 +02:00
Nadav Har'El
23d8852a82 Merge 'doc: update the "Counting all rows in a table is slow" page' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/11373

- Updated the information on the "Counting all rows in a table is slow" page.
- Added COUNT to the list of selectors of the SELECT statement (somehow it was missing).
- Added the note to the description of the COUNT() function with a link to the KB page for troubleshooting if necessary. This will allow the users to easily find the KB page.

Closes #11417

* github.com:scylladb/scylladb:
  doc: add a comment to remove the note in version 5.1
  doc: update the information on the Countng all rows page and add the recommendation to upgrade ScyllaDB
  doc: add a note to the description of COUNT with a reference to the KB article
  doc: add COUNT to the list of acceptable selectors of the SELECT statement

(cherry picked from commit 22bb35e2cb)
2022-11-18 16:28:43 +02:00
Aleksandra Martyniuk
88016de43e compaction: request abort only once in compaction_data::stop
compaction_manager::task (and thus compaction_data) can be stopped
because of many different reasons. Thus, abort can be requested more
than once on compaction_data abort source causing a crash.

To prevent this before each request_abort() we check whether an abort
was requested before.

Closes #12004

(cherry picked from commit 7ead1a7857)

Fixes #12002.
2022-11-17 19:15:43 +02:00
Asias He
bdecf4318a gossip: Improve get_live_token_owners and get_unreachable_token_owners
The get_live_token_owners returns the nodes that are part of the ring
and live.

The get_unreachable_token_owners returns the nodes that are part of the ring
and is not alive.

The token_metadata::get_all_endpoints returns nodes that are part of the
ring.

The patch changes both functions to use the more authoritative source to
get the nodes that are part of the ring and call is_alive to check if
the node is up or down. So that the correctness does not depend on
any derived information.

This patch fixes a truncate issue in storage_proxy::truncate_blocking
where it calls get_live_token_owners and get_unreachable_token_owners to
decide the nodes to talk with for truncate operation. The truncate
failed because incorrect nodes were returned.

Fixes #10296
Fixes #11928

Closes #11952

(cherry picked from commit 16bd9ec8b1)
2022-11-17 14:30:43 +02:00
Eliran Sinvani
72bf244ad1 cql: Fix crash upon use of the word empty for service level name
Wrong access to an uninitialized token instead of the actual
generated string caused the parser to crash, this wasn't
detected by the ANTLR3 compiler because all the temporary
variables defined in the ANTLR3 statements are global in the
generated code. This essentialy caused a null dereference.

Tests: 1. The fixed issue scenario from github.
       2. Unit tests in release mode.

Fixes #11774

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190612133151.20609-1-eliransin@scylladb.com>

Closes #11777

(cherry picked from commit ab7429b77d)
2022-11-10 20:42:59 +02:00
Botond Dénes
ee82323599 db/view/view_builder: don't drop partition and range tombstones when resuming
The view builder builds the views from a given base table in
view_builder::batch_size batches of rows. After processing this many
rows, it suspends so the view builder can switch to building views for
other base tables in the name of fairness. When resuming the build step
for a given base table, it reuses the reader used previously (also
serving the role of a snapshot, pinning sstables read from). The
compactor however is created anew. As the reader can be in the middle of
a partition, the view builder injects a partition start into the
compactor to prime it for continuing the partition. This however only
included the partition-key, crucially missing any active tombstones:
partition tombstone or -- since the v2 transition -- active range
tombstone. This can result in base rows covered by either of this to be
resurrected and the view builder to generate view updates for them.
This patch solves this by using the detach-state mechanism of the
compactor which was explicitly developed for situations like this (in
the range scan code) -- resuming a read with the readers kept but the
compactor recreated.
Also included are two test cases reproducing the problem, one with a
range tombstone, the other with a partition tombstone.

Fixes: #11668

Closes #11671

(cherry picked from commit 5621cdd7f9)
2022-11-07 11:45:37 +02:00
Alexander Turetskiy
2f78df92ab Alternator: Projection field added to return from DescribeTable which describes GSIs and LSIs.
The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693

(cherry picked from commit 636e14cc77)
2022-11-07 10:36:04 +02:00
Takuya ASADA
e2809674d2 locator::ec2_snitch: Retry HTTP request to EC2 instance metadata service
EC2 instance metadata service can be busy, ret's retry to connect with
interval, just like we do in scylla-machine-image.

Fixes #10250

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #11688

(cherry picked from commit 6b246dc119)
2022-11-06 15:43:06 +02:00
Yaron Kaikov
0295d0c5c8 release: prepare for 5.1.0-rc4 2022-11-06 14:49:29 +02:00
Botond Dénes
fa94222662 Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El
As described in issue #11801, we saw in Alternator when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted.

In this series fix this bug (it was a bug in our materialized views implementation) and add a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it).

Fixes #11801

Closes #11808

* github.com:scylladb/scylladb:
  test/alternator: add test for issue 11801
  MV: fix handling of view update which reassign the same key value
  materialized views: inline used-once and confusing function, replace_entry()

(cherry picked from commit e981bd4f21)
2022-11-01 13:14:21 +02:00
Pavel Emelyanov
dff7f3c5ba compaction_manager: Swallow ENOSPCs in ::stop()
When being stopped compaction manager may step on ENOSPC. This is not a
reason to fail stopping process with abort, better to warn this fact in
logs and proceed as if nothing happened

refs: #11245

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 15:36:44 +03:00
Pavel Emelyanov
3723713130 exceptions: Mark storage_io_error::code() with noexcept
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 15:36:38 +03:00
Pavel Emelyanov
03f8411e38 table: Handle storage_io_error's ENOSPC when flushing
Commit a9805106 (table: seal_active_memtable: handle ENOSPC error)
made memtable flushing code stand ENOSPC and continue flusing again
in the hope that the node administrator would provide some free space.

However, it looks like the IO code may report back ENOSPC with some
exception type this code doesn't expect. This patch tries to fix it

refs: #11245

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 15:35:26 +03:00
Pavel Emelyanov
0e391d67d1 table: Rewrap retry loop
The existing loop is very branchy in its attempts to find out whether or
not to abort. The "allowed_retries" count can be a good indicator of the
decision taken. This makes the code notably shorter and easier to extend

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 15:35:25 +03:00
Benny Halevy
f76989285e table: seal_active_memtable: handle ENOSPC error
Aborting too soon on ENOSPC is too harsh, leading to loss of
availability of the node for reads, while restarting it won't
solve the ENOSPC condition.

Fixes #11245

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11246
2022-10-13 15:35:19 +03:00
Beni Peled
9deeeb4db1 release: prepare for 5.1.0-rc3 2022-10-09 08:36:06 +03:00
Avi Kivity
1f3196735f Update tools/java submodule (cqlsh permissions)
* tools/java ad6764b506...b3959948dd (1):
  > install.sh is using wrong permissions for install cqlsh files

Fixes #11584.
2022-10-04 18:02:03 +03:00
Nadav Har'El
abb6817261 cql: validate bloom_filter_fp_chance up-front
Scylla's Bloom filter implementation has a minimal false-positive rate
that it can support (6.71e-5). When setting bloom_filter_fp_chance any
lower than that, the compute_bloom_spec() function, which writes the bloom
filter, throws an exception. However, this is too late - it only happens
while flushing the memtable to disk, and a failure at that point causes
Scylla to crash.

Instead, we should refuse the table creation with the unsupported
bloom_filter_fp_chance. This is also what Cassandra did six years ago -
see CASSANDRA-11920.

This patch also includes a regression test, which crashes Scylla before
this patch but passes after the patch (and also passes on Cassandra).

Fixes #11524.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11576

(cherry picked from commit 4c93a694b7)
2022-10-04 16:21:48 +03:00
Nadav Har'El
d3fd090429 alternator: return ProvisionedThroughput in DescribeTable
DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing
mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable
operation must return a ProvisionedThroughput structure, listing both
ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not
stated in some DynamoDB documentation but is explictly mentioned in
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html
Also in empirically, DynamoDB returns ProvisionedThroughput with zeros
even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this.

The ProvisionedThroughput structure being missing was a problem for
applications like DynamoDB connectors for Spark, if they implicitly
assume that ProvisionedThroughput is returned by DescribeTable, and
fail (as described in issue #11222) if it's outright missing.

So this patch adds the missing ProvisionedThroughput structure, and
the xfailing test starts to pass.

Note that this patch doesn't change the fact that attempting to set
a table to PROVISIONED billing mode is ignored: DescribeTable continues
to always return PAY_PER_REQUEST as the billing mode and zero as the
provisioned capacities.

Fixes #11222

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11298

(cherry picked from commit 941c719a23)
2022-10-03 14:26:55 +03:00
Pavel Emelyanov
3e7c57d162 cross-shard-barrier: Capture shared barrier in complete
When cross-shard barrier is abort()-ed it spawns a background fiber
that will wake-up other shards (if they are sleeping) with exception.

This fiber is implicitly waited by the owning sharded service .stop,
because barrier usage is like this:

    sharded<service> s;
    co_await s.invoke_on_all([] {
        ...
        barrier.abort();
    });
    ...
    co_await s.stop();

If abort happens, the invoke_on_all() will only resolve _after_ it
queues up the waking lambdas into smp queues, thus the subseqent stop
will queue its stopping lambdas after barrier's ones.

However, in debug mode the queue can be shuffled, so the owning service
can suddenly be freed from under the barrier's feet causing use after
free. Fortunately, this can be easily fixed by capturing the shared
pointer on the shared barrier instead of a regular pointer on the
shard-local barrier.

fixes: #11303

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11553
2022-10-03 13:20:28 +03:00
Tomasz Grabiec
f878a34da3 test: lib: random_mutation_generator: Don't generate mutations with marker uncompacted with shadowable tombstone
The generator was first setting the marker then applied tombstones.

The marker was set like this:

  row.marker() = random_row_marker();

Later, when shadowable tombstones were applied, they were compacted
with the marker as expected.

However, the key for the row was chosen randomly in each iteration and
there are multiple keys set, so there was a possibility of a key clash
with an earlier row. This could override the marker without applying
any tombstones, which is conditional on random choice.

This could generate rows with markers uncompacted with shadowable tombstones.

This broken row_cache_test::test_concurrent_reads_and_eviction on
comparison between expected and read mutations. The latter was
compacted because it went through an extra merge path, which compacts
the row.

Fix by making sure there are no key clashes.

Closes #11663

(cherry picked from commit 5268f0f837)
2022-10-02 16:44:57 +03:00
Raphael S. Carvalho
eaded57b2e compaction: Properly handle stop request for off-strategy
If user stops off-strategy via API, compaction manager can decide
to give up on it completely, so data will sit unreshaped in
maintenance set, preventing it from being compacted with data
in the main set. That's problematic because it will probably lead
to a significant increase in read and space amplification until
off-strategy is triggered again, which cannot happen anytime
soon.

Let's handle it by moving data in maintenance set into main one,
even if unreshaped. Then regular compaction will be able to
continue from where off-strategy left off.

Fixes #11543.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11545

(cherry picked from commit a04047f390)
2022-10-02 14:20:17 +03:00
Tomasz Grabiec
25d2da08d1 db: range_tombstone_list: Avoid quadratic behavior when applying
Range tombstones are kept in memory (cache/memtable) in
range_tombstone_list. It keeps them deoverlapped, so applying a range
tombstone which covers many range tombstones will erase existing range
tombstones from the list. This operation needs to be exception-safe,
so range_tombstone_list maintains an undo log. This undo log will
receive a record for each range tombstone which is removed. For
exception safety reasons, before pushing an undo log entry, we reserve
space in the log by calling std::vector::reserve(size() + 1). This is
O(N) where N is the number of undo log entries. Therefore, the whole
application is O(N^2).

This can cause reactor stalls and availability issues when replicas
apply such deletions.

This patch avoids the problem by reserving exponentially increasing
amount of space. Also, to avoid large allocations, switches the
container to chunked_vector.

Fixes #11211

Closes #11215

(cherry picked from commit 7f80602b01)
2022-09-30 00:01:26 +03:00
Botond Dénes
9b1a570f6f sstables: crawling mx-reader: make on_out_of_clustering_range() no-op
Said method currently emits a partition-end. This method is only called
when the last fragment in the stream is a range tombstone change with a
position after all clustered rows. The problem is that
consume_partition_end() is also called unconditionally, resulting in two
partition-end fragments being emitted. The fix is simple: make this
method a no-op, there is nothing to do there.

Also add two tests: one targeted to this bug and another one testing the
crawling reader with random mutations generated for random schema.

Fixes: #11421

Closes #11422

(cherry picked from commit be9d1c4df4)
2022-09-29 23:42:01 +03:00
Piotr Dulikowski
426d045249 exception: fix the error code used for rate_limit_exception
Per-partition rate limiting added a new error type which should be
returned when Scylla decides to reject an operation due to per-partition
rate limit being exceeded. The new error code requires drivers to
negotiate support for it, otherwise Scylla will report the error as
`Config_error`. The existing error code override logic works properly,
however due to a mistake Scylla will report the `Config_error` code even
if the driver correctly negotiated support for it.

This commit fixes the problem by specifying the correct error code in
`rate_limit_exception`'s constructor.

Tested manually with a modified version of the Rust driver which
negotiates support for the new error. Additionally, tested what happens
when the driver doesn't negotiate support (Scylla properly falls back to
`Config_error`).

Branches: 5.1
Fixes: #11517

Closes #11518

(cherry picked from commit e69b44a60f)
2022-09-29 23:39:25 +03:00
Botond Dénes
86dbbf12cc shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
Commit 8ab57aa added a yield to the buffer-copy loop, which means that
the copy can yield before done and the multishard reader might see the
half-copied buffer and consider the reader done (because
`_end_of_stream` is already set) resulting in the dropping the remaining
part of the buffer and in an invalid stream if the last copied fragment
wasn't a partition-end.

Fixes: #11561
(cherry picked from commit 0c450c9d4c)
2022-09-29 19:11:52 +03:00
Pavel Emelyanov
b05903eddd messaging_service: Fix gossiper verb group
When configuring tcp-nodelay unconditionally, messaging service thinks
gossiper uses group index 1, though it had changed some time ago and now
those verbs belong to group 0.

fixes: #11465

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 2c74062962)
2022-09-29 19:05:11 +03:00
Piotr Sarna
26ead53304 Merge 'Fix mutation commutativity with shadowable tombstone'
from Tomasz Grabiec

This series fixes lack of mutation associativity which manifests as
sporadic failures in
row_cache_test.cc::test_concurrent_reads_and_eviction due to differences
in mutations applied and read.

No known production impact.

Refs https://github.com/scylladb/scylladb/issues/11307

Closes #11312

* github.com:scylladb/scylladb:
  test: mutation_test: Add explicit test for mutation commutativity
  test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones
  db: mutation_partition: Drop unnecessary maybe_shadow()
  db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone
  mutation_partition: row: make row marker shadowing symmetric

(cherry picked from commit 484004e766)
2022-09-20 23:21:58 +02:00
Tomasz Grabiec
f60bab9471 test: row_cache: Use more narrow key range to stress overlapping reads more
This makes catching issues related to concurrent access of same or
adjacent entries more likely. For example, catches #11239.

Closes #11260

(cherry picked from commit 8ee5b69f80)
2022-09-20 23:21:54 +02:00
Yaron Kaikov
66f34245fc release: prepare for 5.1.0-rc2 2022-09-19 14:35:28 +03:00
Michał Chojnowski
4047528bd9 db: commitlog: don't print INFO logs on shutdown
The intention was for these logs to be printed during the
database shutdown sequence, but it was overlooked that it's not
the only place where commitlog::shutdown is called.
Commitlogs are started and shut down periodically by hinted handoff.
When that happens, these messages spam the log.

Fix that by adding INFO commitlog shutdown logs to database::stop,
and change the level of the commitlog::shutdown log call to DEBUG.

Fixes #11508

Closes #11536

(cherry picked from commit 9b6fc553b4)
2022-09-18 13:33:05 +03:00
Michał Chojnowski
1a82c61452 sstables: add a flag for disabling long-term index caching
Long-term index caching in the global cache, as introduced in 4.6, is a major
pessimization for workloads where accesses to the index are (spacially) sparse.
We want to have a way to disable it for the affected workloads.

There is already infrastructure in place for disabling it for BYPASS CACHE
queries. One way of solving the issue is hijacking that infrastructure.

This patch adds a global flag (and a corresponding CLI option) which controls
index caching. Setting the flag to `false` causes all index reads to behave
like they would in BYPASS CACHE queries.

Consequences of this choice:

- The per-SSTable partition_index_cache is unused. Every index_reader has
  its own, and they die together. Independent reads can no longer reuse the
  work of other reads which hit the same index pages. This is not crucial,
  since partition accesses have no (natural) spatial locality. Note that
  the original reason for partition_index_cache -- the ability to share
  reads for the lower and upper bound of the query -- is unaffected.
- The per-SSTable cached_file is unused. Every index_reader has its own
  (uncached) input stream from the index file, and every
  bsearch_clustered_cursor has its own cached_file, which dies together with
  the cursor. Note that the cursor still can perform its binary search with
  caching. However, it won't be able to reuse the file pages read by
  index_reader. In particular, if the promoted index is small, and fits inside
  the same file page as its index_entry, that page will be re-read.
  It can also happen that index_reader will read the same index file page
  multiple times. When the summary is so dense that multiple index pages fit in
  one index file page, advancing the upper bound, which reads the next index
  page, will read the same index file page. Since summary:disk ratio is 1:2000,
  this is expected to happen for partitions with size greater than 2000
  partition keys.

Fixes #11202

(cherry picked from commit cdb3e71045)
2022-09-18 13:27:46 +03:00
Avi Kivity
3d9800eb1c logalloc: don't crash while reporting reclaim stalls if --abort-on-seastar-bad-alloc is specified
The logger is proof against allocation failures, except if
--abort-on-seastar-bad-alloc is specified. If it is, it will crash.

The reclaim stall report is likely to be called in low memory conditions
(reclaim's job is to alleviate these conditions after all), so we're
likely to crash here if we're reclaiming a very low memory condition
and have a large stall simultaneously (AND we're running in a debug
environment).

Prevent all this by disabling --abort-on-seastar-bad-alloc temporarily.

Fixes #11549

Closes #11555

(cherry picked from commit d3b8c0c8a6)
2022-09-18 13:24:21 +03:00
Karol Baryła
c48e9b47dd transport/server.cc: Return correct size of decompressed lz4 buffer
An incorrect size is returned from the function, which could lead to
crashes or undefined behavior. Fix by erroring out in these cases.

Fixes #11476

(cherry picked from commit 1c2eef384d)
2022-09-07 10:58:30 +03:00
Avi Kivity
2eadaad9f7 Merge 'database: evict all inactive reads for table when detaching table' from Botond Dénes
Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" for any reason in the semaphore, will not end up in said readers accessing a dangling table reference when destroyed later.

Fixes: https://github.com/scylladb/scylladb/issues/11264

Closes #11273

* github.com:scylladb/scylladb:
  querier: querier_cache: remove now unused evict_all_for_table()
  database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()
  reader_concurrency_semaphore: add evict_inactive_reads_for_table()

(cherry picked from commit afa7960926)
2022-09-02 10:41:22 +03:00
Yaron Kaikov
d10aee15e7 release: prepare for 5.1.0-rc1 2022-09-02 06:15:05 +03:00
Avi Kivity
9e017cb1e6 Update seastar submodule (tls error handling)
* seastar f9f5228b74...3aa91b4d2d (1):
  > Merge 'tls: vec_push: handle async errors rather than throwing on_internal_error' from Benny Halevy

Fixes #11252
2022-09-01 13:10:13 +03:00
Avi Kivity
b8504cc9b2 .gitmodules: switch seastar to scylla-seastar.git
This allows us to backport seastar patches to branch-5.1 on
scylla-seastar.git.
2022-09-01 13:08:22 +03:00
Avi Kivity
856703a85e Merge 'row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy' from Tomasz Grabiec
Scenario:

cache = [
    row(pos=2, continuous=false),
    row(pos=after(2), dummy=true)
]

Scanning read starts, starts populating [-inf, before(2)] from sstables.

row(pos=2) is evicted.

cache = [
    row(pos=after(2), dummy=true)
]

Scanning read finishes reading from sstables.

Refreshes cache cursor via
partition_snapshot_row_cursor::maybe_refresh(), which calls
partition_snapshot_row_cursor::advance_to() because iterators are
invalidated. This advances the cursor to
after(2). no_clustering_row_between(2, after(2)) returns true, so
advance_to() returns true, and maybe_refresh() returns true. This is
interpreted by the cache reader as "the cursor has not moved forward",
so it marks the range as complete, without emitting the row with
pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads
will also miss the row.

The bug is in advance_to(), which is using
no_clustering_row_between(a, b) to determine its result, which by
definition excludes the starting key.

Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction
with reduced key range in the random_mutation_generator (1024 -> 16).

Fixes #11239

Closes #11240

* github.com:scylladb/scylladb:
  test: mvcc: Fix illegal use of maybe_refresh()
  tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range()
  tests: row_cache_test: Introduce one_shot mode to throttle
  row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy
2022-08-11 16:51:59 +02:00
Yaron Kaikov
86a6c1fb2b release: prepare for 5.1.0-rc0 2022-08-09 18:48:43 +03:00
641 changed files with 11603 additions and 24567 deletions

2
.gitignore vendored
View File

@@ -31,5 +31,3 @@ docs/poetry.lock
compile_commands.json
.ccls-cache/
.mypy_cache
.envrc
rust/Cargo.lock

2
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

View File

@@ -189,8 +189,6 @@ set(swagger_files
api/api-doc/storage_service.json
api/api-doc/stream_manager.json
api/api-doc/system.json
api/api-doc/task_manager.json
api/api-doc/task_manager_test.json
api/api-doc/utils.json)
set(swagger_gen_files)
@@ -303,8 +301,6 @@ set(scylla_sources
api/storage_service.cc
api/stream_manager.cc
api/system.cc
api/task_manager.cc
api/task_manager_test.cc
atomic_cell.cc
auth/allow_all_authenticator.cc
auth/allow_all_authorizer.cc
@@ -428,8 +424,6 @@ set(scylla_sources
cql3/statements/sl_prop_defs.cc
cql3/statements/truncate_statement.cc
cql3/statements/update_statement.cc
cql3/statements/strongly_consistent_modification_statement.cc
cql3/statements/strongly_consistent_select_statement.cc
cql3/statements/use_statement.cc
cql3/type_json.cc
cql3/untyped_result_set.cc
@@ -539,7 +533,6 @@ set(scylla_sources
raft/raft.cc
raft/server.cc
raft/tracker.cc
service/broadcast_tables/experimental/lang.cc
range_tombstone.cc
range_tombstone_list.cc
tombstone_gc_options.cc
@@ -619,7 +612,6 @@ set(scylla_sources
streaming/stream_task.cc
streaming/stream_transfer_task.cc
table_helper.cc
tasks/task_manager.cc
thrift/controller.cc
thrift/handler.cc
thrift/server.cc

View File

@@ -1,12 +1,11 @@
#!/bin/sh
USAGE=$(cat <<-END
Usage: $(basename "$0") [-h|--help] [-o|--output-dir PATH] [--date-stamp DATE] -- generate Scylla version and build information files.
Usage: $(basename "$0") [-h|--help] [-o|--output-dir PATH] -- generate Scylla version and build information files.
Options:
-h|--help show this help message.
-o|--output-dir PATH specify destination path at which the version files are to be created.
-d|--date-stamp DATE manually set date for release parameter
By default, the script will attempt to parse 'version' file
in the current directory, which should contain a string of
@@ -32,8 +31,6 @@ using '-o PATH' option.
END
)
DATE=""
while [[ $# -gt 0 ]]; do
opt="$1"
case $opt in
@@ -46,11 +43,6 @@ while [[ $# -gt 0 ]]; do
shift
shift
;;
--date-stamp)
DATE="$2"
shift
shift
;;
*)
echo "Unexpected argument found: $1"
echo
@@ -66,33 +58,24 @@ if [ -z "$OUTPUT_DIR" ]; then
OUTPUT_DIR="$SCRIPT_DIR/build"
fi
if [ -z "$DATE" ]; then
DATE=$(date --utc +%Y%m%d)
fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=5.2.0-dev
VERSION=5.1.7
if test -f version
then
SCYLLA_VERSION=$(cat version | awk -F'-' '{print $1}')
SCYLLA_RELEASE=$(cat version | awk -F'-' '{print $2}')
else
DATE=$(date --utc +%Y%m%d)
GIT_COMMIT=$(git -C "$SCRIPT_DIR" log --pretty=format:'%h' -n 1 --abbrev=12)
SCYLLA_VERSION=$VERSION
if [ -z "$SCYLLA_RELEASE" ]; then
DATE=$(date --utc +%Y%m%d)
GIT_COMMIT=$(git -C "$SCRIPT_DIR" log --pretty=format:'%h' -n 1 --abbrev=12)
# For custom package builds, replace "0" with "counter.your_name",
# where counter starts at 1 and increments for successive versions.
# This ensures that the package manager will select your custom
# package over the standard release.
SCYLLA_BUILD=0
SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
elif [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
echo "setting SCYLLA_RELEASE only makes sense in clean builds" 1>&2
exit 1
fi
# For custom package builds, replace "0" with "counter.your_name",
# where counter starts at 1 and increments for successive versions.
# This ensures that the package manager will select your custom
# package over the standard release.
SCYLLA_BUILD=0
SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
fi
if [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then

2
abseil

Submodule abseil updated: 7f3c0d7811...9e408e050f

View File

@@ -133,8 +133,7 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::strin
}
auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col});
auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id}, selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice,
proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice));
auto cl = auth::password_authenticator::consistency_for_user(username);
service::client_state client_state{service::client_state::internal_tag()};

View File

@@ -14,8 +14,6 @@
#include "db/config.hh"
#include "cdc/generation_service.hh"
#include "service/memory_limiter.hh"
#include "auth/service.hh"
#include "service/qos/service_level_controller.hh"
using namespace seastar;
@@ -30,8 +28,6 @@ controller::controller(
sharded<db::system_distributed_keyspace>& sys_dist_ks,
sharded<cdc::generation_service>& cdc_gen_svc,
sharded<service::memory_limiter>& memory_limiter,
sharded<auth::service>& auth_service,
sharded<qos::service_level_controller>& sl_controller,
const db::config& config)
: _gossiper(gossiper)
, _proxy(proxy)
@@ -39,8 +35,6 @@ controller::controller(
, _sys_dist_ks(sys_dist_ks)
, _cdc_gen_svc(cdc_gen_svc)
, _memory_limiter(memory_limiter)
, _auth_service(auth_service)
, _sl_controller(sl_controller)
, _config(config)
{
}
@@ -83,7 +77,7 @@ future<> controller::start_server() {
auto get_cdc_metadata = [] (cdc::generation_service& svc) { return std::ref(svc.get_cdc_metadata()); };
_executor.start(std::ref(_gossiper), std::ref(_proxy), std::ref(_mm), std::ref(_sys_dist_ks), sharded_parameter(get_cdc_metadata, std::ref(_cdc_gen_svc)), _ssg.value()).get();
_server.start(std::ref(_executor), std::ref(_proxy), std::ref(_gossiper), std::ref(_auth_service), std::ref(_sl_controller)).get();
_server.start(std::ref(_executor), std::ref(_proxy), std::ref(_gossiper)).get();
// Note: from this point on, if start_server() throws for any reason,
// it must first call stop_server() to stop the executor and server
// services we just started - or Scylla will cause an assertion

View File

@@ -34,14 +34,6 @@ class gossiper;
}
namespace auth {
class service;
}
namespace qos {
class service_level_controller;
}
namespace alternator {
// This is the official DynamoDB API version.
@@ -61,8 +53,6 @@ class controller : public protocol_server {
sharded<db::system_distributed_keyspace>& _sys_dist_ks;
sharded<cdc::generation_service>& _cdc_gen_svc;
sharded<service::memory_limiter>& _memory_limiter;
sharded<auth::service>& _auth_service;
sharded<qos::service_level_controller>& _sl_controller;
const db::config& _config;
std::vector<socket_address> _listen_addresses;
@@ -78,8 +68,6 @@ public:
sharded<db::system_distributed_keyspace>& sys_dist_ks,
sharded<cdc::generation_service>& cdc_gen_svc,
sharded<service::memory_limiter>& memory_limiter,
sharded<auth::service>& auth_service,
sharded<qos::service_level_controller>& sl_controller,
const db::config& config);
virtual sstring name() const override;

View File

@@ -34,6 +34,7 @@
#include "expressions.hh"
#include "conditions.hh"
#include "cql3/constants.hh"
#include "cql3/util.hh"
#include <optional>
#include "utils/overloaded_functor.hh"
#include <seastar/json/json_elements.hh>
@@ -927,9 +928,10 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
if (!range_key.empty() && range_key != view_hash_key && range_key != view_range_key) {
add_column(view_builder, range_key, attribute_definitions, column_kind::clustering_key);
}
sstring where_clause = "\"" + view_hash_key + "\" IS NOT NULL";
sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));
if (!view_range_key.empty()) {
where_clause = where_clause + " AND \"" + view_hash_key + "\" IS NOT NULL";
where_clause = format("{} AND {} IS NOT NULL", where_clause,
cql3::util::maybe_quote(view_range_key));
}
where_clauses.push_back(std::move(where_clause));
view_builders.emplace_back(std::move(view_builder));
@@ -984,9 +986,10 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
// Note above we don't need to add virtual columns, as all
// base columns were copied to view. TODO: reconsider the need
// for virtual columns when we support Projection.
sstring where_clause = "\"" + view_hash_key + "\" IS NOT NULL";
sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));
if (!view_range_key.empty()) {
where_clause = where_clause + " AND \"" + view_range_key + "\" IS NOT NULL";
where_clause = format("{} AND {} IS NOT NULL", where_clause,
cql3::util::maybe_quote(view_range_key));
}
where_clauses.push_back(std::move(where_clause));
view_builders.emplace_back(std::move(view_builder));
@@ -1092,6 +1095,7 @@ future<executor::request_return_type> executor::update_table(client_state& clien
elogger.trace("Updating table {}", request);
static const std::vector<sstring> unsupported = {
"AttributeDefinitions",
"GlobalSecondaryIndexUpdates",
"ProvisionedThroughput",
"ReplicaUpdates",
@@ -1264,22 +1268,6 @@ put_or_delete_item::put_or_delete_item(const rjson::value& key, schema_ptr schem
check_key(key, schema);
}
// find_attribute() checks whether the named attribute is stored in the
// schema as a real column (we do this for key attribute, and for a GSI key)
// and if so, returns that column. If not, the function returns nullptr,
// telling the caller that the attribute is stored serialized in the
// ATTRS_COLUMN_NAME map - not in a stand-alone column in the schema.
static inline const column_definition* find_attribute(const schema& schema, const bytes& attribute_name) {
const column_definition* cdef = schema.get_column_definition(attribute_name);
// Although ATTRS_COLUMN_NAME exists as an actual column, when used as an
// attribute name it should refer to an attribute inside ATTRS_COLUMN_NAME
// not to ATTRS_COLUMN_NAME itself. This if() is needed for #5009.
if (cdef && cdef->name() == executor::ATTRS_COLUMN_NAME) {
return nullptr;
}
return cdef;
}
put_or_delete_item::put_or_delete_item(const rjson::value& item, schema_ptr schema, put_item)
: _pk(pk_from_json(item, schema)), _ck(ck_from_json(item, schema)) {
_cells = std::vector<cell>();
@@ -1287,7 +1275,7 @@ put_or_delete_item::put_or_delete_item(const rjson::value& item, schema_ptr sche
for (auto it = item.MemberBegin(); it != item.MemberEnd(); ++it) {
bytes column_name = to_bytes(it->name.GetString());
validate_value(it->value, "PutItem");
const column_definition* cdef = find_attribute(*schema, column_name);
const column_definition* cdef = schema->get_column_definition(column_name);
if (!cdef) {
bytes value = serialize_item(it->value);
_cells->push_back({std::move(column_name), serialize_item(it->value)});
@@ -1319,7 +1307,7 @@ mutation put_or_delete_item::build(schema_ptr schema, api::timestamp_type ts) co
auto& row = m.partition().clustered_row(*schema, _ck);
attribute_collector attrs_collector;
for (auto& c : *_cells) {
const column_definition* cdef = find_attribute(*schema, c.column_name);
const column_definition* cdef = schema->get_column_definition(c.column_name);
if (!cdef) {
attrs_collector.put(c.column_name, c.value, ts);
} else {
@@ -1384,8 +1372,7 @@ static lw_shared_ptr<query::read_command> previous_item_read_command(service::st
auto regular_columns = boost::copy_range<query::column_id_vector>(
schema->regular_columns() | boost::adaptors::transformed([] (const column_definition& cdef) { return cdef.id; }));
auto partition_slice = query::partition_slice(std::move(bounds), {}, std::move(regular_columns), selection->get_query_options());
return ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice),
query::tombstone_limit(proxy.get_tombstone_limit()));
return ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice));
}
static dht::partition_range_vector to_partition_ranges(const schema& schema, const partition_key& pk) {
@@ -2774,7 +2761,7 @@ update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::t
}
}
}
const column_definition* cdef = find_attribute(*_schema, column_name);
const column_definition* cdef = _schema->get_column_definition(column_name);
if (cdef) {
bytes column_value = get_key_from_typed_value(json_value, *cdef);
row.cells().apply(*cdef, atomic_cell::make_live(*cdef->type, ts, column_value));
@@ -2796,7 +2783,7 @@ update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::t
rjson::add_with_string_name(_return_attributes, cn, rjson::copy(*col));
}
}
const column_definition* cdef = find_attribute(*_schema, column_name);
const column_definition* cdef = _schema->get_column_definition(column_name);
if (cdef) {
row.cells().apply(*cdef, atomic_cell::make_dead(ts, gc_clock::now()));
} else {
@@ -3089,8 +3076,7 @@ future<executor::request_return_type> executor::get_item(client_state& client_st
auto selection = cql3::selection::selection::wildcard(schema);
auto partition_slice = query::partition_slice(std::move(bounds), {}, std::move(regular_columns), selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),
query::tombstone_limit(_proxy.get_tombstone_limit()));
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice));
std::unordered_set<std::string> used_attribute_names;
auto attrs_to_get = calculate_attrs_to_get(request, used_attribute_names);
@@ -3244,8 +3230,7 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli
rs.schema->regular_columns() | boost::adaptors::transformed([] (const column_definition& cdef) { return cdef.id; }));
auto selection = cql3::selection::selection::wildcard(rs.schema);
auto partition_slice = query::partition_slice(std::move(bounds), {}, std::move(regular_columns), selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(rs.schema->id(), rs.schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),
query::tombstone_limit(_proxy.get_tombstone_limit()));
auto command = ::make_lw_shared<query::read_command>(rs.schema->id(), rs.schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice));
command->allow_limit = db::allow_per_partition_rate_limit::yes;
future<std::vector<rjson::value>> f = _proxy.query(rs.schema, std::move(command), std::move(partition_ranges), rs.cl,
service::storage_proxy::coordinator_query_options(executor::default_timeout(), permit, client_state, trace_state)).then(
@@ -3646,7 +3631,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr
if (schema->clustering_key_size() > 0) {
pos = pos_from_json(*exclusive_start_key, schema);
}
paging_state = make_lw_shared<service::pager::paging_state>(pk, pos, query::max_partitions, query_id::create_null_id(), service::pager::paging_state::replicas_per_token_range{}, std::nullopt, 0);
paging_state = make_lw_shared<service::pager::paging_state>(pk, pos, query::max_partitions, utils::UUID(), service::pager::paging_state::replicas_per_token_range{}, std::nullopt, 0);
}
auto regular_columns = boost::copy_range<query::column_id_vector>(
@@ -3657,8 +3642,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr
query::partition_slice::option_set opts = selection->get_query_options();
opts.add(custom_opts);
auto partition_slice = query::partition_slice(std::move(ck_bounds), std::move(static_columns), std::move(regular_columns), opts);
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice),
query::tombstone_limit(proxy.get_tombstone_limit()));
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice));
auto query_state_ptr = std::make_unique<service::query_state>(client_state, trace_state, std::move(permit));

View File

@@ -16,7 +16,6 @@
#include <seastar/util/short_streams.hh>
#include "seastarx.hh"
#include "error.hh"
#include "service/qos/service_level_controller.hh"
#include "utils/rjson.hh"
#include "auth.hh"
#include <cctype>
@@ -235,7 +234,7 @@ protected:
future<std::string> server::verify_signature(const request& req, const chunked_content& content) {
if (!_enforce_authorization) {
slogger.debug("Skipping authorization");
return make_ready_future<std::string>();
return make_ready_future<std::string>("<unauthenticated request>");
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
@@ -365,9 +364,7 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_
tracing::add_session_param(trace_state, "alternator_op", op);
tracing::add_query(trace_state, truncated_content_view(query, buf));
tracing::begin(trace_state, format("Alternator {}", op), client_state.get_client_address());
if (!username.empty()) {
tracing::set_username(trace_state, auth::authenticated_user(username));
}
tracing::set_username(trace_state, auth::authenticated_user(username));
}
return trace_state;
}
@@ -410,11 +407,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
auto leave = defer([this] () noexcept { _pending_requests.leave(); });
//FIXME: Client state can provide more context, e.g. client's endpoint address
// We use unique_ptr because client_state cannot be moved or copied
executor::client_state client_state = username.empty()
? service::client_state{service::client_state::internal_tag()}
: service::client_state{service::client_state::internal_tag(), _auth_service, _sl_controller, username};
co_await client_state.maybe_update_per_service_level_params();
executor::client_state client_state{executor::client_state::internal_tag()};
tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content);
tracing::trace(trace_state, op);
rjson::value json_request = co_await _json_parser.parse(std::move(content));
@@ -447,14 +440,12 @@ void server::set_routes(routes& r) {
//FIXME: A way to immediately invalidate the cache should be considered,
// e.g. when the system table which stores the keys is changed.
// For now, this propagation may take up to 1 minute.
server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& auth_service, qos::service_level_controller& sl_controller)
server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gossiper)
: _http_server("http-alternator")
, _https_server("https-alternator")
, _executor(exec)
, _proxy(proxy)
, _gossiper(gossiper)
, _auth_service(auth_service)
, _sl_controller(sl_controller)
, _key_cache(1024, 1min, slogger)
, _enforce_authorization(false)
, _enabled_servers{}

View File

@@ -15,7 +15,6 @@
#include <seastar/net/tls.hh>
#include <optional>
#include "alternator/auth.hh"
#include "service/qos/service_level_controller.hh"
#include "utils/small_vector.hh"
#include "utils/updateable_value.hh"
#include <seastar/core/units.hh>
@@ -35,8 +34,6 @@ class server {
executor& _executor;
service::storage_proxy& _proxy;
gms::gossiper& _gossiper;
auth::service& _auth_service;
qos::service_level_controller& _sl_controller;
key_cache _key_cache;
bool _enforce_authorization;
@@ -68,7 +65,7 @@ class server {
json_parser _json_parser;
public:
server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& service, qos::service_level_controller& sl_controller);
server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper);
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
bool enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);

View File

@@ -74,8 +74,8 @@ struct rapidjson::internal::TypeHelper<ValueType, utils::UUID>
: public from_string_helper<ValueType, utils::UUID>
{};
static db_clock::time_point as_timepoint(const table_id& tid) {
return db_clock::time_point{utils::UUID_gen::unix_timestamp(tid.uuid())};
static db_clock::time_point as_timepoint(const utils::UUID& uuid) {
return db_clock::time_point{utils::UUID_gen::unix_timestamp(uuid)};
}
/**
@@ -106,9 +106,6 @@ public:
stream_arn(const UUID& uuid)
: UUID(uuid)
{}
stream_arn(const table_id& tid)
: UUID(tid.uuid())
{}
stream_arn(std::string_view v)
: UUID(v.substr(1))
{
@@ -145,20 +142,25 @@ future<alternator::executor::request_return_type> alternator::executor::list_str
auto table = find_table(_proxy, request);
auto db = _proxy.data_dictionary();
auto cfs = db.get_tables();
auto i = cfs.begin();
auto e = cfs.end();
if (limit < 1) {
throw api_error::validation("Limit must be 1 or more");
}
// TODO: the unordered_map here is not really well suited for partial
// querying - we're sorting on local hash order, and creating a table
// between queries may or may not miss info. But that should be rare,
// and we can probably expect this to be a single call.
// # 12601 (maybe?) - sort the set of tables on ID. This should ensure we never
// generate duplicates in a paged listing here. Can obviously miss things if they
// are added between paged calls and end up with a "smaller" UUID/ARN, but that
// is to be expected.
std::sort(cfs.begin(), cfs.end(), [](const data_dictionary::table& t1, const data_dictionary::table& t2) {
return t1.schema()->id() < t2.schema()->id();
});
auto i = cfs.begin();
auto e = cfs.end();
if (streams_start) {
i = std::find_if(i, e, [&](data_dictionary::table t) {
return t.schema()->id().uuid() == streams_start
i = std::find_if(i, e, [&](const data_dictionary::table& t) {
return t.schema()->id() == streams_start
&& cdc::get_base_table(db.real_database(), *t.schema())
&& is_alternator_keyspace(t.schema()->ks_name())
;
@@ -433,7 +435,7 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
auto db = _proxy.data_dictionary();
try {
auto cf = db.find_column_family(table_id(stream_arn));
auto cf = db.find_column_family(stream_arn);
schema = cf.schema();
bs = cdc::get_base_table(db.real_database(), *schema);
} catch (...) {
@@ -720,7 +722,7 @@ future<executor::request_return_type> executor::get_shard_iterator(client_state&
std::optional<shard_id> sid;
try {
auto cf = db.find_column_family(table_id(stream_arn));
auto cf = db.find_column_family(stream_arn);
schema = cf.schema();
sid = rjson::get<shard_id>(request, "ShardId");
} catch (...) {
@@ -805,7 +807,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
auto db = _proxy.data_dictionary();
schema_ptr schema, base;
try {
auto log_table = db.find_column_family(table_id(iter.table));
auto log_table = db.find_column_family(iter.table);
schema = log_table.schema();
base = cdc::get_base_table(db.real_database(), *schema);
} catch (...) {
@@ -879,7 +881,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
++mul;
}
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),
query::tombstone_limit(_proxy.get_tombstone_limit()), query::row_limit(limit * mul));
query::row_limit(limit * mul));
return _proxy.query(schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), std::move(permit), client_state)).then(
[this, schema, partition_slice = std::move(partition_slice), selection = std::move(selection), start_time = std::move(start_time), limit, key_names = std::move(key_names), attr_names = std::move(attr_names), type, iter, high_ts] (service::storage_proxy::coordinator_query_result qr) mutable {

View File

@@ -136,7 +136,7 @@ future<executor::request_return_type> executor::describe_time_to_live(client_sta
// expiration_service is a sharded service responsible for cleaning up expired
// items in all tables with per-item expiration enabled. Currently, this means
// Alternator tables with TTL configured via a UpdateTimeToLive request.
// Alternator tables with TTL configured via a UpdateTimeToLeave request.
//
// Here is a brief overview of how the expiration service works:
//
@@ -150,25 +150,25 @@ future<executor::request_return_type> executor::describe_time_to_live(client_sta
// To avoid scanning the same items RF times in RF replicas, only one node is
// responsible for scanning a token range at a time. Normally, this is the
// node owning this range as a "primary range" (the first node in the ring
// with this range), but when this node is down, the secondary owner (the
// second in the ring) may take over.
// with this range), but when this node is down, other nodes may take over
// (FIXME: this is not implemented yet).
// An expiration thread is reponsible for all tables which need expiration
// scans. Currently, the different tables are scanned sequentially (not in
// parallel).
// scans. FIXME: explain how this is done with multiple tables - parallel,
// staggered, or what?
// The expiration thread scans item using CL=QUORUM to ensures that it reads
// a consistent expiration-time attribute. This means that the items are read
// locally and in addition QUORUM-1 additional nodes (one additional node
// when RF=3) need to read the data and send digests.
// FIXME: explain if we can read the exact attribute or the entire map.
// When the expiration thread decides that an item has expired and wants
// to delete it, it does it using a CL=QUORUM write. This allows this
// deletion to be visible for consistent (quorum) reads. The deletion,
// like user deletions, will also appear on the CDC log and therefore
// Alternator Streams if enabled - currently as ordinary deletes (the
// userIdentity flag is currently missing this is issue #11523).
expiration_service::expiration_service(data_dictionary::database db, service::storage_proxy& proxy, gms::gossiper& g)
// Alternator Streams if enabled (FIXME: explain how we mark the
// deletion different from user deletes. We don't do it yet.).
expiration_service::expiration_service(data_dictionary::database db, service::storage_proxy& proxy)
: _db(db)
, _proxy(proxy)
, _gossiper(g)
{
}
@@ -282,9 +282,7 @@ static future<> expire_item(service::storage_proxy& proxy,
auto ck = clustering_key::from_exploded(exploded_ck);
m.partition().clustered_row(*schema, ck).apply(tombstone(ts, gc_clock::now()));
}
std::vector<mutation> mutations;
mutations.push_back(std::move(m));
return proxy.mutate(std::move(mutations),
return proxy.mutate(std::vector<mutation>{std::move(m)},
db::consistency_level::LOCAL_QUORUM,
executor::default_timeout(), // FIXME - which timeout?
qs.get_trace_state(), qs.get_permit(),
@@ -367,7 +365,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary
// 2. The primary replica for this token is currently marked down.
// 3. In this node, this shard is responsible for this token.
// We use the <secondary> case to handle the possibility that some of the
// nodes in the system are down. A dead node will not be expiring
// nodes in the system are down. A dead node will not be expiring expiring
// the tokens owned by it, so we want the secondary owner to take over its
// primary ranges.
//
@@ -513,7 +511,7 @@ struct scan_ranges_context {
opts.set<query::partition_slice::option::bypass_cache>();
std::vector<query::clustering_range> ck_bounds{query::clustering_range::make_open_ended_both_sides()};
auto partition_slice = query::partition_slice(std::move(ck_bounds), {}, std::move(regular_columns), opts);
command = ::make_lw_shared<query::read_command>(s->id(), s->version(), partition_slice, proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));
command = ::make_lw_shared<query::read_command>(s->id(), s->version(), partition_slice, proxy.get_max_result_size(partition_slice));
executor::client_state client_state{executor::client_state::internal_tag()};
tracing::trace_state_ptr trace_state;
// NOTICE: empty_service_permit is used because the TTL service has fixed parallelism
@@ -639,7 +637,6 @@ static future<> scan_table_ranges(
static future<bool> scan_table(
service::storage_proxy& proxy,
data_dictionary::database db,
gms::gossiper& gossiper,
schema_ptr s,
abort_source& abort_source,
named_semaphore& page_sem,
@@ -692,7 +689,7 @@ static future<bool> scan_table(
expiration_stats.scan_table++;
// FIXME: need to pace the scan, not do it all at once.
scan_ranges_context scan_ctx{s, proxy, std::move(column_name), std::move(member)};
token_ranges_owned_by_this_shard<primary> my_ranges(db.real_database(), gossiper, s);
token_ranges_owned_by_this_shard<primary> my_ranges(db.real_database(), proxy.gossiper(), s);
while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {
// Note that because of issue #9167 we need to run a separate
// query on each partition range, and can't pass several of
@@ -712,7 +709,7 @@ static future<bool> scan_table(
// by tasking another node to take over scanning of the dead node's primary
// ranges. What we do here is that this node will also check expiration
// on its *secondary* ranges - but only those whose primary owner is down.
token_ranges_owned_by_this_shard<secondary> my_secondary_ranges(db.real_database(), gossiper, s);
token_ranges_owned_by_this_shard<secondary> my_secondary_ranges(db.real_database(), proxy.gossiper(), s);
while (std::optional<dht::partition_range> range = my_secondary_ranges.next_partition_range()) {
expiration_stats.secondary_ranges_scanned++;
dht::partition_range_vector partition_ranges;
@@ -744,7 +741,7 @@ future<> expiration_service::run() {
co_return;
}
try {
co_await scan_table(_proxy, _db, _gossiper, s, _abort_source, _page_sem, _expiration_stats);
co_await scan_table(_proxy, _db, s, _abort_source, _page_sem, _expiration_stats);
} catch (...) {
// The scan of a table may fail in the middle for many
// reasons, including network failure and even the table
@@ -770,15 +767,13 @@ future<> expiration_service::run() {
// in the next iteration by reducing the scanner's scheduling-group
// share (if using a separate scheduling group), or introduce
// finer-grain sleeps into the scanning code.
std::chrono::milliseconds scan_duration(std::chrono::duration_cast<std::chrono::milliseconds>(lowres_clock::now() - start));
std::chrono::milliseconds period(long(_db.get_config().alternator_ttl_period_in_seconds() * 1000));
std::chrono::seconds scan_duration(std::chrono::duration_cast<std::chrono::seconds>(lowres_clock::now() - start));
std::chrono::seconds period(_db.get_config().alternator_ttl_period_in_seconds());
if (scan_duration < period) {
try {
tlogger.info("sleeping {} seconds until next period", (period - scan_duration).count()/1000.0);
tlogger.info("sleeping {} seconds until next period", (period - scan_duration).count());
co_await seastar::sleep_abortable(period - scan_duration, _abort_source);
} catch(seastar::sleep_aborted&) {}
} else {
tlogger.warn("scan took {} seconds, longer than period - not sleeping", scan_duration.count()/1000.0);
}
}
}

View File

@@ -14,10 +14,6 @@
#include <seastar/core/semaphore.hh>
#include "data_dictionary/data_dictionary.hh"
namespace gms {
class gossiper;
}
namespace replica {
class database;
}
@@ -51,7 +47,6 @@ public:
private:
data_dictionary::database _db;
service::storage_proxy& _proxy;
gms::gossiper& _gossiper;
// _end is set by start(), and resolves when the the background service
// started by it ends. To ask the background service to end, _abort_source
// should be triggered. stop() below uses both _abort_source and _end.
@@ -65,7 +60,7 @@ public:
// sharded_service<expiration_service>::start() creates this object on
// all shards, so calls this constructor on each shard. Later, the
// additional start() function should be invoked on all shards.
expiration_service(data_dictionary::database, service::storage_proxy&, gms::gossiper&);
expiration_service(data_dictionary::database, service::storage_proxy&);
future<> start();
future<> run();
// sharded_service<expiration_service>::stop() calls the following stop()

View File

@@ -1228,7 +1228,7 @@
"operations":[
{
"method":"POST",
"summary":"Removes a node from the cluster. Replicated data that logically belonged to this node is redistributed among the remaining nodes.",
"summary":"Removes token (and all data associated with enpoint that had it) from the ring",
"type":"void",
"nickname":"remove_node",
"produces":[
@@ -1245,7 +1245,7 @@
},
{
"name":"ignore_nodes",
"description":"Comma-separated list of dead nodes to ignore in removenode operation. Use the same method for all nodes to ignore: either Host IDs or ip addresses.",
"description":"List of dead nodes to ingore in removenode operation",
"required":false,
"allowMultiple":false,
"type":"string",

View File

@@ -52,45 +52,6 @@
}
]
},
{
"path":"/system/log",
"operations":[
{
"method":"POST",
"summary":"Write a message to the Scylla log",
"type":"void",
"nickname":"write_log_message",
"produces":[
"application/json"
],
"parameters":[
{
"name":"message",
"description":"The message to write to the log",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"level",
"description":"The logging level to use",
"required":true,
"allowMultiple":false,
"type":"string",
"enum":[
"error",
"warn",
"info",
"debug",
"trace"
],
"paramType":"query"
}
]
}
]
},
{
"path":"/system/drop_sstable_caches",
"operations":[

View File

@@ -1,251 +0,0 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/task_manager",
"produces":[
"application/json"
],
"apis":[
{
"path":"/task_manager/list_modules",
"operations":[
{
"method":"GET",
"summary":"Get all modules names",
"type":"array",
"items":{
"type":"string"
},
"nickname":"get_modules",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/task_manager/list_module_tasks/{module}",
"operations":[
{
"method":"GET",
"summary":"Get a list of tasks",
"type":"array",
"items":{
"type":"task_stats"
},
"nickname":"get_tasks",
"produces":[
"application/json"
],
"parameters":[
{
"name":"module",
"description":"The module to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"internal",
"description":"Boolean flag indicating whether internal tasks should be shown (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"keyspace",
"description":"The keyspace to query about",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"The table to query about",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/task_manager/task_status/{task_id}",
"operations":[
{
"method":"GET",
"summary":"Get task status",
"type":"task_status",
"nickname":"get_task_status",
"produces":[
"application/json"
],
"parameters":[
{
"name":"task_id",
"description":"The uuid of a task to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/task_manager/abort_task/{task_id}",
"operations":[
{
"method":"POST",
"summary":"Abort running task and its descendants",
"type":"void",
"nickname":"abort_task",
"produces":[
"application/json"
],
"parameters":[
{
"name":"task_id",
"description":"The uuid of a task to abort",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/task_manager/wait_task/{task_id}",
"operations":[
{
"method":"GET",
"summary":"Wait for a task to complete",
"type":"task_status",
"nickname":"wait_task",
"produces":[
"application/json"
],
"parameters":[
{
"name":"task_id",
"description":"The uuid of a task to wait for",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
}
],
"models":{
"task_stats" :{
"id": "task_stats",
"description":"A task statistics object",
"properties":{
"task_id":{
"type":"string",
"description":"The uuid of a task"
},
"state":{
"type":"string",
"enum":[
"created",
"running",
"done",
"failed"
],
"description":"The state of a task"
}
}
},
"task_status":{
"id":"task_status",
"description":"A task status object",
"properties":{
"id":{
"type":"string",
"description":"The uuid of the task"
},
"type":{
"type":"string",
"description":"The description of the task"
},
"state":{
"type":"string",
"enum":[
"created",
"running",
"done",
"failed"
],
"description":"The state of the task"
},
"is_abortable":{
"type":"boolean",
"description":"Boolean flag indicating whether the task can be aborted"
},
"start_time":{
"type":"datetime",
"description":"The start time of the task"
},
"end_time":{
"type":"datetime",
"description":"The end time of the task (unspecified when the task is not completed)"
},
"error":{
"type":"string",
"description":"Error string, if the task failed"
},
"parent_id":{
"type":"string",
"description":"The uuid of the parent task"
},
"sequence_number":{
"type":"long",
"description":"The running sequence number of the task"
},
"shard":{
"type":"long",
"description":"The number of a shard the task is running on"
},
"keyspace":{
"type":"string",
"description":"The keyspace the task is working on (if applicable)"
},
"table":{
"type":"string",
"description":"The table the task is working on (if applicable)"
},
"entity":{
"type":"string",
"description":"Task-specific entity description"
},
"progress_units":{
"type":"string",
"description":"A description of the progress units"
},
"progress_total":{
"type":"double",
"description":"The total number of units to complete for the task"
},
"progress_completed":{
"type":"double",
"description":"The number of units completed so far"
}
}
}
}
}

View File

@@ -1,185 +0,0 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/task_manager_test",
"produces":[
"application/json"
],
"apis":[
{
"path":"/task_manager_test/test_module",
"operations":[
{
"method":"POST",
"summary":"Register test module in task manager",
"type":"void",
"nickname":"register_test_module",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"DELETE",
"summary":"Unregister test module in task manager",
"type":"void",
"nickname":"unregister_test_module",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/task_manager_test/test_task",
"operations":[
{
"method":"POST",
"summary":"Register test task",
"type":"string",
"nickname":"register_test_task",
"produces":[
"application/json"
],
"parameters":[
{
"name":"task_id",
"description":"The uuid of a task to register",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"shard",
"description":"The shard of the task",
"required":false,
"allowMultiple":false,
"type":"long",
"paramType":"query"
},
{
"name":"parent_id",
"description":"The uuid of a parent task",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"keyspace",
"description":"The keyspace the task is working on",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"The table the task is working on",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"type",
"description":"The type of the task",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"entity",
"description":"Task-specific entity description",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
},
{
"method":"DELETE",
"summary":"Unregister test task",
"type":"void",
"nickname":"unregister_test_task",
"produces":[
"application/json"
],
"parameters":[
{
"name":"task_id",
"description":"The uuid of a task to register",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/task_manager_test/finish_test_task/{task_id}",
"operations":[
{
"method":"POST",
"summary":"Finish test task",
"type":"void",
"nickname":"finish_test_task",
"produces":[
"application/json"
],
"parameters":[
{
"name":"task_id",
"description":"The uuid of a task to finish",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"error",
"description":"The error with which task fails (if it does)",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/task_manager_test/ttl",
"operations":[
{
"method":"POST",
"summary":"Set ttl in seconds and get last value",
"type":"long",
"nickname":"get_and_update_ttl",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ttl",
"description":"The number of seconds for which the tasks will be kept in memory after it finishes",
"required":true,
"allowMultiple":false,
"type":"long",
"paramType":"query"
}
]
}
]
}
]
}

View File

@@ -29,8 +29,6 @@
#include "stream_manager.hh"
#include "system.hh"
#include "api/config.hh"
#include "task_manager.hh"
#include "task_manager_test.hh"
logging::logger apilog("api");
@@ -148,14 +146,8 @@ future<> unset_server_snapshot(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_snapshot(ctx, r); });
}
future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch) {
return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", [&snitch] (http_context& ctx, routes& r) {
set_endpoint_snitch(ctx, r, snitch);
});
}
future<> unset_server_snitch(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_endpoint_snitch(ctx, r); });
future<> set_server_snitch(http_context& ctx) {
return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", set_endpoint_snitch);
}
future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g) {
@@ -253,30 +245,6 @@ future<> set_server_done(http_context& ctx) {
});
}
future<> set_server_task_manager(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "task_manager",
"The task manager API");
set_task_manager(ctx, r);
});
}
#ifndef SCYLLA_BUILD_MODE_RELEASE
future<> set_server_task_manager_test(http_context& ctx, lw_shared_ptr<db::config> cfg) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &cfg = *cfg](routes& r) mutable {
rb->register_function(r, "task_manager_test",
"The task manager test API");
set_task_manager_test(ctx, r, cfg);
});
}
#endif
void req_params::process(const request& req) {
// Process mandatory parameters
for (auto& [name, ent] : params) {

View File

@@ -137,14 +137,6 @@ future<json::json_return_type> sum_timer_stats(distributed<T>& d, utils::timed_
});
}
template<class T, class F>
future<json::json_return_type> sum_timer_stats(distributed<T>& d, utils::timed_rate_moving_average_summary_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>()).then([](const utils::rate_moving_average_and_histogram& val) {
return make_ready_future<json::json_return_type>(timer_to_json(val));
});
}
inline int64_t min_int64(int64_t a, int64_t b) {
return std::min(a,b);
}

View File

@@ -11,7 +11,6 @@
#include <seastar/core/future.hh>
#include "replica/database_fwd.hh"
#include "tasks/task_manager.hh"
#include "seastarx.hh"
namespace service {
@@ -32,7 +31,6 @@ namespace locator {
class token_metadata;
class shared_token_metadata;
class snitch_ptr;
} // namespace locator
@@ -68,12 +66,11 @@ struct http_context {
distributed<service::storage_proxy>& sp;
service::load_meter& lmeter;
const sharded<locator::shared_token_metadata>& shared_token_metadata;
sharded<tasks::task_manager>& tm;
http_context(distributed<replica::database>& _db,
distributed<service::storage_proxy>& _sp,
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm, sharded<tasks::task_manager>& _tm)
: db(_db), sp(_sp), lmeter(_lm), shared_token_metadata(_stm), tm(_tm) {
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm)
: db(_db), sp(_sp), lmeter(_lm), shared_token_metadata(_stm) {
}
const locator::token_metadata& get_token_metadata();
@@ -81,8 +78,7 @@ struct http_context {
future<> set_server_init(http_context& ctx);
future<> set_server_config(http_context& ctx, const db::config& cfg);
future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch);
future<> unset_server_snitch(http_context& ctx);
future<> set_server_snitch(http_context& ctx);
future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, sharded<gms::gossiper>& g, sharded<cdc::generation_service>& cdc_gs, sharded<db::system_keyspace>& sys_ks);
future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader);
future<> unset_server_sstables_loader(http_context& ctx);
@@ -111,7 +107,5 @@ future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_cache(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx);
future<> set_server_done(http_context& ctx);
future<> set_server_task_manager(http_context& ctx);
future<> set_server_task_manager_test(http_context& ctx, lw_shared_ptr<db::config> cfg);
}

View File

@@ -14,7 +14,7 @@
#include "sstables/metadata_collector.hh"
#include "utils/estimated_histogram.hh"
#include <algorithm>
#include "db/system_keyspace.hh"
#include "db/system_keyspace_view_types.hh"
#include "db/data_listeners.hh"
#include "storage_service.hh"
#include "unimplemented.hh"
@@ -43,7 +43,7 @@ std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
return std::make_tuple(name.substr(0, pos), name.substr(end));
}
const table_id& get_uuid(const sstring& ks, const sstring& cf, const replica::database& db) {
const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const replica::database& db) {
try {
return db.find_uuid(ks, cf);
} catch (replica::no_such_column_family& e) {
@@ -51,7 +51,7 @@ const table_id& get_uuid(const sstring& ks, const sstring& cf, const replica::da
}
}
const table_id& get_uuid(const sstring& name, const replica::database& db) {
const utils::UUID& get_uuid(const sstring& name, const replica::database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
return get_uuid(ks, cf, db);
}
@@ -110,7 +110,7 @@ static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram replica::column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const replica::database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
utils::ihistogram(),
@@ -122,7 +122,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const replica::database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
utils::ihistogram(),
@@ -149,7 +149,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const replica::database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
utils::rate_moving_average_and_histogram(),
@@ -394,7 +394,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return ctx.db.map_reduce0([](const replica::database& db){
return db.dirty_memory_region_group().real_memory_used();
return db.dirty_memory_region_group().memory_used();
}, int64_t(0), std::plus<int64_t>()).then([](int res) {
return make_ready_future<json::json_return_type>(res);
});
@@ -855,7 +855,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_auto_compaction.set(r, [&ctx] (const_req req) {
auto uuid = get_uuid(req.param["name"], ctx.db.local());
const utils::UUID& uuid = get_uuid(req.param["name"], ctx.db.local());
replica::column_family& cf = ctx.db.local().find_column_family(uuid);
return !cf.is_auto_compaction_disabled_by_user();
});

View File

@@ -18,7 +18,7 @@ namespace api {
void set_column_family(http_context& ctx, routes& r);
const table_id& get_uuid(const sstring& name, const replica::database& db);
const utils::UUID& get_uuid(const sstring& name, const replica::database& db);
future<> foreach_column_family(http_context& ctx, const sstring& name, std::function<void(replica::column_family&)> f);
@@ -63,7 +63,7 @@ struct map_reduce_column_families_locally {
std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;
future<std::unique_ptr<std::any>> operator()(replica::database& db) const {
auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));
return do_for_each(db.get_column_families(), [res, this](const std::pair<table_id, seastar::lw_shared_ptr<replica::table>>& i) {
return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<replica::table>>& i) {
*res = reducer(std::move(*res), mapper(*i.second.get()));
}).then([res] {
return std::move(*res);

View File

@@ -68,7 +68,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([&ctx](replica::database& db) {
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<table_id, seastar::lw_shared_ptr<replica::table>>& i) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<replica::table>>& i) {
replica::table& cf = *i.second.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());
return make_ready_future<>();

View File

@@ -8,15 +8,13 @@
#include "locator/token_metadata.hh"
#include "locator/snitch_base.hh"
#include "locator/production_snitch_base.hh"
#include "endpoint_snitch.hh"
#include "api/api-doc/endpoint_snitch_info.json.hh"
#include "api/api-doc/storage_service.json.hh"
#include "utils/fb_utilities.hh"
namespace api {
void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_ptr>& snitch) {
void set_endpoint_snitch(http_context& ctx, routes& r) {
static auto host_or_broadcast = [](const_req req) {
auto host = req.get_query_param("host");
return host.empty() ? gms::inet_address(utils::fb_utilities::get_broadcast_address()) : gms::inet_address(host);
@@ -24,45 +22,17 @@ void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_p
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [&ctx](const_req req) {
auto& topology = ctx.shared_token_metadata.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep, locator::topology::pending::yes)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return sstring(locator::production_snitch_base::default_dc);
}
return topology.get_datacenter(ep);
return topology.get_datacenter(host_or_broadcast(req));
});
httpd::endpoint_snitch_info_json::get_rack.set(r, [&ctx](const_req req) {
auto& topology = ctx.shared_token_metadata.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep, locator::topology::pending::yes)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return sstring(locator::production_snitch_base::default_rack);
}
return topology.get_rack(ep);
return topology.get_rack(host_or_broadcast(req));
});
httpd::endpoint_snitch_info_json::get_snitch_name.set(r, [&snitch] (const_req req) {
return snitch.local()->get_name();
httpd::endpoint_snitch_info_json::get_snitch_name.set(r, [] (const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_name();
});
httpd::storage_service_json::update_snitch.set(r, [&snitch](std::unique_ptr<request> req) {
locator::snitch_config cfg;
cfg.name = req->get_query_param("ep_snitch_class_name");
return locator::i_endpoint_snitch::reset_snitch(snitch, cfg).then([] {
return make_ready_future<json::json_return_type>(json::json_void());
});
});
}
void unset_endpoint_snitch(http_context& ctx, routes& r) {
httpd::endpoint_snitch_info_json::get_datacenter.unset(r);
httpd::endpoint_snitch_info_json::get_rack.unset(r);
httpd::endpoint_snitch_info_json::get_snitch_name.unset(r);
httpd::storage_service_json::update_snitch.unset(r);
}
}

View File

@@ -10,13 +10,8 @@
#include "api.hh"
namespace locator {
class snitch_ptr;
}
namespace api {
void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_ptr>&);
void unset_endpoint_snitch(http_context& ctx, routes& r);
void set_endpoint_snitch(http_context& ctx, routes& r);
}

View File

@@ -22,9 +22,6 @@ namespace sp = httpd::storage_proxy_json;
using proxy = service::storage_proxy;
using namespace json;
utils::time_estimated_histogram timed_rate_moving_average_summary_merge(utils::time_estimated_histogram a, const utils::timed_rate_moving_average_summary_and_histogram& b) {
return a.merge(b.histogram());
}
/**
* This function implement a two dimentional map reduce where
@@ -58,10 +55,10 @@ future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,
* @param initial_value - the initial value to use for both aggregations* @return
* @return A future that resolves to the result of the aggregation.
*/
template<typename V, typename Reducer, typename F, typename C>
template<typename V, typename Reducer, typename F>
future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,
C F::*f, Reducer reducer, V initial_value) {
return two_dimensional_map_reduce(d, [f] (F& stats) -> V {
V F::*f, Reducer reducer, V initial_value) {
return two_dimensional_map_reduce(d, [f] (F& stats) {
return stats.*f;
}, reducer, initial_value);
}
@@ -115,10 +112,10 @@ utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimat
return res;
}
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, utils::timed_rate_moving_average_summary_and_histogram service::storage_proxy_stats::stats::*f) {
return two_dimensional_map_reduce(ctx.sp, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).histogram();
}, utils::time_estimated_histogram_merge, utils::time_estimated_histogram()).then([](const utils::time_estimated_histogram& val) {
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, utils::time_estimated_histogram service::storage_proxy_stats::stats::*f) {
return two_dimensional_map_reduce(ctx.sp, f, utils::time_estimated_histogram_merge,
utils::time_estimated_histogram()).then([](const utils::time_estimated_histogram& val) {
return make_ready_future<json::json_return_type>(time_to_json_histogram(val));
});
}
@@ -133,7 +130,7 @@ static future<json::json_return_type> sum_estimated_histogram(http_context& ctx
});
}
static future<json::json_return_type> total_latency(http_context& ctx, utils::timed_rate_moving_average_summary_and_histogram service::storage_proxy_stats::stats::*f) {
static future<json::json_return_type> total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram service::storage_proxy_stats::stats::*f) {
return two_dimensional_map_reduce(ctx.sp, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).hist.mean * (stats.*f).hist.count;
}, std::plus<double>(), 0.0).then([](double val) {
@@ -153,7 +150,7 @@ static future<json::json_return_type> total_latency(http_context& ctx, utils::t
template<typename F>
future<json::json_return_type>
sum_histogram_stats_storage_proxy(distributed<proxy>& d,
utils::timed_rate_moving_average_summary_and_histogram F::*f) {
utils::timed_rate_moving_average_and_histogram F::*f) {
return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).hist;
}, std::plus<utils::ihistogram>(), utils::ihistogram()).
@@ -173,7 +170,7 @@ sum_histogram_stats_storage_proxy(distributed<proxy>& d,
template<typename F>
future<json::json_return_type>
sum_timer_stats_storage_proxy(distributed<proxy>& d,
utils::timed_rate_moving_average_summary_and_histogram F::*f) {
utils::timed_rate_moving_average_and_histogram F::*f) {
return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).rate();
@@ -494,14 +491,14 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
});
sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::read);
return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_read);
});
sp::get_read_latency.set(r, [&ctx](std::unique_ptr<request> req) {
return total_latency(ctx, &service::storage_proxy_stats::stats::read);
});
sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::write);
return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_write);
});
sp::get_write_latency.set(r, [&ctx](std::unique_ptr<request> req) {

View File

@@ -69,11 +69,6 @@ sstring validate_keyspace(http_context& ctx, const parameters& param) {
return validate_keyspace(ctx, param["keyspace"]);
}
locator::host_id validate_host_id(const sstring& param) {
auto hoep = locator::host_id_or_endpoint(param, locator::host_id_or_endpoint::param_type::host_id);
return hoep.id;
}
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
@@ -553,7 +548,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::describe_any_ring.set(r, [&ctx, &ss](std::unique_ptr<request> req) {
// Find an arbitrary non-system keyspace.
auto keyspaces = ctx.db.local().get_non_local_strategy_keyspaces();
auto keyspaces = ctx.db.local().get_non_system_keyspaces();
if (keyspaces.empty()) {
throw std::runtime_error("No keyspace provided and no non system kespace exist");
}
@@ -615,12 +610,13 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
apilog.debug("force_keyspace_compaction: keyspace={} tables={}", keyspace, column_families);
return ctx.db.invoke_on_all([keyspace, column_families] (replica::database& db) -> future<> {
auto table_ids = boost::copy_range<std::vector<table_id>>(column_families | boost::adaptors::transformed([&] (auto& cf_name) {
auto table_ids = boost::copy_range<std::vector<utils::UUID>>(column_families | boost::adaptors::transformed([&] (auto& cf_name) {
return db.find_uuid(keyspace, cf_name);
}));
// major compact smaller tables first, to increase chances of success if low on space.
std::ranges::sort(table_ids, std::less<>(), [&] (const table_id& id) {
std::ranges::sort(table_ids, std::less<>(), [&] (const utils::UUID& id) {
return db.find_column_family(id).get_stats().live_disk_space_used;
});
// as a table can be dropped during loop below, let's find it before issuing major compaction request.
@@ -639,6 +635,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
apilog.info("force_keyspace_cleanup: keyspace={} tables={}", keyspace, column_families);
return ss.local().is_cleanup_allowed(keyspace).then([&ctx, keyspace,
column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {
if (!is_cleanup_allowed) {
@@ -646,11 +643,11 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
std::runtime_error("Can not perform cleanup operation when topology changes"));
}
return ctx.db.invoke_on_all([keyspace, column_families] (replica::database& db) -> future<> {
auto table_ids = boost::copy_range<std::vector<table_id>>(column_families | boost::adaptors::transformed([&] (auto& table_name) {
auto table_ids = boost::copy_range<std::vector<utils::UUID>>(column_families | boost::adaptors::transformed([&] (auto& table_name) {
return db.find_uuid(keyspace, table_name);
}));
// cleanup smaller tables first, to increase chances of success if low on space.
std::ranges::sort(table_ids, std::less<>(), [&] (const table_id& id) {
std::ranges::sort(table_ids, std::less<>(), [&] (const utils::UUID& id) {
return db.find_column_family(id).get_stats().live_disk_space_used;
});
auto& cm = db.get_compaction_manager();
@@ -658,7 +655,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
// as a table can be dropped during loop below, let's find it before issuing the cleanup request.
for (auto& id : table_ids) {
replica::table& t = db.find_column_family(id);
co_await cm.perform_cleanup(owned_ranges_ptr, t.as_table_state());
co_await t.perform_cleanup_compaction(owned_ranges_ptr);
}
co_return;
}).then([]{
@@ -668,6 +665,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::perform_keyspace_offstrategy_compaction.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> tables) -> future<json::json_return_type> {
apilog.info("perform_keyspace_offstrategy_compaction: keyspace={} tables={}", keyspace, tables);
co_return co_await ctx.db.map_reduce0([&keyspace, &tables] (replica::database& db) -> future<bool> {
bool needed = false;
for (const auto& table : tables) {
@@ -681,6 +679,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::upgrade_sstables.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
apilog.info("upgrade_sstables: keyspace={} tables={} exclude_current_version={}", keyspace, column_families, exclude_current_version);
return ctx.db.invoke_on_all([=] (replica::database& db) {
auto owned_ranges_ptr = compaction::make_owned_ranges_ptr(db.get_keyspace_local_ranges(keyspace));
return do_for_each(column_families, [=, &db](sstring cfname) {
@@ -696,17 +695,19 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<request> req) -> future<json::json_return_type> {
auto keyspace = validate_keyspace(ctx, req->param);
auto column_families = parse_tables(keyspace, ctx, req->query_parameters, "cf");
auto& db = ctx.db;
apilog.info("perform_keyspace_flush: keyspace={} tables={}", keyspace, column_families);
auto &db = ctx.db.local();
if (column_families.empty()) {
co_await replica::database::flush_keyspace_on_all_shards(db, keyspace);
co_await db.flush_on_all(keyspace);
} else {
co_await replica::database::flush_tables_on_all_shards(db, keyspace, std::move(column_families));
co_await db.flush_on_all(keyspace, std::move(column_families));
}
co_return json_void();
});
ss::decommission.set(r, [&ss](std::unique_ptr<request> req) {
apilog.info("decommission");
return ss.local().decommission().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -720,23 +721,21 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::remove_node.set(r, [&ss](std::unique_ptr<request> req) {
auto host_id = validate_host_id(req->get_query_param("host_id"));
auto host_id = req->get_query_param("host_id");
std::vector<sstring> ignore_nodes_strs= split(req->get_query_param("ignore_nodes"), ",");
auto ignore_nodes = std::list<locator::host_id_or_endpoint>();
apilog.info("remove_node: host_id={} ignore_nodes={}", host_id, ignore_nodes_strs);
auto ignore_nodes = std::list<gms::inet_address>();
for (std::string n : ignore_nodes_strs) {
try {
std::replace(n.begin(), n.end(), '\"', ' ');
std::replace(n.begin(), n.end(), '\'', ' ');
boost::trim_all(n);
if (!n.empty()) {
auto hoep = locator::host_id_or_endpoint(n);
if (!ignore_nodes.empty() && hoep.has_host_id() != ignore_nodes.front().has_host_id()) {
throw std::runtime_error("All nodes should be identified using the same method: either Host IDs or ip addresses.");
}
ignore_nodes.push_back(std::move(hoep));
auto node = gms::inet_address(n);
ignore_nodes.push_back(node);
}
} catch (...) {
throw std::runtime_error(format("Failed to parse ignore_nodes parameter: ignore_nodes={}, node={}: {}", ignore_nodes_strs, n, std::current_exception()));
throw std::runtime_error(format("Failed to parse ignore_nodes parameter: ignore_nodes={}, node={}", ignore_nodes_strs, n));
}
}
return ss.local().removenode(host_id, std::move(ignore_nodes)).then([] {
@@ -797,6 +796,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::drain.set(r, [&ss](std::unique_ptr<request> req) {
apilog.info("drain");
return ss.local().drain().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -814,18 +814,30 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
if (type == "user") {
return ctx.db.local().get_user_keyspaces();
} else if (type == "non_local_strategy") {
return ctx.db.local().get_non_local_strategy_keyspaces();
return map_keys(ctx.db.local().get_keyspaces() | boost::adaptors::filtered([](const auto& p) {
return p.second.get_replication_strategy().get_type() != locator::replication_strategy_type::local;
}));
}
return map_keys(ctx.db.local().get_keyspaces());
});
ss::update_snitch.set(r, [](std::unique_ptr<request> req) {
locator::snitch_config cfg;
cfg.name = req->get_query_param("ep_snitch_class_name");
return locator::i_endpoint_snitch::reset_snitch(cfg).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::stop_gossiping.set(r, [&ss](std::unique_ptr<request> req) {
apilog.info("stop_gossiping");
return ss.local().stop_gossiping().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::start_gossiping.set(r, [&ss](std::unique_ptr<request> req) {
apilog.info("start_gossiping");
return ss.local().start_gossiping().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -928,6 +940,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::rebuild.set(r, [&ss](std::unique_ptr<request> req) {
auto source_dc = req->get_query_param("source_dc");
apilog.info("rebuild: source_dc={}", source_dc);
return ss.local().rebuild(std::move(source_dc)).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -964,6 +977,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
// FIXME: We should truncate schema tables if more than one node in the cluster.
auto& sp = service::get_storage_proxy();
auto& fs = sp.local().features();
apilog.info("reset_local_schema");
return db::schema_tables::recalculate_schema_version(sys_ks, sp, fs).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -971,6 +985,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {
auto probability = req->get_query_param("probability");
apilog.info("set_trace_probability: probability={}", probability);
return futurize_invoke([probability] {
double real_prob = std::stod(probability.c_str());
return tracing::tracing::tracing_instance().invoke_on_all([real_prob] (auto& local_tracing) {
@@ -1008,6 +1023,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
auto ttl = req->get_query_param("ttl");
auto threshold = req->get_query_param("threshold");
auto fast = req->get_query_param("fast");
apilog.info("set_slow_query: enable={} ttl={} threshold={} fast={}", enable, ttl, threshold, fast);
try {
return tracing::tracing::tracing_instance().invoke_on_all([enable, ttl, threshold, fast] (auto& local_tracing) {
if (threshold != "") {
@@ -1034,6 +1050,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_auto_compaction: keyspace={} tables={}", keyspace, tables);
return set_tables_autocompaction(ctx, keyspace, tables, true);
});
@@ -1041,6 +1058,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_auto_compaction: keyspace={} tables={}", keyspace, tables);
return set_tables_autocompaction(ctx, keyspace, tables, false);
});
@@ -1312,7 +1330,7 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
});
});
ss::take_snapshot.set(r, [&ctx, &snap_ctl](std::unique_ptr<request> req) -> future<json::json_return_type> {
ss::take_snapshot.set(r, [&snap_ctl](std::unique_ptr<request> req) -> future<json::json_return_type> {
apilog.info("take_snapshot: {}", req->query_parameters);
auto tag = req->get_query_param("tag");
auto column_families = split(req->get_query_param("cf"), ",");
@@ -1330,13 +1348,7 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
for (const auto& table_name : column_families) {
auto& t = ctx.db.local().find_column_family(keynames[0], table_name);
if (t.schema()->is_view()) {
throw std::invalid_argument("Do not take a snapshot of a materialized view or a secondary index by itself. Run snapshot on the base table instead.");
}
}
co_await snap_ctl.local().take_column_family_snapshot(keynames[0], column_families, tag, db::snapshot_ctl::snap_views::yes, sf);
co_await snap_ctl.local().take_column_family_snapshot(keynames[0], column_families, tag, sf);
}
co_return json_void();
} catch (...) {
@@ -1406,10 +1418,7 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
if (!req_param<bool>(*req, "disable_snapshot", false)) {
auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());
f = parallel_for_each(column_families, [&snap_ctl, keyspace, tag](sstring cf) {
// We always pass here db::snapshot_ctl::snap_views::no since:
// 1. When scrubbing particular tables, there's no need to auto-snapshot their views.
// 2. When scrubbing the whole keyspace, column_families will contain both base tables and views.
return snap_ctl.local().take_column_family_snapshot(keyspace, cf, tag, db::snapshot_ctl::snap_views::no, db::snapshot_ctl::skip_flush::no);
return snap_ctl.local().take_column_family_snapshot(keyspace, cf, tag, db::snapshot_ctl::skip_flush::no, db::snapshot_ctl::allow_view_snapshots::yes);
});
}

View File

@@ -61,16 +61,6 @@ void set_system(http_context& ctx, routes& r) {
return json::json_void();
});
hs::write_log_message.set(r, [](const_req req) {
try {
logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));
apilog.log(level, "/system/log: {}", std::string(req.get_query_param("message")));
} catch (boost::bad_lexical_cast& e) {
throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));
}
return json::json_void();
});
hs::drop_sstable_caches.set(r, [&ctx](std::unique_ptr<request> req) {
apilog.info("Dropping sstable caches");
return ctx.db.invoke_on_all([] (replica::database& db) {

View File

@@ -1,164 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <seastar/core/coroutine.hh>
#include "task_manager.hh"
#include "api/api-doc/task_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
#include "unimplemented.hh"
#include "storage_service.hh"
#include <utility>
#include <boost/range/adaptors.hpp>
namespace api {
namespace tm = httpd::task_manager_json;
using namespace json;
inline bool filter_tasks(tasks::task_manager::task_ptr task, std::unordered_map<sstring, sstring>& query_params) {
return (!query_params.contains("keyspace") || query_params["keyspace"] == task->get_status().keyspace) &&
(!query_params.contains("table") || query_params["table"] == task->get_status().table);
}
struct full_task_status {
tasks::task_manager::task::status task_status;
tasks::task_manager::task::progress progress;
std::string module;
tasks::task_id parent_id;
tasks::is_abortable abortable;
};
struct task_stats {
task_stats(tasks::task_manager::task_ptr task) : task_id(task->id().to_sstring()), state(task->get_status().state) {}
sstring task_id;
tasks::task_manager::task_state state;
};
tm::task_status make_status(full_task_status status) {
auto start_time = db_clock::to_time_t(status.task_status.start_time);
auto end_time = db_clock::to_time_t(status.task_status.end_time);
::tm st, et;
::gmtime_r(&end_time, &et);
::gmtime_r(&start_time, &st);
tm::task_status res{};
res.id = status.task_status.id.to_sstring();
res.type = status.task_status.type;
res.state = status.task_status.state;
res.is_abortable = bool(status.abortable);
res.start_time = st;
res.end_time = et;
res.error = status.task_status.error;
res.parent_id = status.parent_id.to_sstring();
res.sequence_number = status.task_status.sequence_number;
res.shard = status.task_status.shard;
res.keyspace = status.task_status.keyspace;
res.table = status.task_status.table;
res.entity = status.task_status.entity;
res.progress_units = status.task_status.progress_units;
res.progress_total = status.progress.total;
res.progress_completed = status.progress.completed;
return res;
}
future<json::json_return_type> retrieve_status(tasks::task_manager::foreign_task_ptr task) {
if (task.get() == nullptr) {
co_return coroutine::return_exception(httpd::bad_param_exception("Task not found"));
}
auto progress = co_await task->get_progress();
full_task_status s;
s.task_status = task->get_status();
s.parent_id = task->get_parent_id();
s.abortable = task->is_abortable();
s.module = task->get_module_name();
s.progress.completed = progress.completed;
s.progress.total = progress.total;
co_return make_status(s);
}
void set_task_manager(http_context& ctx, routes& r) {
tm::get_modules.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
std::vector<std::string> v = boost::copy_range<std::vector<std::string>>(ctx.tm.local().get_modules() | boost::adaptors::map_keys);
co_return v;
});
tm::get_tasks.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
using chunked_stats = utils::chunked_vector<task_stats>;
auto internal = tasks::is_internal{req_param<bool>(*req, "internal", false)};
std::vector<chunked_stats> res = co_await ctx.tm.map([&req, internal] (tasks::task_manager& tm) {
chunked_stats local_res;
auto module = tm.find_module(req->param["module"]);
const auto& filtered_tasks = module->get_tasks() | boost::adaptors::filtered([&params = req->query_parameters, internal] (const auto& task) {
return (internal || !task.second->is_internal()) && filter_tasks(task.second, params);
});
for (auto& [task_id, task] : filtered_tasks) {
local_res.push_back(task_stats{task});
}
return local_res;
});
std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {
auto s = std::move(os);
auto res = std::move(r);
co_await s.write("[");
std::string delim = "";
for (auto& v: res) {
for (auto& stats: v) {
co_await s.write(std::exchange(delim, ", "));
tm::task_stats ts;
ts = stats;
co_await formatter::write(s, ts);
}
}
co_await s.write("]");
co_await s.close();
};
co_return std::move(f);
});
tm::get_task_status.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
auto task = co_await tasks::task_manager::invoke_on_task(ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {
auto state = task->get_status().state;
if (state == tasks::task_manager::task_state::done || state == tasks::task_manager::task_state::failed) {
task->unregister_task();
}
co_return std::move(task);
}));
co_return co_await retrieve_status(std::move(task));
});
tm::abort_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {
if (!task->is_abortable()) {
co_await coroutine::return_exception(std::runtime_error("Requested task cannot be aborted"));
}
co_await task->abort();
});
co_return json_void();
});
tm::wait_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
auto task = co_await tasks::task_manager::invoke_on_task(ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) {
return task->done().then_wrapped([task] (auto f) {
task->unregister_task();
f.get();
return make_foreign(task);
});
}));
co_return co_await retrieve_status(std::move(task));
});
}
}

View File

@@ -1,17 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#pragma once
#include "api.hh"
namespace api {
void set_task_manager(http_context& ctx, routes& r);
}

View File

@@ -1,109 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#ifndef SCYLLA_BUILD_MODE_RELEASE
#include <seastar/core/coroutine.hh>
#include "task_manager_test.hh"
#include "api/api-doc/task_manager_test.json.hh"
#include "tasks/test_module.hh"
namespace api {
namespace tmt = httpd::task_manager_test_json;
using namespace json;
void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
tmt::register_test_module.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
co_await ctx.tm.invoke_on_all([] (tasks::task_manager& tm) {
auto m = make_shared<tasks::test_module>(tm);
tm.register_module("test", m);
});
co_return json_void();
});
tmt::unregister_test_module.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
co_await ctx.tm.invoke_on_all([] (tasks::task_manager& tm) -> future<> {
auto module_name = "test";
auto module = tm.find_module(module_name);
co_await module->stop();
});
co_return json_void();
});
tmt::register_test_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
sharded<tasks::task_manager>& tms = ctx.tm;
auto it = req->query_parameters.find("task_id");
auto id = it != req->query_parameters.end() ? tasks::task_id{utils::UUID{it->second}} : tasks::task_id::create_null_id();
it = req->query_parameters.find("shard");
unsigned shard = it != req->query_parameters.end() ? boost::lexical_cast<unsigned>(it->second) : 0;
it = req->query_parameters.find("keyspace");
std::string keyspace = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("table");
std::string table = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("type");
std::string type = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("entity");
std::string entity = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("parent_id");
tasks::task_info data;
if (it != req->query_parameters.end()) {
data.id = tasks::task_id{utils::UUID{it->second}};
auto parent_ptr = co_await tasks::task_manager::lookup_task_on_all_shards(ctx.tm, data.id);
data.shard = parent_ptr->get_status().shard;
}
auto module = tms.local().find_module("test");
id = co_await module->make_task<tasks::test_task_impl>(shard, id, keyspace, table, type, entity, data);
co_await tms.invoke_on(shard, [id] (tasks::task_manager& tm) {
auto it = tm.get_all_tasks().find(id);
if (it != tm.get_all_tasks().end()) {
it->second->start();
}
});
co_return id.to_sstring();
});
tmt::unregister_test_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->query_parameters["task_id"]}};
co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {
tasks::test_task test_task{task};
co_await test_task.unregister_task();
});
co_return json_void();
});
tmt::finish_test_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
auto it = req->query_parameters.find("error");
bool fail = it != req->query_parameters.end();
std::string error = fail ? it->second : "";
co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) {
tasks::test_task test_task{task};
if (fail) {
test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));
} else {
test_task.finish();
}
return make_ready_future<>();
});
co_return json_void();
});
tmt::get_and_update_ttl.set(r, [&ctx, &cfg] (std::unique_ptr<request> req) -> future<json::json_return_type> {
uint32_t ttl = cfg.task_ttl_seconds();
cfg.task_ttl_seconds.set(boost::lexical_cast<uint32_t>(req->query_parameters["ttl"]));
co_return json::json_return_type(ttl);
});
}
}
#endif

View File

@@ -1,22 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#ifndef SCYLLA_BUILD_MODE_RELEASE
#pragma once
#include "api.hh"
#include "db/config.hh"
namespace api {
void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg);
}
#endif

View File

@@ -1,59 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#pragma once
#ifndef SCYLLA_BUILD_MODE
#error SCYLLA_BUILD_MODE must be defined
#endif
#ifndef STRINGIFY
// We need to levels of indirection
// to make a string out of the macro name.
// The outer level expands the macro
// and the inner level makes a string out of the expanded macro.
#define STRINGIFY_VALUE(x) #x
#define STRINGIFY_MACRO(x) STRINGIFY_VALUE(x)
#endif
#define SCYLLA_BUILD_MODE_STR STRINGIFY_MACRO(SCYLLA_BUILD_MODE)
// We use plain macro definitions
// so the preprocessor can expand them
// inline in the #if directives below
#define SCYLLA_BUILD_MODE_CODE_debug 0
#define SCYLLA_BUILD_MODE_CODE_release 1
#define SCYLLA_BUILD_MODE_CODE_dev 2
#define SCYLLA_BUILD_MODE_CODE_sanitize 3
#define SCYLLA_BUILD_MODE_CODE_coverage 4
#define _SCYLLA_BUILD_MODE_CODE(sbm) SCYLLA_BUILD_MODE_CODE_ ## sbm
#define SCYLLA_BUILD_MODE_CODE(sbm) _SCYLLA_BUILD_MODE_CODE(sbm)
#if SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_debug
#define SCYLLA_BUILD_MODE_DEBUG
#elif SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_release
#define SCYLLA_BUILD_MODE_RELEASE
#elif SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_dev
#define SCYLLA_BUILD_MODE_DEV
#elif SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_sanitize
#define SCYLLA_BUILD_MODE_SANITIZE
#elif SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_coverage
#define SCYLLA_BUILD_MODE_COVERAGE
#else
#error unrecognized SCYLLA_BUILD_MODE
#endif
#if (defined(SCYLLA_BUILD_MODE_RELEASE) || defined(SCYLLA_BUILD_MODE_DEV)) && defined(SEASTAR_DEBUG)
#error SEASTAR_DEBUG is not expected to be defined when SCYLLA_BUILD_MODE is "release" or "dev"
#endif
#if (defined(SCYLLA_BUILD_MODE_DEBUG) || defined(SCYLLA_BUILD_MODE_SANITIZE)) && !defined(SEASTAR_DEBUG)
#error SEASTAR_DEBUG is expected to be defined when SCYLLA_BUILD_MODE is "debug" or "sanitize"
#endif

View File

@@ -737,9 +737,7 @@ void cache_flat_mutation_reader::maybe_drop_last_entry() noexcept {
&& _snp->at_oldest_version()) {
with_allocator(_snp->region().allocator(), [&] {
cache_tracker& tracker = _read_context.cache()._tracker;
tracker.get_lru().remove(*_last_row);
_last_row->on_evicted(tracker);
_last_row->on_evicted(_read_context.cache()._tracker);
});
_last_row = nullptr;

View File

@@ -15,7 +15,14 @@
#include "converting_mutation_partition_applier.hh"
#include "hashing_partition_visitor.hh"
#include "utils/UUID.hh"
#include "serializer.hh"
#include "idl/uuid.dist.hh"
#include "idl/keys.dist.hh"
#include "idl/mutation.dist.hh"
#include "serializer_impl.hh"
#include "serialization_visitors.hh"
#include "idl/uuid.dist.impl.hh"
#include "idl/keys.dist.impl.hh"
#include "idl/mutation.dist.impl.hh"
#include <iostream>
@@ -37,7 +44,7 @@ canonical_mutation::canonical_mutation(const mutation& m)
}).end_canonical_mutation();
}
table_id canonical_mutation::column_family_id() const {
utils::UUID canonical_mutation::column_family_id() const {
auto in = ser::as_input_stream(_data);
auto mv = ser::deserialize(in, boost::type<ser::canonical_mutation_view>());
return mv.table_id();
@@ -113,19 +120,17 @@ std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm) {
auto&& entry = _cm.static_column_at(id);
fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
}
virtual stop_iteration accept_row_tombstone(range_tombstone rt) override {
virtual void accept_row_tombstone(range_tombstone rt) override {
print_separator();
fmt::print(_os, "row tombstone {}", rt);
return stop_iteration::no;
}
virtual stop_iteration accept_row(position_in_partition_view pipv, row_tombstone rt, row_marker rm, is_dummy, is_continuous) override {
virtual void accept_row(position_in_partition_view pipv, row_tombstone rt, row_marker rm, is_dummy, is_continuous) override {
if (_in_row) {
fmt::print(_os, "}}, ");
}
fmt::print(_os, "{{row {} tombstone {} marker {}", pipv, rt, rm);
_in_row = true;
_first = false;
return stop_iteration::no;
}
virtual void accept_row_cell(column_id id, atomic_cell ac) override {
print_separator();

View File

@@ -14,6 +14,10 @@
#include "bytes_ostream.hh"
#include <iosfwd>
namespace utils {
class UUID;
} // namespace utils
// Immutable mutation form which can be read using any schema version of the same table.
// Safe to access from other shards via const&.
// Safe to pass serialized across nodes.
@@ -35,7 +39,7 @@ public:
// is not intended, user should sync the schema first.
mutation to_mutation(schema_ptr) const;
table_id column_family_id() const;
utils::UUID column_family_id() const;
const bytes_ostream& representation() const { return _data; }

View File

@@ -455,12 +455,7 @@ static future<> update_streams_description(
noncopyable_function<unsigned()> get_num_token_owners,
abort_source& abort_src) -> future<> {
while (true) {
try {
co_await sleep_abortable(std::chrono::seconds(60), abort_src);
} catch (seastar::sleep_aborted&) {
cdc_log.warn( "Aborted update CDC description table with generation {}", gen_id);
co_return;
}
co_await sleep_abortable(std::chrono::seconds(60), abort_src);
try {
co_await do_update_streams_description(gen_id, *sys_dist_ks, { get_num_token_owners() });
co_return;
@@ -583,7 +578,7 @@ future<> generation_service::maybe_rewrite_streams_descriptions() {
co_return;
}
if (co_await _sys_ks.local().cdc_is_rewritten()) {
if (co_await db::system_keyspace::cdc_is_rewritten()) {
co_return;
}
@@ -607,13 +602,13 @@ future<> generation_service::maybe_rewrite_streams_descriptions() {
continue;
}
times_and_ttls.push_back(time_and_ttl{as_timepoint(s.id().uuid()), cdc_opts.ttl()});
times_and_ttls.push_back(time_and_ttl{as_timepoint(s.id()), cdc_opts.ttl()});
}
if (times_and_ttls.empty()) {
// There's no point in rewriting old generations' streams (they don't contain any data).
cdc_log.info("No CDC log tables present, not rewriting stream tables.");
co_return co_await _sys_ks.local().cdc_set_rewritten(std::nullopt);
co_return co_await db::system_keyspace::cdc_set_rewritten(std::nullopt);
}
auto get_num_token_owners = [tm = _token_metadata.get()] { return tm->count_normal_token_owners(); };
@@ -631,7 +626,7 @@ future<> generation_service::maybe_rewrite_streams_descriptions() {
std::move(get_num_token_owners),
_abort_src);
co_await _sys_ks.local().cdc_set_rewritten(last_rewritten);
co_await db::system_keyspace::cdc_set_rewritten(last_rewritten);
}
static void assert_shard_zero(const sstring& where) {
@@ -875,7 +870,7 @@ future<> generation_service::check_and_repair_cdc_streams() {
{ gms::application_state::CDC_GENERATION_ID, gms::versioned_value::cdc_generation_id(new_gen_id) },
{ gms::application_state::STATUS, *status }
});
co_await _sys_ks.local().update_cdc_generation_id(new_gen_id);
co_await db::system_keyspace::update_cdc_generation_id(new_gen_id);
}
future<> generation_service::handle_cdc_generation(std::optional<cdc::generation_id> gen_id) {
@@ -1018,7 +1013,7 @@ future<bool> generation_service::do_handle_cdc_generation(cdc::generation_id gen
// The assumption follows from the requirement of bootstrapping nodes sequentially.
if (!_gen_id || get_ts(*_gen_id) < get_ts(gen_id)) {
_gen_id = gen_id;
co_await _sys_ks.local().update_cdc_generation_id(gen_id);
co_await db::system_keyspace::update_cdc_generation_id(gen_id);
co_await _gossiper.add_local_application_state(
gms::application_state::CDC_GENERATION_ID, gms::versioned_value::cdc_generation_id(gen_id));
}

View File

@@ -59,7 +59,7 @@ using namespace std::chrono_literals;
logging::logger cdc_log("cdc");
namespace cdc {
static schema_ptr create_log_schema(const schema&, std::optional<table_id> = {}, schema_ptr = nullptr);
static schema_ptr create_log_schema(const schema&, std::optional<utils::UUID> = {}, schema_ptr = nullptr);
}
static constexpr auto cdc_group_name = "cdc";
@@ -485,7 +485,7 @@ bytes log_data_column_deleted_elements_name_bytes(const bytes& column_name) {
return to_bytes(cdc_deleted_elements_column_prefix) + column_name;
}
static schema_ptr create_log_schema(const schema& s, std::optional<table_id> uuid, schema_ptr old) {
static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID> uuid, schema_ptr old) {
schema_builder b(s.ks_name(), log_name(s.cf_name()));
b.with_partitioner(cdc::cdc_partitioner::classname);
b.set_compaction_strategy(sstables::compaction_strategy_type::time_window);
@@ -1655,8 +1655,7 @@ public:
auto partition_slice = query::partition_slice(std::move(bounds), std::move(static_columns), std::move(regular_columns), std::move(opts));
const auto max_result_size = _ctx._proxy.get_max_result_size(partition_slice);
const auto tombstone_limit = query::tombstone_limit(_ctx._proxy.get_tombstone_limit());
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(), partition_slice, query::max_result_size(max_result_size), tombstone_limit, query::row_limit(row_limit));
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(), partition_slice, query::max_result_size(max_result_size), query::row_limit(row_limit));
const auto select_cl = adjust_cl(write_cl);

View File

@@ -13,12 +13,6 @@
class schema;
class partition_key;
class clustering_row;
struct atomic_cell_view;
struct tombstone;
namespace db::view {
struct view_key_and_action;
}
class column_computation;
using column_computation_ptr = std::unique_ptr<column_computation>;
@@ -28,7 +22,7 @@ using column_computation_ptr = std::unique_ptr<column_computation>;
* Computed columns description is also available at docs/dev/system_schema_keyspace.md. They hold values
* not provided directly by the user, but rather computed: from other column values and possibly other sources.
* This class is able to serialize/deserialize column computations and perform the computation itself,
* based on given schema, and partition key. Responsibility for providing enough data
* based on given schema, partition key and clustering row. Responsibility for providing enough data
* in the clustering row in order for computation to succeed belongs to the caller. In particular,
* generating a value might involve performing a read-before-write if the computation is performed
* on more values than are present in the update request.
@@ -42,19 +36,7 @@ public:
virtual column_computation_ptr clone() const = 0;
virtual bytes serialize() const = 0;
virtual bytes compute_value(const schema& schema, const partition_key& key) const = 0;
/*
* depends_on_non_primary_key_column for a column computation is needed to
* detect a case where the primary key of a materialized view depends on a
* non primary key column from the base table, but at the same time, the view
* itself doesn't have non-primary key columns. This is an issue, since as
* for now, it was assumed that no non-primary key columns in view schema
* meant that the update cannot change the primary key of the view, and
* therefore the update path can be simplified.
*/
virtual bool depends_on_non_primary_key_column() const {
return false;
}
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const = 0;
};
/*
@@ -72,7 +54,7 @@ public:
return std::make_unique<legacy_token_column_computation>(*this);
}
virtual bytes serialize() const override;
virtual bytes compute_value(const schema& schema, const partition_key& key) const override;
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;
};
@@ -93,53 +75,5 @@ public:
return std::make_unique<token_column_computation>(*this);
}
virtual bytes serialize() const override;
virtual bytes compute_value(const schema& schema, const partition_key& key) const override;
};
/*
* collection_column_computation is used for a secondary index on a collection
* column. In this case we don't have a single value to compute, but rather we
* want to return multiple values (e.g., all the keys in the collection).
* So this class does not implement the base class's compute_value() -
* instead it implements a new method compute_collection_values(), which
* can return multiple values. This new method is currently called only from
* the materialized-view code which uses collection_column_computation.
*/
class collection_column_computation final : public column_computation {
enum class kind {
keys,
values,
entries,
};
const bytes _collection_name;
const kind _kind;
collection_column_computation(const bytes& collection_name, kind kind) : _collection_name(collection_name), _kind(kind) {}
using collection_kv = std::pair<bytes_view, atomic_cell_view>;
void operate_on_collection_entries(
std::invocable<collection_kv*, collection_kv*, tombstone> auto&& old_and_new_row_func, const schema& schema,
const partition_key& key, const clustering_row& update, const std::optional<clustering_row>& existing) const;
public:
static collection_column_computation for_keys(const bytes& collection_name) {
return {collection_name, kind::keys};
}
static collection_column_computation for_values(const bytes& collection_name) {
return {collection_name, kind::values};
}
static collection_column_computation for_entries(const bytes& collection_name) {
return {collection_name, kind::entries};
}
static column_computation_ptr for_target_type(std::string_view type, const bytes& collection_name);
virtual bytes serialize() const override;
virtual bytes compute_value(const schema& schema, const partition_key& key) const override;
virtual column_computation_ptr clone() const override {
return std::make_unique<collection_column_computation>(*this);
}
virtual bool depends_on_non_primary_key_column() const override {
return true;
}
std::vector<db::view::view_key_and_action> compute_values_with_action(const schema& schema, const partition_key& key, const clustering_row& row, const std::optional<clustering_row>& existing) const;
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;
};

View File

@@ -52,7 +52,6 @@
#include "readers/filtering.hh"
#include "readers/compacting.hh"
#include "tombstone_gc.hh"
#include "keys.hh"
namespace sstables {
@@ -278,18 +277,6 @@ class compacted_fragments_writer {
stop_func_t _stop_compaction_writer;
std::optional<utils::observer<>> _stop_request_observer;
bool _unclosed_partition = false;
struct partition_state {
dht::decorated_key_opt dk;
// Partition tombstone is saved for the purpose of replicating it to every fragment storing a partition pL.
// Then when reading from the SSTable run, we won't unnecessarily have to open >= 2 fragments, the one which
// contains the tombstone and another one(s) that has the partition slice being queried.
::tombstone tombstone;
// Used to determine whether any active tombstones need closing at EOS.
::tombstone current_emitted_tombstone;
// Track last emitted clustering row, which will be used to close active tombstone if splitting partition
position_in_partition last_pos = position_in_partition::before_all_clustered_rows();
bool is_splitting_partition = false;
} _current_partition;
private:
inline void maybe_abort_compaction();
@@ -299,13 +286,6 @@ private:
consume_end_of_stream();
});
}
void stop_current_writer();
bool can_split_large_partition() const;
void track_last_position(position_in_partition_view pos);
void split_large_partition();
void do_consume_new_partition(const dht::decorated_key& dk);
stop_iteration do_consume_end_of_partition();
public:
explicit compacted_fragments_writer(compaction& c, creator_func_t cpw, stop_func_t scw)
: _c(c)
@@ -324,7 +304,7 @@ public:
void consume_new_partition(const dht::decorated_key& dk);
void consume(tombstone t);
void consume(tombstone t) { _compaction_writer->writer.consume(t); }
stop_iteration consume(static_row&& sr, tombstone, bool) {
maybe_abort_compaction();
return _compaction_writer->writer.consume(std::move(sr));
@@ -332,11 +312,17 @@ public:
stop_iteration consume(static_row&& sr) {
return consume(std::move(sr), tombstone{}, bool{});
}
stop_iteration consume(clustering_row&& cr, row_tombstone, bool);
stop_iteration consume(clustering_row&& cr, row_tombstone, bool) {
maybe_abort_compaction();
return _compaction_writer->writer.consume(std::move(cr));
}
stop_iteration consume(clustering_row&& cr) {
return consume(std::move(cr), row_tombstone{}, bool{});
}
stop_iteration consume(range_tombstone_change&& rtc);
stop_iteration consume(range_tombstone_change&& rtc) {
maybe_abort_compaction();
return _compaction_writer->writer.consume(std::move(rtc));
}
stop_iteration consume_end_of_partition();
void consume_end_of_stream();
@@ -462,11 +448,10 @@ protected:
uint64_t _estimated_partitions = 0;
db::replay_position _rp;
encoding_stats_collector _stats_collector;
bool _can_split_large_partition = false;
bool _contains_multi_fragment_runs = false;
mutation_source_metadata _ms_metadata = {};
compaction_sstable_replacer_fn _replacer;
run_id _run_identifier;
utils::UUID _run_identifier;
::io_priority_class _io_priority;
// optional clone of sstable set to be used for expiration purposes, so it will be set if expiration is enabled.
std::optional<sstable_set> _sstable_set;
@@ -494,7 +479,6 @@ protected:
, _type(descriptor.options.type())
, _max_sstable_size(descriptor.max_sstable_bytes)
, _sstable_level(descriptor.level)
, _can_split_large_partition(descriptor.can_split_large_partition)
, _replacer(std::move(descriptor.replacer))
, _run_identifier(descriptor.run_identifier)
, _io_priority(descriptor.io_priority)
@@ -505,7 +489,7 @@ protected:
for (auto& sst : _sstables) {
_stats_collector.update(sst->get_encoding_stats_for_compaction());
}
std::unordered_set<run_id> ssts_run_ids;
std::unordered_set<utils::UUID> ssts_run_ids;
_contains_multi_fragment_runs = std::any_of(_sstables.begin(), _sstables.end(), [&ssts_run_ids] (shared_sstable& sst) {
return !ssts_run_ids.insert(sst->run_identifier()).second;
});
@@ -700,8 +684,7 @@ private:
reader.consume_in_thread(std::move(cfc));
});
});
const auto& gc_state = _table_s.get_tombstone_gc_state();
return consumer(make_compacting_reader(make_sstable_reader(), compaction_time, max_purgeable_func(), gc_state));
return consumer(make_compacting_reader(make_sstable_reader(), compaction_time, max_purgeable_func()));
}
future<> consume() {
@@ -721,7 +704,6 @@ private:
using compact_mutations = compact_for_compaction_v2<compacted_fragments_writer, compacted_fragments_writer>;
auto cfc = compact_mutations(*schema(), now,
max_purgeable_func(),
_table_s.get_tombstone_gc_state(),
get_compacted_fragments_writer(),
get_gc_compacted_fragments_writer());
@@ -731,7 +713,6 @@ private:
using compact_mutations = compact_for_compaction_v2<compacted_fragments_writer, noop_compacted_fragments_consumer>;
auto cfc = compact_mutations(*schema(), now,
max_purgeable_func(),
_table_s.get_tombstone_gc_state(),
get_compacted_fragments_writer(),
noop_compacted_fragments_consumer());
reader.consume_in_thread(std::move(cfc));
@@ -883,51 +864,7 @@ void compacted_fragments_writer::maybe_abort_compaction() {
}
}
void compacted_fragments_writer::stop_current_writer() {
// stop sstable writer being currently used.
_stop_compaction_writer(&*_compaction_writer);
_compaction_writer = std::nullopt;
}
bool compacted_fragments_writer::can_split_large_partition() const {
return _c._can_split_large_partition;
}
void compacted_fragments_writer::track_last_position(position_in_partition_view pos) {
if (can_split_large_partition()) {
_current_partition.last_pos = pos;
}
}
void compacted_fragments_writer::split_large_partition() {
// Closes the active range tombstone if needed, before emitting partition end.
// after_key(last_pos) is used for both closing and re-opening the active tombstone, which
// will result in current fragment storing an inclusive end bound for last pos, and the
// next fragment storing an exclusive start bound for last pos. This is very important
// for not losing information on the range tombstone.
auto after_last_pos = position_in_partition::after_key(_current_partition.last_pos.key());
if (_current_partition.current_emitted_tombstone) {
auto rtc = range_tombstone_change(after_last_pos, tombstone{});
_c.log_debug("Closing active tombstone {} with {} for partition {}", _current_partition.current_emitted_tombstone, rtc, *_current_partition.dk);
_compaction_writer->writer.consume(std::move(rtc));
}
_c.log_debug("Splitting large partition {} in order to respect SSTable size limit of {}", *_current_partition.dk, pretty_printed_data_size(_c._max_sstable_size));
// Close partition in current writer, and open it again in a new writer.
do_consume_end_of_partition();
stop_current_writer();
do_consume_new_partition(*_current_partition.dk);
// Replicate partition tombstone to every fragment, allowing the SSTable run reader
// to open a single fragment during the read.
if (_current_partition.tombstone) {
consume(_current_partition.tombstone);
}
if (_current_partition.current_emitted_tombstone) {
_compaction_writer->writer.consume(range_tombstone_change(after_last_pos, _current_partition.current_emitted_tombstone));
}
_current_partition.is_splitting_partition = false;
}
void compacted_fragments_writer::do_consume_new_partition(const dht::decorated_key& dk) {
void compacted_fragments_writer::consume_new_partition(const dht::decorated_key& dk) {
maybe_abort_compaction();
if (!_compaction_writer) {
_compaction_writer = _create_compaction_writer(dk);
@@ -935,55 +872,17 @@ void compacted_fragments_writer::do_consume_new_partition(const dht::decorated_k
_c.on_new_partition();
_compaction_writer->writer.consume_new_partition(dk);
_c._cdata.total_keys_written++;
_unclosed_partition = true;
}
stop_iteration compacted_fragments_writer::do_consume_end_of_partition() {
_unclosed_partition = false;
return _compaction_writer->writer.consume_end_of_partition();
}
void compacted_fragments_writer::consume_new_partition(const dht::decorated_key& dk) {
_current_partition = {
.dk = dk,
.tombstone = tombstone(),
.current_emitted_tombstone = tombstone(),
.last_pos = position_in_partition(position_in_partition::partition_start_tag_t()),
.is_splitting_partition = false
};
do_consume_new_partition(dk);
_c._cdata.total_keys_written++;
}
void compacted_fragments_writer::consume(tombstone t) {
_current_partition.tombstone = t;
_compaction_writer->writer.consume(t);
}
stop_iteration compacted_fragments_writer::consume(clustering_row&& cr, row_tombstone, bool) {
maybe_abort_compaction();
if (_current_partition.is_splitting_partition) [[unlikely]] {
split_large_partition();
}
track_last_position(cr.position());
auto ret = _compaction_writer->writer.consume(std::move(cr));
if (can_split_large_partition() && ret == stop_iteration::yes) [[unlikely]] {
_current_partition.is_splitting_partition = true;
}
return stop_iteration::no;
}
stop_iteration compacted_fragments_writer::consume(range_tombstone_change&& rtc) {
maybe_abort_compaction();
_current_partition.current_emitted_tombstone = rtc.tombstone();
track_last_position(rtc.position());
return _compaction_writer->writer.consume(std::move(rtc));
}
stop_iteration compacted_fragments_writer::consume_end_of_partition() {
auto ret = do_consume_end_of_partition();
auto ret = _compaction_writer->writer.consume_end_of_partition();
_unclosed_partition = false;
if (ret == stop_iteration::yes) {
stop_current_writer();
// stop sstable writer being currently used.
_stop_compaction_writer(&*_compaction_writer);
_compaction_writer = std::nullopt;
}
return ret;
}
@@ -1284,9 +1183,9 @@ public:
type,
schema.ks_name(),
schema.cf_name(),
new_key.key().with_schema(schema),
partition_key_to_string(new_key.key(), schema),
new_key,
current_key.key().with_schema(schema),
partition_key_to_string(current_key.key(), schema),
current_key,
action.empty() ? "" : "; ",
action);
@@ -1299,9 +1198,9 @@ public:
type,
schema.ks_name(),
schema.cf_name(),
new_key.key().with_schema(schema),
partition_key_to_string(new_key.key(), schema),
new_key,
current_key.key().with_schema(schema),
partition_key_to_string(current_key.key(), schema),
current_key,
action.empty() ? "" : "; ",
action);
@@ -1319,7 +1218,7 @@ public:
mf.mutation_fragment_kind(),
mf.has_key() ? format(" with key {}", mf.key().with_schema(schema)) : "",
mf.position(),
key.key().with_schema(schema),
partition_key_to_string(key.key(), schema),
key,
prev_pos.region(),
prev_pos.has_key() ? format(" with key {}", prev_pos.key().with_schema(schema)) : "",
@@ -1331,10 +1230,14 @@ public:
const auto& schema = validator.schema();
const auto& key = validator.previous_partition_key();
clogger.error("[{} compaction {}.{}] Invalid end-of-stream, last partition {} ({}) didn't end with a partition-end fragment{}{}",
type, schema.ks_name(), schema.cf_name(), key.key().with_schema(schema), key, action.empty() ? "" : "; ", action);
type, schema.ks_name(), schema.cf_name(), partition_key_to_string(key.key(), schema), key, action.empty() ? "" : "; ", action);
}
private:
static sstring partition_key_to_string(const partition_key& key, const ::schema& s) {
sstring ret = format("{}", key.with_schema(s));
return utils::utf8::validate((const uint8_t*)ret.data(), ret.size()) ? ret : "<non-utf8-key>";
}
class reader : public flat_mutation_reader_v2::impl {
using skip = bool_class<class skip_tag>;
@@ -1346,20 +1249,17 @@ private:
uint64_t& _validation_errors;
private:
void maybe_abort_scrub(std::function<void()> report_error) {
void maybe_abort_scrub() {
if (_scrub_mode == compaction_type_options::scrub::mode::abort) {
report_error();
throw compaction_aborted_exception(_schema->ks_name(), _schema->cf_name(), "scrub compaction found invalid data");
}
++_validation_errors;
}
void on_unexpected_partition_start(const mutation_fragment_v2& ps) {
auto report_fn = [this, &ps] (std::string_view action = "") {
report_invalid_partition_start(compaction_type::Scrub, _validator, ps.as_partition_start().key(), action);
};
maybe_abort_scrub(report_fn);
report_fn("Rectifying by adding assumed missing partition-end");
maybe_abort_scrub();
report_invalid_partition_start(compaction_type::Scrub, _validator, ps.as_partition_start().key(),
"Rectifying by adding assumed missing partition-end");
auto pe = mutation_fragment_v2(*_schema, _permit, partition_end{});
if (!_validator(pe)) {
@@ -1379,26 +1279,20 @@ private:
}
skip on_invalid_partition(const dht::decorated_key& new_key) {
auto report_fn = [this, &new_key] (std::string_view action = "") {
report_invalid_partition(compaction_type::Scrub, _validator, new_key, action);
};
maybe_abort_scrub(report_fn);
maybe_abort_scrub();
if (_scrub_mode == compaction_type_options::scrub::mode::segregate) {
report_fn("Detected");
report_invalid_partition(compaction_type::Scrub, _validator, new_key, "Detected");
_validator.reset(new_key);
// Let the segregating interposer consumer handle this.
return skip::no;
}
report_fn("Skipping");
report_invalid_partition(compaction_type::Scrub, _validator, new_key, "Skipping");
_skip_to_next_partition = true;
return skip::yes;
}
skip on_invalid_mutation_fragment(const mutation_fragment_v2& mf) {
auto report_fn = [this, &mf] (std::string_view action = "") {
report_invalid_mutation_fragment(compaction_type::Scrub, _validator, mf, "");
};
maybe_abort_scrub(report_fn);
maybe_abort_scrub();
const auto& key = _validator.previous_partition_key();
@@ -1413,7 +1307,8 @@ private:
// The only case a partition end is invalid is when it comes after
// another partition end, and we can just drop it in that case.
if (!mf.is_end_of_partition() && _scrub_mode == compaction_type_options::scrub::mode::segregate) {
report_fn("Injecting partition start/end to segregate out-of-order fragment");
report_invalid_mutation_fragment(compaction_type::Scrub, _validator, mf,
"Injecting partition start/end to segregate out-of-order fragment");
push_mutation_fragment(*_schema, _permit, partition_end{});
// We loose the partition tombstone if any, but it will be
@@ -1426,19 +1321,16 @@ private:
return skip::no;
}
report_fn("Skipping");
report_invalid_mutation_fragment(compaction_type::Scrub, _validator, mf, "Skipping");
return skip::yes;
}
void on_invalid_end_of_stream() {
auto report_fn = [this] (std::string_view action = "") {
report_invalid_end_of_stream(compaction_type::Scrub, _validator, action);
};
maybe_abort_scrub(report_fn);
maybe_abort_scrub();
// Handle missing partition_end
push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, partition_end{}));
report_fn("Rectifying by adding missing partition-end to the end of the stream");
report_invalid_end_of_stream(compaction_type::Scrub, _validator, "Rectifying by adding missing partition-end to the end of the stream");
}
void fill_buffer_from_underlying() {
@@ -1617,7 +1509,7 @@ class resharding_compaction final : public compaction {
uint64_t estimated_partitions = 0;
};
std::vector<estimated_values> _estimation_per_shard;
std::vector<run_id> _run_identifiers;
std::vector<utils::UUID> _run_identifiers;
private:
// return estimated partitions per sstable for a given shard
uint64_t partitions_per_sstable(shard_id s) const {
@@ -1641,7 +1533,7 @@ public:
}
}
for (auto i : boost::irange(0u, smp::count)) {
_run_identifiers[i] = run_id::create_random_id();
_run_identifiers[i] = utils::make_random_uuid();
}
}
@@ -1874,7 +1766,7 @@ get_fully_expired_sstables(const table_state& table_s, const std::vector<sstable
int64_t min_timestamp = std::numeric_limits<int64_t>::max();
for (auto& sstable : overlapping) {
auto gc_before = sstable->get_gc_before_for_fully_expire(compaction_time, table_s.get_tombstone_gc_state());
auto gc_before = sstable->get_gc_before_for_fully_expire(compaction_time);
if (sstable->get_max_local_deletion_time() >= gc_before) {
min_timestamp = std::min(min_timestamp, sstable->get_stats_metadata().min_timestamp);
}
@@ -1893,7 +1785,7 @@ get_fully_expired_sstables(const table_state& table_s, const std::vector<sstable
// SStables that do not contain live data is added to list of possibly expired sstables.
for (auto& candidate : compacting) {
auto gc_before = candidate->get_gc_before_for_fully_expire(compaction_time, table_s.get_tombstone_gc_state());
auto gc_before = candidate->get_gc_before_for_fully_expire(compaction_time);
clogger.debug("Checking if candidate of generation {} and max_deletion_time {} is expired, gc_before is {}",
candidate->generation(), candidate->get_stats_metadata().max_local_deletion_time, gc_before);
// A fully expired sstable which has an ancestor undeleted shouldn't be compacted because
@@ -1924,7 +1816,7 @@ get_fully_expired_sstables(const table_state& table_s, const std::vector<sstable
}
unsigned compaction_descriptor::fan_in() const {
return boost::copy_range<std::unordered_set<run_id>>(sstables | boost::adaptors::transformed(std::mem_fn(&sstables::sstable::run_identifier))).size();
return boost::copy_range<std::unordered_set<utils::UUID>>(sstables | boost::adaptors::transformed(std::mem_fn(&sstables::sstable::run_identifier))).size();
}
uint64_t compaction_descriptor::sstables_size() const {

View File

@@ -80,8 +80,10 @@ struct compaction_data {
}
void stop(sstring reason) {
stop_requested = std::move(reason);
abort.request_abort();
if (!abort.abort_requested()) {
stop_requested = std::move(reason);
abort.request_abort();
}
}
};

View File

@@ -14,7 +14,7 @@
#include <variant>
#include <seastar/core/smp.hh>
#include <seastar/core/file.hh>
#include "sstables/types_fwd.hh"
#include "sstables/shared_sstable.hh"
#include "sstables/sstable_set.hh"
#include "utils/UUID.hh"
#include "dht/i_partitioner.hh"
@@ -153,10 +153,8 @@ struct compaction_descriptor {
int level;
// Threshold size for sstable(s) to be created.
uint64_t max_sstable_bytes;
// Can split large partitions at clustering boundary.
bool can_split_large_partition = false;
// Run identifier of output sstables.
sstables::run_id run_identifier;
utils::UUID run_identifier;
// The options passed down to the compaction code.
// This also selects the kind of compaction to do.
compaction_type_options options = compaction_type_options::make_regular();
@@ -178,7 +176,7 @@ struct compaction_descriptor {
::io_priority_class io_priority,
int level = default_level,
uint64_t max_sstable_bytes = default_max_sstable_bytes,
run_id run_identifier = run_id::create_random_id(),
utils::UUID run_identifier = utils::make_random_uuid(),
compaction_type_options options = compaction_type_options::make_regular())
: sstables(std::move(sstables))
, level(level)
@@ -194,7 +192,7 @@ struct compaction_descriptor {
: sstables(std::move(sstables))
, level(default_level)
, max_sstable_bytes(default_max_sstable_bytes)
, run_identifier(run_id::create_random_id())
, run_identifier(utils::make_random_uuid())
, options(compaction_type_options::make_regular())
, io_priority(io_priority)
, has_only_fully_expired(has_only_fully_expired)

View File

@@ -19,7 +19,6 @@
#include "locator/abstract_replication_strategy.hh"
#include "utils/fb_utilities.hh"
#include "utils/UUID_gen.hh"
#include "db/system_keyspace.hh"
#include <cmath>
#include <boost/algorithm/cxx11/any_of.hpp>
#include <boost/range/algorithm/remove_if.hpp>
@@ -207,7 +206,7 @@ std::vector<sstables::shared_sstable> compaction_manager::get_candidates(compact
candidates.reserve(t.main_sstable_set().all()->size());
// prevents sstables that belongs to a partial run being generated by ongoing compaction from being
// selected for compaction, which could potentially result in wrong behavior.
auto partial_run_identifiers = boost::copy_range<std::unordered_set<sstables::run_id>>(_tasks
auto partial_run_identifiers = boost::copy_range<std::unordered_set<utils::UUID>>(_tasks
| boost::adaptors::filtered(std::mem_fn(&task::generating_output_run))
| boost::adaptors::transformed(std::mem_fn(&task::output_run_id)));
auto& cs = t.get_compaction_strategy();
@@ -348,15 +347,8 @@ future<sstables::compaction_result> compaction_manager::task::compact_sstables(s
future<> compaction_manager::task::update_history(compaction::table_state& t, const sstables::compaction_result& res, const sstables::compaction_data& cdata) {
auto ended_at = std::chrono::duration_cast<std::chrono::milliseconds>(res.stats.ended_at.time_since_epoch());
if (_cm._sys_ks) {
// FIXME: add support to merged_rows. merged_rows is a histogram that
// shows how many sstables each row is merged from. This information
// cannot be accessed until we make combined_reader more generic,
// for example, by adding a reducer method.
auto sys_ks = _cm._sys_ks; // hold pointer on sys_ks
co_await sys_ks->update_compaction_history(cdata.compaction_uuid, t.schema()->ks_name(), t.schema()->cf_name(),
ended_at.count(), res.stats.start_size, res.stats.end_size, std::unordered_map<int32_t, int64_t>{});
}
co_return co_await t.update_compaction_history(cdata.compaction_uuid, t.schema()->ks_name(), t.schema()->cf_name(), ended_at,
res.stats.start_size, res.stats.end_size);
}
class compaction_manager::major_compaction_task : public compaction_manager::task {
@@ -619,7 +611,7 @@ future<semaphore_units<named_semaphore_exception_factory>> compaction_manager::t
});
}
void compaction_manager::task::setup_new_compaction(sstables::run_id output_run_id) {
void compaction_manager::task::setup_new_compaction(utils::UUID output_run_id) {
_compaction_data = create_compaction_data();
_output_run_identifier = output_run_id;
switch_state(state::active);
@@ -627,7 +619,7 @@ void compaction_manager::task::setup_new_compaction(sstables::run_id output_run_
void compaction_manager::task::finish_compaction(state finish_state) noexcept {
switch_state(finish_state);
_output_run_identifier = sstables::run_id::create_null_id();
_output_run_identifier = utils::null_uuid();
if (finish_state != state::failed) {
_compaction_retry.reset();
}
@@ -665,7 +657,6 @@ compaction_manager::compaction_manager(config cfg, abort_source& as)
, _update_compaction_static_shares_action([this] { return update_static_shares(static_shares()); })
, _compaction_static_shares_observer(_cfg.static_shares.observe(_update_compaction_static_shares_action.make_observer()))
, _strategy_control(std::make_unique<strategy_control>(*this))
, _tombstone_gc_state(&_repair_history_maps)
{
register_metrics();
// Bandwidth throttling is node-wide, updater is needed on single shard
@@ -685,7 +676,6 @@ compaction_manager::compaction_manager()
, _update_compaction_static_shares_action([] { return make_ready_future<>(); })
, _compaction_static_shares_observer(_cfg.static_shares.observe(_update_compaction_static_shares_action.make_observer()))
, _strategy_control(std::make_unique<strategy_control>(*this))
, _tombstone_gc_state(&_repair_history_maps)
{
// No metric registration because this constructor is supposed to be used only by the testing
// infrastructure.
@@ -799,27 +789,23 @@ future<> compaction_manager::stop_tasks(std::vector<shared_ptr<task>> tasks, sst
});
}
future<> compaction_manager::stop_ongoing_compactions(sstring reason, compaction::table_state* t, std::optional<sstables::compaction_type> type_opt) noexcept {
try {
auto ongoing_compactions = get_compactions(t).size();
auto tasks = boost::copy_range<std::vector<shared_ptr<task>>>(_tasks | boost::adaptors::filtered([t, type_opt] (auto& task) {
return (!t || task->compacting_table() == t) && (!type_opt || task->type() == *type_opt);
}));
logging::log_level level = tasks.empty() ? log_level::debug : log_level::info;
if (cmlog.is_enabled(level)) {
std::string scope = "";
if (t) {
scope = fmt::format(" for table {}.{}", t->schema()->ks_name(), t->schema()->cf_name());
}
if (type_opt) {
scope += fmt::format(" {} type={}", scope.size() ? "and" : "for", *type_opt);
}
cmlog.log(level, "Stopping {} tasks for {} ongoing compactions{} due to {}", tasks.size(), ongoing_compactions, scope, reason);
future<> compaction_manager::stop_ongoing_compactions(sstring reason, compaction::table_state* t, std::optional<sstables::compaction_type> type_opt) {
auto ongoing_compactions = get_compactions(t).size();
auto tasks = boost::copy_range<std::vector<shared_ptr<task>>>(_tasks | boost::adaptors::filtered([t, type_opt] (auto& task) {
return (!t || task->compacting_table() == t) && (!type_opt || task->type() == *type_opt);
}));
logging::log_level level = tasks.empty() ? log_level::debug : log_level::info;
if (cmlog.is_enabled(level)) {
std::string scope = "";
if (t) {
scope = fmt::format(" for table {}.{}", t->schema()->ks_name(), t->schema()->cf_name());
}
return stop_tasks(std::move(tasks), std::move(reason));
} catch (...) {
return current_exception_as_future<>();
if (type_opt) {
scope += fmt::format(" {} type={}", scope.size() ? "and" : "for", *type_opt);
}
cmlog.log(level, "Stopping {} tasks for {} ongoing compactions{} due to {}", tasks.size(), ongoing_compactions, scope, reason);
}
return stop_tasks(std::move(tasks), std::move(reason));
}
future<> compaction_manager::drain() {
@@ -871,7 +857,7 @@ auto swallow_enospc(const Ex& ex) noexcept {
}
void compaction_manager::do_stop() noexcept {
if (_state == state::none || _stop_future) {
if (_state == state::none || _state == state::stopped) {
return;
}
@@ -1041,7 +1027,7 @@ future<> compaction_manager::maybe_wait_for_sstable_count_reduction(compaction::
auto num_runs_for_compaction = [&, this] {
auto& cs = t.get_compaction_strategy();
auto desc = cs.get_sstables_for_compaction(t, get_strategy_control(), get_candidates(t));
return boost::copy_range<std::unordered_set<sstables::run_id>>(
return boost::copy_range<std::unordered_set<utils::UUID>>(
desc.sstables
| boost::adaptors::transformed(std::mem_fn(&sstables::sstable::run_identifier))).size();
};
@@ -1581,7 +1567,7 @@ void compaction_manager::add(compaction::table_state& t) {
}
}
future<> compaction_manager::remove(compaction::table_state& t) noexcept {
future<> compaction_manager::remove(compaction::table_state& t) {
auto handle = _compaction_state.extract(&t);
if (!handle.empty()) {
@@ -1689,14 +1675,6 @@ compaction::strategy_control& compaction_manager::get_strategy_control() const n
return *_strategy_control;
}
void compaction_manager::plug_system_keyspace(db::system_keyspace& sys_ks) noexcept {
_sys_ks = sys_ks.shared_from_this();
}
void compaction_manager::unplug_system_keyspace() noexcept {
_sys_ks = nullptr;
}
double compaction_backlog_tracker::backlog() const {
return disabled() ? compaction_controller::disable_backlog : _impl->backlog(_ongoing_writes, _ongoing_compactions);
}

View File

@@ -8,9 +8,6 @@
#pragma once
#include <boost/icl/interval.hpp>
#include <boost/icl/interval_map.hpp>
#include <seastar/core/semaphore.hh>
#include <seastar/core/sstring.hh>
#include <seastar/core/shared_ptr.hh>
@@ -36,19 +33,9 @@
#include "backlog_controller.hh"
#include "seastarx.hh"
#include "sstables/exceptions.hh"
#include "tombstone_gc.hh"
namespace db {
class system_keyspace;
}
class compacting_sstable_registration;
class repair_history_map {
public:
boost::icl::interval_map<dht::token, gc_clock::time_point, boost::icl::partial_absorber, std::less, boost::icl::inplace_max> map;
};
// Compaction manager provides facilities to submit and track compaction jobs on
// behalf of existing tables.
class compaction_manager {
@@ -123,7 +110,7 @@ public:
shared_future<compaction_stats_opt> _compaction_done = make_ready_future<compaction_stats_opt>();
exponential_backoff_retry _compaction_retry = exponential_backoff_retry(std::chrono::seconds(5), std::chrono::seconds(300));
sstables::compaction_type _type;
sstables::run_id _output_run_identifier;
utils::UUID _output_run_identifier;
gate::holder _gate_holder;
sstring _description;
@@ -147,7 +134,7 @@ public:
// Return true if the task isn't stopped
// and the compaction manager allows proceeding.
inline bool can_proceed(throw_if_stopping do_throw_if_stopping = throw_if_stopping::no) const;
void setup_new_compaction(sstables::run_id output_run_id = sstables::run_id::create_null_id());
void setup_new_compaction(utils::UUID output_run_id = utils::null_uuid());
void finish_compaction(state finish_state = state::done) noexcept;
// Compaction manager stop itself if it finds an storage I/O error which results in
@@ -192,7 +179,7 @@ public:
bool generating_output_run() const noexcept {
return compaction_running() && _output_run_identifier;
}
const sstables::run_id& output_run_id() const noexcept {
const utils::UUID& output_run_id() const noexcept {
return _output_run_identifier;
}
@@ -289,8 +276,6 @@ private:
// being picked more than once.
seastar::named_semaphore _off_strategy_sem = {1, named_semaphore_exception_factory{"off-strategy compaction"}};
seastar::shared_ptr<db::system_keyspace> _sys_ks;
std::function<void()> compaction_submission_callback();
// all registered tables are reevaluated at a constant interval.
// Submission is a NO-OP when there's nothing to do, so it's fine to call it regularly.
@@ -309,9 +294,6 @@ private:
class strategy_control;
std::unique_ptr<strategy_control> _strategy_control;
per_table_history_maps _repair_history_maps;
tombstone_gc_state _tombstone_gc_state;
private:
future<compaction_stats_opt> perform_task(shared_ptr<task>);
@@ -488,9 +470,6 @@ public:
// Run a function with compaction temporarily disabled for a table T.
future<> run_with_compaction_disabled(compaction::table_state& t, std::function<future<> ()> func);
void plug_system_keyspace(db::system_keyspace& sys_ks) noexcept;
void unplug_system_keyspace() noexcept;
// Adds a table to the compaction manager.
// Creates a compaction_state structure that can be used for submitting
// compaction jobs of all types.
@@ -498,7 +477,7 @@ public:
// Remove a table from the compaction manager.
// Cancel requests on table and wait for possible ongoing compactions.
future<> remove(compaction::table_state& t) noexcept;
future<> remove(compaction::table_state& t);
const stats& get_stats() const {
return _stats;
@@ -515,7 +494,7 @@ public:
future<> stop_compaction(sstring type, compaction::table_state* table = nullptr);
// Stops ongoing compaction of a given table and/or compaction_type.
future<> stop_ongoing_compactions(sstring reason, compaction::table_state* t = nullptr, std::optional<sstables::compaction_type> type_opt = {}) noexcept;
future<> stop_ongoing_compactions(sstring reason, compaction::table_state* t = nullptr, std::optional<sstables::compaction_type> type_opt = {});
double backlog() {
return _backlog_manager.backlog();
@@ -529,14 +508,6 @@ public:
compaction::strategy_control& get_strategy_control() const noexcept;
tombstone_gc_state& get_tombstone_gc_state() noexcept {
return _tombstone_gc_state;
};
const tombstone_gc_state& get_tombstone_gc_state() const noexcept {
return _tombstone_gc_state;
};
friend class compacting_sstable_registration;
friend class compaction_weight_registration;
friend class compaction_manager_test;

View File

@@ -50,7 +50,7 @@ std::vector<compaction_descriptor> compaction_strategy_impl::get_cleanup_compact
}));
}
bool compaction_strategy_impl::worth_dropping_tombstones(const shared_sstable& sst, gc_clock::time_point compaction_time, const tombstone_gc_state& gc_state) {
bool compaction_strategy_impl::worth_dropping_tombstones(const shared_sstable& sst, gc_clock::time_point compaction_time) {
if (_disable_tombstone_compaction) {
return false;
}
@@ -61,7 +61,7 @@ bool compaction_strategy_impl::worth_dropping_tombstones(const shared_sstable& s
if (db_clock::now()-_tombstone_compaction_interval < sst->data_file_write_time()) {
return false;
}
auto gc_before = sst->get_gc_before_for_drop_estimation(compaction_time, gc_state);
auto gc_before = sst->get_gc_before_for_drop_estimation(compaction_time);
return sst->estimate_droppable_tombstone_ratio(gc_before) >= _tombstone_threshold;
}
@@ -409,7 +409,9 @@ public:
l0_old_ssts.push_back(std::move(sst));
}
}
_l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
if (l0_old_ssts.size() || l0_new_ssts.size()) {
_l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
}
}
};
@@ -670,8 +672,8 @@ compaction_descriptor date_tiered_compaction_strategy::get_sstables_for_compacti
}
// filter out sstables which droppable tombstone ratio isn't greater than the defined threshold.
auto e = boost::range::remove_if(candidates, [this, compaction_time, &table_s] (const sstables::shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time, table_s.get_tombstone_gc_state());
auto e = boost::range::remove_if(candidates, [this, compaction_time] (const sstables::shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time);
});
candidates.erase(e, candidates.end());
if (candidates.empty()) {

View File

@@ -13,7 +13,6 @@
#include "compaction_strategy.hh"
#include "db_clock.hh"
#include "compaction_descriptor.hh"
#include "tombstone_gc.hh"
namespace compaction {
class table_state;
@@ -68,7 +67,7 @@ public:
// Check if a given sstable is entitled for tombstone compaction based on its
// droppable tombstone histogram and gc_before.
bool worth_dropping_tombstones(const shared_sstable& sst, gc_clock::time_point compaction_time, const tombstone_gc_state& gc_state);
bool worth_dropping_tombstones(const shared_sstable& sst, gc_clock::time_point compaction_time);
virtual compaction_backlog_tracker& get_backlog_tracker() = 0;

View File

@@ -39,16 +39,16 @@ compaction_descriptor leveled_compaction_strategy::get_sstables_for_compaction(t
for (auto level = int(manifest.get_level_count()); level >= 0; level--) {
auto& sstables = manifest.get_level(level);
// filter out sstables which droppable tombstone ratio isn't greater than the defined threshold.
auto e = boost::range::remove_if(sstables, [this, compaction_time, &table_s] (const sstables::shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time, table_s.get_tombstone_gc_state());
auto e = boost::range::remove_if(sstables, [this, compaction_time] (const sstables::shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time);
});
sstables.erase(e, sstables.end());
if (sstables.empty()) {
continue;
}
auto& sst = *std::max_element(sstables.begin(), sstables.end(), [&] (auto& i, auto& j) {
auto gc_before1 = i->get_gc_before_for_drop_estimation(compaction_time, table_s.get_tombstone_gc_state());
auto gc_before2 = j->get_gc_before_for_drop_estimation(compaction_time, table_s.get_tombstone_gc_state());
auto gc_before1 = i->get_gc_before_for_drop_estimation(compaction_time);
auto gc_before2 = j->get_gc_before_for_drop_estimation(compaction_time);
return i->estimate_droppable_tombstone_ratio(gc_before1) < j->estimate_droppable_tombstone_ratio(gc_before2);
});
return sstables::compaction_descriptor({ sst }, service::get_local_compaction_priority(), sst->get_sstable_level());
@@ -200,10 +200,8 @@ leveled_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input
auto [disjoint, overlapping_sstables] = is_disjoint(level_info[level], tolerance(level));
if (!disjoint) {
auto ideal_level = ideal_level_for_input(input, max_sstable_size_in_bytes);
leveled_manifest::logger.warn("Turns out that level {} is not disjoint, found {} overlapping SSTables, so compacting everything on behalf of {}.{}", level, overlapping_sstables, schema->ks_name(), schema->cf_name());
// Unfortunately no good limit to limit input size to max_sstables for LCS major
compaction_descriptor desc(std::move(input), iop, ideal_level, max_sstable_size_in_bytes);
leveled_manifest::logger.warn("Turns out that level {} is not disjoint, found {} overlapping SSTables, so the level will be entirely compacted on behalf of {}.{}", level, overlapping_sstables, schema->ks_name(), schema->cf_name());
compaction_descriptor desc(std::move(level_info[level]), iop, level, max_sstable_size_in_bytes);
desc.options = compaction_type_options::make_reshape();
return desc;
}

View File

@@ -169,8 +169,8 @@ size_tiered_compaction_strategy::get_sstables_for_compaction(table_state& table_
// tombstone purge, i.e. less likely to shadow even older data.
for (auto&& sstables : buckets | boost::adaptors::reversed) {
// filter out sstables which droppable tombstone ratio isn't greater than the defined threshold.
auto e = boost::range::remove_if(sstables, [this, compaction_time, &table_s] (const sstables::shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time, table_s.get_tombstone_gc_state());
auto e = boost::range::remove_if(sstables, [this, compaction_time] (const sstables::shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time);
});
sstables.erase(e, sstables.end());
if (sstables.empty()) {

View File

@@ -40,9 +40,9 @@ public:
virtual sstables::shared_sstable make_sstable() const = 0;
virtual sstables::sstable_writer_config configure_writer(sstring origin) const = 0;
virtual api::timestamp_type min_memtable_timestamp() const = 0;
virtual future<> update_compaction_history(utils::UUID compaction_id, sstring ks_name, sstring cf_name, std::chrono::milliseconds ended_at, int64_t bytes_in, int64_t bytes_out) = 0;
virtual future<> on_compaction_completion(sstables::compaction_completion_desc desc, sstables::offstrategy offstrategy) = 0;
virtual bool is_auto_compaction_disabled_by_user() const noexcept = 0;
virtual const tombstone_gc_state& get_tombstone_gc_state() const noexcept = 0;
};
}

View File

@@ -43,10 +43,6 @@ time_window_compaction_strategy_options::time_window_compaction_strategy_options
}
}
if (window_size <= 0) {
throw exceptions::configuration_exception(fmt::format("{} must be greater than 1 for compaction_window_size", window_size));
}
sstable_window_size = window_size * window_unit;
it = options.find(EXPIRED_SSTABLE_CHECK_FREQUENCY_SECONDS_KEY);
@@ -274,8 +270,8 @@ time_window_compaction_strategy::get_next_non_expired_sstables(table_state& tabl
// if there is no sstable to compact in standard way, try compacting single sstable whose droppable tombstone
// ratio is greater than threshold.
auto e = boost::range::remove_if(non_expiring_sstables, [this, compaction_time, &table_s] (const shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time, table_s.get_tombstone_gc_state());
auto e = boost::range::remove_if(non_expiring_sstables, [this, compaction_time] (const shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time);
});
non_expiring_sstables.erase(e, non_expiring_sstables.end());
if (non_expiring_sstables.empty()) {

View File

@@ -10,6 +10,88 @@
#pragma once
#include "dht/i_partitioner.hh"
#include <optional>
#include <variant>
// Wraps ring_position_view so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position_view {
const ::schema* _schema = nullptr;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position_view> _rpv;
public:
constexpr compatible_ring_position_view() = default;
compatible_ring_position_view(const schema& s, dht::ring_position_view rpv)
: _schema(&s), _rpv(rpv) {
}
const dht::ring_position_view& position() const {
return *_rpv;
}
const ::schema& schema() const {
return *_schema;
}
friend std::strong_ordering tri_compare(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rpv, *y._rpv);
}
friend bool operator<(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) != 0;
}
};
// Wraps ring_position so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position {
schema_ptr _schema;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position> _rp;
public:
constexpr compatible_ring_position() = default;
compatible_ring_position(schema_ptr s, dht::ring_position rp)
: _schema(std::move(s)), _rp(std::move(rp)) {
}
dht::ring_position_view position() const {
return *_rp;
}
const ::schema& schema() const {
return *_schema;
}
friend std::strong_ordering tri_compare(const compatible_ring_position& x, const compatible_ring_position& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rp, *y._rp);
}
friend bool operator<(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) != 0;
}
};
// Wraps ring_position or ring_position_view so either is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
@@ -17,22 +99,37 @@
// on callers to keep ring position alive, allow lookup on containers that don't support different
// key types, and also avoiding unnecessary copies.
class compatible_ring_position_or_view {
schema_ptr _schema;
lw_shared_ptr<dht::ring_position> _rp;
dht::ring_position_view_opt _rpv; // optional only for default ctor, nothing more
// Optional to supply a default constructor, no more.
std::optional<std::variant<compatible_ring_position, compatible_ring_position_view>> _crp_or_view;
public:
compatible_ring_position_or_view() = default;
constexpr compatible_ring_position_or_view() = default;
explicit compatible_ring_position_or_view(schema_ptr s, dht::ring_position rp)
: _schema(std::move(s)), _rp(make_lw_shared<dht::ring_position>(std::move(rp))), _rpv(dht::ring_position_view(*_rp)) {
: _crp_or_view(compatible_ring_position(std::move(s), std::move(rp))) {
}
explicit compatible_ring_position_or_view(const schema& s, dht::ring_position_view rpv)
: _schema(s.shared_from_this()), _rpv(rpv) {
: _crp_or_view(compatible_ring_position_view(s, rpv)) {
}
const dht::ring_position_view& position() const {
return *_rpv;
dht::ring_position_view position() const {
struct rpv_accessor {
dht::ring_position_view operator()(const compatible_ring_position& crp) {
return crp.position();
}
dht::ring_position_view operator()(const compatible_ring_position_view& crpv) {
return crpv.position();
}
};
return std::visit(rpv_accessor{}, *_crp_or_view);
}
friend std::strong_ordering tri_compare(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return dht::ring_position_tri_compare(*x._schema, x.position(), y.position());
struct schema_accessor {
const ::schema& operator()(const compatible_ring_position& crp) {
return crp.schema();
}
const ::schema& operator()(const compatible_ring_position_view& crpv) {
return crpv.schema();
}
};
return dht::ring_position_tri_compare(std::visit(schema_accessor{}, *x._crp_or_view), x.position(), y.position());
}
friend bool operator<(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) < 0;

View File

@@ -410,9 +410,6 @@ commitlog_total_space_in_mb: -1
# Log a warning when row number is larger than this value
# compaction_rows_count_warning_threshold: 100000
# Log a warning when writing a collection containing more elements than this value
# compaction_collection_elements_count_warning_threshold: 10000
# How long the coordinator should wait for seq or index scans to complete
# range_request_timeout_in_ms: 10000
# How long the coordinator should wait for writes to complete

View File

@@ -44,18 +44,32 @@ distro_extra_cflags = ''
distro_extra_ldflags = ''
distro_extra_cmake_args = []
employ_ld_trickery = True
has_wasmtime = False
use_wasmtime_as_library = False
# distro-specific setup
def distro_setup_nix():
global os_ids, employ_ld_trickery, has_wasmtime, use_wasmtime_as_library
global os_ids, employ_ld_trickery
global distro_extra_ldflags, distro_extra_cflags, distro_extra_cmake_args
os_ids = ['linux']
employ_ld_trickery = False
has_wasmtime = True
use_wasmtime_as_library = True
if os.environ.get('NIX_CC'):
libdirs = list(dict.fromkeys(os.environ.get('CMAKE_LIBRARY_PATH').split(':')))
incdirs = list(dict.fromkeys(os.environ.get('CMAKE_INCLUDE_PATH').split(':')))
# add nix {lib,inc}dirs to relevant flags, mimicing nix versions of cmake & autotools.
# also add rpaths to make sure that any built executables can run in place.
distro_extra_ldflags = ' '.join([ '-rpath ' + path + ' -L' + path for path in libdirs ]);
distro_extra_cflags = ' '.join([ '-isystem ' + path for path in incdirs ])
# indexers like clangd may or may not know which stdc++ or glibc
# the compiler was configured with, so make the relevant paths
# explicit on each compilation command line:
implicit_cflags = os.environ.get('IMPLICIT_CFLAGS').strip()
distro_extra_cflags += ' ' + implicit_cflags
# also propagate to cmake-built dependencies:
distro_extra_cmake_args = ['-DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES:INTERNAL=' + implicit_cflags]
if os.environ.get('NIX_BUILD_TOP'):
distro_setup_nix()
# distribution "internationalization", converting package names.
@@ -416,7 +430,6 @@ scylla_tests = set([
'test/boost/estimated_histogram_test',
'test/boost/summary_test',
'test/boost/logalloc_test',
'test/boost/logalloc_standard_allocator_segment_pool_backend_test',
'test/boost/managed_vector_test',
'test/boost/managed_bytes_test',
'test/boost/intrusive_array_test',
@@ -448,7 +461,7 @@ scylla_tests = set([
'test/boost/schema_change_test',
'test/boost/schema_registry_test',
'test/boost/secondary_index_test',
'test/boost/tracing_test',
'test/boost/tracing',
'test/boost/index_with_paging_test',
'test/boost/serialization_test',
'test/boost/serialized_action_test',
@@ -549,7 +562,6 @@ raft_tests = set([
'test/raft/replication_test',
'test/raft/randomized_nemesis_test',
'test/raft/many_test',
'test/raft/raft_server_test',
'test/raft/fsm_test',
'test/raft/etcd_test',
'test/raft/raft_sys_table_storage_test',
@@ -642,8 +654,6 @@ arg_parser.add_argument('--clang-inline-threshold', action='store', type=int, de
help="LLVM-specific inline threshold compilation parameter")
arg_parser.add_argument('--list-artifacts', dest='list_artifacts', action='store_true', default=False,
help='List all available build artifacts, that can be passed to --with')
arg_parser.add_argument('--date-stamp', dest='date_stamp', type=str,
help='Set datestamp for SCYLLA-VERSION-GEN')
args = arg_parser.parse_args()
if args.list_artifacts:
@@ -653,7 +663,6 @@ if args.list_artifacts:
defines = ['XXH_PRIVATE_API',
'SEASTAR_TESTING_MAIN',
'FMT_DEPRECATED_OSTREAM',
]
scylla_raft_core = [
@@ -664,8 +673,7 @@ scylla_raft_core = [
'raft/log.cc',
]
scylla_core = (['message/messaging_service.cc',
'replica/database.cc',
scylla_core = (['replica/database.cc',
'replica/table.cc',
'replica/distributed_loader.cc',
'replica/memtable.cc',
@@ -793,8 +801,6 @@ scylla_core = (['message/messaging_service.cc',
'cql3/statements/raw/parsed_statement.cc',
'cql3/statements/property_definitions.cc',
'cql3/statements/update_statement.cc',
'cql3/statements/strongly_consistent_modification_statement.cc',
'cql3/statements/strongly_consistent_select_statement.cc',
'cql3/statements/delete_statement.cc',
'cql3/statements/prune_materialized_view_statement.cc',
'cql3/statements/batch_statement.cc',
@@ -943,6 +949,7 @@ scylla_core = (['message/messaging_service.cc',
'locator/ec2_snitch.cc',
'locator/ec2_multi_region_snitch.cc',
'locator/gce_snitch.cc',
'message/messaging_service.cc',
'service/client_state.cc',
'service/storage_service.cc',
'service/misc_services.cc',
@@ -994,6 +1001,7 @@ scylla_core = (['message/messaging_service.cc',
'tracing/tracing.cc',
'tracing/trace_keyspace_helper.cc',
'tracing/trace_state.cc',
'tracing/tracing_backend_registry.cc',
'tracing/traced_file.cc',
'table_helper.cc',
'range_tombstone.cc',
@@ -1031,8 +1039,6 @@ scylla_core = (['message/messaging_service.cc',
'service/raft/raft_group0.cc',
'direct_failure_detector/failure_detector.cc',
'service/raft/raft_group0_client.cc',
'service/broadcast_tables/experimental/lang.cc',
'tasks/task_manager.cc',
] + [Antlr3Grammar('cql3/Cql.g')] + [Thrift('interface/cassandra.thrift', 'Cassandra')] \
+ scylla_raft_core
)
@@ -1069,10 +1075,6 @@ api = ['api/api.cc',
'api/stream_manager.cc',
Json2Code('api/api-doc/system.json'),
'api/system.cc',
Json2Code('api/api-doc/task_manager.json'),
'api/task_manager.cc',
Json2Code('api/api-doc/task_manager_test.json'),
'api/task_manager_test.cc',
'api/config.cc',
Json2Code('api/api-doc/config.json'),
'api/error_injection.cc',
@@ -1147,7 +1149,6 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/replica_exception.idl.hh',
'idl/per_partition_rate_limit_info.idl.hh',
'idl/position_in_partition.idl.hh',
'idl/experimental/broadcast_tables_lang.idl.hh',
]
rusts = [
@@ -1175,7 +1176,7 @@ scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependenci
'test/lib/random_schema.cc',
]
scylla_raft_dependencies = scylla_raft_core + ['utils/uuid.cc', 'utils/error_injection.cc']
scylla_raft_dependencies = scylla_raft_core + ['utils/uuid.cc']
scylla_tools = ['tools/scylla-types.cc', 'tools/scylla-sstable.cc', 'tools/schema_loader.cc', 'tools/utils.cc']
@@ -1299,7 +1300,7 @@ deps['test/boost/linearizing_input_stream_test'] = [
"test/boost/linearizing_input_stream_test.cc",
"test/lib/log.cc",
]
deps['test/boost/expr_test'] = ['test/boost/expr_test.cc', 'test/lib/expr_test_utils.cc'] + scylla_core
deps['test/boost/expr_test'] = ['test/boost/expr_test.cc'] + scylla_core
deps['test/boost/rate_limiter_test'] = ['test/boost/rate_limiter_test.cc', 'db/rate_limiter.cc']
deps['test/boost/exceptions_optimized_test'] = ['test/boost/exceptions_optimized_test.cc', 'utils/exceptions.cc']
deps['test/boost/exceptions_fallback_test'] = ['test/boost/exceptions_fallback_test.cc', 'utils/exceptions.cc']
@@ -1309,7 +1310,6 @@ deps['test/boost/schema_loader_test'] += ['tools/schema_loader.cc']
deps['test/boost/rust_test'] += rusts
deps['test/raft/replication_test'] = ['test/raft/replication_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
deps['test/raft/raft_server_test'] = ['test/raft/raft_server_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
deps['test/raft/randomized_nemesis_test'] = ['test/raft/randomized_nemesis_test.cc', 'direct_failure_detector/failure_detector.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
deps['test/raft/failure_detector_test'] = ['test/raft/failure_detector_test.cc', 'direct_failure_detector/failure_detector.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
deps['test/raft/many_test'] = ['test/raft/many_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
@@ -1409,8 +1409,7 @@ if flag_supported(flag='-Wstack-usage=4096', compiler=args.cxx):
for mode in modes:
modes[mode]['cxxflags'] += f' -Wstack-usage={modes[mode]["stack-usage-threshold"]} -Wno-error=stack-usage='
if not has_wasmtime:
has_wasmtime = os.path.isfile('/usr/lib64/libwasmtime.a') and os.path.isdir('/usr/local/include/wasmtime')
has_wasmtime = os.path.isfile('/usr/lib64/libwasmtime.a') and os.path.isdir('/usr/local/include/wasmtime')
if has_wasmtime:
if platform.machine() == 'aarch64':
@@ -1540,8 +1539,7 @@ if args.artifacts:
else:
build_artifacts = all_artifacts
date_stamp = f"--date-stamp {args.date_stamp}" if args.date_stamp else ""
status = subprocess.call(f"./SCYLLA-VERSION-GEN {date_stamp}", shell=True)
status = subprocess.call("./SCYLLA-VERSION-GEN")
if status != 0:
print('Version file generation failed')
sys.exit(1)
@@ -1556,8 +1554,7 @@ scylla_product = file.read().strip()
arch = platform.machine()
for m, mode_config in modes.items():
mode_config['cxxflags'] += f" -DSCYLLA_BUILD_MODE={m}"
cxxflags = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""
cxxflags = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\" -DSCYLLA_BUILD_MODE=\"\\\"" + m + "\\\"\""
mode_config["per_src_extra_cxxflags"]["release.cc"] = cxxflags
if mode_config["can_have_debug_info"]:
mode_config['cxxflags'] += ' ' + dbgflag
@@ -1742,8 +1739,6 @@ libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-latomic', '-l
])
if has_wasmtime:
print("Found wasmtime dependency, linking with libwasmtime")
if use_wasmtime_as_library:
libs += " -lwasmtime"
if not args.staticboost:
args.user_cflags += ' -DBOOST_TEST_DYN_LINK'
@@ -1762,7 +1757,7 @@ if any(filter(thrift_version.startswith, thrift_boost_versions)):
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config(pkg, '--cflags')
libs += ' ' + pkg_config(pkg, '--libs')
args.user_cflags += ' -isystem abseil'
args.user_cflags += ' -Iabseil'
user_cflags = args.user_cflags + ' -fvisibility=hidden'
user_ldflags = args.user_ldflags + ' -fvisibility=hidden'
if args.staticcxx:
@@ -1825,14 +1820,8 @@ with open(buildfile, 'w') as f:
rule copy
command = cp --reflink=auto $in $out
description = COPY $out
rule strip
command = scripts/strip.sh $in
rule package
command = scripts/create-relocatable-package.py --mode $mode $out
rule stripped_package
command = scripts/create-relocatable-package.py --stripped --mode $mode $out
rule debuginfo_package
command = dist/debuginfo/scripts/create-relocatable-package.py --mode $mode $out
rule rpmbuild
command = reloc/build_rpm.sh --reloc-pkg $in --builddir $out
rule debbuild
@@ -1931,7 +1920,7 @@ with open(buildfile, 'w') as f:
for src in srcs
if src.endswith('.cc')]
objs.append('$builddir/../utils/arch/powerpc/crc32-vpmsum/crc32.S')
if has_wasmtime and not use_wasmtime_as_library:
if has_wasmtime:
objs.append('/usr/lib64/libwasmtime.a')
has_thrift = False
for dep in deps[binary]:
@@ -1977,8 +1966,6 @@ with open(buildfile, 'w') as f:
f.write('build $builddir/{}/{}: {}.{} {} | {}\n'.format(mode, binary, regular_link_rule, mode, str.join(' ', objs), seastar_dep))
if has_thrift:
f.write(' libs = {} {} $seastar_libs_{} $libs\n'.format(thrift_libs, maybe_static(args.staticboost, '-lboost_system'), mode))
f.write(f'build $builddir/{mode}/{binary}.stripped: strip $builddir/{mode}/{binary}\n')
f.write(f'build $builddir/{mode}/{binary}.debug: phony $builddir/{mode}/{binary}.stripped\n')
for src in srcs:
if src.endswith('.cc'):
obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')
@@ -2087,7 +2074,8 @@ with open(buildfile, 'w') as f:
f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
if cc.endswith('Parser.cpp'):
# Unoptimized parsers end up using huge amounts of stack space and overflowing their stack
flags = '-O1'
flags = '-O1' if modes[mode]['optimization-level'] in ['0', 'g', 's'] else ''
if has_sanitize_address_use_after_scope:
flags += ' -fno-sanitize-address-use-after-scope'
f.write(' obj_cxxflags = %s\n' % flags)
@@ -2113,36 +2101,22 @@ with open(buildfile, 'w') as f:
f.write(' target = iotune\n'.format(**locals()))
f.write(textwrap.dedent('''\
build $builddir/{mode}/iotune: copy $builddir/{mode}/seastar/apps/iotune/iotune
build $builddir/{mode}/iotune.stripped: strip $builddir/{mode}/iotune
build $builddir/{mode}/iotune.debug: phony $builddir/{mode}/iotune.stripped
''').format(**locals()))
if args.dist_only:
include_scylla_and_iotune = ''
include_scylla_and_iotune_stripped = ''
include_scylla_and_iotune_debug = ''
else:
include_scylla_and_iotune = f'$builddir/{mode}/scylla $builddir/{mode}/iotune'
include_scylla_and_iotune_stripped = f'$builddir/{mode}/scylla.stripped $builddir/{mode}/iotune.stripped'
include_scylla_and_iotune_debug = f'$builddir/{mode}/scylla.debug $builddir/{mode}/iotune.debug'
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-unstripped-{scylla_version}-{scylla_release}.{arch}.tar.gz: package {include_scylla_and_iotune} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter | always\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz: stripped_package {include_scylla_and_iotune_stripped} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter.stripped | always\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-debuginfo-{scylla_version}-{scylla_release}.{arch}.tar.gz: debuginfo_package {include_scylla_and_iotune_debug} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter.debug | always\n'.format(**locals()))
include_scylla_and_iotune = f'$builddir/{mode}/scylla $builddir/{mode}/iotune' if not args.dist_only else ''
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz: package {include_scylla_and_iotune} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter | always\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz: copy $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-{arch}-package.tar.gz: copy $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write(f'build $builddir/dist/{mode}/redhat: rpmbuild $builddir/{mode}/dist/tar/{scylla_product}-unstripped-{scylla_version}-{scylla_release}.{arch}.tar.gz\n')
f.write(f'build $builddir/dist/{mode}/redhat: rpmbuild $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz\n')
f.write(f' mode = {mode}\n')
f.write(f'build $builddir/dist/{mode}/debian: debbuild $builddir/{mode}/dist/tar/{scylla_product}-unstripped-{scylla_version}-{scylla_release}.{arch}.tar.gz\n')
f.write(f'build $builddir/dist/{mode}/debian: debbuild $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz\n')
f.write(f' mode = {mode}\n')
f.write(f'build dist-server-{mode}: phony $builddir/dist/{mode}/redhat $builddir/dist/{mode}/debian dist-server-compat-{mode} dist-server-compat-arch-{mode}\n')
f.write(f'build dist-server-compat-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz\n')
f.write(f'build dist-server-compat-arch-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-{arch}-package.tar.gz\n')
f.write(f'build dist-server-debuginfo-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-debuginfo-{scylla_version}-{scylla_release}.{arch}.tar.gz\n')
f.write(f'build dist-jmx-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-jmx-{scylla_version}-{scylla_release}.noarch.tar.gz dist-jmx-rpm dist-jmx-deb dist-jmx-compat\n')
f.write(f'build dist-tools-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-tools-{scylla_version}-{scylla_release}.noarch.tar.gz dist-tools-rpm dist-tools-deb dist-tools-compat\n')
f.write(f'build dist-python3-{mode}: phony dist-python3-tar dist-python3-rpm dist-python3-deb dist-python3-compat dist-python3-compat-arch\n')
@@ -2186,10 +2160,9 @@ with open(buildfile, 'w') as f:
build dist-server-deb: phony {' '.join(['$builddir/dist/{mode}/debian'.format(mode=mode) for mode in build_modes])}
build dist-server-rpm: phony {' '.join(['$builddir/dist/{mode}/redhat'.format(mode=mode) for mode in build_modes])}
build dist-server-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz'.format(mode=mode, scylla_product=scylla_product, arch=arch, scylla_version=scylla_version, scylla_release=scylla_release) for mode in default_modes])}
build dist-server-debuginfo: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-debuginfo-{scylla_version}-{scylla_release}.{arch}.tar.gz'.format(mode=mode, scylla_product=scylla_product, arch=arch, scylla_version=scylla_version, scylla_release=scylla_release) for mode in default_modes])}
build dist-server-compat: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz'.format(mode=mode, scylla_product=scylla_product, arch=arch) for mode in default_modes])}
build dist-server-compat-arch: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-{arch}-package.tar.gz'.format(mode=mode, scylla_product=scylla_product, arch=arch) for mode in default_modes])}
build dist-server: phony dist-server-tar dist-server-debuginfo dist-server-compat dist-server-compat-arch dist-server-rpm dist-server-deb
build dist-server: phony dist-server-tar dist-server-compat dist-server-compat-arch dist-server-rpm dist-server-deb
rule build-submodule-reloc
command = cd $reloc_dir && ./reloc/build_reloc.sh --version $$(<../../build/SCYLLA-PRODUCT-FILE)-$$(sed 's/-/~/' <../../build/SCYLLA-VERSION-FILE)-$$(<../../build/SCYLLA-RELEASE-FILE) --nodeps $args
@@ -2259,7 +2232,7 @@ with open(buildfile, 'w') as f:
build $builddir/{mode}/dist/tar/{scylla_product}-jmx-{scylla_version}-{scylla_release}.noarch.tar.gz: copy tools/jmx/build/{scylla_product}-jmx-{scylla_version}-{scylla_release}.noarch.tar.gz
build $builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz: copy tools/jmx/build/{scylla_product}-jmx-{scylla_version}-{scylla_release}.noarch.tar.gz
build {mode}-dist: phony dist-server-{mode} dist-server-debuginfo-{mode} dist-python3-{mode} dist-tools-{mode} dist-jmx-{mode} dist-unified-{mode}
build {mode}-dist: phony dist-server-{mode} dist-python3-{mode} dist-tools-{mode} dist-jmx-{mode} dist-unified-{mode}
build dist-{mode}: phony {mode}-dist
build dist-check-{mode}: dist-check
mode = {mode}
@@ -2302,9 +2275,7 @@ with open(buildfile, 'w') as f:
build $builddir/debian/debian: debian_files_gen | always
rule extract_node_exporter
command = tar -C build -xvpf {node_exporter_filename} --no-same-owner && rm -rfv build/node_exporter && mv -v build/{node_exporter_dirname} build/node_exporter
build $builddir/node_exporter/node_exporter: extract_node_exporter | always
build $builddir/node_exporter/node_exporter.stripped: strip $builddir/node_exporter/node_exporter
build $builddir/node_exporter/node_exporter.debug: phony $builddir/node_exporter/node_exporter.stripped
build $builddir/node_exporter: extract_node_exporter | always
rule print_help
command = ./scripts/build-help.sh
build help: print_help | always

View File

@@ -12,6 +12,10 @@
#include <boost/range/algorithm/sort.hpp>
std::ostream& operator<<(std::ostream& os, const counter_id& id) {
return os << id.to_uuid();
}
std::ostream& operator<<(std::ostream& os, counter_shard_view csv) {
return os << "{global_shard id: " << csv.id() << " value: " << csv.value()
<< " clock: " << csv.logical_clock() << "}";
@@ -170,11 +174,9 @@ std::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, ato
}
void transform_counter_updates_to_shards(mutation& m, const mutation* current_state, uint64_t clock_offset, locator::host_id local_host_id) {
void transform_counter_updates_to_shards(mutation& m, const mutation* current_state, uint64_t clock_offset, utils::UUID local_id) {
// FIXME: allow current_state to be frozen_mutation
utils::UUID local_id = local_host_id.uuid();
auto transform_new_row_to_shards = [&s = *m.schema(), clock_offset, local_id] (column_kind kind, auto& cells) {
cells.for_each_cell([&] (column_id id, atomic_cell_or_collection& ac_o_c) {
auto& cdef = s.column_at(kind, id);

View File

@@ -14,12 +14,51 @@
#include "atomic_cell.hh"
#include "types.hh"
#include "locator/host_id.hh"
class mutation;
class atomic_cell_or_collection;
using counter_id = utils::tagged_uuid<struct counter_id_tag>;
class counter_id {
int64_t _least_significant;
int64_t _most_significant;
public:
static_assert(std::is_same<decltype(std::declval<utils::UUID>().get_least_significant_bits()), int64_t>::value
&& std::is_same<decltype(std::declval<utils::UUID>().get_most_significant_bits()), int64_t>::value,
"utils::UUID is expected to work with two signed 64-bit integers");
counter_id() = default;
explicit counter_id(utils::UUID uuid) noexcept
: _least_significant(uuid.get_least_significant_bits())
, _most_significant(uuid.get_most_significant_bits())
{ }
utils::UUID to_uuid() const {
return utils::UUID(_most_significant, _least_significant);
}
bool operator<(const counter_id& other) const {
return to_uuid() < other.to_uuid();
}
bool operator>(const counter_id& other) const {
return other.to_uuid() < to_uuid();
}
bool operator==(const counter_id& other) const {
return to_uuid() == other.to_uuid();
}
bool operator!=(const counter_id& other) const {
return !(*this == other);
}
public:
// For tests.
static counter_id generate_random() {
return counter_id(utils::make_random_uuid());
}
};
static_assert(
std::is_standard_layout_v<counter_id> && std::is_trivial_v<counter_id>,
"counter_id should be a POD type");
std::ostream& operator<<(std::ostream& os, const counter_id& id);
template<mutable_view is_mutable>
class basic_counter_shard_view {
@@ -367,13 +406,13 @@ struct counter_cell_mutable_view : basic_counter_cell_view<mutable_view::yes> {
// Transforms mutation dst from counter updates to counter shards using state
// stored in current_state.
// If current_state is present it has to be in the same schema as dst.
void transform_counter_updates_to_shards(mutation& dst, const mutation* current_state, uint64_t clock_offset, locator::host_id local_id);
void transform_counter_updates_to_shards(mutation& dst, const mutation* current_state, uint64_t clock_offset, utils::UUID local_id);
template<>
struct appending_hash<counter_shard_view> {
template<typename Hasher>
void operator()(Hasher& h, const counter_shard_view& cshard) const {
::feed_hash(h, cshard.id());
::feed_hash(h, cshard.id().to_uuid());
::feed_hash(h, cshard.value());
::feed_hash(h, cshard.logical_clock());
}

View File

@@ -44,7 +44,7 @@ options {
#include "cql3/statements/drop_aggregate_statement.hh"
#include "cql3/statements/drop_service_level_statement.hh"
#include "cql3/statements/detach_service_level_statement.hh"
#include "cql3/statements/raw/truncate_statement.hh"
#include "cql3/statements/truncate_statement.hh"
#include "cql3/statements/raw/update_statement.hh"
#include "cql3/statements/raw/insert_statement.hh"
#include "cql3/statements/raw/delete_statement.hh"
@@ -371,8 +371,7 @@ useStatement returns [std::unique_ptr<raw::use_statement> stmt]
* SELECT [JSON] <expression>
* FROM <CF>
* WHERE KEY = "key1" AND COL > 1 AND COL < 100
* LIMIT <NUMBER>
* [USING TIMEOUT <duration>];
* LIMIT <NUMBER>;
*/
selectStatement returns [std::unique_ptr<raw::select_statement> expr]
@init {
@@ -399,7 +398,7 @@ selectStatement returns [std::unique_ptr<raw::select_statement> expr]
( K_LIMIT rows=intValue { limit = rows; } )?
( K_ALLOW K_FILTERING { allow_filtering = true; } )?
( K_BYPASS K_CACHE { bypass_cache = true; })?
( usingTimeoutClause[attrs] )?
( usingClause[attrs] )?
{
auto params = make_lw_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering, statement_subtype, bypass_cache);
$expr = std::make_unique<raw::select_statement>(std::move(cf), std::move(params),
@@ -519,19 +518,6 @@ usingClauseObjective[std::unique_ptr<cql3::attributes::raw>& attrs]
| K_TIMEOUT to=term { attrs->timeout = to; }
;
usingTimestampTimeoutClause[std::unique_ptr<cql3::attributes::raw>& attrs]
: K_USING usingTimestampTimeoutClauseObjective[attrs] ( K_AND usingTimestampTimeoutClauseObjective[attrs] )*
;
usingTimestampTimeoutClauseObjective[std::unique_ptr<cql3::attributes::raw>& attrs]
: K_TIMESTAMP ts=intValue { attrs->timestamp = ts; }
| K_TIMEOUT to=term { attrs->timeout = to; }
;
usingTimeoutClause[std::unique_ptr<cql3::attributes::raw>& attrs]
: K_USING K_TIMEOUT to=term { attrs->timeout = to; }
;
/**
* UPDATE <CF>
* USING TIMESTAMP <long>
@@ -566,7 +552,7 @@ updateConditions returns [conditions_type conditions]
/**
* DELETE name1, name2
* FROM <CF>
* [USING (TIMESTAMP <long> | TIMEOUT <duration>) [AND ...]]
* USING TIMESTAMP <long>
* WHERE KEY = keyname
[IF (EXISTS | name = value, ...)];
*/
@@ -578,7 +564,7 @@ deleteStatement returns [std::unique_ptr<raw::delete_statement> expr]
}
: K_DELETE ( dels=deleteSelection { column_deletions = std::move(dels); } )?
K_FROM cf=columnFamilyName
( usingTimestampTimeoutClause[attrs] )?
( usingClause[attrs] )?
K_WHERE wclause=whereClause
( K_IF ( K_EXISTS { if_exists = true; } | conditions=updateConditions ))?
{
@@ -871,8 +857,7 @@ indexIdent returns [::shared_ptr<index_target::raw> id]
@init {
std::vector<::shared_ptr<cql3::column_identifier::raw>> columns;
}
: c=cident { $id = index_target::raw::regular_values_of(c); }
| K_VALUES '(' c=cident ')' { $id = index_target::raw::collection_values_of(c); }
: c=cident { $id = index_target::raw::values_of(c); }
| K_KEYS '(' c=cident ')' { $id = index_target::raw::keys_of(c); }
| K_ENTRIES '(' c=cident ')' { $id = index_target::raw::keys_and_values_of(c); }
| K_FULL '(' c=cident ')' { $id = index_target::raw::full_collection(c); }
@@ -1072,18 +1057,10 @@ dropIndexStatement returns [std::unique_ptr<drop_index_statement> expr]
;
/**
* TRUNCATE [TABLE] <CF>
* [USING TIMEOUT <duration>];
* TRUNCATE <CF>;
*/
truncateStatement returns [std::unique_ptr<raw::truncate_statement> stmt]
@init {
auto attrs = std::make_unique<cql3::attributes::raw>();
}
: K_TRUNCATE (K_COLUMNFAMILY)? cf=columnFamilyName
( usingTimeoutClause[attrs] )?
{
$stmt = std::make_unique<raw::truncate_statement>(std::move(cf), std::move(attrs));
}
truncateStatement returns [std::unique_ptr<truncate_statement> stmt]
: K_TRUNCATE (K_COLUMNFAMILY)? cf=columnFamilyName { $stmt = std::make_unique<truncate_statement>(cf); }
;
/**
@@ -1419,7 +1396,7 @@ serviceLevelOrRoleName returns [sstring name]
std::transform($name.begin(), $name.end(), $name.begin(), ::tolower); }
| t=STRING_LITERAL { $name = sstring($t.text); }
| t=QUOTED_NAME { $name = sstring($t.text); }
| k=unreserved_keyword { $name = sstring($t.text);
| k=unreserved_keyword { $name = k;
std::transform($name.begin(), $name.end(), $name.begin(), ::tolower);}
| QMARK {add_recognition_error("Bind variables cannot be used for service levels or role names");}
;

View File

@@ -73,14 +73,6 @@ public:
std::move(in_values), nullptr, expr::oper_t::IN);
}
const std::optional<expr::expression>& get_value() const {
return _value;
}
expr::oper_t get_operation() const {
return _op;
}
class raw final {
private:
std::optional<cql3::expr::expression> _value;

View File

@@ -10,7 +10,6 @@
#include "cql3/constants.hh"
#include "cql3/cql3_type.hh"
#include "cql3/statements/strongly_consistent_modification_statement.hh"
namespace cql3 {
void constants::deleter::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
@@ -23,9 +22,4 @@ void constants::deleter::execute(mutation& m, const clustering_key_prefix& prefi
m.set_cell(prefix, column, params.make_dead_cell());
}
}
void
constants::setter::prepare_for_broadcast_tables(statements::broadcast_tables::prepared_update& query) const {
query.new_value = *_e;
}
}

View File

@@ -11,17 +11,12 @@
#pragma once
#include "cql3/abstract_marker.hh"
#include "cql3/query_options.hh"
#include "cql3/update_parameters.hh"
#include "cql3/operation.hh"
#include "cql3/values.hh"
#include "mutation.hh"
#include <seastar/core/shared_ptr.hh>
namespace service::broadcast_tables {
class update_query;
}
namespace cql3 {
/**
@@ -49,8 +44,6 @@ public:
m.set_cell(prefix, column, params.make_cell(*column.type, value));
}
}
virtual void prepare_for_broadcast_tables(statements::broadcast_tables::prepared_update& query) const override;
};
struct adder final : operation {

View File

@@ -335,8 +335,7 @@ bool limits(const tuple_constructor& columns_tuple, const oper_t op, const expre
/// True iff collection (list, set, or map) contains value.
bool contains(const data_value& collection, const raw_value_view& value) {
if (!value) {
// CONTAINS NULL should evaluate to NULL/false
return false;
return true; // Compatible with old code, which skips null terms in value comparisons.
}
auto col_type = static_pointer_cast<const collection_type_impl>(collection.type());
auto&& element_type = col_type->is_set() ? col_type->name_comparator() : col_type->value_comparator();
@@ -374,8 +373,7 @@ bool contains(const column_value& col, const raw_value_view& value, const evalua
/// True iff a column is a map containing \p key.
bool contains_key(const column_value& col, cql3::raw_value_view key, const evaluation_inputs& inputs) {
if (!key) {
// CONTAINS_KEY NULL should evaluate to NULL/false
return false;
return true; // Compatible with old code, which skips null terms in key comparisons.
}
auto type = col.col->type;
const auto collection = get_value(col, inputs);
@@ -460,9 +458,8 @@ bool like(const column_value& cv, const raw_value_view& pattern, const evaluatio
/// True iff the column value is in the set defined by rhs.
bool is_one_of(const expression& col, const expression& rhs, const evaluation_inputs& inputs) {
const cql3::raw_value in_list = evaluate(rhs, inputs);
if (in_list.is_null()) {
return false;
}
statements::request_validations::check_false(
in_list.is_null(), "Invalid null value for column {}", col);
return boost::algorithm::any_of(get_list_elements(in_list), [&] (const managed_bytes_opt& b) {
return equal(col, b, inputs);
@@ -698,9 +695,7 @@ value_list get_IN_values(
if (in_list.is_unset_value()) {
throw exceptions::invalid_request_exception(format("Invalid unset value for column {}", column_name));
}
if (in_list.is_null()) {
return value_list();
}
statements::request_validations::check_false(in_list.is_null(), "Invalid null value for column {}", column_name);
utils::chunked_vector<managed_bytes> list_elems = get_list_elements(in_list);
return to_sorted_vector(std::move(list_elems) | non_null | deref, comparator);
}
@@ -746,25 +741,25 @@ expression make_conjunction(expression a, expression b) {
return conjunction{std::move(children)};
}
static
void
do_factorize(std::vector<expression>& factors, expression e) {
if (auto c = expr::as_if<conjunction>(&e)) {
for (auto&& element : c->children) {
do_factorize(factors, std::move(element));
}
} else {
factors.push_back(std::move(e));
}
}
std::vector<expression>
boolean_factors(expression e) {
std::vector<expression> ret;
for_each_boolean_factor(e, [&](const expression& boolean_factor) {
ret.push_back(boolean_factor);
});
do_factorize(ret, std::move(e));
return ret;
}
void for_each_boolean_factor(const expression& e, const noncopyable_function<void (const expression&)>& for_each_func) {
if (auto conj = as_if<conjunction>(&e)) {
for (const expression& child : conj->children) {
for_each_boolean_factor(child, for_each_func);
}
} else {
for_each_func(e);
}
}
template<typename T>
nonwrapping_range<std::remove_cvref_t<T>> to_range(oper_t op, T&& val) {
using U = std::remove_cvref_t<T>;
@@ -823,37 +818,11 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
: to_range(oper.op, std::move(*val));
} else if (oper.op == oper_t::IN) {
return get_IN_values(oper.rhs, options, type->as_less_comparator(), cdef->name_as_text());
} else if (oper.op == oper_t::CONTAINS || oper.op == oper_t::CONTAINS_KEY) {
managed_bytes_opt val = evaluate(oper.rhs, options).to_managed_bytes_opt();
if (!val) {
return empty_value_set; // All NULL comparisons fail; no column values match.
}
return value_set(value_list{*val});
}
throw std::logic_error(format("possible_lhs_values: unhandled operator {}", oper));
},
[&] (const subscript& s) -> value_set {
const column_value& col = get_subscripted_column(s);
if (!cdef || cdef != col.col) {
return unbounded_value_set;
}
managed_bytes_opt sval = evaluate(s.sub, options).to_managed_bytes_opt();
if (!sval) {
return empty_value_set; // NULL can't be a map key
}
if (oper.op == oper_t::EQ) {
managed_bytes_opt rval = evaluate(oper.rhs, options).to_managed_bytes_opt();
if (!rval) {
return empty_value_set; // All NULL comparisons fail; no column values match.
}
managed_bytes_opt elements[] = {sval, rval};
managed_bytes val = tuple_type_impl::build_value_fragmented(elements);
return value_set(value_list{val});
}
throw std::logic_error(format("possible_lhs_values: unhandled operator {}", oper));
on_internal_error(expr_logger, "possible_lhs_values: subscripts are not supported as the LHS of a binary expression");
},
[&] (const tuple_constructor& tuple) -> value_set {
if (!cdef) {
@@ -1010,26 +979,11 @@ nonwrapping_range<managed_bytes> to_range(const value_set& s) {
}, s);
}
namespace {
constexpr inline secondary_index::index::supports_expression_v operator&&(secondary_index::index::supports_expression_v v1, secondary_index::index::supports_expression_v v2) {
using namespace secondary_index;
auto True = index::supports_expression_v::from_bool(true);
return v1 == True && v2 == True ? True : index::supports_expression_v::from_bool(false);
}
secondary_index::index::supports_expression_v is_supported_by_helper(const expression& expr, const secondary_index::index& idx) {
using ret_t = secondary_index::index::supports_expression_v;
using namespace secondary_index;
bool is_supported_by(const expression& expr, const secondary_index::index& idx) {
using std::placeholders::_1;
return expr::visit(overloaded_functor{
[&] (const conjunction& conj) -> ret_t {
if (conj.children.empty()) {
return index::supports_expression_v::from_bool(true);
}
auto init = is_supported_by_helper(conj.children[0], idx);
return std::accumulate(std::begin(conj.children) + 1, std::end(conj.children), init,
[&] (ret_t acc, const expression& child) -> ret_t {
return acc && is_supported_by_helper(child, idx);
});
[&] (const conjunction& conj) {
return boost::algorithm::all_of(conj.children, std::bind(is_supported_by, _1, idx));
},
[&] (const binary_operator& oper) {
return expr::visit(overloaded_functor{
@@ -1043,64 +997,57 @@ secondary_index::index::supports_expression_v is_supported_by_helper(const expre
}
}
// We don't use index table for multi-column restrictions, as it cannot avoid filtering.
return index::supports_expression_v::from_bool(false);
return false;
},
[&] (const token&) { return index::supports_expression_v::from_bool(false); },
[&] (const subscript& s) -> ret_t {
const column_value& col = get_subscripted_column(s);
return idx.supports_subscript_expression(*col.col, oper.op);
[&] (const token&) { return false; },
[&] (const subscript& s) -> bool {
// We don't support indexes on map entries yet.
return false;
},
[&] (const binary_operator&) -> ret_t {
[&] (const binary_operator&) -> bool {
on_internal_error(expr_logger, "is_supported_by: nested binary operators are not supported");
},
[&] (const conjunction&) -> ret_t {
[&] (const conjunction&) -> bool {
on_internal_error(expr_logger, "is_supported_by: conjunctions are not supported as the LHS of a binary expression");
},
[] (const constant&) -> ret_t {
[] (const constant&) -> bool {
on_internal_error(expr_logger, "is_supported_by: constants are not supported as the LHS of a binary expression");
},
[] (const unresolved_identifier&) -> ret_t {
[] (const unresolved_identifier&) -> bool {
on_internal_error(expr_logger, "is_supported_by: an unresolved identifier is not supported as the LHS of a binary expression");
},
[&] (const column_mutation_attribute&) -> ret_t {
[&] (const column_mutation_attribute&) -> bool {
on_internal_error(expr_logger, "is_supported_by: writetime/ttl are not supported as the LHS of a binary expression");
},
[&] (const function_call&) -> ret_t {
[&] (const function_call&) -> bool {
on_internal_error(expr_logger, "is_supported_by: function calls are not supported as the LHS of a binary expression");
},
[&] (const cast&) -> ret_t {
[&] (const cast&) -> bool {
on_internal_error(expr_logger, "is_supported_by: typecasts are not supported as the LHS of a binary expression");
},
[&] (const field_selection&) -> ret_t {
[&] (const field_selection&) -> bool {
on_internal_error(expr_logger, "is_supported_by: field selections are not supported as the LHS of a binary expression");
},
[&] (const null&) -> ret_t {
[&] (const null&) -> bool {
on_internal_error(expr_logger, "is_supported_by: nulls are not supported as the LHS of a binary expression");
},
[&] (const bind_variable&) -> ret_t {
[&] (const bind_variable&) -> bool {
on_internal_error(expr_logger, "is_supported_by: bind variables are not supported as the LHS of a binary expression");
},
[&] (const untyped_constant&) -> ret_t {
[&] (const untyped_constant&) -> bool {
on_internal_error(expr_logger, "is_supported_by: untyped constants are not supported as the LHS of a binary expression");
},
[&] (const collection_constructor&) -> ret_t {
[&] (const collection_constructor&) -> bool {
on_internal_error(expr_logger, "is_supported_by: collection constructors are not supported as the LHS of a binary expression");
},
[&] (const usertype_constructor&) -> ret_t {
[&] (const usertype_constructor&) -> bool {
on_internal_error(expr_logger, "is_supported_by: user type constructors are not supported as the LHS of a binary expression");
},
}, oper.lhs);
},
[] (const auto& default_case) { return index::supports_expression_v::from_bool(false); }
[] (const auto& default_case) { return false; }
}, expr);
}
}
bool is_supported_by(const expression& expr, const secondary_index::index& idx) {
auto s = is_supported_by_helper(expr, idx);
return s != secondary_index::index::supports_expression_v::from_bool(false);
}
bool has_supporting_index(
const expression& expr,
@@ -1510,7 +1457,7 @@ expression search_and_replace(const expression& e,
};
},
[&] (const binary_operator& oper) -> expression {
return binary_operator(recurse(oper.lhs), oper.op, recurse(oper.rhs));
return binary_operator(recurse(oper.lhs), oper.op, recurse(oper.rhs), oper.order);
},
[&] (const column_mutation_attribute& cma) -> expression {
return column_mutation_attribute{cma.kind, recurse(cma.column)};

View File

@@ -666,9 +666,6 @@ inline auto find_clustering_order(const expression& e) {
/// empty conjunction).
std::vector<expression> boolean_factors(expression e);
/// Run the given function for each element in the top level conjunction.
void for_each_boolean_factor(const expression& e, const noncopyable_function<void (const expression&)>& for_each_func);
/// True iff binary_operator involves a collection.
extern bool is_on_collection(const binary_operator&);

View File

@@ -18,7 +18,6 @@
#include "types/list.hh"
#include "types/set.hh"
#include "types/user.hh"
#include "service/broadcast_tables/experimental/lang.hh"
namespace cql3 {
@@ -298,8 +297,8 @@ operation::set_counter_value_from_tuple_list::prepare(data_dictionary::database
if (id <= last) {
throw marshal_exception(
format("invalid counter id order, {} <= {}",
id.uuid().to_sstring(),
last.uuid().to_sstring()));
id.to_uuid().to_sstring(),
last.to_uuid().to_sstring()));
}
last = id;
// TODO: maybe allow more than global values to propagate,
@@ -358,10 +357,4 @@ operation::element_deletion::prepare(data_dictionary::database db, const sstring
abort();
}
void
operation::prepare_for_broadcast_tables(statements::broadcast_tables::prepared_update&) const {
// FIXME: implement for every type of `operation`.
throw service::broadcast_tables::unsupported_operation_error{};
}
}

View File

@@ -11,7 +11,6 @@
#pragma once
#include <seastar/core/shared_ptr.hh>
#include "cql3/cql3_type.hh"
#include "exceptions/exceptions.hh"
#include "data_dictionary/data_dictionary.hh"
#include "update_parameters.hh"
@@ -22,10 +21,6 @@
namespace cql3 {
namespace statements::broadcast_tables {
struct prepared_update;
}
class update_parameters;
/**
@@ -90,8 +85,6 @@ public:
* Execute the operation.
*/
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) = 0;
virtual void prepare_for_broadcast_tables(statements::broadcast_tables::prepared_update&) const;
/**
* A parsed raw UPDATE operation.

View File

@@ -55,7 +55,7 @@ public:
}
};
query_processor::query_processor(service::storage_proxy& proxy, service::forward_service& forwarder, data_dictionary::database db, service::migration_notifier& mn, service::migration_manager& mm, query_processor::memory_config mcfg, cql_config& cql_cfg, utils::loading_cache_config auth_prep_cache_cfg, service::raft_group0_client& group0_client)
query_processor::query_processor(service::storage_proxy& proxy, service::forward_service& forwarder, data_dictionary::database db, service::migration_notifier& mn, service::migration_manager& mm, query_processor::memory_config mcfg, cql_config& cql_cfg, utils::loading_cache_config auth_prep_cache_cfg)
: _migration_subscriber{std::make_unique<migration_subscriber>(this)}
, _proxy(proxy)
, _forwarder(forwarder)
@@ -64,7 +64,6 @@ query_processor::query_processor(service::storage_proxy& proxy, service::forward
, _mm(mm)
, _mcfg(mcfg)
, _cql_config(cql_cfg)
, _group0_client(group0_client)
, _internal_state(new internal_state())
, _prepared_cache(prep_cache_log, _mcfg.prepared_statment_cache_size)
, _authorized_prepared_cache(std::move(auth_prep_cache_cfg), authorized_prepared_statements_cache_log)

View File

@@ -23,7 +23,6 @@
#include "exceptions/exceptions.hh"
#include "lang/wasm_instance_cache.hh"
#include "service/migration_listener.hh"
#include "service/raft/raft_group0_client.hh"
#include "transport/messages/result_message.hh"
#include "service/qos/service_level_controller.hh"
#include "service/client_state.hh"
@@ -99,7 +98,6 @@ private:
service::migration_manager& _mm;
memory_config _mcfg;
const cql_config& _cql_config;
service::raft_group0_client& _group0_client;
struct stats {
uint64_t prepare_invocations = 0;
@@ -140,7 +138,7 @@ public:
static std::unique_ptr<statements::raw::parsed_statement> parse_statement(const std::string_view& query);
static std::vector<std::unique_ptr<statements::raw::parsed_statement>> parse_statements(std::string_view queries);
query_processor(service::storage_proxy& proxy, service::forward_service& forwarder, data_dictionary::database db, service::migration_notifier& mn, service::migration_manager& mm, memory_config mcfg, cql_config& cql_cfg, utils::loading_cache_config auth_prep_cache_cfg, service::raft_group0_client& group0_client);
query_processor(service::storage_proxy& proxy, service::forward_service& forwarder, data_dictionary::database db, service::migration_notifier& mn, service::migration_manager& mm, memory_config mcfg, cql_config& cql_cfg, utils::loading_cache_config auth_prep_cache_cfg);
~query_processor();
@@ -204,10 +202,6 @@ public:
return _prepared_cache.find(key);
}
service::raft_group0_client& get_group0_client() {
return _group0_client;
}
inline
future<::shared_ptr<cql_transport::messages::result_message>>
execute_prepared(

View File

@@ -1169,11 +1169,8 @@ struct multi_column_range_accumulator {
intersect_all(to_range(binop.op, clustering_key_prefix(std::move(values))));
} else if (binop.op == oper_t::IN) {
const cql3::raw_value tup = expr::evaluate(binop.rhs, options);
utils::chunked_vector<std::vector<managed_bytes_opt>> tuple_elems;
if (tup.is_value()) {
tuple_elems = expr::get_list_of_tuples_elements(tup, *type_of(binop.rhs));
}
process_in_values(std::move(tuple_elems));
statements::request_validations::check_false(tup.is_null(), "Invalid null value for IN restriction");
process_in_values(expr::get_list_of_tuples_elements(tup, *type_of(binop.rhs)));
} else {
on_internal_error(rlogger, format("multi_column_range_accumulator: unexpected atom {}", binop));
}
@@ -1882,12 +1879,17 @@ std::vector<query::clustering_range> statement_restrictions::get_global_index_cl
std::transform(pk_value.cbegin(), pk_value.cend(), pkv_linearized.begin(),
[] (const managed_bytes& mb) { return to_bytes(mb); });
auto& token_column = idx_tbl_schema.clustering_column_at(0);
bytes token_bytes = token_column.get_computation().compute_value(*_schema, pkv_linearized);
bytes_opt token_bytes = token_column.get_computation().compute_value(
*_schema, pkv_linearized, clustering_row(clustering_key_prefix::make_empty()));
if (!token_bytes) {
on_internal_error(rlogger,
format("null value for token column in indexing table {}",
token_column.name_as_text()));
}
// WARNING: We must not yield to another fiber from here until the function's end, lest this RHS be
// overwritten.
const_cast<expr::expression&>(expr::as<binary_operator>((*_idx_tbl_ck_prefix)[0]).rhs) =
expr::constant(raw_value::make_value(token_bytes), token_column.type);
expr::constant(raw_value::make_value(*token_bytes), token_column.type);
// Multi column restrictions are not added to _idx_tbl_ck_prefix, they are handled later by filtering.
return get_single_column_clustering_bounds(options, idx_tbl_schema, *_idx_tbl_ck_prefix);

View File

@@ -81,7 +81,7 @@ public:
virtual sstring assignment_testable_source_context() const override {
auto&& name = _type->field_name(_field);
auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
auto sname = std::string_view(reinterpret_cast<const char*>(name.data()), name.size());
return format("{}.{}", _selected, sname);
}

View File

@@ -435,7 +435,7 @@ bool result_set_builder::restrictions_filter::do_filter(const selection& selecti
clustering_key_prefix ckey = clustering_key_prefix::from_exploded(clustering_key);
// FIXME: push to upper layer so it happens once per row
auto static_and_regular_columns = expr::get_non_pk_values(selection, static_row, row);
return expr::is_satisfied_by(
bool multi_col_clustering_satisfied = expr::is_satisfied_by(
clustering_columns_restrictions,
expr::evaluation_inputs{
.partition_key = &partition_key,
@@ -444,6 +444,9 @@ bool result_set_builder::restrictions_filter::do_filter(const selection& selecti
.selection = &selection,
.options = &_options,
});
if (!multi_col_clustering_satisfied) {
return false;
}
}
auto static_row_iterator = static_row.iterator();

View File

@@ -409,8 +409,7 @@ cql3::statements::alter_table_statement::prepare(data_dictionary::database db, c
future<::shared_ptr<messages::result_message>>
alter_table_statement::execute(query_processor& qp, service::query_state& state, const query_options& options) const {
auto s = validation::validate_column_family(qp.db(), keyspace(), column_family());
std::optional<sstring> warning = check_restricted_table_properties(qp, s, keyspace(), column_family(), *_properties);
std::optional<sstring> warning = check_restricted_table_properties(qp, keyspace(), column_family(), *_properties);
return schema_altering_statement::execute(qp, state, options).then([this, warning = std::move(warning)] (::shared_ptr<messages::result_message> msg) {
if (warning) {
msg->add_warning(*warning);

View File

@@ -261,6 +261,10 @@ future<shared_ptr<cql_transport::messages::result_message>> batch_statement::do_
if (options.getSerialConsistency() == null)
throw new InvalidRequestException("Invalid empty serial consistency level");
#endif
for (size_t i = 0; i < _statements.size(); ++i) {
_statements[i].statement->validate_primary_key_restrictions(options.for_statement(i));
}
if (_has_conditions) {
++_stats.cas_batches;
_stats.statements_in_cas_batches += _statements.size();

View File

@@ -91,8 +91,7 @@ lw_shared_ptr<query::read_command> cas_request::read_command(query_processor& qp
options.set(query::partition_slice::option::always_return_static_content);
query::partition_slice ps(std::move(ranges), *_schema, columns_to_read, options);
ps.set_partition_row_limit(max_rows);
return make_lw_shared<query::read_command>(_schema->id(), _schema->version(), std::move(ps), qp.proxy().get_max_result_size(ps),
query::tombstone_limit(qp.proxy().get_tombstone_limit()));
return make_lw_shared<query::read_command>(_schema->id(), _schema->version(), std::move(ps), qp.proxy().get_max_result_size(ps));
}
bool cas_request::applies_to() const {

View File

@@ -200,10 +200,10 @@ int32_t cf_prop_defs::get_paxos_grace_seconds() const {
return get_int(KW_PAXOSGRACESECONDS, DEFAULT_GC_GRACE_SECONDS);
}
std::optional<table_id> cf_prop_defs::get_id() const {
std::optional<utils::UUID> cf_prop_defs::get_id() const {
auto id = get_simple(KW_ID);
if (id) {
return std::make_optional<table_id>(utils::UUID(*id));
return utils::UUID(*id);
}
return std::nullopt;

View File

@@ -100,7 +100,7 @@ public:
int32_t get_default_time_to_live() const;
int32_t get_gc_grace_seconds() const;
int32_t get_paxos_grace_seconds() const;
std::optional<table_id> get_id() const;
std::optional<utils::UUID> get_id() const;
bool get_synchronous_updates_flag() const;
void apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions) const;

View File

@@ -66,7 +66,7 @@ shared_ptr<db::functions::function> create_aggregate_statement::create(query_pro
auto dummy_ident = ::make_shared<column_identifier>("", true);
auto column_spec = make_lw_shared<column_specification>("", "", dummy_ident, state_type);
auto initcond_term = expr::evaluate(prepare_expression(_ival.value(), db, _name.keyspace, nullptr, {column_spec}), query_options::DEFAULT);
initcond = std::move(initcond_term).to_bytes_opt();
initcond = std::move(initcond_term).to_bytes();
}
return ::make_shared<functions::user_aggregate>(_name, initcond, std::move(state_func), std::move(reduce_func), std::move(final_func));

View File

@@ -10,7 +10,6 @@
#include <seastar/core/coroutine.hh>
#include "create_index_statement.hh"
#include "exceptions/exceptions.hh"
#include "prepared_statement.hh"
#include "validation.hh"
#include "service/storage_proxy.hh"
@@ -29,7 +28,6 @@
#include <boost/range/adaptor/transformed.hpp>
#include <boost/algorithm/string/join.hpp>
#include <stdexcept>
namespace cql3 {
@@ -53,16 +51,6 @@ create_index_statement::check_access(query_processor& qp, const service::client_
return state.has_column_family_access(qp.db(), keyspace(), column_family(), auth::permission::ALTER);
}
static sstring target_type_name(index_target::target_type type) {
switch (type) {
case index_target::target_type::keys: return "keys()";
case index_target::target_type::keys_and_values: return "entries()";
case index_target::target_type::collection_values: return "values()";
case index_target::target_type::regular_values: return "value";
default: throw std::invalid_argument("should not reach");
}
}
void
create_index_statement::validate(query_processor& qp, const service::client_state& state) const
{
@@ -101,7 +89,6 @@ std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_e
validate_targets_for_multi_column_index(targets);
}
const bool is_local_index = targets.size() > 0 && std::holds_alternative<index_target::multiple_columns>(targets.front()->value);
for (auto& target : targets) {
auto* ident = std::get_if<::shared_ptr<column_identifier>>(&target->value);
if (!ident) {
@@ -111,7 +98,7 @@ std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_e
if (cd == nullptr) {
throw exceptions::invalid_request_exception(
format("No column definition found for column {}", target->column_name()));
format("No column definition found for column {}", target->as_string()));
}
//NOTICE(sarna): Should be lifted after resolving issue #2963
@@ -140,25 +127,14 @@ std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_e
if (cd->kind == column_kind::partition_key && cd->is_on_all_components()) {
throw exceptions::invalid_request_exception(
format("Cannot create secondary index on partition key column {}",
target->column_name()));
target->as_string()));
}
if (cd->type->is_multi_cell()) {
if (cd->type->is_collection()) {
if (!db.features().collection_indexing) {
throw exceptions::invalid_request_exception(
"Indexing of collection columns not supported by some older nodes in this cluster. Please upgrade them.");
}
if (is_local_index) {
throw exceptions::invalid_request_exception(
format("Local secondary index on collection column {} is not implemented yet.", target->column_name()));
}
validate_not_full_index(*target);
validate_for_collection(*target, *cd);
rewrite_target_for_collection(*target, *cd);
} else {
throw exceptions::invalid_request_exception(format("Cannot create secondary index on UDT column {}", cd->name_as_text()));
}
// NOTICE(sarna): should be lifted after #2962 (indexes on non-frozen collections) is implemented
// NOTICE(kbraun): don't forget about non-frozen user defined types
throw exceptions::invalid_request_exception(
format("Cannot create secondary index on non-frozen collection or UDT column {}", cd->name_as_text()));
} else if (cd->type->is_collection()) {
validate_for_frozen_collection(*target);
} else {
@@ -218,8 +194,8 @@ void create_index_statement::validate_for_frozen_collection(const index_target&
if (target.type != index_target::target_type::full) {
throw exceptions::invalid_request_exception(
format("Cannot create index on {} of frozen collection column {}",
target_type_name(target.type),
target.column_name()));
index_target::index_option(target.type),
target.as_string()));
}
}
@@ -230,70 +206,16 @@ void create_index_statement::validate_not_full_index(const index_target& target)
}
}
void create_index_statement::validate_for_collection(const index_target& target, const column_definition& cd) const
{
auto throw_exception = [&] {
const char* msg_format = "Cannot create secondary index on {} of non-frozen collection column {}";
throw exceptions::invalid_request_exception(format(msg_format, to_sstring(target.type), cd.name_as_text()));
};
switch (target.type) {
case index_target::target_type::full:
throw std::logic_error("invalid target type(full) in validate_for_collection");
case index_target::target_type::regular_values:
break;
case index_target::target_type::collection_values:
break;
case index_target::target_type::keys:
[[fallthrough]];
case index_target::target_type::keys_and_values:
if (!cd.type->is_map()) {
const char* msg_format = "Cannot create secondary index on {} of column {} with non-map type";
throw exceptions::invalid_request_exception(format(msg_format, to_sstring(target.type), cd.name_as_text()));
}
break;
}
}
void create_index_statement::rewrite_target_for_collection(index_target& target, const column_definition& cd) const
{
// In Cassandra, `CREATE INDEX ON table(collection)` works the same as `CREATE INDEX ON table(VALUES(collection))`,
// and index on VALUES(collection) indexes values, if the collection was a map or a list, but it indexes the keys, if it
// was a set. Rewrite it to clean the mess.
switch (target.type) {
case index_target::target_type::full:
throw std::logic_error("invalid target type(full) in rewrite_target_for_collection");
case index_target::target_type::keys:
// If it was keys, then it must have been a map.
break;
case index_target::target_type::keys_and_values:
// If it was entries, then it must have been a map.
break;
case index_target::target_type::regular_values:
// Regular values for collections means the same as collection values.
[[fallthrough]];
case index_target::target_type::collection_values:
if (cd.type->is_map() || cd.type->is_list()) {
target.type = index_target::target_type::collection_values;
} else if (cd.type->is_set()) {
target.type = index_target::target_type::keys;
} else {
throw std::logic_error(format("rewrite_target_for_collection: unknown collection type {}", cd.type->cql3_type_name()));
}
break;
}
}
void create_index_statement::validate_is_values_index_if_target_column_not_collection(
const column_definition* cd, const index_target& target) const
{
if (!cd->type->is_collection()
&& target.type != index_target::target_type::regular_values) {
&& target.type != index_target::target_type::values) {
throw exceptions::invalid_request_exception(
format("Cannot create index on {} of column {}; only non-frozen collections support {} indexes",
target_type_name(target.type),
target.column_name(),
target_type_name(target.type)));
index_target::index_option(target.type),
target.as_string(),
index_target::index_option(target.type)));
}
}
@@ -304,7 +226,7 @@ void create_index_statement::validate_target_column_is_map_if_index_involves_key
if (!is_map) {
throw exceptions::invalid_request_exception(
format("Cannot create index on {} of column {} with non-map type",
target_type_name(target.type), target.column_name()));
index_target::index_option(target.type), target.as_string()));
}
}
}
@@ -318,10 +240,10 @@ void create_index_statement::validate_targets_for_multi_column_index(std::vector
}
std::unordered_set<sstring> columns;
for (auto& target : targets) {
if (columns.contains(target->column_name())) {
throw exceptions::invalid_request_exception(format("Duplicate column {} in index target list", target->column_name()));
if (columns.contains(target->as_string())) {
throw exceptions::invalid_request_exception(format("Duplicate column {} in index target list", target->as_string()));
}
columns.emplace(target->column_name());
columns.emplace(target->as_string());
}
}
@@ -335,9 +257,9 @@ schema_ptr create_index_statement::build_index_schema(query_processor& qp) const
if (accepted_name.empty()) {
std::optional<sstring> index_name_root;
if (targets.size() == 1) {
index_name_root = targets[0]->column_name();
index_name_root = targets[0]->as_string();
} else if ((targets.size() == 2 && std::holds_alternative<index_target::multiple_columns>(targets.front()->value))) {
index_name_root = targets[1]->column_name();
index_name_root = targets[1]->as_string();
}
accepted_name = db.get_available_index_name(keyspace(), column_family(), index_name_root);
}

View File

@@ -55,8 +55,6 @@ private:
void validate_for_local_index(const schema& schema) const;
void validate_for_frozen_collection(const index_target& target) const;
void validate_not_full_index(const index_target& target) const;
void validate_for_collection(const index_target& target, const column_definition&) const;
void rewrite_target_for_collection(index_target& target, const column_definition&) const;
void validate_is_values_index_if_target_column_not_collection(const column_definition* cd,
const index_target& target) const;
void validate_target_column_is_map_if_index_involves_keys(bool is_map, const index_target& target) const;

View File

@@ -30,7 +30,6 @@
#include "service/migration_manager.hh"
#include "service/storage_proxy.hh"
#include "db/config.hh"
#include "compaction/time_window_compaction_strategy.hh"
namespace cql3 {
@@ -42,7 +41,7 @@ create_table_statement::create_table_statement(cf_name name,
::shared_ptr<cf_prop_defs> properties,
bool if_not_exists,
column_set_type static_columns,
const std::optional<table_id>& id)
const std::optional<utils::UUID>& id)
: schema_altering_statement{name}
, _use_compact_storage(false)
, _static_columns{static_columns}
@@ -427,7 +426,6 @@ void create_table_statement::raw_statement::add_column_alias(::shared_ptr<column
// in the table's options are done elsewhere.
std::optional<sstring> check_restricted_table_properties(
query_processor& qp,
std::optional<schema_ptr> schema,
const sstring& keyspace, const sstring& table,
const cf_prop_defs& cfprops)
{
@@ -436,19 +434,6 @@ std::optional<sstring> check_restricted_table_properties(
// function before cfprops.validate() (there, validate() is only called
// in prepare_schema_mutations(), in the middle of execute).
auto strategy = cfprops.get_compaction_strategy_class();
sstables::compaction_strategy_type current_strategy = sstables::compaction_strategy_type::null;
gc_clock::duration current_ttl = gc_clock::duration::zero();
// cfprops doesn't return any of the table attributes unless the attribute
// has been specified in the CQL statement. If a schema is defined, then
// this was an ALTER TABLE statement.
if (schema) {
current_strategy = (*schema)->compaction_strategy();
current_ttl = (*schema)->default_time_to_live();
}
// Evaluate whether the strategy to evaluate was explicitly passed
auto cs = (strategy) ? strategy : current_strategy;
if (strategy && *strategy == sstables::compaction_strategy_type::date_tiered) {
switch(qp.db().get_config().restrict_dtcs()) {
case db::tri_mode_restriction_t::mode::TRUE:
@@ -467,52 +452,12 @@ std::optional<sstring> check_restricted_table_properties(
break;
}
}
if (cs == sstables::compaction_strategy_type::time_window) {
std::map<sstring, sstring> options = (strategy) ? cfprops.get_compaction_type_options() : (*schema)->compaction_strategy_options();
sstables::time_window_compaction_strategy_options twcs_options(options);
long ttl = (cfprops.has_property(cf_prop_defs::KW_DEFAULT_TIME_TO_LIVE)) ? cfprops.get_default_time_to_live() : current_ttl.count();
auto max_windows = qp.db().get_config().twcs_max_window_count();
// It may happen that an user tries to update an unrelated table property. Allow the request through.
if (!cfprops.has_property(cf_prop_defs::KW_DEFAULT_TIME_TO_LIVE) && !strategy) {
return std::nullopt;
}
if (ttl > 0) {
// Ideally we should not need the window_size check below. However, given #2336 it may happen that some incorrectly
// table created with a window_size=0 may exist, which would cause a division by zero (eg: in an ALTER statement).
// Given that, an invalid window size is treated as 1 minute, which is the smaller "supported" window size for TWCS.
auto window_size = twcs_options.get_sstable_window_size() > std::chrono::seconds::zero() ? twcs_options.get_sstable_window_size() : std::chrono::seconds(60);
auto window_count = std::chrono::seconds(ttl) / window_size;
if (max_windows > 0 && window_count > max_windows) {
throw exceptions::configuration_exception(fmt::format("The setting of default_time_to_live={} and compaction window={}(s) "
"can lead to {} windows, which is larger than the allowed number of windows specified "
"by the twcs_max_window_count ({}) parameter. Note that default_time_to_live=0 is also "
"highly discouraged.", ttl, twcs_options.get_sstable_window_size().count(), window_count, max_windows));
}
} else {
switch (qp.db().get_config().restrict_twcs_without_default_ttl()) {
case db::tri_mode_restriction_t::mode::TRUE:
throw exceptions::configuration_exception(
"TimeWindowCompactionStrategy tables without a strict default_time_to_live setting "
"are forbidden. You may override this restriction by setting restrict_twcs_without_default_ttl "
"configuration option to false.");
case db::tri_mode_restriction_t::mode::WARN:
return format("TimeWindowCompactionStrategy tables without a default_time_to_live "
"may potentially introduce too many windows. Ensure that insert statements specify a "
"TTL (via USING TTL), when inserting data to this table. The restrict_twcs_without_default_ttl "
"configuration option can be changed to silence this warning or make it into an error");
case db::tri_mode_restriction_t::mode::FALSE:
break;
}
}
}
return std::nullopt;
}
future<::shared_ptr<messages::result_message>>
create_table_statement::execute(query_processor& qp, service::query_state& state, const query_options& options) const {
std::optional<sstring> warning = check_restricted_table_properties(qp, std::nullopt, keyspace(), column_family(), *_properties);
std::optional<sstring> warning = check_restricted_table_properties(qp, keyspace(), column_family(), *_properties);
return schema_altering_statement::execute(qp, state, options).then([this, warning = std::move(warning)] (::shared_ptr<messages::result_message> msg) {
if (warning) {
msg->add_warning(*warning);

View File

@@ -60,13 +60,13 @@ class create_table_statement : public schema_altering_statement {
column_set_type _static_columns;
const ::shared_ptr<cf_prop_defs> _properties;
const bool _if_not_exists;
std::optional<table_id> _id;
std::optional<utils::UUID> _id;
public:
create_table_statement(cf_name name,
::shared_ptr<cf_prop_defs> properties,
bool if_not_exists,
column_set_type static_columns,
const std::optional<table_id>& id);
const std::optional<utils::UUID>& id);
virtual future<> check_access(query_processor& qp, const service::client_state& state) const override;
@@ -130,7 +130,6 @@ public:
std::optional<sstring> check_restricted_table_properties(
query_processor& qp,
std::optional<schema_ptr> schema,
const sstring& keyspace, const sstring& table,
const cf_prop_defs& cfprops);

View File

@@ -99,7 +99,9 @@ delete_statement::delete_statement(cf_name name,
, _deletions(std::move(deletions))
, _where_clause(std::move(where_clause))
{
assert(!_attrs->time_to_live.has_value());
if (_attrs->time_to_live) {
throw exceptions::invalid_request_exception("TTL attribute is not allowed for deletes");
}
}
}

View File

@@ -8,7 +8,6 @@
* SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
*/
#include <regex>
#include <stdexcept>
#include "index_target.hh"
#include "index/secondary_index.hh"
@@ -22,11 +21,9 @@ using db::index::secondary_index;
const sstring index_target::target_option_name = "target";
const sstring index_target::custom_index_option_name = "class_name";
const std::regex index_target::target_regex("^(keys|entries|values|full)\\((.+)\\)$");
sstring index_target::column_name() const {
sstring index_target::as_string() const {
struct as_string_visitor {
const index_target* target;
sstring operator()(const std::vector<::shared_ptr<column_identifier>>& columns) const {
return "(" + boost::algorithm::join(columns | boost::adaptors::transformed(
[](const ::shared_ptr<cql3::column_identifier>& ident) -> sstring {
@@ -39,7 +36,7 @@ sstring index_target::column_name() const {
}
};
return std::visit(as_string_visitor {this}, value);
return std::visit(as_string_visitor(), value);
}
index_target::target_type index_target::from_sstring(const sstring& s)
@@ -48,66 +45,26 @@ index_target::target_type index_target::from_sstring(const sstring& s)
return index_target::target_type::keys;
} else if (s == "entries") {
return index_target::target_type::keys_and_values;
} else if (s == "regular_values") {
return index_target::target_type::regular_values;
} else if (s == "values") {
return index_target::target_type::collection_values;
return index_target::target_type::values;
} else if (s == "full") {
return index_target::target_type::full;
}
throw std::runtime_error(format("Unknown target type: {}", s));
}
index_target::target_type index_target::from_target_string(const sstring& target) {
std::cmatch match;
if (std::regex_match(target.data(), match, target_regex)) {
return index_target::from_sstring(match[1].str());
sstring index_target::index_option(target_type type) {
switch (type) {
case target_type::keys: return secondary_index::index_keys_option_name;
case target_type::keys_and_values: return secondary_index::index_entries_option_name;
case target_type::values: return secondary_index::index_values_option_name;
default: throw std::invalid_argument("should not reach");
}
return target_type::regular_values;
}
// A CQL column's name may contain any characters. If we use this string as-is
// inside a target string, it may confuse us when we later try to parse the
// resulting string (e.g., see issue #10707). We should therefore use the
// function escape_target_column() to "escape" the target column name, and use
// the reverse function unescape_target_column() to undo this.
// Cassandra uses for this escaping the CQL syntax of this column name
// (basically, column_identifier::as_cql_name()). This is an overkill,
// but since we already have such code, we might as well use it.
sstring index_target::escape_target_column(const cql3::column_identifier& col) {
return col.to_cql_string();
}
sstring index_target::unescape_target_column(std::string_view str) {
// We don't have a reverse version of util::maybe_quote(), so
// we need to open-code it here. Cassandra has this too - in
// index/TargetParser.java
if (str.size() >= 2 && str.starts_with('"') && str.ends_with('"')) {
str.remove_prefix(1);
str.remove_suffix(1);
// remove doubled quotes in the middle of the string, which to_cql_string()
// adds. This code is inefficient but rarely called so it's fine.
static const std::regex double_quote_re("\"\"");
return std::regex_replace(std::string(str), double_quote_re, "\"");
}
return sstring(str);
}
sstring index_target::column_name_from_target_string(const sstring& target) {
std::cmatch match;
if (std::regex_match(target.data(), match, target_regex)) {
return unescape_target_column(match[2].str());
}
return unescape_target_column(target);
}
::shared_ptr<index_target::raw>
index_target::raw::regular_values_of(::shared_ptr<column_identifier::raw> c) {
return ::make_shared<raw>(c, target_type::regular_values);
}
::shared_ptr<index_target::raw>
index_target::raw::collection_values_of(::shared_ptr<column_identifier::raw> c) {
return ::make_shared<raw>(c, target_type::collection_values);
index_target::raw::values_of(::shared_ptr<column_identifier::raw> c) {
return ::make_shared<raw>(c, target_type::values);
}
::shared_ptr<index_target::raw>
@@ -127,7 +84,7 @@ index_target::raw::full_collection(::shared_ptr<column_identifier::raw> c) {
::shared_ptr<index_target::raw>
index_target::raw::columns(std::vector<::shared_ptr<column_identifier::raw>> c) {
return ::make_shared<raw>(std::move(c), target_type::regular_values);
return ::make_shared<raw>(std::move(c), target_type::values);
}
::shared_ptr<index_target>
@@ -158,11 +115,10 @@ sstring to_sstring(index_target::target_type type)
switch (type) {
case index_target::target_type::keys: return "keys";
case index_target::target_type::keys_and_values: return "entries";
case index_target::target_type::regular_values: return "regular_values";
case index_target::target_type::collection_values: return "values";
case index_target::target_type::values: return "values";
case index_target::target_type::full: return "full";
}
throw std::runtime_error("to_sstring(index_target::target_type): should not reach");
return "";
}
}

View File

@@ -13,7 +13,6 @@
#include <seastar/core/shared_ptr.hh>
#include "cql3/column_identifier.hh"
#include <variant>
#include <regex>
namespace cql3 {
@@ -22,42 +21,26 @@ namespace statements {
struct index_target {
static const sstring target_option_name;
static const sstring custom_index_option_name;
static const std::regex target_regex;
enum class target_type {
regular_values, collection_values, keys, keys_and_values, full
};
using single_column = ::shared_ptr<column_identifier>;
using single_column =::shared_ptr<column_identifier>;
using multiple_columns = std::vector<::shared_ptr<column_identifier>>;
using value_type = std::variant<single_column, multiple_columns>;
enum class target_type {
values, keys, keys_and_values, full
};
const value_type value;
target_type type;
const target_type type;
index_target(::shared_ptr<column_identifier> c, target_type t) : value(c) , type(t) {}
index_target(std::vector<::shared_ptr<column_identifier>> c, target_type t) : value(std::move(c)), type(t) {}
sstring column_name() const;
sstring as_string() const;
static sstring index_option(target_type type);
static target_type from_column_definition(const column_definition& cd);
// Parses index_target::target_type from it's textual form.
// e.g. from_sstring("keys") == index_target::target_type::keys
static index_target::target_type from_sstring(const sstring& s);
// Parses index_target::target_type from index target string form.
// e.g. from_target_string("keys(some_column)") == index_target::target_type::keys
static index_target::target_type from_target_string(const sstring& s);
// Parses column name from index target string form
// e.g. column_name_from_target_string("keys(some_column)") == "some_column"
static sstring column_name_from_target_string(const sstring& s);
// A CQL column's name may contain any characters. If we use this string
// as-is inside a target string, it may confuse us when we later try to
// parse the resulting string (e.g., see issue #10707). We should
// therefore use the function escape_target_column() to "escape" the
// target column name, and the reverse function unescape_target_column().
static sstring escape_target_column(const cql3::column_identifier& col);
static sstring unescape_target_column(std::string_view str);
class raw {
public:
@@ -71,8 +54,7 @@ struct index_target {
raw(::shared_ptr<column_identifier::raw> c, target_type t) : value(c), type(t) {}
raw(std::vector<::shared_ptr<column_identifier::raw>> pk_columns, target_type t) : value(pk_columns), type(t) {}
static ::shared_ptr<raw> regular_values_of(::shared_ptr<column_identifier::raw> c);
static ::shared_ptr<raw> collection_values_of(::shared_ptr<column_identifier::raw> c);
static ::shared_ptr<raw> values_of(::shared_ptr<column_identifier::raw> c);
static ::shared_ptr<raw> keys_of(::shared_ptr<column_identifier::raw> c);
static ::shared_ptr<raw> keys_and_values_of(::shared_ptr<column_identifier::raw> c);
static ::shared_ptr<raw> full_collection(::shared_ptr<column_identifier::raw> c);

View File

@@ -8,10 +8,8 @@
* SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
*/
#include "cql3/cql_statement.hh"
#include "types/map.hh"
#include "cql3/statements/modification_statement.hh"
#include "cql3/statements/strongly_consistent_modification_statement.hh"
#include "cql3/statements/raw/modification_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "cql3/util.hh"
@@ -30,7 +28,6 @@
#include "cas_request.hh"
#include "cql3/query_processor.hh"
#include "service/storage_proxy.hh"
#include "service/broadcast_tables/experimental/lang.hh"
template<typename T = void>
using coordinator_result = exceptions::coordinator_result<T>;
@@ -112,9 +109,6 @@ future<> modification_statement::check_access(query_processor& qp, const service
future<std::vector<mutation>>
modification_statement::get_mutations(query_processor& qp, const query_options& options, db::timeout_clock::time_point timeout, bool local, int64_t now, service::query_state& qs) const {
if (_restrictions->range_or_slice_eq_null(options)) { // See #7852 and #9290.
throw exceptions::invalid_request_exception("Invalid null value in condition for a key column");
}
auto cl = options.get_consistency();
auto json_cache = maybe_prepare_json_cache(options);
auto keys = build_partition_keys(options, json_cache);
@@ -214,7 +208,7 @@ modification_statement::read_command(query_processor& qp, query::clustering_row_
}
query::partition_slice ps(std::move(ranges), *s, columns_to_read(), update_parameters::options);
const auto max_result_size = qp.proxy().get_max_result_size(ps);
return make_lw_shared<query::read_command>(s->id(), s->version(), std::move(ps), query::max_result_size(max_result_size), query::tombstone_limit::max);
return make_lw_shared<query::read_command>(s->id(), s->version(), std::move(ps), query::max_result_size(max_result_size));
}
std::vector<query::clustering_range>
@@ -253,6 +247,12 @@ modification_statement::execute_without_checking_exception_message(query_process
return modify_stage(this, seastar::ref(qp), seastar::ref(qs), seastar::cref(options));
}
void modification_statement::validate_primary_key_restrictions(const query_options& options) const {
if (_restrictions->range_or_slice_eq_null(options)) { // See #7852 and #9290.
throw exceptions::invalid_request_exception("Invalid null value in condition for a key column");
}
}
future<::shared_ptr<cql_transport::messages::result_message>>
modification_statement::do_execute(query_processor& qp, service::query_state& qs, const query_options& options) const {
if (has_conditions() && options.get_protocol_version() == 1) {
@@ -263,6 +263,8 @@ modification_statement::do_execute(query_processor& qp, service::query_state& qs
inc_cql_stats(qs.get_client_state().is_internal());
validate_primary_key_restrictions(options);
if (has_conditions()) {
return execute_with_condition(qp, qs, options);
}
@@ -442,30 +444,13 @@ modification_statement::process_where_clause(data_dictionary::database db, expr:
}
}
::shared_ptr<strongly_consistent_modification_statement>
modification_statement::prepare_for_broadcast_tables() const {
// FIXME: implement for every type of `modification_statement`.
throw service::broadcast_tables::unsupported_operation_error{};
}
namespace raw {
::shared_ptr<cql_statement_opt_metadata>
modification_statement::prepare_statement(data_dictionary::database db, prepare_context& ctx, cql_stats& stats) {
::shared_ptr<cql3::statements::modification_statement> statement = prepare(db, ctx, stats);
if (service::broadcast_tables::is_broadcast_table_statement(keyspace(), column_family())) {
return statement->prepare_for_broadcast_tables();
} else {
return statement;
}
}
std::unique_ptr<prepared_statement>
modification_statement::prepare(data_dictionary::database db, cql_stats& stats) {
schema_ptr schema = validation::validate_column_family(db, keyspace(), column_family());
auto meta = get_prepare_context();
auto statement = prepare_statement(db, meta, stats);
auto statement = prepare(db, meta, stats);
auto partition_key_bind_indices = meta.get_partition_key_bind_indexes(*schema);
return std::make_unique<prepared_statement>(std::move(statement), meta, std::move(partition_key_bind_indices));
}

View File

@@ -33,7 +33,6 @@ class operation;
namespace statements {
class strongly_consistent_modification_statement;
namespace raw { class modification_statement; }
@@ -70,11 +69,10 @@ public:
protected:
std::vector<::shared_ptr<operation>> _column_operations;
cql_stats& _stats;
private:
// Separating normal and static conditions makes things somewhat easier
std::vector<lw_shared_ptr<column_condition>> _regular_conditions;
std::vector<lw_shared_ptr<column_condition>> _static_conditions;
private:
const ks_selector _ks_sel;
// True if this statement has _if_exists or _if_not_exists or other
@@ -231,6 +229,8 @@ public:
// True if this statement needs to read only static column values to check if it can be applied.
bool has_only_static_column_conditions() const { return !_has_regular_column_conditions && _has_static_column_conditions; }
void validate_primary_key_restrictions(const query_options& options) const;
virtual future<::shared_ptr<cql_transport::messages::result_message>>
execute(query_processor& qp, service::query_state& qs, const query_options& options) const override;
@@ -258,9 +258,6 @@ public:
future<std::vector<mutation>> get_mutations(query_processor& qp, const query_options& options, db::timeout_clock::time_point timeout, bool local, int64_t now, service::query_state& qs) const;
virtual json_cache_opt maybe_prepare_json_cache(const query_options& options) const;
virtual ::shared_ptr<strongly_consistent_modification_statement> prepare_for_broadcast_tables() const;
protected:
/**
* If there are conditions on the statement, this is called after the where clause and conditions have been

View File

@@ -32,8 +32,7 @@ static future<> delete_ghost_rows(dht::partition_range_vector partition_ranges,
auto selection = cql3::selection::selection::for_columns(view, key_columns);
query::partition_slice partition_slice(std::move(clustering_bounds), {}, {}, selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(view->id(), view->version(), partition_slice, proxy.get_max_result_size(partition_slice),
query::tombstone_limit(proxy.get_tombstone_limit()));
auto command = ::make_lw_shared<query::read_command>(view->id(), view->version(), partition_slice, proxy.get_max_result_size(partition_slice));
tracing::trace(state.get_trace_state(), "Deleting ghost rows from partition ranges {}", partition_ranges);

View File

@@ -14,7 +14,6 @@
#include "cql3/column_identifier.hh"
#include "cql3/column_condition.hh"
#include "cql3/attributes.hh"
#include "cql3/cql_statement.hh"
#include <seastar/core/shared_ptr.hh>
@@ -39,11 +38,10 @@ private:
const bool _if_not_exists;
const bool _if_exists;
protected:
modification_statement(cf_name name, std::unique_ptr<attributes::raw> attrs, conditions_vector conditions = {}, bool if_not_exists = false, bool if_exists = false);
modification_statement(cf_name name, std::unique_ptr<attributes::raw> attrs, conditions_vector conditions, bool if_not_exists, bool if_exists);
public:
virtual std::unique_ptr<prepared_statement> prepare(data_dictionary::database db, cql_stats& stats) override;
::shared_ptr<cql_statement_opt_metadata> prepare_statement(data_dictionary::database db, prepare_context& ctx, cql_stats& stats);
::shared_ptr<cql3::statements::modification_statement> prepare(data_dictionary::database db, prepare_context& ctx, cql_stats& stats) const;
protected:
virtual ::shared_ptr<cql3::statements::modification_statement> prepare_internal(data_dictionary::database db, schema_ptr schema,

View File

@@ -1,41 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
*/
#pragma once
#include "cql3/statements/raw/cf_statement.hh"
#include "cql3/attributes.hh"
namespace cql3 {
namespace statements {
namespace raw {
class truncate_statement : public raw::cf_statement {
private:
std::unique_ptr<attributes::raw> _attrs;
public:
/**
* Creates a new truncate_statement from a column family name, and attributes.
*
* @param name column family being operated on
* @param attrs additional attributes for statement (timeout)
*/
truncate_statement(cf_name name, std::unique_ptr<attributes::raw> attrs);
virtual std::unique_ptr<prepared_statement> prepare(data_dictionary::database db, cql_stats& stats) override;
};
} // namespace raw
} // namespace statements
} // namespace cql3

View File

@@ -10,13 +10,10 @@
#include "cql3/statements/select_statement.hh"
#include "cql3/expr/expression.hh"
#include "cql3/statements/index_target.hh"
#include "cql3/statements/raw/select_statement.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/prune_materialized_view_statement.hh"
#include "cql3/statements/strongly_consistent_select_statement.hh"
#include "service/broadcast_tables/experimental/lang.hh"
#include "transport/messages/result_message.hh"
#include "cql3/functions/as_json_function.hh"
#include "cql3/selection/selection.hh"
@@ -356,12 +353,11 @@ select_statement::do_execute(query_processor& qp,
_schema->version(),
std::move(slice),
max_result_size,
query::tombstone_limit(qp.proxy().get_tombstone_limit()),
query::row_limit(limit),
query::partition_limit(query::max_partitions),
now,
tracing::make_trace_info(state.get_trace_state()),
query_id::create_null_id(),
utils::UUID(),
query::is_first_page::no,
options.get_timestamp(state));
command->allow_limit = db::allow_per_partition_rate_limit::yes;
@@ -528,12 +524,11 @@ indexed_table_select_statement::prepare_command_for_base_query(query_processor&
_schema->version(),
std::move(slice),
qp.proxy().get_max_result_size(slice),
query::tombstone_limit(qp.proxy().get_tombstone_limit()),
query::row_limit(get_limit(options)),
query::partition_limit(query::max_partitions),
now,
tracing::make_trace_info(state.get_trace_state()),
query_id::create_null_id(),
utils::UUID(),
query::is_first_page::no,
options.get_timestamp(state));
cmd->allow_limit = db::allow_per_partition_rate_limit::yes;
@@ -569,14 +564,12 @@ indexed_table_select_statement::do_execute_base_query(
const bool is_paged = bool(paging_state);
base_query_state query_state{cmd->get_row_limit() * queried_ranges_count, std::move(ranges_to_vnodes)};
{
return do_with(std::move(query_state), [this, is_paged, &qp, &state, &options, cmd, timeout] (auto&& query_state) {
auto& merger = query_state.merger;
auto& ranges_to_vnodes = query_state.ranges_to_vnodes;
auto& concurrency = query_state.concurrency;
auto& previous_result_size = query_state.previous_result_size;
query::short_read is_short_read = query::short_read::no;
bool page_limit_reached = false;
while (!is_short_read && !ranges_to_vnodes.empty() && !page_limit_reached) {
return utils::result_repeat([this, is_paged, &previous_result_size, &ranges_to_vnodes, &merger, &qp, &state, &options, &concurrency, cmd, timeout]() {
// Starting with 1 range, we check if the result was a short read, and if not,
// we continue exponentially, asking for 2x more ranges than before
dht::partition_range_vector prange = ranges_to_vnodes(concurrency);
@@ -613,19 +606,19 @@ indexed_table_select_statement::do_execute_base_query(
if (previous_result_size < query::result_memory_limiter::maximum_result_size && concurrency < max_base_table_query_concurrency) {
concurrency *= 2;
}
coordinator_result<service::storage_proxy::coordinator_query_result> rqr = co_await qp.proxy().query_result(_schema, command, std::move(prange), options.get_consistency(), {timeout, state.get_permit(), state.get_client_state(), state.get_trace_state()});
if (!rqr.has_value()) {
co_return std::move(rqr).as_failure();
}
auto& qr = rqr.value();
is_short_read = qr.query_result->is_short_read();
// Results larger than 1MB should be shipped to the client immediately
page_limit_reached = is_paged && qr.query_result->buf().size() >= query::result_memory_limiter::maximum_result_size;
previous_result_size = qr.query_result->buf().size();
merger(std::move(qr.query_result));
}
co_return coordinator_result<value_type>(value_type(merger.get(), std::move(cmd)));
}
return qp.proxy().query_result(_schema, command, std::move(prange), options.get_consistency(), {timeout, state.get_permit(), state.get_client_state(), state.get_trace_state()})
.then(utils::result_wrap([is_paged, &previous_result_size, &ranges_to_vnodes, &merger] (service::storage_proxy::coordinator_query_result qr) -> coordinator_result<stop_iteration> {
auto is_short_read = qr.query_result->is_short_read();
// Results larger than 1MB should be shipped to the client immediately
const bool page_limit_reached = is_paged && qr.query_result->buf().size() >= query::result_memory_limiter::maximum_result_size;
previous_result_size = qr.query_result->buf().size();
merger(std::move(qr.query_result));
return stop_iteration(is_short_read || ranges_to_vnodes.empty() || page_limit_reached);
}));
}).then(utils::result_wrap([&merger, cmd]() mutable {
return make_ready_future<coordinator_result<value_type>>(value_type(merger.get(), std::move(cmd)));
}));
});
}
future<shared_ptr<cql_transport::messages::result_message>>
@@ -970,15 +963,18 @@ static void append_base_key_to_index_ck(std::vector<managed_bytes_view>& explode
bytes indexed_table_select_statement::compute_idx_token(const partition_key& key) const {
const column_definition& cdef = *_view_schema->clustering_key_columns().begin();
clustering_row empty_row(clustering_key_prefix::make_empty());
bytes computed_value;
bytes_opt computed_value;
if (!cdef.is_computed()) {
// FIXME(pgrabowski): this legacy code is here for backward compatibility and should be removed
// once "computed_columns feature" is supported by every node
computed_value = legacy_token_column_computation().compute_value(*_schema, key);
computed_value = legacy_token_column_computation().compute_value(*_schema, key, empty_row);
} else {
computed_value = cdef.get_computation().compute_value(*_schema, key);
computed_value = cdef.get_computation().compute_value(*_schema, key, empty_row);
}
return computed_value;
if (!computed_value) {
throw std::logic_error(format("No value computed for idx_token column {}", cdef.name()));
}
return *computed_value;
}
lw_shared_ptr<const service::pager::paging_state> indexed_table_select_statement::generate_view_paging_state_from_base_query_results(lw_shared_ptr<const service::pager::paging_state> paging_state,
@@ -1241,12 +1237,11 @@ indexed_table_select_statement::read_posting_list(query_processor& qp,
_view_schema->version(),
partition_slice,
qp.proxy().get_max_result_size(partition_slice),
query::tombstone_limit(qp.proxy().get_tombstone_limit()),
query::row_limit(limit),
query::partition_limit(query::max_partitions),
now,
tracing::make_trace_info(state.get_trace_state()),
query_id::create_null_id(),
utils::UUID(),
query::is_first_page::no,
options.get_timestamp(state));
@@ -1304,12 +1299,6 @@ indexed_table_select_statement::find_index_partition_ranges(query_processor& qp,
// to avoid outputting the same partition key twice, but luckily in
// the sorted order, these will be adjacent.
std::optional<dht::decorated_key> last_dk;
if (options.get_paging_state()) {
auto paging_state = options.get_paging_state();
auto base_pk = generate_base_key_from_index_pk<partition_key>(paging_state->get_partition_key(),
paging_state->get_clustering_key(), *_schema, *_view_schema);
last_dk = dht::decorate_key(*_schema, base_pk);
}
for (size_t i = 0; i < rs.size(); i++) {
const auto& row = rs.at(i);
std::vector<bytes> pk_columns;
@@ -1346,24 +1335,6 @@ indexed_table_select_statement::find_index_clustering_rows(query_processor& qp,
auto rs = cql3::untyped_result_set(rows);
std::vector<primary_key> primary_keys;
primary_keys.reserve(rs.size());
std::optional<std::reference_wrapper<primary_key>> last_primary_key;
// Set last_primary_key if indexing map values and not in the first
// query page. See comment below why last_primary_key is needed for
// indexing map values. We have a test for this with paging:
// test_secondary_index.py::test_index_map_values_paging.
std::optional<primary_key> page_start_primary_key;
if (_index.target_type() == cql3::statements::index_target::target_type::collection_values &&
options.get_paging_state()) {
auto paging_state = options.get_paging_state();
auto base_pk = generate_base_key_from_index_pk<partition_key>(paging_state->get_partition_key(),
paging_state->get_clustering_key(), *_schema, *_view_schema);
auto base_dk = dht::decorate_key(*_schema, base_pk);
auto base_ck = generate_base_key_from_index_pk<clustering_key>(paging_state->get_partition_key(),
paging_state->get_clustering_key(), *_schema, *_view_schema);
page_start_primary_key = primary_key{std::move(base_dk), std::move(base_ck)};
last_primary_key = *page_start_primary_key;
}
for (size_t i = 0; i < rs.size(); i++) {
const auto& row = rs.at(i);
auto pk_columns = _schema->partition_key_columns() | boost::adaptors::transformed([&] (auto& cdef) {
@@ -1375,24 +1346,7 @@ indexed_table_select_statement::find_index_clustering_rows(query_processor& qp,
return row.get_blob(cdef.name_as_text());
});
auto ck = clustering_key::from_range(ck_columns);
if (_index.target_type() == cql3::statements::index_target::target_type::collection_values) {
// The index on collection values is special in a way, as its' clustering key contains not only the
// base primary key, but also a column that holds the keys of the cells in the collection, which
// allows to distinguish cells with different keys but the same value.
// This has an unwanted consequence, that it's possible to receive two identical base table primary
// keys. Thankfully, they are guaranteed to occur consequently.
if (last_primary_key) {
const auto& [last_dk, last_ck] = last_primary_key->get();
if (last_dk.equal(*_schema, dk) && last_ck.equal(*_schema, ck)) {
continue;
}
}
}
primary_keys.emplace_back(primary_key{std::move(dk), std::move(ck)});
// Last use of this reference will be before next .emplace_back to the vector.
last_primary_key = primary_keys.back();
}
auto paging_state = rows->rs().get_metadata().paging_state();
return make_ready_future<coordinator_result<value_type>>(value_type(std::move(primary_keys), std::move(paging_state)));
@@ -1527,12 +1481,11 @@ parallelized_select_statement::do_execute(
_schema->version(),
std::move(slice),
qp.proxy().get_max_result_size(slice),
query::tombstone_limit(qp.proxy().get_tombstone_limit()),
query::row_limit(query::max_rows),
query::partition_limit(query::max_partitions),
now,
tracing::make_trace_info(state.get_trace_state()),
query_id::create_null_id(),
utils::UUID(),
query::is_first_page::no,
options.get_timestamp(state)
);
@@ -1546,7 +1499,7 @@ parallelized_select_statement::do_execute(
command->slice.options.set<query::partition_slice::option::allow_short_read>();
auto timeout_duration = get_timeout(state.get_client_state(), options);
auto timeout = db::timeout_clock::now() + timeout_duration;
auto timeout = lowres_system_clock::now() + timeout_duration;
auto reductions = _selection->get_reductions();
query::forward_request req = {
@@ -1573,8 +1526,12 @@ parallelized_select_statement::do_execute(
namespace raw {
static void validate_attrs(const cql3::attributes::raw& attrs) {
assert(!attrs.timestamp.has_value());
assert(!attrs.time_to_live.has_value());
if (attrs.timestamp) {
throw exceptions::invalid_request_exception("Specifying TIMESTAMP is not legal for SELECT statement");
}
if (attrs.time_to_live) {
throw exceptions::invalid_request_exception("Specifying TTL is not legal for SELECT statement");
}
}
select_statement::select_statement(cf_name cf_name,
@@ -1614,11 +1571,16 @@ void select_statement::maybe_jsonize_select_clause(data_dictionary::database db,
std::vector<data_type> selector_types;
std::vector<const column_definition*> defs;
selector_names.reserve(_select_clause.size());
selector_types.reserve(_select_clause.size());
auto selectables = selection::raw_selector::to_selectables(_select_clause, *schema);
selection::selector_factories factories(selection::raw_selector::to_selectables(_select_clause, *schema), db, schema, defs);
auto selectors = factories.new_instances();
for (size_t i = 0; i < selectors.size(); ++i) {
selector_names.push_back(selectables[i]->to_string());
if (_select_clause[i]->alias) {
selector_names.push_back(_select_clause[i]->alias->to_string());
} else {
selector_names.push_back(selectables[i]->to_string());
}
selector_types.push_back(selectors[i]->get_type());
}
@@ -1732,20 +1694,6 @@ std::unique_ptr<prepared_statement> select_statement::prepare(data_dictionary::d
stats,
std::move(prepared_attrs)
);
} else if (service::broadcast_tables::is_broadcast_table_statement(keyspace(), column_family())) {
stmt = ::make_shared<cql3::statements::strongly_consistent_select_statement>(
schema,
ctx.bound_variables_size(),
_parameters,
std::move(selection),
std::move(restrictions),
std::move(group_by_cell_indices),
is_reversed_,
std::move(ordering_comparator),
prepare_limit(db, ctx, _limit),
prepare_limit(db, ctx, _per_partition_limit),
stats,
std::move(prepared_attrs));
} else {
stmt = ::make_shared<cql3::statements::primary_key_select_statement>(
schema,
@@ -2183,10 +2131,7 @@ std::unique_ptr<cql3::statements::raw::select_statement> build_select_statement(
out << join(", ", cols);
}
// Note that cf_name may need to be quoted, just like column names above.
out << " FROM " << util::maybe_quote(sstring(cf_name));
if (!where_clause.empty()) {
out << " WHERE " << where_clause << " ALLOW FILTERING";
}
out << " FROM " << util::maybe_quote(sstring(cf_name)) << " WHERE " << where_clause << " ALLOW FILTERING";
return do_with_parser(out.str(), std::mem_fn(&cql3_parser::CqlParser::selectStatement));
}

View File

@@ -1,132 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
*/
#include "cql3/statements/strongly_consistent_modification_statement.hh"
#include <boost/range/adaptors.hpp>
#include <optional>
#include <seastar/core/future.hh>
#include <seastar/util/variant_utils.hh>
#include "bytes.hh"
#include "cql3/attributes.hh"
#include "cql3/expr/expression.hh"
#include "cql3/operation.hh"
#include "cql3/query_processor.hh"
#include "cql3/values.hh"
#include "timeout_config.hh"
#include "service/broadcast_tables/experimental/lang.hh"
namespace cql3 {
static logging::logger logger("strongly_consistent_modification_statement");
namespace statements {
strongly_consistent_modification_statement::strongly_consistent_modification_statement(
uint32_t bound_terms,
schema_ptr schema,
broadcast_tables::prepared_update query)
: cql_statement_opt_metadata{&timeout_config::write_timeout}
, _bound_terms{bound_terms}
, _schema{schema}
, _query{std::move(query)}
{ }
future<::shared_ptr<cql_transport::messages::result_message>>
strongly_consistent_modification_statement::execute(query_processor& qp, service::query_state& qs, const query_options& options) const {
return execute_without_checking_exception_message(qp, qs, options)
.then(cql_transport::messages::propagate_exception_as_future<shared_ptr<cql_transport::messages::result_message>>);
}
static
service::broadcast_tables::update_query
evaluate_prepared(
const broadcast_tables::prepared_update& query,
const query_options& options) {
return service::broadcast_tables::update_query{
.key = expr::evaluate(query.key, options).to_bytes(),
.new_value = expr::evaluate(query.new_value, options).to_bytes(),
.value_condition = query.value_condition
? std::optional<bytes_opt>{expr::evaluate(*query.value_condition, options).to_bytes_opt()}
: std::nullopt
};
}
future<::shared_ptr<cql_transport::messages::result_message>>
strongly_consistent_modification_statement::execute_without_checking_exception_message(query_processor& qp, service::query_state& qs, const query_options& options) const {
if (this_shard_id() != 0) {
co_return ::make_shared<cql_transport::messages::result_message::bounce_to_shard>(0, cql3::computed_function_values{});
}
auto result = co_await service::broadcast_tables::execute(
qp.get_group0_client(),
{ evaluate_prepared(_query, options) }
);
co_return co_await std::visit(make_visitor(
[] (service::broadcast_tables::query_result_conditional_update& qr) -> future<::shared_ptr<cql_transport::messages::result_message>> {
auto result_set = std::make_unique<cql3::result_set>(std::vector{
make_lw_shared<cql3::column_specification>(
db::system_keyspace::NAME,
db::system_keyspace::BROADCAST_KV_STORE,
::make_shared<cql3::column_identifier>("[applied]", false),
boolean_type
),
make_lw_shared<cql3::column_specification>(
db::system_keyspace::NAME,
db::system_keyspace::BROADCAST_KV_STORE,
::make_shared<cql3::column_identifier>("value", true),
utf8_type
)
});
result_set->add_row({ boolean_type->decompose(qr.is_applied), qr.previous_value });
return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>(
::make_shared<cql_transport::messages::result_message::rows>(cql3::result{std::move(result_set)}));
},
[] (service::broadcast_tables::query_result_none&) -> future<::shared_ptr<cql_transport::messages::result_message>> {
return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>();
},
[] (service::broadcast_tables::query_result_select&) -> future<::shared_ptr<cql_transport::messages::result_message>> {
on_internal_error(logger, "incorrect query result ");
}
), result);
}
uint32_t strongly_consistent_modification_statement::get_bound_terms() const {
return _bound_terms;
}
future<> strongly_consistent_modification_statement::check_access(query_processor& qp, const service::client_state& state) const {
const data_dictionary::database db = qp.db();
auto f = state.has_column_family_access(db, _schema->ks_name(), _schema->cf_name(), auth::permission::MODIFY);
if (_query.value_condition.has_value()) {
f = f.then([this, &state, db] {
return state.has_column_family_access(db, _schema->ks_name(), _schema->cf_name(), auth::permission::SELECT);
});
}
return f;
}
void strongly_consistent_modification_statement::validate(query_processor&, const service::client_state& state) const {
// Nothing to do, all validation has been done by prepare().
}
bool strongly_consistent_modification_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const {
return _schema->ks_name() == ks_name && (!cf_name || _schema->cf_name() == *cf_name);
}
}
}

View File

@@ -1,58 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
*/
#pragma once
#include "cql3/cql_statement.hh"
#include "cql3/statements/modification_statement.hh"
#include "service/broadcast_tables/experimental/lang.hh"
namespace cql3 {
namespace statements {
namespace broadcast_tables {
struct prepared_update {
expr::expression key;
expr::expression new_value;
std::optional<expr::expression> value_condition;
};
}
class strongly_consistent_modification_statement : public cql_statement_opt_metadata {
const uint32_t _bound_terms;
const schema_ptr _schema;
const broadcast_tables::prepared_update _query;
public:
strongly_consistent_modification_statement(uint32_t bound_terms, schema_ptr schema, broadcast_tables::prepared_update query);
virtual future<::shared_ptr<cql_transport::messages::result_message>>
execute(query_processor& qp, service::query_state& qs, const query_options& options) const override;
virtual future<::shared_ptr<cql_transport::messages::result_message>>
execute_without_checking_exception_message(query_processor& qp, service::query_state& qs, const query_options& options) const override;
virtual uint32_t get_bound_terms() const override;
virtual future<> check_access(query_processor& qp, const service::client_state& state) const override;
// Validate before execute, using client state and current schema
void validate(query_processor&, const service::client_state& state) const override;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
};
}
}

View File

@@ -1,129 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
*/
#include "cql3/statements/strongly_consistent_select_statement.hh"
#include <seastar/core/future.hh>
#include <seastar/core/on_internal_error.hh>
#include "cql3/restrictions/statement_restrictions.hh"
#include "cql3/query_processor.hh"
#include "service/broadcast_tables/experimental/lang.hh"
namespace cql3 {
namespace statements {
static logging::logger logger("strongly_consistent_select_statement");
static
expr::expression get_key(const cql3::expr::expression& partition_key_restrictions) {
const auto* conjunction = cql3::expr::as_if<cql3::expr::conjunction>(&partition_key_restrictions);
if (!conjunction || conjunction->children.size() != 1) {
throw service::broadcast_tables::unsupported_operation_error(fmt::format(
"partition key restriction: {}", partition_key_restrictions));
}
const auto* key_restriction = cql3::expr::as_if<cql3::expr::binary_operator>(&conjunction->children[0]);
if (!key_restriction) {
throw service::broadcast_tables::unsupported_operation_error(fmt::format("partition key restriction: {}", *conjunction));
}
const auto* column = cql3::expr::as_if<cql3::expr::column_value>(&key_restriction->lhs);
if (!column || column->col->kind != column_kind::partition_key ||
key_restriction->op != cql3::expr::oper_t::EQ) {
throw service::broadcast_tables::unsupported_operation_error(fmt::format("key restriction: {}", *key_restriction));
}
return key_restriction->rhs;
}
static
bool is_selecting_only_value(const cql3::selection::selection& selection) {
return selection.is_trivial() &&
selection.get_column_count() == 1 &&
selection.get_columns()[0]->name() == "value";
}
strongly_consistent_select_statement::strongly_consistent_select_statement(schema_ptr schema, uint32_t bound_terms,
lw_shared_ptr<const parameters> parameters,
::shared_ptr<selection::selection> selection,
::shared_ptr<const restrictions::statement_restrictions> restrictions,
::shared_ptr<std::vector<size_t>> group_by_cell_indices,
bool is_reversed,
ordering_comparator_type ordering_comparator,
std::optional<expr::expression> limit,
std::optional<expr::expression> per_partition_limit,
cql_stats &stats,
std::unique_ptr<attributes> attrs)
: select_statement{schema, bound_terms, parameters, selection, restrictions, group_by_cell_indices, is_reversed, ordering_comparator, std::move(limit), std::move(per_partition_limit), stats, std::move(attrs)},
_query{prepare_query()}
{ }
broadcast_tables::prepared_select strongly_consistent_select_statement::prepare_query() const {
if (!is_selecting_only_value(*_selection)) {
throw service::broadcast_tables::unsupported_operation_error("only 'value' selector is allowed");
}
return {
.key = get_key(_restrictions->get_partition_key_restrictions())
};
}
static
service::broadcast_tables::select_query
evaluate_prepared(
const broadcast_tables::prepared_select& query,
const query_options& options) {
return service::broadcast_tables::select_query{
.key = expr::evaluate(query.key, options).to_bytes()
};
}
future<::shared_ptr<cql_transport::messages::result_message>>
strongly_consistent_select_statement::execute_without_checking_exception_message(query_processor& qp, service::query_state& qs, const query_options& options) const {
if (this_shard_id() != 0) {
co_return ::make_shared<cql_transport::messages::result_message::bounce_to_shard>(0, cql3::computed_function_values{});
}
auto result = co_await service::broadcast_tables::execute(
qp.get_group0_client(),
{ evaluate_prepared(_query, options) }
);
auto query_result = std::get_if<service::broadcast_tables::query_result_select>(&result);
if (!query_result) {
on_internal_error(logger, "incorrect query result ");
}
auto result_set = std::make_unique<cql3::result_set>(std::vector{
make_lw_shared<cql3::column_specification>(
db::system_keyspace::NAME,
db::system_keyspace::BROADCAST_KV_STORE,
::make_shared<cql3::column_identifier>("value", true),
utf8_type
)
});
if (query_result->value) {
result_set->add_row({ query_result->value });
}
co_return ::make_shared<cql_transport::messages::result_message::rows>(cql3::result{std::move(result_set)});
}
}
}

View File

@@ -1,53 +0,0 @@
/*
* Copyright (C) 2022-present ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
*/
#pragma once
#include "cql3/expr/expression.hh"
#include "cql3/statements/select_statement.hh"
#include "service/broadcast_tables/experimental/lang.hh"
namespace cql3 {
namespace statements {
namespace broadcast_tables {
struct prepared_select {
expr::expression key;
};
}
class strongly_consistent_select_statement : public select_statement {
const broadcast_tables::prepared_select _query;
broadcast_tables::prepared_select prepare_query() const;
public:
strongly_consistent_select_statement(schema_ptr schema,
uint32_t bound_terms,
lw_shared_ptr<const parameters> parameters,
::shared_ptr<selection::selection> selection,
::shared_ptr<const restrictions::statement_restrictions> restrictions,
::shared_ptr<std::vector<size_t>> group_by_cell_indices,
bool is_reversed,
ordering_comparator_type ordering_comparator,
std::optional<expr::expression> limit,
std::optional<expr::expression> per_partition_limit,
cql_stats &stats,
std::unique_ptr<cql3::attributes> attrs);
virtual future<::shared_ptr<cql_transport::messages::result_message>>
execute_without_checking_exception_message(query_processor& qp, service::query_state& qs, const query_options& options) const override;
};
}
}

View File

@@ -8,7 +8,6 @@
* SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
*/
#include "cql3/statements/raw/truncate_statement.hh"
#include "cql3/statements/truncate_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "cql3/cql_statement.hh"
@@ -16,54 +15,15 @@
#include "cql3/query_processor.hh"
#include "service/storage_proxy.hh"
#include <optional>
#include "validation.hh"
namespace cql3 {
namespace statements {
namespace raw {
truncate_statement::truncate_statement(cf_name name, std::unique_ptr<attributes::raw> attrs)
: cf_statement(std::move(name))
, _attrs(std::move(attrs))
truncate_statement::truncate_statement(cf_name name)
: cf_statement{std::move(name)}
, cql_statement_no_metadata(&timeout_config::truncate_timeout)
{
// Validate the attributes.
// Currently, TRUNCATE supports only USING TIMEOUT
assert(!_attrs->timestamp.has_value());
assert(!_attrs->time_to_live.has_value());
}
std::unique_ptr<prepared_statement> truncate_statement::prepare(data_dictionary::database db, cql_stats& stats) {
schema_ptr schema = validation::validate_column_family(db, keyspace(), column_family());
auto prepared_attributes = _attrs->prepare(db, keyspace(), column_family());
auto ctx = get_prepare_context();
prepared_attributes->fill_prepare_context(ctx);
auto stmt = ::make_shared<cql3::statements::truncate_statement>(std::move(schema), std::move(prepared_attributes));
return std::make_unique<prepared_statement>(std::move(stmt));
}
} // namespace raw
truncate_statement::truncate_statement(schema_ptr schema, std::unique_ptr<attributes> prepared_attrs)
: cql_statement_no_metadata(&timeout_config::truncate_timeout)
, _schema{std::move(schema)}
, _attrs(std::move(prepared_attrs))
{
}
truncate_statement::truncate_statement(const truncate_statement& ts)
: cql_statement_no_metadata(ts)
, _schema(ts._schema)
, _attrs(std::make_unique<attributes>(*ts._attrs))
{ }
const sstring& truncate_statement::keyspace() const {
return _schema->ks_name();
}
const sstring& truncate_statement::column_family() const {
return _schema->cf_name();
}
uint32_t truncate_statement::get_bound_terms() const
@@ -71,6 +31,11 @@ uint32_t truncate_statement::get_bound_terms() const
return 0;
}
std::unique_ptr<prepared_statement> truncate_statement::prepare(data_dictionary::database db,cql_stats& stats)
{
return std::make_unique<prepared_statement>(::make_shared<truncate_statement>(*this));
}
bool truncate_statement::depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const
{
return false;
@@ -95,18 +60,13 @@ truncate_statement::execute(query_processor& qp, service::query_state& state, co
if (qp.db().find_schema(keyspace(), column_family())->is_view()) {
throw exceptions::invalid_request_exception("Cannot TRUNCATE materialized view directly; must truncate base table instead");
}
auto timeout_in_ms = std::chrono::duration_cast<std::chrono::milliseconds>(get_timeout(state.get_client_state(), options));
return qp.proxy().truncate_blocking(keyspace(), column_family(), timeout_in_ms).handle_exception([](auto ep) {
return qp.proxy().truncate_blocking(keyspace(), column_family()).handle_exception([](auto ep) {
throw exceptions::truncate_exception(ep);
}).then([] {
return ::shared_ptr<cql_transport::messages::result_message>{};
});
}
db::timeout_clock::duration truncate_statement::get_timeout(const service::client_state& state, const query_options& options) const {
return _attrs->is_timeout_set() ? _attrs->get_timeout(options) : state.get_timeout_config().truncate_timeout;
}
}
}

View File

@@ -12,7 +12,6 @@
#include "cql3/statements/raw/cf_statement.hh"
#include "cql3/cql_statement.hh"
#include "cql3/attributes.hh"
namespace cql3 {
@@ -20,20 +19,14 @@ class query_processor;
namespace statements {
class truncate_statement : public cql_statement_no_metadata {
schema_ptr _schema;
const std::unique_ptr<attributes> _attrs;
class truncate_statement : public raw::cf_statement, public cql_statement_no_metadata {
public:
truncate_statement(schema_ptr schema, std::unique_ptr<attributes> prepared_attrs);
truncate_statement(const truncate_statement&);
truncate_statement(truncate_statement&&) = default;
const sstring& keyspace() const;
const sstring& column_family() const;
truncate_statement(cf_name name);
virtual uint32_t get_bound_terms() const override;
virtual std::unique_ptr<prepared_statement> prepare(data_dictionary::database db, cql_stats& stats) override;
virtual bool depends_on(std::string_view ks_name, std::optional<std::string_view> cf_name) const override;
virtual future<> check_access(query_processor& qp, const service::client_state& state) const override;
@@ -42,8 +35,6 @@ public:
virtual future<::shared_ptr<cql_transport::messages::result_message>>
execute(query_processor& qp, service::query_state& state, const query_options& options) const override;
private:
db::timeout_clock::duration get_timeout(const service::client_state& state, const query_options& options) const;
};
}

View File

@@ -9,9 +9,6 @@
*/
#include "update_statement.hh"
#include "cql3/expr/expression.hh"
#include "cql3/statements/strongly_consistent_modification_statement.hh"
#include "service/broadcast_tables/experimental/lang.hh"
#include "raw/update_statement.hh"
#include "raw/insert_statement.hh"
@@ -25,7 +22,6 @@
#include "types/user.hh"
#include "concrete_types.hh"
#include "validation.hh"
#include <optional>
namespace cql3 {
@@ -258,94 +254,6 @@ void insert_prepared_json_statement::execute_operations_for_key(mutation& m, con
}
}
static
expr::expression get_key(const cql3::expr::expression& partition_key_restrictions) {
const auto* conjunction = cql3::expr::as_if<cql3::expr::conjunction>(&partition_key_restrictions);
if (!conjunction || conjunction->children.size() != 1) {
throw service::broadcast_tables::unsupported_operation_error(fmt::format(
"partition key restriction: {}", partition_key_restrictions));
}
const auto* key_restriction = cql3::expr::as_if<cql3::expr::binary_operator>(&conjunction->children[0]);
if (!key_restriction) {
throw service::broadcast_tables::unsupported_operation_error(fmt::format("partition key restriction: {}", *conjunction));
}
const auto* column = cql3::expr::as_if<cql3::expr::column_value>(&key_restriction->lhs);
if (!column || column->col->kind != column_kind::partition_key ||
key_restriction->op != cql3::expr::oper_t::EQ) {
throw service::broadcast_tables::unsupported_operation_error(fmt::format("key restriction: {}", *key_restriction));
}
return key_restriction->rhs;
}
static
void prepare_new_value(broadcast_tables::prepared_update& query, const std::vector<::shared_ptr<operation>>& operations) {
if (operations.size() != 1) {
throw service::broadcast_tables::unsupported_operation_error("only one operation is allowed and must be of the form \"value = X\"");
}
operations[0]->prepare_for_broadcast_tables(query);
}
static
std::optional<expr::expression> get_value_condition(const std::vector<lw_shared_ptr<column_condition>>& conditions) {
if (conditions.size() == 0) {
return std::nullopt;
}
if (conditions.size() > 1) {
throw service::broadcast_tables::unsupported_operation_error(fmt::format("conditions: {}", fmt::join(
conditions | boost::adaptors::transformed([] (const auto& cond) {
return fmt::format("{}{}", cond->get_operation(), cond->get_value());
}), ", ")));
}
const auto& condition = conditions[0];
if (!condition->get_value() || condition->get_operation() != cql3::expr::oper_t::EQ) {
throw service::broadcast_tables::unsupported_operation_error(fmt::format(
"condition: {}{}", condition->get_operation(), condition->get_value()));
}
return condition->get_value();
}
::shared_ptr<strongly_consistent_modification_statement>
update_statement::prepare_for_broadcast_tables() const {
if (attrs) {
if (attrs->is_time_to_live_set()) {
throw service::broadcast_tables::unsupported_operation_error{"USING TTL"};
}
if (attrs->is_timestamp_set()) {
throw service::broadcast_tables::unsupported_operation_error{"USING TIMESTAMP"};
}
if (attrs->is_timeout_set()) {
throw service::broadcast_tables::unsupported_operation_error{"USING TIMEOUT"};
}
}
broadcast_tables::prepared_update query = {
.key = get_key(restrictions().get_partition_key_restrictions()),
.value_condition = get_value_condition(_regular_conditions),
};
prepare_new_value(query, _column_operations);
return ::make_shared<strongly_consistent_modification_statement>(
get_bound_terms(),
s,
query
);
}
namespace raw {
insert_statement::insert_statement(cf_name name,

View File

@@ -42,9 +42,6 @@ private:
virtual void add_update_for_key(mutation& m, const query::clustering_range& range, const update_parameters& params, const json_cache_opt& json_cache) const override;
virtual void execute_operations_for_key(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params, const json_cache_opt& json_cache) const;
public:
virtual ::shared_ptr<strongly_consistent_modification_statement> prepare_for_broadcast_tables() const override;
};
/*

View File

@@ -78,8 +78,4 @@ bool operator==(const raw_value& v1, const raw_value& v2) {
return false;
}
std::ostream& operator<<(std::ostream& os, const raw_value& value) {
return os << value.view();
}
}

View File

@@ -85,14 +85,6 @@ public:
bool is_unset_value() const {
return std::holds_alternative<unset_value>(_data);
}
// An empty value is not null or unset, but it has 0 bytes of data.
// An empty int value can be created in CQL using blobasint(0x).
bool is_empty_value() const {
if (is_null() || is_unset_value()) {
return false;
}
return size_bytes() == 0;
}
bool is_value() const {
return _data.index() <= 1;
}
@@ -241,14 +233,6 @@ public:
bool is_null_or_unset() const {
return !is_value();
}
// An empty value is not null or unset, but it has 0 bytes of data.
// An empty int value can be created in CQL using blobasint(0x).
bool is_empty_value() const {
if (is_null_or_unset()) {
return false;
}
return view().size_bytes() == 0;
}
bool is_value() const {
return _data.index() <= 1;
}
@@ -256,28 +240,10 @@ public:
return is_value();
}
bytes to_bytes() && {
return std::visit(overloaded_functor{
[](bytes&& bytes_val) { return std::move(bytes_val); },
[](managed_bytes&& managed_bytes_val) { return ::to_bytes(managed_bytes_val); },
[](null_value&&) -> bytes {
throw std::runtime_error("to_bytes() called on raw value that is null");
},
[](unset_value&&) -> bytes {
throw std::runtime_error("to_bytes() called on raw value that is unset_value");
}
}, std::move(_data));
}
bytes_opt to_bytes_opt() && {
return std::visit(overloaded_functor{
[](bytes&& bytes_val) { return bytes_opt(bytes_val); },
[](managed_bytes&& managed_bytes_val) { return bytes_opt(::to_bytes(managed_bytes_val)); },
[](null_value&&) -> bytes_opt {
return std::nullopt;
},
[](unset_value&&) -> bytes_opt {
return std::nullopt;
}
}, std::move(_data));
switch (_data.index()) {
case 0: return std::move(std::get<bytes>(_data));
default: return ::to_bytes(std::get<managed_bytes>(_data));
}
}
managed_bytes to_managed_bytes() && {
return std::visit(overloaded_functor{
@@ -307,7 +273,6 @@ public:
friend class raw_value_view;
friend bool operator==(const raw_value& v1, const raw_value& v2);
friend std::ostream& operator<<(std::ostream& os, const raw_value& value);
};
}

View File

@@ -100,12 +100,12 @@ database::find_table(std::string_view ks, std::string_view table) const {
}
std::optional<table>
database::try_find_table(table_id id) const {
database::try_find_table(utils::UUID id) const {
return _ops->try_find_table(*this, id);
}
table
database::find_column_family(table_id uuid) const {
database::find_column_family(utils::UUID uuid) const {
auto t = try_find_table(uuid);
if (!t) {
throw no_such_column_family(uuid);
@@ -119,7 +119,7 @@ database::find_schema(std::string_view ks, std::string_view table) const {
}
schema_ptr
database::find_schema(table_id uuid) const {
database::find_schema(utils::UUID uuid) const {
return find_column_family(uuid).schema();
}
@@ -326,7 +326,7 @@ no_such_keyspace::no_such_keyspace(std::string_view ks_name)
{
}
no_such_column_family::no_such_column_family(const table_id& uuid)
no_such_column_family::no_such_column_family(const utils::UUID& uuid)
: runtime_error{format("Can't find a column family with UUID {}", uuid)}
{
}
@@ -336,7 +336,7 @@ no_such_column_family::no_such_column_family(std::string_view ks_name, std::stri
{
}
no_such_column_family::no_such_column_family(std::string_view ks_name, const table_id& uuid)
no_such_column_family::no_such_column_family(std::string_view ks_name, const utils::UUID& uuid)
: runtime_error{format("Can't find a column family with UUID {} in keyspace {}", uuid, ks_name)}
{
}

View File

@@ -14,7 +14,7 @@
#include <vector>
#include <seastar/core/shared_ptr.hh>
#include "seastarx.hh"
#include "schema_fwd.hh"
#include "utils/UUID.hh"
namespace replica {
class database; // For transition; remove
@@ -62,9 +62,9 @@ public:
class no_such_column_family : public std::runtime_error {
public:
no_such_column_family(const table_id& uuid);
no_such_column_family(const utils::UUID& uuid);
no_such_column_family(std::string_view ks_name, std::string_view cf_name);
no_such_column_family(std::string_view ks_name, const table_id& uuid);
no_such_column_family(std::string_view ks_name, const utils::UUID& uuid);
};
class table {
@@ -105,13 +105,13 @@ public:
std::vector<keyspace> get_keyspaces() const;
std::vector<table> get_tables() const;
table find_table(std::string_view ks, std::string_view table) const; // throws no_such_column_family
table find_column_family(table_id uuid) const; // throws no_such_column_family
table find_column_family(utils::UUID uuid) const; // throws no_such_column_family
schema_ptr find_schema(std::string_view ks, std::string_view table) const; // throws no_such_column_family
schema_ptr find_schema(table_id uuid) const; // throws no_such_column_family
schema_ptr find_schema(utils::UUID uuid) const; // throws no_such_column_family
table find_column_family(schema_ptr s) const;
bool has_schema(std::string_view ks_name, std::string_view cf_name) const;
std::optional<table> try_find_table(std::string_view ks, std::string_view table) const;
std::optional<table> try_find_table(table_id id) const;
std::optional<table> try_find_table(utils::UUID id) const;
const db::config& get_config() const;
std::set<sstring> existing_index_names(std::string_view ks_name, std::string_view cf_to_exclude = sstring()) const;
schema_ptr find_indexed_table(std::string_view ks_name, std::string_view index_name) const;

Some files were not shown because too many files have changed in this diff Show More