Compare commits

559 Commits

Author SHA1 Message Date
Pavel Emelyanov
42a930da3b Update seastar submodule
* seastar e45cef9c...1b299004 (3):
  > rpc: Abort server connection streams on stop
  > rpc: Do not register stream to dying parent
  > rpc: Fix client-side stream registration race

refs: #13100

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-06 12:35:37 +03:00
Takuya ASADA
660d68953d scylla_fstrim_setup: start scylla-fstrim.timer on setup
Currently, scylla_fstrim_setup does not start scylla-fstrim.timer and
just enables it, so the timer starts only after a reboot.
This is incorrect behavior; we should start it during the setup.

Also, unmask is unnecessary for enabling the timer.

Fixes #14249

Closes #14252

(cherry picked from commit c70a9cbffe)

Closes #14422
2023-07-18 16:03:53 +03:00
Botond Dénes
bff9b459ef repair: Release permit earlier when the repair_reader is done
Consider

- 10 repair instances take all the 10 _streaming_concurrency_sem

- repair readers are done but the permits are not released since they
  are waiting for view update _registration_sem

- view updates try to take the _streaming_concurrency_sem to make
  progress so that they can release _registration_sem, but they
  cannot take _streaming_concurrency_sem since the 10 repair
  instances have taken all of its units

- deadlock happens

Note, when the readers are done, i.e., reaching EOS, the repair reader
replaces the underlying (evictable) reader with an empty reader. The
empty reader is not evictable, so the resources cannot be forcibly
released.

To fix, release the permits manually as soon as the repair readers are
done even if the repair job is waiting for _registration_sem.

Fixes #14676

Closes #14677

(cherry picked from commit 1b577e0414)
2023-07-14 18:18:05 +03:00
Anna Stuchlik
4c77b86f26 doc: document the minimum_keyspace_rf option
Fixes https://github.com/scylladb/scylladb/issues/14598

This commit adds the description of minimum_keyspace_rf
to the CREATE KEYSPACE section of the docs.
(When we have the reference section for all ScyllaDB options,
an appropriate link should be added.)

This commit must be backported to branch-5.3, because
the feature is already on that branch.

Closes #14686

(cherry picked from commit 9db9dedb41)
2023-07-14 15:48:28 +03:00
Marcin Maliszkiewicz
fb0afb04ae alternator: close output_stream when exception is thrown during response streaming
When an exception occurs and we omit closing the output_stream, the whole process is brought down
by an assertion in ~output_stream.

Fixes https://github.com/scylladb/scylladb/issues/14453
Relates https://github.com/scylladb/scylladb/issues/14403

Closes #14454

(cherry picked from commit 6424dd5ec4)
2023-07-13 22:48:36 +03:00
Nadav Har'El
bdb93af423 Merge 'Yield while building large results in Alternator - rjson::print, executor::batch_get_item' from Marcin Maliszkiewicz
Adds preemption points used in Alternator when:
 - sending bigger json response
 - building results for BatchGetItem

I've tested manually by inserting in preemptible sections (e.g. before `os.write`) code similar to:

    auto start  = std::chrono::steady_clock::now();
    do { } while ((std::chrono::steady_clock::now() - start) < 100ms);

and seeing reactor stall times. After the patch they
were not increasing while before they kept building up due to no preemption.

Refs #7926
Fixes #13689

Closes #12351

* github.com:scylladb/scylladb:
  alternator: remove redundant flush call in make_streamed
  utils: yield when streaming json in print()
  alternator: yield during BatchGetItem operation

(cherry picked from commit d2e089777b)
2023-07-13 22:48:30 +03:00
Avi Kivity
2fffaf36c4 Merge ' message: match unknown tenants to the default tenant' from Botond Dénes
On connection setup, the isolation cookie of the connection is matched to the appropriate scheduling group. This is achieved by iterating over the known statement tenant connection types as well as the system connections and choosing the one with a matching name.

If a match is not found, it is assumed that the cluster is upgraded and the remote node has a scheduling group the local one doesn't have. To avoid demoting a scheduling group of unknown importance, in this case the default scheduling group is chosen.

This is problematic when upgrading an OSS cluster to an enterprise version, as the scheduling groups of the enterprise service-levels will match none of the statement tenants and will hence fall back to the default scheduling group. As a consequence, while the cluster is mixed, user workload on old (OSS) nodes will be executed under the system scheduling group and concurrency semaphore. Not only does this mean that user workloads are directly competing for resources with system ones, but the two workloads are now sharing the semaphore too, reducing the available throughput. This usually manifests in queries timing out on the old (OSS) nodes in the cluster.

This PR proposes to fix this, by recognizing that the unknown scheduling group is in fact a tenant this node doesn't know yet, and matching it with the default statement tenant. With this, order should be restored, with service-level connections being recognized as user connections and being executed in the statement scheduling group and the statement (user) concurrency semaphore.
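
The matching rule described above can be sketched roughly as follows. This is only an illustration: `sched_group`, `connection_type`, `known_types()` and `group_for_cookie()` are hypothetical names and values, not the actual messaging service code.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-ins; not the real messaging service types.
enum class sched_group { system_default, statement, streaming, gossip };

struct connection_type {
    std::string isolation_cookie;
    sched_group group;
};

// Known system connections and statement tenants (illustrative values only).
inline const std::vector<connection_type>& known_types() {
    static const std::vector<connection_type> types = {
        {"streaming", sched_group::streaming},
        {"gossip", sched_group::gossip},
        {"statement", sched_group::statement},
    };
    return types;
}

// Map the isolation cookie of a connection to a scheduling group.
// An unrecognized cookie is treated as a tenant this node does not know yet,
// so it is matched to the default statement tenant instead of falling back
// to the system default group.
inline sched_group group_for_cookie(std::string_view cookie) {
    for (const auto& t : known_types()) {
        if (t.isolation_cookie == cookie) {
            return t.group;
        }
    }
    return sched_group::statement;
}
```

With this rule, a cookie coming from an enterprise service level that the local node has never heard of still ends up in the statement (user) group.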

I tested this manually, by creating a cluster of 2 OSS nodes, then upgrading one of the nodes to enterprise and verifying (with extra logging) that service level connections are matched to the default statement tenant after the PR and they indeed match to the default scheduling group before.

Fixes: #13841
Fixes: #12552

Closes #13843

* github.com:scylladb/scylladb:
  message: match unknown tenants to the default tenant
  message: generalize per-tenant connection types

(cherry picked from commit a7c2c9f92b)
2023-07-12 15:31:30 +03:00
Avi Kivity
dab150f3d8 Revert "Merge 'cql: update permissions when creating/altering a function/keyspace' from Wojciech Mitros"
This reverts commit 52e4edfd5e, reversing
changes made to d2d53fc1db. The associated test
fails with about 10% probability, which blocks other work.

Fixes #14395

Closes #14662
2023-07-12 13:23:17 +03:00
Tomasz Grabiec
7b6db3b69a Merge 'atomic_cell: compare value last' from Benny Halevy
Currently, when two cells have the same write timestamp
and both are alive or expiring, we compare their value first,
before checking if either of them is expiring
and if both are expiring, comparing their expiration time
and ttl value to determine which of them will expire
later or was written later.

This was based on an early version of Cassandra.
However, the Cassandra implementation rightfully changed in
e225c88a65 ([CASSANDRA-14592](https://issues.apache.org/jira/browse/CASSANDRA-14592)),
where the cell expiration is considered before the cell value.

To summarize, the motivation for this change is threefold:
1. Cassandra compatibility
2. Prevent an edge case where a null value is returned by a select query when an expired cell has a larger value than a cell with a later expiration.
3. A generalization of the above: value-based reconciliation may cause a select query to return a mixture of upserts, if multiple upserts use the same timestamp but have different expiration times.  If the cell value is considered before expiration, the select result may contain cells from different inserts, while reconciling based on the expiration times will choose cells consistently from either upsert, as all cells in the respective upsert will carry the same expiration time.
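
A minimal sketch of the new ordering for two live cells, using simplified hypothetical types (`live_cell`, `compare_for_merge`) rather than the real atomic_cell API. Which side wins when only one cell is expiring is not stated in this message, so that branch is an assumption and is marked as such.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical simplified cell; dead cells (tombstones) are not modeled.
struct live_cell {
    std::int64_t timestamp;
    std::optional<std::int64_t> expiry;  // set only for expiring cells
    std::int64_t ttl;                    // meaningful only when expiry is set
    std::string value;
};

template <typename T>
int three_way(const T& a, const T& b) {
    return a < b ? -1 : (b < a ? 1 : 0);
}

// Returns >0 if `a` wins the merge, <0 if `b` wins, 0 if they are equivalent.
inline int compare_for_merge(const live_cell& a, const live_cell& b) {
    if (int c = three_way(a.timestamp, b.timestamp)) {
        return c;  // the later write timestamp always wins
    }
    if (a.expiry.has_value() != b.expiry.has_value()) {
        // ASSUMPTION: the expiring cell is preferred over the non-expiring one.
        return a.expiry ? 1 : -1;
    }
    if (a.expiry) {
        if (int c = three_way(*a.expiry, *b.expiry)) {
            return c;  // the cell that expires later wins
        }
        if (int c = three_way(b.ttl, a.ttl)) {
            return c;  // same expiry: smaller ttl means it was written later
        }
    }
    return three_way(a.value, b.value);  // the value is compared last
}
```

Because all cells of one upsert share the same expiration time, every cell comparison between two such upserts is decided the same way, before the values are ever looked at.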

\Fixes scylladb/scylladb#14182

Also, this series:
- updates dml documentation
- updates internal documentation
- updates and adds unit tests and cql pytest reproducing #14182

\Closes scylladb/scylladb#14183

* github.com:scylladb/scylladb:
  docs: dml: add update ordering section
  cql-pytest: test_using_timestamp: add tests for rewrites using same timestamp
  mutation_partition: compare_row_marker_for_merge: consider ttl in case expiry is the same
  atomic_cell: compare_atomic_cell_for_merge: update and add documentation
  compare_atomic_cell_for_merge: compare value last for live cells
  mutation_test: test_cell_ordering: improve debuggability

(cherry picked from commit 87b4606cd6)

Closes #14647
2023-07-12 09:58:59 +03:00
Botond Dénes
b096c0d97d Merge '[backport 5.3] view: fix range tombstone handling on flushes in view_updating_consumer' from Michał Chojnowski
View update routines accept mutation objects.
But what comes out of staging sstable readers is a stream of mutation_fragment_v2 objects.
To build view updates after a repair/streaming, we have to convert the fragment stream into mutations. This is done by piping the stream to mutation_rebuilder_v2.

To keep memory usage limited, the stream for a single partition might have to be split into multiple partial mutation objects. view_update_consumer does that, but in an improper way -- when the split/flush happens inside an active range tombstone, the range tombstone isn't closed properly. This is illegal, and triggers an internal error.

This patch fixes the problem by closing the active range tombstone (and reopening in the same position in the next mutation object).

The tombstone is closed just after the last seen clustered position. This is not necessary for correctness -- for example we could delay all processing of the range tombstone until we see its end bound -- but it seems like the most natural semantic.

Backported from c25201c1a3. view_build_test.cc needed some tiny adjustments for the backport.

Closes #14622
Fixes #14503

* github.com:scylladb/scylladb:
  test: view_build_test: add range tombstones to test_view_update_generator_buffering
  test: view_build_test: add test_view_udate_generator_buffering_with_random_mutations
  view_updating_consumer: make buffer limit a variable
  view: fix range tombstone handling on flushes in view_updating_consumer
2023-07-11 15:05:56 +03:00
Michał Chojnowski
55e157be4d test: view_build_test: add range tombstones to test_view_update_generator_buffering
This patch adds a full-range tombstone to the compacted mutation.
This raises the coverage of the test. In particular, it reproduces
issue #14503, which should have been caught by this test, but wasn't.
2023-07-11 10:49:40 +02:00
Michał Chojnowski
75593a6178 test: view_build_test: add test_view_udate_generator_buffering_with_random_mutations
A random mutation test for view_updating_consumer's buffering logic.
Reproduces #14503.
2023-07-11 10:49:40 +02:00
Michał Chojnowski
bbbc4aafef view_updating_consumer: make buffer limit a variable
The limit doesn't change at runtime, but this patch makes it a variable for
unit testing purposes.
2023-07-11 10:48:34 +02:00
Michał Chojnowski
b74411f0ed view: fix range tombstone handling on flushes in view_updating_consumer
View update routines accept `mutation` objects.
But what comes out of staging sstable readers is a stream of
mutation_fragment_v2 objects.
To build view updates after a repair/streaming, we have to
convert the fragment stream into `mutation`s. This is done by piping
the stream to mutation_rebuilder_v2.

To keep memory usage limited, the stream for a single partition might
have to be split into multiple partial `mutation` objects.
view_update_consumer does that, but in an improper way -- when the
split/flush happens inside an active range tombstone, the range
tombstone isn't closed properly. This is illegal, and triggers an
internal error.

This patch fixes the problem by closing the active range tombstone
(and reopening in the same position in the next `mutation` object).

The tombstone is closed just after the last seen clustered position.
This is not necessary for correctness -- for example we could delay
all processing of the range tombstone until we see its end
bound -- but it seems like the most natural semantic.
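
The close-and-reopen idea can be sketched on a toy fragment stream. Everything below (`fragment`, `split_partition`, `ends_closed`, integer positions and tombstones) is a hypothetical simplification, not the real view_updating_consumer code; in particular, the closing fragment reuses the last seen position instead of a position just after it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy fragment: a row at `pos`, or a range tombstone change at `pos`.
// tombstone == 0 means "no tombstone is active from here on" (a close).
struct fragment {
    enum class kind { row, range_tombstone_change };
    kind type;
    int pos;
    int tombstone;  // ignored for rows
};

using partial_mutation = std::vector<fragment>;

// Split one partition's stream into partial mutations of roughly `limit`
// fragments. When a flush happens inside an active range tombstone, the
// tombstone is closed in the flushed part and reopened in the next one.
inline std::vector<partial_mutation> split_partition(const std::vector<fragment>& stream,
                                                     std::size_t limit) {
    std::vector<partial_mutation> out;
    partial_mutation cur;
    int active = 0;  // currently open tombstone, 0 if none
    for (const auto& f : stream) {
        cur.push_back(f);
        if (f.type == fragment::kind::range_tombstone_change) {
            active = f.tombstone;
        }
        if (cur.size() >= limit) {
            if (active != 0) {
                cur.push_back({fragment::kind::range_tombstone_change, f.pos, 0});
            }
            out.push_back(std::move(cur));
            cur.clear();
            if (active != 0) {
                cur.push_back({fragment::kind::range_tombstone_change, f.pos, active});
            }
        }
    }
    if (!cur.empty()) {
        out.push_back(std::move(cur));
    }
    return out;
}

// True if no range tombstone is left open at the end of a partial mutation.
inline bool ends_closed(const partial_mutation& m) {
    int active = 0;
    for (const auto& f : m) {
        if (f.type == fragment::kind::range_tombstone_change) {
            active = f.tombstone;
        }
    }
    return active == 0;
}
```

The sketch assumes a well-formed input stream, i.e. one in which every opened range tombstone is eventually closed.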

Fixes #14503
2023-07-11 10:48:34 +02:00
Calle Wilund
995ffd6ee0 storage_proxy: Make split_stats resilient to being called from different scheduling group
Fixes #11017

When doing writes, storage proxy creates types deriving from abstract_write_response_handler.
These are created in the various scheduling groups executing the write inducing code. They
pick up a group-local reference to the various metrics used by SP. Normally all code
using (and esp. modifying) these metrics are executed in the same scheduling group.
However, if gossip sees a node go down, it will notify listeners, which eventually
calls get_ep_stat and register_metrics.
This code (before this patch) uses _active_ scheduling group to eventually add
metrics, using a local dict as guard against double regs. If, as described above,
we're called in a different sched group than the original one however, this
can cause double registrations.

Fixed here by keeping a reference to creating scheduling group and using this, not
active one, when/if creating new metrics.

Closes #14294

(cherry picked from commit f18e967939)
2023-07-11 11:42:28 +03:00
Piotr Dulikowski
88e843c9db combined: mergers: remove recursion in operator()()
In mutation_reader_merger and clustering_order_reader_merger, the
operator()() is responsible for producing mutation fragments that will
be merged and pushed to the combined reader's buffer. Sometimes, it
might have to advance existing readers, open new and / or close some
existing ones, which requires calling a helper method and then calling
operator()() recursively.

In some unlucky circumstances, a stack overflow can occur:

- Readers have to be opened incrementally,
- Most or all readers must not produce any fragments and need to report
  end of stream without preemption,
- There have to be enough readers opened within the lifetime of the
  combined reader (~500),
- All of the above needs to happen within a single task quota.

In order to prevent such a situation, the code of both reader merger
classes was modified not to perform recursion at all. Most of the code
of the operator()() was moved to maybe_produce_batch, which does not
recurse if it cannot produce a fragment; instead, it
returns std::nullopt, and operator()() calls this method in a loop via
seastar::repeat_until_value.
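
The shape of the change can be sketched without Seastar. In this toy version a plain `for` loop stands in for seastar::repeat_until_value, and `toy_merger` and `fragment_batch` are hypothetical names; an empty batch is used here to signal end of stream, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using fragment_batch = std::vector<int>;  // toy stand-in for a batch of fragments

class toy_merger {
    // Each "reader" is modeled by the single batch it yields, possibly empty.
    std::vector<fragment_batch> _readers;
    std::size_t _next = 0;
public:
    explicit toy_merger(std::vector<fragment_batch> readers)
        : _readers(std::move(readers)) {}

    // One step that never calls itself:
    //  - std::nullopt: nothing to return yet (a reader ended immediately)
    //  - empty batch: end of stream
    //  - non-empty batch: fragments to merge
    std::optional<fragment_batch> maybe_produce_batch() {
        if (_next == _readers.size()) {
            return fragment_batch{};
        }
        fragment_batch b = std::move(_readers[_next++]);  // open the next reader
        if (b.empty()) {
            return std::nullopt;
        }
        return b;
    }

    // The loop replaces the former recursive call, so the stack depth stays
    // constant no matter how many readers produce nothing.
    fragment_batch operator()() {
        for (;;) {
            if (auto b = maybe_produce_batch()) {
                return std::move(*b);
            }
        }
    }
};
```

Even with thousands of readers that immediately report end of stream, operator()() here uses a single stack frame for the whole search.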

A regression test is added.

Fixes: scylladb/scylladb#14415

Closes #14452

(cherry picked from commit ee9bfb583c)

Closes #14606
2023-07-11 11:38:33 +03:00
Takuya ASADA
8cfeb6f509 test/perf/perf_fast_forward: avoid allocating AIO slots on startup
In main.cc, we have early commands which need to run before Seastar is
initialized.
Currently, perf_fast_forward breaks this, since it defines
"app_template app" as a global variable.
To avoid that, we should defer running app_template's constructor to
scylla_fast_forward_main().

Fixes #13945

Closes #14026

(cherry picked from commit 45ef09218e)
2023-07-10 15:34:06 +03:00
Jan Ciolek
7adc9aa50c forward_service: fix forgetting case-sensitivity in aggregates
There was a bug that caused aggregates to fail when
used on case-sensitive columns.

For example:
```
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there
is no column "somecolumn".

This is because the case-sensitivity got lost on the way.

For non case-sensitive column names we convert them to lowercase,
but for case sensitive names we have to preserve the name
as originally written.

The problem was in `forward_service` - we took a column name
and created a non case-sensitive `column_identifier` out of it.
This converted the name to lowercase, and later such column
couldn't be found.

To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.
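
As a rough illustration of the difference, here is a hypothetical miniature of an identifier type with a `keep_case` flag. `toy_column_identifier` is not the real `column_identifier`; it only mirrors the behavior described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical miniature of a column identifier.
struct toy_column_identifier {
    std::string name;

    // keep_case == false: the name is folded to lowercase (unquoted CQL name).
    // keep_case == true: the name is preserved as written (quoted CQL name).
    toy_column_identifier(std::string raw, bool keep_case) : name(std::move(raw)) {
        if (!keep_case) {
            std::transform(name.begin(), name.end(), name.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        }
    }
};
```

Building the identifier for `"SomeColumn"` with `keep_case == false` yields `somecolumn`, which is exactly the name the failing query could not find.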

Fixes: https://github.com/scylladb/scylladb/issues/14307

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
(cherry picked from commit 7fca350075)
2023-07-10 15:22:25 +03:00
Botond Dénes
15d4475870 Merge 'doc: fix rollback in the 4.3-to-2021.1, 5.0-to-2022.1, and 5.1-to-2022.2 upgrade guides' from Anna Stuchlik
This PR fixes the Restore System Tables section of the upgrade guides by adding a command to clean upgraded SStables during rollback or adding the entire section to restore system tables (which was missing from the older documents).

This PR fixes a bug and must be backported to branch-5.3, branch-5.2, and branch-5.1.

Refs: https://github.com/scylladb/scylla-enterprise/issues/3046

- [x]  5.1-to-2022.2 - update command (backport to branch-5.3, branch-5.2, and branch-5.1)
- [x]  5.0-to-2022.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1)
- [x]  4.3-to-2021.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1)

(see https://github.com/scylladb/scylla-enterprise/issues/3046#issuecomment-1604232864)

Closes #14444

* github.com:scylladb/scylladb:
  doc: fix rollback in 4.3-to-2021.1 upgrade guide
  doc: fix rollback in 5.0-to-2022.1 upgrade guide
  doc: fix rollback in 5.1-to-2022.2 upgrade guide

(cherry picked from commit 8a7261fd70)
2023-07-10 15:15:48 +03:00
Kamil Braun
bbe5c323a9 storage_proxy: query_partition_key_range_concurrent: don't access empty range
`query_partition_key_range_concurrent` implements an optimization when
querying a token range that intersects multiple vnodes. Instead of
sending a query for each vnode separately, it sometimes sends a single
query to cover multiple vnodes - if the intersection of replica sets for
those vnodes is large enough to satisfy the CL and good enough in terms
of the heat metric. To check the latter condition, the code would take
the smallest heat metric of the intersected replica set and compare it
to the smallest heat metrics of replica sets calculated separately for each
vnode.

Unfortunately, there was an edge case that the code didn't handle: the
intersected replica set might be empty and the code would access an
empty range.

This was caught by an assertion added in
8db1d75c6c by the dtest
`test_query_dc_with_rf_0_does_not_crash_db`.

The fix is simple: check if the intersected set is empty - if so, don't
calculate the heat metrics because we can decide early that the
optimization doesn't apply.

Also change the `assert` to `on_internal_error`.
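
A minimal sketch of the guard, with hypothetical helper names (`min_heat`, `merge_applies`) and plain floats standing in for the heat metric. The direction of the final comparison is an assumption made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Smallest heat value of a replica set, or std::nullopt for an empty set.
// Checking for emptiness first avoids dereferencing the end iterator that
// std::min_element returns for an empty range.
inline std::optional<float> min_heat(const std::vector<float>& replica_heats) {
    if (replica_heats.empty()) {
        return std::nullopt;
    }
    return *std::min_element(replica_heats.begin(), replica_heats.end());
}

// Decide early that the optimization does not apply when the intersected
// replica set is empty.
inline bool merge_applies(const std::vector<float>& intersected_heats,
                          float per_vnode_min_heat) {
    const auto merged_min = min_heat(intersected_heats);
    if (!merged_min) {
        return false;  // empty intersection: no heat metrics are calculated
    }
    // ASSUMPTION: the comparison direction is illustrative only.
    return *merged_min >= per_vnode_min_heat;
}
```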

Fixes #14284

Closes #14300

(cherry picked from commit 732feca115)

Backport note: the original `assert` was never added to branch-5.3, but
the fix is still applicable, so I backported the fix and the
`on_internal_error` check.
2023-07-05 13:06:04 +02:00
Mikołaj Grzebieluch
82f70a1c19 raft topology: wait_for_peers_to_enter_synchronize_state doesn't need to resolve all IPs
Another node can stop after it joined the group0 but before it advertised itself
in gossip. `get_inet_addrs` will try to resolve all IPs and
`wait_for_peers_to_enter_synchronize_state` will loop indefinitely.

But `wait_for_peers_to_enter_synchronize_state` can return early if one of
the nodes confirms that the upgrade procedure has finished. For that, it doesn't
need the IPs of all group 0 members - only the IP of some nodes which can do
the confirmation.

This commit restructures the code so that IPs of nodes are resolved inside the
`max_concurrent_for_each` that `wait_for_peers_to_enter_synchronize_state` performs.
Then, even if some IPs won't be resolved, but one of the nodes confirms a
successful upgrade, we can continue.

Fixes #13543

(cherry picked from commit a45e0765e4)
2023-07-05 13:02:22 +02:00
Yaron Kaikov
0668dc25df release: prepare for 5.3.0-rc1
2023-07-02 11:28:58 +00:00
Anna Stuchlik
148655dc21 doc: fix rollback in 5.2-to-2023.1 upgrade guide
This commit fixes the Restore System Tables section
in the 5.2-to-2023.1 upgrade guide by adding a command
to clean upgraded SStables during rollback.

This is a bug (an incomplete command) and must be
backported to branch-5.3 and branch-5.2.

Refs: https://github.com/scylladb/scylla-enterprise/issues/3046

Closes #14373

(cherry picked from commit f4ae2c095b)
2023-06-29 12:07:17 +03:00
Botond Dénes
c89e5f06ba Merge 'readers: evictable_reader: don't accidentally consume the entire partition' from Kamil Braun
The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between.

The code guaranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction.

So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in #13491.

There was already a fix in this area to handle `partition_start` fragments correctly - #13563 - but it missed that the position comparison was done in the wrong order.

Fix the comparison and adjust one of the tests (added in #13563) to detect this case.

After the fix, the evictable reader starts generating some redundant (but expected) range tombstone change fragments since it's now being paused and resumed. For this we need to adjust mutation source tests which were a bit too specific. We modify `flat_mutation_reader_assertions` to squash the redundant `r_t_c`s.

Fixes #13491

Closes #14375

* github.com:scylladb/scylladb:
  readers: evictable_reader: don't accidentally consume the entire partition
  test: flat_mutation_reader_assertions: squash `r_t_c`s with the same position

(cherry picked from commit 586102b42e)
2023-06-29 12:04:13 +03:00
Anna Stuchlik
b38d169f55 doc: add Ubuntu 22 to 2021.1 OS support
Fixes https://github.com/scylladb/scylla-enterprise/issues/3036

This commit adds support for Ubuntu 22.04 to the list
of OSes supported by ScyllaDB Enterprise 2021.1.

This commit fixes a bug and must be backported to
branch-5.3 and branch-5.2.

Closes #14372

(cherry picked from commit 74fc69c825)
2023-06-26 13:42:26 +03:00
Anna Stuchlik
5ac40ed1a8 doc: udpate the OSS docs landing page
Fixes https://github.com/scylladb/scylladb/issues/14333

This commit replaces the documentation landing page with
the Open Source-only documentation landing page.

This change is required as now there is a separate landing
page for the ScyllaDB documentation, so the page is duplicated,
creating bad user experience.

(cherry picked from commit f60f89df17)

Closes #14371
2023-06-23 14:01:19 +02:00
Avi Kivity
37e6e65211 Update seastar submodule (default priority class shares)
* seastar f94b1bb9cb...e45cef9ce8 (1):
  > reactor: change shares for default IO class from 1 to 200

Fixes #13753.
2023-06-21 21:21:29 +03:00
Avi Kivity
994645c03b Switch seastar submodule to scylla-seastar.git
This allows us to start backporting Seastar patches.

Ref #13753.
2023-06-21 21:20:23 +03:00
Botond Dénes
51ed9a0ec0 Merge 'doc: add OS support for ScyllaDB 5.3' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/14084

This commit adds OS support for version 5.3 to the table on the OS Support by Linux Distributions and Version page.

Closes #14228

* github.com:scylladb/scylladb:
  doc: remove OS support for outdated ScyllaDB versions 2.x and 3.x
  doc: add OS support for ScyllaDB 5.3

(cherry picked from commit aaac455ebe)
2023-06-15 10:33:28 +03:00
Raphael S. Carvalho
fa689c811e compaction: Fix sstable cleanup after resharding on refresh
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup
set and never removes it)
6) cleanup (tries to cleanup resharded sstables which were
leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as
cleanup picks a shared sstable that was leaked and already
processed by resharding.

Fix is about not inserting shared sstables into cleanup set,
as shared sstables are restricted to resharding and cannot
be processed later by cleanup (nor should it be, because
resharding itself cleaned up its input files).

Dtest: https://github.com/scylladb/scylla-dtest/pull/3206

Fixes #14001.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14147

(cherry picked from commit 156d771101)
2023-06-13 13:22:36 +03:00
Anna Stuchlik
18774b90a7 doc: remove support for Ubuntu 18
Fixes https://github.com/scylladb/scylladb/issues/14097

This commit removes support for Ubuntu 18 from
platform support for ScyllaDB Enterprise 2023.1.

The update is in sync with the change made for
ScyllaDB 5.2.

This commit must be backported to branch-5.2 and
branch-5.3.

Closes #14118

(cherry picked from commit b7022cd74e)
2023-06-13 12:06:04 +03:00
Avi Kivity
b20a85d651 Merge 'multishard_mutation_query: make reader_context::lookup_readers() exception safe' from Botond Dénes
With regards to closing the looked-up querier if an exception is thrown. In particular, this requires closing the querier if a semaphore mismatch is detected. Move the table lookup above the line where the querier is looked up, to avoid having to handle the exception from it. As a consequence of closing the querier on the error path, the lookup lambda has to be made a coroutine. This is sad, but this is executed once per page, so its cost should be insignificant when spread over an entire page worth of work.

Also add a unit test checking that the mismatch is detected in the first place and that readers are closed.

Fixes: #13784

Closes #13790

* github.com:scylladb/scylladb:
  test/boost/database_test: add unit test for semaphore mismatch on range scans
  partition_slice_builder: add set_specific_ranges()
  multishard_mutation_query: make reader_context::lookup_readers() exception safe
  multishard_mutation_query: lookup_readers(): make inner lambda a coroutine

(cherry picked from commit 1c0e8c25ca)
2023-06-07 13:29:48 +03:00
Michał Chojnowski
1f0f3a4464 data_dictionary: fix forgetting of UDTs on ALTER KEYSPACE
Due to a simple programming oversight, one of keyspace_metadata
constructors is using empty user_types_metadata instead of the
passed one. Fix that.

Fixes #14139

Closes #14143

(cherry picked from commit 1a521172ec)
2023-06-06 21:52:23 +03:00
Kamil Braun
698ac3ac4e auth: don't use infinite timeout in default_role_row_satisfies query
A long long time ago there was an issue about removing infinite timeouts
from distributed queries: #3603. There was also a fix:
620e950fc8. But apparently some queries
escaped the fix, like the one in `default_role_row_satisfies`.

With the right conditions and timing this query may cause a node to hang
indefinitely on shutdown. A node tries to perform this query after it
starts. If we kill another node which is required to serve this query
right before that moment, the query will hang; when we try to shutdown
the querying node, it will wait for the query to finish (it's a
background task in auth service), which it never does due to infinite
timeout.

Use the same timeout configuration as other queries in this module do.

Fixes #13545.

Closes #14134

(cherry picked from commit f51312e580)
2023-06-06 19:38:53 +03:00
Raphael S. Carvalho
f975b7890e compaction: Fix incremental compaction for sstable cleanup
After c7826aa910, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes #14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14038

(cherry picked from commit 23443e0574)
2023-06-06 11:59:31 +03:00
Tzach Livyatan
5da2489e0e Remove Ubuntu 18.04 support from 5.2
Ubuntu [18.04 will be soon out of standard support](https://ubuntu.com/blog/18-04-end-of-standard-support), and can be removed from 5.2 supported list
https://github.com/scylladb/scylla-pkg/issues/3346

Closes #13529

(cherry picked from commit e655060429)
2023-05-30 16:25:15 +03:00
Anna Stuchlik
d9961fc6a2 doc: add the upgrade guide from 5.2 to 5.3
Fixes https://github.com/scylladb/scylladb/issues/13288

This commit adds the upgrade guide from ScyllaDB Open Source 5.2
to 5.3.
The details of the metric update will be added with a separate commit.

Closes #13960

(cherry picked from commit 508b68377e)
2023-05-24 10:01:26 +02:00
Beni Peled
3c3621db07 release: prepare for 5.3.0-rc0
2023-05-21 10:38:17 +03:00
Botond Dénes
3b424e391b Merge 'perform_cleanup: wait until all candidates are cleaned up' from Benny Halevy
cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Time out if there is no progress within 5 minutes
to prevent hanging due to a view building bug.

Fixes #9559

Closes #13812

* github.com:scylladb/scylladb:
  table: signal compaction_manager when staging sstables become eligible for cleanup
  compaction_manager: perform_cleanup: wait until all candidates are cleaned up
  compaction_manager: perform_cleanup: perform_offstrategy if needed
  compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
  sstable_set: add for_each_sstable_gently* helpers
2023-05-19 12:35:59 +03:00
Kefu Chai
031f770557 install.sh: use scylla-jmx for detecting JRE
now that scylla-jmx has a dedicated script for detecting the existence
of OpenJDK, and this script is included in the unified package, let's
just leverage it instead of repeating it in `install.sh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13514
2023-05-19 11:22:57 +03:00
Kefu Chai
8bb1f15542 test: sstable_3_x_test: avoid using helper using generation_type::int_t
this change is one of the series which drops most of the callers
using SSTable generation as integer. as the generation of SSTable
is but an identifier, we should not use it as an integer out of
generation_type's implementation. so, in this change, instead of
using `generation_type::int_t` in the helper functions, we just
pass `generation_type` in place of integer.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13931
2023-05-19 11:21:35 +03:00
Tomasz Grabiec
05be5e969b migration_manager: Fix snapshot transfer failing if TABLETS feature is not enabled
Without the feature, the system schema doesn't have the table, and the
read will fail with:

   Transferring snapshot to ... failed with: seastar::rpc::remote_verb_error (Can't find a column family tablets in keyspace system)

We should not attempt to read tablet metadata if the experimental
feature is not enabled.

Fixes #13946
Closes #13947
2023-05-19 09:58:56 +02:00
Botond Dénes
c2aee26278 Merge 'Keep sstables garbage collection in sstable_directory' from Pavel Emelyanov
Currently temporary directories with incomplete sstables and pending deletion log are processed by distributed loader on start. That's not nice, because for s3 backed sstables this code makes no sense (and is currently a no-op because of incomplete implementation). This garbage collecting should be kept in sstable_directory where it can off-load this work onto lister component that is storage-aware.

Once the g.c. code is moved, the list of static helpers in class sstable can be cleaned up a bit.

refs: #13024
refs: #13020
refs: #12707

Closes #13767

* github.com:scylladb/scylladb:
  sstable: Toss tempdir extension usage
  sstable: Drop pending_delete_dir_basename()
  sstable: Drop is_pending_delete_dir() helper
  sstable_directory: Make garbage_collect() non-static
  sstable_directory: Move deletion log exists check
  distributed_loader: Move garbage collecting into sstable_directory
  distributed_loader: Collect garbace collecting in one call
  sstable: Coroutinize remove_temp_dir()
  sstable: Coroutinize touch_temp_dir()
  sstable: Use storage::temp_dir instead of hand-crafted path
2023-05-19 08:50:13 +03:00
Jan Ciolek
1bcb4c024c cql3/expr: print expressions in user-friendly way by default
When a CQL expression is printed, it can be done using
either the `debug` mode, or the `user` mode.

`user` mode is basically how you would expect the CQL
to be printed, it can be printed and then parsed back.

`debug` mode is more detailed, for example in `debug`
mode a column name can be displayed as
`unresolved_identifier(my_column)`, which can't
be parsed back to CQL.

The default way of printing is the `debug` mode,
but this requires us to remember to enable the `user`
mode each time we're printing a user-facing message,
for example for an invalid_request_exception.

It's cumbersome and people forget about it,
so let's change the default to `user`.

There are issues about expressions being printed
in a `strange` way, this fixes them.
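
A minimal sketch of the idea, not the actual cql3::expr code: the
printer takes a mode argument that defaults to `user`, so user-facing
messages get parseable CQL without opting in. `column_ref`,
`print_mode` and `to_string` are hypothetical names used only for
illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a CQL expression node (not the real type).
struct column_ref {
    std::string name;
};

enum class print_mode { user, debug };

// `user` is the default, so callers that build user-facing messages
// do not have to remember to ask for it.
std::string to_string(const column_ref& c, print_mode mode = print_mode::user) {
    if (mode == print_mode::debug) {
        // Detailed form that cannot be parsed back as CQL.
        return "unresolved_identifier(" + c.name + ")";
    }
    return c.name;
}
```

Callers that want the detailed form pass `print_mode::debug` explicitly.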

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13916
2023-05-18 20:57:00 +03:00
Kamil Braun
64dc76db55 test: pylib: fix read_barrier implementation
The previous implementation didn't actually do a read barrier, because
the statement failed on an early prepare/validate step which happened
before read barrier was even performed.

Change it to a statement which does not fail and doesn't perform any
schema change but requires a read barrier.

This breaks one test which uses `RandomTables.verify_schema()` when only
one node is alive, but `verify_schema` performs a read barrier. Unbreak
it by skipping the read barrier in this case (it makes sense in this
particular test).

Closes #13933
2023-05-18 18:30:11 +02:00
Kamil Braun
13df85ea11 Merge 'Cut feature_service -> system_keyspace dependency' from Pavel Emelyanov
This implicit link is pretty bad, because feature service is a low-level
one which lots of other services depend on. System keyspace is opposite
-- a high-level one that needs e.g. query processor and database to
operate. This inverse dependency is created by the feature service need
to commit enabled features' names into system keyspace on cluster join.
And it uses the qctx thing for that in a best-effort manner (not doing
anything if it's null).

The dependency can be cut. The only place where enabled features are
committed is when gossiper enables features on join or by receiving
state changes from other nodes. By that time the
sharded<system_keyspace> is up and running and can be used.

Although gossiper already has a system keyspace dependency, it's better not
to overload it with the need to mess with enabling and persisting
features. Instead, the feature_enabler instance is equipped with needed
dependencies and takes care of it. Eventually the enabler is also moved
to feature_service.cc where it naturally belongs.

Fixes: #13837

Closes #13172

* github.com:scylladb/scylladb:
  gossiper: Remove features and sysks from gossiper
  system_keyspace: De-static save_local_supported_features()
  system_keyspace: De-static load_|save_local_enabled_features()
  system_keyspace: Move enable_features_on_startup to feature_service (cont)
  system_keyspace: Move enable_features_on_startup to feature_service
  feature_service: Open-code persist_enabled_feature_info() into enabler
  gms: Move feature enabler to feature_service.cc
  gms: Move gossiper::enable_features() to feature_service::enable_features_on_join()
  gms: Persist features explicitly in features enabler
  feature_service: Make persist_enabled_feature_info() return a future
  system_keyspace: De-static load_peer_features()
  gms: Move gossiper::do_enable_features to persistent_feature_enabler::enable_features()
  gossiper: Enable features and register enabler from outside
  gms: Add feature_service and system_keyspace to feature_enabler
2023-05-18 18:21:06 +02:00
Gleb Natapov
701d6941a5 storage_proxy: raft topology: use gossiper state to populate peers table
Some state that is used to fill in 'peers' table is still propagated
over gossiper.  When moving a node into the normal state in raft
topology code use the data from the gossiper to populate peers table because
storage_service::on_change() will not do it in case the node was not in
normal state at the time it was called.

Fixes: #13911

Message-Id: <ZGYk/V1ymIeb8qMK@scylladb.com>
2023-05-18 16:00:29 +02:00
Pavel Emelyanov
5216dcb1b3 Merge 'db/system_keyspace: remove the dependency on storage_proxy' from Botond Dénes
The `system_keyspace` has several methods to query the tables in it. These currently require a storage proxy parameter, because the read has to go through storage-proxy. This PR uses the observation that all these reads are really local-replica reads and they only actually need a relatively small code snippet from storage proxy. These small code snippets are exported into standalone functions in a new header (`replica/query.hh`). Then the system keyspace code is patched to use these new standalone functions instead of their equivalent in storage proxy. This allows us to replace the storage proxy dependency with a much more reasonable dependency on `replica::database`.

This PR patches the system keyspace code and the signatures of the affected methods as well as their immediate callers. Indirect callers are only patched to the extent it was needed to avoid introducing new includes (some had only a forward-declaration of storage proxy and so couldn't get database from it). There are a lot of opportunities left to free other methods or maybe even entire subsystems from storage proxy dependency, but this is not pursued in this PR, instead being left for follow-ups.

This PR was conceived to help us break the storage proxy -> storage service -> system tables -> storage proxy dependency loop, which became a major roadblock in migrating from IP -> host_id. After this PR, system keyspace still indirectly depends on storage proxy, because it still uses `cql3::query_processor` in some places. This will be addressed in another PR.

Refs: #11870

Closes #13869

* github.com:scylladb/scylladb:
  db/system_keyspace: remove dependency on storage_proxy
  db/system_keyspace: replace storage_proxy::query*() with  replica:: equivalent
  replica: add query.hh
2023-05-18 10:53:27 +03:00
Raphael S. Carvalho
38b226f997 Resurrect optimization to avoid bloom filter checks during compaction
Commit 8c4b5e4283 introduced an optimization which only
calculates max purgeable timestamp when a tombstone satisfies the
grace period.

Commit 'repair: Get rid of the gc_grace_seconds' inverted the order,
probably under the assumption that getting grace period can be
more expensive than calculating max purgeable, as repair-mode GC
will look up into history data in order to calculate gc_before.

This caused a significant regression on tombstone heavy compactions,
where most of tombstones are still newer than grace period.
A compaction which used to take 5s, now takes 35s. 7x slower.

The reason is simple, now calculation of max purgeable happens
for every single tombstone (once for each key), even the ones that
cannot be GC'ed yet. And each calculation has to iterate through
(i.e. check the bloom filter of) every single sstable that doesn't
participate in compaction.

Flame graph makes it very clear that bloom filter is a heavy path
without the optimization:
    45.64%    45.64%  sstable_compact  sstable_compaction_test_g
        [.] utils::filter::bloom_filter::is_present

With its resurrection, the problem is gone.

This scenario can easily happen, e.g. after a deletion burst, and
tombstones becoming only GC'able after they reach upper tiers in
the LSM tree.

Before this patch, a compaction can be estimated to have this # of
filter checks:
(# of keys containing *any* tombstone) * (# of uncompacting sstable
runs[1])

[1] It's # of *runs*, as each key tends to overlap with only one
fragment of each run.

After this patch, the estimation becomes:
(# of keys containing a GC'able tombstone) * (# of uncompacting
runs).

With repair mode for tombstone GC, the assumption, that retrieval
of gc_before is more expensive than calculating max purgeable,
is kept. We can revisit it later. But the default mode, which
is the "timeout" (i.e. gc_grace_seconds) one, we still benefit
from the optimization of deferring the calculation until
needed.
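
The ordering described above can be sketched as follows. This is an
illustration under assumptions, not the compaction code itself: the
`tombstone` struct, `can_purge` and the callback are hypothetical, and
the callback stands in for the bloom-filter-heavy max purgeable
calculation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical, simplified tombstone.
struct tombstone {
    int64_t timestamp;      // write timestamp of the deletion
    int64_t deletion_time;  // wall-clock time of the deletion
};

// Cheap check first: a tombstone still inside the grace period can
// never be purged, so the expensive callback is not invoked for it.
bool can_purge(const tombstone& t, int64_t gc_before,
               const std::function<int64_t()>& max_purgeable_timestamp) {
    if (t.deletion_time >= gc_before) {
        return false;  // not GC'able yet, skip the expensive calculation
    }
    return t.timestamp < max_purgeable_timestamp();
}
```

With this order, the number of expensive calls is bounded by the
number of GC'able tombstones rather than by all tombstones.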

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13908
2023-05-18 09:01:50 +03:00
Kefu Chai
03be1f438c sstables: move get_components_lister() into sstable_directory
sstables_manager::get_component_lister() is used by sstable_directory.
and almost all the "ingredients" used to create a component lister
are located in sstable_directory. among the other things, the two
implementations of `components_lister` are located right in
`sstable_directory`. there is no need to outsource this to
sstables_manager just for accessing the system_keyspace, which is
already exposed as a public function of `sstables_manager`. so let's
move this helper into sstable_directory as a member function.

with this change, we can even go further by moving the
`components_lister` implementations into the same .cc file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13853
2023-05-18 08:43:35 +03:00
Botond Dénes
88a2421961 Merge 'Generalize global table pointer' from Pavel Emelyanov
There are several places that need to carry a pointer to a table that's shard-wide accessible -- database snapshot and truncate code and distributed loader. The database code uses `get_table_on_all_shards()` returning a vector of foreign lw-pointers, the loader code uses its own global_column_family_ptr class.

This PR generalizes both into global_table_ptr facility.

Closes #13909

* github.com:scylladb/scylladb:
  replica: Use global_table_ptr in distributed loader
  replica: Make global_table_ptr a class
  replica: Add type alias for vector of foreign lw-pointers
  replica: Put get_table_on_all_shards() to header
  replica: Rewrite get_table_on_all_shards()
2023-05-18 08:42:04 +03:00
Kefu Chai
8bcbc9a90d sstables: add an maybe_owned_by_this_shard() helper
instead of encoding the fact that we are using generation identifier
as a hint where the SSTable with this generation should be processed
at the caller sites of `as_int()`, just provide an accessor on
sstable_generation_generator's side. this helps to encapsulate the
underlying type of generation in `generation_type` instead of exposing
it to its users.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13846
2023-05-18 08:41:02 +03:00
Benny Halevy
8a7e77e0ed gossiper: is_alive: fix use-after-move if endpoint is unknown
`ep` is std::move'ed to get_endpoint_state_for_endpoint_ptr
but it's used later for logger.warn()
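
A minimal sketch of the bug class and one way to avoid it, assuming a
lookup that takes its argument by value. `is_known` and `describe` are
hypothetical stand-ins, not gossiper functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical lookup taking its argument by value, so a caller could
// be tempted to std::move() into it.
bool is_known(std::string endpoint) {
    return endpoint == "10.0.0.1";
}

// Fixed shape: `ep` is copied into the lookup instead of moved,
// because it is still needed for the message afterwards.
std::string describe(std::string ep) {
    if (!is_known(ep)) {
        return "unknown endpoint " + ep;  // `ep` is still valid here
    }
    return "known endpoint " + ep;
}
```

Had `ep` been moved into `is_known`, it would be left in a valid but
unspecified state, and the message could print an empty endpoint.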

Fixes #13921

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13920
2023-05-17 21:57:26 +03:00
Pavel Emelyanov
c3fca9481c replica: Use global_table_ptr in distributed loader
The loader has very similar global_column_family_ptr class for its
distributed loadings. Now it can use the "standard" one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 18:14:34 +03:00
Pavel Emelyanov
d7f99d031d replica: Make global_table_ptr a class
Right now all users of global_table know it's a vector and reference its
elements with this_shard_id() index. Making the global_table_ptr a class
makes it possible to stop using operator[] and "index" this_shard_id()
in its -> and * operators.
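
Roughly, the class could look like the sketch below. It is a
simplified illustration: std::shared_ptr replaces the foreign
lw-pointers, and `this_shard_id()` is a local stub backed by the
hypothetical `current_shard` variable rather than the seastar function.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct table { int id; };

// Stub for the per-shard identity; an assumption for this sketch only.
thread_local std::size_t current_shard = 0;
std::size_t this_shard_id() { return current_shard; }

class global_table_ptr {
    std::vector<std::shared_ptr<table>> _per_shard;
public:
    explicit global_table_ptr(std::vector<std::shared_ptr<table>> per_shard)
        : _per_shard(std::move(per_shard)) {}
    // Users no longer index the vector themselves; the operators pick
    // the entry that belongs to the calling shard.
    table* operator->() const { return _per_shard[this_shard_id()].get(); }
    table& operator*() const { return *_per_shard[this_shard_id()]; }
};
```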

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 18:14:34 +03:00
Pavel Emelyanov
b4a8843907 replica: Add type alias for vector of foreign lw-pointers
This is to convert the global_table_ptr into a class with a less bulky
patch later on.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 18:14:34 +03:00
Pavel Emelyanov
fffe3e4336 replica: Put get_table_on_all_shards() to header
This is to share it with distributed loader some time soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 18:14:34 +03:00
Pavel Emelyanov
f974617c79 replica: Rewrite get_table_on_all_shards()
Use sharded<database>::invoke_on_all() instead of open-coded analogy.
Also don't access database's _column_families directly, use the
find_column_family() method instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 18:14:34 +03:00
Pavel Emelyanov
ed50fda1fe sstable: Toss tempdir extension usage
The tempdir for filesystem-based sstables is {generation}.sstable one.
There are two places that need to know the ".sstable" extension -- the
tempdir creating code and the tempdir garbage-collecting code.

This patch simplifies the sstable class by patching the aforementioned
functions to use newly introduced tempdir_extension string directly,
without the help of static one-line helpers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:19:38 +03:00
Pavel Emelyanov
e8c0ae28b5 sstable: Drop pending_delete_dir_basename()
The helper is used to return const char* value of the pending delete
dir. Callers can use it directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:17:33 +03:00
Pavel Emelyanov
7792479865 sstable: Drop is_pending_delete_dir() helper
It's only used by the sstable_directory::replay_pending_delete_log()
method. The latter is only called by the sstable_directory itself with
the path being pending-delete dir for sure. So the method can be made
private and the is_pending_delete_dir() can be removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:17:32 +03:00
Pavel Emelyanov
7429205632 sstable_directory: Make garbage_collect() non-static
When non-static, the call can use sstable_directory::_sstable_dir path,
not the provided argument. The main benefit is that the method can later
be moved onto lister so that filesystem and ownership-table listers can
process dangling bits differently.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:16:23 +03:00
Pavel Emelyanov
45adf61490 sstable_directory: Move deletion log exists check
Check if the deletion log exists in the handling helper, not outside of
it. This makes next patch shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:16:23 +03:00
Pavel Emelyanov
3d7122d2fe distributed_loader: Move garbage collecting into sstable_directory
It's the directory that owns the components lister and can reason about
the way to pick up dangling bits, be it local directories or entries
from the ownership table.

First thing to do is to move the g.c. code into sstable_directory. While
at it -- convert sstring dir into fs::path dir and switch logger.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:16:23 +03:00
Pavel Emelyanov
99f924666f distributed_loader: Collect garbace collecting in one call
When the loader starts it first scans the directory for sstables'
tempdirs and pending deletion logs. Put both into one call so that it
can be moved more easily later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:16:23 +03:00
Pavel Emelyanov
22299a31c8 sstable: Coroutinize remove_temp_dir()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:16:23 +03:00
Pavel Emelyanov
9db5e9f77f sstable: Coroutinize touch_temp_dir()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:15:38 +03:00
Pavel Emelyanov
7e506354fd sstable: Use storage::temp_dir instead of hand-crafted path
When opening an sstable on filesystem it's first created in a temporary
directory whose path is saved in storage::temp_dir variable. However,
the opening method constructs the path by hand. Fix that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:14:04 +03:00
Anna Stuchlik
6f4a68175b doc: fix the links to the Enterprise docs
Fixes https://github.com/scylladb/scylladb/issues/13915

This commit fixes broken links to the Enterprise docs.
They are links to the enterprise branch, which is not
published. The links to the Enterprise docs should include
"stable" instead of the branch name.

This commit must be backported to branch-5.2, because
the broken links are present in the published 5.2 docs.

Closes #13917
2023-05-17 13:56:21 +03:00
Benny Halevy
bb59687116 table: signal compaction_manager when staging sstables become eligible for cleanup
perform_cleanup may be waiting for those sstables
to become eligible for cleanup so signal it
when table::move_sstables_from_staging detects an
sstable that requires cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-17 11:33:22 +03:00
Benny Halevy
a5a8020ecd compaction_manager: perform_cleanup: wait until all candidates are cleaned up
cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Time out if there is no progress within 5 minutes
to prevent hanging due to a view building bug.
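
The retry policy can be sketched as below. This is only an
illustration: `wait_for_cleanup` and its parameters are hypothetical,
one loop iteration stands for one poll (one second in the description
above), and the sleep itself is omitted.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Polls `remaining` until it reports zero candidates. Throws if the
// count does not decrease for `timeout_polls` consecutive polls.
void wait_for_cleanup(const std::function<int()>& remaining, int timeout_polls) {
    int last = remaining();
    int idle = 0;
    while (last > 0) {
        // Real code would sleep between polls here.
        int now = remaining();
        if (now < last) {
            idle = 0;  // progress was made, restart the timeout
        } else if (++idle >= timeout_polls) {
            throw std::runtime_error("cleanup made no progress");
        }
        last = now;
    }
}
```

With one poll per second, `timeout_polls = 300` would correspond to
the 5 minute limit.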

Fixes #9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-17 11:31:07 +03:00
Benny Halevy
be4e23437f compaction_manager: perform_cleanup: perform_offstrategy if needed
It is possible that cleanup will be executed
right after repair-based node operations,
in which case we have a 5-minute timer
before off-strategy compaction is started.

After marking the sstables that need cleanup,
perform offstrategy compaction, if needed.
This will implicitly cleanup those sstables
as part of offstrategy compaction, before
they are even passed for view update (if the table
has views/secondary index).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-17 11:31:07 +03:00
Benny Halevy
53fbf9dd32 compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
Scan all sstables to determine which of them
requires cleanup before calling perform_task_on_all_files.

This allows for cheaper no-op return when
no sstable was identified as requiring cleanup,
and also it will allow triggering offstrategy
compaction if needed, after selecting the sstables
for cleanup, in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-17 11:31:07 +03:00
Benny Halevy
ff7c9c661d sstable_set: add for_each_sstable_gently* helpers
Currently callers of `for_each_sstable` need to
use a seastar thread to allow preemption
in the for_each_sstable loop.

Provide for_each_sstable_gently and
for_each_sstable_gently_until to make using this
facility from a coroutine easier, without requiring
a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-17 11:31:07 +03:00
Kefu Chai
6cd745fd8b build: cmake: add missing test
string_format_test was added in 1b5d5205c8,
so let's add it to CMake building system as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13912
2023-05-17 09:51:51 +03:00
Raphael S. Carvalho
5544d12f18 compaction: avoid excessive reallocation and during input list formatting
with off-strategy, input list size can be close to 1k, which will
lead to unneeded reallocations when formatting the list for
logging.

in the past, we faced stalls in this area, and excessive reallocation
(log2 ~1k = ~10) may have contributed to that.
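
A sketch of the general technique, assuming the list is formatted into
a std::string; `format_input_list` is a hypothetical helper and not
the function touched by this patch. With geometric growth, appending
~1k entries without reserving triggers on the order of log2(1k)
reallocations, while reserving up front needs a single allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins names with ", " after reserving enough room for the output.
std::string format_input_list(const std::vector<std::string>& names) {
    std::size_t total = 0;
    for (const auto& n : names) {
        total += n.size() + 2;  // name plus separator (slight overestimate)
    }
    std::string out;
    out.reserve(total);  // one allocation instead of repeated regrowth
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i != 0) {
            out += ", ";
        }
        out += names[i];
    }
    return out;
}
```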

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13907
2023-05-17 09:40:06 +03:00
Benny Halevy
302a89488a test: sstable_3_x_test: add test_compression_premature_eof
Reproduces #13599 and verifies the fix.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13903
2023-05-17 09:00:44 +03:00
Gleb Natapov
605e53e617 do not report raft as enabled before group0 is configured
Currently we may start to receive requests before group0 is configured
during boot.  If that happens those requests may try to pull schema and
issue raft read barrier which will crash the system because group0 is
not yet available. Work around it by pretending that raft is disabled in
this case and using the non-raft procedure. The proper fix should make sure
that storage proxy verbs are registered only after group0 is fully
functional.

Message-Id: <ZGOZkXC/MsiWtNGu@scylladb.com>
2023-05-17 01:06:42 +02:00
Michał Chojnowski
9b0679c140 range_tombstone_change_generator: fix an edge case in flush()
range_tombstone_change_generator::flush() mishandles the case when two range
tombstones are adjacent and flush(pos, end_of_range=true) is called with pos
equal to the end bound of the lesser-position range tombstone.

In such case, the start change of the greater-position rtc will be accidentally
emitted, and there won't be an end change, which breaks reader assumptions by
ending the stream with an unclosed range tombstone, triggering an assertion.

This is due to a non-strict inequality used in a place where strict inequality
should be used. The modified line was intended to close range tombstones
which end exactly on the flush position, but this is unnecessary because such
range tombstones are handled by the last `if` in the function anyway.
Instead, this line caused range tombstones beginning right after the flush
position to be emitted sometimes.

Fixes #12462

Closes #13906
2023-05-16 17:54:08 +02:00
Nadav Har'El
24c3cbcb0b Merge 'Improve verbosity of test/pylib/minio.py' from Pavel Emelyanov
CI once failed due to mc being unable to configure minio server. There are currently no clues why it could happen, so let's increase the minio.py verbosity a bit.

refs: #13896

Closes #13901

* github.com:scylladb/scylladb:
  test,minio: Run mc with --debug option
  test,minio: Log mc operations to log file
2023-05-16 18:04:36 +03:00
Nadav Har'El
52e4edfd5e Merge 'cql: update permissions when creating/altering a function/keyspace' from Wojciech Mitros
Currently, when a user creates a function or a keyspace, no
permissions on functions are updated.
Instead, the user should gain all permissions on the function
that they created, or on all functions in the keyspace they have
created. This is also the behavior in Cassandra.

However, if the user is granted permissions on a function after
performing a CREATE OR REPLACE statement, they may
actually only alter the function but still gain permissions to it
as a result of the approach above, which requires another
workaround added to this series.

Lastly, as of right now, when a user is altering a function, they
need both CREATE and ALTER permissions, which is incompatible
with Cassandra - instead, only the ALTER permission should be
required.

This series fixes the mentioned issues, and the tests are already
present in the auth_roles_test dtest.

Fixes #13747

Closes #13814

* github.com:scylladb/scylladb:
  cql: adjust tests to the updated permissions on functions
  cql: fix authorization when altering a function
  cql: grant permissions on functions when creating a keyspace/function
  cql: pass a reference to query processor in grant_permissions_to_creator
  test_permissions: make tests pass on cassandra
2023-05-16 18:04:35 +03:00
Avi Kivity
d2d53fc1db Merge 'Do not yield while traversing the gossiper endpoint state map' from Benny Halevy
This series introduces a new gossiper method: get_endpoints that returns a vector of endpoints (by value) based on the endpoint state map.

get_endpoints is used here by gossiper and storage_service for iterations that may preempt
instead of iterating directly over the endpoint state map (`_endpoint_state_map` in gossiper or via `get_endpoint_states()`), so as to prevent a use-after-free that may happen if the map is rehashed while the function yields, invalidating the loop iterators.
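
The snapshot approach can be sketched with standard containers. This
is an illustration under assumptions: the map type and
`for_each_endpoint` are hypothetical, and the callback inserting into
the map stands in for whatever may run while the real loop yields.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using endpoint_state_map = std::unordered_map<std::string, int>;

// Copies the keys out by value; the result stays valid across rehashes.
std::vector<std::string> get_endpoints(const endpoint_state_map& m) {
    std::vector<std::string> eps;
    eps.reserve(m.size());
    for (const auto& kv : m) {
        eps.push_back(kv.first);
    }
    return eps;
}

// Iterates over the snapshot, so `f` may insert into the map without
// invalidating the loop. Entries erased in the meantime are skipped.
template <typename Func>
void for_each_endpoint(endpoint_state_map& m, Func f) {
    for (const auto& ep : get_endpoints(m)) {
        if (m.count(ep) != 0) {
            f(ep);
        }
    }
}
```

The cost is one copy of the endpoint list per traversal, which buys
safety against iterator invalidation.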

Fixes #13899

Closes #13900

* github.com:scylladb/scylladb:
  storage_service: do not preempt while traversing endpoint_state_map
  gossiper: do not preempt while traversing endpoint_state_map
2023-05-16 18:04:35 +03:00
Botond Dénes
3ea521d21b Update tools/jmx submodule
* tools/jmx f176bcd1...1fd23b60 (1):
  > select-java: query java version using -XshowSettings
2023-05-16 18:04:35 +03:00
Kamil Braun
5a8e2153a0 Merge 'Fix heart_beat_state::force_highest_possible_version_unsafe' from Benny Halevy
It turns out that numeric_limits defines an implicit implementation
for std::numeric_limits<utils::tagged_integer<Tag, ValueType>>
which apparently returns a default-constructed tagged_integer
for min() and max(), and this broke
`gms::heart_beat_state::force_highest_possible_version_unsafe()`
since [gms: heart_beat_state: use generation_type and version_type](4cdad8bc8b)
(merged in [Merge 'gms: define and use generation and version types'...](7f04d8231d))

Implementing min/max correctly
Fixes #13801

Closes #13880

* github.com:scylladb/scylladb:
  storage_service: handle_state_normal: on_internal_error on "owns no tokens"
  utils: tagged_integer: implement std::numeric_limits::{min,max}
  test: add tagged_integer_test
2023-05-16 13:59:41 +02:00
Wojciech Mitros
6bc16047ba rust: update wasmtime dependency
The previous version of wasmtime had a vulnerability that possibly
allowed causing undefined behavior when calling UDFs.

We're directly updating to wasmtime 8.0.1, because the update only
requires a slight code modification and the Wasm UDF feature is
still experimental. As a result, we'll benefit from a number of
new optimizations.

Fixes #13807

Closes #13804
2023-05-16 13:03:29 +03:00
Pavel Emelyanov
29fffaa160 schema_tables: Use sharded<database>& variable
The auto& db = proxy.local().get_db() is called a few lines above this
patch, so the &db can be reused for invoke_on_all() call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13896
2023-05-16 12:57:47 +03:00
Benny Halevy
1da0b0ff76 storage_service: do not preempt while traversing endpoint_state_map
The map iterators might be invalidated while yielding
on insert if the map is rehashed.
See https://en.cppreference.com/w/cpp/container/unordered_map/insert

Refs #13899

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-16 12:24:44 +03:00
Benny Halevy
ba13056eba gossiper: do not preempt while traversing endpoint_state_map
The map iterators might be invalidated while yielding
on insert if the map is rehashed.
See https://en.cppreference.com/w/cpp/container/unordered_map/insert

Refs #13899

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-16 12:24:42 +03:00
Pavel Emelyanov
01628ae8c1 test,minio: Run mc with --debug option
With that if mc fails we'll (hopefully) get some meaningful information
about why it happened.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:16:15 +03:00
Pavel Emelyanov
4041c2f30d test,minio: Log mc operations to log file
Currently everything minio.py does goes to test.py log, while mc (and
minio) output go to another log file. That's inconvenient, better to
keep minio.py's messages in minio log file.

Also, while at it, print a message if local alias drop fails (it's
benign failure, but it's good to have the note anyway).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:14:49 +03:00
Kefu Chai
67dae95f58 build: cmake: add Scylla_USE_LINKER option
this option allows the user to use a specified linker instead of the
default one. this is more flexible than adding more linker
candidates to the known linkers.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13874
2023-05-16 11:30:18 +03:00
Tzach Livyatan
a73fde6888 Update Azure recommended instances type from the Lsv2-series to the Lsv3-series
Closes #13835
2023-05-16 10:58:19 +03:00
Avi Kivity
3c54d5ec5e test: string_format_test: don't compare std::string with sstring
For unknown reasons, clang 16 rejects equality comparison
(operator==) where the left-hand-side is an std::string and the
right-hand-side is an sstring. gcc and older clang versions first
convert the left-hand-side to an sstring and then call the symmetric
equality operator.

I was able to hack sstring to support this asymmetric comparison,
but the solution is quite convoluted, and it may be that it's clang
at fault here. So instead this patch eliminates the three cases where
it happened. With it applied, we can build with clang 16.

Closes #13893
2023-05-16 08:56:16 +03:00
Kefu Chai
b112a3b78a api: storage_service: use string for generation
in this change, the type of the "generation" field of "sstable" in the
return value of RESTful API entry point at
"/storage_service/sstable_info" is changed from "long" to "string".

this change depends on the corresponding change on tools/jmx submodule,
so we have to include the submodule change in this very commit.

this API is used by our JMX exporter, which in turn exposes the
SSTable information via the "StorageService.getSSTableInfo" mBean
operation, which returns the retrieved SSTable info as a list of
CompositeData. and "generation" is a field of an element in the
CompositeData. in general, the scylla JMX exporter is consumed
by the nodetool, which prints out returned SSTable info list with
a pretty formatted table, see
tools/java/src/java/org/apache/cassandra/tools/nodetool/SSTableInfo.java.
the nodetool's formatter is not aware of the schema or type of the
SSTables to be printed, neither does it enforce the type -- it just
tries its best to pretty-print them as a table.

But the fields in CompositeData are typed, when the scylla JMX exporter
translates the returned SSTables from the RESTful API, it sets the
typed fields of every `SSTableInfo` when constructing `PerTableSSTableInfo`.
So, we should be consistent on the type of "generation" field on both
the JMX and the RESTful API sides. because we package the same version
of scylla-jmx and nodetool in the same precompiled tarball, and enforce
the dependencies on exactly same version when shipping deb and rpm
packages, we should be safe when it comes to interoperability of
scylla-jmx and scylla. also, as explained above, nodetool does not care
about the typing, so it is not a problem on nodetool's front.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13834
2023-05-15 20:33:48 +03:00
Botond Dénes
646396a879 mutation/mutation_partition: append_clustered_row(): use on_internal_error()
Instead of simply throwing an exception. With just the exception, it is
impossible to find out what went wrong, as this API is very generic and
is used in a variety of places. The backtrace printed by
`on_internal_error()` will help zero in on the problem.

Fixes: #13876

Closes #13883
2023-05-15 20:31:44 +03:00
Calle Wilund
469e710caa docs: Add initial doc on commitlog segment file format
Refs #12849

Just a few lines on the file format of segments.

Closes #13848
2023-05-15 16:22:44 +03:00
Benny Halevy
502b5522ca storage_service: handle_state_normal: on_internal_error on "owns no tokens"
Although this condition should not happen,
we suspect that certain timing conditions might
lead to this state, where the node in handle_state_normal
(possibly during shutdown) has no tokens.

Currently we call on_internal_error_noexcept, so
if abort_on_internal_error is false, we will just
print an error and continue on with handle_state_normal.

Change that to `on_internal_error` so as to throw an
exception in production in this unexpected state.

Refs #13801

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-15 12:49:17 +03:00
Anna Stuchlik
84ed95f86f doc: add OS support for version 2023.1
Fixes https://github.com/scylladb/scylladb/issues/13857

This commit adds the OS support for ScyllaDB Enterprise 2023.1.
The support is the same as for ScyllaDB Open Source 5.2, on which
2023.1 is based.

After this commit is merged, it must be backported to branch-5.2.
In this way, it will be merged to branch-2023.1 and available in
the docs for Enterprise 2023.1

Closes: #13858
2023-05-15 10:51:53 +03:00
Alejo Sanchez
19687b54f1 test/pytest: yaml configuration cluster section
Separate cluster_size into a cluster section and specify this value as
initial_size.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13440
2023-05-15 09:48:39 +02:00
Benny Halevy
a70b53b6e7 utils: tagged_integer: implement std::numeric_limits::{min,max}
Also add a respective unit test.

It turns out that numeric_limits defines an implicit implementation
for std::numeric_limits<utils::tagged_integer<Tag, ValueType>>
which apparently returns a default-constructed tagged_integer
for min() and max(), and this broke
`gms::heart_beat_state::force_highest_possible_version_unsafe()`
since 4cdad8bc8b
(merged in 7f04d8231d)

Implementing min/max correctly
Fixes #13801
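
A minimal sketch of the problem and the fix, assuming a much simplified `tagged_integer` (the class below, `version_tag` and `untagged_example` are hypothetical and only illustrate the mechanism; the real class has more members, and the real specialization may differ):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

namespace sketch {

// Simplified stand-in for utils::tagged_integer.
template <typename Tag, typename ValueType>
class tagged_integer {
    ValueType _value{};
public:
    constexpr tagged_integer() = default;
    constexpr explicit tagged_integer(ValueType v) noexcept : _value(v) {}
    constexpr ValueType value() const noexcept { return _value; }
};

// A type without a numeric_limits specialization, for comparison.
struct untagged_example { int v = 0; };

struct version_tag {};
using version_type = tagged_integer<version_tag, int32_t>;

} // namespace sketch

namespace std {

// Without this specialization the primary template is used, whose
// min()/max() return a value-initialized T (here: a zero value).
template <typename Tag, typename ValueType>
struct numeric_limits<sketch::tagged_integer<Tag, ValueType>> : public numeric_limits<ValueType> {
    using T = sketch::tagged_integer<Tag, ValueType>;
    // Forward to the limits of the underlying integer type.
    static constexpr T min() noexcept { return T(numeric_limits<ValueType>::min()); }
    static constexpr T max() noexcept { return T(numeric_limits<ValueType>::max()); }
};

} // namespace std
```

Only `min()` and `max()` are wrapped here; the other members inherited from `numeric_limits<ValueType>` still return the raw integer type.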

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-15 10:19:39 +03:00
Botond Dénes
0cff0ffa08 Merge 'alternator,config: make alternator_timeout_in_ms live-updateable' from Kefu Chai
before this change, alternator_timeout_in_ms is not live-updatable,
as after setting executor's default timeout right before creating
sharded executor instances, they never get updated with this option
anymore. but many users would like to set the driver timers based on
server timers. we need to enable them to configure timeout even
when the server is still running.

in this change,

* `alternator_timeout_in_ms` is marked as live-updateable
* `executor::_s_default_timeout` is changed to a thread_local variable,
   so it can be updated by a per-shard updateable_value. and
   it is now a updateable_value, so its variable name is updated
   accordingly. this value is set in the ctor of executor, and
   it is disconnected from the corresponding named_value<> option
   in the dtor of executor.
* alternator_timeout_in_ms is passed to the constructor of
   executor via sharded_parameter, so `executor::_timeout_in_ms` can
   be initialized on per-shard basis
* `executor::set_default_timeout()` is dropped, as we already pass
   the option to executor in its ctor.

Fixes #12232

Closes #13300

* github.com:scylladb/scylladb:
  alternator: split the param list of executor ctor into multi lines
  alternator,config: make alternator_timeout_in_ms live-updateable
2023-05-15 10:16:29 +03:00
Botond Dénes
6c27297406 Merge 'test: sstable_*test: use generator to create new generations' from Kefu Chai
in this series, instead of hardwiring to integer, we switch to the generation generator for creating new generations. this should help us migrate to a generation identifier which can also be represented by a UUID, and can potentially help improve the testing coverage once we switch over to a UUID-based generation identifier. we will need to parameterize these tests by then, for sure.

Closes #13863

* github.com:scylladb/scylladb:
  test: sstable: use generator to generate generations
  test: sstable: pass generation_type in helper functions
  test: sstable: use generator to generate generations
2023-05-15 10:04:30 +03:00
Botond Dénes
3256afe263 Update tools/jmx submodule
* tools/jmx 5f988945...f176bcd1 (1):
  > sstableinfo: change the type of generation to string

Refs: #13834
2023-05-15 09:59:40 +03:00
Asias He
93c93c69f9 repair: Add per peer node error for get_sync_boundary and friends
It is useful to know which node has the error. For example, when a node
has a corrupted sstable, with this patch, repair master node can tell
which node has the corrupted sstable.

```
WARN  2023-05-15 10:54:50,213 [shard 0] repair -
repair[2df49b2c-219d-411d-87c6-2eae7073ba61]: get_combined_row_hash: got
error from node=127.0.0.2, keyspace=ks2a, table=tb,
range=(8992118519279586742,9031388867920791714],
error=seastar::rpc::remote_verb_error (some error)
```

Fixes #13881

Closes #13882
2023-05-15 09:52:27 +03:00
Pavel Emelyanov
07b7e9faf1 load-meter: Remove unused get_load_string
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13873
2023-05-15 09:21:08 +03:00
Piotr Dulikowski
760651b4ad error injection: allow enabling injections via config
Currently, error injections can be enabled either through HTTP or CQL.
While these mechanisms are effective for injecting errors after a node
has already started, it can't be reliably used to trigger failures
shortly after node start. In order to support this use case, this commit
adds possibility to enable some error injections via config.

A configuration option `error_injections_at_startup` is added. This
option uses our existing configuration framework, so it is possible to
supply it either via CLI or in the YAML configuration file.

- When passed in commandline, the option is parsed as a
  semicolon-separated list of error injection names that should be
  enabled. Those error injections are enabled in non-oneshot mode.

  The CLI option is marked as not used in release mode and does not
  appear in the option list.

  Example:

      --error-injections-at-startup failure_point1;failure_point2

- When provided in YAML config, the option is parsed as a list of items.
  Each item is either a string or a map of parameters. This method is
  more flexible as it allows to provide parameters for each injection
  point. At this time, the only benefit is that it allows enabling
  points in oneshot mode, but more parameters can be added in the future
  if needed.

  Explanatory example:

      error_injections_at_startup:
      - failure_point1 # enabled in non-oneshot mode
      - name: failure_point2 # enabled in oneshot mode
        one_shot: true       # due to one_shot optional parameter
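
The command-line form described above can be sketched as follows. This is an illustration under assumptions: `injection_at_startup` and `parse_cli_injections` are hypothetical names, and skipping empty items is a choice of this sketch, not something the commit states.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical description of one injection enabled at startup.
struct injection_at_startup {
    std::string name;
    bool one_shot = false;
};

// Splits a semicolon-separated list of injection names.
// Every injection from the command line is enabled in non-oneshot mode.
std::vector<injection_at_startup> parse_cli_injections(std::string_view arg) {
    std::vector<injection_at_startup> result;
    while (!arg.empty()) {
        const auto pos = arg.find(';');
        const auto token = arg.substr(0, pos);
        if (!token.empty()) {
            result.push_back(injection_at_startup{std::string(token), false});
        }
        if (pos == std::string_view::npos) {
            break;
        }
        arg.remove_prefix(pos + 1);
    }
    return result;
}
```

The YAML form would fill the same structure, with `one_shot` taken from the optional parameter of each map item.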

The primary goal of this feature is to facilitate testing of raft-based
cluster features. An error injection will be used to enable an
additional feature to simulate node upgrade.

Tests: manual

Closes #13861
2023-05-15 09:14:07 +03:00
Botond Dénes
1b04fc1425 Merge 'Use member initializer list for trace_state and related helper classes' from Pavel Emelyanov
Constructors of trace_state class initialize most of the fields in constructor body with the help of non-inline helper method. It's possible and is better to initialize as much as possible with initializer lists.

Closes #13871

* github.com:scylladb/scylladb:
  tracing: List-initialize trace_state::_records
  tracing: List-initialize trace_state::_props
  tracing: List-initialize trace_state::_slow_query_threshold
  tracing: Reorder trace_state fields initialization
  tracing: Remove init_session_records()
  tracing: List-initialize one_session_records::ttl
  tracing: List-initialize one_session_records
  tracing: List-initialize session_record
2023-05-15 09:06:14 +03:00
Botond Dénes
20ff122a84 Merge 'Delete S3 sstables without the help of deletion log' from Pavel Emelyanov
There are two layers of sstables deletion -- delete-atomically and wipe. The former is in fact the "API" method, it's called by table code when the specific sstable(s) are no longer needed. It's called "atomically" because it's expected to fail in the middle in a safe manner so that subsequent boot would pick the dangling parts and proceed. The latter is a low-level removal function that can fail in the middle, but it's not of _its_ care.

Currently the atomic deletion is implemented with the help of sstable_directory::delete_atomically() method that commits sstable file names into the deletion log, then calls wipe (indirectly), then drops the deletion log. On boot all found deletion logs are replayed. The described functionality is used regardless of the sstable storage type, even for S3, though the deletion log is an overkill for S3, where it would be better implemented with the help of the ownership table. In fact, S3 storage already implements atomic deletion in its wipe method thus being overly careful.
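
The log-based scheme described above can be sketched with an in-memory model. Everything here is hypothetical (`fake_storage`, the `max_files` crash simulation and the function names are not the real sstable_directory code); it only shows why a replay on boot can finish an interrupted deletion.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// In-memory stand-in for a storage holding files and one deletion log.
struct fake_storage {
    std::set<std::string> files;
    std::optional<std::vector<std::string>> deletion_log;
};

// Low-level removal. It may stop in the middle, which is simulated by
// removing at most max_files entries.
void wipe(fake_storage& s, const std::vector<std::string>& names, std::size_t max_files) {
    std::size_t removed = 0;
    for (const auto& n : names) {
        if (removed == max_files) {
            return; // simulated crash
        }
        s.files.erase(n);
        ++removed;
    }
}

// Returns true if the deletion completed, false if it "crashed" midway.
bool delete_atomically(fake_storage& s, const std::vector<std::string>& names, std::size_t max_files) {
    s.deletion_log = names;        // 1. commit the names into the log
    wipe(s, names, max_files);     // 2. remove the files
    if (max_files < names.size()) {
        return false;              // crashed before the log was dropped
    }
    s.deletion_log.reset();        // 3. drop the log
    return true;
}

// On boot, a surviving log means a deletion was interrupted: finish it.
void replay_on_boot(fake_storage& s) {
    if (s.deletion_log) {
        wipe(s, *s.deletion_log, s.deletion_log->size());
        s.deletion_log.reset();
    }
}
```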

So this PR
- makes atomic deletion be storage-specific
- makes S3 wipe non-atomic

fixes: #13016
note: Replaying sstables deletion from ownership table on boot is not here, see #13024

Closes #13562

* github.com:scylladb/scylladb:
  sstables: Implement atomic deleter for s3 storage
  sstables: Get atomic deleter from underlying storage
  sstables: Move delete_atomically to manager and rename
2023-05-15 08:57:47 +03:00
Benny Halevy
1b5d5205c8 test: add tagged_integer_test
Add basic test for tagged+integer arithmetic operations.

Remove const qualifier from `tagged_integer::operator[+-]=`
as these are add/sub-assign operators that need to modify
the value in place.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-14 23:26:58 +03:00
Wojciech Mitros
96e912e1cf auth: disallow CREATE permission on a specific function
Similarly to how we handle Roles and Tables, we do not
allow permissions on non-existent objects, so the CREATE
permission on a specific function is meaningless, because
for the permission to be granted to someone, the function
must be already created.
This patch removes the CREATE permission from the set of
permissions applicable to a specific function.

Fixes #13822

Closes #13824
2023-05-14 18:40:34 +03:00
Wojciech Mitros
1e18731a69 cql-pytest: translate Cassandra's UFTypesTest
This is a translation of Cassandra's CQL unit test source file
validation/entities/UFTypesTest.java into our cql-pytest framework.

There are 7 tests, which reproduce one known bug:
Refs #13746: UDF can only be used in SELECT, and abort when used in WHERE, or in INSERT/UPDATE/DELETE commands

And uncovered two previously unknown bugs:

Refs #13855: UDF with a non-frozen collection parameter cannot be called on a frozen value
Refs #13860: A non-frozen collection returned by a UDF cannot be used as a frozen one

Additionally, we encountered an issue that can be treated as either a bug or a hole in documentation:

Refs #13866: Argument and return types in UDFs can be frozen

Closes #13867
2023-05-14 15:22:03 +03:00
Avi Kivity
31e820e5a1 Merge 'Allow tombstone GC in compaction to be disabled on user request' from Raphael "Raph" Carvalho
Adding new APIs /column_family/tombstone_gc and /storage_service/tombstone_gc, that will allow for disabling tombstone garbage collection (GC) in compaction.

Mimics existing APIs /column_family/autocompaction and /storage_service/autocompaction.

column_family variant must specify a single table only, following existing convention.

whereas the storage_service one can specify an entire keyspace, or a subset of tables in a keyspace.

column_family API usage
-----

```
    The table name must be in keyspace:name format

    Get status:
    curl -s -X GET "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

    Enable GC
    curl -s -X POST "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

    Disable GC
    curl -s -X DELETE "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"
```

storage_service API usage
-----

```
    Tables can be specified using a comma-separated list.

    Enable GC on keyspace
    curl -s -X POST "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

    Disable GC on keyspace
    curl -s -X DELETE "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

    Enable GC on a subset of tables
    curl -s -X POST
    "http://127.0.0.1:10000/storage_service/tombstone_gc/ks?cf=table1,table2"
```
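
For callers that build these requests programmatically, the paths and methods shown in the curl examples can be sketched as below. The helper names and the `gc_request` enum are hypothetical, and no URL escaping is done, so the sketch assumes plain keyspace and table names. The examples above show GET only for the column_family variant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class gc_request { status, enable, disable };

// HTTP method per request kind, following the curl examples.
inline const char* http_method(gc_request r) {
    switch (r) {
    case gc_request::status:  return "GET";
    case gc_request::enable:  return "POST";
    case gc_request::disable: return "DELETE";
    }
    return "GET"; // not reached for valid enumerators
}

// The column_family variant addresses a single table as keyspace:name.
inline std::string column_family_gc_path(const std::string& ks, const std::string& table) {
    return "/column_family/tombstone_gc/" + ks + ":" + table;
}

// The storage_service variant addresses a keyspace, optionally narrowed
// to a comma-separated subset of tables via the cf parameter.
inline std::string storage_service_gc_path(const std::string& ks,
                                           const std::vector<std::string>& tables) {
    std::string path = "/storage_service/tombstone_gc/" + ks;
    for (std::size_t i = 0; i < tables.size(); ++i) {
        path += (i == 0 ? "?cf=" : ",");
        path += tables[i];
    }
    return path;
}
```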

Closes #13793

* github.com:scylladb/scylladb:
  test: Test new API for disabling tombstone GC
  test: rest_api: extract common testing code into generic functions
  Add API to disable tombstone GC in compaction
  api: storage_service: restore indentation
  api: storage_service: extract code to set attribute for a set of tables
  tests: Test new option for disabling tombstone GC in compaction
  compaction_strategy: bypass tombstone compaction if tombstone GC is disabled
  table: Allow tombstone GC in compaction to be disabled on user request
2023-05-14 14:16:16 +03:00
Tomasz Grabiec
a91e83fad6 Merge "issue raft read barrier before pulling schema" from Gleb
Schema pull may fail because the pull does not contain everything that
is needed to instantiate a schema pointer. For instance it does not
contain a keyspace. This series changes the code to issue raft read
barrier before the pull which will guarantee that the keyspace is created
before the actual schema pull is performed.
2023-05-14 14:14:24 +03:00
Raphael S. Carvalho
a7ceb987f5 test: Fix sporadic failures of database_test
database_test is failing sporadically and the cause was traced back
to commit e3e7c3c7e5.

The commit forces a subset of tests in database_test, to run once
for each of predefined x_log2_compaction_group settings.

That causes two problems:
1) test becomes 240% slower in dev mode.
2) queries on system.auth are timing out, and the reason is a small
table being spread across hundreds of compaction groups in each
shard. so to satisfy a range scan, there will be multiple hops,
making the overhead huge. additionally, the compaction group
aware sstable set is not merged yet. so even point queries will
unnecessarily scan through all the groups.

Fixes #13660.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13851
2023-05-14 14:14:24 +03:00
Avi Kivity
97694d26c4 Merge 'reader_permit: minor improvements to resource consume/release safety' from Botond Dénes
This PR contains some small improvements to the safety of consuming/releasing resources to/from the semaphore:
* reader_permit: make the low-level `consume()/signal()` API private, making the only user (an RAII class) friend.
* reader_resources: split `reset()` into `noexcept` and potentially throwing variant.
* reader_resources::reset_to(): try harder to avoid calling `consume()` (when the new resource amount is smaller than the previous one)

Closes #13678

* github.com:scylladb/scylladb:
  reader_permit: resource_units::reset_to(): try harder to avoid calling consume()
  reader_permit: split resource_units::reset()
  reader_permit: make consume()/signal() API private
2023-05-14 14:14:23 +03:00
Avi Kivity
5d6f31df8e Merge 'Coroutinize sstable::read_toc()' from Pavel Emelyanov
It consists of two parts -- call for do_read_simple() with lambda and handling of its results. PR coroutinizes it in two steps for review simplicity -- first the lambda, then the outer caller. Then restores indentation.

Closes #13862

* github.com:scylladb/scylladb:
  sstables: Restore indentation after previous patches
  sstables: Coroutinuze read_toc() outer part
  sstables: Coroutinuze read_toc() inner part
2023-05-14 14:14:23 +03:00
Avi Kivity
0a78995e2b Merge 'Share s3 clients between sstables' from Pavel Emelyanov
Currently s3::client is created for each sstable::storage. It's later shared between sstable's files and upload sink(s). Also foreign_sstable_open_info can produce a file from a handle making a new standalone client. Coupled with the seastar's http client spawning connections on demand, this makes it impossible to control the amount of opened connections to object storage server.

In order to put some policy on top of that (as well as apply workload prioritization) s3 clients should be collected in one place and then shared by users. Since s3::client uses seastar::http::client under the hood which, in turn, can generate many connections on demand, it's enough to produce a single s3::client per configured endpoint on each shard and then share it between all the sstables, files and sinks.

There's one difficulty however, solving which is most of what this PR does. The file handle, that's used to transfer sstable's file across shards, should keep aboard all it needs to re-create a file on another shard. Since there's a single s3::client per shard, creation of a file out of a handle should grab that shard's client somehow. The meaningful shard-local object that can help is the sstables_manager and there are three ways to make use of it. All deal with the fact that sstables_manager-s are not sharded<> services, but are owned by the database independently on each shard.

1. walk the client -> sst.manager -> database -> container -> database -> sst.manager -> client chain by keeping its first half on the handle and unrolling the second half to produce a file
2. keep sharded peering service referenced by the sstables_manager that's initialized in main and passed through the database constructor down to sstables_manager(s)
3. equip file_handle::to_file with the "context" argument and teach sstables foreign info opener to push sstables_manager down to s3 file ... somehow

This PR chooses the 2nd way and introduces the sstables::storage_manager main-local sharded peering service that maintains all the s3::clients. "While at it" the new manager gets the object_storage_config updating facilities from the database (it's overloaded even without it already). Later the manager will also be in charge of collecting and exporting S3 metrics. In order to limit the number of S3 connections it also needs a patch to seastar http::client, there's a PR already doing that, once (if) merged there'll come one more fix on top.

refs: #13458
refs: #13369
refs: scylladb/seastar#1652

Closes #13859

* github.com:scylladb/scylladb:
  s3: Pick client from manager via handle
  s3: Generalize s3 file handle
  s3: Live-update clients' configs
  sstables: Keep clients shared across sstables
  storage_manager: Rewrap config map
  sstables, database: Move object storage config maintenance onto storage_manager
  sstables: Introduce sharded<storage_manager>
2023-05-14 14:14:23 +03:00
Pavel Emelyanov
8bca54902c sstables: Implement atomic deleter for s3 storage
The existing storage::wipe() method of s3 is in fact atomic deleter --
it commits "deleting" status into ownership table, deletes the objects
from server, then removes the entry from ownership table. So the atomic
deleter does the same and the .wipe() just removes the objects, because
it's not supposed to be atomic.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 17:52:13 +03:00
Pavel Emelyanov
6a8139a4fe sstables: Get atomic deleter from underlying storage
While the driver isn't known without the sstable itself, we have a
vector of them and can get it from the front element. This is not very
generic, but fortunately all sstables here belong to the same table and,
respectively, to the same storage and even prefix. The latter is also
assert-checked by the sstable_directory atomic deleter code.

For now S3 storage returns the same directory-based deleter, but next
patch will change that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 17:52:13 +03:00
Pavel Emelyanov
5985f00da9 sstables: Move delete_atomically to manager and rename
This is to let manager decide which storage driver to call for atomic
sstables deletion in the next patch. While at it -- rename the
sstable_directory's method into something more descriptive (to make
compiler catch all callers of it).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 17:52:12 +03:00
Raphael S. Carvalho
107999c990 test: Test new API for disabling tombstone GC
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:34:38 -03:00
Raphael S. Carvalho
c396db2e4c test: rest_api: extract common testing code into generic functions
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:34:38 -03:00
Raphael S. Carvalho
abc1eae1c2 Add API to disable tombstone GC in compaction
Adding new APIs /column_family/tombstone_gc and
/storage_service/tombstone_gc.

Mimics existing APIs /column_family/autocompaction and
/storage_service/autocompaction.

column_family variant must specify a single table only,
following existing convention.

whereas the storage_service one can specify an entire
keyspace, or a subset of tables in a keyspace.

column_family API usage
-----

The table name must be in keyspace:name format

Get status:
curl -s -X GET "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

Enable GC
curl -s -X POST "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

Disable GC
curl -s -X DELETE "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

storage_service API usage
-----

Tables can be specified using a comma-separated list.

Enable GC on keyspace
curl -s -X POST "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

Disable GC on keyspace
curl -s -X DELETE "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

Enable GC on a subset of tables
curl -s -X POST
"http://127.0.0.1:10000/storage_service/tombstone_gc/ks?cf=table1,table2"

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:34:38 -03:00
Raphael S. Carvalho
07104393af api: storage_service: restore indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:34:36 -03:00
Raphael S. Carvalho
501b5a9408 api: storage_service: extract code to set attribute for a set of tables
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:33:50 -03:00
Pavel Emelyanov
d58bc9a797 tracing: List-initialize trace_state::_records
This field needs to call trace_state::ttl_by_type() which, in turn,
looks into _props. The latter should have been initialized already

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 16:15:58 +03:00
Pavel Emelyanov
5aebbedaba tracing: List-initialize trace_state::_props
It takes props from constructor args and tunes them according to the
constructing "flavor" -- primary or secondary state. Adding two static
helpers code-document the intent and make list-initialization possible

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 16:14:32 +03:00
Raphael S. Carvalho
6c32148751 tests: Test new option for disabling tombstone GC in compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:14:28 -03:00
Raphael S. Carvalho
777af7df44 compaction_strategy: bypass tombstone compaction if tombstone GC is disabled
compaction strategies know how to pick files that are most likely to
satisfy tombstone purge conditions (i.e. not shadow data in uncompacting
files).

This logic can be bypassed if tombstone GC was disabled by user,
as it's a waste of effort to proceed with it until re-enabled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:14:28 -03:00
Raphael S. Carvalho
3b28c26c77 table: Allow tombstone GC in compaction to be disabled on user request
If tombstone GC was disabled, compaction will ensure that fully expired
sstables won't be bypassed and that no expired tombstones will be
purged. Changing the value takes immediate effect even on ongoing
compactions.

Not wired into an API yet.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:14:28 -03:00
Pavel Emelyanov
e7978dbf98 tracing: List-initialize trace_state::_slow_query_threshold
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 16:14:15 +03:00
Pavel Emelyanov
3ebbc25cec tracing: Reorder trace_state fields initialization
The instance ptr and props have to be set up early, because other
members' initialization depends on them. It's currently OK, because
other members are initialized in the constructor body, but moving them
into initializer list would require correct ordering
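
The ordering constraint comes from the C++ rule that non-static data members are initialized in declaration order, not in the order they appear in the initializer list. A minimal sketch with hypothetical names (`mini_trace_state`, `props` and the TTL values are illustrative only):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the trace properties.
struct props {
    bool full_tracing = false;
};

class mini_trace_state {
    // Declaration order is initialization order: _props is declared first
    // because the initializer of _ttl reads it.
    props _props;
    std::chrono::seconds _ttl;

    // Helper that depends on already-initialized properties.
    static std::chrono::seconds ttl_for(const props& p) {
        return p.full_tracing ? std::chrono::seconds(86400) : std::chrono::seconds(3600);
    }

public:
    explicit mini_trace_state(props p)
        : _props(p)
        , _ttl(ttl_for(_props)) // safe only because _props is declared above _ttl
    {}

    std::chrono::seconds ttl() const { return _ttl; }
};
```

If `_ttl` were declared before `_props`, its initializer would read `_props` before it was initialized, which is why the fields are reordered before moving initialization out of the constructor body.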

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 16:13:13 +03:00
Pavel Emelyanov
16e1315eef tracing: Remove init_session_records()
It now does nothing but wraps make_lw_shared<one_session_records>()
call. Callers can do it on their own thus facilitating further
list-initialization patching

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 16:11:18 +03:00
Pavel Emelyanov
dd87adadf3 tracing: List-initialize one_session_records::ttl
For that to happen the value evaluation is moved from the
init_session_records() into a private trace_state helper as it checks
the props values initialized earlier

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 16:09:05 +03:00
Pavel Emelyanov
b63084237c tracing: List-initialize one_session_records
This touches session_id, parent_id and my_span_id fields

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 16:07:24 +03:00
Pavel Emelyanov
944b98f261 tracing: List-initialize session_record
This object is constructed via one_session_records thus the latter needs
to pass some arguments along

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 16:04:01 +03:00
Botond Dénes
157fdb2f6d db/system_keyspace: remove dependency on storage_proxy
The methods that take storage_proxy as argument can now accept a
replica::database instead. So update their signatures and update all
callers. With that, system_keyspace.* no longer depends on storage_proxy
directly.
2023-05-12 07:27:55 -04:00
Botond Dénes
f4f757af23 db/system_keyspace: replace storage_proxy::query*() with replica:: equivalent
Use the recently introduced replica side query utility functions to
query the content of the system tables. This allows us to cut the
dependency of the system keyspace on storage proxy.
The methods still take storage proxy parameter, this will be replaced
with replica::database in the next patch.
There is still one hidden storage proxy dependency left, via
cql3::query_processor. This will be addressed later.
2023-05-12 07:27:55 -04:00
Botond Dénes
f5d41ac88c replica: add query.hh
Containing utility methods to query data from the local replica.
Intended to be used to read system tables, completely bypassing storage
proxy in the process.
This duplicates some code already found in storage proxy, but that is a
small price to pay, to be able to break some circular dependencies
involving storage proxy, that have been plaguing us since time
immemorial.
One thing we lose with this, is the smp service level used in storage
proxy. If this becomes a problem, we can create one in database and use
it in these methods too.
Another thing we lose is increasing `replica_cross_shard_ops` storage
proxy stat. I think this is not a problem at all as these new functions
are meant to be used by internal users, which will reduce the internal
noise in this metric, which is meant to indicate users not using
shard-aware clients.
2023-05-12 07:26:18 -04:00
Wojciech Mitros
d50f048279 cql: adjust tests to the updated permissions on functions
As a result of the preceding patches, permissions on a function
are now granted to its creator. As a result, some permissions may
appear which we did not expect before.

In the test_udf_permissions_serialization, we create a function
as the superuser, and as a result, when we compare the permissions
we specifically granted to the ones read from the LIST PERMISSIONS
result, we get more than expected - this is fixed by granting
permissions explicitly to a new user and only checking this user's
permissions list.

In the test_grant_revoke_udf_permissions case, we test whether
the DROP permission is enforced on a function that we have previously
created as the same user - as a result we have the DROP permission
even without granting it directly. We fix this by testing the DROP
permission on a function created by a different user.

In the test_grant_revoke_alter_udf_permissions case, we previously
tested that we require both ALTER and CREATE permissions when executing
a CREATE OR REPLACE FUNCTION statement. The new permissions required
for this statement now depend on whether we actually CREATE or REPLACE
a function, so now we test that the ALTER permission is required when
REPLACING a function, and the CREATE permission is required when
CREATING a function. After the changes, the case no longer needs to
be artificially extracted from the previous one, so they are merged
now. Analogous adjustments are made in the test case
test_grant_revoke_alter_uda_permissions.
2023-05-12 10:56:29 +02:00
Wojciech Mitros
8abed6445a cql: fix authorization when altering a function
Currently, when a user is altering a function, they need
both CREATE and ALTER permissions, instead of just ALTER.
Additionally, after altering a function, the user is
treated as an owner of this function, gaining all access
permissions to it.

This patch fixes these 2 issues, by checking only the ALTER
permission when actually altering, and by not modifying
user's permissions if the user did not actually create
the function.
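
The rule for CREATE OR REPLACE FUNCTION after this fix can be summarized in a small sketch. The enum and function names are hypothetical and do not mirror the actual statement classes:

```cpp
#include <cassert>

enum class permission { create, alter };

// Permission checked for CREATE OR REPLACE FUNCTION, depending on
// whether the statement replaces an existing function or creates one.
constexpr permission required_permission(bool function_exists) {
    return function_exists ? permission::alter : permission::create;
}

// Only an actual creation makes the user the creator of the function,
// so only then are the creator's permissions granted.
constexpr bool grants_creator_permissions(bool function_exists) {
    return !function_exists;
}
```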
2023-05-12 10:56:29 +02:00
Wojciech Mitros
1d099644d4 cql: grant permissions on functions when creating a keyspace/function
When a user creates a function, they should have all permissions on
this function.
Similarly, when a user creates a keyspace, they should have all
permissions on functions in the keyspace.
This patch introduces GRANTs on the missing permissions.
2023-05-12 10:56:29 +02:00
Wojciech Mitros
dd20621d71 cql: pass a reference to query processor in grant_permissions_to_creator
In the following patch, the grant_permissions_to_creator method is going
to be also used to grant permissions on a newly created function. The
function resource may contain user-defined types which need the
query processor to be prepared, so we add a reference to it in advance
in this patch for easier review.
2023-05-12 10:56:29 +02:00
Wojciech Mitros
f4d2cd15e9 test_permissions: make tests pass on cassandra
Despite the cql-pytests being intended to pass on both Scylla and
Cassandra, the test_permissions.py case was actually failing on
Cassandra in a few cases. The most common issue was a different
exception type returned by Scylla and Cassandra for an invalid
query. This was fixed by accepting 2 types of exceptions when
necessary.
The second issue was java UDF code that did not compile, which was
fixed simply by debugging the code.
The last issue was a case that was scylla_only with no good reason.
The missing java UDFs were added to that case, and the test was
adjusted so that the ALTER permission was only checked in a
CREATE OR REPLACE statement only if the UDF was already existing -
- Scylla requires it in both cases, which will get resolved in the
next patch.
2023-05-12 10:50:12 +02:00
Kefu Chai
e89e0d4b28 test: sstable: use generator to generate generations
instead of assuming the integer-based generation id, let's use
the generation generator for creating a new generation id. this
helps us to improve the testing coverage once we migrate to the
UUID-based generation identifier.

this change uses generator to generate generations for
`make_sstable_for_all_shards()`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-12 13:22:32 +08:00
Kefu Chai
e3d6dd46b7 test: sstable: pass generation_type in helper functions
always avoid using generation_type if possible. this helps us to
hide the underlying type of generation identifier, which could also
be a UUID in future.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-12 13:22:32 +08:00
Kefu Chai
e788bfbb43 test: sstable: use generator to generate generations
instead of assuming the integer-based generation id, let's use
the generation generator for creating a new generation id. this
helps us to improve the testing coverage once we migrate to the
UUID-based generation identifier.

this change uses generator to create generations for
`make_sstable_for_this_shard()`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-12 13:22:30 +08:00
Pavel Emelyanov
613acba5d0 s3: Pick client from manager via handle
Add the global-factory onto the client that is

- cross-shard copyable
- generates a client from local storage_manager by given endpoint

With that the s3 file handle is fixed and also picks up shared s3
clients from the storage manager instead of creating its own one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:01 +03:00
Pavel Emelyanov
8ed9716f59 s3: Generalize s3 file handle
Currently the s3 file handle tries to carry client's info via explicit
host name and endpoint config pointer. This is buggy, the latter pointer
is shard-local and cannot be transferred across shards.

This patch prepares the fix by abstracting the client handle part.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:01 +03:00
Pavel Emelyanov
63ff6744d8 s3: Live-update clients' configs
Now when the client is accessible directly via the storage_manager, when
the latter is requested to update its endpoint config, it can kick the
client to do the same.

The latter, in turn, can only update the AWS creds info for now. The
endpoint port and https usage are immutable for now.

Also, updating the endpoint address is not possible, but for another
reason -- the endpoint itself is the part of keyspace configuration and
updating one in the object_storage.yaml will have no effect on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:01 +03:00
Pavel Emelyanov
e6760482b2 sstables: Keep clients shared across sstables
Nowadays each sstable gets its own instance of an s3::client. This patch
keeps clients on storage_manager's endpoints map and when creating a
storage for an sstable -- grab the shared pointer from the map, thus
making one client serve all sstables over there (except for those that
duplicated their files with the help of foreign-info, but that's to be
handled by next patches).

Moving the ownership of a client to the storage_manager level also means
that the client has to be closed on manager's stop, not on sstable
destroy.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:01 +03:00
Pavel Emelyanov
743f26040f storage_manager: Rewrap config map
Now the map is endpoint -> config_ptr. Wrap the config_ptr into an
s3_endpoint struct. Next patch will keep the client on this new wrapper
struct thus making them shared between sstables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:01 +03:00
Pavel Emelyanov
a59096aa70 sstables, database: Move object storage config maintenance onto storage_manager
Right now the map<endpoint, config> sits on the sstables manager and its
update is governed by database (because it's peering and can kick other
shards to update it as well).

Having the sharded<storage_manager> at hand frees the database from
the need to update configs and keeps sstables_manager a bit smaller.
Also this will allow keeping s3 clients shared between sstables via this
map by next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:00 +03:00
Pavel Emelyanov
2153751d45 sstables: Introduce sharded<storage_manager>
The manager in question keeps track of whatever sstables_manager needs
to work with the storage (spoiler: only S3 one). It's main-local sharded
peering service, so that container() call can be used by next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:36:01 +03:00
Pavel Emelyanov
d7af178f20 sstables: Restore indentation after previous patches
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 18:40:24 +03:00
Pavel Emelyanov
54e892caf1 sstables: Coroutinuze read_toc() outer part
It just needs to catch the system_error of ENOENT and re-throw it as
malformed_sstable_exception.

Indentation is deliberately left broken. Again.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 18:40:14 +03:00
Pavel Emelyanov
1eb3ae2256 sstables: Coroutinuze read_toc() inner part
One non-trivial change is the removal of buf temporary variable. That's
because it existed under the same name in the .then() lambda generating
name conflict after coroutinization.

Other than that it's pretty straightforward.

Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 18:40:07 +03:00
Gleb Natapov
7caf1d26fb migration manager: Make schema pull abortable.
Now that a schema pull may issue a raft read barrier, it may get stuck if
a majority is not available. Make the operation abortable and abort it
during queries if the timeout is reached.
2023-05-11 16:31:23 +03:00
Gleb Natapov
091ec285fe serialized_action: make serialized_action abortable
Add an ability to abort waiting for a result of a specific trigger()
invocation.
2023-05-11 16:31:23 +03:00
Asias He
7fcc403122 tombstone_gc: Fix gc_before for immediate mode
The immediate mode is similar to timeout mode with gc_grace_seconds
zero. Thus, the gc_before returned should be the query_time instead of
gc_clock::time_point::max in immediate mode.

With gc_before set to gc_clock::time_point::max, a row could be dropped
by compaction even if its TTL has not expired yet.
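
A minimal sketch of the intended behavior, not the actual tombstone_gc
code. It assumes a simplified purge rule (data is purgeable when its
expiry/deletion time is earlier than gc_before) and uses
std::chrono::system_clock as a stand-in for gc_clock. The names
`tombstone_gc_mode`, `gc_before()` and `can_purge()` are hypothetical:

```cpp
#include <cassert>
#include <chrono>

// Stand-in for the real gc_clock (assumption made for this sketch).
using gc_clock = std::chrono::system_clock;

enum class tombstone_gc_mode { timeout, immediate };

// Immediate mode behaves like timeout mode with gc_grace_seconds == 0,
// so it returns query_time rather than time_point::max().
inline gc_clock::time_point gc_before(tombstone_gc_mode mode,
                                      gc_clock::time_point query_time,
                                      std::chrono::seconds gc_grace) {
    switch (mode) {
    case tombstone_gc_mode::timeout:
        return query_time - gc_grace;
    case tombstone_gc_mode::immediate:
        return query_time;
    }
    return query_time;
}

// Simplified purge rule assumed here: purge only what expired before gc_before.
inline bool can_purge(gc_clock::time_point expiry, gc_clock::time_point before) {
    return expiry < before;
}
```

With gc_before equal to query_time, a cell whose TTL ends in the future
is kept; with time_point::max() every cell compares as purgeable.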

The following procedure reproduces the issue:

- Start 2 nodes

- Insert data

```
CREATE KEYSPACE ks2a WITH REPLICATION = { 'class' : 'SimpleStrategy',
'replication_factor' : 2 };
CREATE TABLE ks2a.tb (pk int, ck int, c0 text, c1 text, c2 text, PRIMARY
KEY(pk, ck)) WITH tombstone_gc = {'mode': 'immediate'};
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (10 ,1, 'x', 'y', 'z')
USING TTL 1000000;
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (20 ,1, 'x', 'y', 'z')
USING TTL 1000000;
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (30 ,1, 'x', 'y', 'z')
USING TTL 1000000;
```

- Run nodetool flush and nodetool compact

- Compaction drops all data

```
~128 total partitions merged to 0.
```

Fixes #13572

Closes #13800
2023-05-11 15:10:00 +03:00
Botond Dénes
3d75158fda Merge 'Allow no owned token ranges in cleanup compaction' from Benny Halevy
It is possible that a node will have no owned token ranges
in some keyspaces based on their replication strategy,
if the strategy is configured to have no replicas in
this node's data center.

In this case we should go ahead with cleanup that will
effectively delete all data.

Note that this is currently very inefficient as we need
to filter every partition and drop it as unowned.
It can be optimized by either special casing this case
or, better, use skip forward to the next owned range.
This will skip to end-of-stream since there are no
owned ranges.

Fixes #13634

Also, add a respective rest_api unit test

Closes #13849

* github.com:scylladb/scylladb:
  test: rest_api: test_storage_service: add test_storage_service_keyspace_cleanup_with_no_owned_ranges
  compaction_manager: perform_cleanup: handle empty owned ranges
2023-05-11 15:05:06 +03:00
Gleb Natapov
70189b60de migration manager: if raft is enables sync with group0 leader before pulling a schema which is not available locally
Schema pull may fail because the pull does not contain everything that
is needed to instantiate a schema pointer. For instance it does not
contain a keyspace. This patch changes the code to issue raft read
barrier before the pull, which guarantees that the keyspace is created
before the actual schema pull is performed.

Refs: #3760
Fixes: #13211
2023-05-11 13:28:54 +03:00
Gleb Natapov
d4417442e9 service: raft_group0_client: add using_raft function
Make it easy to check if raft is enabled.
2023-05-11 13:27:58 +03:00
Anna Stuchlik
7f7ab3ae3e doc: fix the broken Glossary link
Fixes https://github.com/scylladb/scylladb/issues/13805

This commit fixes the redirection required by moving the Glossary
page from the top of the page tree to the Reference section.

As the change was only merged to master (not to branch-5.2),
it is not working for version 5.2, which is now the latest stable
version.
For this reason, "stable" in the path must be replaced with "master".

Closes #13847
2023-05-11 10:30:59 +03:00
Botond Dénes
24cb351655 Merge 'test: sstable_*test: avoid using helper using generation_type::int_t ' from Kefu Chai
the series drops some of the callers using SSTable generation as integer. as the generation of SSTable is but an identifier, we should not use it as an integer out of generation_type's implementation.

Closes #13845

* github.com:scylladb/scylladb:
  test: drop unused helper functions
  test: sstable_mutation_test: avoid using helper using generation_type::int_t
  test: sstable_move_test: avoid using helper using generation_type::int_t
  test: sstable_*test: avoid using helper using generation_type::int_t
  test: sstable_3_x_test: do not use reuseable_sst() accepting integer
2023-05-11 10:17:02 +03:00
Benny Halevy
2fc142279f compaction_manager: perform_cleanup: hold on to sstable_set around yielding
Updates to the compaction_group sstable sets are
never done in place.  Instead, the update is done
on a mutable copy of the sstable set, and the lw_shared
result is set back in the compaction_group.
(see for example compaction_group::set_main_sstables)

Therefore, there's currently a risk in the perform_cleanup
`get_sstables` lambda that if it yields while in
set.for_each_sstable, the sstable_set might be replaced
and the copy it is traversing may be destroyed.
This was introduced in c2bf0e0b72.

To prevent that, hold on to set.shared_from_this()
around set.for_each_sstable.
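
The pattern can be sketched as follows. This is an illustration only:
it uses std::shared_ptr in place of seastar's lw_shared_ptr, models the
yield as a plain callback, and the types below are simplified
stand-ins, not the real compaction_group or sstable_set:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for sstable_set (assumption for this sketch).
struct sstable_set : std::enable_shared_from_this<sstable_set> {
    std::vector<int> sstables;
    template <typename Func>
    void for_each_sstable(Func&& f) const {
        for (int s : sstables) {
            f(s);
        }
    }
};

// Updates are never done in place: the whole set is replaced.
struct compaction_group {
    std::shared_ptr<sstable_set> main_sstables = std::make_shared<sstable_set>();
    void set_main_sstables(std::shared_ptr<sstable_set> s) {
        main_sstables = std::move(s);
    }
};

// yield_point models a preemption point at which the set may be replaced.
std::vector<int> get_sstables(compaction_group& cg,
                              const std::function<void()>& yield_point) {
    std::vector<int> out;
    sstable_set& set = *cg.main_sstables;
    // Hold a reference so the set being traversed outlives a replacement.
    auto holder = set.shared_from_this();
    set.for_each_sstable([&](int s) {
        out.push_back(s);
        yield_point();
    });
    return out;
}
```

Without `holder`, replacing the set inside `yield_point` would destroy
the object that `for_each_sstable` is still iterating over.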

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13852
2023-05-11 09:46:53 +03:00
Benny Halevy
0b91bfbcc5 test: rest_api: test_storage_service: add test_storage_service_keyspace_cleanup_with_no_owned_ranges
Test cleanup on a keyspace after altering
its replication factor to 0.
Expect no sstables to remain.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-11 08:16:31 +03:00
Kefu Chai
29284d64a5 test: drop unused helper functions
all users of these two helpers have switched to their alternatives,
so there is no need to keep them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-11 12:32:37 +08:00
Kefu Chai
b036d2b50c test: sstable_mutation_test: avoid using helper using generation_type::int_t
this change is one of the series which drops most of the callers
using SSTable generation as integer. as the generation of SSTable
is but an identifier, we should not use it as an integer out of
generation_type's implementation. so, in this change, instead of
using `generation_type::int_t` in the helper functions, we just
pass `generation_type` in place of integer. also, since
`generate_clustered()` is only used by functions in the same
compilation unit, let's take the opportunity to mark it `static`.
and there is no need to pass generation as a template parameter,
we just pass it as a regular parameter.

we will divert other callers of `reusable_sst(...,
generation_type::int)` in following-up changes in different ways.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-11 12:32:22 +08:00
Kefu Chai
689e1e99d6 test: sstable_move_test: avoid using helper using generation_type::int_t
this change is one of the series which drops most of the callers
using SSTable generation as integer. as the generation of SSTable
is but an identifier, we should not use it as an integer out of
generation_type's implementation. so, in this change, instead of
using `generation_type::int_t` in helper functions, we just use
`generation_type`. please note, despite that we'd prefer generating
the generations using generator, the SSTables used by the tests
modified by this change are stored in the repo, to ensure that the
tests are always able to find the SSTable files, we keep them
unchanged instead of using generation_generator, or a random
generation for the testing.

we will divert other callers of `reusable_sst(...,
generation_type::int)` in following-up changes in different ways.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-11 12:32:22 +08:00
Kefu Chai
bfd6caffbb test: sstable_*test: avoid using helper using generation_type::int_t
this change is one of the series which drops most of the callers
using SSTable generation as integer. as the generation of SSTable
is but an identifier, we should not use it as an integer out of
generation_type's implementation. so, in this change, instead of
using the helper accepting int, we switch to the one which accepts
generation_type by offering a default parameter, which is a
generation created using 1. this preserves the existing behavior.

we will divert other callers of `reusable_sst(...,
generation_type::int)` in following-up changes in different ways.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-11 12:32:22 +08:00
Kefu Chai
ab8efbf1ab test: sstable_3_x_test: do not use reuseable_sst() accepting integer
this change is one of the series which drops most of the callers
using SSTable generation as integer. as the generation of SSTable
is but an identifier, we should not use it as an integer out of
generation_type's implementation. so, in this change, instead of
using the helper accepting int, we switch to the one which accepts
generation_type.

also, as no callers are using the last parameter of `make_test_sstable()`,
let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-11 12:32:21 +08:00
Nadav Har'El
f1cad230bb Merge 'cql: enable setting permissions on resources with quoted UDT names' from Wojciech Mitros
This series fixes an issue with altering permissions on UDFs with
parameter types that are UDTs with quoted names and adds
a test for it.

The issue was caused by the format of the temporary string
that represented the UDT in `auth::resource`. After parsing the
user input to a raw type, we created a string representing the
UDT using `ut_name::to_string()`. The segment of the resulting
string that represented the name of the UDT was not quoted,
making us unable to parse it again when the UDT was being
`prepare`d. Other than for this purpose, the `ut_name::to_string()`
is used only for logging, so the solution was modifying it to
maybe quote the UDT name.

Ref: https://github.com/scylladb/scylladb/pull/12869

Closes #13257

* github.com:scylladb/scylladb:
  cql-pytest: test permissions for UDTs with quoted names
  cql: maybe quote user type name in ut_name::to_string()
  cql: add a check for currently used stack in parser
  cql-pytest: add an optional name parameter to new_type()
2023-05-10 19:10:29 +03:00
Wojciech Mitros
1f45c7364c cql: check permissions for used functions when creating a UDA
Currently, when creating a UDA, we only check for permissions
for creating functions. However, the creator gains all permissions
to the UDA, including the EXECUTE permission. This enables the
user to also execute the state/reduce/final functions that were
used in the UDA, even if they don't have the EXECUTE permissions
on them.

This patch adds checks for the missing EXECUTE permissions, so
that the UDA can be only created if the user has all required
permissions.

The new permissions that are now required when creating a UDA
are now granted in the existing UDA test.

Fixes #13818

Closes #13819
2023-05-10 18:06:04 +03:00
Wojciech Mitros
a86b9fa0bb auth: fix formatting of function resource with no arguments
Currently, when a function has no arguments, the function_args()
method, which is supposed to return a vector of string_views
representing the arguments of the function, returns a nullopt
instead, as if it was a functions_resource on all functions
or all functions in a keyspace. As a result, the functions_resource
can't be properly formatted.
This is fixed in this patch by returning an empty vector instead,
and the fix is confirmed in a cql-pytest.
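
The distinction can be illustrated with a small sketch. The struct and
the output format below are hypothetical and much simpler than the real
auth::resource; only the nullopt-versus-empty-vector idea is taken from
this patch:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a functions resource.
struct functions_resource {
    std::string keyspace;
    // nullopt: the resource covers all functions in the keyspace.
    std::optional<std::string> function_name;
    // nullopt: no specific function; empty vector: a function with no arguments.
    std::optional<std::vector<std::string>> args;
};

// Formats the resource; the output format is made up for illustration.
std::string format_resource(const functions_resource& r) {
    if (!r.function_name || !r.args) {
        return "<all functions in " + r.keyspace + ">";
    }
    std::string out = r.keyspace + "." + *r.function_name + "(";
    for (std::size_t i = 0; i < r.args->size(); ++i) {
        if (i != 0) {
            out += ", ";
        }
        out += (*r.args)[i];
    }
    return out + ")";
}
```

Returning an empty vector for a zero-argument function keeps it
distinguishable from the "all functions" case, so it formats as a
concrete function.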

Fixes #13842

Closes #13844
2023-05-10 17:07:33 +03:00
Benny Halevy
3771d48488 sstables: mx: validate: close consumer context
data_consume_rows keeps an input_stream member that must be closed.
In particular, on the error path, when we destroy it possibly
with readaheads in flight.

Fixes #13836

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13840
2023-05-10 17:05:43 +03:00
Benny Halevy
c720754e37 compaction_manager: perform_cleanup: handle empty owned ranges
It is possible that a node will have no owned token ranges
in some keyspaces based on their replication strategy,
if the strategy is configured to have no replicas in
this node's data center.

In this case we should go ahead with cleanup that will
effectively delete all data.

Note that this is currently very inefficient as we need
to filter every partition and drop it as unowned.
It can be optimized by either special casing this case
or, better, use skip forward to the next owned range.
This will skip to end-of-stream since there are no
owned ranges.

Fixes #13634

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-10 15:11:53 +03:00
Avi Kivity
171a6cbbaa cql3: untyped_result_set: document performance characteristics
untyped_result_set is optimized towards convenience and safety,
so note that.

Closes #13661
2023-05-10 15:03:12 +03:00
Nadav Har'El
e57252092c Merge 'cql3: result_set, selector: change value type to managed_bytes_opt' from Avi Kivity
CQL evolved several expression evaluation mechanisms: WHERE clause,
selectors (the SELECT clause), and the LWT IF clause are just some
examples. Most now use expressions, which use managed_bytes_opt
as the underlying value representation, but selectors still use bytes_opt.

This poses two problems:
1. bytes_opt generates large contiguous allocations when used with large blobs, impacting latency
2. trying to use expressions with bytes_opt will incur a copy, reducing performance

To solve the problem, we harmonize the data types to managed_bytes_opt
(#13216 notwithstanding). This is somewhat difficult since the source of the values
are views into a bytes_ostream. However, luckily bytes_ostream and managed_bytes_view
are mostly compatible so with a little effort this can be done.

The series is neutral wrt performance:

before:
```
222118.61 tps ( 61.1 allocs/op,  12.1 tasks/op,   43092 insns/op,        0 errors)
224250.14 tps ( 61.1 allocs/op,  12.1 tasks/op,   43094 insns/op,        0 errors)
224115.66 tps ( 61.1 allocs/op,  12.1 tasks/op,   43092 insns/op,        0 errors)
223508.70 tps ( 61.1 allocs/op,  12.1 tasks/op,   43107 insns/op,        0 errors)
223498.04 tps ( 61.1 allocs/op,  12.1 tasks/op,   43087 insns/op,        0 errors)
```

after:
```
220708.37 tps ( 61.1 allocs/op,  12.1 tasks/op,   43118 insns/op,        0 errors)
225168.99 tps ( 61.1 allocs/op,  12.1 tasks/op,   43081 insns/op,        0 errors)
222406.00 tps ( 61.1 allocs/op,  12.1 tasks/op,   43088 insns/op,        0 errors)
224608.27 tps ( 61.1 allocs/op,  12.1 tasks/op,   43102 insns/op,        0 errors)
225458.32 tps ( 61.1 allocs/op,  12.1 tasks/op,   43098 insns/op,        0 errors)
```

Though I expect with some more effort we can eliminate some copies.

Closes #13637

* github.com:scylladb/scylladb:
  cql3: untyped_result_set: switch to managed_bytes_view as the cell type
  cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
  cql3: untyped_result_set: always own data
  types: abstract_type: add mixed-type versions of compare() and equal()
  utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view
  utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt
  utils: managed_bytes: add managed_bytes_view::with_linearized()
  utils: managed_bytes: mark managed_bytes_view::is_linearized() const
2023-05-10 15:01:45 +03:00
Wojciech Mitros
9ae1b02144 service: revoke permissions on functions when a function/keyspace is dropped
Currently, when a user has permissions on a function/all functions in
keyspace, and the function/keyspace is dropped, the user keeps the
permissions. As a result, when a new function/keyspace is created
with the same name (and signature), they will be able to use it even
if no permissions on it are granted to them.

Similarly to regular UDFs, the same applies to UDAs.

After this patch, the corresponding permissions on functions are dropped
when a function/keyspace is dropped.

Fixes #13820

Closes #13823
2023-05-10 14:39:42 +03:00
Botond Dénes
bb62038119 Merge 'Scrub compaction task' from Aleksandra Martyniuk
Task manager's tasks covering scrub compaction on top,
shard and table level.

For these levels we have common scrub tasks for each scrub
mode since they share code. Scrub modes will be differentiated
on compaction group level.

Closes #13694

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to test scrub compaction
  compaction: add table_scrub_sstables_compaction_task_impl
  compaction: add shard_scrub_sstables_compaction_task_impl
  compaction: add scrub_sstables_compaction_task_impl
  api: get rid of unnecessary std::optional in scrub
  compaction: rename rewrite_sstables_compaction_task_impl
2023-05-10 14:18:20 +03:00
Anna Stuchlik
4898a20ae9 doc: add troubleshooting for failed schema sync
Fixes https://github.com/scylladb/scylladb/issues/12133

This commit adds a Troubleshooting article to support
users when schema sync failed on their cluster.

Closes #13709
2023-05-10 14:01:36 +03:00
Avi Kivity
1a3545b13d Merge 'data_dictionary: define helpers in options and define == operator only' from Kefu Chai
in this series, `data_dictionary::storage_options` is refactored so that each dedicated storage option takes care of itself, instead of putting all the logic into `storage_options`. cleaner this way. as the next step, i will add yet another set of options for the tiered_storage which is backed by the s3_storage and the local filesystem_storage. with this change, we will be able to group the per-option functionalities together by the option they are designed for, instead of sharding them by the actual function.

Closes #13826

* github.com:scylladb/scylladb:
  data_dictionary: define helpers in options
  data_dictionary: only define operator== for storage options
2023-05-10 12:59:57 +03:00
Avi Kivity
e252dbcfb8 Merge ' readers,mutation: move mutation_fragment_stream_validator to mutation/' from Botond Dénes
The validator classes have their definition in a header located in mutation/, while their implementation is located in a .cc in readers/mutation_reader.cc.
This PR fixes this inconsistency by moving the implementation into mutation/mutation_fragment_stream_validator.cc. The only change is that the validator code gets a new logger instance (but the logger variable itself is left unchanged for now).

Closes #13831

* github.com:scylladb/scylladb:
  mutation/mutation_fragment_stream_validator.cc: rename logger
  readers,mutation: move mutation_fragment_stream_validator to mutation/
2023-05-10 12:54:53 +03:00
Kamil Braun
7d9ab44e81 Merge 'token_metadata: read remapping for write_both_read_new' from Gusev Petr
When new nodes are added or existing nodes are deleted, the topology
state machine needs to shunt reads from the old nodes to the new ones.
This happens in the `write_both_read_new` state. The problem is that
previously this state was not handled in any way in `token_metadata` and
the read nodes were only changed when the topology state machine reached
the final 'owned' state.

To handle `write_both_read_new` an additional `interval_map` inside
`token_metadata` is maintained similar to `pending_endpoints`.  It maps
the ranges affected by the ongoing topology change operation to replicas
which should be used for reading. When topology state sm reaches the
point when it needs to switch reads to a new topology, it passes
`request_read_new=true` in a call to `update_pending_ranges`. This
forces `update_pending_ranges` to compute the ranges based on new
topology and store them to the `interval_map`. On the data plane, when a
read on coordinator needs to decide which endpoints to use, it first
consults this `interval_map` in `token_metadata`, and only if it doesn't
contain a range for current token it uses normal endpoints from
`effective_replication_map`.
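
The read-path lookup described above can be sketched roughly as
follows. This is an assumption-laden illustration: it uses a std::map
keyed by range start instead of an `interval_map`, integer tokens, and
string endpoint names. `read_remap` and this free-standing
`endpoints_for_reading()` are hypothetical helpers, not the actual
`token_metadata` API:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using token = std::int64_t;
using endpoints = std::vector<std::string>;

// Ranges affected by the ongoing topology change, mapped to read replicas.
// Key: inclusive range start; value: (exclusive range end, replicas).
struct read_remap {
    std::map<token, std::pair<token, endpoints>> ranges;

    std::optional<endpoints> find(token t) const {
        auto it = ranges.upper_bound(t);
        if (it == ranges.begin()) {
            return std::nullopt;
        }
        --it;
        if (t < it->second.first) {
            return it->second.second;
        }
        return std::nullopt;
    }
};

// Consult the remap first; fall back to the normal endpoints otherwise.
endpoints endpoints_for_reading(const read_remap& remap,
                                const endpoints& normal, token t) {
    if (auto remapped = remap.find(t)) {
        return *remapped;
    }
    return normal;
}
```

Tokens outside any remapped range keep using the normal endpoints, so
reads unaffected by the topology change behave as before.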

Closes #13376

* github.com:scylladb/scylladb:
  storage_proxy, storage_service: use new read endpoints
  storage_proxy: rename get_live_sorted_endpoints->get_endpoints_for_reading
  token_metadata: add unit test for endpoints_for_reading
  token_metadata: add endpoints for reading
  sequenced_set: add extract_set method
  token_metadata_impl: extract maybe_migration_endpoints helper function
  token_metadata_impl: introduce migration_info
  token_metadata_impl: refactor update_pending_ranges
  token_metadata: add unit tests
  token_metadata: fix indentation
  token_metadata_impl: return unique_ptr from clone functions
2023-05-10 10:03:30 +02:00
Avi Kivity
550aa01242 Merge 'Restore raft::internal::tagged_uint64 type' from Benny Halevy
Change f5f566bdd8 introduced
tagged_integer and replaced raft::internal::tagged_uint64
with utils::tagged_integer.

However, the idl type for raft::internal::tagged_uint64
was not marked as final, but utils::tagged_integer is, breaking
the on-the-wire compatibility.

This change restores the use of raft::internal::tagged_uint64
for the raft types and adds back an idl definition for
it that is not marked as final, similar to the way
raft::internal::tagged_id extends utils::tagged_uuid.

Fixes #13752

Closes #13774

* github.com:scylladb/scylladb:
  raft, idl: restore internal::tagged_uint64 type
  raft: define term_t as a tagged uint64_t
  idl: gossip_digest: include required headers
2023-05-09 22:51:25 +03:00
Kefu Chai
d8cd62b91a compaction/compaction: initialize local variable
the initial `validation_errors` should be zero. so let's initialize it
instead of leaving it uninitialized.

this should address the following warning from Clang-16:

```
/usr/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=6 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -I/home/kefu/dev/scylladb/build/cmake/gen -isystem /home/kefu/dev/scylladb/build/cmake/rust -Wall -Werror -Wno-error=deprecated-declarations -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -march=westmere  -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT compaction/CMakeFiles/compaction.dir/compaction.cc.o -MF compaction/CMakeFiles/compaction.dir/compaction.cc.o.d -o compaction/CMakeFiles/compaction.dir/compaction.cc.o -c /home/kefu/dev/scylladb/compaction/compaction.cc
/home/kefu/dev/scylladb/compaction/compaction.cc:1681:9: error: variable 'validation_errors' is uninitialized when used here [-Werror,-Wuninitialized]
        validation_errors += co_await sst->validate(permit, descriptor.io_priority, cdata.abort, [&schema] (sstring what) {
        ^~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/compaction/compaction.cc:1676:31: note: initialize the variable 'validation_errors' to silence this warning
    uint64_t validation_errors;
                              ^
                               = 0
```

the change which introduced this local variable was 7ba5c9cc6a.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13813
2023-05-09 22:49:29 +03:00
Avi Kivity
8c6229d229 Merge 'sstable: encode value using UUID' from Kefu Chai
in this series, we encode the value of generation using UUID to prepare for the UUID generation identifier. simpler this way, as we don't need to have two ways to encode integer or a timeuuid: uuid with a zero timestamp, and a variant. also, add a `from_string()` factory method to convert string to generation to hide the underlying type of value from generation_type's users.

Closes #13782

* github.com:scylladb/scylladb:
  sstable: use generation_type::from_string() to convert from string
  sstable: encode int using UUID in generation_type
2023-05-09 22:07:23 +03:00
Avi Kivity
996f717dfc Merge 'cql3/prepare_expr: force token() receiver name to be partition key token' from Jan Ciołek
Let's say that we have a prepared statement with a token restriction:
```cql
SELECT * FROM some_table WHERE token(p1, p2) = ?
```

After calling `prepare`, the driver receives some information about the prepared statement, including the names of the values bound to each bind marker.

In case of a partition token restriction (`token(p1, p2) = ?`) there's an expectation that the name assigned to this bind marker will be `"partition key token"`.

In a recent change the code handling `token()` expressions has been unified with the code that handles generic function calls, and as a result the name has changed to `token(p1, p2)`.

It turns out that the Java driver relies on the name being `"partition key token"`, so a change to `token(p1, p2)` broke some things.

This patch sets the name back to `"partition key token"`. To achieve this we detect any restrictions that match the pattern `token(p1, p2, p3) = X` and set the receiver name for X to `"partition key token"`.

Fixes: #13769

Closes #13815

* github.com:scylladb/scylladb:
  cql-pytest: test that bind marker is partition key token
  cql3/prepare_expr: force token() receiver name to be partition key token
2023-05-09 20:44:46 +03:00
Anna Stuchlik
c64109d8c7 doc: add driver support for Serverless
Fixes https://github.com/scylladb/scylladb/issues/13453

This is V2 of https://github.com/scylladb/scylladb/pull/13710/.

This commit adds:
- the information about which ScyllaDB drivers support ScyllaDB Cloud Serverless.
- language and organization improvements to the ScyllaDB CQL Drivers
  page.

Closes #13825
2023-05-09 20:43:22 +03:00
Kefu Chai
c872ade50f sstable: use generation_type::from_string() to convert from string
in this change,

* instead of using "\d+" to match the generation, use "[^-]",
* let generation_type convert a string to generation

before this change, we cast the matched string in the SSTable file name
to an integer and then construct a generation identifier from the integer.
this solution has a strong assumption that the generation is represented
with an integer. we should not encode this assumption in sstable.cc;
instead we'd better let generation_type itself take care of this. also,
to relax the restriction of the regex for matching generation, let's
just use any characters except for the delimiter -- "-".
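
A rough sketch of the relaxed matching, using std::regex. The file name
layout `<version>-<generation>-<format>-<component>` and the helper
name `generation_from_filename()` are assumptions made for this
illustration; the real pattern in sstable.cc may differ:

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Extracts the generation field as a string, without assuming it is an
// integer. Converting it to a generation is left to the generation type.
std::optional<std::string> generation_from_filename(const std::string& name) {
    // "[^-]+" accepts anything up to the next "-" delimiter, where the
    // previous pattern would have used "\\d+".
    static const std::regex re("([a-z]+)-([^-]+)-([a-z]+)-(.+)");
    std::smatch m;
    if (!std::regex_match(name, m, re)) {
        return std::nullopt;
    }
    return m[2].str();
}
```

Both an integer generation and a non-numeric identifier are matched by
the same pattern, as long as the identifier contains no "-".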

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-09 22:57:39 +08:00
Kefu Chai
478c13d0d4 sstable: encode int using UUID in generation_type
since we already use UUID for encoding a bigint in the SSTable registry
table, let's just use the same approach for encoding bigint in generation_type,
to be more consistent and less repetitive this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-09 22:57:38 +08:00
Petr Gusev
08529a1c6c storage_proxy, storage_service: use new read endpoints
We use set_topology_transition_state to set read_new state
in storage_service::topology_state_load
based on _topology_state_machine._topology.tstate.
This triggers update_pending_ranges to compute and store new ranges
for read requests. We use this information in
storage_proxy::get_endpoints_for_reading
when we need to decide which nodes to use for reading.
2023-05-09 18:42:03 +04:00
Petr Gusev
052b91fb1f storage_proxy: rename get_live_sorted_endpoints->get_endpoints_for_reading
We are going to use remapped_endpoints_for_reading, we need
to make sure we use it in the right place. The
get_live_sorted_endpoints function looks like what we
need - it's used in all read code paths.
From its name, however, this was not obvious.

Also, we add the parameter ks_name as we'll need it
to pass to remapped_endpoints_for_reading.
2023-05-09 18:42:03 +04:00
Petr Gusev
15fe4d8d69 token_metadata: add unit test for endpoints_for_reading 2023-05-09 18:42:03 +04:00
Petr Gusev
0e4e2df657 token_metadata: add endpoints for reading
In this patch we add
token_metadata::set_topology_transition_state method.
If the current state is
write_both_read_new update_pending_ranges
will compute new ranges for read requests. The default value
of topology_transition_state is null, meaning no read
ranges are computed. We will add the appropriate
set_topology_transition_state calls later.

Also, we add endpoints_for_reading method to get
read endpoints based on the computed ranges.
2023-05-09 18:41:59 +04:00
Kefu Chai
d24687ea26 data_dictionary: define helpers in options
instead of dispatching and implementing the per-option handling
right in `storage_option`, define these helpers in the dedicated
option themselves, so `storage_option` is only responsible for
dispatching.

much cleaner this way. this change also makes it easier to add yet
another storage backend.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-09 21:51:52 +08:00
Kefu Chai
152d0224dc data_dictionary: only define operator== for storage options
as the only user of these comparison operators is
`storage_options::can_update_to()`, which just checks if the given
`storage_options` is equal to the stored one. so no need to define
the <=> operator.

also, no need to add the `friend` specifier, as the options are plain
struct, all the member variables are public.

make the comparison operator a member function instead of a free
function, as in C++20 comparison operators are symmetric.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-09 21:51:45 +08:00
Botond Dénes
ef7b7223d5 mutation/mutation_fragment_stream_validator.cc: rename logger
This code inherited its logger variable name from mutation reader,
rename it to better match its new context.
2023-05-09 07:55:13 -04:00
Botond Dénes
8681f3e997 readers,mutation: move mutation_fragment_stream_validator to mutation/
The validator classes have their definition in a header located in mutation/,
while their implementation is located in a .cc in readers/mutation_reader.cc.
This patch fixes this inconsistency by moving the implementation into
mutation/mutation_fragment_stream_validator.cc. The only change is that
the validator code gets a new logger instance (but the logger variable itself
is left unchanged for now).
2023-05-09 07:55:13 -04:00
Botond Dénes
287ccce1cc Merge 'sstables: extract storage out ' from Kefu Chai
this change extracts the storage class and its derived classes
out into their own source files. for a couple of reasons:

- for better readability. sstables.hh is over 1005 lines,
  and sstables.cc is 3602 lines. it's a little bit difficult to figure
  out how the different parts in these sources interact with each
  other. for instance, with this change, it's clear that some of the
  helper functions are only used by file_system_storage.
- probably less inter-source dependency. by extracting the source
  files out, they can be compiled individually, so changing one .cc
  file does not impact others. this could speed up the compilation
  time.

Closes #13785

* github.com:scylladb/scylladb:
  sstables: storage: coroutinize idempotent_link_file()
  sstables: extract storage out
2023-05-09 14:03:40 +03:00
Jan Ciolek
9ad1c5d9f2 cql-pytest: test that bind marker is partition key token
When preparing a query each bind marker gets a name.
For a query like:
```cql
SELECT * FROM some_table WHERE token(p1, p2) = ?
```
The bind marker's name should be `"partition key token"`.
Java driver relies on this name, having something else,
like `"token(p1, p2)"` be the name breaks the Java driver.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-05-09 12:33:06 +02:00
Jan Ciolek
8a256f63db cql3/prepare_expr: force token() receiver name to be partition key token
Let's say that we have a prepared statement with a token restriction:
```cql
SELECT * FROM some_table WHERE token(p1, p2) = ?
```

After calling `prepare` the driver receives some information
about the prepared statement, including names of values bound
to each bind marker.

In case of a partition token restriction (`token(p1, p2) = ?`)
there's an expectation that the name assigned to this bind marker
will be `"partition key token"`.

In a recent change the code handling `token()` expressions has been
unified with the code that handles generic function calls,
and as a result the name has changed to `token(p1, p2)`.

It turns out that the Java driver relies on the name being
`"partition key token"`, so a change to `token(p1, p2)`
broke some things.

This patch sets the name back to `"partition key token"`.
To achieve this we detect any restrictions that match
the pattern `token(p1, p2, p3) = X` and set the receiver
name for X to `"partition key token"`.
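
A minimal sketch of the naming rule described above, assuming a toy
expression representation (tuples) rather than the real cql3 expression
types. `receiver_name` and the tuple layout are hypothetical and only
illustrate the special case for `token()`:

```python
def receiver_name(lhs):
    """Return the bind marker name for a restriction `lhs = ?`.

    `lhs` is a toy expression: ("call", func_name, [arg names]) or
    ("column", name). This is an illustrative model, not the cql3 API.
    """
    kind = lhs[0]
    if kind == "call":
        _, func, args = lhs
        if func == "token":
            # Drivers expect this fixed name for partition token restrictions.
            return "partition key token"
        # Generic function calls are named after their textual form.
        return f"{func}({', '.join(args)})"
    return lhs[1]


names = [
    receiver_name(("call", "token", ["p1", "p2"])),
    receiver_name(("call", "my_func", ["c"])),
    receiver_name(("column", "c")),
]
```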

Fixes: #13769

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-05-09 12:32:57 +02:00
Petr Gusev
b2e5d8c21c sequenced_set: add extract_set method
Can be useful if we want to reuse the set
when we are done with this sequenced_set
instance.
2023-05-09 13:56:38 +04:00
Petr Gusev
0567ab82ac token_metadata_impl: extract maybe_migration_endpoints helper function
We are going to add a function in token_metadata to get read endpoints,
similar to pending_endpoints_for. So in this commit we extract
the maybe_migration_endpoints helper function, which will be
used in both cases.
2023-05-09 13:56:38 +04:00
Petr Gusev
030f0f73aa token_metadata_impl: introduce migration_info
We are going to store read_endpoints in a way similar
to pending ranges, so in this commit we add
migration_info - a container for two
boost::icl::interval_map.

Also, _pending_ranges_interval_map is renamed to
_keyspace_to_migration_info, since it captures
the meaning better.
2023-05-09 13:56:38 +04:00
Petr Gusev
56c2b3e893 token_metadata_impl: refactor update_pending_ranges
Now update_pending_ranges is quite complex, mainly
because it tries to act efficiently and update only
the affected intervals. However, it uses the function
abstract_replication_strategy::get_ranges, which calls
calculate_natural_endpoints for every token
in the ring anyway.

Our goal is to start reading from the new replicas for
ranges in write_both_read_new state. In the current
code structure this is quite difficult to do, so
in this commit we first simplify update_pending_ranges.

The main idea of the refactoring is to build a new version
of token_metadata based on all planned changes
(join, bootstrap, replace) and then for each token
range compare the result of calculate_natural_endpoints on
the old token_metadata and on the new one.
Those endpoints that are in the new version and
are not in the old version should be added to the pending_ranges.

The add_mapping function is extracted for the
future - we are going to use it to handle read mappings.

Special care is taken when replacing with the same IP.
The coordinator employs the
get_natural_endpoints_without_node_being_replaced function,
which excludes such endpoints from its result. If we compare
the new (merged) and current token_metadata configurations, such
endpoints will also be absent from pending_endpoints since
they exist in both. To address this, we copy the current
token_metadata and remove these endpoints prior to comparison.
This ensures that nodes being replaced are treated
like those being deleted.
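
The main idea of the refactoring can be sketched in Python as follows.
This is a toy model under stated assumptions: the ring is a plain
dict of token -> node, `natural_endpoints` is a simplified stand-in for
calculate_natural_endpoints (first `rf` distinct owners clockwise), and
`pending_endpoints` is a hypothetical name, not a ScyllaDB function:

```python
from bisect import bisect_left


def natural_endpoints(ring, token, rf):
    """Toy placement: walk the ring clockwise from `token` and collect
    the first `rf` distinct owners."""
    tokens = sorted(ring)
    if not tokens:
        return []
    start = bisect_left(tokens, token) % len(tokens)
    owners = []
    for i in range(len(tokens)):
        owner = ring[tokens[(start + i) % len(tokens)]]
        if owner not in owners:
            owners.append(owner)
        if len(owners) == rf:
            break
    return owners


def pending_endpoints(old_ring, new_ring, rf):
    """For each token of the new ring, report the endpoints that are
    replicas in the new ring but not in the old one."""
    pending = {}
    for token in sorted(new_ring):
        old = set(natural_endpoints(old_ring, token, rf))
        new = set(natural_endpoints(new_ring, token, rf))
        added = new - old
        if added:
            pending[token] = added
    return pending


old_ring = {10: "A", 20: "B", 30: "C"}
new_ring = dict(old_ring)
new_ring[25] = "D"  # node D is bootstrapping with token 25
result = pending_endpoints(old_ring, new_ring, rf=2)
```

In this toy ring, D becomes a pending endpoint only for the ranges whose
replica sets change, and the other ranges are left untouched.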
2023-05-09 13:56:28 +04:00
Petr Gusev
3120cabf56 token_metadata: add unit tests
We are going to refactor update_pending_ranges,
so in this commit we add some simple unit tests
to ensure we don't break it.
2023-05-09 13:56:06 +04:00
Benny Halevy
adfb79ba3e raft, idl: restore internal::tagged_uint64 type
Change f5f566bdd8 introduced
tagged_integer and replaced raft::internal::tagged_uint64
with utils::tagged_integer.

However, the idl type for raft::internal::tagged_uint64
was not marked as final, but utils::tagged_integer is, breaking
the on-the-wire compatibility.

This change defines the different raft tagged_uint64
types in idl/raft_storage.idl.hh as non-final
to restore the way they were serialized prior to
f5f566bdd8

Fixes #13752

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-09 12:38:20 +03:00
Kamil Braun
41cac23aa4 Merge 'raft: verify RPC destination ID' from Mikołaj Grzebieluch
All Raft verbs include `dst_id`, the ID of the destination server, but
it isn't checked. `append_entries` will work even if it arrives at
completely the wrong server (but in the same group). It can cause
problems, e.g. in the scenario of replacing a dead node.

This commit adds verification that `dst_id` matches the server's ID; if it
doesn't, the Raft verb is rejected.

Closes #12179

Testing
---

Testcase and scylla's configuration:
57d3ef14d8

It artificially lengthens the duration of replacing the old node. It
increases the chance of getting the RPC command sent to a replaced node,
by the new node.

In the logs of the node that replaced the old one, we can see logs in
the form:
```
DEBUG <time> [shard 0] raft_group_registry - Got message for server <dst_id>, but my id is <my_id>
```
It indicates that the Raft verb with the wrong `dst_id` was rejected.

This test isn't included in the PR because it doesn't catch any specific error.

Closes #13575

* github.com:scylladb/scylladb:
  service/raft: raft_group_registry: Add verification of destination ID
  service/raft: raft_group_registry: `handle_raft_rpc` refactor
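
The destination check can be illustrated with a small Python model.
Everything here (`RaftMessage`, `ToyRaftServer`, `DestinationMismatch`)
is hypothetical and only mirrors the behaviour described in the text:
a verb whose `dst_id` differs from the receiver's own ID is rejected
before it is applied.

```python
from dataclasses import dataclass


@dataclass
class RaftMessage:
    verb: str
    dst_id: str
    payload: object = None


class DestinationMismatch(Exception):
    pass


class ToyRaftServer:
    def __init__(self, my_id):
        self.my_id = my_id
        self.applied = []

    def handle_rpc(self, msg):
        # Reject verbs addressed to another server, e.g. to the node
        # this server has replaced.
        if msg.dst_id != self.my_id:
            raise DestinationMismatch(
                f"Got message for server {msg.dst_id}, but my id is {self.my_id}")
        self.applied.append((msg.verb, msg.payload))


server = ToyRaftServer("new-node-id")
server.handle_rpc(RaftMessage("append_entries", "new-node-id", payload=1))
try:
    server.handle_rpc(RaftMessage("append_entries", "old-node-id", payload=2))
    rejected = False
except DestinationMismatch:
    rejected = True
```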
2023-05-09 11:33:28 +02:00
Aleksandra Martyniuk
f199ec5ec3 test: extend test_compaction_task.py to test scrub compaction 2023-05-09 11:15:26 +02:00
Aleksandra Martyniuk
83d3463d10 compaction: add table_scrub_sstables_compaction_task_impl
Implementation of task_manager's task covering scrub sstables
compaction of one table.
2023-05-09 11:15:25 +02:00
Aleksandra Martyniuk
d8e4a2fee3 compaction: add shard_scrub_sstables_compaction_task_impl
Implementation of task_manager's task covering scrub sstables
compaction on one shard.
2023-05-09 11:14:36 +02:00
Aleksandra Martyniuk
8d32579fe6 compaction: add scrub_sstables_compaction_task_impl
Implementation of task_manager's task covering scrub sstables
compaction.
2023-05-09 11:13:57 +02:00
Kefu Chai
a69282e69b sstables: storage: coroutinize idempotent_link_file()
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-09 16:47:00 +08:00
Kefu Chai
2eefcb37eb sstables: extract storage out
this change extracts the storage class and its derived classes
out into storage.cc and storage.hh. for a couple of reasons:

- for better readability. sstables.hh is over 1005 lines,
  and sstables.cc is 3602 lines. it's a little bit difficult to figure
  out how the different parts in these sources interact with each
  other. for instance, with this change, it's clear that some of the
  helper functions are only used by file_system_storage.
- probably less inter-source dependency. by extracting the source
  files out, they can be compiled individually, so changing one .cc
  file does not impact others. this could speed up the compilation
  time.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-09 16:47:00 +08:00
Aleksandra Martyniuk
79c39e4ea7 api: get rid of unnecessary std::optional in scrub
In scrub lambdas returning std::optional<compaction_stats> cannot
return empty value. Hence, std::optional wrapper isn't needed.
2023-05-09 10:31:44 +02:00
Aleksandra Martyniuk
40809c887e compaction: rename rewrite_sstables_compaction_task_impl
Rename rewrite_sstables_compaction_task_impl to
sstables_compaction_task_impl as a new name describes the class
of tasks better. Rewriting sstables is a slightly more fine-grained
type of sstable compaction task than the one needed here.
2023-05-09 10:31:44 +02:00
Botond Dénes
20f620feb9 Merge 'replica, sstable: replace generation_type::value() with generation_type::as_int()' from Kefu Chai
this series prepares for the UUID based generation by replacing the general `value()` function with the function with more specific name: `as_int()`.

Closes #13796

* github.com:scylladb/scylladb:
  test: drop a reusable_sst() variant which accepts int as generation
  treewide: replace generation_type::value() with generation_type::as_int()
2023-05-09 07:30:54 +03:00
Benny Halevy
531ac63a8d raft: define term_t as a tagged uint64_t
It was defined as a tagged (signed) int64_t by mistake
in f5f566bdd8.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-09 06:51:26 +03:00
Benny Halevy
d3a59fdefd idl: gossip_digest: include required headers
To be self-sufficient, before the next patch
that will affect tagged_integer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-09 06:51:26 +03:00
Michał Chojnowski
0813fa1da0 database: fix reads_memory_consumption for system semaphore
The metric shows the opposite of what its name suggests.
It shows available memory rather than consumed memory.
Fix that.

Fixes #13810

Closes #13811
2023-05-09 06:42:43 +03:00
Nadav Har'El
5f37d43ee6 Merge 'compaction: validate: validate the index too' from Botond Dénes
In addition to the data file itself. Currently validation avoids the
index altogether, using the crawling reader which only relies on the
data file and ignores the index+summary. This is because a corrupt
sstable usually has a corrupt index too and using both at the same time
might hide the corruption. This patch adds targeted validation of the
index, independent of and in addition to the already existing data
validation: it validates the order of index entries as well as whether
the entry points to a complete partition in the data file.
This will usually result in duplicate errors for out-of-order
partitions: one for the data file and one for the index file.
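
A simplified Python sketch of the two index checks named above, assuming
keys that compare as plain values and a data file modelled as a dict of
partition start offsets. `validate_index` and its arguments are
hypothetical, not the mx validator's interface:

```python
def validate_index(index_entries, partition_starts):
    """Check index order and that each entry points at its partition.

    index_entries: list of (key, data_file_offset) in index order.
    partition_starts: dict mapping an offset to the key of the partition
    that starts there in the data file (toy model of the data file).
    """
    errors = []
    prev_key = None
    for key, offset in index_entries:
        if prev_key is not None and key <= prev_key:
            errors.append(f"index entry {key!r} is out of order after {prev_key!r}")
        if partition_starts.get(offset) != key:
            errors.append(
                f"index entry {key!r} does not point to its partition at offset {offset}")
        prev_key = key
    return errors


good = validate_index([("a", 0), ("c", 100)], {0: "a", 100: "c"})
bad = validate_index(
    [("a", 0), ("c", 100), ("b", 250)],
    {0: "a", 100: "c", 200: "b"},
)
```

Here `bad` contains two errors for the last entry: one for the ordering
and one for the offset that does not start a partition.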

Fixes: #9611

Closes #11405

* github.com:scylladb/scylladb:
  test/cql-pytest: add test_sstable_validation.py
  test/cql-pytest: extract scylla_path,temp_workdir fixtures to conftest.py
  tools/scylla-sstables: write validation result to stdout
  sstables/sstable: validate(): delegate to mx validator for mx sstables
  sstables/mx/reader: add mx specific validator
  mutation/mutation_fragment_stream_validator: add validator() accessor to validating filter
  sstables/mx/reader: template data_consume_rows_context_m on the consumer
  sstables/mx/reader: move row_processing_result to namespace scope
  sstables/mx/reader: use data_consumer::proceed directly
  sstables/mx/reader.cc: extend namespace to end-of-file (cosmetic)
  compaction/compaction: remove now unused scrub_validate_mode_validate_reader()
  compaction/compaction: move away from scrub_validate_mode_validate_reader()
  tools/scylla-sstable: move away from scrub_validate_mode_validate_reader()
  test/boost/sstable_compaction_test: move away from scrub_validate_mode_validate_reader()
  sstables/sstable: add validate() method
  compaction/compaction: scrub_sstables_validate_mode(): validate sstables one-by-one
  compaction: scrub: use error messages from validator
  mutation_fragment_stream_validator: produce error messages in low-level validator
2023-05-08 17:14:26 +03:00
Botond Dénes
b790f14456 reader_concurrency_semaphore: execution_loop(): trigger admission check when _ready_list is empty
The execution loop consumes permits from the _ready_list and executes
them. The _ready_list usually contains a single permit. When the
_ready_list is not empty, new permits are queued until it becomes empty.
The execution loop relies on admission checks triggered by the read
releasing resources, to bring in any queued read into the _ready_list,
while it is executing the current read. But in some cases the current
read might not free any resources and thus fail to trigger an admission
check and the currently queued permits will sit in the queue until
another source triggers an admission check.
I don't yet know how this situation can occur, if at all, but it is
reproducible with a simple unit test, so it is best to cover this
corner-case in the off-chance it happens in the wild.
Add an explicit admission check to the execution loop, after the
_ready_list is exhausted, to make sure any waiters that can be admitted
with an empty _ready_list are admitted immediately and execution
continues.
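
The corner case can be modelled with a heavily simplified Python sketch.
`ToySemaphore` is hypothetical: it keeps only a ready list and a wait
list, and a read is a `(name, releases_resources)` pair. It is meant to
show why the explicit check matters, not how the semaphore is built:

```python
from collections import deque


class ToySemaphore:
    def __init__(self):
        self.ready_list = deque()
        self.wait_list = deque()
        self.executed = []

    def submit(self, name, releases_resources):
        read = (name, releases_resources)
        if self.ready_list:
            # New reads are queued while the ready list is not empty.
            self.wait_list.append(read)
        else:
            self.ready_list.append(read)

    def maybe_admit_waiters(self):
        # Admission check: a waiter is admitted only into an empty ready list.
        if not self.ready_list and self.wait_list:
            self.ready_list.append(self.wait_list.popleft())

    def execution_loop(self, explicit_check):
        while True:
            while self.ready_list:
                name, releases = self.ready_list.popleft()
                self.executed.append(name)
                if releases:
                    # Freed resources trigger an admission check.
                    self.maybe_admit_waiters()
            if explicit_check:
                # The fix: check once more after the ready list is exhausted.
                self.maybe_admit_waiters()
            if not self.ready_list:
                return


def run(explicit_check):
    sem = ToySemaphore()
    sem.submit("r1", releases_resources=False)
    sem.submit("r2", releases_resources=False)  # queued behind r1
    sem.execution_loop(explicit_check)
    return sem


stuck = run(explicit_check=False)
fixed = run(explicit_check=True)
```

Without the explicit check, `r2` stays in the wait list after the loop
returns; with it, both reads are executed.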

Fixes: #13540

Closes #13541
2023-05-08 17:11:41 +03:00
Takuya ASADA
fdceda20cc scylla_raid_setup: wipe filesystem signatures from specified disks
The discussion on the thread says, when we reformat a volume with another
filesystem, kernel and libblkid may skip populating /dev/disk/by-* since they
detect two filesystem signatures, because mkfs.xxx did not clear the previous
filesystem signature.
To avoid this, we need to run wipefs before running mkfs.

Note that this runs wipefs twice, for the target disks and also for the RAID device.
wipefs for the RAID device is needed since wipefs on the disks doesn't clear filesystem signatures on /dev/mdX (we may see a previous filesystem signature on /dev/mdX when we construct a RAID volume multiple times on the same disks).

Also dropped the -f option from mkfs.xfs, so that mkfs.xfs verifies that
wipefs worked as expected.

Fixes #13737

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #13738
2023-05-08 16:53:43 +03:00
Anna Stuchlik
98e1d7a692 doc: add the Elixir driver to the docs
This commit adds the link to the Elixir driver
to the list of the third-party drivers.
The driver actively supports ScyllaDB.

This is v2 of https://github.com/scylladb/scylladb/pull/13701

Closes #13806
2023-05-08 15:36:35 +03:00
Kamil Braun
153cb00e9d test: test_random_tables: wait for token ring convergence before data queries
The test performs an `INSERT` followed by a `SELECT`, checking if the
previously inserted data is returned.

This may fail because we're using `ring_delay = 0` in tests and the two
queries may arrive at different nodes, whose `token_metadata` didn't
converge yet (it's eventually consistent based on gossiping).
I illustrated this here:
https://github.com/scylladb/scylladb/issues/12937#issuecomment-1536147455

Ensure that the nodes' token rings are synchronized (by waiting until
the token ring members on each node are the same as the group 0
configuration).
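
A minimal sketch of such a wait, assuming the test can fetch each node's
view of the token ring members and the group 0 configuration. The helper
names `token_rings_converged` and `wait_for`, and the simulated
observations, are hypothetical and not taken from the test library:

```python
import time


def token_rings_converged(ring_members_by_node, group0_members):
    """True when every node's token ring matches the group 0 configuration."""
    expected = set(group0_members)
    return all(set(members) == expected
               for members in ring_members_by_node.values())


def wait_for(predicate, timeout, period=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("token rings did not converge in time")
        time.sleep(period)


# Simulated successive observations of the cluster: n1 lags behind at first.
observations = iter([
    {"n1": ["n1", "n2"], "n2": ["n1", "n2", "n3"], "n3": ["n1", "n2", "n3"]},
    {"n1": ["n1", "n2", "n3"], "n2": ["n1", "n2", "n3"], "n3": ["n1", "n2", "n3"]},
])
polls = []


def check():
    polls.append(1)
    return token_rings_converged(next(observations), ["n1", "n2", "n3"])


wait_for(check, timeout=5)
```

Only after the wait returns would the test issue the `INSERT` and the
`SELECT`.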

Fixes #12937

Closes #13791
2023-05-08 13:22:52 +02:00
Kamil Braun
3f3dcf451b test: pylib: random_tables: perform read barrier in verify_schema
`RandomTables.verify_schema` is often called in topology tests after
performing a schema change. It compares the schema tables fetched from
some node to the expected latest schema stored by the `RandomTables`
object.

However there's no guarantee that the latest schema change has already
propagated to the node which we query. We could have performed the
schema change on a different node and the change may not have been
applied yet on all nodes.

To fix that, pick a specific node and perform a read barrier on it, then
use that node to fetch the schema tables.

Fixes #13788

Closes #13789
2023-05-08 13:21:10 +02:00
Avi Kivity
198738f2b1 Merge 'build: compile wasm udfs automatically' from Wojciech Mitros
Currently, when we deal with a Wasm program, we store
it in its final WebAssembly Text form. This causes a lot
of code bloat and is hard to read. Instead, we would like
to store only the source codes, and build Wasm when
necessary. This series adds build commands that
compile C/Rust sources to Wasm and uses them for Wasm
programs that we're already using.

After these changes, adding a new program that should be
compiled to Wasm requires only adding its source code
and updating the `wasms` and `wasm_deps` lists in
`configure.py`.

All Wasm programs are built by default when building all
artifacts, artifacts in a given mode, or when building
tests. Additionally, a {mode}-wasm target is added, so that
it's possible to build just the wasm files.
The generated files are saved in $builddir/{mode}/wasm,
and are accessed in cql-pytests similarly to the way we're
accessing the scylla binary - using glob.

Closes #13209

* github.com:scylladb/scylladb:
  wasm: replace wasm programs with their source programs
  build: prepare rules for compiling wasm files
  build: set the type of build_artifacts
  test: extend capabilities of Wasm reading helper funciton
2023-05-08 13:51:53 +03:00
Petr Gusev
e5c6af17e6 token_metadata: fix indentation 2023-05-08 13:16:21 +04:00
Petr Gusev
435a7573ff token_metadata_impl: return unique_ptr from clone functions
token_metadata takes token_metadata_impl as unique_ptr,
so it makes sense to create it that way in the first place
to avoid unnecessary moves.

token_metadata_impl constructor with shallow_copy parameter
was made public for std::make_unique. The effective
accessibility of this constructor hasn't changed though since
shallow_copy remains private.
2023-05-08 13:16:21 +04:00
Wojciech Mitros
6d89d718d9 wasm: replace wasm programs with their source programs
After recent changes, we are able to store only the
C/Rust source codes for Wasm programs, and only build
them when necessary. This patch utilizes this
opportunity by removing most of the currently stored
raw Wasm programs, replacing them with C/Rust sources
and adding them to the new build system.
2023-05-08 10:47:34 +02:00
Wojciech Mitros
c065ae0ded build: prepare rules for compiling wasm files
Currently, when we deal with a Wasm program, we store
it in its final WebAssembly Text form. This causes a lot
of code bloat and is hard to read. Instead, we would like
to store only the (C/Rust) source codes, and build Wasm
when necessary. This patch adds build commands that
compile C/Rust sources to Wasm.
After these changes, adding a new program that should be
compiled to Wasm requires only adding its source code
and updating the wasms and wasm_deps lists in
configure.py.
All Wasm programs are built by default when building all
artifacts, all artifacts in a given mode, or when building
tests. Additionally, a ninja wasm target is added, so that
it's possible to build just the wasm files.
The generated files are saved in $builddir/wasm.
2023-05-08 10:47:34 +02:00
Wojciech Mitros
c53d68ee3e build: set the type of build_artifacts
Currently, build_artifacts are of type set[str] | list, which prevents
us from performing set operations on it. In a future patch, we will
want to take a set difference and set intersections with it, so we
initialize the type of build_artifacts to a set in all cases.
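
A short illustration of why the type matters, using made-up artifact
names. `normalize_artifacts` is a hypothetical helper, not the code in
configure.py; it only shows that set operations work once the value is
always a set, while a list does not support them:

```python
def normalize_artifacts(requested, all_artifacts):
    """Always return a set, whether or not the build was restricted."""
    return set(requested) if requested else set(all_artifacts)


all_artifacts = ["scylla", "test/boost/foo_test", "wasm/return_input.wat"]
wasms = {"wasm/return_input.wat"}

build_artifacts = normalize_artifacts(None, all_artifacts)
wasm_to_build = build_artifacts & wasms   # set intersection
non_wasm = build_artifacts - wasms        # set difference

# The same operation on a plain list raises TypeError.
try:
    all_artifacts - wasms
    list_supports_difference = True
except TypeError:
    list_supports_difference = False
```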
2023-05-08 10:47:34 +02:00
Wojciech Mitros
0a34a54c73 test: extend capabilities of Wasm reading helper funciton
Currently, we require that the Wasm file is named the same
as the function. In the future we may want multiple functions
with the same name, which we can't currently do due to this
limitation.
This patch allows specifying the function name, so that multiple
files can have a function with the same name.
Additionally, the helper method now escapes "'" characters, so
that they can appear in future Wasm files.
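
A sketch of what such a helper could look like, under stated
assumptions: the name `read_wasm_source`, its parameters and the
`export "..."` rewrite are illustrative guesses, not the helper's real
signature. The quote handling follows CQL, where a single quote inside a
string literal is escaped by doubling it:

```python
import os
import tempfile


def read_wasm_source(directory, file_name, wasm_name=None, udf_name=None):
    """Load <file_name>.wat and prepare it for embedding in a CQL string.

    wasm_name: exported function name inside the file (defaults to file_name).
    udf_name: name to expose it under (defaults to wasm_name).
    """
    wasm_name = wasm_name or file_name
    udf_name = udf_name or wasm_name
    with open(os.path.join(directory, file_name + ".wat")) as f:
        source = f.read()
    # Let several files export a function under the same or a different name.
    source = source.replace(f'export "{wasm_name}"', f'export "{udf_name}"')
    # Escape single quotes for use inside a CQL string literal.
    return source.replace("'", "''")


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "plus_v2.wat"), "w") as f:
        f.write('(module (func (export "plus")) (data "it\'s"))')
    body = read_wasm_source(d, "plus_v2", wasm_name="plus", udf_name="plus42")
```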
2023-05-08 10:47:34 +02:00
Botond Dénes
ab5fd0f750 Merge 's3: Provide timestamps in the s3 file implementation' from Raphael "Raph" Carvalho
SSTable relies on st.st_mtime for providing creation time of data
file, which in turn is used by features like tombstone compaction.

Therefore, let's implement it.

Fixes https://github.com/scylladb/scylladb/issues/13649.

Closes #13713

* github.com:scylladb/scylladb:
  s3: Provide timestamps in the s3 file implementation
  s3: Introduce get_object_stats()
  s3: introduce get_object_header()
2023-05-08 11:43:41 +03:00
Raphael S. Carvalho
ad471e5846 s3: Provide timestamps in the s3 file implementation
SSTable relies on st.st_mtime for providing creation time of data
file, which in turn is used by features like tombstone compaction.

Fixes #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-07 19:51:12 -03:00
Raphael S. Carvalho
57661f0392 s3: Introduce get_object_stats()
get_object_stats() will be used for retrieving content size and
also last modified.

The latter is required for filling st_mtim, etc, in the
s3::client::readable_file::stat() method.

Refs #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-07 19:51:10 -03:00
Raphael S. Carvalho
da2ccc44a4 s3: introduce get_object_header()
This allows other functions to reuse the code to retrieve the
object header.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-07 19:49:52 -03:00
Kefu Chai
5fa459bd1a treewide: do not include unused header
since #13452, we switched most of the caller sites from std::regex
to boost::regex. in this change, all occurrences of `#include <regex>`
are dropped unless std::regex is used in the same source file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13765
2023-05-07 19:01:29 +03:00
Kefu Chai
468460718a utils: UUID: drop uint64_t_tri_compare()
functionality-wise, `uint64_t_tri_compare()` is identical to the
three-way comparison operator, so no need to keep it. in this change,
it is dropped in favor of <=>.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13794
2023-05-07 18:07:49 +03:00
Avi Kivity
380c0b0f33 cql3: untyped_result_set: switch to managed_bytes_view as the cell type
Now that result_set uses managed_bytes_opt for its internals, it's
easy to switch untyped_result_set too. This avoids large
contiguous allocations.
2023-05-07 17:17:36 +03:00
Avi Kivity
42a1ced73b cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.

Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.

The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
2023-05-07 17:17:36 +03:00
Avi Kivity
df4b7e8500 cql3: untyped_result_set: always own data
untyped_result_set is used for internal queries, where ease-of-use is more
important than performance. Currently, cells are held either by value or
by reference (managed_bytes_view). An upcoming change will cause the
result set to be built from managed_bytes_view, making it non-owning, but
the source data is not actually held, resulting in a use-after-free.

Rather than chase the source and force the data to be owned in this case,
just drop the possibility for a non-owning untyped_result_set. It's only
used in non-performance-critical paths and safety is more important than
saving a few cycles.

This also results in simplification: previously, we had a variant selecting
monostate (for NULL), managed_bytes_view (for a reference), and bytes (for
owning data); now we only have a bytes_opt since that already signifies
data-or-NULL.

Once result_set transitions to managed_bytes_opt, untyped_result_set
will follow. For now it's easier to use bytes_opt.
2023-05-07 17:17:36 +03:00
Avi Kivity
d3e9fd49a3 types: abstract_type: add mixed-type versions of compare() and equal()
compare() and equal() can compare two unfragmented values or two
fragmented values, but a mix of a fragmented value and an unfragmented
value runs afoul of C++ conversion rules. Add more overloads to
make it simpler for users.
2023-05-07 17:17:36 +03:00
Avi Kivity
11d651b606 utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view
The codebase evolved to have several different ways to hold a fragmented
buffer: fragmented_temporary_buffer (for data received from the network;
not relevant for this discussion); bytes_ostream (for fragmented data that
is built incrementally; also used for a serialized result_set), and
managed_bytes (used for lsa and serialized individual values in
expression evaluation).

One problem with this state of affairs is that using data in one
fragmented form with functions that accept another fragmented form
requires either a copy, or templating everything. The former is
unpalatable for fast-path code, and the latter is undesirable for
compile time and run-time code footprint. So we'd like to make
the various forms compatible.

In 53e0dc7530 ("bytes_ostream: base on managed_bytes") we changed
bytes_ostream to have the same underlying data structure as
managed_bytes, so all that remains is to add the right API. This
is somewhat difficult as the data is hidden in multiple layers:
ser::buffer_view<> is used to abstract a slice of bytes_ostream,
and this is further abstracted by using iterators into bytes_ostream
rather than directly using the internals. Likewise, it's impossible
to construct a managed_bytes_view from the internals.

Hack through all of these by adding extract_implementation() methods,
and a build_managed_bytes_view_from_internals() helper. These are all
used by new APIs buffer_view_to_managed_bytes_view() that extract
the internals and put them back together again.

Ideally we wouldn't need any of this, but unifying the type system
in this area is quite an undertaking, so we need some shortcuts.
2023-05-07 17:17:34 +03:00
Avi Kivity
613f4b9858 utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt
Useful, rather than open-coding the conversions.
2023-05-07 17:16:38 +03:00
Avi Kivity
1e6ef5503c utils: managed_bytes: add managed_bytes_view::with_linearized()
Becomes useful in later patches.

To avoid double-compiling the call to func(), use an
immediately-invoked lambda to calculate the bytes_view we'll be
calling func() with.
2023-05-07 17:16:38 +03:00
Avi Kivity
08ba0935e2 utils: managed_bytes: mark managed_bytes_view::is_linearized() const
It's trivially const, mark it so.
2023-05-07 17:16:38 +03:00
Tomasz Grabiec
d8826acaa3 tablets: Fix stack smashing in tablet_map_to_mutation()
The code was incorrectly passing a data_value of type bytes due to
implicit conversion of the result of serialize() (bytes_opt) to a
data_value object of type bytes_type via:

   data_value(std::optional<NativeType>);

mutation::set_static_cell() accepts a data_value object, which is then
serialized using column's type in abstract_type::decompose(data_value&):

    bytes b(bytes::initialized_later(), serialized_size(*this, value._value));
    auto i = b.begin();
    value.serialize(i);

Notice that serialized_size() is taken from the column type, but
serialization is done using data_value's type. The two types may have
a compatible CQL binary representation, but may differ in native
types. serialized_size() may incorrectly interpret the native type and
come up with the wrong size. If the size is too small, we end up with
stack or heap corruption later after serialize().

For example, if the column type is utf8 but value holds bytes, the
size will be wrong because even though both use the basic_sstring
type, they have a different layout due to max_size (15 vs 31).

Fixes #13717

Closes #13787
2023-05-07 14:07:50 +03:00
Botond Dénes
c1e8e86637 reader_concurrency_semaphore: reader_permit: clean-up after failed memory requests
When requesting memory via `reader_permit::request_memory()`, the
requested amount is added to `_requested_memory` member of the permit
impl. This is because multiple concurrent requests may be blocked and
waiting at the same time. When the requests are fulfilled, the entire
amount is consumed and individual requests track their requested amount
with `resource_units` to release later.
There is a corner-case related to this: if a reader permit is registered
as inactive while it is waiting for memory, its active requests are
killed with `std::bad_alloc`, but the `_requested_memory` field is not
cleared. If the read survives because the killed requests were part of
a non-vital background read-ahead, a later memory request will also
include the amount from the failed requests. This extra amount will not be
released and hence will cause a resource leak when the permit is
destroyed.
Fix by detecting this corner case and clearing the `_requested_memory`
field. Modify the existing unit test for the scenario of a permit
waiting on memory being registered as inactive, to also cover this
corner case, reproducing the bug.
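
The leak can be reproduced in a toy Python model. `ToyPermit` is
hypothetical and tracks only two counters; the `clear_requested` flag
stands for the fix described above:

```python
class ToyPermit:
    def __init__(self):
        self.requested_memory = 0  # sum of requests that are still waiting
        self.consumed = 0          # memory currently held by the permit

    def request_memory(self, amount):
        self.requested_memory += amount

    def fulfil(self):
        # All waiting requests are granted at once: consume the whole sum.
        self.consumed += self.requested_memory
        self.requested_memory = 0

    def register_inactive(self, clear_requested):
        # Waiting requests are failed here; the fix also forgets their amounts.
        if clear_requested:
            self.requested_memory = 0


def leaked_after_read(clear_requested):
    permit = ToyPermit()
    permit.request_memory(100)   # background read-ahead, about to be killed
    permit.register_inactive(clear_requested)
    permit.request_memory(50)    # later request from the surviving read
    permit.fulfil()
    permit.consumed -= 50        # the caller releases only what it asked for
    return permit.consumed       # anything left over is leaked


leak_without_fix = leaked_after_read(clear_requested=False)
leak_with_fix = leaked_after_read(clear_requested=True)
```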

Fixes: #13539

Closes #13679
2023-05-07 14:06:51 +03:00
Kefu Chai
bd3e8d0460 test: drop a reusable_sst() variant which accepts int as generation
this is one of the changes to reduce the usage of integer based generation
in tests. in future, we will need to expand the tests to exercise the UUID
based generation, or at least to be neutral to the underlying generation's
identifier type. so, removing the helpers which only accept `generation_type::int_t`
helps us to make this happen.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-06 18:24:48 +08:00
Kefu Chai
9b35faf485 treewide: replace generation_type::value() with generation_type::as_int()
* replace generation_type::value() with generation_type::as_int()
* drop generation_value()

because we will switch over to UUID based generation identifier, the member
function or the free function generation_value() cannot fulfill the needs
anymore. so, in this change, they are consolidated and are replaced by
"as_int()", whose name is more specific, and will also work and won't be
misleading even after switching to UUID based generation identifier. as
`value()` would be confusing by then: it could be an integer or a UUID.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-06 18:24:45 +08:00
Kamil Braun
aba31ad06c storage_service: use seastar::format instead of fmt::format
For some reason Scylla crashes on `aarch64` in release mode when calling
`fmt::format` in `raft_removenode` and `raft_decommission`. E.g. on this
line:
```
group0_command g0_cmd = _group0->client().prepare_command(std::move(change), guard, fmt::format("decomission: request decomission for {}", raft_server.id()));
```

I found this in our configure.py:
```
def get_clang_inline_threshold():
    if args.clang_inline_threshold != -1:
        return args.clang_inline_threshold
    elif platform.machine() == 'aarch64':
        # we see miscompiles with 1200 and above with format("{}", uuid)
        # also coroutine miscompiles with 600
        return 300
    else:
        return 2500
```
but reducing it to `0` didn't help.

I managed to get the following backtrace (with inline threshold 0):
```
void boost::intrusive::list_impl<boost::intrusive::mhtraits<seastar::thread_context, boost::intrusive::list_member_hook<>, &seastar::thread_context::_all_link>, unsigned long, false, void>::clear_and_dispose<boost::intrusive::detail::null_disposer>(boost::intrusive::detail::null_disposer) at /usr/include/boost/intrusive/list.hpp:751
 (inlined by) boost::intrusive::list_impl<boost::intrusive::mhtraits<seastar::thread_context, boost::intrusive::list_member_hook<>, &seastar::thread_context::_all_link>, unsigned long, false, void>::clear() at /usr/include/boost/intrusive/list.hpp:728
 (inlined by) ~list_impl at /usr/include/boost/intrusive/list.hpp:255
void fmt::v9::detail::buffer<wchar_t>::append<wchar_t>(wchar_t const*, wchar_t const*) at ??:?
void fmt::v9::detail::vformat_to<char>(fmt::v9::detail::buffer<char>&, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<std::conditional<std::is_same<fmt::v9::type_identity<char>::type, char>::value, fmt::v9::appender, std::back_insert_iterator<fmt::v9::detail::buffer<fmt::v9::type_identity<char>::type> > >::type, fmt::v9::type_identity<char>::type> >, fmt::v9::detail::locale_ref) at ??:?
fmt::v9::vformat[abi:cxx11](fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) at ??:?
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > fmt::v9::format<utils::tagged_uuid<raft::server_id_tag>&>(fmt::v9::basic_format_string<char, fmt::v9::type_identity<utils::tagged_uuid<raft::server_id_tag>&>::type>, utils::tagged_uuid<raft::server_id_tag>&) at /usr/include/fmt/core.h:3206
 (inlined by) service::storage_service::raft_removenode(utils::tagged_uuid<locator::host_id_tag>) at ./service/storage_service.cc:3572
```

Maybe it's a bug in `fmt` library?

In any case replacing the call with `::format` (i.e. `seastar::format`
from seastar/core/print.hh) helps.

Do it for the entire file for consistency (and avoiding this bug).

Also, for the future, replace `format` calls with `::format` - now it's
the same thing, but the latter won't clash with `std::format` once we
switch to libstdc++13.

Fixes #13707

Closes #13711
2023-05-05 19:23:22 +02:00
Kamil Braun
70f2b09397 Merge 'scylla_cluster.py: fix read_last_line' from Gusev Petr
This is a follow-up to #13399, the patch
addresses the issues mentioned there:
* linesep can be split between blocks;
* linesep can be part of UTF-8 sequence;
* avoid excessively long lines, limit to 256 chars;
* the logic of the function made simpler and more maintainable.

Closes #13427

* github.com:scylladb/scylladb:
  pylib_test: add tests for read_last_line
  pytest: add pylib_test directory
  scylla_cluster.py: fix read_last_line
  scylla_cluster.py: move read_last_line to util.py
2023-05-05 13:29:15 +02:00
Botond Dénes
1e9dcaff01 Merge 'build: cmake: use Seastar API level 6' from Kefu Chai
to avoid the FTBFS after we bump up the Seastar submodule which bumped up its API level to v7. and API v7 is a breaking change. so, in order to unbreak the build, we have to hardwire the API level to 6. `configure.py` also does this.

Closes #13780

* github.com:scylladb/scylladb:
  build: cmake: disable deprecated warning
  build: cmake: use Seastar API level 6
2023-05-05 13:55:34 +03:00
Kefu Chai
05a172c7e7 build: cmake: link against Boost::unit_test_framework
we introduced the linkage to Boost::unit_test_framework in
fe70333c19, this library is used by
test/lib/test_utils.cc, so update CMake accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13781
2023-05-05 13:55:00 +03:00
Petr Gusev
8a0bcf9d9d pylib_test: add tests for read_last_line 2023-05-05 12:57:43 +04:00
Petr Gusev
7476e91d67 pytest: add pylib_test directory
We want to add tests for read_last_line,
in this commit we add a new directory for them
since there were no tests for pylib code before.
2023-05-05 12:57:43 +04:00
Petr Gusev
330d1d5163 scylla_cluster.py: fix read_last_line
This is a follow-up to #13399, the patch
addresses the issues mentioned there:
* linesep can be split between blocks;
* linesep can be part of UTF-8 sequence;
* avoid excessively long lines, limit to 512 chars;
* the logic of the function made simpler and more
maintainable.
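
The points above can be illustrated with a minimal sketch. It is not the
code from scylla_cluster.py: the signature, the `block_size` parameter and
the choice to keep the head of an over-long line are assumptions made for
the example. It scans the file backwards as bytes, searches the accumulated
tail so a separator split between blocks is still found, and decodes only
after the line is isolated so a multi-byte UTF-8 sequence is never cut at a
block boundary.

```python
import os


def read_last_line(path, max_chars=512, block_size=4096):
    """Return the last non-empty line of `path`, at most `max_chars` long.

    Hypothetical helper written for illustration only.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            # Prepend the new block and search the whole accumulated tail,
            # not just the block, so a separator that straddles two blocks
            # cannot be missed.
            tail = f.read(step) + tail
            line = tail.rstrip(b"\r\n")
            idx = line.rfind(b"\n")
            if idx != -1:
                line = line[idx + 1:]
                break
        else:
            # No separator found (or empty file): the whole file is one line.
            line = tail.rstrip(b"\r\n")
    # Decode once, after the byte-level search, then apply the length limit.
    return line.decode("utf-8", errors="replace")[:max_chars]
```

Working on bytes and decoding at the end keeps the search independent of
where block boundaries fall relative to character boundaries.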
2023-05-05 12:57:36 +04:00
Petr Gusev
8a5e211c30 scylla_cluster.py: move read_last_line to util.py
We want to add tests for read_last_line, so we
move it to make this simpler.
2023-05-05 12:51:25 +04:00
Botond Dénes
687a8bb2f0 Merge 'Sanitize test::filename(sstable) API' from Pavel Emelyanov
There are two of them currently with slightly different declaration. Better to leave only one.

Closes #13772

* github.com:scylladb/scylladb:
  test: Deduplicate test::filename() static overload
  test: Make test::filename return fs::path
2023-05-05 11:36:08 +03:00
Botond Dénes
b704698ba5 Merge 'Close toc file in remove_by_toc_name()' from Pavel Emelyanov
The method in question suffers from scylladb/seastar#1298. The PR fixes it and makes it a bit shorter along the way

Closes #13776

* github.com:scylladb/scylladb:
  sstable: Close file at the end
  sstables: Use read_entire_stream_cont() helper
2023-05-05 11:33:05 +03:00
Anna Stuchlik
27b0dff063 doc: make branch-5.2 latest and stable
This commit changes the configuration in the conf.py
file to make branch-5.2 the latest version and
remove it from the list of unstable versions.

As a result, the docs for version 5.2 will become
the default for users accessing the ScyllaDB Open Source
documentation.

This commit should be merged as soon as version 5.2
is released.

Closes #13681
2023-05-05 11:11:17 +03:00
Botond Dénes
0cccf9f1cc Merge 'Remove some file_writer public methods' from Pavel Emelyanov
One is unused, the other one is not really required in public

Closes #13771

* github.com:scylladb/scylladb:
  file_writer: Remove static make() helper
  sstable: Use toc_filename() to print TOC file path
2023-05-05 10:48:46 +03:00
Pavel Emelyanov
ac305076bd test: Split test_twcs_interposer_on_memtable_flush naturally
The test case consists of two internal sub-test-cases. Making them
explicit kills three birds with one stone

- improves parallelism
- removes env's tempdir wiping
- fixes code indentation

refs: #12707

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13768
2023-05-05 10:42:30 +03:00
Raphael S. Carvalho
1f69c46889 sstables: use version_types received from parser or writer
This is only a cosmetic change, no change in semantics

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13779
2023-05-05 10:32:14 +03:00
Kefu Chai
e4c6b0b31d build: cmake: disable deprecated warning
since Seastar now deprecates a bunch of APIs which accept io_priority_class,
we started to have deprecated warnings. before migrating to V7 API,
let's disable this warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-05 15:31:39 +08:00
Kefu Chai
3c941e8b8a build: cmake: use Seastar API level 6
to avoid the FTBFS after we bump up the Seastar submodule
which bumped up its API level to v7. and API v7 is a breaking
change. so, in order to unbreak the build, we have to hardwire
the API level to 6. `configure.py` also does this.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-05 15:21:42 +08:00
Avi Kivity
fe1cd6f477 Update seastar submodule
* seastar 02d5a0d7c...f94b1bb9c (12):
  > Merge 'Unify CPU scheduling groups and IO priority classes' from Pavel Emelyanov
  > scripts: addr2line: relax regular expression for matching kernel traces
  > add dirs for clangd to .gitignore
  > http::client: Log failed requests' body
  > build: always quote the ENVIRONMENT with quotes
  > exception_hacks: Change guard check order to work around static init fail
  > shared_future: remove support for variadic futures
  > iotune: Don't close file that wasn't opened
Fixes #13439
  > Merge 'Relax per tick IO grab threshold' from Pavel Emelyanov
  > future: simplify constraint on then() a little
  > Merge 'coroutine: generator: initialize const member variable and enable generator tests' from Kefu Chai
  > future: drop libc++ std::tuple compatibility hack

Closes #13777
2023-05-05 00:32:11 +03:00
Pavel Emelyanov
75e7187e1a sstable: Close file at the end
The thing is that when closing a file input stream the underlying file is
not .close()-d (see scylladb/seastar#1298). The remove_by_toc_name() is
buggy in this sense. Using with_closeable() fixes it and makes the code
shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 20:37:48 +03:00
Pavel Emelyanov
334383beb5 sstables: Use read_entire_stream_cont() helper
The remove_by_toc_name() wants to read the whole stream into a sstring.
There's a convenience helper to facilitate that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 20:37:09 +03:00
Avi Kivity
f125a3e315 Merge 'tree: finish the reader_permit state renames' from Botond Dénes
In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR however covered only the states themselves and their usages, as well as the documentation in `docs/dev`.
This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date.

Closes #13573

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
2023-05-04 18:29:04 +03:00
Avi Kivity
204521b9a7 Merge 'mutation/mutation_compactor: validate range tombstone change before it is moved' from Botond Dénes
e2c9cdb576 moved the validation of the range tombstone change to the place where it is actually consumed, so we don't attempt to pass purged or discarded range tombstones to the validator. In doing so however, the validate pass was moved after the consume call, which moves the range tombstone change, so the validator was passed a moved-from range tombstone. Fix this by moving the validation to before the consume call.

Refs: #12575

Closes #13749

* github.com:scylladb/scylladb:
  test/boost/mutation_test: add sanity test for mutation compaction validator
  mutation/mutation_compactor: add validation level to compaction state query constructor
  mutation/mutation_compactor: validate range tombstone change before it is moved
2023-05-04 18:15:35 +03:00
Avi Kivity
1d351dde06 Merge 'Make S3 client work with real S3' from Pavel Emelyanov
Current S3 client was tested over minio and it takes few more touches to work with amazon S3.

The main challenge here is to support signed requests. The AWS S3 server explicitly bans unsigned multipart-upload requests, which in turn are an essential part of the sstables S3 backend, so we do need signing. Signing a request has many options and requirements; one of them is that the request _body_ may or may not be included in the signature calculation. This is called "(un)signed payload". Requests sent over plain HTTP require payload signing (i.e. the request body must be included in the signature calculation), which can be a bit troublesome, so instead the PR uses unsigned payload (i.e. it doesn't include the request body in the signature calculation, only the necessary headers and query parameters), which in turn requires HTTPS.

So what this set does is makes the existing S3 client code sign requests. In order to sign the request the code needs to get AWS key and secret (and region) from somewhere and this somewhere is the conf/object_storage.yaml config file. The signature generating code was previously merged (moved from alternator code) and updated to suit S3 client needs.

In order to properly support HTTPS the PR adds special connection factory to be used with seastar http client. The factory makes DNS resolving of AWS endpoint names and configures gnutls systemtrust.

fixes: #13425

Closes #13493

* github.com:scylladb/scylladb:
  doc: Add a document describing how to configure S3 backend
  s3/test: Add ability to run boost test over real s3
  s3/client: Sign requests if configured
  s3/client: Add connection factory with DNS resolve and configurable HTTPS
  s3/client: Keep server port on config
  s3/client: Construct it with config
  s3/client: Construct it with sstring endpoint
  sstables: Make s3_storage with endpoint config
  sstables_manager: Keep object storage configs onboard
  code: Introduce conf/object_storage.yaml configuration file
2023-05-04 18:08:54 +03:00
Avi Kivity
2d74dc0efd Merge 'sstable_directory: parallel_for_each_restricted: do not move container' from Benny Halevy
Commit ecbd112979
`distributed_loader: reshard: consider sstables for cleanup`
caused a regression in loading new sstables using the `upload`
directory, as seen in e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/230/testReport/migration_test/TestMigration/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split000___test_migrate_sstable_without_compression_3_0_md_/
```
            query = "SELECT COUNT(*) FROM cf"
            statement = SimpleStatement(query)
            s = self.patient_cql_connection(node, 'ks')
            result = list(s.execute(statement))
    >       assert result[0].count == expected_number_of_rows, \
                "Expected {} rows. Got {}".format(expected_number_of_rows, list(s.execute("SELECT * FROM ks.cf")))
    E       AssertionError: Expected 1 rows. Got []
    E       assert 0 == 1
    E         +0
```

The reason for the regression is that the call to `do_for_each_sstable` in `collect_all_shared_sstables` to search for sstables that need cleanup caused the list of sstables in the sstable directory to be moved and cleared.

parallel_for_each_restricted moves the container passed to it into a `do_with` continuation.  This is required for parallel_for_each_restricted.

However, moving the container is destructive and so, the decision whether to move or not needs to be the caller's, not the callee.

This patch changes the signature of parallel_for_each_restricted to accept a container rather than a rvalue reference, allowing the callers to decide whether to move or not.

Most callers are converted to move the container, except for `do_for_each_sstable` that copies `_unshared_local_sstables`, allowing callers to call `dir.do_for_each_sstable` multiple times without moving the list contents.

Closes #13526

* github.com:scylladb/scylladb:
  sstable_directory: coroutinize parallel_for_each_restricted
  sstable_directory: parallel_for_each_restricted: use std::ranges for template definition
  sstable_directory: parallel_for_each_restricted: do not move container
2023-05-04 17:39:05 +03:00
Pavel Emelyanov
56dfc21ba0 test: Deduplicate test::filename() static overload
There are two of them currently, both returning fs::path for sstable
components. One is static and can be dropped, callers are patched to use
the non-static one making the code tiny bit shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 17:16:00 +03:00
Pavel Emelyanov
3f30a253be test: Make test::filename return fs::path
The sstable::filename() is private and is not supposed to be used as a
path to open any files. However, tests are different and they sometimes
know it is. For that they use test wrapper that has access to private
members and may make assumptions about meaning of sstable::filename().

That said, test::filename() should return fs::path, not sstring.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 17:14:04 +03:00
Michał Chojnowski
eb5ccb7356 mutation_partition_v2: fix a minor bug in printer
Commit 1cb95b8cf caused a small regression in the debug printer.
After that commit, range tombstones are printed to stdout,
instead of the target stream.
In practice, this causes range tombstones to appear in test logs
out of order with respect to other parts of the debug message.

Fix that.

Closes #13766
2023-05-04 16:56:40 +03:00
Pavel Emelyanov
c4394a059c file_writer: Remove static make() helper
It's simply unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 16:55:41 +03:00
Pavel Emelyanov
eaf534cc4b sstable: Use toc_filename() to print TOC file path
The sstable::write_toc() gets TOC filename from file writer, while it
can get it from itself. This makes the file_writer::get_filename()
private and actually improves logging, as the writer is not required
to have the filename onboard, while sstable always has it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 16:54:21 +03:00
Mikołaj Grzebieluch
4a8a8c153c service/raft: raft_group_registry: Add verification of destination ID
All Raft verbs include dst_id, the ID of the destination server, but it isn't checked.
`append_entries` will work even if it arrives at completely the wrong server (but in the same group).
It can cause problems, e.g. in the scenario of replacing a dead node.

This commit adds verifying if `dst_id` matches the server's ID and if it doesn't,
the Raft verb is rejected.
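
A minimal sketch of the check, under stated assumptions: the class and
field names below (`RaftServer`, `AppendEntries`, `WrongDestination`) are
hypothetical and only model the idea that a verb carrying a `dst_id` other
than the receiver's own ID is rejected before it can touch any state.

```python
from dataclasses import dataclass, field


class WrongDestination(Exception):
    """Raised when a Raft verb arrives at a server it was not addressed to."""


@dataclass
class AppendEntries:
    # Hypothetical message shape: only the fields needed for the example.
    group_id: str
    dst_id: str
    entries: list


@dataclass
class RaftServer:
    server_id: str
    group_id: str
    log: list = field(default_factory=list)

    def handle_append_entries(self, msg: AppendEntries) -> None:
        # Being in the same group is not enough: the verb must be addressed
        # to this exact server, otherwise it is rejected untouched.
        if msg.dst_id != self.server_id:
            raise WrongDestination(
                f"verb for {msg.dst_id} arrived at {self.server_id}")
        self.log.extend(msg.entries)
```

In the node replacement scenario, a message addressed to the dead node's ID
that reaches the replacing node would raise instead of being appended.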

Closes #12179
2023-05-04 15:25:23 +02:00
Tomasz Grabiec
e385ce8a2b Merge "fix stack use after free during shutdown" from Gleb
storage_service uses raft_group0, but during shutdown the latter is
destroyed before the former is stopped. This series moves raft_group0
destruction to be after storage_service is stopped. For the
move to work some existing dependencies of raft_group0 are dropped
since they are not really needed during the object creation.

Fixes #13522
2023-05-04 15:14:18 +02:00
Pavel Emelyanov
fe70333c19 test: Auto-skip object-storage test cases if run from shell
In case an sstable unit test case is run individually, it would fail
with exception saying that S3_... environment is not set. It's better to
skip the test-case rather than fail. If someone wants to run it from
shell, it will have to prepare S3 server (minio/AWS public bucket) and
provide proper environment for the test-case.
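
The skip-instead-of-fail behaviour can be sketched as follows. This is an
illustration in Python's unittest, not the boost test code touched by the
patch, and the variable names `S3_EXAMPLE_ENDPOINT` and `S3_EXAMPLE_BUCKET`
are made up for the example; the real names are the S3_... ones mentioned
above.

```python
import os
import unittest

# Hypothetical variable names, used only for this sketch.
REQUIRED_ENV = ("S3_EXAMPLE_ENDPOINT", "S3_EXAMPLE_BUCKET")


def missing_s3_env(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not environ.get(name)]


class ObjectStorageTest(unittest.TestCase):
    def setUp(self):
        missing = missing_s3_env()
        if missing:
            # Skip rather than fail when run from a plain shell.
            self.skipTest("object storage not configured, missing: "
                          + ", ".join(missing))

    def test_endpoint_is_configured(self):
        self.assertTrue(os.environ["S3_EXAMPLE_ENDPOINT"])
```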

refs: #13569

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13755
2023-05-04 14:15:18 +03:00
Mikołaj Grzebieluch
ae41d908d7 service/raft: raft_group_registry: handle_raft_rpc refactor
One-way RPC and two-way RPC have different semantics, i.e. in the first one
client doesn't need to wait for an answer.

This commit splits the logic of `handle_raft_rpc` to enable handling differences
in semantics, e.g. error handling.
2023-05-04 13:05:04 +02:00
Botond Dénes
0c9af10470 test/cql-pytest: add test_sstable_validation.py
This test file focuses on stressing the underlying sstable validator
with cases where the data/index has discrepancies.
2023-05-04 06:48:05 -04:00
Botond Dénes
a26224ffb8 test/cql-pytest: extract scylla_path,temp_workdir fixtures to conftest.py
From test_tools.py, their current home. They will soon be used by more
than one test file.
2023-05-04 06:48:05 -04:00
Konstantin Osipov
e7c9ca560b test: issue a read barrier before checking ring consistency
Raft replication doesn't guarantee that all replicas see
identical Raft state at all times, it only guarantees the
same order of events on all replicas.

When comparing raft state with gossip state on a node, first
issue a read barrier to ensure the node has the latest raft state.

To issue a read barrier it is sufficient to alter a non-existing
state: in order to validate the DDL the node needs to sync with the
leader and fetch its latest group0 state.

Fixes #13518 (flaky topology test).

Closes #13756
2023-05-04 12:22:07 +02:00
Gleb Natapov
dc6c3b60b4 init: move raft_group0 creation before storage_service
storage_service uses raft_group0 so the latter needs to exist until
the former is stopped.
2023-05-04 13:03:18 +03:00
Gleb Natapov
e9fb885e82 service/raft: raft_group0: drop dependency on cdc::generation_service
raft_group0 does not really depend on cdc::generation_service, it needs
it only transiently, so pass it to appropriate methods of raft_group0
instead of during its creation.
2023-05-04 13:03:07 +03:00
Benny Halevy
205daf49fd sstable_directory: coroutinize parallel_for_each_restricted
Using a coroutine simplifies the function and reduces the
number of moves it performs.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-04 11:46:59 +03:00
Benny Halevy
e4acc44814 sstable_directory: parallel_for_each_restricted: use std::ranges for template definition
We'd like the container to be a std::ranges::range.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-04 11:44:24 +03:00
Benny Halevy
e2023877f2 sstable_directory: parallel_for_each_restricted: do not move container
Commit ecbd112979
`distributed_loader: reshard: consider sstables for cleanup`
caused a regression in loading new sstables using the `upload`
directory, as seen in e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/230/testReport/migration_test/TestMigration/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split000___test_migrate_sstable_without_compression_3_0_md_/
```
        query = "SELECT COUNT(*) FROM cf"
        statement = SimpleStatement(query)
        s = self.patient_cql_connection(node, 'ks')
        result = list(s.execute(statement))
>       assert result[0].count == expected_number_of_rows, \
            "Expected {} rows. Got {}".format(expected_number_of_rows, list(s.execute("SELECT * FROM ks.cf")))
E       AssertionError: Expected 1 rows. Got []
E       assert 0 == 1
E         +0
E         -1
```

The reason for the regression is that the call to `do_for_each_sstable`
in `collect_all_shared_sstables` to search for sstables that need
cleanup caused the list of sstables in the sstable directory to be
moved and cleared.

parallel_for_each_restricted moves the container passed to it
into a `do_with` continuation.  This is required for
parallel_for_each_restricted.

However, moving the container is destructive and so,
the decision whether to move or not needs to be the
caller's, not the callee.

This patch changes the signature of parallel_for_each_restricted
to accept a lvalue reference to the container rather than a rvalue reference,
allowing the callers to decide whether to move or not.

Most callers are converted to move the container, as they effectively do
today, and a new method, `filter_sstables` was added for the
`collect_all_shared_sstables` use case, that allows the `func` that
processes each sstable to decide whether the sstable is kept
in `_unshared_local_sstables` or not.
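
The ownership problem can be modelled with a small sketch. Python has no
move semantics, so the "move" is imitated by draining the caller's list;
the class below and the method `do_for_each_sstable_draining` are
hypothetical, and the parallel, restricted execution of the real helper is
left out. Only the names `do_for_each_sstable`, `filter_sstables` and
`_unshared_local_sstables` come from the description above.

```python
class SSTableDirectory:
    """Toy model of a directory holding a list of local sstables."""

    def __init__(self, sstables):
        self._unshared_local_sstables = list(sstables)

    def do_for_each_sstable_draining(self, func):
        # Old shape (hypothetical name): the callee takes the container
        # away from its owner, so a second call sees an empty list.
        items = self._unshared_local_sstables[:]
        self._unshared_local_sstables.clear()
        for sst in items:
            func(sst)

    def do_for_each_sstable(self, func):
        # New shape: iterate without consuming, so the list survives and
        # the method can be called any number of times.
        for sst in list(self._unshared_local_sstables):
            func(sst)

    def filter_sstables(self, func):
        # `func` decides per sstable whether it stays in the list.
        self._unshared_local_sstables = [
            sst for sst in self._unshared_local_sstables if func(sst)
        ]
```

With the draining variant, a scan for sstables needing cleanup followed by
the normal load finds nothing to load, which matches the "Expected 1 rows.
Got []" symptom in the dtest output.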

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-04 11:36:25 +03:00
Botond Dénes
6bc5c4acf6 tools/scylla-sstables: write validation result to stdout
Currently the validate command uses the logger to output the result of
validation. This is inconsistent with other commands which all write
their output to stdout and log any additional information/errors to
stderr. This patch updates the validate command to do the same. While at
it, remove the "Validating..." message, it is not useful.
2023-05-04 03:13:07 -04:00
Botond Dénes
c1f18cb0c1 sstables/sstable: validate(): delegate to mx validator for mx sstables
We have a more in-depth validator for the mx format, so delegate to that
if the validated sstable is of that format. For kl/la we fall-back to
the reader-level validator we used before.
2023-05-04 03:13:07 -04:00
Botond Dénes
d941d38759 sstables/mx/reader: add mx specific validator
Working with the low-level sstable parser and index reader, this
validator also cross-checks the index with the data file, making sure
all partitions are located at the position and in the order the index
describes. Furthermore, if the index also has promoted index, the order
and position of clustering elements is checked against it.
This is above the usual fragment kind order, partition key order and
clustering order checks that we already had with the reader-level
validator.
2023-05-04 03:13:03 -04:00
Botond Dénes
11f2d6bd0a Merge 'build: only apply -Wno-parentheses-equality to ANTLR generated sources' from Kefu Chai
it turns out the only places where we have compiler warnings of -Wparentheses-equality are in the source code generated by ANTLR. strictly speaking, this is valid C++ code, just not quite readable from the hygienic point of view. so let's enable this warning in the source tree, but only disable it when compiling the sources generated by ANTLR.

please note, this warning option is supported by both GCC and Clang, so no need to test if it is supported.

for a sample of the warnings, see:
```
/home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality]
                            if ( (LA4_0 == '$'))
                                  ~~~~~~^~~~~~
/home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: note: remove extraneous parentheses around the comparison to silence this warning
                            if ( (LA4_0 == '$'))
                                 ~      ^     ~
```

Closes #13762

* github.com:scylladb/scylladb:
  build: only apply -Wno-parentheses-equality to ANTLR generated sources
  compaction: disambiguate format_to()
2023-05-04 10:09:36 +03:00
Kefu Chai
c76486c508 build: only apply -Wno-parentheses-equality to ANTLR generated sources
it turns out the only places where we have compiler warnings of
-Wparentheses-equality are in the source code generated by ANTLR. strictly
speaking, this is valid C++ code, just not quite readable from the
hygienic point of view. so let's enable this warning in the source tree,
but only disable it when compiling the sources generated by ANTLR.

please note, this warning option is supported by both GCC and Clang,
so no need to test if it is supported.

for a sample of the warnings, see:
```
/home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality]
                            if ( (LA4_0 == '$'))
                                  ~~~~~~^~~~~~
/home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: note: remove extraneous parentheses around the comparison to silence this warning
                            if ( (LA4_0 == '$'))
                                 ~      ^     ~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-04 11:16:27 +08:00
Kefu Chai
113fb32019 compaction: disambiguate format_to()
we should always qualify `format_to` with its namespace. otherwise
we'd have following failure when compiling with libstdc++ from GCC-13:

```
/home/kefu/dev/scylladb/compaction/table_state.hh:65:16: error: call to 'format_to' is ambiguous
        return format_to(ctx.out(), "{}.{} compaction_group={}", s->ks_name(), s->cf_name(), t.get_group_id());
               ^~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13760
2023-05-03 20:33:18 +03:00
Pavel Emelyanov
0b18e3bff9 doc: Add a document describing how to configure S3 backend
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:23:38 +03:00
Pavel Emelyanov
e00d3188ed s3/test: Add ability to run boost test over real s3
Support the AWS_S3_EXTRA environment variable that's :-split and the
respective substrings are set as endpoint AWS configuration. This makes
it possible to run boost S3 test over real S3.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:23:38 +03:00
Pavel Emelyanov
98b9c205bb s3/client: Sign requests if configured
If the endpoint config specifies AWS key, secret and region, all the
S3 requests get signed. Signature should have all the x-amz-... headers
included and should contain at least three of them. This patch includes
x-amz-date, x-amz-content-sha256 and host headers into the signing list.
The content can be unsigned when sent over HTTPS, this is what this
patch does.
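
For orientation, the general AWS Signature Version 4 scheme with an
unsigned payload can be sketched as below. This is a Python illustration of
the public algorithm, not the C++ code in utils/s3/client.cc; the function
name and parameters are assumptions, and the caller is assumed to pass an
already URI-encoded path and a canonical (sorted, encoded) query string.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_request(method, host, path, query, amz_date, region,
                 key_id, secret, service="s3"):
    """Return the headers of a SigV4-signed request with unsigned payload.

    `amz_date` is an ISO 8601 basic timestamp such as "20230503T172337Z".
    """
    payload_hash = "UNSIGNED-PAYLOAD"  # body is left out of the signature
    headers = {
        "host": host,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
    }
    names = sorted(headers)
    signed_headers = ";".join(names)
    canonical_headers = "".join(f"{n}:{headers[n]}\n" for n in names)
    canonical_request = "\n".join(
        [method, path, query, canonical_headers, signed_headers, payload_hash])

    date = amz_date[:8]
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    key = _hmac(("AWS4" + secret).encode("utf-8"), date)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={key_id}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")
    return headers
```

Because the body is not hashed, the same signature covers any payload,
which is why this mode is paired with HTTPS.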

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:23:37 +03:00
Pavel Emelyanov
3dd82485f6 s3/client: Add connection factory with DNS resolve and configurable HTTPS
Existing seastar's factories work on socket_address, but in S3 we have
endpoint name which's a DNS name in case of real S3. So this patch
creates the http client for S3 with the custom connection factory that
does two things.

First, it resolves the provided endpoint name into address.
Second, it loads trust-file from the provided file path (or sets system
trust if configured that way).

Since s3 client creation is no-waiting code currently, the above
initialization is spawned in a fiber and before creating the connection
this fiber is waited upon.

This code probably deserves living in seastar, but for now it can land
next to utils/s3/client.cc.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:23:19 +03:00
Pavel Emelyanov
3bec5ea2ce s3/client: Keep server port on config
Currently the code temporarily assumes that the endpoint port is 9000.
This is what tests' local minio is started with. This patch keeps the
port number on endpoint config and makes test get the port number from
minio starting code via environment.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
85f06ca556 s3/client: Construct it with config
Similar to previous patch -- extend the s3::client constructor to get
the endpoint config value next to the endpoint string. For now the
configs are likely empty, but they are yet unused too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
caf9e357c8 s3/client: Construct it with sstring endpoint
Currently the client is constructed with socket_address which's prepared
by the caller from the endpoint string. That's not flexible enough,
because s3 client needs to know the original endpoint string for two
reasons.

First, it needs to lookup endpoint config for potential AWS creds.
Second, it needs this exact value as Host: header in its http requests.

So this patch just relaxes the client constructor to accept the endpoint
string and hard-code the 9000 port. The latter is temporary, this is how
local tests' minio is started, but next patch will make it configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
711514096a sstables: Make s3_storage with endpoint config
Continuation of the previous patch. The sstables::s3_storage gets the
endpoint config instance upon creation.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
bd1e3c688f sstables_manager: Keep object storage configs onboard
The user sstables manager will need to provide endpoint config for
sstables' storage drivers. For that it needs to get it from db::config
and keep in-sync with its updates.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
2f6aa5b52e code: Introduce conf/object_storage.yaml configuration file
In order to access real S3 bucket, the client should use signed requests
over https. Partially this is due to security considerations, partially
this is unavoidable, because multipart-uploading is banned for unsigned
requests on the S3. Also, signed requests over plain http require
signing the payload as well, which is a bit troublesome, so it's better
to stick to secure https and keep payload unsigned.

To prepare signed requests the code needs to know three things:
- aws key
- aws secret
- aws region name

The latter could be derived from the endpoint URL, but it's simpler to
configure it explicitly, all the more so because S3 URLs without a
region name in them exist and we may want to use them some day.

To keep the described configuration the proposed place is the
object_storage.yaml file with the format

endpoints:
  - name: a.b.c
    port: 443
    aws_key: 12345
    aws_secret: abcdefghijklmnop
    ...

When loaded, the map gets into db::config and later will be propagated
down to sstables code (see next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:15 +03:00
Botond Dénes
4365f004c1 test/boost/mutation_test: add sanity test for mutation compaction validator
Checking that compacted fragments are forwarded to the validator intact.
2023-05-03 04:19:42 -04:00
Botond Dénes
60e1a23864 mutation/mutation_compactor: add validation level to compaction state query constructor
Allowing the validation level to be customized by whoever creates the
compaction state. Add a default value (the previous hardcoded level) to
avoid the churn of updating all call sites.
2023-05-03 04:17:05 -04:00
Botond Dénes
be859db112 mutation/mutation_compactor: validate range tombstone change before it is moved
e2c9cdb576 moved the validation of the
range tombstone change to the place where it is actually consumed, so we
don't attempt to pass purged or discarded range tombstones to the
validator. In doing so however, the validate pass was moved after the
consume call, which moves the range tombstone change, so the validator
was passed a moved-from range tombstone. Fix this by moving the
validation to before the consume call.

Refs: #12575
2023-05-03 03:07:31 -04:00
Botond Dénes
48b9f31a08 Merge 'db, sstable: use generation_type instead of its value when appropriate' from Kefu Chai
in this series, we try to use `generation_type` as a proxy to hide the consumers from its underlying type. this paves the road to the UUID based generation identifier. as by then, we cannot assume the type of the `value()` without asking `generation_type` first. better off leaving all the formatting and conversions to the `generation_type`. also, this series changes the "generation" column of sstable registry table to "uuid", and convert the value of it to the original generation_type when necessary, this paves the road to a world with UUID based generation id.

Closes #13652

* github.com:scylladb/scylladb:
  db: use uuid for the generation column in sstable registry table
  db, sstable: add operator data_value() for generation_type
  db, sstable: print generation instead of its value
2023-05-03 09:04:54 +03:00
Nadav Har'El
b5f28e2b55 Merge 'Add S3 support to sstables::test_env' from Pavel Emelyanov
Currently there are only 2 tests for S3 -- the pure client test and the compound object_store test that launches scylla, creates an s3-backed table and CQL-queries it. At the same time there's a whole lot of small unit tests for sstables functionality, some of which can run over S3 storage too.

This PR adds this support and patches several test cases to use it. More test cases are to come later on demand.

fixes: #13015

Closes #13569

* github.com:scylladb/scylladb:
  test: Make resharding test run over s3 too
  test: Add lambda to fetch bloom filter size
  test: Tune resharding test use of sstable::test_env
  test: Make datafile test case run over s3 too
  test: Propagate storage options to table_for_test
  test: Add support for s3 storage_options in config
  test: Outline sstables::test_env::do_with_async()
  test: Keep storage options on sstable_test_env config
  sstables: Add and call storage::destroy()
  sstables: Coroutinize sstable::destroy()
2023-05-02 21:48:05 +03:00
Botond Dénes
a6387477fa mutation/mutation_fragment_stream_validator: add validator() accessor to validating filter 2023-05-02 09:42:42 -04:00
Botond Dénes
d79db676b1 sstables/mx/reader: template data_consume_rows_context_m on the consumer
Sadly this means all accesses of base-class members have to be qualified
with `this->`.
2023-05-02 09:42:42 -04:00
Botond Dénes
06fb48362a sstables/mx/reader: move row_processing_result to namespace scope
Reduce `data_consume_rows_context_m`'s dependency on the
`mp_row_consumer_m` symbol, preparing the way to make the former
templated on the consumer.
2023-05-02 09:42:42 -04:00
Botond Dénes
00362754a0 sstables/mx/reader: use data_consumer::proceed directly
Currently mp_row_consumer_m creates an alias to data_consumer::proceed.
Code in the rest of the file uses both unqualified name and
mp_row_consumer_m::proceed. Remove the alias and just use
`data_consumer::proceed` directly everywhere, which leads to cleaner code.
2023-05-02 09:42:42 -04:00
Botond Dénes
388e7ddc03 sstables/mx/reader.cc: extend namespace to end-of-file (cosmetic) 2023-05-02 09:42:42 -04:00
Botond Dénes
10fe76a0fe compaction/compaction: remove now unused scrub_validate_mode_validate_reader() 2023-05-02 09:42:42 -04:00
Botond Dénes
f6e5be472d compaction/compaction: move away from scrub_validate_mode_validate_reader()
Use sstable::validate() directly instead.
2023-05-02 09:42:42 -04:00
Botond Dénes
3e52f0681e tools/scylla-sstable: move away from scrub_validate_mode_validate_reader()
Use sstable::validate() directly instead. Since sstables have to be
validated individually, this means the operation loses the `--merge`
option.
2023-05-02 09:42:42 -04:00
Botond Dénes
393c42d4a9 test/boost/sstable_compaction_test: move away from scrub_validate_mode_validate_reader()
Test sstable::validate() instead. Also rename the unit test testing said
method from scrub_validate_mode_validate_reader_test to
sstable_validate_test to reflect the change.
At this point this test should probably be moved to
sstable_datafile_test.cc, but not in this patch.
Sadly this transition means we lose some test scenarios. Since now we
have to write the invalid data to sstables, we have to drop scenarios
which trigger errors on either the write or read path.
2023-05-02 09:42:42 -04:00
Botond Dénes
47959454eb sstables/sstable: add validate() method
To replace the validate code currently in compaction/compaction.cc (not
in this commit). We want to push down this logic to the sstable layer,
so that:
* Non compaction code that wishes to validate sstables (tests, tools)
  doesn't have to go through compaction.
* We can abstract how sstables are validated, in particular we want to
  add a new more low-level validation method that only the more recent
  sstable versions (mx) will support.
2023-05-02 09:42:41 -04:00
Botond Dénes
7ba5c9cc6a compaction/compaction: scrub_sstables_validate_mode(): validate sstables one-by-one
Currently said method creates a combined reader from all the sstables
passed to it then validates this combined reader.
Change it to validate each sstable (reader) individually in preparation
of the new validate method which can handle a single sstable at a time.
Note that this is not going to make much impact in practice, all callers
pass a single sstable to this method already.
2023-05-02 09:42:41 -04:00
Botond Dénes
e8c7ba98f1 compaction: scrub: use error messages from validator 2023-05-02 09:42:41 -04:00
Botond Dénes
d3749b810a mutation_fragment_stream_validator: produce error messages in low-level validator
Currently, error messages for validation errors are produced in several
places:
* the high-level validator (which is built on the low-level one)
* scrub compaction and validation compaction (scrub in validate mode)
* scylla-sstable's validate operation

We plan to introduce yet another place which would use the low-level
validator and hence would have to produce its own error messages. To cut
down all this duplication, centralize the production of error messages
in the low-level validator, which now returns a `validation_result`
object instead of bool from its validate methods. This object can be
converted to bool (so it's backwards compatible) and also contains an
error message if validation failed. In the next patches we will migrate
all users of the low level validator (be that direct or indirect) to use
the error messages provided in this result object instead of coming up
with one themselves.
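
A minimal sketch of the idea described above, not the actual Scylla type: a
result object that converts to bool and also carries an error message. The
class layout, the `what()` accessor and the `validate_monotonic()` helper are
all assumptions made for illustration. Note that in this sketch an empty
message means "valid".

```cpp
#include <cassert>
#include <string>
#include <utility>

// Illustrative result type: empty message means "valid".
class validation_result {
    std::string _what;
public:
    validation_result() = default;
    explicit validation_result(std::string what) : _what(std::move(what)) {}
    // Existing `if (validator(...))` call sites keep compiling.
    explicit operator bool() const { return _what.empty(); }
    // New call sites can log the message instead of inventing their own.
    const std::string& what() const { return _what; }
};

// Hypothetical check standing in for a low-level validator method.
validation_result validate_monotonic(int previous_position, int next_position) {
    if (next_position < previous_position) {
        return validation_result("out-of-order fragment: position " +
                                 std::to_string(next_position) + " follows " +
                                 std::to_string(previous_position));
    }
    return validation_result();
}
```

A caller that only needs a yes/no answer can keep testing the result in an
`if`, while a caller that reports errors reads `what()` from the same object.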
2023-05-02 09:42:41 -04:00
Botond Dénes
72003dc35c readers: evictable_reader: skip progress guarantee when next pos is partition start
The evictable reader must ensure that each buffer fill makes forward
progress, i.e. the last fragment in the buffer has a position larger
than the last fragment from the last buffer-fill. Otherwise, the reader
could get stuck in an infinite loop between buffer fills, if the reader
is evicted in-between.
The code guaranteeing this forward progress has a bug: when the next
expected position is a partition-start (another partition), the code
would loop forever, effectively reading all there is from the underlying
reader.
To avoid this, add a special case to ignore the progress guarantee loop
altogether when the next expected position is a partition start. In this
case, progress is guaranteed anyway, because there is exactly one
partition-start fragment in each partition.

Fixes: #13491

Closes #13563
2023-05-02 16:19:32 +03:00
Botond Dénes
7baa2d9cb2 Merge 'Cleanup range printing' from Benny Halevy
This mini-series cleans up printing of ranges in utils/to_string.hh

It generalizes the helper function to work on a std::ranges::range,
with some exceptions, and adds a helper for boost::transformed_range.

It also changes the internal interface by moving `join` to the utils namespace
and using std::string rather than seastar::sstring.

Additional unit tests were added to test/boost/json_test

Fixes #13146

Closes #13159

* github.com:scylladb/scylladb:
  utils: to_string: get rid of utils::join
  utils: to_string: get rid of to_string(std::initializer_list)
  utils: to_string: get rid of to_string(const Range&)
  utils: to_string: generalize range helpers
  test: add string_format_test
  utils: chunked_vector: add std::ranges::range ctor
2023-05-02 14:55:18 +03:00
Botond Dénes
d6ed5bbc7e Merge 'alternator: fix validation of numbers' magnitude and precision' from Nadav Har'El
DynamoDB limits the allowed magnitude and precision of numbers - valid
decimal exponents are between -130 and 125 and up to 38 significant
decimal digits are allowed. In contrast, Scylla uses the CQL "decimal"
type which offers unlimited precision. This can cause two problems:

1. Users might get used to this "unofficial" feature and start relying
    on it, not allowing us to switch to a more efficient limited-precision
    implementation later.

2. If huge exponents are allowed, e.g., 1e-1000000, summing such a
    number with 1.0 will result in a huge number, huge allocations and
    stalls. This is highly undesirable.

This series adds more tests in this area covering additional corner cases,
and then fixes the issue by adding the missing verification where it's
needed. After the series, all 12 tests in test/alternator/test_number.py now pass.

Fixes #6794

Closes #13743

* github.com:scylladb/scylladb:
  alternator: unit test for number magnitude and precision function
  alternator: add validation of numbers' magnitude and precision
  test/alternator: more tests for limits on number precision and magnitude
  test/alternator: reproducer for DoS in unlimited-precision addition
2023-05-02 14:33:36 +03:00
Kefu Chai
74e9e6dd1a db: use uuid for the generation column in sstable registry table
* change the "generation" column of sstable registry table from
  bigint to uuid
* add a helper to convert UUID back to the original generation

in the long run, we encourage user to use uuid based generation
identifier. but in the transition period, both bigint based and uuid
based identifiers are used for the generation. so to cater both
needs, we use a hackish way to store the integer into UUID. to
differentiate the was-integer UUID from the genuine UUID, we
check the UUID's most_significant_bits. because we only support
serializing UUID v1, if the timestamp in the UUID is zero,
we assume the UUID was generated from an integer when converting it
back to a generation identifier.

also, please note, the only use case of using generation as a
column is the sstable_registry table, but since its schema is fixed,
we cannot store both a bigint and a UUID as the value of its
`generation` column, the simpler way forward is to use a single type
for the generation. to be more efficient and to preserve the type of
the generation, instead of using types like ascii string or bytes,
we will always store the generation as a UUID in this table, if the
generation's identifier is an int64_t, the value of the integer will
be used as the least significant bits of the UUID.
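
A minimal sketch of the encoding described above, assuming a toy
`uuid_bits` struct rather than Scylla's real UUID class. The helper names
are hypothetical. The timestamp extraction follows the standard version 1
layout (time_low, time_mid, version plus time_hi in the most significant
64 bits).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy UUID representation: two 64-bit halves.
struct uuid_bits {
    std::uint64_t msb;
    std::uint64_t lsb;
};

// Reassemble the 60-bit version 1 timestamp from the most significant half.
std::uint64_t v1_timestamp(std::uint64_t msb) {
    std::uint64_t time_low = msb >> 32;
    std::uint64_t time_mid = (msb >> 16) & 0xFFFF;
    std::uint64_t time_hi = msb & 0x0FFF;  // drop the 4 version bits
    return (time_hi << 48) | (time_mid << 32) | time_low;
}

// Store an integer generation in the least significant bits.
uuid_bits generation_to_uuid(std::int64_t generation) {
    return uuid_bits{0, static_cast<std::uint64_t>(generation)};
}

// A zero timestamp is taken to mean "this UUID was an integer".
std::optional<std::int64_t> uuid_to_integer_generation(uuid_bits u) {
    if (v1_timestamp(u.msb) != 0) {
        return std::nullopt;  // genuine time-based UUID
    }
    return static_cast<std::int64_t>(u.lsb);
}
```

Round-tripping an integer through `generation_to_uuid()` and back yields the
same value, while a time-based UUID is reported as not being an integer.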

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-02 19:23:22 +08:00
Nadav Har'El
ed34f3b5e4 cql-pytest: translate Cassandra's test for LWT with collections
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionTest.java into our cql-pytest
framework.

This test file checks various LWT conditional updates which involve
collections or UDTs (there is a separate test file for LWT conditional
updates which do not involve collections, which I haven't translated
yet).

The tests reproduce one known bug:

Refs #5855:  lwt: comparing NULL collection with empty value in IF
             condition yields incorrect results

And also uncovered three previously-unknown bugs:

Refs #13586: Add support for CONTAINS and CONTAINS KEY in LWT expressions
Refs #13624: Add support for UDT subfields in LWT expression
Refs #13657: Misformatted printout of column name in LWT error message

Beyond those bona-fide bugs, this test also demonstrates several places
where we intentionally deviated from Cassandra's behavior, forcing me
to comment out several checks. These deviations are known, and intentional,
but some of them are undocumented and it's worth listing here the ones
re-discovered by this test:

1. On a successful conditional write, Cassandra returns just True, Scylla
   also returns the old contents of the row. This difference is officially
   documented in docs/kb/lwt-differences.rst.
2. Scylla allows the test "l = [null]" or "s = {null}" with this weird
   null element (the result is false), whereas Cassandra prints an error.
3. Scylla allows "l[null]" or "m[null]" (resulting in null), Cassandra
   prints an error.
4. Scylla allows a negative list index, "l[-2]", resulting in null.
   Cassandra prints an error in this case.
5. Cassandra allows in "IF v IN (?, ?)" to bind individual values to
   UNSET_VALUE and skips them, Scylla treats this as an error. Refs #13659.
6. Scylla allows "IN null" (the condition just fails), Cassandra prints
   an error in this case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13663
2023-05-02 11:53:58 +03:00
Pavel Emelyanov
d4a72de406 test: Make resharding test run over s3 too
Now that the test case and the lib/utils code it uses take a storage-agnostic
approach, it can be extended to run over S3 storage as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:46:23 +03:00
Pavel Emelyanov
2601c58278 test: Add lambda to fetch bloom filter size
The resharding test compares bloom filter sizes before and after reshard
runs. For that it gets the filter on-disk filename and stat()s it. That
won't work with S3 as it doesn't have accessible on-disk files.

Some time ago there existed the storage::get_stats() method, but now
it's gone. The new s3::client::get_object_stat() is coming, but it will
take time to switch to it. For now, generalize filter size fetching into
a local lambda. Next patch will make a stub in it for S3 case, and once
the get_object_stat() is there we'll be able to smoothly start using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:43:26 +03:00
Kefu Chai
135b4fd434 db: schema_tables: capture reference to temporary value by value
`clustering_key_columns()` returns a range view, and `front()` returns
the reference to its first element. so we cannot assume the availability
of this reference after the expression is evaluated. to address this
issue, let's capture the returned range by value, and keep the first
element by reference.

this also silences warning from GCC-13:

```
/home/kefu/dev/scylladb/db/schema_tables.cc:3654:30: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
 3654 |     const column_definition& first_view_ck = v->clustering_key_columns().front();
      |                              ^~~~~~~~~~~~~
/home/kefu/dev/scylladb/db/schema_tables.cc:3654:79: note: the temporary was destroyed at the end of the full expression ‘(& v)->view_ptr::operator->()->schema::clustering_key_columns().boost::iterator_range<__gnu_cxx::__normal_iterator<const column_definition*, std::vector<column_definition> > >::<anonymous>.boost::iterator_range_detail::iterator_range_base<__gnu_cxx::__normal_iterator<const column_definition*, std::vector<column_definition> >, boost::iterators::random_access_traversal_tag>::<anonymous>.boost::iterator_range_detail::iterator_range_base<__gnu_cxx::__normal_iterator<const column_definition*, std::vector<column_definition> >, boost::iterators::bidirectional_traversal_tag>::<anonymous>.boost::iterator_range_detail::iterator_range_base<__gnu_cxx::__normal_iterator<const column_definition*, std::vector<column_definition> >, boost::iterators::incrementable_traversal_tag>::front()’
 3654 |     const column_definition& first_view_ck = v->clustering_key_columns().front();
      |                                              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
```
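
A minimal sketch of the pattern this commit applies, using toy stand-ins
(`column_view`, `toy_schema`) that are assumptions and not the real schema
classes. It keeps the returned range in a named local and only then takes a
reference to its first element. The sketch assumes at least one clustering
column exists.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy iterator-pair range, loosely modelled on a boost::iterator_range.
struct column_view {
    const std::string* first;
    const std::string* last;
    const std::string& front() const { return *first; }
};

struct toy_schema {
    std::vector<std::string> clustering_columns;
    // Returns the view by value, i.e. a temporary at the call site.
    column_view clustering_key_columns() const {
        return column_view{clustering_columns.data(),
                           clustering_columns.data() + clustering_columns.size()};
    }
};

std::string first_clustering_column(const toy_schema& s) {
    // Flagged form: const std::string& c = s.clustering_key_columns().front();
    auto columns = s.clustering_key_columns();   // range captured by value
    const std::string& first = columns.front();  // reference into a named object
    return first;
}
```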

Fixes #13720
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13721
2023-05-02 11:42:43 +03:00
Pavel Emelyanov
76594bf72b test: Tune resharding test use of sstable::test_env
The test case in question spawns async context then makes the test_env
instance on the stack (and stopper for it too). There's a helper for the
above steps, better to use it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:30:03 +03:00
Pavel Emelyanov
439c8770aa test: Make datafile test case run over s3 too
Most of the sstable_datafile test cases are capable of running with S3
storage, so this patch makes the simplest of them do it. Patching the
rest from this file is optional, because mostly the cases test how the
datafile data manipulations work without checking the files
manipulations. So even if making them all run over S3 is possible, it
will just increase the testing time w/o real test of the storage driver.

So this patch makes one test case run over local and S3 storages, more
patches to update more test cases with files manipulations are yet to
come.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:30:03 +03:00
Pavel Emelyanov
f7df238545 test: Propagate storage options to table_for_test
Teach table_for_tests to use any storage options, not just the local one. For
now the only user that passes non-local options is sstables::test_env.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:30:03 +03:00
Pavel Emelyanov
fa1de16f30 test: Add support for s3 storage_options in config
When the sstable test case wants to run over S3 storage it needs to
specify that in test config by providing the S3 storage options. So
first thing this patch adds is the helper that makes these options based
on the env left by minio launcher from test.py.

Next, in order to make sstables_manager work with S3 it needs the
plugged system keyspace which, in turn, needs query processor, proxy,
database, etc. All this stuff lives in cql_test_env, so the test case
running with S3 options will run in a sstables::test_env nested inside
cql_test_env. The latter would also need to plug its system keyspace to
the former's sstables manager and turn the experimental feature ON.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:30:03 +03:00
Nadav Har'El
57ffbcbb22 cql3: fix spurious token names in syntax error messages
We have known for a long time (see issue #1703) that the quality of our
CQL "syntax error" messages leaves a lot to be desired, especially when
compared to Cassandra. This patch doesn't yet bring us great error
messages with great context - doing this isn't easy and it appears that
Antlr3's C++ runtime isn't as good as the Java one in this regard -
but this patch at least fixes **garbage** printed in some error messages.

Specifically, when the parser can deduce that a specific token is missing,
it used to print

    line 1:83 missing ')' at '<missing '

After this patch we get rid of the meaningless string '<missing ':

    line 1:83 : Missing ')'

Also, when the parser deduced that a specific token was unneeded, it
used to print:

    line 1:83 extraneous input ')' expecting <invalid>

Now we got rid of this silly "<invalid>" and write just:

    line 1:83 : Unexpected ')'

Refs #1703. I haven't yet marked that issue "fixed" because I think a
complete fix would also require printing the entire misparsed line and the
point of the parse failure. Scylla still prints a generic "Syntax Error"
in most cases now, and although the character number (83 in the above
example) can help, it's much more useful to see the actual failed
statement and where character 83 is.

Unfortunately some tests enshrine buggy error messages and had to be
fixed. Other tests enshrined strange text for a generic unexplained
error message, which used to say "  : syntax error..." (note the two
spaces and ellipsis) and after this patch is " : Syntax error". So
these tests are changed. Another message, "no viable alternative at
input" is deliberately kept unchanged by this patch so as not to break
many more tests which enshrined this message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13731
2023-05-02 11:23:58 +03:00
Pavel Emelyanov
1e03733e8c test: Outline sstables::test_env::do_with_async()
It's growing larger, better to keep it in .cc file

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:15:45 +03:00
Pavel Emelyanov
f223f5357d test: Keep storage options on sstable_test_env config
So that it could be set to s3 by the test case on demand. Default is
local storage which uses env's tempdir or explicit path argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:15:45 +03:00
Pavel Emelyanov
81a1416ebf sstables: Add and call storage::destroy()
The s3_storage leaks its client when the sstable gets destroyed. So far this
went unnoticed, but a debug-mode unit test run over minio caught it. So
here's the fix.

When sstable is destroyed it also kicks the storage to do whatever
cleanup is needed. In case of s3 storage the cleanup is in closing the
on-boarded client. Until #13458 is fixed each sstable has its own
private version of the client and there's no other place where it can be
close()d in a co_await-able manner.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:15:44 +03:00
Avi Kivity
c0eb0d57bc install-dependencies.sh: don't use fgrep
fgrep says:

    fgrep: warning: fgrep is obsolescent; using grep -F

follow its advice.

Closes #13729
2023-05-02 11:15:40 +03:00
Pavel Emelyanov
3e0c3346a8 sstables: Coroutinize sstable::destroy()
To simplify the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:15:11 +03:00
Nadav Har'El
e74f69bb56 alternator: unit test for number magnitude and precision function
In the previous patch we added a limit in Alternator for the magnitude
and precision of numbers, based on a function get_magnitude_and_precision
whose implementation was, unfortunately, rather elaborate and delicate.

Although we did add in the previous patches some end-to-end tests which
confirmed that the final decision made based on this function, to accept or
reject numbers, was a correct decision in a few cases, such an elaborate
function deserves a separate unit test for checking just that function
in isolation. In fact, this unit test uncovered some bugs in the first
implementation of get_magnitude_and_precision() which the other tests
missed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-05-02 11:04:05 +03:00
Nadav Har'El
3c0603558c alternator: add validation of numbers' magnitude and precision
DynamoDB limits the allowed magnitude and precision of numbers - valid
decimal exponents are between -130 and 125 and up to 38 significant
decimal digits are allowed. In contrast, Scylla uses the CQL "decimal"
type which offers unlimited precision. This can cause two problems:

1. Users might get used to this "unofficial" feature and start relying
   on it, not allowing us to switch to a more efficient limited-precision
   implementation later.

2. If huge exponents are allowed, e.g., 1e-1000000, summing such a
   number with 1.0 will result in a huge number, huge allocations and
   stalls. This is highly undesirable.
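
The limits quoted above can be sketched as follows. This is an illustrative
approximation working on the textual form of a number, not Alternator's
actual implementation. The names `number_shape`, `sketch_number_shape()` and
`within_dynamodb_limits()` are hypothetical. Magnitude here means the decimal
exponent of the most significant non-zero digit, and precision is the count
of digits left after dropping leading and trailing zeros.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct number_shape {
    int magnitude;  // decimal exponent of the most significant non-zero digit
    int precision;  // significant digits, leading/trailing zeros dropped
};

// Parses "[+-]digits[.digits][e[+-]digits]"; returns nullopt if malformed.
std::optional<number_shape> sketch_number_shape(std::string_view s) {
    std::size_t i = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
        ++i;
    }
    std::string digits;          // mantissa digits, decimal point removed
    std::size_t int_digits = 0;  // mantissa digits before the point
    bool seen_dot = false;
    for (; i < s.size(); ++i) {
        char c = s[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            digits.push_back(c);
            if (!seen_dot) {
                ++int_digits;
            }
        } else if (c == '.' && !seen_dot) {
            seen_dot = true;
        } else {
            break;
        }
    }
    if (digits.empty()) {
        return std::nullopt;
    }
    long exponent = 0;
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        ++i;
        bool negative = false;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
            negative = (s[i] == '-');
            ++i;
        }
        if (i == s.size()) {
            return std::nullopt;  // "1e" or "1e+" has no exponent digits
        }
        for (; i < s.size(); ++i) {
            if (!std::isdigit(static_cast<unsigned char>(s[i]))) {
                return std::nullopt;
            }
            if (exponent < 1000000) {  // clamp: far outside the limits anyway
                exponent = exponent * 10 + (s[i] - '0');
            }
        }
        if (negative) {
            exponent = -exponent;
        }
    }
    if (i != s.size()) {
        return std::nullopt;  // trailing garbage
    }
    std::size_t first = digits.find_first_not_of('0');
    if (first == std::string::npos) {
        return number_shape{0, 0};  // the value is zero
    }
    std::size_t last = digits.find_last_not_of('0');
    long magnitude = static_cast<long>(int_digits) - 1 - static_cast<long>(first) + exponent;
    return number_shape{static_cast<int>(magnitude), static_cast<int>(last - first + 1)};
}

// Limits taken from the text above: exponent in [-130, 125], up to 38 digits.
bool within_dynamodb_limits(std::string_view s) {
    auto shape = sketch_number_shape(s);
    return shape && shape->magnitude >= -130 && shape->magnitude <= 125 &&
           shape->precision <= 38;
}
```

With this sketch a short input such as `1e-1000000` is rejected from its
text alone, before any arbitrary-precision arithmetic is attempted.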

After this patch, all tests in test/alternator/test_number.py now
pass. The various failing tests which verify magnitude and precision
limitations in different places (key attributes, non-key attributes,
and arithmetic expressions) now pass - so their "xfail" tags are removed.

Fixes #6794

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-05-02 11:04:05 +03:00
Nadav Har'El
0eccc49308 test/alternator: more tests for limits on number precision and magnitude
We already have xfailing tests for issue #6794 - the missing checks on
precision and magnitudes of numbers in Alternator - but this patch adds
checks for additional corner cases. In particular we check the case that
numbers are used in a *key* column, which goes to a different code path
than numbers used in non-key columns, so it's worth testing as well.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-05-02 11:04:05 +03:00
Nadav Har'El
56b8b9d670 test/alternator: reproducer for DoS in unlimited-precision addition
As already noted in issue #6794, whereas DynamoDB limits the magnitude
of numbers to between 10^-130 and 10^125, Scylla does not. In this patch
we add yet another test for this problem, but unlike previous tests
which just showed that too much magnitude was allowed, which always sounded
like a benign problem - the test in this patch shows that this "feature"
can be used to DoS Scylla - a user can send a short request that
causes arbitrarily-large allocations, stalls and CPU usage.

The test is currently marked "skip" because it can cause Scylla to
take a very long time and/or run out of memory. It passes on DynamoDB
because the excessive magnitude is simply not allowed there.

Refs #6794

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-05-02 11:03:51 +03:00
Benny Halevy
959a740dac utils: to_string: get rid of utils::join
Use `fmt::format("{}", fmt::join(...))` instead.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-02 10:59:58 +03:00
Benny Halevy
e6bcb1c8df utils: to_string: get rid of to_string(std::initializer_list)
It's unused.

Just in case, add a unit test case for using the fmt library to
format it (that includes fmt::to_string(std::initializer_list)).

Note that the existing to_string implementation
used square brackets to enclose the initializer_list
but the new, standardized form uses curly braces.

This doesn't break anything since to_string(initializer_list)
wasn't used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-02 10:48:46 +03:00
Benny Halevy
ba883859c7 utils: to_string: get rid of to_string(const Range&)
Use fmt::to_string instead.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-02 10:48:46 +03:00
Benny Halevy
15c9f0f0df utils: to_string: generalize range helpers
As seen in https://github.com/scylladb/scylladb/issues/13146
the current implementation is not general enough
to provide print helpers for all kind of containers.

Modernize the implementation using templates based
on std::ranges::range and using fmt::join.

Extend unit test for formatting different types of ranges,
boost::transformed ranges, deque.

Fixes #13146

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-02 10:48:46 +03:00
Benny Halevy
59e89efca6 test: add string_format_test
Test string formatting before cleaning up
utils/to_string.hh in the next patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-02 10:48:46 +03:00
Benny Halevy
45153b58bd utils: chunked_vector: add std::ranges::range ctor
To be used in next patch for constructing
chunked_vector from an initializer_list.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-02 10:48:46 +03:00
Wojciech Mitros
b18c21147f cql: check if the keyspace is system when altering permissions
Currently, when altering permissions on a functions resource, we
only check if it's a builtin function and not if it's all functions
in the "system" keyspace, which contains all builtin functions.
This patch adds a check of whether the function resource keyspace
is "system". This check actually covers both "single function"
and "all functions in keyspace" cases, so the additional check
for single functions is removed.

Closes #13596
2023-05-02 10:13:59 +03:00
Botond Dénes
022465d673 Merge 'Tone down offstrategy log message' from Benny Halevy
In many cases we trigger offstrategy compaction opportunistically
also when there's nothing to do.  In this case we still print
to the log lots of info-level messages and call
`run_offstrategy_compaction` that wastes more cpu cycles
on learning that it has nothing to do.

This change bails out early if the maintenance set is empty
and prints a "Skipping off-strategy compaction" message in debug
level instead.

Fixes #13466

Also, add a group_id class and return it from compaction_group and table_state.
Use that to identify the compaction_group / table_state by "ks_name.cf_name compaction_group=idx/total" in log messages.

Fixes #13467

Closes #13520

* github.com:scylladb/scylladb:
  compaction_manager: print compaction_group id
  compaction_group, table_state: add group_id member
  compaction_manager: offstrategy compaction: skip compaction if no candidates are found
2023-05-02 08:05:18 +03:00
Avi Kivity
9c37fdaca3 Revert "dht: incremental_owned_ranges_checker: use lower_bound()"
This reverts commit d85af3dca4. It
restores the linear search algorithm, as we expect the search to
terminate near the origin. In this case linear search is O(1)
while binary search is O(log n).

A comment is added so we don't repeat the mistake.
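
A minimal sketch of why a linear scan can be the better choice here, assuming
a toy checker over sorted, non-overlapping inclusive ranges that is queried
with non-decreasing tokens. The type and member names are hypothetical and do
not mirror the real incremental_owned_ranges_checker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct token_range {
    std::int64_t start;  // inclusive
    std::int64_t end;    // inclusive
};

class incremental_range_checker {
    const std::vector<token_range>& _ranges;  // sorted, non-overlapping
    std::size_t _pos = 0;                     // first range that may still match
public:
    explicit incremental_range_checker(const std::vector<token_range>& ranges)
        : _ranges(ranges) {}

    // Tokens must be passed in non-decreasing order.
    bool belongs(std::int64_t token) {
        // The next match is expected at or right after _pos, so this loop
        // usually runs zero or one iterations per call.
        while (_pos < _ranges.size() && token > _ranges[_pos].end) {
            ++_pos;
        }
        return _pos < _ranges.size() && token >= _ranges[_pos].start;
    }
};
```

Because the cursor only moves forward, the total work over a whole scan is
bounded by the number of ranges plus the number of queries, whereas a binary
search from the cursor would pay O(log n) on every call.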

Closes #13704
2023-05-02 08:01:44 +03:00
Benny Halevy
707bd17858 everywhere: optimize calls to make_flat_mutation_reader_from_mutations_v2 with single mutation
No point in going through the vector<mutation> entry-point
just to discover in run time that it was called
with a single-element vector, when we know that
in advance.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13733
2023-05-02 07:58:34 +03:00
Avi Kivity
72c12a1ab2 Merge 'cdc, db_clock: specialize fmt::formatter<{db_clock::time_point, generation_id}>' from Kefu Chai
this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `cdc::generation_id` and `db_clock::time_point` without the help of `operator<<`.

the formatter of `cdc::generation_id` uses that of `db_clock::time_point` , so these two commits are posted together in a single pull request.

the corresponding `operator<<()` is removed in this change, as all its callers now use fmtlib for formatting.

Refs #13245

Closes #13703

* github.com:scylladb/scylladb:
  db_clock: specialize fmt::formatter<db_clock::time_point>
  cdc: generation: specialize fmt::formatter<generation_id>
2023-05-01 22:56:33 +03:00
Avi Kivity
7b7d9bcb14 Merge 'Do not access owned_ranges_ptr across shards in update_sstable_cleanup_state' from Benny Halevy
This series fixes a few issues caused by f1bbf705f9:

- table, compaction_manager: prevent cross shard access to owned_ranges_ptr
  - Fixes #13631
- distributed_loader: distribute_reshard_jobs: pick one of the sstable shard owners
- compaction: make_partition_filter: do not assert shard ownership
  - allow the filtering reader now used during resharding to process tokens owned by other shards

Closes #13635

* github.com:scylladb/scylladb:
  compaction: make_partition_filter: do not assert shard ownership
  distributed_loader: distribute_reshard_jobs: pick one of the sstable shard owners
  table, compaction_manager: prevent cross shard access to owned_ranges_ptr
2023-05-01 22:51:00 +03:00
Avi Kivity
c9dab3ac81 Merge 'treewide: fix warnings from GCC-13' from Kefu Chai
this series silences the warnings from GCC 13. some of these changes are considered as critical fixes, and posted separately.

see also #13243

Closes #13723

* github.com:scylladb/scylladb:
  cdc: initialize an optional using its value type
  compaction: disambiguate type name
  db: schema_tables: drop unused variable
  reader_concurrency_semaphore: fix signed/unsigned comparision
  locator: topology: disambiguate type names
  raft: disambiguate promise name in raft::awaited_conf_changes
2023-05-01 22:48:00 +03:00
Kefu Chai
37f1beade5 s3/client: do not allocate potentially big object on stack
when compiling using GCC-13, it warns that:

```
/home/kefu/dev/scylladb/utils/s3/client.cc:224:9: error: stack usage might be 66352 bytes [-Werror=stack-usage=]
  224 | sstring parse_multipart_upload_id(sstring& body) {
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~
```

so it turns out that `rapidxml::xml_document<>` could be very large,
let's allocate it on heap instead of on the stack to address this issue.
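
A rough sketch of the idea, using a hypothetical `big_document` type in place of `rapidxml::xml_document<>` (the 64 KiB inline buffer is an assumption chosen only to resemble the reported stack usage):

```cpp
#include <array>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a parser object that embeds a large buffer.
struct big_document {
    std::array<char, 64 * 1024> pool{};
    std::size_t used = 0;

    void parse(const char* text) {
        // Toy "parse": copy the bytes that fit into the inline pool.
        used = 0;
        while (used < pool.size() && text[used] != '\0') {
            pool[used] = text[used];
            ++used;
        }
    }
};

std::size_t parsed_length(const char* body) {
    // The large object lives on the heap; the stack frame only holds a pointer.
    auto doc = std::make_unique<big_document>();
    doc->parse(body);
    return doc->used;
}
```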

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13722
2023-05-01 22:46:18 +03:00
Kefu Chai
108f20c684 cql3: capture reference to temporary value by value
`data_dictionary::database::find_keyspace()` returns a temporary
object, and `data_dictionary::keyspace::user_types()` returns a
reference pointing to a member of this temporary object. so we
cannot use the reference after the expression is evaluated. in
this change, we capture the return value of `find_keyspace()` using
a universal reference, and keep the return value of `user_types()`
as a reference, to ensure that we can use it later.

this change silences the warning from GCC-13, like:

```
/home/kefu/dev/scylladb/cql3/statements/authorization_statement.cc:68:21: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
   68 |         const auto& utm = qp.db().find_keyspace(*keyspace).user_types();
      |                     ^~~
```
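
The pattern can be sketched as follows. `keyspace_view` and `find_keyspace_sketch()` are hypothetical stand-ins, not the real `data_dictionary` types:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: owns its data, user_types() returns a reference into it.
struct keyspace_view {
    std::vector<std::string> types;
    const std::vector<std::string>& user_types() const { return types; }
};

// Returns a temporary, like find_keyspace() in the message above.
keyspace_view find_keyspace_sketch() {
    keyspace_view ks;
    ks.types.push_back("address");
    ks.types.push_back("phone");
    return ks;
}

std::size_t count_user_types() {
    // Dangling form: const auto& utm = find_keyspace_sketch().user_types();
    // The temporary keyspace_view dies at the end of that full expression.

    // Binding the temporary itself to auto&& extends its lifetime to the end
    // of the enclosing scope, so the reference taken from it stays valid.
    auto&& ks = find_keyspace_sketch();
    const auto& utm = ks.user_types();
    return utm.size();
}
```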

Fixes #13725
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13726
2023-05-01 22:41:41 +03:00
Kefu Chai
b76877fd99 transport: capture reference to temp value by value
`current_scheduling_group()` returns a temporary value, and `name()`
returns a reference, so we cannot capture the return value by reference,
and use the reference after this expression is evaluated. this would
cause undefined behavior. so let's just capture it by value.

this change also silences the following warning from GCC-13:

```
/home/kefu/dev/scylladb/transport/server.cc:204:11: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
  204 |     auto& cur_sg_name = current_scheduling_group().name();
      |           ^~~~~~~~~~~
/home/kefu/dev/scylladb/transport/server.cc:204:56: note: the temporary was destroyed at the end of the full expression ‘seastar::current_scheduling_group().seastar::scheduling_group::name()’
  204 |     auto& cur_sg_name = current_scheduling_group().name();
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
```
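
A small sketch of the by-value fix, with a hypothetical `scheduling_group_sketch` standing in for `seastar::scheduling_group`:

```cpp
#include <string>

// Hypothetical stand-in: name() returns a reference to a member.
struct scheduling_group_sketch {
    std::string _name;
    const std::string& name() const { return _name; }
};

// Returns a temporary, like current_scheduling_group().
scheduling_group_sketch current_group_sketch() {
    return scheduling_group_sketch{"statement"};
}

std::string current_group_name() {
    // `auto& name = current_group_sketch().name();` would dangle, because the
    // temporary group is destroyed at the end of the full expression.
    // Copying the string keeps an independent value alive.
    auto name = current_group_sketch().name();
    return name;
}
```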

Fixes #13719
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13724
2023-05-01 22:40:36 +03:00
Kefu Chai
0a3a254284 cql3: do not capture reference to temporary value
`data_dictionary::database::find_column_family()` returns a temporary value,
and `data_dictionary::table::get_index_manager()` returns a reference into
this temporary value, so we cannot capture this reference and use it after
the expression is evaluated. in this change, we keep the return value
of `find_column_family()` by value, to extend the lifetime of the object
that the return value of `get_index_manager()` refers to.

this should address the warning from GCC-13, like:

```
/home/kefu/dev/scylladb/cql3/restrictions/statement_restrictions.cc:519:15: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
  519 |         auto& sim = db.find_column_family(_schema).get_index_manager();
      |               ^~~
```

Fixes #13727
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13728
2023-05-01 22:39:48 +03:00
Nadav Har'El
1cefb662cd Merge 'cql3/expr: remove expr::token' from Jan Ciołek
Let's remove `expr::token` and replace all of its functionality with `expr::function_call`.

`expr::token` is a struct whose job is to represent a partition key token.
The idea is that when the user types in `token(p1, p2) < 1234`, this will be internally represented as an expression which uses `expr::token` to represent the `token(p1, p2)` part.

The situation with `expr::token` is a bit complicated.
On the one hand it's supposed to represent the partition token, but sometimes it's also assumed that it can represent a generic call to the `token()` function, for example `token(1, 2, 3)` could be a `function_call`, but it could also be `expr::token`.

The query planning code assumes that each occurrence of expr::token
represents the partition token without checking the arguments.
Because of this allowing `token(1, 2, 3)` to be represented as `expr::token` is dangerous - the query planning might think that it is `token(p1, p2, p3)` and plan the query based on this, which would be wrong.

Currently `expr::token` is created only in one specific case.
When the parser detects that the user typed in a restriction which has a call to `token` on the LHS it generates `expr::token`.
In all other cases it generates an `expr::function_call`.
Even when the `function_call` represents a valid partition token, it stays a `function_call`. During preparation there is no check to see if a `function_call` to `token` could be turned into `expr::token`. This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented as `expr::token` and the query planner handles that, but sometimes it might be represented as `function_call`, which the query planner doesn't handle.

There is also a problem because there's a lot of code duplication between a `function_call` and `expr::token`.
All of the evaluation and preparation is the same for `expr::token` as it is for a `function_call` to the token function.
Currently it's impossible to evaluate `expr::token` and preparation has some flaws, but implementing it would basically consist of copy-pasting the corresponding code from token `function_call`.

One more aspect is multi-table queries.
With `expr::token` we turn a call to the `token()` function into a struct that is schema-specific.
What happens when a single expression is used to make queries to multiple tables? The schema is different, so something that is represented as `expr::token` for one schema would be represented as `function_call` in the context of a different schema.
Translating expressions to different tables would require careful manipulation to convert `expr::token` to `function_call` and vice versa. This could cause trouble for index queries.

Overall I think it would be best to remove `expr::token`.

Although having a clear marker for the partition token is sometimes nice for query planning, in my opinion the pros are outweighed by the cons.
I'm a big fan of having a single way to represent things; having two separate representations of the same thing without clear boundaries between them causes trouble.

Instead of having both `expr::token` and `function_call` we can just have the `function_call` and check if it represents a partition token when needed.

Refs: #12906
Refs: #12677
Closes: #12905

Closes #13480

* github.com:scylladb/scylladb:
  cql3: remove expr::token
  cql3: keep a schema in visitor for extract_clustering_prefix_restrictions
  cql3: keep a schema inside the visitor for extract_partition_range
  cql3/prepare_expr: make get_lhs_receiver handle any function_call
  cql3/expr: properly print token function_call
  expr_test: use unresolved_identifier when creating token
  cql3/expr: split possible_lhs_values into column and token variants
  cql3/expr: fix error message in possible_lhs_values
  cql3: expr: reimplement is_satisfied_by() in terms of evaluate()
  cql3/expr: add a schema argument to expr::replace_token
  cql3/expr: add a comment for expr::has_partition_token
  cql3/expr: add a schema argument to expr::has_token
  cql3: use statement_restrictions::has_token_restrictions() wherever possible
  cql3/expr: add expr::is_partition_token_for_schema
  cql3/expr: add expr::is_token_function
  cql3/expr: implement preparing function_call without a receiver
  cql3/functions: make column family argument optional in functions::get
  cql3/expr: make it possible to prepare expr::constant
  cql3/expr: implement test_assignment for column_value
  cql3/expr: implement test_assignment for expr::constant
2023-04-30 15:31:35 +03:00
Tomasz Grabiec
aba5667760 Merge 'raft topology: refactor the coordinator to allow non-node specific topology transitions' from Kamil Braun
We change the meaning and name of `replication_state`: previously it was meant
to describe the "state of tokens" of a specific node; now it describes the
topology as a whole - the current step in the 'topology saga'. It was moved
from `ring_slice` into `topology`, renamed into `transition_state`, and the
topology coordinator code was modified to switch on it first instead of node
state - because there may be no single transitioning node, but the topology
itself may be transitioning.

This PR was extracted from #13683, it contains only the part which refactors
the infrastructure to prepare for non-node specific topology transitions.

Closes #13690

* github.com:scylladb/scylladb:
  raft topology: rename `update_replica_state` -> `update_topology_state`
  raft topology: remove `transition_state::normal`
  raft topology: switch on `transition_state` first
  raft topology: `handle_ring_transition`: rename `res` to `exec_command_res`
  raft topology: parse replaced node in `exec_global_command`
  raft topology: extract `cleanup_group0_config_if_needed` from `get_node_to_work_on`
  storage_service: extract raft topology coordinator fiber to separate class
  raft topology: rename `replication_state` to `transition_state`
  raft topology: make `replication_state` a topology-global state
2023-04-30 10:55:24 +02:00
Kefu Chai
e333bcc2da cdc: initialize an optional using its value type
as this syntax is not supported by the standard, it seems clang
just silently constructs the value with the initializer list and
calls the operator=, but GCC complains:

```
/home/kefu/dev/scylladb/cdc/split.cc:392:54: error: converting to ‘std::optional<partition_deletion>’ from initializer list would use explicit constructor ‘constexpr std::optional<_Tp>::optional(_Up&&) [with _Up = const tombstone&; typename std::enable_if<__and_v<std::__not_<std::is_same<std::optional<_Tp>, typename std::remove_cv<typename std::remove_reference<_Iter>::type>::type> >, std::__not_<std::is_same<std::in_place_t, typename std::remove_cv<typename std::remove_reference<_Iter>::type>::type> >, std::is_constructible<_Tp, _Up>, std::__not_<std::is_convertible<_Iter, _Iterator> > >, bool>::type <anonymous> = false; _Tp = partition_deletion]’
  392 |         _result[t.timestamp].partition_deletions = {t};
      |                                                      ^
```

to silence the error, and to be more standard compliant,
let's use emplace() instead.
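
A minimal sketch of the `emplace()` form. The `*_sketch` types are hypothetical; the explicit constructor is an assumption used here to reproduce "constructible but not implicitly convertible", which is the property the error message is about:

```cpp
#include <map>
#include <optional>

struct tombstone_sketch {
    long timestamp = 0;
};

struct partition_deletion_sketch {
    tombstone_sketch t;
    // explicit: a tombstone cannot be implicitly converted to a deletion.
    explicit partition_deletion_sketch(const tombstone_sketch& ts) : t(ts) {}
};

struct batch_sketch {
    std::optional<partition_deletion_sketch> partition_deletions;
};

long record_deletion(std::map<long, batch_sketch>& result, const tombstone_sketch& t) {
    // `result[t.timestamp].partition_deletions = {t};` is the form GCC rejects,
    // because it would need the explicit optional constructor.
    // emplace() constructs the contained value in place from its arguments.
    auto& slot = result[t.timestamp].partition_deletions;
    slot.emplace(t);
    return slot->t.timestamp;
}
```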

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-29 19:34:12 +08:00
Jan Ciolek
be8ef63bf5 cql3: remove expr::token
Let's remove expr::token and replace all of its functionality with expr::function_call.

expr::token is a struct whose job is to represent a partition key token.
The idea is that when the user types in `token(p1, p2) < 1234`,
this will be internally represented as an expression which uses
expr::token to represent the `token(p1, p2)` part.

The situation with expr::token is a bit complicated.
On the one hand it's supposed to represent the partition token,
but sometimes it's also assumed that it can represent a generic
call to the token() function, for example `token(1, 2, 3)` could
be a function_call, but it could also be expr::token.

The query planning code assumes that each occurrence of expr::token
represents the partition token without checking the arguments.
Because of this allowing `token(1, 2, 3)` to be represented
as expr::token is dangerous - the query planning
might think that it is `token(p1, p2, p3)` and plan the query
based on this, which would be wrong.

Currently expr::token is created only in one specific case.
When the parser detects that the user typed in a restriction
which has a call to `token` on the LHS it generates expr::token.
In all other cases it generates an `expr::function_call`.
Even when the `function_call` represents a valid partition token,
it stays a `function_call`. During preparation there is no check
to see if a `function_call` to `token` could be turned into `expr::token`.
This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented
as `expr::token` and the query planner handles that, but sometimes it might
be represented as `function_call`, which the query planner doesn't handle.

There is also a problem because there's a lot of duplication
between a `function_call` and `expr::token`. All of the evaluation
and preparation is the same for `expr::token` as it is for a `function_call`
to the token function. Currently it's impossible to evaluate `expr::token`
and preparation has some flaws, but implementing it would basically
consist of copy-pasting the corresponding code from token `function_call`.

One more aspect is multi-table queries. With `expr::token` we turn
a call to the `token()` function into a struct that is schema-specific.
What happens when a single expression is used to make queries to multiple
tables? The schema is different, so something that is represented
as `expr::token` for one schema would be represented as `function_call`
in the context of a different schema.
Translating expressions to different tables would require careful
manipulation to convert `expr::token` to `function_call` and vice versa.
This could cause trouble for index queries.

Overall I think it would be best to remove expr::token.

Although having a clear marker for the partition token
is sometimes nice for query planning, in my opinion
the pros are outweighed by the cons.
I'm a big fan of having a single way to represent things,
having two separate representations of the same thing
without clear boundaries between them causes trouble.

Instead of having expr::token and function_call we can
just have the function_call and check if it represents
a partition token when needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:11:31 +02:00
Jan Ciolek
6e0ae59c5a cql3: keep a schema in visitor for extract_clustering_prefix_restrictions
The schema will be needed once we remove expr::token
and switch to using expr::is_partition_token_for_schema,
which requires a schema argument.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:11:31 +02:00
Jan Ciolek
551135e83f cql3: keep a schema inside the visitor for extract_partition_range
The schema will be needed once we remove expr::token
and switch to using expr::is_partition_token_for_schema,
which requires a schema argument.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:11:30 +02:00
Jan Ciolek
16bc1c930f cql3/prepare_expr: make get_lhs_receiver handle any function_call
get_lhs_receiver looks at the prepared LHS of a binary operator
and creates a receiver corresponding to this LHS expression.
This receiver is later used to prepare the RHS of the binary operator.

It's able to handle a few expression types - the ones that are currently
allowed to be on the LHS.
One of those types is `expr::token`, to handle restrictions like `token(p1, p2) = 3`.

Soon token will be replaced by `expr::function_call`, so the function will need
to handle `function_calls` to the token function.

Although we expect there to be only calls to the `token()` function,
as other functions are not allowed on the LHS, it can be made generic
over all function calls, which will help in future grammar extensions.

The function calls that it can currently get are calls to the token function,
but they're not validated yet, so it could also be something like `token(pk, pk, ck)`.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:53 +02:00
Jan Ciolek
d3a958490e cql3/expr: properly print token function_call
Printing for function_call is a bit strange.
When printing an unprepared function it prints
the name and then the arguments.

For prepared function it prints <anonymous function>
as the name and then the arguments.
Prepared functions have a name() method, but printing
doesn't use it, maybe not all functions have a valid name(?).

The token() function will soon be represented as a function_call
and it should be printable in a user-readable way.
Let's add an if which prints `token(arg1, arg2)`
instead of `<anonymous function>(arg1, arg2)` when printing
a call to the token function.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:53 +02:00
Jan Ciolek
289ca51ee5 expr_test: use unresolved_identifier when creating token
One test for expr::token uses a raw column identifier
in the test.

Let's change it to unresolved_identifier, which is
a standard representation of unresolved column
names in expressions.

Once expr::token is removed it will be possible
to create a function_call with unresolved_identifiers
as arguments.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:53 +02:00
Jan Ciolek
096efc2f38 cql3/expr: split possible_lhs_values into column and token variants
The possible_lhs_values takes an expression and a column
and finds all possible values for the column that make
the expression true.

Apart from finding column values it's also capable of finding
all matching values for the partition key token.
When a nullptr column is passed, possible_lhs_values switches
into token values mode and finds all values for the token.

This interface isn't ideal.
It's confusing to pass a nullptr column when one wants to
find values for the token. It would be better to have a flag,
or just have a separate function.

Additionally in the future expr::token will be removed
and we will use expr::is_partition_token_for_schema
to find all occurrences of the partition token.
expr::is_partition_token_for_schema takes a schema
as an argument, which possible_lhs_values doesn't have,
so it would have to be extended to get the schema from
somewhere.

To fix these two problems let's split possible_lhs_values
into two functions - one that finds possible values for a column,
which doesn't require a schema, and one that finds possible values
for the partition token and requires a schema:

value_set possible_column_values(const column_definition* col, const expression& e, const query_options& options);
value_set possible_partition_token_values(const expression& e, const query_options& options, const schema& table_schema);

This will make the interface cleaner and enable smooth transition
once expr::token is removed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:53 +02:00
Jan Ciolek
f2e5f654f2 cql3/expr: fix error message in possible_lhs_values
In possible_lhs_values there was a message talking
about is_satisfied_by. It looks like a badly
copy-pasted message.

Change it to possible_lhs_values as it should be.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:52 +02:00
Avi Kivity
dc3c28516d cql3: expr: reimplement is_satisfied_by() in terms of evaluate()
It calls evaluate() internally anyway.

There's a scary if () in there talking about tokens, but everything
appears to work.
2023-04-29 13:04:52 +02:00
Jan Ciolek
ad5c931102 cql3/expr: add a schema argument to expr::replace_token
Just like has_token, replace_token will use
expr::is_partition_token_for_schema to find all instances
of the partition token to replace.

Let's prepare for this change by adding a schema argument
to the function before making the big change.

It's unused at the moment, but having a separate commit
should make it easier to review.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:52 +02:00
Jan Ciolek
d50db32d14 cql3/expr: add a comment for expr::has_partition_token
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:52 +02:00
Jan Ciolek
18879aad6f cql3/expr: add a schema argument to expr::has_token
In the future expr::token will be removed and checking
whether there is a partition token inside an expression
will be done using expr::is_partition_token_for_schema.

This function takes a schema as an argument,
so all functions that will call it also need
to get the schema from somewhere.

Right now it's an unused argument, but in the future
it will be used. Adding it in a separate commit
makes it easier to review.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:52 +02:00
Jan Ciolek
90b3b85bd0 cql3: use statement_restrictions::has_token_restrictions() wherever possible
The statement_restrictions class has a method called has_token_restrictions().
This method checks whether the partition key restrictions contain expr::token.

Let's use this function in all applicable places instead of manually calling has_token().

In the future has_token() will have an additional schema argument,
so eliminating calls to has_token() will simplify the transition.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:52 +02:00
Jan Ciolek
7af010095e cql3/expr: add expr::is_partition_token_for_schema
Add a function to check whether the expression
represents a partition token - that is a call
to the token function with consecutive partition
key columns as the arguments.

For example for `token(p1, p2, p3)` this function
would return `true`, but for `token(1, 2, 3)` or `token(p3, p2, p1)`
the result would be `false`.

The function has a schema argument because a schema is required
to get the list of partition columns that should be passed as
arguments to token().

Maybe it would be possible to infer the schema from the information
given earlier during prepare_expression, but it would be complicated
and a bit dangerous to do this. Sometimes we operate on multiple tables
and the schema is needed to differentiate between them - a token() call
can represent the base table's partition token, but for an index table
this is just a normal function call, not the partition token.
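
The check described above can be sketched roughly like this. The types are hypothetical and heavily simplified (arguments are modeled as optional column names rather than real expressions), so this illustrates the rule and is not the actual implementation:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a function call: each argument is either a column
// name or std::nullopt for anything that is not a plain column (e.g. a literal).
struct function_call_sketch {
    std::string name;
    std::vector<std::optional<std::string>> args;
};

// Hypothetical model of a schema: only the ordered partition key columns.
struct schema_sketch {
    std::vector<std::string> partition_key_columns;
};

bool is_partition_token_sketch(const function_call_sketch& call, const schema_sketch& s) {
    if (call.name != "token") {
        return false;
    }
    if (call.args.size() != s.partition_key_columns.size()) {
        return false;
    }
    // Every argument must be the partition key column at the same position.
    for (std::size_t i = 0; i < call.args.size(); ++i) {
        if (!call.args[i] || *call.args[i] != s.partition_key_columns[i]) {
            return false;
        }
    }
    return true;
}
```

The same call can give different answers for different schemas, which is why the schema has to be passed in: against an index table's schema the base table's `token(p1, p2, p3)` is only an ordinary function call.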

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:51 +02:00
Jan Ciolek
694d9298aa cql3/expr: add expr::is_token_function
Add a function that can be used to check
whether a given expression represents a call
to the token() function.

Note that a call to token() doesn't mean
that the expression represents a partition
token - it could be something like token(1, 2, 3),
just a normal function_call.

The code for checking has been taken from functions::get.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:51 +02:00
Jan Ciolek
f7cac10fe0 cql3/expr: implement preparing function_call without a receiver
Currently trying to do prepare_expression(function_call)
with a nullptr receiver fails.

It should be possible to prepare function calls without
a known receiver.

When the user types in: `token(1, 2, 3)`
the code should be able to figure out that
they are looking for a function with name `token`,
which takes 3 integers as arguments.

In order to support that we need to prepare
all arguments that can be prepared before
attempting to find a function.

Prepared expressions have a known type,
which helps to find the right function
for the given arguments.

Additionally the current code for finding
a function requires all arguments to be
assignment_testable, which requires preparing
some expression types, e.g. column_values.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:51 +02:00
Jan Ciolek
15ed83adbc cql3/functions: make column family argument optional in functions::get
The method `functions::get` is used to get the `functions::function` object
of the CQL function called using `expr::function_call`.

Until now `functions::get` required the caller to pass both the keyspace
and the column family.

The keyspace argument is always needed, as every CQL function belongs
to some keyspace, but the column family isn't used in most cases.

The only case where having the column family is really required
is the `token()` function. Each variant of the `token()` function
belongs to some table, as the arguments to the function are the
consecutive partition key columns.

Let's make the column family argument optional. In most cases
the function will work without information about column family.
In case of the `token()` function there's gonna be a check
and it will throw an exception if the argument is nullopt.
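
A sketch of what such an interface could look like. `resolve_function_sketch` and its string result are hypothetical and only illustrate the optional argument and the check for `token()`; the real `functions::get` has a different signature:

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical lookup: returns a qualified name instead of a function object.
std::string resolve_function_sketch(const std::string& keyspace,
                                    const std::string& function_name,
                                    const std::optional<std::string>& column_family = std::nullopt) {
    if (function_name == "token") {
        // token() is defined per table, so the column family is mandatory here.
        if (!column_family) {
            throw std::invalid_argument("token() requires a column family");
        }
        return keyspace + "." + *column_family + ".token";
    }
    // Every other function only needs the keyspace.
    return keyspace + "." + function_name;
}
```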

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:00:01 +02:00
Kefu Chai
0232115eaa compaction: disambiguate type name
otherwise GCC-13 complains:

```
/home/kefu/dev/scylladb/compaction/compaction_state.hh:38:22: error: declaration of ‘compaction::owned_ranges_ptr compaction::compaction_state::owned_ranges_ptr’ changes meaning of ‘owned_ranges_ptr’ [-Wchanges-meaning]
   38 |     owned_ranges_ptr owned_ranges_ptr;
      |                      ^~~~~~~~~~~~~~~~
```
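
A minimal sketch of one way to disambiguate, assuming the fix is to qualify the type name. The namespace and struct names are hypothetical:

```cpp
#include <memory>
#include <vector>

namespace compaction_sketch {

using owned_ranges_ptr = std::shared_ptr<const std::vector<int>>;

struct compaction_state_sketch {
    // `owned_ranges_ptr owned_ranges_ptr;` first uses the name as a type and
    // then redeclares it as a member, which is what -Wchanges-meaning flags.
    // Qualifying the type leaves no doubt about which entity is meant.
    compaction_sketch::owned_ranges_ptr owned_ranges_ptr;
};

} // namespace compaction_sketch
```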

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-29 17:02:25 +08:00
Kefu Chai
56511a42d0 db: schema_tables: drop unused variable
this also silences the warning from GCC-13:
```
/home/kefu/dev/scylladb/db/schema_tables.cc:1489:10: error: variable ‘ts’ set but not used [-Werror=unused-but-set-variable]
 1489 |     auto ts = db_clock::now();
      |          ^~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-29 17:02:25 +08:00
Kefu Chai
48387a5a9a reader_concurrency_semaphore: fix signed/unsigned comparision
a signed/unsigned comparison converts the signed operand to unsigned,
so a negative value can compare as a huge positive one. GCC-13 rightly
points this out. so let's use `std::cmp_greater_equal()` when comparing
unsigned and signed for greater-or-equal.

```
/home/kefu/dev/scylladb/reader_concurrency_semaphore.cc:931:76: error: comparison of integer expressions of different signedness: ‘long int’ and ‘uint64_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
  931 |     if (_resources.memory <= 0 && (consumed_resources().memory + r.memory) >= get_kill_limit()) [[unlikely]] {
      |                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-29 17:02:25 +08:00
Kefu Chai
6d8188ad70 locator: topology: disambiguate type names
otherwise GCC-13 complains:
```
/home/kefu/dev/scylladb/locator/topology.hh:70:21: error: declaration of ‘const locator::topology* locator::node::topology() const’ changes meaning of ‘topology’ [-Wchanges-meaning]
   70 |     const topology* topology() const noexcept {
      |                     ^~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-29 17:02:25 +08:00
Kefu Chai
f80f638bb9 raft: disambiguate promise name in raft::awaited_conf_changes
otherwise GCC 13 complains that

```
/home/kefu/dev/scylladb/raft/server.cc:42:15: error: declaration of ‘seastar::promise<void> raft::awaited_index::promise’ changes meaning of ‘promise’ [-Wchanges-meaning]
   42 |     promise<> promise;
      |               ^~~~~~~
/home/kefu/dev/scylladb/raft/server.cc:42:5: note: used here to mean ‘class seastar::promise<void>’
   42 |     promise<> promise;
      |     ^~~~~~~~~
```
see also cd4af0c722

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-29 17:02:25 +08:00
Botond Dénes
f527b28174 Merge 'treewide: reenable -Wmissing-braces' from Kefu Chai
this change silences the warning of `-Wmissing-braces` from
clang. in general, we can initialize an object without constructor
with braces. this is called aggregate initialization.
the standard also allows us to initialize each element using
either copy-initialization or direct-initialization, but in our case
neither of them applies, so clang warns:

```
suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
                options.elements.push_back({bytes(k.begin(), k.end()), bytes(v.begin(), v.end())});
                                            ^~~~~~~~~~~~~~~~~~~~~~~~~
                                            {                        }
```

in this change,

also, take the opportunity to use structured binding to simplify the
related code.

Closes #13705

* github.com:scylladb/scylladb:
  build: reenable -Wmissing-braces
  treewide: add braces around subobject
  cql3/stats: use zero-initialization
2023-04-28 16:00:14 +03:00
Kefu Chai
43e9910fa0 utils/chunked_managed_vector: use operator<=> when appropriate
instead of crafting 4 operators manually, just delegate it to <=>.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13698
2023-04-28 15:59:08 +03:00
Botond Dénes
0b5d9d94fa Merge 'Kill sstable::storage::get_stats() to help S3 provide accurate SSTable component stats' from Raphael "Raph" Carvalho
S3 wasn't providing filter size and accurate size for all SSTable components on disk.

First, filter size is provided by taking advantage that its in-memory representation is roughly the same as on-disk one.

Second, size for all components is provided by piggybacking on sstable parser and writer, so there is no longer a need to do a separate additional step after Scylla has either parsed or written all components.

Finally, sstable::storage::get_stats() is killed, so the burden is no longer pushed on the storage type implementation.

Refs #13649.

Closes #13682

* github.com:scylladb/scylladb:
  test: Verify correctness of sstable::bytes_on_disk()
  sstable: Piggyback on sstable parser and writer to provide bytes_on_disk
  sstable: restore indentation in read_digest() and read_checksum()
  sstable: make all parsing of simple components go through do_read_simple()
  sstable: Add missing pragma once to random_access_reader.hh
  sstable: make all writing of simple components go through do_write_simple()
  test: sstable_utils: reuse set_values()
  sstable: Restore indentation in read_simple()
  sstable: Coroutinize read_simple()
  sstable: Use filter memory footprint in filter_size()
2023-04-28 15:58:39 +03:00
Kefu Chai
ba8402067f db, sstable: add operator data_value() for generation_type
so we can apply `execute_cql()` on `generation_type` directly without
extracting its value using `generation.value()`. this paves the road to
adding UUID based generation id to `generation_type`. as by then, we
will have both UUID based and integer based `generation_type`, so
`generation_type::value()` will not be able to represent its value
anymore. and this method will be replaced by `operator data_value()` in
this use case.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 20:39:12 +08:00
Kefu Chai
ae9aa9c4bd db, sstable: print generation instead of its value
this change prepares for the change to use `variant<UUID, int64_t>`
as the value of `generation_type`. as after this change, the "value"
of a generation would be a UUID or an integer, and we don't want to
expose the variant in generation's public interface. so the `value()`
method would be changed or removed by then.

this change takes advantage of the fact that the formatter of
`generation_type` always prints its value. also, it's better to
reuse `generation_type` formatter when appropriate.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 20:39:12 +08:00
Jan Ciolek
b3d05f3525 cql3/expr: make it possible to prepare expr::constant
try_prepare_expression(constant) used to throw an error
when trying to prepare expr::constant.

It would be useful to be able to do this
and it's not hard to implement.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-28 14:34:59 +02:00
Jan Ciolek
bf36cde29a cql3/expr: implement test_assignment for column_value
Make it possible to do test_assignment for column_values.
It's implemented using the generic expression assignment
testing function.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-28 14:34:59 +02:00
Jan Ciolek
fd174bda60 cql3/expr: implement test_assignment for expr::constant
test_assignment checks whether a value of some type
can be assigned to a value of different type.

There is no implementation of test_assignment
for expr::constant, but I would like to have one.

Currently there is a custom implementation
of test_assignment for each type of expression,
but generally each of them boils down to checking:
```
type1->is_value_compatible_with(type2)
```

Instead of implementing another type-specific function
I added expresion_test_assignment and used it to
implement test_assignment for constant.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-28 14:34:56 +02:00
Kefu Chai
a34e417069 streaming: remove unused operator==
since this operator is used nowhere, let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13697
2023-04-28 12:39:17 +03:00
Kefu Chai
662f8fa66e build: reenable -Wmissing-braces
since we've addressed all the -Wmissing-braces warnings, we can
now enable this warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 16:59:29 +08:00
Kefu Chai
eb7c41767b treewide: add braces around subobject
this change helps to silence the `-Wmissing-braces` warning from
clang. in general, we can initialize an object without a constructor
with braces. this is called aggregate initialization.
but the standard does allow us to initialize each element using
either copy-initialization or direct-initialization. but in our case,
neither of them applies, so clang warns like

```
suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
                options.elements.push_back({bytes(k.begin(), k.end()), bytes(v.begin(), v.end())});
                                            ^~~~~~~~~~~~~~~~~~~~~~~~~
                                            {                        }
```

in this change, braces are added around the affected subobjects.

also, take the opportunity to use structured binding to simplify the
related code.
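
as an illustration only (the types below are hypothetical and are not the ones touched by this change), the following sketch shows an aggregate with a nested aggregate member initialized with explicit inner braces, plus a loop using structured bindings:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical aggregate with a nested aggregate member.
struct endpoints_sketch { std::string from; std::string to; };
struct route_sketch { endpoints_sketch ends; int cost; };

inline route_sketch make_route() {
    // Brace elision would also accept route_sketch{"a", "b", 3}, but
    // -Wmissing-braces flags that form; the inner braces name the subobject.
    return route_sketch{{"a", "b"}, 3};
}

// Convert a map into key/value pairs, iterating with structured bindings
// instead of spelling out it->first and it->second.
inline std::vector<std::pair<std::string, std::string>>
to_pairs(const std::map<std::string, std::string>& m) {
    std::vector<std::pair<std::string, std::string>> out;
    out.reserve(m.size());
    for (const auto& [k, v] : m) {
        out.push_back({k, v});
    }
    return out;
}

inline void example_braces() {
    route_sketch r = make_route();
    assert(r.ends.from == "a" && r.ends.to == "b" && r.cost == 3);
    std::map<std::string, std::string> m{{"k1", "v1"}, {"k2", "v2"}};
    auto pairs = to_pairs(m);
    assert(pairs.size() == 2 && pairs[0].first == "k1" && pairs[1].second == "v2");
}
```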

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 16:59:29 +08:00
Kefu Chai
91f22b0e81 cql3/stats: use zero-initialization
use {} instead of {0ul} for zero initialization. as `_query_cnt`
is a multi-dimensional array, each element of `_query_cnt` is yet
another array. so we cannot initialize it with a `{0ul}`. but
to zero-initialize this array, we can just use `{}`, as per
https://en.cppreference.com/w/cpp/language/zero_initialization

> If T is array type, each element is zero-initialized.

so this should recursively zero-initialize all arrays in `_query_cnt`.
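
a small sketch of the same pattern, with a hypothetical `counters_sketch` struct standing in for the real class and its `_query_cnt` member:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counters indexed by [source][kind].
struct counters_sketch {
    static constexpr std::size_t sources = 2;
    static constexpr std::size_t kinds = 3;
    // `= {}` initializes the array from an empty list, so every nested
    // element ends up zero. `= {0ul}` relies on brace elision for the
    // first inner array, which is what -Wmissing-braces complains about.
    std::uint64_t cnt[sources][kinds] = {};
};

// Sum all counters; returns 0 for a freshly constructed object.
inline std::uint64_t total(const counters_sketch& c) {
    std::uint64_t sum = 0;
    for (const auto& row : c.cnt) {
        for (std::uint64_t v : row) {
            sum += v;
        }
    }
    return sum;
}

inline void example_zero_init() {
    counters_sketch c;
    assert(total(c) == 0);
    c.cnt[1][2] = 5;
    assert(total(c) == 5);
}
```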

this change should silence following warning:

stats.hh:88:60: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
            [statements::statement_type::MAX_VALUE + 1] = {0ul};
                                                           ^~~
                                                           {  }

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 16:59:29 +08:00
Botond Dénes
a93e5698b0 Merge 'Adding MindsDB integration to Docs' from Guy Shtub
@annastuchlik please review

Closes #13691

* github.com:scylladb/scylladb:
  adding documentation for integration with MindsDB
  adding documentation for integration with MindsDB
2023-04-28 11:47:10 +03:00
Botond Dénes
c6be764d46 Merge 'build: cmake: pick up tablets related changes and cleanups' from Kefu Chai
this series syncs the CMake building system with `configure.py` which was updated for introducing the tablets feature. also, this series include a couple cleanups.

Closes #13699

* github.com:scylladb/scylladb:
  build: cmake: remove dead code
  build: move test-perf down to test/perf
  build: cmake: pick up tablets related changes
2023-04-28 11:35:04 +03:00
Kefu Chai
066371adfa db_clock: specialize fmt::formatter<db_clock::time_point>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `db_clock::time_point` without the help of `operator<<`.

the corresponding `operator<<()` is removed in this change, as all its
callers now use fmtlib for formatting.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 15:48:06 +08:00
Kefu Chai
7863ef53ad cdc: generation: specialize fmt::formatter<generation_id>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `generation_id` without the help of `operator<<`.

the corresponding `operator<<()` is removed in this change, as all its
callers now use fmtlib for formatting.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 15:47:44 +08:00
Michał Sala
3b44ecd1e7 scripts: open-coredump.sh: suggest solib-search-path
Loading cores from Scylla executables installed in a non-standard
location can cause gdb to fail reading required libraries.

This is an example of a warning I got after trying to load a core
generated by a dtest jenkins job (using ./scripts/open-coredump.sh):
> warning: Can't open file /jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.ccm/scylla-repository/0d64f327e1af9bcbb711ee217eda6df16e517c42/libreloc/libboost_system.so.1.78.0 during file-backed mapping note processing

Invocations of `scylla threads` command ended with an error:
> (gdb) scylla threads
> Python Exception <class 'gdb.error'>: Cannot find thread-local storage for LWP 2758, executable file (...)/scylla-debug-unstripped-5.3.0~dev-0.20230121.0d64f327e1af.x86_64/scylla/libexec/scylla:
> Cannot find thread-local variables on this target
> Error occurred in Python: Cannot find thread-local storage for LWP 2758, executable file (...)/scylla-debug-unstripped-5.3.0~dev-0.20230121.0d64f327e1af.x86_64/scylla/libexec/scylla:
> Cannot find thread-local variables on this target

An easy fix for this is to set solib-search-path to
/opt/scylladb/libreloc/.

This commit adds that set command to the suggested gdb command line
arguments. I guess it's a good idea to always suggest setting
solib-search-path to that path, as it can save other people from wasting
their time figuring out why opening a coredump does not work.

Closes #13696
2023-04-28 08:11:01 +03:00
Kefu Chai
572fab37bb build: cmake: remove dead code
the removed CMake script was designed to cater to the case where
Seastar's CMake script is not included in the parent project, but
this part was never tested and is dysfunctional, as the `target_source()`
call is missing its target parameter. we can add it back when it is actually needed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 11:13:41 +08:00
Kefu Chai
d4530b023e build: move test-perf down to test/perf
so it is closer to where the sources are located.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 11:13:41 +08:00
Kefu Chai
56b99b7879 build: cmake: pick up tablets related changes
to sync with the changes in 5e89f2f5ba

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 11:13:41 +08:00
Benny Halevy
935ff0fcbb types: timestamp_from_string: print current_exception on error
We may catch exceptions that are not `marshal_exception`.
Print std::current_exception() in this case to provide
some context about the marshalling error.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13693
2023-04-27 22:30:55 +03:00
Asias He
a8040306bb storage_service: Fix removing replace node as pending
Consider

- n1, n2, n3
- n3 is down
- n4 replaces n3 with the same ip address 127.0.0.3
- Inside the storage_service::handle_state_normal callback for 127.0.0.3 on n1/n2

  ```
  auto host_id = _gossiper.get_host_id(endpoint);
  auto existing = tmptr->get_endpoint_for_host_id(host_id);
  ```

  host_id = new host id
  existing = empty

  As a result, del_replacing_endpoint() will not be called.

This means 127.0.0.3 will not be removed as a pending node on n1 and n2 when
replacing is done. This is wrong.

This is a regression since commit 9942c60d93
(storage_service: do not inherit the host_id of a replaced a node), where
the replacing node uses a different host id than the node being replaced.

To fix, call del_replacing_endpoint() when a node becomes NORMAL and existing
is empty.

Before:
n1:
storage_service - replace[cd1f187a-0eee-4b04-91a9-905ecc499cfc]: Added replacing_node=127.0.0.3 to replace existing_node=127.0.0.3, coordinator=127.0.0.3
token_metadata - Added node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - replace[cd1f187a-0eee-4b04-91a9-905ecc499cfc]: Marked ops done from coordinator=127.0.0.3
storage_service - Node 127.0.0.3 state jump to normal
storage_service - Set host_id=6f9ba4e8-9457-4c76-8e2a-e2be257fe123 to be owned by node=127.0.0.3

After:
n1:
storage_service - replace[28191ea6-d43b-3168-ab01-c7e7736021aa]: Added replacing_node=127.0.0.3 to replace existing_node=127.0.0.3, coordinator=127.0.0.3
token_metadata - Added node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - replace[28191ea6-d43b-3168-ab01-c7e7736021aa]: Marked ops done from coordinator=127.0.0.3
storage_service - Node 127.0.0.3 state jump to normal
token_metadata - Removed node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - Set host_id=72219180-e3d1-4752-b644-5c896e4c2fed to be owned by node=127.0.0.3

Tests: https://github.com/scylladb/scylla-dtest/pull/3126

Closes #13677
2023-04-27 21:03:01 +03:00
Raphael S. Carvalho
4e205650b6 test: Verify correctness of sstable::bytes_on_disk()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
2dbae856f8 sstable: Piggyback on sstable parser and writer to provide bytes_on_disk
bytes_on_disk is the sum of all sstable components.

As read_simple() fetches the file size before parsing the component,
bytes_on_disk can be added incrementally rather than an additional
step after all components were already parsed.

Likewise, write_simple() tracks the offset for each new component,
and therefore bytes_on_disk can also be added incrementally.
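
A rough sketch of the incremental accounting described above. The class and method names are hypothetical and only model the idea; they are not the real sstable API:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of an sstable that accumulates its on-disk size
// while components are read or written.
class sstable_sketch {
    std::uint64_t _bytes_on_disk = 0;
public:
    // Reader path: the file size is known before parsing, so it is
    // added right away instead of in a later pass.
    void read_component(std::uint64_t file_size) {
        _bytes_on_disk += file_size;
        // ... parse the component here ...
    }

    // Writer path: the final write offset equals the component size.
    void write_component(const std::string& payload) {
        std::uint64_t offset = 0;
        offset += payload.size();  // stand-in for the output stream offset
        _bytes_on_disk += offset;
    }

    std::uint64_t bytes_on_disk() const { return _bytes_on_disk; }
};

inline void example_bytes_on_disk() {
    sstable_sketch sst;
    sst.read_component(100);
    sst.write_component("abcd");
    assert(sst.bytes_on_disk() == 104);
}
```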

This simplifies life for S3, as it no longer has to care about feeding
a bytes_on_disk, which is currently limited to data and index
sizes only.

Refs #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
4d02821094 sstable: restore indentation in read_digest() and read_checksum()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
75dc7b799e sstable: make all parsing of simple components go through do_read_simple()
With all parsing of simple components going through do_read_simple(),
common infrastructure can be reused (exception handling, debug logging,
etc), and also statistics spanning all components can be easily added.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
71cd8e6b51 sstable: Add missing pragma once to random_access_reader.hh
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
b783bddbdf sstable: make all writing of simple components go through do_write_simple()
With all writing of simple components going through do_write_simple(),
common infrastructure can be reused (exception handling, debug logging,
etc), and also statistics spanning all components can be easily added.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:46 -03:00
Raphael S. Carvalho
bc486b05fa test: sstable_utils: reuse set_values()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:04:52 -03:00
Raphael S. Carvalho
dcee5c4fae sstable: Restore indentation in read_simple()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:04:52 -03:00
Raphael S. Carvalho
253d9e787b sstable: Coroutinize read_simple()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:04:52 -03:00
Raphael S. Carvalho
0dcdec6a55 sstable: Use filter memory footprint in filter_size()
For S3, filter size is currently set to zero, as we want to avoid
"fstat-ing" each file.

On-disk representation of bloom filter is similar to the in-memory
one, therefore let's use memory footprint in filter_size().

User of filter_size() is API implementing "nodetool cfstats" and
it cares about the size of bloom filter data (that's how it's
described).

This way, we provide the filter data size regardless of the
underlying storage type.

Refs #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:04:52 -03:00
Harsh Soni
84ea2f5066 raft: fsm: add empty check for max_read_id_with_quorum
Updated the empty() function in the struct fsm_output to include the
max_read_id_with_quorum field when checking whether the fsm output is
empty or not. The change was made in order to maintain consistency with the
codebase and to make the empty check complete. This change has no
impact on other parts of the codebase.
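
A heavily reduced sketch of such a check. The struct below is hypothetical and has far fewer fields than the real `fsm_output`; only the shape of `empty()` is the point:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, reduced model of an FSM output batch.
struct fsm_output_sketch {
    std::vector<int> log_entries;
    std::vector<int> messages;
    std::optional<std::uint64_t> max_read_id_with_quorum;

    // The output counts as empty only if every field is empty,
    // including the optional read id.
    bool empty() const {
        return log_entries.empty()
            && messages.empty()
            && !max_read_id_with_quorum.has_value();
    }
};

inline void example_fsm_output_empty() {
    fsm_output_sketch out;
    assert(out.empty());
    out.max_read_id_with_quorum = 7;
    assert(!out.empty());
}
```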

Closes #13656
2023-04-27 16:04:58 +02:00
Kamil Braun
0bee872fb1 raft topology: rename update_replica_state -> update_topology_state
The new name is more generic and appropriate for topology transitions
which don't affect any specific replica but the entire cluster as a
whole (which we'll introduce later).

Also take `guard` directly instead of `node_to_work_on` in this more
generic function. Since we want `node_to_work_on` to die when we steal
its guard, introduce `take_guard` which takes ownership of the object
and returns the guard.
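
A sketch of the ownership transfer, assuming a move-only guard. The types here are hypothetical stand-ins, not the real `node_to_work_on` or group0 guard:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in: the guard is modelled as a move-only handle.
using guard_sketch = std::unique_ptr<int>;

struct node_to_work_on_sketch {
    guard_sketch guard;
    int node_id;
};

// Takes the whole object by value, so the caller has to move it in.
// The object dies at the end of this function; only the guard survives.
inline guard_sketch take_guard(node_to_work_on_sketch node) {
    return std::move(node.guard);
}

inline void example_take_guard() {
    node_to_work_on_sketch node{std::make_unique<int>(5), 1};
    guard_sketch g = take_guard(std::move(node));
    assert(g && *g == 5);
    assert(node.guard == nullptr);  // a moved-from unique_ptr is null
}
```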
2023-04-27 15:22:19 +02:00
Kamil Braun
22ab5982e7 raft topology: remove transition_state::normal
What this state really represented is that there is currently no
transition. So remove it and make `transition_state` optional instead.
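
A sketch of what this looks like, assuming an enum with a few state names mentioned in the surrounding commit messages (the real enum and topology struct differ):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Illustrative subset of states; note that there is no `normal` member.
enum class transition_state_sketch {
    commit_cdc_generation,
    write_both_read_old,
    write_both_read_new,
};

struct topology_sketch {
    // An empty optional means "no transition in progress".
    std::optional<transition_state_sketch> tstate;
};

inline std::string describe(const topology_sketch& t) {
    if (!t.tstate) {
        return "idle";
    }
    switch (*t.tstate) {
    case transition_state_sketch::commit_cdc_generation: return "commit_cdc_generation";
    case transition_state_sketch::write_both_read_old:   return "write_both_read_old";
    case transition_state_sketch::write_both_read_new:   return "write_both_read_new";
    }
    return "unknown";
}

inline void example_optional_transition_state() {
    topology_sketch t;
    assert(describe(t) == "idle");
    t.tstate = transition_state_sketch::write_both_read_old;
    assert(describe(t) == "write_both_read_old");
}
```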
2023-04-27 15:18:32 +02:00
Kamil Braun
61c4e0ae20 raft topology: switch on transition_state first
Previously the code assumed that there was always a 'node to work on' (a
node which wants to change its state) or there was no work to do at all.
It would find such a node, switch on its state (e.g. check if it's
bootstrapping), and in some states switch on the topology
`transition_state` (e.g. check if it's `write_both_read_old`).

We want to introduce transitions that are not node-specific and can work
even when all nodes are 'normal' (so there's no 'node to work on'). As a
first step, we refactor the code so it switches on `transition_state`
first. In some of these states, like `write_both_read_old`, there must
be a 'node to work on' for the state to make sense; but later in some
states it will be optional (such as `commit_cdc_generation`).
2023-04-27 15:14:59 +02:00
Kamil Braun
a023ca2cf1 raft topology: handle_ring_transition: rename res to exec_command_res
A more descriptive name.
2023-04-27 15:12:12 +02:00
Kamil Braun
4ddfce8213 raft topology: parse replaced node in exec_global_command
Will make following commits easier.
2023-04-27 15:10:49 +02:00
Kamil Braun
bafce8fd28 raft topology: extract cleanup_group0_config_if_needed from get_node_to_work_on 2023-04-27 15:04:36 +02:00
Kamil Braun
98f69f52aa storage_service: extract raft topology coordinator fiber to separate class
The lambdas defined inside the fiber are now methods of this class.

Currently `handle_node_transition` is calling `handle_ring_transition`,
in a later commit we will reverse this: `handle_ring_transition` will
call `handle_node_transition`. We won't have to shuffle the functions
around because they are members of the same class, making the change
easier to review. In general, the code will be easier to maintain in
this new form (no need to deal with so many lambda captures etc.)

Also break up some lines which exceeded the 120 character limit (as per
Seastar coding guidelines).
2023-04-27 15:04:35 +02:00
Kefu Chai
87e9686f61 cdc: generation: simplify std::visit() call
if the visitor clauses are the same, we can just use the generic version
of it by specifying the parameter with `auto&`. simpler this way.
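
a sketch of the before and after, using two hypothetical id types instead of the real CDC generation ids:

```cpp
#include <cassert>
#include <variant>

// Two hypothetical alternatives that expose the same member.
struct id_v1_sketch { int ts; };
struct id_v2_sketch { int ts; int extra; };
using any_id_sketch = std::variant<id_v1_sketch, id_v2_sketch>;

// Verbose form: one clause per alternative, all with identical bodies.
inline int timestamp_verbose(const any_id_sketch& id) {
    struct visitor {
        int operator()(const id_v1_sketch& v) const { return v.ts; }
        int operator()(const id_v2_sketch& v) const { return v.ts; }
    };
    return std::visit(visitor{}, id);
}

// Generic form: a single lambda taking `auto&` covers every alternative.
inline int timestamp_generic(const any_id_sketch& id) {
    return std::visit([](auto& v) { return v.ts; }, id);
}

inline void example_generic_visit() {
    any_id_sketch a = id_v1_sketch{7};
    any_id_sketch b = id_v2_sketch{9, 1};
    assert(timestamp_verbose(a) == timestamp_generic(a));
    assert(timestamp_verbose(b) == timestamp_generic(b));
}
```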

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13626
2023-04-27 14:43:20 +02:00
Alejo Sanchez
47d7939b8f test/topology: register RF pytest marker
Register pytest marker for replication_factor.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13688
2023-04-27 12:14:28 +02:00
Guy Shtub
c4664f9b66 adding documentation for integration with MindsDB 2023-04-27 13:13:19 +03:00
Guy Shtub
7e35a07f93 adding documentation for integration with MindsDB 2023-04-27 13:12:38 +03:00
Kamil Braun
defa63dc20 raft topology: rename replication_state to transition_state
The new name is more generic - it describes the current step of a
'topology saga' (a sequence of steps used to implement a larger topology
operation such as bootstrap).
2023-04-27 11:39:38 +02:00
Kamil Braun
af1ea2bb16 raft topology: make replication_state a topology-global state
Previously it was part of `ring_slice`, belonging to a specific node.
This commit moves it into `topology`, making it a cluster-global
property.

The `replication_state` column in `system.topology` is now `static`.

This will allow us to easily introduce topology transition states that
do not refer to any specific node. `commit_cdc_generation` will be such
a state, allowing us to commit a new CDC generation even though all
nodes are normal (none are transitioning). One could argue that the
other states are conceptually already cluster-global: for example,
`write_both_read_new` doesn't affect only the tokens of a bootstrapping
(or decommissioning etc.) node; it affects replica sets of other tokens
as well (with RFs greater than 1).
2023-04-27 11:39:38 +02:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of replication strategy, it
    just alters the way in which data ownership is managed. In theory, we
    could use it for other strategies as well like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Kefu Chai
f5b05cf981 treewide: use defaulted operator!=() and operator==()
in C++20, the compiler generates operator!=() if the corresponding
operator==() is already defined; in the new standard, the language
understands that the comparison is symmetric.

fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.

in addition to the defaulted operator!=, C++20 also brings us
the defaulted operator==(): the compiler is able to generate
operator==() as a member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if operator==() is implemented as
a lexicographical comparison of all member variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and marking the function as
`default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.

sometimes, we fail to mark the operator== with the `const`
specifier. in this change, to fulfill the requirements of the C++
standard, and to be more correct, the `const` specifier is added.

also, to generate the defaulted operator==, the operand should
be `const class_name&`, but this is not always the case. in the
`version` class, we use `version` as the parameter type; to
fulfill the requirements of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantics of the comparison operator, and it is a more idiomatic
way to pass a non-trivial struct as a function parameter.

please note, because in C++20 both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric forms of another variant. if they were
not removed, the compiler would, for instance, find an ambiguous
overloaded operator '=='.

this change is a cleanup to modernize the code base with C++20
features.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13687
2023-04-27 10:24:46 +03:00
Botond Dénes
3e92bcaa20 Merge 'utils: redesign reusable_buffer' from Michał Chojnowski
Common compression libraries work on contiguous buffers.
Contiguous buffers are a problem for the allocator. However, as long as they are short-lived,
we can avoid the expensive allocations by reusing buffers across tasks.

This idea is already applied to the compression of CQL frames, but with some deficiencies.
`utils: redesign reusable_buffer` attempts to improve upon it in a few ways. See its commit message for an extended discussion.

Compression buffer reuse also happens in the zstd SSTable compressor, but the implementation is misguided. Every `zstd_processor` instance reuses a buffer, but each instance has its own buffer. This is very bad, because a healthy database might have thousands of concurrent instances (because there is one for each sstable reader). Together, the buffers might require gigabytes of memory, and the reuse actually *increases* memory pressure significantly, instead of reducing it.
`zstd: share buffers between compressor instances` aims to improve that by letting a single buffer be shared across all instances on a shard.

Closes #13324

* github.com:scylladb/scylladb:
  zstd: share buffers between compressor instances
  utils: redesign reusable_buffer
2023-04-27 09:09:09 +03:00
Pavel Emelyanov
4f93b440a5 sstables: Remove lost eptr variable from do_write_simple()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13684
2023-04-27 07:37:15 +03:00
Anna Stuchlik
c7df168059 doc: move Glossary to the Reference section
This commit moves the Glossary page to the Reference
section. In addition, it adds the redirection so that
there are no broken links because of this change
and fixes a link to a subsection of Glossary.

Closes #13664
2023-04-27 07:03:55 +03:00
Michał Chojnowski
16dd93cb7e zstd: share buffers between compressor instances
The zstd implementation of `compressor` has a separate decompression and
compression context per instance. This is unreasonably wasteful. One
decompression buffer and one compression buffer *per shard* is enough.

The waste is significant. There might exist thousands of SSTable readers, each
containing its own instance of `compressor` with several hundred KiB worth of
unneeded buffers. This adds up to gigabytes of wasted memory and gigapascals
of allocator pressure.

This patch modifies the implementation of zstd_processor so that all its
instances on the shard share their contexts.
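
A simplified sketch of the sharing pattern, not the actual zstd code. It assumes a thread-per-core design, so `thread_local` stands in for "per shard", and all names are hypothetical. Sharing like this is only safe because each use of the context is synchronous:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scratch context standing in for a compression context.
struct scratch_context_sketch {
    std::vector<char> buf;
};

// One context per thread, created lazily on first use.
inline scratch_context_sketch& shard_context() {
    static thread_local scratch_context_sketch ctx;
    return ctx;
}

// Every instance borrows the shared context instead of owning a buffer.
class compressor_sketch {
public:
    // Placeholder "compression": copies the input through the shared
    // scratch buffer and returns the number of bytes processed.
    std::size_t process(const char* in, std::size_t len) {
        auto& ctx = shard_context();
        if (ctx.buf.size() < len) {
            ctx.buf.resize(len);
        }
        std::copy_n(in, len, ctx.buf.begin());
        return len;
    }
};

inline std::size_t shared_scratch_size() {
    return shard_context().buf.size();
}

inline void example_shared_context() {
    compressor_sketch a;
    compressor_sketch b;
    a.process("abcd", 4);
    b.process("xy", 2);
    // Both instances used the same buffer, sized for the largest input.
    assert(shared_scratch_size() == 4);
}
```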

Fixes #11733
2023-04-26 22:09:17 +02:00
Michał Chojnowski
bf26a8c467 utils: redesign reusable_buffer
Large contiguous buffers put large pressure on the allocator
and are a common source of reactor stalls. Therefore, Scylla avoids
their use, replacing it with fragmented buffers whenever possible.
However, the use of large contiguous buffers is impossible to avoid
when dealing with some external libraries (i.e. some compression
libraries, like LZ4).

Fortunately, calls to external libraries are synchronous, so we can
minimize the allocator impact by reusing a single buffer between calls.

An implementation of such a reusable buffer has two conflicting goals:
to allocate as rarely as possible, and to waste as little memory as
possible. The bigger the buffer, the more likely that it will be able
to handle future requests without reallocation, but also the more
memory it ties up.

If request sizes are repetitive, the near-optimal solution is to
simply resize the buffer up to match the biggest seen request,
and never resize down.

However, if we anticipate pathologically large requests, which are
caused by an application/configuration bug and are never repeated
again after they are fixed, we might want to resize down after such
pathological requests stop, so that the memory they took isn't tied
up forever.

The current implementation of reusable buffers handles this by
resizing down to 0 every 100'000 requests.

This patch attempts to solve a few shortcomings of the current
implementation.
1. Resizing to 0 is too aggressive. During regular operation, we will
surely need to resize it back to the previous size again. If something
is allocated in the hole left by the old buffer, this might cause
a stall. We prefer to resize down only after pathological requests.
2. When resizing, the current implementation allocates the new buffer
before freeing the old one. This increases allocator pressure for no
reason.
3. When resizing up, the buffer is resized to exactly the requested
size. That is, if the current size is 1MiB, following requests
of 1MiB+1B and 1MiB+2B will both cause a resize.
It's preferable to limit the set of possible sizes so that every
reset doesn't tend to cause multiple resizes of almost the same size.
The natural set of sizes is powers of 2, because that's what the
underlying buddy allocator uses. No waste is caused by rounding up
the allocation to a power of 2.
4. The interval of 100'000 uses is both too low and too arbitrary.
This is up for discussion, but I think that it's preferable to base
the dynamics of the buffer on time, rather than the number of uses.
It's more predictable to humans.

The implementation proposed in this patch addresses these as follows:
1. Instead of resizing down to 0, we resize to the biggest size
   seen in the last period.
   As long as at least one maximal (up to a power of 2) "normal" request
   appears each period, the buffer will never have to be resized.
2. The capacity of the buffer is always rounded up to the nearest
   power of 2.
3. The resize down period is no longer measured in number of requests
   but in real time.

Additionally, since a shared buffer in asynchronous code is quite a
footgun, some rudimentary refcounting is added to assert that only
one reference to the buffer exists at a time, and that the buffer isn't
downsized while a reference to it exists.
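
A minimal sketch of the sizing policy described above, under stated assumptions: the class name is hypothetical, the period timer is replaced by an explicit `on_period_end()` call, and the refcounting is left out. It is not the code from this patch:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Round n up to the next power of 2 (n must be non-zero and small
// enough not to overflow).
inline std::size_t round_up_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

class reusable_buffer_sketch {
    std::unique_ptr<char[]> _buf;
    std::size_t _capacity = 0;
    std::size_t _max_seen = 0;  // largest request in the current period

    void resize_to(std::size_t cap) {
        _buf.reset();  // free the old buffer before allocating the new one
        _capacity = cap;
        if (cap != 0) {
            _buf = std::make_unique<char[]>(cap);
        }
    }
public:
    // Returns a buffer of at least `size` bytes, growing to a power of 2.
    char* get(std::size_t size) {
        if (size > _max_seen) {
            _max_seen = size;
        }
        if (size > _capacity) {
            resize_to(round_up_pow2(size));
        }
        return _buf.get();
    }

    // Assumed to be called by a timer once per period: shrink to the
    // biggest size seen during that period, never to an arbitrary 0.
    void on_period_end() {
        std::size_t wanted = _max_seen ? round_up_pow2(_max_seen) : 0;
        if (wanted < _capacity) {
            resize_to(wanted);
        }
        _max_seen = 0;
    }

    std::size_t capacity() const { return _capacity; }
};

inline void example_reusable_buffer() {
    const std::size_t mib = std::size_t{1} << 20;
    reusable_buffer_sketch b;
    b.get(mib + 1);
    b.get(mib + 2);                 // no second resize: already 2 MiB
    assert(b.capacity() == 2 * mib);
    b.on_period_end();              // period max still needs 2 MiB
    assert(b.capacity() == 2 * mib);
    b.get(100);
    b.on_period_end();              // only small requests this period
    assert(b.capacity() == 128);
}
```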

Fixes #13437
2023-04-26 22:09:17 +02:00
Anna Stuchlik
1ce50faf02 doc: remove redundant information about versions
Fixes https://github.com/scylladb/scylladb/issues/13578

Now that the documentation is versioned, we can remove
the .. versionadded:: and .. versionchanged:: information
(especially as the latter is hard to maintain and now
outdated), as well as the outdated information about
experimental features in very old releases.

This commit removes that information and nothing else.

Closes #13680
2023-04-26 17:20:52 +03:00
Botond Dénes
5aaa30b267 Merge 'treewide: stop using std::rel_ops' from Kefu Chai
std::rel_ops was deprecated in C++20, as C++20 provides a better solution for defining comparison operators. and all the use cases previously to be addressed by `using namespace std::rel_ops` have been addressed either by `operator<=>` or the default-generated `operator!=`.

so, in this series, to avoid using deprecated facilities, let's drop all these `using namespace std::rel_ops`. there are many more cases where we could either use `operator<=>` or the default-generated `operator!=` to simplify the implementation. but here, we care more about `std::rel_ops`, we will drop the most (if not all of them) of the explicitly defined `operator!=`  and other comparison operators later.

Closes #13676

* github.com:scylladb/scylladb:
  treewide: do not use std::rel_ops
  dht: token: s/tri_compare/operator<=>/
2023-04-26 16:49:44 +03:00
Aleksandra Martyniuk
725110a035 docs: clarify the meaning of cfhistogram's sstable column
Closes #13669
2023-04-26 16:19:23 +03:00
Tomasz Grabiec
8d5467fa9c Merge 'Some minor improvements in table' from Raphael "Raph" Carvalho
Removed outdated comments and added reserve() to avoid reallocations.
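
As a generic illustration (not the actual `make_compaction_groups()` code), reserving a vector's capacity up front removes the reallocations that repeated `push_back()` calls would otherwise trigger. `count_reallocations()` is a hypothetical helper written only to make the effect observable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many times the vector's capacity changes
// while n elements are appended, with or without an up-front reserve().
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // single allocation before the loop
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```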

Closes #13672

* github.com:scylladb/scylladb:
  table: Avoid reallocations in make_compaction_groups()
  table: Remove another outdated comment regarding sstable generation
  table: Remove outdated comment regarding automatic compaction
2023-04-26 14:43:49 +02:00
Botond Dénes
88c19b23dc reader_permit: resource_units::reset_to(): try harder to avoid calling consume()
Currently, the `reset_to()` implementation calls `consume(new_amount)` (if
not zero), then calls `signal(old_amount)`. This means that even if
`reset_to()` is a net reduction in the amount of resources, there is a
call to `consume()` which can now potentially throw.
Add a special case for when the new amount of resources is strictly
smaller than the old amount. In this case, just call `signal()` with the
difference. This not just avoids a potential `std::bad_alloc`, but also
helps relieving memory pressure when this is most needed, by not failing
calls to release memory.
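
A minimal sketch of the special case, assuming a simplified, hypothetical `resource_tracker` in place of the real semaphore (only the `consume()`/`signal()`/`reset_to()`/`reset_to_zero()` names come from the commit messages; everything else is illustrative):

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-in for the object that tracks available resources.
class resource_tracker {
    std::size_t _available;
public:
    explicit resource_tracker(std::size_t total) : _available(total) {}
    // May throw when the request cannot be satisfied.
    void consume(std::size_t n) {
        if (n > _available) {
            throw std::bad_alloc();
        }
        _available -= n;
    }
    void signal(std::size_t n) noexcept { _available += n; }
    std::size_t available() const noexcept { return _available; }
};

// Simplified RAII holder of consumed resources.
class resource_units {
    resource_tracker& _tracker;
    std::size_t _amount = 0;
public:
    resource_units(resource_tracker& t, std::size_t amount) : _tracker(t) {
        _tracker.consume(amount);
        _amount = amount;
    }
    ~resource_units() { reset_to_zero(); }
    resource_units(const resource_units&) = delete;
    resource_units& operator=(const resource_units&) = delete;

    void reset_to(std::size_t new_amount) {
        if (new_amount < _amount) {
            // Net reduction: release only the difference, which cannot throw.
            _tracker.signal(_amount - new_amount);
        } else {
            // Otherwise keep the old sequence: consume new, then release old.
            _tracker.consume(new_amount);
            _tracker.signal(_amount);
        }
        _amount = new_amount;
    }

    // Releasing everything never needs consume(), so it can be noexcept.
    void reset_to_zero() noexcept {
        _tracker.signal(_amount);
        _amount = 0;
    }
};
```

With 10 units in total and 8 held, `reset_to(5)` succeeds here even though only 2 units are free, whereas calling `consume(5)` first would have thrown.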
2023-04-26 07:41:57 -04:00
Botond Dénes
2449b714df reader_permit: split resource_units::reset()
Into reset_to() and reset_to_zero(). The latter replaces `reset()` with
the default 0 resources argument, which was often called from noexcept
contexts. Splitting it out from `reset()` allows for a specialized
implementation that is guaranteed to be `noexcept` indeed and thus
peace of mind.
2023-04-26 07:41:57 -04:00
Botond Dénes
21988842de reader_permit: make consume()/signal() API private
This API is dangerous, all resource consumption should happen via RAII
objects that guarantee that all consumed resources are appropriately
released.
At this point, said API is just a low-level building block for
higher-level, RAII objects. To ensure nobody thinks of using it for
other purposes, make it private and make external users friends instead.
2023-04-26 07:41:53 -04:00
Tomasz Grabiec
ce94a2a5b0 Merge 'Fixes and tests for raft-based topology changes' from Kamil Braun
Fix two issues with the replace operation introduced by recent PRs.

Add a test which performs a sequence of basic topology operations (bootstrap,
decommission, removenode, replace) in a new suite that enables the `raft`
experimental feature (so that the new topology change coordinator code is used).

Fixes: #13651

Closes #13655

* github.com:scylladb/scylladb:
  test: new suite for testing raft-based topology
  test: remove topology_custom/test_custom.py
  raft topology: don't require new CDC generation UUID to always be present
  raft topology: include shard_count/ignore_msb during replace
2023-04-26 11:38:07 +02:00
Kefu Chai
951457a711 treewide: do not use std::rel_ops
std::rel_ops was deprecated in C++20, as C++20 provides a better
solution for defining comparison operators. and all the use cases
previously to be addressed by `using namespace std::rel_ops` have
been addressed either by `operator<=>` or the default-generated
`operator!=`.

so, in this change, to avoid using deprecated facilities, let's
drop all these `using namespace std::rel_ops`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-26 14:09:58 +08:00
Kefu Chai
5a11d67709 dht: token: s/tri_compare/operator<=>/
now that C++20 is able to generate the default comparison
operators for us, there is no need to define them manually. and,
`std::rel_ops::*` are deprecated in C++20.

also, use `foo <=> bar` instead of `tri_compare(foo, bar)` for better
readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-26 14:09:57 +08:00
Kefu Chai
20da130cdf mutation: specialize fmt::formatter<range_tombstone_{entry,list}>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `range_tombstone_list` and `range_tombstone_entry`
without the help of `operator<<`.

the corresponding `operator<<()` for `range_tombstone_entry` is moved
into test, where it is used. and the other one is dropped in this change,
as all its callers are now using fmtlib for formatting.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13627
2023-04-26 09:00:25 +03:00
Kefu Chai
c8aa7295d4 cql3: drop unused function
there are two variants of `query_processor::for_each_cql_result()`,
both of them perform the pagination of results returned by a CQL
statement. the one which accepts a function returning an instant
value is not used now. so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13675
2023-04-26 08:43:22 +03:00
Raphael S. Carvalho
59904be5c3 table: Avoid reallocations in make_compaction_groups()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-25 11:14:33 -03:00
Raphael S. Carvalho
9f5e19224d table: Remove another outdated comment regarding sstable generation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-25 11:09:51 -03:00
Raphael S. Carvalho
2d45dd35c7 table: Remove outdated comment regarding automatic compaction
We already provide a way to disable automatic compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-25 11:09:45 -03:00
Pavel Emelyanov
9bb4ee160f gossiper: Remove features and sysks from gossiper
Now gossiper doesn't need those two as its dependencies, they can be
removed making code shorter and dependencies graph simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 17:06:06 +03:00
Pavel Emelyanov
5cbc8fe2f9 system_keyspace: De-static save_local_supported_features()
That's, in fact, an independent change, because feature enabler doesn't
need this method. So this patch is like "while at it" thing, but on the
other hand it ditches one more qctx usage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 17:04:54 +03:00
Pavel Emelyanov
a5bd6cc832 system_keyspace: De-static load_|save_local_enabled_features()
All callers now have the system keyspace instance at hand.

Unfortunately, this de-static doesn't allow more qctx drops, because
both methods use set_|get_scylla_local_param helpers that do use qctx
and are still in use by other static methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 17:03:09 +03:00
Pavel Emelyanov
9bfbcaa3f6 system_keyspace: Move enable_features_on_startup to feature_service (cont)
Now move the code itself. No functional changes here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 17:02:38 +03:00
Pavel Emelyanov
858db9f706 system_keyspace: Move enable_features_on_startup to feature_service
This code belongs to feature service, system keyspace shouldn't be aware
of any peculiarities of startup features enabling, only loading and
saving the feature lists.

For now the move happens only in terms of code declarations, the
implementation is kept in its old place to reduce the patch churn.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 17:00:30 +03:00
Pavel Emelyanov
71eb4edf3c feature_service: Open-code persist_enabled_feature_info() into enabler
The method in question is only called by the enabler and is short enough
to be merged into the caller. This kills two birds with one stone --
makes less walks over features list and will make it possible to
de-static system keyspace features load and save methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:58:49 +03:00
Pavel Emelyanov
474548f614 gms: Move feature enabler to feature_service.cc
No functional changes, just move the code. This makes gossiper not mess
with enabling/persisting features, but just gossiping them around.
Feature intersection code is still in gossiper, but can be moved in
more suitable location any time later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:57:19 +03:00
Pavel Emelyanov
dcf88b07a4 gms: Move gossiper::enable_features() to feature_service::enable_features_on_join()
This will make it possible to move the enabler to feature_service.cc

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:56:07 +03:00
Pavel Emelyanov
1461a892a6 gms: Persist features explicitly in features enabler
Nowadays features are persisted in feature_service::enable() and there
are four callers of it

- feature enabler via gossiper notifications
- boot kicks feature enabler too
- schema loader tool
- cql test env

Only the first case needs to persist features. The loader tool in
fact doesn't do it even now, because it keeps qctx uninitialized. Cql test
env wires up the qctx, but it makes no difference for the test cases
themselves whether the features are persisted or not.

Boot-time is a bit trickier -- it loads the feature list from system
keyspace and may filter-out some of them, then enable. In that case
committing the list back into system keyspace makes no sense, as the
persisted list doesn't extend.

The enabler, in turn, can call system keyspace directly via its explicit
dependency reference. This fixes the inverse dependency between system
keyspace and feature service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:51:15 +03:00
Pavel Emelyanov
ba7af749b1 feature_service: Make persist_enabled_feature_info() return a future
It now knows that it runs inside async context, but things are changing
and soon it will be moved out of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:50:32 +03:00
Pavel Emelyanov
1ee04e4934 system_keyspace: De-static load_peer_features()
This makes use of feature_enabler::_sys_ks dependency and gets rid of
one more global qctx usage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:50:00 +03:00
Pavel Emelyanov
e30c72109f gms: Move gossiper::do_enable_features to persistent_feature_enabler::enable_features()
It's the enabler that's responsible for enabling the features and,
implicitly, persisting them into the system keyspace. This patch moves
this logic from gossiper to feature_enabler, further patching will make
the persisting code be explicit.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:47:30 +03:00
Pavel Emelyanov
ac60d8afca gossiper: Enable features and register enabler from outside
It's a bit hairy. The maybe_enable_features() is called from two places
-- the feature_enabler upon notifications from gossiper and directly by
gossiper from wait_for_gossip_to_settle().

The _latter_ is called only when the wait_for_gossip_to_settle() is
called for the first time because of the _gossip_settled checks in it.
For the first time this method is called by storage_service when it
tries to join the ring (next it's called from main, but that's not of
interest here).

Next, despite feature_enabler is registered early -- when gossiper
instance is constructed by sharded<gossiper>::start() -- it checks for
the _gossip_settled to be true to take any actions.

Considering both -- calling maybe_enable_features() _and_ registering
enabler after storage_service's call to wait_for_gossip_to_settle()
doesn't break the code logic, but makes further patching possible. In
particular, the feature_enabler will move to feature_service not to
pollute gossiper code with anything that's not gossiping.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:42:17 +03:00
Pavel Emelyanov
cefcdeee1e gms: Add feature_service and system_keyspace to feature_enabler
And rename the guy. These dependencies will be used further, both are
available and started when the enabler is constructed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-25 16:41:09 +03:00
Kamil Braun
a29b8cd02b Merge 'cql3: fix a few misformatted printouts of column names in error messages' from Nadav Har'El
Fix a few cases where instead of printing column names in error messages, we printed weird stuff like ASCII codes or the address of the name.

Fixes #13657

Closes #13658

* github.com:scylladb/scylladb:
  cql3: fix printing of column_specification::name in some error messages
  cql3: fix printing of column_definition::name in some error messages
2023-04-25 14:21:09 +02:00
Avi Kivity
a1b99d457f Update tools/jmx submodule (error handling when jdk not available)
* tools/jmx fdd0474...5f98894 (1):
  > install.sh: bail out if jdk is not available
2023-04-25 14:20:57 +02:00
Kefu Chai
5804eb6d81 storage_service: specialize fmt::formatter<storage_service::mode>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `storage_service::mode` without the help of `operator<<`.

the corresponding `operator<<()` for `storage_service::mode` is removed
in this change, as all its callers are now using fmtlib for formatting.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13640
2023-04-25 14:20:57 +02:00
Tomasz Grabiec
a717c803c7 tests: row_cache: Add reproducer for reader producing missing closing range tombstone
Adds a reproducer for #12462, which doesn't manifest in master any
more after f73e2c992f. It's still useful
to keep the test to avoid regressions.

The bug manifests by reader throwing:

  std::logic_error: Stream ends with an active range tombstone: {range_tombstone_change: pos={position: clustered,ckp{},-1}, {tombstone: timestamp=-9223372036854775805, deletion_time=2}}

The reason is that prior to the rework of the cache reader,
range_tombstone_generator::flush() was used with end_of_range=true to
produce the closing range_tombstone_change and it did not handle
correctly the case when there are two adjacent range tombstones and
flush(pos, end_of_range=true) is called such that pos is the boundary
between the two.

Closes #13665
2023-04-25 14:20:57 +02:00
Gleb Natapov
9849409c2a service/raft: raft_group0: drop dependency on migration_manager
raft_group0 does not really depend on migration_manager, it needs it only
transiently, so pass it to appropriate methods of raft_group0 instead
of during its creation.
2023-04-25 12:38:01 +03:00
Gleb Natapov
d5d156d474 service/raft: raft_group0: drop dependency on query_processor
raft_group0 does not really depend on query_processor, it needs it only
transiently, so pass it to appropriate methods of raft_group0 instead
of during its creation.
2023-04-25 12:35:57 +03:00
Kamil Braun
59eb01b7a6 test: new suite for testing raft-based topology
Introduce new test suite for testing the new topology coordinator
(runs under `raft` experimental flag). Add a simple test that performs a
basic sequence of topology operations.
2023-04-25 11:04:51 +02:00
Gleb Natapov
029f1737ef service/raft: raft_group0: drop dependency on storage_service
raft_group0 does not really depend on storage_service, it needs it only
transiently, so pass it to appropriate methods of raft_group0 instead
of during its creation.
2023-04-25 11:07:47 +03:00
Botond Dénes
8765442f3f Merge 'utils: add basic_xx_hasher' from Benny Halevy
Consolidate `bytes_view_hasher` and abstract_replication_strategy `factory_key_hasher` which are the same into a reusable utils::basic_xx_hasher.

To be used in a followup series for netw::msg_addr.

Closes #13530

* github.com:scylladb/scylladb:
  utils: hashing: use simple_xx_hasher
  utils: hashing: add simple_xx_hasher
  utils: hashers: add HasherReturning concept
  hashing: move static_assert to source file
2023-04-25 09:53:47 +02:00
Kefu Chai
f4016d3289 cql3: coroutinize query_processor::for_each_cql_result
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13621
2023-04-25 09:53:47 +02:00
Botond Dénes
b9491c0134 Merge 'Test the column_family rest api' from Benny Halevy
Add a test for get/enable/disable auto_compaction via the column_family api.
And add log messages for admin operations over that api.

Closes #13566

* github.com:scylladb/scylladb:
  api: column_family: add log messages for admin operation
  test: rest_api: add test_column_family
2023-04-25 09:53:47 +02:00
Wojciech Mitros
b0fa59b260 build: add tools for optimizing the Wasm binaries and translating to wat
After the addition of the rust-std-static-wasm32-wasi target, we're
able to compile the Rust programs to Wasm binaries. However, we're still
only able to handle the Wasm UDFs in the Text format, so we need a tool
to translate the .wasm files to .wat. Additionally, the .wasm files
generated by default are unnecessarily large, which can be helped
using wasm-opt and wasm-strip.
The tool for translating wasm to wat (wasm2wat), and the tool for
stripping the wasm binaries (wasm-strip) are included in the `wabt`
package, and the optimization tool (wasm-opt) is included in the
`binaryen` package. Both packages are added to install-dependencies.sh

Closes #13282

[avi: regenerate frozen toolchain]

Closes #13605
2023-04-25 09:53:47 +02:00
Pavel Emelyanov
9a9dbffce3 s3/client: Zeroify stat by default
The s3::readable_file::stat() call returns a hand-crafted stat structure
with some fields set to some sane values, most are constants. However,
other fields remain not initialized which leads to troubles sometimes.
Better to fill the stat with zeroes and later revisit it for more sane
values.
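
A minimal sketch of the idea, assuming a POSIX `struct stat` and a hypothetical helper `make_object_stat()`. The actual `s3::readable_file::stat()` code is not shown here and may set different fields.

```cpp
#include <sys/stat.h>  // POSIX; assumed to be available on the target platform
#include <cstdint>

// Hypothetical helper: builds a hand-crafted stat for an object of known size.
struct stat make_object_stat(std::uint64_t object_size) {
    struct stat st{};  // value-initialization zeroes every field first
    // Only then fill in the fields for which sane values are known.
    st.st_mode = S_IFREG | 0444;
    st.st_size = static_cast<off_t>(object_size);
    return st;
}
```

Fields that are never assigned (for example `st_nlink` or `st_uid`) then read as zero rather than as indeterminate values.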

fixes: #13645
refs: #13649
Using designated initializers is not an option here, see PR #13499

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13650
2023-04-25 09:53:47 +02:00
Kefu Chai
b0a01d85e9 s3/test: collect log on exit
the temporary directory holding the log file collecting the scylla
subprocess's output is specified by the test itself, and it is
`test_tempdir`. but unfortunately, cql-pytest/run.py is not aware
of this. so `cleanup_all()` is not able to print out the logging
messages at exit. as, please note, cql-pytest/run.py always
collects the "log" file under the directory created using `pid_to_dir()`
where pid is that of the spawned subprocess. but `object_store/run` uses
the main process's pid for its reusable tempdir.

so, with this change, we also register a cleanup func to printout
the logging message when the test exits.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-25 09:53:47 +02:00
Alejo Sanchez
c06e01cfba test/topology: log stages for concurrent test
For concurrent schema changes test, log when the different stages of the
test are finished.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13654
2023-04-25 09:53:47 +02:00
Kefu Chai
cc87e10f40 dht: print pk in decorated_key with "pk" prefix
this change ensures that `dk._key` is formatted with the "pk" prefix.
as in 3738fcb, the `operator<<` for partition_key was removed. so the
compiler has to find an alternative when trying to fulfill the needs
when this operator<< is called. fortunately, from the compiler's
perspective, `partition_key` has an `operator managed_bytes_view`, and
this operator does not have the explicit specifier, and,
`managed_bytes_view` does support `operator<<`. so this ends up with a
change in the format of `decorated_key` when it is printed using
`operator<<`. the code compiles. but unfortunately, the behavior is
changed, and it breaks scylla-dtest/cdc_tracing_info_test.py where the
partition_key is supposed to be printed like "pk{010203}" instead of
"010203". the latter is how `managed_bytes_view` is formatted.

a test is added accordingly to avoid future changes which break the
dtest.
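
the mechanism can be reproduced in isolation. in the sketch below all type and function names are hypothetical stand-ins (`key_like` for `partition_key`, `bytes_view_like` for `managed_bytes_view`); it only shows how a non-explicit conversion operator lets `operator<<` keep compiling with a different output format.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a view type that is printable via operator<<.
struct bytes_view_like {
    std::string hex;
};

std::ostream& operator<<(std::ostream& os, const bytes_view_like& v) {
    return os << v.hex;
}

// Hypothetical stand-in for a key type that has no operator<< of its own.
struct key_like {
    std::string hex;
    // Not marked explicit, so `os << key` silently converts and compiles.
    operator bytes_view_like() const { return {hex}; }
};

// Prints through the implicit conversion: the "pk{...}" wrapper is lost.
std::string print_via_conversion(const key_like& k) {
    std::ostringstream os;
    os << k;  // resolves to operator<<(std::ostream&, const bytes_view_like&)
    return os.str();
}

// Prints with the prefix spelled out, which is the expected format.
std::string print_with_prefix(const key_like& k) {
    std::ostringstream os;
    os << "pk{" << k.hex << "}";
    return os.str();
}
```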

Fixes scylladb#13628
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13653
2023-04-25 09:53:47 +02:00
Nadav Har'El
bd09dc308c cql3: fix printing of column_specification::name in some error messages
column_specification::name is a shared pointer, so it should be
dereferenced before printing - because we want to print the name, not
the pointer.

Fix a few instances of this mistake in prepare_expr.cc. Other instances
were already correct.
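
A standalone illustration of the mistake, using `std::ostream` and a hypothetical `column_spec_like` struct rather than the real `column_specification` and its formatting code:

```cpp
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in: the name is held through a shared pointer.
struct column_spec_like {
    std::shared_ptr<std::string> name;
};

// Wrong: streams the shared_ptr itself, which prints the stored address.
std::string describe_wrong(const column_spec_like& c) {
    std::ostringstream os;
    os << "Invalid value for column " << c.name;
    return os.str();
}

// Right: dereference first, so the name itself is printed.
std::string describe_right(const column_spec_like& c) {
    std::ostringstream os;
    os << "Invalid value for column " << *c.name;
    return os.str();
}
```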

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-04-25 10:46:56 +03:00
Nadav Har'El
4eabb3f429 cql3: fix printing of column_definition::name in some error messages
Printing a column_definition::name() in an error message is wrong,
because it is "bytes" and printed as hexadecimal ASCII codes :-(

Some error messages in cql3/operation.cc incorrectly used name()
and should be changed to name_as_text(), as was correctly done in
a few other error messages in the same file.

This patch also fixes a few places in the test/cql approval tests which
"enshrined" the wrong behavior - printing things like 666c697374696e74
in error messages - and now needs to be fixed for the right behavior.
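
For reference, the hex string quoted above decodes to the ASCII text `flistint`. The sketch below uses a hypothetical `to_hex()` helper (not Scylla's own code) to show how raw name bytes turn into such output when printed as bytes instead of text:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: renders each byte as two lowercase hex digits.
std::string to_hex(std::string_view bytes) {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
    }
    return out;
}
```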

Fixes #13657

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-04-25 10:46:47 +03:00
Kamil Braun
b1d58c3d3a test: remove topology_custom/test_custom.py
It was a temporary test just to check that the `topology_custom` suite
works. The suite now contains a real test so we can remove this one.
2023-04-24 14:41:33 +02:00
Kamil Braun
3f0498ca53 raft topology: don't require new CDC generation UUID to always be present
During node replace we don't introduce a new CDC generation, only during
regular bootstrap. Instead of checking that `new_cdc_generation_uuid`
must be present whenever there's a topology transition, only check it
when we're in `commit_cdc_generation` state.
2023-04-24 14:41:33 +02:00
Kamil Braun
9ca53478ed raft topology: include shard_count/ignore_msb during replace
Fixes: #13651
2023-04-24 14:40:47 +02:00
Kefu Chai
124153d439 build: cmake: sync with configure.py
this change updates the CMake building system with the changes
introduced by 3f1ac846d8 and
d1817e9e1b

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13648
2023-04-24 14:55:20 +03:00
Benny Halevy
b3d91cbf65 utils: hashing: use simple_xx_hasher
Use simple_xx_hasher for bytes_view and effective_replication_map::factory_key
appending hashers instead of their custom, yet identical implementations.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-24 14:07:25 +03:00
Benny Halevy
f4fefec343 utils: hashing: add simple_xx_hasher
And a respective unit test.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-24 14:06:43 +03:00
Benny Halevy
b638dddf1b utils: hashers: add HasherReturning concept
And a more specific HasherReturningBytes for hashers
that return bytes in finalize().

HasherReturning will be used by the following patch
also for simple hashers that return size_t from
finalize().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-24 14:06:40 +03:00
Benny Halevy
a765472b8b hashing: move static_assert to source file
No need to check it inline in the header.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-24 12:23:03 +03:00
Tomasz Grabiec
03035e3675 test: topology: Introduce test_tablets.py 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
c1fdbe79b7 raft: Introduce 'raft_server_force_snapshot' error injection
Will be used by tests to force followers to catch up from the snapshot.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
819bc86f0f locator: network_topology_strategy: Support tablet replication 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
5e89f2f5ba service: Introduce tablet_allocator
Currently, responsible for injecting mutations of system.tablets to
schema changes.

Note that not all migrations are handled currently. Dependent view or
cdc table drops are not handled.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
6d4d3d8bbd locator: Introduce tablet_aware_replication_strategy
tablet_aware_replication_strategy is a trait class meant to be
inherited by replication strategy which want to work with tablets. The
trait produces per-table effective_replication_map which looks at
tablet metadata to determine replicas.

No replication strategy is changed to use tablets yet in this patch.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
97b969224c locator: Extract maybe_remove_node_being_replaced() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
e6b76ac4b9 dht: token_metadata: Introduce get_my_id() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
46eae545ad migration_manager: Send tablet metadata as part of schema pull
This is currently used by group0 to transfer snapshot of the raft
state machine.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
a8a03ee502 storage_service: Load tablet metadata when reloading topology state
This change puts the reloading into topology_state_load(), which is a
function which reloads token_metadata from system.topology (the new
raft-based topology management). It clears the metadata, so needs to
reload tablet map too. In the future, tablet metadata could change as
part of topology transaction too, so we reload rather than preserve.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
d42685d0cb storage_service: Load tablet metadata on boot and from group0 changes 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
41e69836fd db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
b754433ac1 migration_notifier: Introduce before_drop_keyspace()
Tablet allocator will need to inject mutations on keyspace drop.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
5b046043ea migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
It will be extended with listener notification firing, which is an
async operation.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
4b4238b069 test: perf: Introduce perf-tablets
Example output:

  $ build/release/scylla perf-tablets --tables 10 --tablets-per-table $((8*1024)) --rf 3

  testlog - Total tablet count: 81920
  testlog - Size of tablet_metadata in memory: 7683 KiB
  testlog - Copied in 2.163421 [ms]
  testlog - Cleared in 0.767507 [ms]
  testlog - Saved in 774.813232 [ms]
  testlog - Read in 246.666885 [ms]
  testlog - Read mutations in 211.677292 [ms]
  testlog - Size of canonical mutations: 20.633621 [MiB]
  testlog - Disk space used by system.tablets: 0.902344 [MiB]
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
70a35f70a6 test: Introduce tablets_test 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
b4ac329367 test: lib: Do not override table id in create_table()
It is already set by schema_maker. In tablets_test we will depend on
the id being the same as that set in the schema_builder, so don't
change it to something else.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
5a24984147 utils, tablets: Introduce external_memory_usage() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
f3fbfdaa37 db: tablets: Add printers
Example:

TRACE 2023-03-30 12:06:33,918 [shard 0] tablets - Read tablet metadata: {
  8cd5b560-cee2-11ed-9cd5-7f37187f2167: {
    [0]: last_token=-6917529027641081857, replicas={4fe5c4d5-7030-4ddd-8117-ba22c29f4f57:0},
    [1]: last_token=-4611686018427387905, replicas={3160b965-1925-4677-884b-c761e2bf4272:0},
    [2]: last_token=-2305843009213693953, replicas={3160b965-1925-4677-884b-c761e2bf4272:0},
    [3]: last_token=-1, replicas={4fe5c4d5-7030-4ddd-8117-ba22c29f4f57:0},
    [4]: last_token=2305843009213693951, replicas={3160b965-1925-4677-884b-c761e2bf4272:0},
    [5]: last_token=4611686018427387903, replicas={4fe5c4d5-7030-4ddd-8117-ba22c29f4f57:0},
    [6]: last_token=6917529027641081855, replicas={4fe5c4d5-7030-4ddd-8117-ba22c29f4f57:0},
    [7]: last_token=9223372036854775807, replicas={3160b965-1925-4677-884b-c761e2bf4272:0}
  }
}
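
The `last_token` values in this example are consistent with an even split of the signed 64-bit token space into 8 tablets. The sketch below reproduces them under that assumption; `last_token_of_tablet()` is a hypothetical helper, not necessarily how Scylla computes tablet boundaries.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: last token of tablet `index` when the signed 64-bit
// token space is split evenly into 2^log2_count tablets (log2_count <= 63).
std::int64_t last_token_of_tablet(unsigned log2_count, std::uint64_t index) {
    if (log2_count == 0) {
        return std::numeric_limits<std::int64_t>::max();  // single tablet
    }
    const unsigned shift = 64 - log2_count;
    // Unsigned arithmetic wraps modulo 2^64, which handles the last tablet.
    const std::uint64_t offset = ((index + 1) << shift) - 1;
    // Adding 2^63 modulo 2^64 maps the offset onto the signed token range;
    // the conversion to int64_t is modular (guaranteed since C++20).
    return static_cast<std::int64_t>(offset + (std::uint64_t(1) << 63));
}
```

For 8 tablets (`log2_count = 3`), index 0 gives -6917529027641081857 and index 7 gives 9223372036854775807, matching entries [0] and [7] above.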
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
9d786c1ebc db: tablets: Add persistence layer 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
fa8ad9a585 dht: Use last_token_of_compaction_group() in split_token_range_msb() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
fceb5f8cf6 locator: Introduce tablet_metadata
token_metadata now stores tablet metadata with information about
tablets in the system.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
241f7febec dht: Introduce first_token() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
462e3ffd36 dht: Introduce next_token() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
27acf3b129 storage_proxy: Improve trace-level logging 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
34a9c62ae5 locator: token_metadata: Fix confusing comment on ring_range()
It could be interpreted to mean that the search token is excluded.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
e4865bd4d1 dht, storage_proxy: Abstract token space splitting
Currently, scans are splitting partition ranges around tokens. This
will have to change with tablets, where we should split at tablet
boundaries.

This patch introduces token_range_splitter which abstracts this
task. It is provided by effective_replication_map implementation.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
b769c4ee55 Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
This reverts commit 95bf8eebe0.

Later patches will adapt this code to work with token_range_splitter,
and the unit test added by the reverted commit will start to fail.

The unit test asks the query_ranges_to_vnodes_generator to split the range:

   [t:end, t+1:start)

around token t, and expects the generator to produce an empty range

   [t:end, t:end]

After adapting this code to token_range_splitter, the input range will
not be split because it is recognized as adjacent to t:end, and the
optimization logic will not kick in. Rather than adding more logic to
handle this case, I think it's better to drop the optimization, as it
is not very useful (rarely happens) and not required for correctness.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
94e1c7b859 db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
This allows update_pending_ranges(), invoked on keyspace creation, to
succeed in the presence of keyspaces with per-table replication
strategy. It will update only vnode-based erms, which is intended
behavior, since only those need pending ranges updated.

This change will also make node operations like bootstrap, repair,
etc. to work (not fail) in the presence of keyspaces with per-table
erms, they will just not be replicated using those algorithms.

Before, these would fail inside get_effective_replication_map(), which
is forbidden for keyspaces with per-table replication.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
dc04da15ec db: Introduce get_non_local_vnode_based_strategy_keyspaces()
It's meant to be used in places where currently
get_non_local_strategy_keyspaces() is used, but work only with
keyspaces which use vnode-based replication strategy.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
8fcb320e71 service: storage_proxy: Avoid copying keyspace name in write handler 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
9b17ad3771 locator: Introduce per-table replication strategy
Will be used by tablet-based replication strategies, for which
effective replication map is different per table.

Also, this patch adapts existing users of effective replication map to
use the per-table effective replication map.

For simplicity, every table has an effective replication map, even if
the erm is per keyspace. This way the client code can be uniform and
doesn't have to check whether replication strategy is per table.

Not all users of per-keyspace get_effective_replication_map() are
adapted yet to work per-table. Those algorithms will throw an
exception when invoked on a keyspace which uses per-table replication
strategy.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
5d9bcb45de treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
bb297d86a0 locator: Introduce effective_replication_map
With tablet-based replication strategies it will represent replication
of a single table.

Current vnode_effective_replication_map can be adapted to this interface.

This will allow algorithms like those in storage_proxy to work with
both kinds of replication strategies over a single abstraction.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
d3c9ad4ed6 locator: Rename effective_replication_map to vnode_effective_replication_map
In preparation for introducing a more abstract
effective_replication_map which can describe replication maps which
are not based on vnodes.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
1343bfa708 locator: effective_replication_map: Abstract get_pending_endpoints() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
7b01fe8742 db: Propagate feature_service to abstract_replication_strategy::validate_options()
Some replication strategy options may be feature-dependent.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
9781d3ffc5 db: config: Introduce experimental "TABLETS" feature 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
a892e144cc db: Log replication strategy for debugging purposes 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
7543c75b62 db: Log full exception on error in do_parse_schema_tables() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
c923bdd222 db: keyspace: Remove non-const replication strategy getter
Keyspace will store replication_ptr, which is a const pointer. No user
needs a mutable reference.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
bf2ce8ff75 config: Reformat 2023-04-24 10:49:36 +02:00
Benny Halevy
9768046d7c compaction_manager: print compaction_group id
Add a formatter to compaction::table_state that
prints the table ks_name.cf_name and compaction group id.

Fixes #13467

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-24 10:07:03 +03:00
Benny Halevy
dabf46c37f compaction_group, table_state: add group_id member
To help identify the compaction group / table_state.

Ref #13467

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-24 10:06:04 +03:00
Benny Halevy
1134ca2767 compaction_manager: offstrategy compaction: skip compaction if no candidates are found
In many cases we trigger offstrategy compaction opportunistically
also when there's nothing to do.  In this case we still print
to the log lots of info-level messages and call
`run_offstrategy_compaction` that wastes more cpu cycles
on learning that it has nothing to do.

This change bails out early if the maintenance set is empty
and prints a "Skipping off-strategy compaction" message in debug
level instead.

Fixes #13466

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-24 09:23:32 +03:00
Benny Halevy
2e24b05122 compaction: make_partition_filter: do not assert shard ownership
Now, with f1bbf705f9
(Cleanup sstables in resharding and other compaction types),
we may filter sstables as part of resharding compaction
and the assertion that all tokens are owned by the current
shard when filtering is no longer true.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 15:24:20 +03:00
Benny Halevy
c7d064b8b1 distributed_loader: distribute_reshard_jobs: pick one of the sstable shard owners
When distributing the resharding jobs, prefer one of
the sstable shard owners based on foreign_sstable_open_info.

This is particularly important for uploaded sstables
that are resharded since they require cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 15:13:16 +03:00
Benny Halevy
2f61de8f7b table, compaction_manager: prevent cross shard access to owned_ranges_ptr
Seen after f1bbf705f9 in debug mode

distributed_loader collect_all_shared_sstables copies
compaction::owned_ranges_ptr (lw_shared_ptr<const
dht::token_range_vector>)
across shards.

Since update_sstable_cleanup_state is synchronous, it can
be passed a const reference to the token_range_vector instead.
It is ok to access the memory read-only across shards
and since this happens on start-up, there are no special
performance requirements.

Fixes #13631

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 15:12:13 +03:00
Botond Dénes
ecbb118d32 reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes
Update comments, test names, etc. that are still using the old terminology for
permit state names, bringing them up to date with the recent state name changes.
2023-04-19 05:31:27 -04:00
Botond Dénes
e71d6566ab reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes
They are still using the old terminology for permit state names, bring
them up to date with the recent state name changes.
2023-04-19 05:20:44 -04:00
Botond Dénes
804403f618 reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
They are still using the old terminology for permit state names, bring
them up to date with the recent state name changes.
2023-04-19 05:20:42 -04:00
Botond Dénes
89328ce447 reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
It is still using the old terminology for permit state names, bring it
up to date with the recent state name changes.
2023-04-19 05:18:13 -04:00
Botond Dénes
3919effe2d reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
It is still using the old terminology for permit state names, bring it
up to date with the recent state name changes.
2023-04-19 05:17:34 -04:00
Benny Halevy
456f5dfce5 api: column_family: add log messages for admin operation
Similar to the storage_service api, print a log message
for admin operations like enabling/disabling auto_compaction,
running major compaction, and setting the table compaction
strategy.

Note that there is overlap in functionality
between the storage_service and the column_family api entry points.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-18 17:11:33 +03:00
Benny Halevy
5e371e7861 test: rest_api: add test_column_family
Add a test for column_family/autocompaction

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-18 17:09:31 +03:00
Kefu Chai
37cf04818e alternator: split the param list of executor ctor into multi lines
before this change, the line is 249 chars long, so split it into
multiple lines for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-23 20:57:28 +08:00
Kefu Chai
69c21f490a alternator,config: make alternator_timeout_in_ms live-updateable
before this change, alternator_timeout_in_ms is not live-updatable,
as after setting executor's default timeout right before creating
sharded executor instances, they never get updated with this option
anymore.

in this change,

* alternator_timeout_in_ms is marked as live-updateable
* executor::_s_default_timeout is changed to a thread_local variable,
  so it can be updated by a per-shard updateable_value. and
  it is now an updateable_value, so its variable name is updated
  accordingly. this value is set in the ctor of executor, and
  it is disconnected from the corresponding named_value<> option
  in the dtor of executor.
* alternator_timeout_in_ms is passed to the constructor of
  executor via sharded_parameter, so executor::_timeout_in_ms can
  be initialized on per-shard basis
* executor::set_default_timeout() is dropped, as we already pass
  the option to executor in its ctor.

please note, in the ctor of executor, we always update the cached
value of `s_default_timeout` with the value of `_timeout_in_ms`,
and we set the default timeout to 10s in `alternator_test_env`.
this is a design decision to avoid bending the production code for
testing, as in production, we always set the timeout with the value
specified either by the default value or by the yaml conf file.

Fixes #12232
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-23 20:57:08 +08:00
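
The live-update idea in the commit above can be illustrated with a small stand-alone sketch. It does not reproduce Scylla's `utils::updateable_value`; the class `live_u32`, the variable `default_timeout_ms` and the function `current_timeout()` are hypothetical names chosen for illustration. The sketch assumes only what the message states: a per-thread cached value follows its configuration source until it is explicitly disconnected, after which it keeps the last value seen.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a live-updatable option. Copies share storage
// with the source, so a change made through one handle is seen by all.
class live_u32 {
    std::shared_ptr<std::uint32_t> _value;
public:
    explicit live_u32(std::uint32_t v)
        : _value(std::make_shared<std::uint32_t>(v)) {}
    std::uint32_t get() const { return *_value; }
    void set(std::uint32_t v) { *_value = v; }
    // Returns a handle holding the current value but no longer tied to
    // the source (mirrors "disconnect, but keep the value unchanged").
    live_u32 detached() const { return live_u32(*_value); }
};

// One cached handle per thread; 10 s default, as in the commit message.
thread_local live_u32 default_timeout_ms{10'000};

// Reads the cached option every time, so updates take effect immediately.
std::chrono::milliseconds current_timeout() {
    return std::chrono::milliseconds(default_timeout_ms.get());
}
```

Assigning a source handle to `default_timeout_ms` connects the thread to that source; assigning `default_timeout_ms.detached()` back to it freezes the value, so later changes to the source are no longer observed.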
Wojciech Mitros
b03fce524b cql-pytest: test permissions for UDTs with quoted names
Currently, we only tested whether permissions with UDFs
that have quoted names work correctly. This patch adds
the missing test that confirms that we can also use UDTs
(as UDF parameter types) when altering permissions.
2023-03-23 01:41:58 +01:00
Wojciech Mitros
169a821316 cql: maybe quote user type name in ut_name::to_string()
Currently, the ut_name::to_string() is used only in 2 cases:
the first one is in logs or as part of error messages, and the
second one is during parsing, temporarily storing the user
defined type name in the auth::resource for later preparation
with database and data_dictionary context.

This patch changes the string so that the 'name' part of the
ut_name (as opposed to the 'keyspace' part) is now quoted when
needed. This does not worsen the logging set of cases, but it
does help with parsing of the resulting string, when finishing
preparing the auth::resource.

After the modification, a more fitting name for the function
is "ut_name::to_cql_string()", so the function is renamed to that.
2023-03-23 01:41:58 +01:00
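
As a rough illustration of "quoted when needed", the sketch below quotes an identifier unless it consists only of lowercase letters, digits and underscores and starts with a letter. This is an assumption about the rule, not the actual helper used by `ut_name::to_cql_string()`; the name `maybe_quote_sketch` is hypothetical, and reserved keywords (which would also need quoting) are deliberately ignored.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: returns the identifier unchanged when it is safe to
// use unquoted, otherwise wraps it in double quotes and doubles any
// embedded double quote.
std::string maybe_quote_sketch(std::string_view id) {
    bool plain = !id.empty() && id[0] >= 'a' && id[0] <= 'z';
    for (char c : id) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        if (!ok) {
            plain = false;
            break;
        }
    }
    if (plain) {
        return std::string(id);
    }
    std::string out = "\"";
    for (char c : id) {
        if (c == '"') {
            out += '"'; // escape an embedded quote by doubling it
        }
        out += c;
    }
    out += '"';
    return out;
}
```

A name such as `my_type` passes through untouched, while a mixed-case name such as `MyType` is wrapped in double quotes so that it survives a later round of parsing with its case intact.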
Wojciech Mitros
fc8dcc1a62 cql: add a check for currently used stack in parser
While in debug mode, we may switch the default stack to
a larger one when parsing cql. We may, however, invoke
the parser recursively, causing us to switch to the big
stack while currently using it. After the reset, we
assume that the stack is empty, so after switching to
the same stack, we write over its previous contents.

This is fixed by checking if we're already using the large
stack, which is achieved by comparing the address of
a local variable to the start and end of the large stack.
2023-03-23 01:41:58 +01:00
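
The check described in the commit above, comparing the address of a local variable with the bounds of the large stack, can be sketched as follows. This is not the parser's code; `address_in_range` and `running_on` are hypothetical names, and the sketch assumes the large stack is a contiguous buffer whose base and size are known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p lies inside [base, base + size).
inline bool address_in_range(const void* p, const void* base, std::size_t size) {
    auto a = reinterpret_cast<std::uintptr_t>(p);
    auto b = reinterpret_cast<std::uintptr_t>(base);
    return a >= b && a - b < size;
}

// Hypothetical usage: a local variable lives on whichever stack is
// currently active, so its address tells us whether we are already
// running on the large stack.
inline bool running_on(const void* stack_base, std::size_t stack_size) {
    char probe = 0;
    return address_in_range(&probe, stack_base, stack_size);
}
```

If `running_on()` returns true for the large stack, a recursive parser invocation can skip the switch instead of resetting a stack that is still in use.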
Wojciech Mitros
a086682ecb cql-pytest: add an optional name parameter to new_type()
Currently, when creating a UDT, we're always generating
a new name for it. This patch enables setting the name
to a specific string instead.
2023-03-23 01:41:58 +01:00
539 changed files with 17043 additions and 15792 deletions

2
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

View File

@@ -26,6 +26,7 @@ set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")
set(CMAKE_CXX_VISIBILITY_PRESET hidden)
set(Seastar_TESTING ON CACHE BOOL "" FORCE)
set(Seastar_API_LEVEL 6 CACHE STRING "" FORCE)
add_subdirectory(seastar)
# System libraries dependencies
@@ -183,12 +184,25 @@ target_link_libraries(scylla PRIVATE
# Force SHA1 build-id generation
set(default_linker_flags "-Wl,--build-id=sha1")
include(CheckLinkerFlag)
foreach(linker "lld" "gold")
set(Scylla_USE_LINKER
""
CACHE
STRING
"Use specified linker instead of the default one")
if(Scylla_USE_LINKER)
set(linkers "${Scylla_USE_LINKER}")
else()
set(linkers "lld" "gold")
endif()
foreach(linker ${linkers})
set(linker_flag "-fuse-ld=${linker}")
check_linker_flag(CXX ${linker_flag} "CXX_LINKER_HAVE_${linker}")
if(CXX_LINKER_HAVE_${linker})
string(APPEND default_linker_flags " ${linker_flag}")
break()
elseif(Scylla_USE_LINKER)
message(FATAL_ERROR "${Scylla_USE_LINKER} is not supported.")
endif()
endforeach()

View File

@@ -72,7 +72,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=5.3.0-dev
VERSION=5.3.0-rc1
if test -f version
then

View File

@@ -53,7 +53,7 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::strin
if (result_set->empty()) {
co_await coroutine::return_exception(api_error::unrecognized_client(format("User not found: {}", username)));
}
const bytes_opt& salted_hash = result_set->rows().front().front(); // We only asked for 1 row and 1 column
const managed_bytes_opt& salted_hash = result_set->rows().front().front(); // We only asked for 1 row and 1 column
if (!salted_hash) {
co_await coroutine::return_exception(api_error::unrecognized_client(format("No password found for user: {}", username)));
}

View File

@@ -76,13 +76,16 @@ future<> controller::start_server() {
_ssg = create_smp_service_group(c).get0();
rmw_operation::set_default_write_isolation(_config.alternator_write_isolation());
executor::set_default_timeout(std::chrono::milliseconds(_config.alternator_timeout_in_ms()));
net::inet_address addr = utils::resolve(_config.alternator_address, family).get0();
auto get_cdc_metadata = [] (cdc::generation_service& svc) { return std::ref(svc.get_cdc_metadata()); };
_executor.start(std::ref(_gossiper), std::ref(_proxy), std::ref(_mm), std::ref(_sys_dist_ks), sharded_parameter(get_cdc_metadata, std::ref(_cdc_gen_svc)), _ssg.value()).get();
auto get_timeout_in_ms = [] (const db::config& cfg) -> utils::updateable_value<uint32_t> {
return cfg.alternator_timeout_in_ms;
};
_executor.start(std::ref(_gossiper), std::ref(_proxy), std::ref(_mm), std::ref(_sys_dist_ks),
sharded_parameter(get_cdc_metadata, std::ref(_cdc_gen_svc)), _ssg.value(),
sharded_parameter(get_timeout_in_ms, std::ref(_config))).get();
_server.start(std::ref(_executor), std::ref(_proxy), std::ref(_gossiper), std::ref(_auth_service), std::ref(_sl_controller)).get();
// Note: from this point on, if start_server() throws for any reason,
// it must first call stop_server() to stop the executor and server

View File

@@ -6,8 +6,6 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <regex>
#include "utils/base64.hh"
#include <seastar/core/sleep.hh>
@@ -90,17 +88,20 @@ json::json_return_type make_streamed(rjson::value&& value) {
// move objects to coroutine frame.
auto los = std::move(os);
auto lrs = std::move(rs);
std::exception_ptr ex;
try {
co_await rjson::print(*lrs, los);
co_await los.flush();
co_await los.close();
} catch (...) {
// at this point, we cannot really do anything. HTTP headers and return code are
// already written, and quite potentially a portion of the content data.
// just log + rethrow. It is probably better the HTTP server closes connection
// abruptly or something...
elogger.error("Unhandled exception in data streaming: {}", std::current_exception());
throw;
ex = std::current_exception();
elogger.error("Exception during streaming HTTP response: {}", ex);
}
co_await los.close();
if (ex) {
co_await coroutine::return_exception_ptr(std::move(ex));
}
co_return;
};
@@ -535,7 +536,7 @@ future<executor::request_return_type> executor::delete_table(client_state& clien
}
auto m = co_await mm.prepare_column_family_drop_announcement(keyspace_name, table_name, group0_guard.write_timestamp(), service::migration_manager::drop_views::yes);
auto m2 = mm.prepare_keyspace_drop_announcement(keyspace_name, group0_guard.write_timestamp());
auto m2 = co_await mm.prepare_keyspace_drop_announcement(keyspace_name, group0_guard.write_timestamp());
std::move(m2.begin(), m2.end(), std::back_inserter(m));
@@ -1365,14 +1366,11 @@ mutation put_or_delete_item::build(schema_ptr schema, api::timestamp_type ts) co
// The DynamoDB API doesn't let the client control the server's timeout, so
// we have a global default_timeout() for Alternator requests. The value of
// s_default_timeout is overwritten in alternator::controller::start_server()
// s_default_timeout_ms is overwritten in alternator::controller::start_server()
// based on the "alternator_timeout_in_ms" configuration parameter.
db::timeout_clock::duration executor::s_default_timeout = 10s;
void executor::set_default_timeout(db::timeout_clock::duration timeout) {
s_default_timeout = timeout;
}
thread_local utils::updateable_value<uint32_t> executor::s_default_timeout_in_ms{10'000};
db::timeout_clock::time_point executor::default_timeout() {
return db::timeout_clock::now() + s_default_timeout;
return db::timeout_clock::now() + std::chrono::milliseconds(s_default_timeout_in_ms);
}
static future<std::unique_ptr<rjson::value>> get_previous_item(
@@ -2300,14 +2298,14 @@ static std::optional<attrs_to_get> calculate_attrs_to_get(const rjson::value& re
* as before.
*/
void executor::describe_single_item(const cql3::selection::selection& selection,
const std::vector<bytes_opt>& result_row,
const std::vector<managed_bytes_opt>& result_row,
const std::optional<attrs_to_get>& attrs_to_get,
rjson::value& item,
bool include_all_embedded_attributes)
{
const auto& columns = selection.get_columns();
auto column_it = columns.begin();
for (const bytes_opt& cell : result_row) {
for (const managed_bytes_opt& cell : result_row) {
std::string column_name = (*column_it)->name_as_text();
if (cell && column_name != executor::ATTRS_COLUMN_NAME) {
if (!attrs_to_get || attrs_to_get->contains(column_name)) {
@@ -2315,7 +2313,9 @@ void executor::describe_single_item(const cql3::selection::selection& selection,
// so add() makes sense
rjson::add_with_string_name(item, column_name, rjson::empty_object());
rjson::value& field = item[column_name.c_str()];
rjson::add_with_string_name(field, type_to_string((*column_it)->type), json_key_column_value(*cell, **column_it));
cell->with_linearized([&] (bytes_view linearized_cell) {
rjson::add_with_string_name(field, type_to_string((*column_it)->type), json_key_column_value(linearized_cell, **column_it));
});
}
} else if (cell) {
auto deserialized = attrs_type()->deserialize(*cell);
@@ -2371,21 +2371,22 @@ std::optional<rjson::value> executor::describe_single_item(schema_ptr schema,
return item;
}
std::vector<rjson::value> executor::describe_multi_item(schema_ptr schema,
const query::partition_slice& slice,
const cql3::selection::selection& selection,
const query::result& query_result,
const std::optional<attrs_to_get>& attrs_to_get) {
cql3::selection::result_set_builder builder(selection, gc_clock::now());
query::result_view::consume(query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, selection));
future<std::vector<rjson::value>> executor::describe_multi_item(schema_ptr schema,
const query::partition_slice&& slice,
shared_ptr<cql3::selection::selection> selection,
foreign_ptr<lw_shared_ptr<query::result>> query_result,
shared_ptr<const std::optional<attrs_to_get>> attrs_to_get) {
cql3::selection::result_set_builder builder(*selection, gc_clock::now());
query::result_view::consume(*query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, *selection));
auto result_set = builder.build();
std::vector<rjson::value> ret;
for (auto& result_row : result_set->rows()) {
rjson::value item = rjson::empty_object();
describe_single_item(selection, result_row, attrs_to_get, item);
describe_single_item(*selection, result_row, *attrs_to_get, item);
ret.push_back(std::move(item));
co_await coroutine::maybe_yield();
}
return ret;
co_return ret;
}
static bool check_needs_read_before_write(const parsed::value& v) {
@@ -3257,8 +3258,7 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli
service::storage_proxy::coordinator_query_options(executor::default_timeout(), permit, client_state, trace_state)).then(
[schema = rs.schema, partition_slice = std::move(partition_slice), selection = std::move(selection), attrs_to_get = rs.attrs_to_get] (service::storage_proxy::coordinator_query_result qr) mutable {
utils::get_local_injector().inject("alternator_batch_get_item", [] { throw std::runtime_error("batch_get_item injection"); });
std::vector<rjson::value> jsons = describe_multi_item(schema, partition_slice, *selection, *qr.query_result, *attrs_to_get);
return make_ready_future<std::vector<rjson::value>>(std::move(jsons));
return describe_multi_item(std::move(schema), std::move(partition_slice), std::move(selection), std::move(qr.query_result), std::move(attrs_to_get));
});
response_futures.push_back(std::move(f));
}
@@ -3498,7 +3498,7 @@ public:
_column_it = _columns.begin();
}
void accept_value(const std::optional<query::result_bytes_view>& result_bytes_view) {
void accept_value(managed_bytes_view_opt result_bytes_view) {
if (!result_bytes_view) {
++_column_it;
return;

View File

@@ -22,6 +22,7 @@
#include "alternator/error.hh"
#include "stats.hh"
#include "utils/rjson.hh"
#include "utils/updateable_value.hh"
namespace db {
class system_distributed_keyspace;
@@ -170,8 +171,16 @@ public:
static constexpr auto KEYSPACE_NAME_PREFIX = "alternator_";
static constexpr std::string_view INTERNAL_TABLE_PREFIX = ".scylla.alternator.";
executor(gms::gossiper& gossiper, service::storage_proxy& proxy, service::migration_manager& mm, db::system_distributed_keyspace& sdks, cdc::metadata& cdc_metadata, smp_service_group ssg)
: _gossiper(gossiper), _proxy(proxy), _mm(mm), _sdks(sdks), _cdc_metadata(cdc_metadata), _ssg(ssg) {}
executor(gms::gossiper& gossiper,
service::storage_proxy& proxy,
service::migration_manager& mm,
db::system_distributed_keyspace& sdks,
cdc::metadata& cdc_metadata,
smp_service_group ssg,
utils::updateable_value<uint32_t> default_timeout_in_ms)
: _gossiper(gossiper), _proxy(proxy), _mm(mm), _sdks(sdks), _cdc_metadata(cdc_metadata), _ssg(ssg) {
s_default_timeout_in_ms = std::move(default_timeout_in_ms);
}
future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
@@ -199,13 +208,16 @@ public:
future<request_return_type> describe_continuous_backups(client_state& client_state, service_permit permit, rjson::value request);
future<> start();
future<> stop() { return make_ready_future<>(); }
future<> stop() {
// disconnect from the value source, but keep the value unchanged.
s_default_timeout_in_ms = utils::updateable_value<uint32_t>{s_default_timeout_in_ms()};
return make_ready_future<>();
}
static sstring table_name(const schema&);
static db::timeout_clock::time_point default_timeout();
static void set_default_timeout(db::timeout_clock::duration timeout);
private:
static db::timeout_clock::duration s_default_timeout;
static thread_local utils::updateable_value<uint32_t> s_default_timeout_in_ms;
public:
static schema_ptr find_table(service::storage_proxy&, const rjson::value& request);
@@ -222,14 +234,14 @@ public:
const query::result&,
const std::optional<attrs_to_get>&);
static std::vector<rjson::value> describe_multi_item(schema_ptr schema,
const query::partition_slice& slice,
const cql3::selection::selection& selection,
const query::result& query_result,
const std::optional<attrs_to_get>& attrs_to_get);
static future<std::vector<rjson::value>> describe_multi_item(schema_ptr schema,
const query::partition_slice&& slice,
shared_ptr<cql3::selection::selection> selection,
foreign_ptr<lw_shared_ptr<query::result>> query_result,
shared_ptr<const std::optional<attrs_to_get>> attrs_to_get);
static void describe_single_item(const cql3::selection::selection&,
const std::vector<bytes_opt>&,
const std::vector<managed_bytes_opt>&,
const std::optional<attrs_to_get>&,
rjson::value&,
bool = false);

View File

@@ -50,6 +50,115 @@ type_representation represent_type(alternator_type atype) {
return it->second;
}
// Get the magnitude and precision of a big_decimal - as these concepts are
// defined by DynamoDB - to allow us to enforce limits on those as explained
// in issue #6794. The "magnitude" of 9e123 is 123 and of -9e-123 is -123,
// the "precision" of 12.34e56 is the number of significant digits - 4.
//
// Unfortunately it turned out to be quite difficult to take a big_decimal and
// calculate its magnitude and precision from its scale() and unscaled_value().
// So in the following ugly implementation we calculate them from the string
// representation instead. We assume the number was already parsed
// successfully to a big_decimal so it follows its syntax rules.
//
// FIXME: rewrite this function to take a big_decimal, not a string.
// Maybe a snippet like this can help:
// boost::multiprecision::cpp_int digits = boost::multiprecision::log10(num.unscaled_value().convert_to<boost::multiprecision::mpf_float_50>()).convert_to<boost::multiprecision::cpp_int>() + 1;
internal::magnitude_and_precision internal::get_magnitude_and_precision(std::string_view s) {
size_t e_or_end = s.find_first_of("eE");
std::string_view base = s.substr(0, e_or_end);
if (s[0]=='-' || s[0]=='+') {
base = base.substr(1);
}
int magnitude = 0;
int precision = 0;
size_t dot_or_end = base.find_first_of(".");
size_t nonzero = base.find_first_not_of("0");
if (dot_or_end != std::string_view::npos) {
if (nonzero == dot_or_end) {
// 0.000031 => magnitude = -5 (like 3.1e-5), precision = 2.
std::string_view fraction = base.substr(dot_or_end + 1);
size_t nonzero2 = fraction.find_first_not_of("0");
if (nonzero2 != std::string_view::npos) {
magnitude = -nonzero2 - 1;
precision = fraction.size() - nonzero2;
}
} else {
// 000123.45678 => magnitude = 2, precision = 8.
magnitude = dot_or_end - nonzero - 1;
precision = base.size() - nonzero - 1;
}
// trailing zeros don't count to precision, e.g., precision
// of 1000.0, 1.0 or 1.0000 are just 1.
size_t last_significant = base.find_last_not_of(".0");
if (last_significant == std::string_view::npos) {
precision = 0;
} else if (last_significant < dot_or_end) {
// e.g., 1000.00 reduce 5 = 7 - (0+1) - 1 from precision
precision -= base.size() - last_significant - 2;
} else {
// e.g., 1235.60 reduce 5 = 7 - (5+1) from precision
precision -= base.size() - last_significant - 1;
}
} else if (nonzero == std::string_view::npos) {
// all-zero integer 000000
magnitude = 0;
precision = 0;
} else {
magnitude = base.size() - 1 - nonzero;
precision = base.size() - nonzero;
// trailing zeros don't count to precision, e.g., precision
// of 1000 is just 1.
size_t last_significant = base.find_last_not_of("0");
if (last_significant == std::string_view::npos) {
precision = 0;
} else {
// e.g., 1000 reduce 3 = 4 - (0+1)
precision -= base.size() - last_significant - 1;
}
}
if (precision && e_or_end != std::string_view::npos) {
std::string_view exponent = s.substr(e_or_end + 1);
if (exponent.size() > 4) {
// don't even bother atoi(), exponent is too large
magnitude = exponent[0]=='-' ? -9999 : 9999;
} else {
try {
magnitude += boost::lexical_cast<int32_t>(exponent);
} catch (...) {
magnitude = 9999;
}
}
}
return magnitude_and_precision {magnitude, precision};
}
// Parse a number read from user input, validating that it has a valid
// numeric format and also in the allowed magnitude and precision ranges
// (see issue #6794). Throws an api_error::validation if the validation
// failed.
static big_decimal parse_and_validate_number(std::string_view s) {
try {
big_decimal ret(s);
auto [magnitude, precision] = internal::get_magnitude_and_precision(s);
if (magnitude > 125) {
throw api_error::validation(format("Number overflow: {}. Attempting to store a number with magnitude larger than supported range.", s));
}
if (magnitude < -130) {
throw api_error::validation(format("Number underflow: {}. Attempting to store a number with magnitude lower than supported range.", s));
}
if (precision > 38) {
throw api_error::validation(format("Number too precise: {}. Attempting to store a number with more significant digits than supported.", s));
}
return ret;
} catch (const marshal_exception& e) {
throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", s));
}
}
struct from_json_visitor {
const rjson::value& v;
bytes_ostream& bo;
@@ -67,11 +176,7 @@ struct from_json_visitor {
bo.write(boolean_type->decompose(v.GetBool()));
}
void operator()(const decimal_type_impl& t) const {
try {
bo.write(t.from_string(rjson::to_string_view(v)));
} catch (const marshal_exception& e) {
throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", v));
}
bo.write(decimal_type->decompose(parse_and_validate_number(rjson::to_string_view(v))));
}
// default
void operator()(const abstract_type& t) const {
@@ -203,6 +308,8 @@ bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column
// FIXME: it's difficult at this point to get information if value was provided
// in request or comes from the storage, for now we assume it's user's fault.
return *unwrap_bytes(value, true);
} else if (column.type == decimal_type) {
return decimal_type->decompose(parse_and_validate_number(rjson::to_string_view(value)));
} else {
return column.type->from_string(value_view);
}
@@ -295,16 +402,13 @@ big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
if (it->name != "N") {
throw api_error::validation(format("{}: expected number, found type '{}'", diagnostic, it->name));
}
try {
if (!it->value.IsString()) {
// We shouldn't reach here. Callers normally validate their input
// earlier with validate_value().
throw api_error::validation(format("{}: improperly formatted number constant", diagnostic));
}
return big_decimal(rjson::to_string_view(it->value));
} catch (const marshal_exception& e) {
throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", it->value));
if (!it->value.IsString()) {
// We shouldn't reach here. Callers normally validate their input
// earlier with validate_value().
throw api_error::validation(format("{}: improperly formatted number constant", diagnostic));
}
big_decimal ret = parse_and_validate_number(rjson::to_string_view(it->value));
return ret;
}
std::optional<big_decimal> try_unwrap_number(const rjson::value& v) {
@@ -316,8 +420,8 @@ std::optional<big_decimal> try_unwrap_number(const rjson::value& v) {
return std::nullopt;
}
try {
return big_decimal(rjson::to_string_view(it->value));
} catch (const marshal_exception& e) {
return parse_and_validate_number(rjson::to_string_view(it->value));
} catch (api_error&) {
return std::nullopt;
}
}

View File

@@ -94,5 +94,12 @@ std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value&
// Returns a null value if one of the arguments is not actually a list.
rjson::value list_concatenate(const rjson::value& v1, const rjson::value& v2);
namespace internal {
struct magnitude_and_precision {
int magnitude;
int precision;
};
magnitude_and_precision get_magnitude_and_precision(std::string_view);
}
}

View File

@@ -241,7 +241,7 @@ static bool is_expired(const rjson::value& expiration_time, gc_clock::time_point
// understands it is an expiration event - not a user-initiated deletion.
static future<> expire_item(service::storage_proxy& proxy,
const service::query_state& qs,
const std::vector<bytes_opt>& row,
const std::vector<managed_bytes_opt>& row,
schema_ptr schema,
api::timestamp_type ts) {
// Prepare the row key to delete
@@ -260,7 +260,7 @@ static future<> expire_item(service::storage_proxy& proxy,
// FIXME: log or increment a metric if this happens.
return make_ready_future<>();
}
exploded_pk.push_back(*row_c);
exploded_pk.push_back(to_bytes(*row_c));
}
auto pk = partition_key::from_exploded(exploded_pk);
mutation m(schema, pk);
@@ -280,7 +280,7 @@ static future<> expire_item(service::storage_proxy& proxy,
// FIXME: log or increment a metric if this happens.
return make_ready_future<>();
}
exploded_ck.push_back(*row_c);
exploded_ck.push_back(to_bytes(*row_c));
}
auto ck = clustering_key::from_exploded(exploded_ck);
m.partition().clustered_row(*schema, ck).apply(tombstone(ts, gc_clock::now()));
@@ -387,7 +387,7 @@ class token_ranges_owned_by_this_shard {
class ranges_holder_primary {
const dht::token_range_vector _token_ranges;
public:
ranges_holder_primary(const locator::effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)
ranges_holder_primary(const locator::vnode_effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)
: _token_ranges(erm->get_primary_ranges(ep)) {}
std::size_t size() const { return _token_ranges.size(); }
const dht::token_range& operator[](std::size_t i) const {
@@ -593,7 +593,7 @@ static future<> scan_table_ranges(
continue;
}
for (const auto& row : rows) {
const bytes_opt& cell = row[*expiration_column];
const managed_bytes_opt& cell = row[*expiration_column];
if (!cell) {
continue;
}

@@ -437,6 +437,68 @@
}
]
},
{
"path":"/column_family/tombstone_gc/{name}",
"operations":[
{
"method":"GET",
"summary":"Check if tombstone GC is enabled for a given table",
"type":"boolean",
"nickname":"get_tombstone_gc",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The table name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
},
{
"method":"POST",
"summary":"Enable tombstone GC for a given table",
"type":"void",
"nickname":"enable_tombstone_gc",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The table name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
},
{
"method":"DELETE",
"summary":"Disable tombstone GC for a given table",
"type":"void",
"nickname":"disable_tombstone_gc",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The table name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/column_family/estimate_keys/{name}",
"operations":[

@@ -2110,6 +2110,65 @@
}
]
},
{
"path":"/storage_service/tombstone_gc/{keyspace}",
"operations":[
{
"method":"POST",
"summary":"Enable tombstone GC",
"type":"void",
"nickname":"enable_tombstone_gc",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated column family names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
},
{
"method":"DELETE",
"summary":"Disable tombstone GC",
"type":"void",
"nickname":"disable_tombstone_gc",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated column family names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/deliver_hints",
"operations":[
@@ -2631,7 +2690,7 @@
"description":"File creation time"
},
"generation":{
"type":"long",
"type":"string",
"description":"SSTable generation"
},
"level":{

@@ -871,6 +871,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/enable_auto_compaction: name={}", req->param["name"]);
return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {
@@ -882,6 +883,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/disable_auto_compaction: name={}", req->param["name"]);
return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {
@@ -892,6 +894,30 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
});
cf::get_tombstone_gc.set(r, [&ctx] (const_req req) {
auto uuid = get_uuid(req.param["name"], ctx.db.local());
replica::table& t = ctx.db.local().find_column_family(uuid);
return t.tombstone_gc_enabled();
});
cf::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/enable_tombstone_gc: name={}", req->param["name"]);
return foreach_column_family(ctx, req->param["name"], [](replica::table& t) {
t.set_tombstone_gc_enabled(true);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
cf::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/disable_tombstone_gc: name={}", req->param["name"]);
return foreach_column_family(ctx, req->param["name"], [](replica::table& t) {
t.set_tombstone_gc_enabled(false);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
cf::get_built_indexes.set(r, [&ctx, &sys_ks](std::unique_ptr<http::request> req) {
auto ks_cf = parse_fully_qualified_cf_name(req->param["name"]);
auto&& ks = std::get<0>(ks_cf);
@@ -955,6 +981,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<http::request> req) {
sstring strategy = req->get_query_param("class_name");
apilog.info("column_family/set_compaction_strategy_class: name={} strategy={}", req->param["name"], strategy);
return foreach_column_family(ctx, req->param["name"], [strategy](replica::column_family& cf) {
cf.set_compaction_strategy(sstables::compaction_strategy::type(strategy));
}).then([] {
@@ -1023,6 +1050,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
fail(unimplemented::cause::API);
}
apilog.info("column_family/force_major_compaction: name={}", req->param["name"]);
auto [ks, cf] = parse_fully_qualified_cf_name(req->param["name"]);
auto keyspace = validate_keyspace(ctx, ks);
std::vector<table_id> table_infos = {ctx.db.local().find_uuid(ks, cf)};

@@ -220,32 +220,47 @@ seastar::future<json::json_return_type> run_toppartitions_query(db::toppartition
});
}
future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
static future<json::json_return_type> set_tables(http_context& ctx, const sstring& keyspace, std::vector<sstring> tables, std::function<future<>(replica::table&)> set) {
if (tables.empty()) {
tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
apilog.info("set_tables_autocompaction: enabled={} keyspace={} tables={}", enabled, keyspace, tables);
return do_with(keyspace, std::move(tables), [&ctx, enabled] (const sstring &keyspace, const std::vector<sstring>& tables) {
return ctx.db.invoke_on(0, [&ctx, &keyspace, &tables, enabled] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return ctx.db.invoke_on_all([&keyspace, &tables, enabled] (replica::database& db) {
return parallel_for_each(tables, [&db, &keyspace, enabled] (const sstring& table) {
replica::column_family& cf = db.find_column_family(keyspace, table);
if (enabled) {
cf.enable_auto_compaction();
} else {
return cf.disable_auto_compaction();
}
return make_ready_future<>();
});
}).finally([g = std::move(g)] {});
return do_with(keyspace, std::move(tables), [&ctx, set] (const sstring& keyspace, const std::vector<sstring>& tables) {
return ctx.db.invoke_on_all([&keyspace, &tables, set] (replica::database& db) {
return parallel_for_each(tables, [&db, &keyspace, set] (const sstring& table) {
replica::table& t = db.find_column_family(keyspace, table);
return set(t);
});
});
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
}
future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
apilog.info("set_tables_autocompaction: enabled={} keyspace={} tables={}", enabled, keyspace, tables);
return ctx.db.invoke_on(0, [&ctx, keyspace, tables = std::move(tables), enabled] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return set_tables(ctx, keyspace, tables, [enabled] (replica::table& cf) {
if (enabled) {
cf.enable_auto_compaction();
} else {
return cf.disable_auto_compaction();
}
return make_ready_future<>();
}).finally([g = std::move(g)] {});
});
}
future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
apilog.info("set_tables_tombstone_gc: enabled={} keyspace={} tables={}", enabled, keyspace, tables);
return set_tables(ctx, keyspace, std::move(tables), [enabled] (replica::table& t) {
t.set_tombstone_gc_enabled(enabled);
return make_ready_future<>();
});
}
void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl) {
ss::start_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
@@ -619,7 +634,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::describe_any_ring.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) {
// Find an arbitrary non-system keyspace.
auto keyspaces = ctx.db.local().get_non_local_strategy_keyspaces();
auto keyspaces = ctx.db.local().get_non_local_vnode_based_strategy_keyspaces();
if (keyspaces.empty()) {
throw std::runtime_error("No keyspace provided and no non-system keyspace exists");
}
@@ -1111,6 +1126,22 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
return set_tables_autocompaction(ctx, keyspace, tables, false);
});
ss::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_tombstone_gc: keyspace={} tables={}", keyspace, tables);
return set_tables_tombstone_gc(ctx, keyspace, tables, true);
});
ss::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_tombstone_gc: keyspace={} tables={}", keyspace, tables);
return set_tables_tombstone_gc(ctx, keyspace, tables, false);
});
ss::deliver_hints.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
@@ -1257,7 +1288,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::sstable info;
info.timestamp = t;
info.generation = sstables::generation_value(sstable->generation());
info.generation = fmt::to_string(sstable->generation());
info.level = sstable->get_sstable_level();
info.size = sstable->bytes_on_disk();
info.data_size = sstable->ondisk_data_size();
@@ -1494,27 +1525,12 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
throw httpd::bad_param_exception(fmt::format("Unknown argument for 'quarantine_mode' parameter: {}", quarantine_mode_str));
}
const auto& reduce_compaction_stats = [] (const compaction_manager::compaction_stats_opt& lhs, const compaction_manager::compaction_stats_opt& rhs) {
sstables::compaction_stats stats{};
stats += lhs.value();
stats += rhs.value();
return stats;
};
sstables::compaction_stats stats;
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<scrub_sstables_compaction_task_impl>({}, std::move(keyspace), db, column_families, opts, stats);
try {
auto opt_stats = co_await db.map_reduce0([&] (replica::database& db) {
return map_reduce(column_families, [&] (sstring cfname) -> future<std::optional<sstables::compaction_stats>> {
auto& cm = db.get_compaction_manager();
auto& cf = db.find_column_family(keyspace, cfname);
sstables::compaction_stats stats{};
co_await cf.parallel_foreach_table_state([&] (compaction::table_state& ts) mutable -> future<> {
auto r = co_await cm.perform_sstable_scrub(ts, opts);
stats += r.value_or(sstables::compaction_stats{});
});
co_return stats;
}, std::make_optional(sstables::compaction_stats{}), reduce_compaction_stats);
}, std::make_optional(sstables::compaction_stats{}), reduce_compaction_stats);
if (opt_stats && opt_stats->validation_errors) {
co_await task->done();
if (stats.validation_errors) {
co_return json::json_return_type(static_cast<int>(scrub_status::validation_errors));
}
} catch (const sstables::compaction_aborted_exception&) {
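
The `set_tables` refactoring earlier in this file's diff factors the "apply a setter to each named table, or to all tables when none are named" loop out of `set_tables_autocompaction`, so that `set_tables_tombstone_gc` can reuse it. The following is a minimal synchronous sketch of that shape, not the actual Seastar code: the `table` struct, the `keyspace_tables` map, and the plain `void` setter are hypothetical stand-ins for `replica::table`, the database lookup, and the future-returning callback.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for replica::table.
struct table {
    bool auto_compaction = true;
    bool tombstone_gc = true;
};

// Hypothetical stand-in for one keyspace: table name -> table.
using keyspace_tables = std::map<std::string, table>;

// Apply `set` to every named table. An empty name list means "all tables",
// mirroring the tables.empty() branch in the diff.
void set_tables(keyspace_tables& ks, std::vector<std::string> names,
                const std::function<void(table&)>& set) {
    if (names.empty()) {
        for (const auto& entry : ks) {
            names.push_back(entry.first);
        }
    }
    for (const auto& name : names) {
        set(ks.at(name)); // throws std::out_of_range for an unknown table
    }
}

// Each toggle only supplies the per-table action.
void set_tables_tombstone_gc(keyspace_tables& ks, std::vector<std::string> names, bool enabled) {
    set_tables(ks, std::move(names), [enabled] (table& t) { t.tombstone_gc = enabled; });
}

void set_tables_autocompaction(keyspace_tables& ks, std::vector<std::string> names, bool enabled) {
    set_tables(ks, std::move(names), [enabled] (table& t) { t.auto_compaction = enabled; });
}
```

In the real change the setter returns `future<>` and the loop runs on every shard via `invoke_on_all`; the sketch keeps only the division of labour between the generic helper and the two toggles.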

@@ -35,16 +35,9 @@ public:
///
authenticated_user() = default;
explicit authenticated_user(std::string_view name);
friend bool operator==(const authenticated_user&, const authenticated_user&) noexcept = default;
};
inline bool operator==(const authenticated_user& u1, const authenticated_user& u2) noexcept {
return u1.name == u2.name;
}
inline bool operator!=(const authenticated_user& u1, const authenticated_user& u2) noexcept {
return !(u1 == u2);
}
const authenticated_user& anonymous_user() noexcept;
inline bool is_anonymous(const authenticated_user& u) noexcept {

@@ -39,10 +39,6 @@ inline bool operator==(const permission_details& pd1, const permission_details&
== std::forward_as_tuple(pd2.role_name, pd2.resource, pd2.permissions.mask());
}
inline bool operator!=(const permission_details& pd1, const permission_details& pd2) {
return !(pd1 == pd2);
}
inline bool operator<(const permission_details& pd1, const permission_details& pd2) {
return std::forward_as_tuple(pd1.role_name, pd1.resource, pd1.permissions)
< std::forward_as_tuple(pd2.role_name, pd2.resource, pd2.permissions);

@@ -79,6 +79,13 @@ static permission_set applicable_permissions(const service_level_resource_view &
}
static permission_set applicable_permissions(const functions_resource_view& fv) {
if (fv.function_name() || fv.function_signature()) {
return permission_set::of<
permission::ALTER,
permission::DROP,
permission::AUTHORIZE,
permission::EXECUTE>();
}
return permission_set::of<
permission::CREATE,
permission::ALTER,
@@ -292,7 +299,7 @@ std::optional<std::vector<std::string_view>> functions_resource_view::function_a
std::vector<std::string_view> parts;
if (_resource._parts[3] == "") {
return {};
return parts;
}
for (size_t i = 3; i < _resource._parts.size(); i++) {
parts.push_back(_resource._parts[i]);

@@ -117,20 +117,12 @@ private:
friend class functions_resource_view;
friend bool operator<(const resource&, const resource&);
friend bool operator==(const resource&, const resource&);
friend bool operator==(const resource&, const resource&) = default;
friend resource parse_resource(std::string_view);
};
bool operator<(const resource&, const resource&);
inline bool operator==(const resource& r1, const resource& r2) {
return (r1._kind == r2._kind) && (r1._parts == r2._parts);
}
inline bool operator!=(const resource& r1, const resource& r2) {
return !(r1 == r2);
}
std::ostream& operator<<(std::ostream&, const resource&);
class resource_kind_mismatch : public std::invalid_argument {

@@ -17,10 +17,6 @@ std::ostream& operator<<(std::ostream& os, const role_or_anonymous& mr) {
return os;
}
bool operator==(const role_or_anonymous& mr1, const role_or_anonymous& mr2) noexcept {
return mr1.name == mr2.name;
}
bool is_anonymous(const role_or_anonymous& mr) noexcept {
return !mr.name.has_value();
}

@@ -26,16 +26,11 @@ public:
role_or_anonymous() = default;
role_or_anonymous(std::string_view name) : name(name) {
}
friend bool operator==(const role_or_anonymous&, const role_or_anonymous&) noexcept = default;
};
std::ostream& operator<<(std::ostream&, const role_or_anonymous&);
bool operator==(const role_or_anonymous&, const role_or_anonymous&) noexcept;
inline bool operator!=(const role_or_anonymous& mr1, const role_or_anonymous& mr2) noexcept {
return !(mr1 == mr2);
}
bool is_anonymous(const role_or_anonymous&) noexcept;
}

@@ -55,6 +55,7 @@ future<bool> default_role_row_satisfies(
return qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{meta::DEFAULT_SUPERUSER_NAME},
cql3::query_processor::cache_internal::yes).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {

@@ -7,6 +7,7 @@
*/
#include <seastar/core/coroutine.hh>
#include "auth/resource.hh"
#include "auth/service.hh"
#include <algorithm>
@@ -20,6 +21,7 @@
#include "auth/allow_all_authorizer.hh"
#include "auth/common.hh"
#include "auth/role_or_anonymous.hh"
#include "cql3/functions/function_name.hh"
#include "cql3/functions/functions.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
@@ -66,6 +68,7 @@ private:
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
void on_update_tablet_metadata() override {}
void on_drop_keyspace(const sstring& ks_name) override {
// Do it in the background.
@@ -75,6 +78,12 @@ private:
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped keyspace: {}", e);
});
(void)_authorizer.revoke_all(
auth::make_functions_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on functions in dropped keyspace: {}", e);
});
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
@@ -89,8 +98,22 @@ private:
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {
(void)_authorizer.revoke_all(
auth::make_functions_resource(ks_name, function_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped function: {}", e);
});
}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {
(void)_authorizer.revoke_all(
auth::make_functions_resource(ks_name, aggregate_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped aggregate: {}", e);
});
}
void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}
};

@@ -17,7 +17,7 @@
#include <functional>
#include <compare>
#include "utils/mutable_view.hh"
#include <xxhash.h>
#include "utils/simple_hashers.hh"
using bytes = basic_sstring<int8_t, uint32_t, 31, false>;
using bytes_view = std::basic_string_view<int8_t>;
@@ -160,18 +160,7 @@ struct appending_hash<bytes_view> {
}
};
struct bytes_view_hasher : public hasher {
XXH64_state_t _state;
bytes_view_hasher(uint64_t seed = 0) noexcept {
XXH64_reset(&_state, seed);
}
void update(const char* ptr, size_t length) noexcept {
XXH64_update(&_state, ptr, length);
}
size_t finalize() {
return static_cast<size_t>(XXH64_digest(&_state));
}
};
using bytes_view_hasher = simple_xx_hasher;
namespace std {
template <>

@@ -53,6 +53,10 @@ public:
using difference_type = std::ptrdiff_t;
using pointer = bytes_view*;
using reference = bytes_view&;
struct implementation {
blob_storage* current_chunk;
};
private:
chunk* _current = nullptr;
public:
@@ -75,11 +79,11 @@ public:
++(*this);
return tmp;
}
bool operator==(const fragment_iterator& other) const {
return _current == other._current;
}
bool operator!=(const fragment_iterator& other) const {
return _current != other._current;
bool operator==(const fragment_iterator&) const = default;
implementation extract_implementation() const {
return implementation {
.current_chunk = _current,
};
}
};
using const_iterator = fragment_iterator;
@@ -432,10 +436,6 @@ public:
return true;
}
bool operator!=(const bytes_ostream& other) const {
return !(*this == other);
}
// Makes this instance empty.
//
// The first buffer is not deallocated, so callers may rely on the

@@ -68,7 +68,6 @@ public:
_pos = -1;
}
bool operator==(const iterator& o) const { return _pos == o._pos; }
bool operator!=(const iterator& o) const { return _pos != o._pos; }
};
public:
cartesian_product(const std::vector<std::vector<T>>& vec_of_vecs) : _vec_of_vecs(vec_of_vecs) {}

@@ -65,7 +65,6 @@ public:
void ttl(int v) { _ttl = v; }
bool operator==(const options& o) const;
bool operator!=(const options& o) const;
};
} // namespace cdc

@@ -1090,19 +1090,8 @@ shared_ptr<db::system_distributed_keyspace> generation_service::get_sys_dist_ks(
return _sys_dist_ks.local_shared();
}
std::ostream& operator<<(std::ostream& os, const generation_id& gen_id) {
std::visit(make_visitor(
[&os] (const generation_id_v1& id) { os << id.ts; },
[&os] (const generation_id_v2& id) { os << "(" << id.ts << ", " << id.id << ")"; }
), gen_id);
return os;
}
db_clock::time_point get_ts(const generation_id& gen_id) {
return std::visit(make_visitor(
[] (const generation_id_v1& id) { return id.ts; },
[] (const generation_id_v2& id) { return id.ts; }
), gen_id);
return std::visit([] (auto& id) { return id.ts; }, gen_id);
}
} // namespace cdc
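
The `get_ts` change above collapses two per-alternative lambdas into one generic lambda, which works because every alternative of the variant exposes a `ts` member of the same type. A minimal standalone sketch of that pattern, using only the standard library; `gen_v1`, `gen_v2`, `gen_id` and `time_point` are hypothetical stand-ins for the `cdc::generation_id` types and `db_clock::time_point`:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <variant>

using time_point = std::chrono::system_clock::time_point;

// Hypothetical stand-ins for the two generation ID versions.
struct gen_v1 { time_point ts; };
struct gen_v2 { time_point ts; std::uint64_t id; };
using gen_id = std::variant<gen_v1, gen_v2>;

// One generic lambda handles every alternative, provided each has a `ts`
// member and all instantiations return the same type.
inline time_point get_ts(const gen_id& g) {
    return std::visit([] (const auto& id) { return id.ts; }, g);
}
```

If an alternative without `ts` were added to the variant, this would fail to compile rather than silently skip it, which is the same safety the explicit visitor gave.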

@@ -28,7 +28,35 @@ struct generation_id_v2 {
using generation_id = std::variant<generation_id_v1, generation_id_v2>;
std::ostream& operator<<(std::ostream&, const generation_id&);
db_clock::time_point get_ts(const generation_id&);
} // namespace cdc
template <>
struct fmt::formatter<cdc::generation_id_v1> {
constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
template <typename FormatContext>
auto format(const cdc::generation_id_v1& gen_id, FormatContext& ctx) const {
return fmt::format_to(ctx.out(), "{}", gen_id.ts);
}
};
template <>
struct fmt::formatter<cdc::generation_id_v2> {
constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
template <typename FormatContext>
auto format(const cdc::generation_id_v2& gen_id, FormatContext& ctx) const {
return fmt::format_to(ctx.out(), "({}, {})", gen_id.ts, gen_id.id);
}
};
template <>
struct fmt::formatter<cdc::generation_id> {
constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
template <typename FormatContext>
auto format(const cdc::generation_id& gen_id, FormatContext& ctx) const {
return std::visit([&ctx] (auto& id) {
return fmt::format_to(ctx.out(), "{}", id);
}, gen_id);
}
};

@@ -395,9 +395,6 @@ bool cdc::options::operator==(const options& o) const {
return enabled() == o.enabled() && _preimage == o._preimage && _postimage == o._postimage && _ttl == o._ttl
&& _delta_mode == o._delta_mode;
}
bool cdc::options::operator!=(const options& o) const {
return !(*this == o);
}
namespace cdc {
@@ -635,9 +632,6 @@ public:
bool operator==(const collection_iterator& x) const {
return _v == x._v;
}
bool operator!=(const collection_iterator& x) const {
return !(*this == x);
}
private:
void next() {
--_rem;

@@ -389,7 +389,7 @@ struct extract_changes_visitor {
}
void partition_delete(const tombstone& t) {
_result[t.timestamp].partition_deletions = {t};
_result[t.timestamp].partition_deletions = partition_deletion{t};
}
constexpr bool finished() const { return false; }

@@ -93,9 +93,6 @@ public:
bool operator==(const iterator& other) const {
return _position == other._position;
}
bool operator!=(const iterator& other) const {
return !(*this == other);
}
};
public:
explicit partition_cells_range(const mutation_partition& mp) : _mp(mp) { }

@@ -15,12 +15,6 @@
std::atomic<int64_t> clocks_offset;
std::ostream& operator<<(std::ostream& os, db_clock::time_point tp) {
auto t = db_clock::to_time_t(tp);
::tm t_buf;
return os << std::put_time(::gmtime_r(&t, &t_buf), "%Y/%m/%d %T");
}
std::string format_timestamp(api::timestamp_type ts) {
auto t = std::time_t(std::chrono::duration_cast<std::chrono::seconds>(api::timestamp_clock::duration(ts)).count());
::tm t_buf;

@@ -75,8 +75,7 @@ public:
const interval::interval_type& iv = *_i;
return position_range{iv.lower().position(), iv.upper().position()};
}
bool operator==(const position_range_iterator& other) const { return _i == other._i; }
bool operator!=(const position_range_iterator& other) const { return _i != other._i; }
bool operator==(const position_range_iterator& other) const = default;
position_range_iterator& operator++() {
++_i;
return *this;

@@ -1,9 +1,7 @@
set(disabled_warnings
c++11-narrowing
mismatched-tags
missing-braces
overloaded-virtual
parentheses-equality
unsupported-friend)
include(CheckCXXCompilerFlag)
foreach(warning ${disabled_warnings})
@@ -13,7 +11,11 @@ foreach(warning ${disabled_warnings})
endif()
endforeach()
list(TRANSFORM _supported_warnings PREPEND "-Wno-")
string(JOIN " " CMAKE_CXX_FLAGS "-Wall" "-Werror" ${_supported_warnings})
string(JOIN " " CMAKE_CXX_FLAGS
"-Wall"
"-Werror"
"-Wno-error=deprecated-declarations"
${_supported_warnings})
function(default_target_arch arch)
set(x86_instruction_sets i386 i686 x86_64)

@@ -168,7 +168,11 @@ std::ostream& operator<<(std::ostream& os, pretty_printed_throughput tp) {
}
static api::timestamp_type get_max_purgeable_timestamp(const table_state& table_s, sstable_set::incremental_selector& selector,
const std::unordered_set<shared_sstable>& compacting_set, const dht::decorated_key& dk) {
const std::unordered_set<shared_sstable>& compacting_set, const dht::decorated_key& dk, uint64_t& bloom_filter_checks) {
if (!table_s.tombstone_gc_enabled()) [[unlikely]] {
return api::min_timestamp;
}
auto timestamp = table_s.min_memtable_timestamp();
std::optional<utils::hashed_key> hk;
for (auto&& sst : boost::range::join(selector.select(dk).sstables, table_s.compacted_undeleted_sstables())) {
@@ -179,6 +183,7 @@ static api::timestamp_type get_max_purgeable_timestamp(const table_state& table_
hk = sstables::sstable::make_hashed_key(*table_s.schema(), dk.key());
}
if (sst->filter_has_key(*hk)) {
bloom_filter_checks++;
timestamp = std::min(timestamp, sst->get_stats_metadata().min_timestamp);
}
}
@@ -414,9 +419,12 @@ private:
class formatted_sstables_list {
bool _include_origin = true;
std::vector<sstring> _ssts;
std::vector<std::string> _ssts;
public:
formatted_sstables_list() = default;
void reserve(size_t n) {
_ssts.reserve(n);
}
explicit formatted_sstables_list(const std::vector<shared_sstable>& ssts, bool include_origin) : _include_origin(include_origin) {
_ssts.reserve(ssts.size());
for (const auto& sst : ssts) {
@@ -431,9 +439,7 @@ public:
};
std::ostream& operator<<(std::ostream& os, const formatted_sstables_list& lst) {
os << "[";
os << boost::algorithm::join(lst._ssts, ",");
os << "]";
fmt::print(os, "[{}]", fmt::join(lst._ssts, ","));
return os;
}
@@ -458,6 +464,7 @@ protected:
uint64_t _start_size = 0;
uint64_t _end_size = 0;
uint64_t _estimated_partitions = 0;
uint64_t _bloom_filter_checks = 0;
db::replay_position _rp;
encoding_stats_collector _stats_collector;
bool _can_split_large_partition = false;
@@ -571,7 +578,7 @@ protected:
// Tombstone expiration is enabled based on the presence of sstable set.
// If it's not present, we cannot purge tombstones without the risk of resurrecting data.
bool tombstone_expiration_enabled() const {
return bool(_sstable_set);
return bool(_sstable_set) && _table_s.tombstone_gc_enabled();
}
compaction_writer create_gc_compaction_writer() const {
@@ -625,11 +632,6 @@ protected:
flat_mutation_reader_v2::filter make_partition_filter() const {
return [this] (const dht::decorated_key& dk) {
#ifdef SEASTAR_DEBUG
// sstables should never be shared with other shards at this point.
assert(dht::shard_of(*_schema, dk.token()) == this_shard_id());
#endif
if (!_owned_ranges_checker->belongs_to_current_node(dk.token())) {
log_trace("Token {} does not belong to this node, skipping", dk.token());
return false;
@@ -668,6 +670,7 @@ private:
future<> setup() {
auto ssts = make_lw_shared<sstables::sstable_set>(make_sstable_set_for_input());
formatted_sstables_list formatted_msg;
formatted_msg.reserve(_sstables.size());
auto fully_expired = _table_s.fully_expired_sstables(_sstables, gc_clock::now());
min_max_tracker<api::timestamp_type> timestamp_tracker;
@@ -784,6 +787,7 @@ protected:
.ended_at = ended_at,
.start_size = _start_size,
.end_size = _end_size,
.bloom_filter_checks = _bloom_filter_checks,
},
};
@@ -824,7 +828,7 @@ private:
};
}
return [this] (const dht::decorated_key& dk) {
return get_max_purgeable_timestamp(_table_s, *_selector, _compacting_for_max_purgeable_func, dk);
return get_max_purgeable_timestamp(_table_s, *_selector, _compacting_for_max_purgeable_func, dk, _bloom_filter_checks);
};
}
@@ -1241,62 +1245,8 @@ public:
class scrub_compaction final : public regular_compaction {
public:
static void report_invalid_partition(compaction_type type, mutation_fragment_stream_validator& validator, const dht::decorated_key& new_key,
std::string_view action = "") {
const auto& schema = validator.schema();
const auto& current_key = validator.previous_partition_key();
clogger.error("[{} compaction {}.{}] Invalid partition {} ({}), partition is out-of-order compared to previous partition {} ({}){}{}",
type,
schema.ks_name(),
schema.cf_name(),
new_key.key().with_schema(schema),
new_key,
current_key.key().with_schema(schema),
current_key,
action.empty() ? "" : "; ",
action);
}
static void report_invalid_partition_start(compaction_type type, mutation_fragment_stream_validator& validator, const dht::decorated_key& new_key,
std::string_view action = "") {
const auto& schema = validator.schema();
const auto& current_key = validator.previous_partition_key();
clogger.error("[{} compaction {}.{}] Invalid partition start for partition {} ({}), previous partition {} ({}) didn't end with a partition-end fragment{}{}",
type,
schema.ks_name(),
schema.cf_name(),
new_key.key().with_schema(schema),
new_key,
current_key.key().with_schema(schema),
current_key,
action.empty() ? "" : "; ",
action);
}
static void report_invalid_mutation_fragment(compaction_type type, mutation_fragment_stream_validator& validator, const mutation_fragment_v2& mf,
std::string_view action = "") {
const auto& schema = validator.schema();
const auto& key = validator.previous_partition_key();
const auto prev_pos = validator.previous_position();
clogger.error("[{} compaction {}.{}] Invalid {} fragment{} ({}) in partition {} ({}),"
" fragment is out-of-order compared to previous {} fragment{} ({}){}{}",
type,
schema.ks_name(),
schema.cf_name(),
mf.mutation_fragment_kind(),
mf.has_key() ? format(" with key {}", mf.key().with_schema(schema)) : "",
mf.position(),
key.key().with_schema(schema),
key,
prev_pos.region(),
prev_pos.has_key() ? format(" with key {}", prev_pos.key().with_schema(schema)) : "",
prev_pos,
action.empty() ? "" : "; ",
action);
}
static void report_invalid_end_of_stream(compaction_type type, mutation_fragment_stream_validator& validator, std::string_view action = "") {
const auto& schema = validator.schema();
const auto& key = validator.previous_partition_key();
clogger.error("[{} compaction {}.{}] Invalid end-of-stream, last partition {} ({}) didn't end with a partition-end fragment{}{}",
type, schema.ks_name(), schema.cf_name(), key.key().with_schema(schema), key, action.empty() ? "" : "; ", action);
static void report_validation_error(compaction_type type, const ::schema& schema, sstring what, std::string_view action = "") {
clogger.error("[{} compaction {}.{}] {}{}{}", type, schema.ks_name(), schema.cf_name(), what, action.empty() ? "" : "; ", action);
}
private:
@@ -1319,9 +1269,9 @@ private:
++_validation_errors;
}
void on_unexpected_partition_start(const mutation_fragment_v2& ps) {
auto report_fn = [this, &ps] (std::string_view action = "") {
report_invalid_partition_start(compaction_type::Scrub, _validator, ps.as_partition_start().key(), action);
void on_unexpected_partition_start(const mutation_fragment_v2& ps, sstring error) {
auto report_fn = [this, error] (std::string_view action = "") {
report_validation_error(compaction_type::Scrub, *_schema, error, action);
};
maybe_abort_scrub(report_fn);
report_fn("Rectifying by adding assumed missing partition-end");
@@ -1343,9 +1293,9 @@ private:
}
}
skip on_invalid_partition(const dht::decorated_key& new_key) {
auto report_fn = [this, &new_key] (std::string_view action = "") {
report_invalid_partition(compaction_type::Scrub, _validator, new_key, action);
skip on_invalid_partition(const dht::decorated_key& new_key, sstring error) {
auto report_fn = [this, error] (std::string_view action = "") {
report_validation_error(compaction_type::Scrub, *_schema, error, action);
};
maybe_abort_scrub(report_fn);
if (_scrub_mode == compaction_type_options::scrub::mode::segregate) {
@@ -1359,9 +1309,9 @@ private:
return skip::yes;
}
skip on_invalid_mutation_fragment(const mutation_fragment_v2& mf) {
auto report_fn = [this, &mf] (std::string_view action = "") {
report_invalid_mutation_fragment(compaction_type::Scrub, _validator, mf, "");
skip on_invalid_mutation_fragment(const mutation_fragment_v2& mf, sstring error) {
auto report_fn = [this, error] (std::string_view action = "") {
report_validation_error(compaction_type::Scrub, *_schema, error, action);
};
maybe_abort_scrub(report_fn);
@@ -1396,9 +1346,9 @@ private:
return skip::yes;
}
void on_invalid_end_of_stream() {
auto report_fn = [this] (std::string_view action = "") {
report_invalid_end_of_stream(compaction_type::Scrub, _validator, action);
void on_invalid_end_of_stream(sstring error) {
auto report_fn = [this, error] (std::string_view action = "") {
report_validation_error(compaction_type::Scrub, *_schema, error, action);
};
maybe_abort_scrub(report_fn);
// Handle missing partition_end
@@ -1417,21 +1367,27 @@ private:
// and shouldn't be verified. We know the last fragment the
// validator saw is a partition-start, passing it another one
// will confuse it.
if (!_skip_to_next_partition && !_validator(mf)) {
on_unexpected_partition_start(mf);
if (!_skip_to_next_partition) {
if (auto res = _validator(mf); !res) {
on_unexpected_partition_start(mf, res.what());
}
// Continue processing this partition start.
}
_skip_to_next_partition = false;
// Then check that the partition monotonicity stands.
const auto& dk = mf.as_partition_start().key();
if (!_validator(dk) && on_invalid_partition(dk) == skip::yes) {
continue;
if (auto res = _validator(dk); !res) {
if (on_invalid_partition(dk, res.what()) == skip::yes) {
continue;
}
}
} else if (_skip_to_next_partition) {
continue;
} else {
if (!_validator(mf) && on_invalid_mutation_fragment(mf) == skip::yes) {
continue;
if (auto res = _validator(mf); !res) {
if (on_invalid_mutation_fragment(mf, res.what()) == skip::yes) {
continue;
}
}
}
push_mutation_fragment(std::move(mf));
@@ -1440,8 +1396,8 @@ private:
_end_of_stream = _reader.is_end_of_stream() && _reader.is_buffer_empty();
if (_end_of_stream) {
if (!_validator.on_end_of_stream()) {
on_invalid_end_of_stream();
if (auto res = _validator.on_end_of_stream(); !res) {
on_invalid_end_of_stream(res.what());
}
}
}
@@ -1722,81 +1678,29 @@ static std::unique_ptr<compaction> make_compaction(table_state& table_s, sstable
return descriptor.options.visit(visitor_factory);
}
future<uint64_t> scrub_validate_mode_validate_reader(flat_mutation_reader_v2 reader, const compaction_data& cdata) {
auto schema = reader.schema();
uint64_t errors = 0;
std::exception_ptr ex;
try {
auto validator = mutation_fragment_stream_validator(*schema);
while (auto mf_opt = co_await reader()) {
if (cdata.is_stop_requested()) [[unlikely]] {
// Compaction manager will catch this exception and re-schedule the compaction.
throw compaction_stopped_exception(schema->ks_name(), schema->cf_name(), cdata.stop_requested);
}
const auto& mf = *mf_opt;
if (mf.is_partition_start()) {
const auto& ps = mf.as_partition_start();
if (!validator(mf)) {
scrub_compaction::report_invalid_partition_start(compaction_type::Scrub, validator, ps.key());
validator.reset(mf);
++errors;
}
if (!validator(ps.key())) {
scrub_compaction::report_invalid_partition(compaction_type::Scrub, validator, ps.key());
validator.reset(ps.key());
++errors;
}
} else {
if (!validator(mf)) {
scrub_compaction::report_invalid_mutation_fragment(compaction_type::Scrub, validator, mf);
validator.reset(mf);
++errors;
}
}
}
if (!validator.on_end_of_stream()) {
scrub_compaction::report_invalid_end_of_stream(compaction_type::Scrub, validator);
++errors;
}
} catch (...) {
ex = std::current_exception();
}
co_await reader.close();
if (ex) {
co_return coroutine::exception(std::move(ex));
}
co_return errors;
}
static future<compaction_result> scrub_sstables_validate_mode(sstables::compaction_descriptor descriptor, compaction_data& cdata, table_state& table_s) {
auto schema = table_s.schema();
auto permit = table_s.make_compaction_reader_permit();
uint64_t validation_errors = 0;
formatted_sstables_list sstables_list_msg;
auto sstables = make_lw_shared<sstables::sstable_set>(sstables::make_partitioned_sstable_set(schema, false));
for (const auto& sst : descriptor.sstables) {
sstables_list_msg += sst;
sstables->insert(sst);
clogger.info("Scrubbing in validate mode {}", sst->get_filename());
validation_errors += co_await sst->validate(permit, descriptor.io_priority, cdata.abort, [&schema] (sstring what) {
scrub_compaction::report_validation_error(compaction_type::Scrub, *schema, what);
});
// Did validation finish early because it was aborted?
if (cdata.is_stop_requested()) {
// Compaction manager will catch this exception and re-schedule the compaction.
throw compaction_stopped_exception(schema->ks_name(), schema->cf_name(), cdata.stop_requested);
}
clogger.info("Finished scrubbing in validate mode {} - sstable is {}", sst->get_filename(), validation_errors == 0 ? "valid" : "invalid");
}
clogger.info("Scrubbing in validate mode {}", sstables_list_msg);
auto permit = table_s.make_compaction_reader_permit();
auto reader = sstables->make_crawling_reader(schema, permit, descriptor.io_priority, nullptr);
const auto validation_errors = co_await scrub_validate_mode_validate_reader(std::move(reader), cdata);
clogger.info("Finished scrubbing in validate mode {} - sstable(s) are {}", sstables_list_msg, validation_errors == 0 ? "valid" : "invalid");
if (validation_errors != 0) {
for (auto& sst : *sstables->all()) {
for (auto& sst : descriptor.sstables) {
co_await sst->change_state(sstables::quarantine_dir);
}
}
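The hunks above replace boolean validator checks with a result object that is tested once and then asked for its message via `res.what()`. A minimal standalone sketch of that pattern follows. The names `validation_result`, `order_validator` and `count_errors` are hypothetical stand-ins and not the project's types; the real validator tracks mutation fragments rather than integers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: converts to true when valid, otherwise
// carries a human-readable description of what went wrong.
class validation_result {
    std::string _what;
public:
    validation_result() = default;
    explicit validation_result(std::string what) : _what(std::move(what)) {}
    explicit operator bool() const noexcept { return _what.empty(); }
    const std::string& what() const noexcept { return _what; }
};

// Toy validator (an assumption for illustration): accepts integer
// positions only in non-decreasing order.
class order_validator {
    int _prev = 0;
    bool _has_prev = false;
public:
    validation_result operator()(int pos) {
        if (_has_prev && pos < _prev) {
            return validation_result("position " + std::to_string(pos)
                    + " is out of order after " + std::to_string(_prev));
        }
        _prev = pos;
        _has_prev = true;
        return validation_result();
    }
    // Re-seed the validator so checking can continue after an error.
    void reset(int pos) {
        _prev = pos;
        _has_prev = true;
    }
};

// Validate a stream, collecting one message per error and returning the
// error count, using the same `if (auto res = ...; !res)` shape as the diff.
inline uint64_t count_errors(const std::vector<int>& positions, std::vector<std::string>& log) {
    order_validator validator;
    uint64_t errors = 0;
    for (int pos : positions) {
        if (auto res = validator(pos); !res) {
            log.push_back(res.what());
            validator.reset(pos);
            ++errors;
        }
    }
    return errors;
}
```

Returning the message from the validator keeps the wording of each error in one place, so callers only decide what to do with it (log, count, abort).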


@@ -92,12 +92,15 @@ struct compaction_stats {
uint64_t start_size = 0;
uint64_t end_size = 0;
uint64_t validation_errors = 0;
// Bloom filter checks during max purgeable calculation
uint64_t bloom_filter_checks = 0;
compaction_stats& operator+=(const compaction_stats& r) {
ended_at = std::max(ended_at, r.ended_at);
start_size += r.start_size;
end_size += r.end_size;
validation_errors += r.validation_errors;
bloom_filter_checks += r.bloom_filter_checks;
return *this;
}
friend compaction_stats operator+(const compaction_stats& l, const compaction_stats& r) {
@@ -130,7 +133,4 @@ get_fully_expired_sstables(const table_state& table_s, const std::vector<sstable
// For tests, can drop after we virtualize sstables.
flat_mutation_reader_v2 make_scrubbing_reader(flat_mutation_reader_v2 rd, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors);
// For tests, can drop after we virtualize sstables.
future<uint64_t> scrub_validate_mode_validate_reader(flat_mutation_reader_v2 rd, const compaction_data& info);
}
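The `compaction_stats` hunk above extends `operator+=` with the new `bloom_filter_checks` counter, and the scrub tasks elsewhere in this compare reduce per-shard results with `std::plus<sstables::compaction_stats>`. The sketch below shows, under simplifying assumptions, why every new counter has to be added to `operator+=`. `stats_sketch` and `combine` are hypothetical names, and plain integers stand in for the real timestamp and size types.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical, reduced version of a stats struct: counters are summed,
// while the end timestamp keeps the latest value seen.
struct stats_sketch {
    uint64_t ended_at = 0;
    uint64_t start_size = 0;
    uint64_t validation_errors = 0;
    uint64_t bloom_filter_checks = 0;

    stats_sketch& operator+=(const stats_sketch& r) {
        ended_at = std::max(ended_at, r.ended_at);
        start_size += r.start_size;
        validation_errors += r.validation_errors;
        // A counter omitted here would silently be dropped by any reduction.
        bloom_filter_checks += r.bloom_filter_checks;
        return *this;
    }

    friend stats_sketch operator+(const stats_sketch& l, const stats_sketch& r) {
        auto tmp = l;
        tmp += r;
        return tmp;
    }
};

// Fold per-shard results into one total; std::plus picks up operator+ above.
inline stats_sketch combine(const std::vector<stats_sketch>& per_shard) {
    return std::accumulate(per_shard.begin(), per_shard.end(),
                           stats_sketch{}, std::plus<stats_sketch>());
}
```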


@@ -453,7 +453,7 @@ protected:
};
setup_new_compaction(descriptor.run_identifier);
cmlog.info0("User initiated compaction started on behalf of {}.{}", t->schema()->ks_name(), t->schema()->cf_name());
cmlog.info0("User initiated compaction started on behalf of {}", *t);
// Now that the sstables for major compaction are registered
// and the user_initiated_backlog_tracker is set up
@@ -533,8 +533,8 @@ compaction_manager::compaction_reenabler::compaction_reenabler(compaction_manage
, _holder(_compaction_state.gate.hold())
{
_compaction_state.compaction_disabled_counter++;
cmlog.debug("Temporarily disabled compaction for {}.{}. compaction_disabled_counter={}",
_table->schema()->ks_name(), _table->schema()->cf_name(), _compaction_state.compaction_disabled_counter);
cmlog.debug("Temporarily disabled compaction for {}. compaction_disabled_counter={}",
t, _compaction_state.compaction_disabled_counter);
}
compaction_manager::compaction_reenabler::compaction_reenabler(compaction_reenabler&& o) noexcept
@@ -547,13 +547,12 @@ compaction_manager::compaction_reenabler::compaction_reenabler(compaction_reenab
compaction_manager::compaction_reenabler::~compaction_reenabler() {
// submit compaction request if we're the last holder of the gate which is still opened.
if (_table && --_compaction_state.compaction_disabled_counter == 0 && !_compaction_state.gate.is_closed()) {
cmlog.debug("Reenabling compaction for {}.{}",
_table->schema()->ks_name(), _table->schema()->cf_name());
cmlog.debug("Reenabling compaction for {}", *_table);
try {
_cm.submit(*_table);
} catch (...) {
cmlog.warn("compaction_reenabler could not reenable compaction for {}.{}: {}",
_table->schema()->ks_name(), _table->schema()->cf_name(), std::current_exception());
cmlog.warn("compaction_reenabler could not reenable compaction for {}: {}",
*_table, std::current_exception());
}
}
}
@@ -606,8 +605,7 @@ compaction::compaction_state::~compaction_state() {
std::string compaction_task_executor::describe() const {
auto* t = _compacting_table;
auto s = t->schema();
return fmt::format("{} task {} for table {}.{} [{}]", _description, fmt::ptr(this), s->ks_name(), s->cf_name(), fmt::ptr(t));
return fmt::format("{} task {} for table {} [{}]", _description, fmt::ptr(this), *t, fmt::ptr(t));
}
compaction_task_executor::~compaction_task_executor() {
@@ -844,8 +842,7 @@ future<> compaction_manager::postponed_compactions_reevaluation() {
if (!_compaction_state.contains(t)) {
continue;
}
auto s = t->schema();
cmlog.debug("resubmitting postponed compaction for table {}.{} [{}]", s->ks_name(), s->cf_name(), fmt::ptr(t));
cmlog.debug("resubmitting postponed compaction for table {} [{}]", *t, fmt::ptr(t));
submit(*t);
co_await coroutine::maybe_yield();
}
@@ -894,7 +891,7 @@ future<> compaction_manager::stop_ongoing_compactions(sstring reason, table_stat
if (cmlog.is_enabled(level)) {
std::string scope = "";
if (t) {
scope = fmt::format(" for table {}.{}", t->schema()->ks_name(), t->schema()->cf_name());
scope = fmt::format(" for table {}", *t);
}
if (type_opt) {
scope += fmt::format(" {} type={}", scope.size() ? "and" : "for", *type_opt);
@@ -1037,8 +1034,8 @@ protected:
co_return std::nullopt;
}
if (!_cm.can_register_compaction(t, weight, descriptor.fan_in())) {
cmlog.debug("Refused compaction job ({} sstable(s)) of weight {} for {}.{}, postponing it...",
descriptor.sstables.size(), weight, t.schema()->ks_name(), t.schema()->cf_name());
cmlog.debug("Refused compaction job ({} sstable(s)) of weight {} for {}, postponing it...",
descriptor.sstables.size(), weight, t);
switch_state(state::postponed);
_cm.postpone_compaction_for_table(&t);
co_return std::nullopt;
@@ -1048,8 +1045,8 @@ protected:
auto release_exhausted = [&compacting] (const std::vector<sstables::shared_sstable>& exhausted_sstables) {
compacting.release_compacting(exhausted_sstables);
};
cmlog.debug("Accepted compaction job: task={} ({} sstable(s)) of weight {} for {}.{}",
fmt::ptr(this), descriptor.sstables.size(), weight, t.schema()->ks_name(), t.schema()->cf_name());
cmlog.debug("Accepted compaction job: task={} ({} sstable(s)) of weight {} for {}",
fmt::ptr(this), descriptor.sstables.size(), weight, t);
setup_new_compaction(descriptor.run_identifier);
std::exception_ptr ex;
@@ -1109,8 +1106,7 @@ bool compaction_manager::can_perform_regular_compaction(table_state& t) {
future<> compaction_manager::maybe_wait_for_sstable_count_reduction(table_state& t) {
auto schema = t.schema();
if (!can_perform_regular_compaction(t)) {
cmlog.trace("maybe_wait_for_sstable_count_reduction in {}.{}: cannot perform regular compaction",
schema->ks_name(), schema->cf_name());
cmlog.trace("maybe_wait_for_sstable_count_reduction in {}: cannot perform regular compaction", t);
co_return;
}
auto num_runs_for_compaction = [&, this] {
@@ -1123,8 +1119,8 @@ future<> compaction_manager::maybe_wait_for_sstable_count_reduction(table_state&
const auto threshold = size_t(std::max(schema->max_compaction_threshold(), 32));
auto count = num_runs_for_compaction();
if (count <= threshold) {
cmlog.trace("No need to wait for sstable count reduction in {}.{}: {} <= {}",
schema->ks_name(), schema->cf_name(), count, threshold);
cmlog.trace("No need to wait for sstable count reduction in {}: {} <= {}",
t, count, threshold);
co_return;
}
// Reduce the chances of falling into an endless wait, if compaction
@@ -1142,8 +1138,8 @@ future<> compaction_manager::maybe_wait_for_sstable_count_reduction(table_state&
}
auto end = db_clock::now();
auto elapsed_ms = (end - start) / 1ms;
cmlog.warn("Waited {}ms for compaction of {}.{} to catch up on {} sstable runs",
elapsed_ms, schema->ks_name(), schema->cf_name(), count);
cmlog.warn("Waited {}ms for compaction of {} to catch up on {} sstable runs",
elapsed_ms, t, count);
}
namespace compaction {
@@ -1264,12 +1260,16 @@ protected:
std::exception_ptr ex;
try {
table_state& t = *_compacting_table;
auto maintenance_sstables = t.maintenance_sstable_set().all();
cmlog.info("Starting off-strategy compaction for {}.{}, {} candidates were found",
t.schema()->ks_name(), t.schema()->cf_name(), maintenance_sstables->size());
auto size = t.maintenance_sstable_set().size();
if (!size) {
cmlog.debug("Skipping off-strategy compaction for {}, No candidates were found", t);
finish_compaction();
co_return std::nullopt;
}
cmlog.info("Starting off-strategy compaction for {}, {} candidates were found", t, size);
co_await run_offstrategy_compaction(_compaction_data);
finish_compaction();
cmlog.info("Done with off-strategy compaction for {}.{}", t.schema()->ks_name(), t.schema()->cf_name());
cmlog.info("Done with off-strategy compaction for {}", t);
co_return std::nullopt;
} catch (...) {
ex = std::current_exception();
@@ -1524,14 +1524,18 @@ protected:
co_return std::nullopt;
}
private:
// Releases reference to cleaned files such that respective used disk space can be freed.
void release_exhausted(std::vector<sstables::shared_sstable> exhausted_sstables) {
_compacting.release_compacting(exhausted_sstables);
}
future<> run_cleanup_job(sstables::compaction_descriptor descriptor) {
co_await coroutine::switch_to(_cm.compaction_sg().cpu);
// Releases reference to cleaned files such that respective used disk space can be freed.
auto release_exhausted = [this, &descriptor] (std::vector<sstables::shared_sstable> exhausted_sstables) mutable {
auto exhausted = boost::copy_range<std::unordered_set<sstables::shared_sstable>>(exhausted_sstables);
std::erase_if(descriptor.sstables, [&] (const sstables::shared_sstable& sst) {
return exhausted.contains(sst);
});
_compacting.release_compacting(exhausted_sstables);
};
for (;;) {
compaction_backlog_tracker user_initiated(std::make_unique<user_initiated_backlog_tracker>(_cm._compaction_controller.backlog_of_shares(200), _cm.available_memory()));
_cm.register_backlog_tracker(user_initiated);
@@ -1539,8 +1543,7 @@ private:
std::exception_ptr ex;
try {
setup_new_compaction(descriptor.run_identifier);
co_await compact_sstables_and_update_history(descriptor, _compaction_data,
std::bind(&cleanup_sstables_compaction_task_executor::release_exhausted, this, std::placeholders::_1));
co_await compact_sstables_and_update_history(descriptor, _compaction_data, release_exhausted);
finish_compaction();
_cm.reevaluate_postponed_compactions();
co_return; // done with current job
@@ -1561,6 +1564,11 @@ private:
bool needs_cleanup(const sstables::shared_sstable& sst,
const dht::token_range_vector& sorted_owned_ranges) {
// Finish early if the keyspace has no owned token ranges (in this data center)
if (sorted_owned_ranges.empty()) {
return true;
}
auto first_token = sst->get_first_decorated_key().token();
auto last_token = sst->get_last_decorated_key().token();
dht::token_range sst_token_range = dht::token_range::make(first_token, last_token);
@@ -1580,9 +1588,13 @@ bool needs_cleanup(const sstables::shared_sstable& sst,
return true;
}
bool compaction_manager::update_sstable_cleanup_state(table_state& t, const sstables::shared_sstable& sst, owned_ranges_ptr owned_ranges_ptr) {
bool compaction_manager::update_sstable_cleanup_state(table_state& t, const sstables::shared_sstable& sst, const dht::token_range_vector& sorted_owned_ranges) {
auto& cs = get_compaction_state(&t);
if (owned_ranges_ptr && needs_cleanup(sst, *owned_ranges_ptr)) {
if (sst->is_shared()) {
throw std::runtime_error(format("Shared SSTable {} cannot be marked as requiring cleanup, as it can only be processed by resharding",
sst->get_filename()));
}
if (needs_cleanup(sst, sorted_owned_ranges)) {
cs.sstables_requiring_cleanup.insert(sst);
return true;
} else {
@@ -1591,46 +1603,97 @@ bool compaction_manager::update_sstable_cleanup_state(table_state& t, const ssta
}
}
bool compaction_manager::erase_sstable_cleanup_state(table_state& t, const sstables::shared_sstable& sst) {
auto& cs = get_compaction_state(&t);
return cs.sstables_requiring_cleanup.erase(sst);
}
bool compaction_manager::requires_cleanup(table_state& t, const sstables::shared_sstable& sst) const {
const auto& cs = get_compaction_state(&t);
return cs.sstables_requiring_cleanup.contains(sst);
}
future<> compaction_manager::perform_cleanup(owned_ranges_ptr sorted_owned_ranges, table_state& t) {
constexpr auto sleep_duration = std::chrono::seconds(10);
constexpr auto max_idle_duration = std::chrono::seconds(300);
auto& cs = get_compaction_state(&t);
co_await try_perform_cleanup(sorted_owned_ranges, t);
auto last_idle = seastar::lowres_clock::now();
while (!cs.sstables_requiring_cleanup.empty()) {
auto idle = seastar::lowres_clock::now() - last_idle;
if (idle >= max_idle_duration) {
auto msg = ::format("Cleanup timed out after {} seconds of no progress", std::chrono::duration_cast<std::chrono::seconds>(idle).count());
cmlog.warn("{}", msg);
co_await coroutine::return_exception(std::runtime_error(msg));
}
auto has_sstables_eligible_for_compaction = [&] {
for (auto& sst : cs.sstables_requiring_cleanup) {
if (sstables::is_eligible_for_compaction(sst)) {
return true;
}
}
return false;
};
cmlog.debug("perform_cleanup: waiting for sstables to become eligible for cleanup");
co_await t.get_staging_done_condition().when(sleep_duration, [&] { return has_sstables_eligible_for_compaction(); });
if (!has_sstables_eligible_for_compaction()) {
continue;
}
co_await try_perform_cleanup(sorted_owned_ranges, t);
last_idle = seastar::lowres_clock::now();
}
}
future<> compaction_manager::try_perform_cleanup(owned_ranges_ptr sorted_owned_ranges, table_state& t) {
auto check_for_cleanup = [this, &t] {
return boost::algorithm::any_of(_tasks, [&t] (auto& task) {
return task->compacting_table() == &t && task->type() == sstables::compaction_type::Cleanup;
});
};
if (check_for_cleanup()) {
throw std::runtime_error(format("cleanup request failed: there is an ongoing cleanup on {}.{}",
t.schema()->ks_name(), t.schema()->cf_name()));
throw std::runtime_error(format("cleanup request failed: there is an ongoing cleanup on {}", t));
}
if (sorted_owned_ranges->empty()) {
throw std::runtime_error("cleanup request failed: sorted_owned_ranges is empty");
co_await run_with_compaction_disabled(t, [&] () -> future<> {
auto update_sstables_cleanup_state = [&] (const sstables::sstable_set& set) -> future<> {
// Hold on to the sstable set since it may be overwritten
// while we yield in this loop.
auto set_holder = set.shared_from_this();
co_await set.for_each_sstable_gently([&] (const sstables::shared_sstable& sst) {
update_sstable_cleanup_state(t, sst, *sorted_owned_ranges);
});
};
co_await update_sstables_cleanup_state(t.main_sstable_set());
co_await update_sstables_cleanup_state(t.maintenance_sstable_set());
});
auto& cs = get_compaction_state(&t);
if (cs.sstables_requiring_cleanup.empty()) {
cmlog.debug("perform_cleanup for {} found no sstables requiring cleanup", t);
co_return;
}
// Some sstables may remain in sstables_requiring_cleanup
// for later processing if they can't be cleaned up right now.
// They are erased from sstables_requiring_cleanup by compacting.release_compacting
cs.owned_ranges_ptr = std::move(sorted_owned_ranges);
auto found_maintenance_sstables = bool(t.maintenance_sstable_set().for_each_sstable_until([this, &t] (const sstables::shared_sstable& sst) {
return stop_iteration(requires_cleanup(t, sst));
}));
if (found_maintenance_sstables) {
co_await perform_offstrategy(t);
}
// Called with compaction_disabled
auto get_sstables = [this, &t, sorted_owned_ranges] () -> future<std::vector<sstables::shared_sstable>> {
return seastar::async([this, &t, sorted_owned_ranges = std::move(sorted_owned_ranges)] {
auto update_sstables_cleanup_state = [&] (const sstables::sstable_set& set) {
set.for_each_sstable([&] (const sstables::shared_sstable& sst) {
update_sstable_cleanup_state(t, sst, sorted_owned_ranges);
seastar::thread::maybe_yield();
});
};
update_sstables_cleanup_state(t.main_sstable_set());
update_sstables_cleanup_state(t.maintenance_sstable_set());
// Some sstables may remain in sstables_requiring_cleanup
// for later processing if they can't be cleaned up right now.
// They are erased from sstables_requiring_cleanup by compacting.release_compacting
auto& cs = get_compaction_state(&t);
if (!cs.sstables_requiring_cleanup.empty()) {
cs.owned_ranges_ptr = std::move(sorted_owned_ranges);
}
return get_candidates(t, cs.sstables_requiring_cleanup);
});
auto get_sstables = [this, &t] () -> future<std::vector<sstables::shared_sstable>> {
auto& cs = get_compaction_state(&t);
co_return get_candidates(t, cs.sstables_requiring_cleanup);
};
co_await perform_task_on_all_files<cleanup_sstables_compaction_task_executor>(t, sstables::compaction_type_options::make_cleanup(), std::move(sorted_owned_ranges),
@@ -1701,8 +1764,7 @@ compaction::compaction_state::compaction_state(table_state& t)
void compaction_manager::add(table_state& t) {
auto [_, inserted] = _compaction_state.try_emplace(&t, t);
if (!inserted) {
auto s = t.schema();
on_internal_error(cmlog, format("compaction_state for table {}.{} [{}] already exists", s->ks_name(), s->cf_name(), fmt::ptr(&t)));
on_internal_error(cmlog, format("compaction_state for table {} [{}] already exists", t, fmt::ptr(&t)));
}
}
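The new `perform_cleanup` above retries `try_perform_cleanup` while sstables still require cleanup, and gives up once no attempt has been possible for `max_idle_duration`. A simplified synchronous sketch of that control flow follows. `cleanup_hooks` and `cleanup_with_idle_timeout` are hypothetical names, the clock is injected so the loop can be exercised without sleeping, and the real code awaits a condition variable with a timeout instead of calling a callback.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>

using std::chrono::seconds;

// Hypothetical hooks standing in for the table state and compaction manager.
struct cleanup_hooks {
    std::function<std::size_t()> pending;     // sstables still requiring cleanup
    std::function<bool()> wait_until_eligible; // bounded wait; true if any is eligible
    std::function<void()> try_cleanup;        // one cleanup attempt
    std::function<seconds()> now;             // injectable clock
};

// Retry cleanup until nothing is pending; throw if no attempt could be
// made for at least max_idle.
inline void cleanup_with_idle_timeout(const cleanup_hooks& h, seconds max_idle) {
    h.try_cleanup();
    auto last_attempt = h.now();
    while (h.pending() != 0) {
        if (h.now() - last_attempt >= max_idle) {
            throw std::runtime_error("cleanup timed out with no progress");
        }
        if (!h.wait_until_eligible()) {
            continue; // nothing eligible yet; re-check the idle budget
        }
        h.try_cleanup();
        last_attempt = h.now();
    }
}
```

As in the diff, the idle timer restarts after every attempt, so the timeout only fires when the remaining sstables never become eligible.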


@@ -304,7 +304,12 @@ public:
// given sstable, e.g. after node loses part of its token range because
// of a newly added node.
future<> perform_cleanup(owned_ranges_ptr sorted_owned_ranges, compaction::table_state& t);
private:
future<> try_perform_cleanup(owned_ranges_ptr sorted_owned_ranges, compaction::table_state& t);
// Add sst to or remove it from the respective compaction_state.sstables_requiring_cleanup set.
bool update_sstable_cleanup_state(table_state& t, const sstables::shared_sstable& sst, const dht::token_range_vector& sorted_owned_ranges);
public:
// Submit a table to be upgraded and wait for its termination.
future<> perform_sstable_upgrade(owned_ranges_ptr sorted_owned_ranges, compaction::table_state& t, bool exclude_current_version);
@@ -404,8 +409,9 @@ public:
return _tombstone_gc_state;
};
// Add sst to or remove it from the respective compaction_state.sstables_requiring_cleanup set.
bool update_sstable_cleanup_state(table_state& t, const sstables::shared_sstable& sst, owned_ranges_ptr owned_ranges_ptr);
// Unconditionally erase sst from `sstables_requiring_cleanup`.
// Returns true iff sst was found and erased.
bool erase_sstable_cleanup_state(table_state& t, const sstables::shared_sstable& sst);
// checks if the sstable is in the respective compaction_state.sstables_requiring_cleanup set.
bool requires_cleanup(table_state& t, const sstables::shared_sstable& sst) const;


@@ -35,7 +35,7 @@ struct compaction_state {
compaction_backlog_tracker backlog_tracker;
std::unordered_set<sstables::shared_sstable> sstables_requiring_cleanup;
owned_ranges_ptr owned_ranges_ptr;
compaction::owned_ranges_ptr owned_ranges_ptr;
explicit compaction_state(table_state& t);
compaction_state(compaction_state&&) = delete;


@@ -37,6 +37,10 @@ compaction_descriptor leveled_compaction_strategy::get_sstables_for_compaction(t
return candidate;
}
if (!table_s.tombstone_gc_enabled()) {
return compaction_descriptor();
}
// if there is no sstable to compact in standard way, try compacting based on droppable tombstone ratio
// unlike stcs, lcs can look for sstable with highest droppable tombstone ratio, so as not to choose
// a sstable which droppable data shadow data in older sstable, by starting from highest levels which


@@ -164,6 +164,10 @@ size_tiered_compaction_strategy::get_sstables_for_compaction(table_state& table_
return sstables::compaction_descriptor(std::move(most_interesting), service::get_local_compaction_priority());
}
if (!table_s.tombstone_gc_enabled()) {
return compaction_descriptor();
}
// if there is no sstable to compact in standard way, try compacting single sstable whose droppable tombstone
// ratio is greater than threshold.
// prefer oldest sstables from biggest size tiers because they will be easier to satisfy conditions for


@@ -9,6 +9,8 @@
#pragma once
#include <seastar/core/condition-variable.hh>
#include "schema/schema_fwd.hh"
#include "compaction_descriptor.hh"
@@ -48,9 +50,24 @@ public:
virtual api::timestamp_type min_memtable_timestamp() const = 0;
virtual future<> on_compaction_completion(sstables::compaction_completion_desc desc, sstables::offstrategy offstrategy) = 0;
virtual bool is_auto_compaction_disabled_by_user() const noexcept = 0;
virtual bool tombstone_gc_enabled() const noexcept = 0;
virtual const tombstone_gc_state& get_tombstone_gc_state() const noexcept = 0;
virtual compaction_backlog_tracker& get_backlog_tracker() = 0;
virtual const std::string& get_group_id() const noexcept = 0;
virtual seastar::condition_variable& get_staging_done_condition() noexcept = 0;
};
}
} // namespace compaction
namespace fmt {
template <>
struct formatter<compaction::table_state> : formatter<std::string_view> {
template <typename FormatContext>
auto format(const compaction::table_state& t, FormatContext& ctx) const {
auto s = t.schema();
return fmt::format_to(ctx.out(), "{}.{} compaction_group={}", s->ks_name(), s->cf_name(), t.get_group_id());
}
};
} // namespace fmt


@@ -128,4 +128,44 @@ future<> shard_upgrade_sstables_compaction_task_impl::run() {
});
}
future<> scrub_sstables_compaction_task_impl::run() {
_stats = co_await _db.map_reduce0([&] (replica::database& db) -> future<sstables::compaction_stats> {
sstables::compaction_stats stats;
tasks::task_info parent_info{_status.id, _status.shard};
auto& compaction_module = db.get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<shard_scrub_sstables_compaction_task_impl>(parent_info, _status.keyspace, _status.id, db, _column_families, _opts, stats);
co_await task->done();
co_return stats;
}, sstables::compaction_stats{}, std::plus<sstables::compaction_stats>());
}
tasks::is_internal shard_scrub_sstables_compaction_task_impl::is_internal() const noexcept {
return tasks::is_internal::yes;
}
future<> shard_scrub_sstables_compaction_task_impl::run() {
_stats = co_await map_reduce(_column_families, [&] (sstring cfname) -> future<sstables::compaction_stats> {
sstables::compaction_stats stats{};
tasks::task_info parent_info{_status.id, _status.shard};
auto& compaction_module = _db.get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<table_scrub_sstables_compaction_task_impl>(parent_info, _status.keyspace, cfname, _status.id, _db, _opts, stats);
co_await task->done();
co_return stats;
}, sstables::compaction_stats{}, std::plus<sstables::compaction_stats>());
}
tasks::is_internal table_scrub_sstables_compaction_task_impl::is_internal() const noexcept {
return tasks::is_internal::yes;
}
future<> table_scrub_sstables_compaction_task_impl::run() {
auto& cm = _db.get_compaction_manager();
auto& cf = _db.find_column_family(_status.keyspace, _status.table);
co_await cf.parallel_foreach_table_state([&] (compaction::table_state& ts) mutable -> future<> {
auto r = co_await cm.perform_sstable_scrub(ts, _opts);
_stats += r.value_or(sstables::compaction_stats{});
});
}
}


@@ -8,6 +8,7 @@
#pragma once
#include "compaction/compaction.hh"
#include "replica/database_fwd.hh"
#include "schema/schema_fwd.hh"
#include "tasks/task_manager.hh"
@@ -213,9 +214,9 @@ protected:
virtual future<> run() override;
};
class rewrite_sstables_compaction_task_impl : public compaction_task_impl {
class sstables_compaction_task_impl : public compaction_task_impl {
public:
rewrite_sstables_compaction_task_impl(tasks::task_manager::module_ptr module,
sstables_compaction_task_impl(tasks::task_manager::module_ptr module,
tasks::task_id id,
unsigned sequence_number,
std::string keyspace,
@@ -234,7 +235,7 @@ protected:
virtual future<> run() override = 0;
};
class upgrade_sstables_compaction_task_impl : public rewrite_sstables_compaction_task_impl {
class upgrade_sstables_compaction_task_impl : public sstables_compaction_task_impl {
private:
sharded<replica::database>& _db;
std::vector<table_id> _table_infos;
@@ -245,7 +246,7 @@ public:
sharded<replica::database>& db,
std::vector<table_id> table_infos,
bool exclude_current_version) noexcept
: rewrite_sstables_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), std::move(keyspace), "", "", tasks::task_id::create_null_id())
: sstables_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), std::move(keyspace), "", "", tasks::task_id::create_null_id())
, _db(db)
, _table_infos(std::move(table_infos))
, _exclude_current_version(exclude_current_version)
@@ -254,7 +255,7 @@ protected:
virtual future<> run() override;
};
class shard_upgrade_sstables_compaction_task_impl : public rewrite_sstables_compaction_task_impl {
class shard_upgrade_sstables_compaction_task_impl : public sstables_compaction_task_impl {
private:
replica::database& _db;
std::vector<table_id> _table_infos;
@@ -266,7 +267,7 @@ public:
replica::database& db,
std::vector<table_id> table_infos,
bool exclude_current_version) noexcept
: rewrite_sstables_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), std::move(keyspace), "", "", parent_id)
: sstables_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), std::move(keyspace), "", "", parent_id)
, _db(db)
, _table_infos(std::move(table_infos))
, _exclude_current_version(exclude_current_version)
@@ -277,6 +278,79 @@ protected:
virtual future<> run() override;
};
class scrub_sstables_compaction_task_impl : public sstables_compaction_task_impl {
private:
sharded<replica::database>& _db;
std::vector<sstring> _column_families;
sstables::compaction_type_options::scrub _opts;
sstables::compaction_stats& _stats;
public:
scrub_sstables_compaction_task_impl(tasks::task_manager::module_ptr module,
std::string keyspace,
sharded<replica::database>& db,
std::vector<sstring> column_families,
sstables::compaction_type_options::scrub opts,
sstables::compaction_stats& stats) noexcept
: sstables_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), std::move(keyspace), "", "", tasks::task_id::create_null_id())
, _db(db)
, _column_families(std::move(column_families))
, _opts(opts)
, _stats(stats)
{}
protected:
virtual future<> run() override;
};
class shard_scrub_sstables_compaction_task_impl : public sstables_compaction_task_impl {
private:
replica::database& _db;
std::vector<sstring> _column_families;
sstables::compaction_type_options::scrub _opts;
sstables::compaction_stats& _stats;
public:
shard_scrub_sstables_compaction_task_impl(tasks::task_manager::module_ptr module,
std::string keyspace,
tasks::task_id parent_id,
replica::database& db,
std::vector<sstring> column_families,
sstables::compaction_type_options::scrub opts,
sstables::compaction_stats& stats) noexcept
: sstables_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), std::move(keyspace), "", "", parent_id)
, _db(db)
, _column_families(std::move(column_families))
, _opts(opts)
, _stats(stats)
{}
virtual tasks::is_internal is_internal() const noexcept override;
protected:
virtual future<> run() override;
};
class table_scrub_sstables_compaction_task_impl : public sstables_compaction_task_impl {
private:
replica::database& _db;
sstables::compaction_type_options::scrub _opts;
sstables::compaction_stats& _stats;
public:
table_scrub_sstables_compaction_task_impl(tasks::task_manager::module_ptr module,
std::string keyspace,
std::string table,
tasks::task_id parent_id,
replica::database& db,
sstables::compaction_type_options::scrub opts,
sstables::compaction_stats& stats) noexcept
: sstables_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), std::move(keyspace), std::move(table), "", parent_id)
, _db(db)
, _opts(opts)
, _stats(stats)
{}
virtual tasks::is_internal is_internal() const noexcept override;
protected:
virtual future<> run() override;
};
class task_manager_module : public tasks::task_manager::module {
public:
task_manager_module(tasks::task_manager& tm) noexcept : tasks::task_manager::module(tm, "compaction") {}

@@ -284,6 +284,10 @@ time_window_compaction_strategy::get_next_non_expired_sstables(table_state& tabl
return most_interesting;
}
if (!table_s.tombstone_gc_enabled()) {
return {};
}
// if there is no sstable to compact in standard way, try compacting single sstable whose droppable tombstone
// ratio is greater than threshold.
auto e = boost::range::remove_if(non_expiring_sstables, [this, compaction_time, &table_s] (const shared_sstable& sst) -> bool {

@@ -31,25 +31,10 @@ public:
const dht::ring_position_view& position() const {
return *_rpv;
}
friend std::strong_ordering tri_compare(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return dht::ring_position_tri_compare(*x._schema, x.position(), y.position());
std::strong_ordering operator<=>(const compatible_ring_position_or_view& other) const {
return dht::ring_position_tri_compare(*_schema, position(), other.position());
}
friend bool operator<(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) != 0;
bool operator==(const compatible_ring_position_or_view& other) const {
return *this <=> other == 0;
}
};

@@ -123,10 +123,6 @@ public:
bool operator==(const iterator& other) const {
return _offset == other._offset && other._i == _i;
}
bool operator!=(const iterator& other) const {
return !(*this == other);
}
};
// A trichotomic comparator defined on @CompoundType representations which
@@ -429,7 +425,6 @@ public:
const value_type& operator*() const { return _current; }
const value_type* operator->() const { return &_current; }
bool operator!=(const iterator& i) const { return _v.begin() != i._v.begin(); }
bool operator==(const iterator& i) const { return _v.begin() == i._v.begin(); }
friend class composite;
@@ -636,7 +631,6 @@ public:
}
bool operator==(const composite_view& k) const { return k._bytes == _bytes && k._is_compound == _is_compound; }
bool operator!=(const composite_view& k) const { return !(k == *this); }
friend fmt::formatter<composite_view>;
};

@@ -175,10 +175,6 @@ bool compression_parameters::operator==(const compression_parameters& other) con
&& _crc_check_chance == other._crc_check_chance;
}
bool compression_parameters::operator!=(const compression_parameters& other) const {
return !(*this == other);
}
void compression_parameters::validate_options(const std::map<sstring, sstring>& options) {
// currently, there are no options specific to a particular compressor
static std::set<sstring> keywords({

@@ -105,7 +105,6 @@ public:
void validate();
std::map<sstring, sstring> get_options() const;
bool operator==(const compression_parameters& other) const;
bool operator!=(const compression_parameters& other) const;
static compression_parameters no_compression() {
return compression_parameters(nullptr);

@@ -272,6 +272,7 @@ batch_size_fail_threshold_in_kb: 1024
# - alternator-streams
# - alternator-ttl
# - raft
# - tablets
# The directory where hints files are stored if hinted handoff is enabled.
# hints_directory: /var/lib/scylla/hints

@@ -435,6 +435,8 @@ scylla_tests = set([
'test/boost/mutation_writer_test',
'test/boost/mvcc_test',
'test/boost/network_topology_strategy_test',
'test/boost/token_metadata_test',
'test/boost/tablets_test',
'test/boost/nonwrapping_range_test',
'test/boost/observable_test',
'test/boost/partitioner_test',
@@ -507,6 +509,8 @@ scylla_tests = set([
'test/boost/exceptions_fallback_test',
'test/boost/s3_test',
'test/boost/locator_topology_test',
'test/boost/string_format_test',
'test/boost/tagged_integer_test',
'test/manual/ec2_snitch_test',
'test/manual/enormous_table_scan_test',
'test/manual/gce_snitch_test',
@@ -561,6 +565,20 @@ raft_tests = set([
'test/raft/failure_detector_test',
])
wasms = set([
'wasm/return_input.wat',
'wasm/test_complex_null_values.wat',
'wasm/test_fib_called_on_null.wat',
'wasm/test_functions_with_frozen_types.wat',
'wasm/test_mem_grow.wat',
'wasm/test_pow.wat',
'wasm/test_short_ints.wat',
'wasm/test_types_with_and_without_nulls.wat',
'wasm/test_UDA_final.wat',
'wasm/test_UDA_scalar.wat',
'wasm/test_word_double.wat',
])
apps = set([
'scylla',
])
@@ -571,7 +589,7 @@ other = set([
'iotune',
])
all_artifacts = apps | tests | other
all_artifacts = apps | tests | other | wasms
arg_parser = argparse.ArgumentParser('Configure scylla')
arg_parser.add_argument('--out', dest='buildfile', action='store', default='build.ninja',
@@ -663,6 +681,7 @@ scylla_raft_core = [
scylla_core = (['message/messaging_service.cc',
'replica/database.cc',
'replica/table.cc',
'replica/tablets.cc',
'replica/distributed_loader.cc',
'replica/memtable.cc',
'replica/exceptions.cc',
@@ -672,6 +691,7 @@ scylla_core = (['message/messaging_service.cc',
'mutation/frozen_mutation.cc',
'mutation/mutation.cc',
'mutation/mutation_fragment.cc',
'mutation/mutation_fragment_stream_validator.cc',
'mutation/mutation_partition.cc',
'mutation/mutation_partition_v2.cc',
'mutation/mutation_partition_view.cc',
@@ -717,6 +737,7 @@ scylla_core = (['message/messaging_service.cc',
'sstables/sstables.cc',
'sstables/sstables_manager.cc',
'sstables/sstable_set.cc',
'sstables/storage.cc',
'sstables/mx/partition_reversing_data_source.cc',
'sstables/mx/reader.cc',
'sstables/mx/writer.cc',
@@ -842,6 +863,7 @@ scylla_core = (['message/messaging_service.cc',
'validation.cc',
'service/priority_manager.cc',
'service/migration_manager.cc',
'service/tablet_allocator.cc',
'service/storage_proxy.cc',
'query_ranges_to_vnodes.cc',
'service/forward_service.cc',
@@ -931,6 +953,7 @@ scylla_core = (['message/messaging_service.cc',
'query.cc',
'query-result-set.cc',
'locator/abstract_replication_strategy.cc',
'locator/tablets.cc',
'locator/azure_snitch.cc',
'locator/simple_strategy.cc',
'locator/local_strategy.cc',
@@ -1169,7 +1192,7 @@ scylla_tests_generic_dependencies = [
'test/lib/sstable_run_based_compaction_strategy_for_tests.cc',
]
scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependencies + [
scylla_tests_dependencies = scylla_core + alternator + idls + scylla_tests_generic_dependencies + [
'test/lib/cql_assertions.cc',
'test/lib/result_set_assertions.cc',
'test/lib/mutation_source_test.cc',
@@ -1187,6 +1210,7 @@ scylla_perfs = ['test/perf/perf_fast_forward.cc',
'test/perf/perf_row_cache_update.cc',
'test/perf/perf_simple_query.cc',
'test/perf/perf_sstable.cc',
'test/perf/perf_tablets.cc',
'test/perf/perf.cc',
'test/lib/alternator_test_env.cc',
'test/lib/cql_test_env.cc',
@@ -1235,6 +1259,7 @@ pure_boost_tests = set([
'test/boost/vint_serialization_test',
'test/boost/bptree_test',
'test/boost/utf8_test',
'test/boost/string_format_test',
'test/manual/streaming_histogram_test',
])
@@ -1272,7 +1297,7 @@ for t in sorted(scylla_tests):
if t not in tests_not_using_seastar_test_framework:
deps[t] += scylla_tests_dependencies
else:
deps[t] += scylla_core + idls + scylla_tests_generic_dependencies
deps[t] += scylla_core + alternator + idls + scylla_tests_generic_dependencies
perf_tests_seastar_deps = [
'seastar/tests/perf/perf_tests.cc'
@@ -1338,15 +1363,27 @@ deps['test/raft/discovery_test'] = ['test/raft/discovery_test.cc',
'test/lib/log.cc',
'service/raft/discovery.cc'] + scylla_raft_dependencies
wasm_deps = {}
wasm_deps['wasm/return_input.wat'] = 'test/resource/wasm/rust/return_input.rs'
wasm_deps['wasm/test_short_ints.wat'] = 'test/resource/wasm/rust/test_short_ints.rs'
wasm_deps['wasm/test_complex_null_values.wat'] = 'test/resource/wasm/rust/test_complex_null_values.rs'
wasm_deps['wasm/test_functions_with_frozen_types.wat'] = 'test/resource/wasm/rust/test_functions_with_frozen_types.rs'
wasm_deps['wasm/test_types_with_and_without_nulls.wat'] = 'test/resource/wasm/rust/test_types_with_and_without_nulls.rs'
wasm_deps['wasm/test_fib_called_on_null.wat'] = 'test/resource/wasm/c/test_fib_called_on_null.c'
wasm_deps['wasm/test_mem_grow.wat'] = 'test/resource/wasm/c/test_mem_grow.c'
wasm_deps['wasm/test_pow.wat'] = 'test/resource/wasm/c/test_pow.c'
wasm_deps['wasm/test_UDA_final.wat'] = 'test/resource/wasm/c/test_UDA_final.c'
wasm_deps['wasm/test_UDA_scalar.wat'] = 'test/resource/wasm/c/test_UDA_scalar.c'
wasm_deps['wasm/test_word_double.wat'] = 'test/resource/wasm/c/test_word_double.c'
warnings = [
'-Wall',
'-Werror',
'-Wno-mismatched-tags', # clang-only
'-Wno-tautological-compare',
'-Wno-parentheses-equality',
'-Wno-c++11-narrowing',
'-Wno-missing-braces',
'-Wno-ignored-attributes',
'-Wno-overloaded-virtual',
'-Wno-unused-command-line-argument',
@@ -1502,10 +1539,10 @@ default_modes = args.selected_modes or [mode for mode, mode_cfg in modes.items()
build_modes = {m: modes[m] for m in selected_modes}
if args.artifacts:
build_artifacts = []
build_artifacts = set()
for artifact in args.artifacts:
if artifact in all_artifacts:
build_artifacts.append(artifact)
build_artifacts.add(artifact)
else:
print("Ignoring unknown build artifact: {}".format(artifact))
if not build_artifacts:
@@ -1787,7 +1824,32 @@ with open(buildfile, 'w') as f:
description = RUST_SOURCE $out
rule cxxbridge_header
command = cxxbridge --header > $out
rule c2wasm
command = clang --target=wasm32 --no-standard-libraries -Wl,--export-all -Wl,--no-entry $in -o $out
description = C2WASM $out
rule rust2wasm
# The default stack size in Rust is 1MB, which causes oversized allocation warnings,
# because it's allocated in a single chunk as a part of a Wasm Linear Memory.
# We change the stack size to 128KB using the RUSTFLAGS environment variable
# in the command below.
command = RUSTFLAGS="-C link-args=-zstack-size=131072" cargo build --target=wasm32-wasi --example=$example --locked --manifest-path=test/resource/wasm/rust/Cargo.toml --target-dir=$builddir/wasm/ $
&& wasm-opt -Oz $builddir/wasm/wasm32-wasi/debug/examples/$example.wasm -o $builddir/wasm/$example.wasm $
&& wasm-strip $builddir/wasm/$example.wasm
description = RUST2WASM $out
rule wasm2wat
command = wasm2wat $in > $out
description = WASM2WAT $out
''').format(**globals()))
for binary in sorted(wasms):
src = wasm_deps[binary]
wasm = binary[:-4] + '.wasm'
if src.endswith('.rs'):
f.write(f'build $builddir/{wasm}: rust2wasm {src} | test/resource/wasm/rust/Cargo.lock\n')
example_name = binary[binary.rindex('/')+1:-4]
f.write(f' example = {example_name}\n')
else:
f.write(f'build $builddir/{wasm}: c2wasm {src}\n')
f.write(f'build $builddir/{binary}: wasm2wat $builddir/{wasm}\n')
for mode in build_modes:
modeval = modes[mode]
fmt_lib = 'fmt'
@@ -1852,9 +1914,10 @@ with open(buildfile, 'w') as f:
description = RUST_LIB $out
''').format(mode=mode, antlr3_exec=antlr3_exec, fmt_lib=fmt_lib, test_repeat=test_repeat, test_timeout=test_timeout, **modeval))
f.write(
'build {mode}-build: phony {artifacts}\n'.format(
'build {mode}-build: phony {artifacts} {wasms}\n'.format(
mode=mode,
artifacts=str.join(' ', ['$builddir/' + mode + '/' + x for x in sorted(build_artifacts)])
artifacts=str.join(' ', ['$builddir/' + mode + '/' + x for x in sorted(build_artifacts - wasms)]),
wasms = str.join(' ', ['$builddir/' + x for x in sorted(build_artifacts & wasms)]),
)
)
include_cxx_target = f'{mode}-build' if not args.dist_only else ''
@@ -1871,7 +1934,7 @@ with open(buildfile, 'w') as f:
seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
seastar_testing_dep = f'$builddir/{mode}/seastar/libseastar_testing.{seastar_lib_ext}'
for binary in sorted(build_artifacts):
if binary in other:
if binary in other or binary in wasms:
continue
srcs = deps[binary]
objs = ['$builddir/' + mode + '/' + src.replace('.cc', '.o')
@@ -1904,7 +1967,7 @@ with open(buildfile, 'w') as f:
if binary not in tests_not_using_seastar_test_framework:
local_libs += ' ' + "$seastar_testing_libs_{}".format(mode)
else:
local_libs += ' ' + '-lgnutls'
local_libs += ' ' + '-lgnutls' + ' ' + '-lboost_unit_test_framework'
# Our code's debugging information is huge, and multiplied
# by many tests yields ridiculous amounts of disk space.
# So we strip the tests by default; The user can very
@@ -1959,9 +2022,10 @@ with open(buildfile, 'w') as f:
)
f.write(
'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/scylla\n'.format(
'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/scylla {wasms}\n'.format(
mode=mode,
test_executables=' '.join(['$builddir/{}/{}'.format(mode, binary) for binary in sorted(tests)]),
wasms=' '.join([f'$builddir/{binary}' for binary in sorted(wasms)]),
)
)
f.write(
@@ -2025,13 +2089,14 @@ with open(buildfile, 'w') as f:
for cc in grammar.sources('$builddir/{}/gen'.format(mode)):
obj = cc.replace('.cpp', '.o')
f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
flags = '-Wno-parentheses-equality'
if cc.endswith('Parser.cpp'):
# Unoptimized parsers end up using huge amounts of stack space and overflowing their stack
flags = '-O1' if modes[mode]['optimization-level'] in ['0', 'g', 's'] else ''
flags += ' -O1' if modes[mode]['optimization-level'] in ['0', 'g', 's'] else ''
if has_sanitize_address_use_after_scope:
flags += ' -fno-sanitize-address-use-after-scope'
f.write(' obj_cxxflags = %s\n' % flags)
f.write(f' obj_cxxflags = {flags}\n')
f.write(f'build $builddir/{mode}/gen/empty.cc: gen\n')
for hh in headers:
f.write('build $builddir/{mode}/{hh}.o: checkhh.{mode} {hh} | $builddir/{mode}/gen/empty.cc || {gen_headers_dep}\n'.format(
@@ -2104,6 +2169,9 @@ with open(buildfile, 'w') as f:
f.write(
'build check: phony {}\n'.format(' '.join(['{mode}-check'.format(mode=mode) for mode in default_modes]))
)
f.write(
'build wasm: phony {}\n'.format(' '.join([f'$builddir/{binary}' for binary in sorted(wasms)]))
)
f.write(textwrap.dedent(f'''\
build dist-unified-tar: phony {' '.join([f'$builddir/{mode}/dist/tar/{scylla_product}-unified-{scylla_version}-{scylla_release}.{arch}.tar.gz' for mode in default_modes])}

@@ -78,9 +78,6 @@ public:
return id() == other.id() && value() == other.value()
&& logical_clock() == other.logical_clock();
}
bool operator!=(const basic_counter_shard_view& other) const {
return !(*this == other);
}
struct less_compare_by_id {
bool operator()(const basic_counter_shard_view& x, const basic_counter_shard_view& y) const {

@@ -7,7 +7,7 @@ generate_cql_grammar(
SOURCES cql_grammar_srcs)
set_source_files_properties(${cql_grammar_srcs}
PROPERTIES
COMPILE_FLAGS "-Wno-uninitialized")
COMPILE_FLAGS "-Wno-uninitialized -Wno-parentheses-equality")
add_library(cql3 STATIC)
target_sources(cql3

@@ -1773,7 +1773,12 @@ relation returns [expression e]
: name=cident type=relationType t=term { $e = binary_operator(unresolved_identifier{std::move(name)}, type, std::move(t)); }
| K_TOKEN l=tupleOfIdentifiers type=relationType t=term
{ $e = binary_operator(token{std::move(l.elements)}, type, std::move(t)); }
{
$e = binary_operator(
function_call{functions::function_name::native_function("token"), std::move(l.elements)},
type,
std::move(t));
}
| name=cident K_IS K_NOT K_NULL {
$e = binary_operator(unresolved_identifier{std::move(name)}, oper_t::IS_NOT, make_untyped_null()); }
| name=cident K_IN marker1=marker

@@ -57,13 +57,7 @@ public:
const cache_key_type& key() const { return _key; }
bool operator==(const authorized_prepared_statements_cache_key& other) const {
return _key == other._key;
}
bool operator!=(const authorized_prepared_statements_cache_key& other) const {
return !(*this == other);
}
bool operator==(const authorized_prepared_statements_cache_key&) const = default;
static size_t hash(const auth::authenticated_user& user, const cql3::prepared_cache_key_type::cache_key_type& prep_cache_key) {
return utils::hash_combine(std::hash<auth::authenticated_user>()(user), utils::tuple_hash()(prep_cache_key));

@@ -11,8 +11,6 @@
#include "cql3/util.hh"
#include "cql3/query_options.hh"
#include <regex>
namespace cql3 {
column_identifier::column_identifier(sstring raw_text, bool keep_case) {
@@ -96,10 +94,6 @@ bool column_identifier_raw::operator==(const column_identifier_raw& other) const
return _text == other._text;
}
bool column_identifier_raw::operator!=(const column_identifier_raw& other) const {
return !operator==(other);
}
sstring column_identifier_raw::to_string() const {
return _text;
}

@@ -88,8 +88,6 @@ public:
bool operator==(const column_identifier_raw& other) const;
bool operator!=(const column_identifier_raw& other) const;
virtual sstring to_string() const;
sstring to_cql_string() const;

@@ -205,10 +205,10 @@ class cql3_type::raw_ut : public raw {
virtual sstring to_string() const override {
if (is_frozen()) {
return format("frozen<{}>", _name.to_string());
return format("frozen<{}>", _name.to_cql_string());
}
return _name.to_string();
return _name.to_cql_string();
}
public:
raw_ut(ut_name name)

@@ -86,57 +86,43 @@ private:
{
using namespace antlr3;
std::stringstream msg;
// Antlr3 has a function ex->displayRecognitionError() which is
// supposed to nicely print the recognition exception. Unfortunately
// it is buggy - see https://github.com/antlr/antlr3/issues/191
// and not being fixed, so let's copy it here and fix it here.
switch (ex->getType()) {
case ExceptionType::UNWANTED_TOKEN_EXCEPTION: {
msg << "extraneous input " << get_token_error_display(recognizer, ex->get_token());
if (token_names != nullptr) {
std::string token_name;
if (recognizer.is_eof_token(ex->get_expecting())) {
token_name = "EOF";
} else {
token_name = reinterpret_cast<const char*>(token_names[ex->get_expecting()]);
}
msg << " expecting " << token_name;
}
break;
}
case ExceptionType::MISSING_TOKEN_EXCEPTION: {
std::string token_name;
if (token_names == nullptr) {
token_name = "(" + std::to_string(ex->get_expecting()) + ")";
} else {
if (recognizer.is_eof_token(ex->get_expecting())) {
token_name = "EOF";
} else {
token_name = reinterpret_cast<const char*>(token_names[ex->get_expecting()]);
}
}
msg << "missing " << token_name << " at " << get_token_error_display(recognizer, ex->get_token());
break;
}
case ExceptionType::NO_VIABLE_ALT_EXCEPTION: {
msg << "no viable alternative at input " << get_token_error_display(recognizer, ex->get_token());
break;
}
case ExceptionType::RECOGNITION_EXCEPTION:
case ExceptionType::EARLY_EXIT_EXCEPTION:
default:
// AntLR Exception class has a bug of dereferencing a null
// pointer in the displayRecognitionError. The following
// if statement makes sure it will not be null before the
// call to that function (displayRecognitionError).
// bug reference: https://github.com/antlr/antlr3/issues/191
if (!ex->get_expectingSet()) {
ex->set_expectingSet(&_empty_bit_list);
// Unknown syntax error - the parser can't figure out what
// specific token is missing or unwanted.
msg << ": Syntax error";
break;
case ExceptionType::MISSING_TOKEN_EXCEPTION:
msg << ": Missing ";
if (recognizer.is_eof_token(ex->get_expecting())) {
msg << "EOF";
} else if (token_names) {
msg << reinterpret_cast<const char*>(token_names[ex->get_expecting()]);
} else {
msg << ex->get_expecting();
}
ex->displayRecognitionError(token_names, msg);
break;
case ExceptionType::UNWANTED_TOKEN_EXCEPTION:
case ExceptionType::MISMATCHED_SET_EXCEPTION:
msg << ": Unexpected '";
msg << recognizer.token_text(ex->get_token());
msg << "'";
break;
case ExceptionType::NO_VIABLE_ALT_EXCEPTION:
msg << "no viable alternative at input '";
msg << recognizer.token_text(ex->get_token());
msg << "'";
break;
}
return msg.str();
}
std::string get_token_error_display(RecognizerType& recognizer, const TokenType* token)
{
return "'" + recognizer.token_text(token) + "'";
}
#if 0
/**

@@ -56,10 +56,6 @@ bool operator==(const expression& e1, const expression& e2) {
}, e1);
}
bool operator!=(const expression& e1, const expression& e2) {
return !(e1 == e2);
}
expression::expression(const expression& o)
: _v(std::make_unique<impl>(*o._v)) {
}
@@ -70,24 +66,6 @@ expression::operator=(const expression& o) {
return *this;
}
token::token(std::vector<expression> args_in)
: args(std::move(args_in)) {
}
token::token(const std::vector<const column_definition*>& col_defs) {
args.reserve(col_defs.size());
for (const column_definition* col_def : col_defs) {
args.push_back(column_value(col_def));
}
}
token::token(const std::vector<::shared_ptr<column_identifier_raw>>& cols) {
args.reserve(cols.size());
for(const ::shared_ptr<column_identifier_raw>& col : cols) {
args.push_back(unresolved_identifier{col});
}
}
binary_operator::binary_operator(expression lhs, oper_t op, expression rhs, comparison_order order)
: lhs(std::move(lhs))
, op(op)
@@ -564,89 +542,11 @@ value_set intersection(value_set a, value_set b, const abstract_type* type) {
return std::visit(intersection_visitor{type}, std::move(a), std::move(b));
}
bool is_satisfied_by(const binary_operator& opr, const evaluation_inputs& inputs) {
if (is<token>(opr.lhs)) {
// The RHS value was already used to ensure we fetch only rows in the specified
// token range. It is impossible for any fetched row not to match now.
// When token restrictions are present we forbid all other restrictions on partition key.
// This means that the partition range is defined solely by restrictions on token.
// When is_satisifed_by is used by filtering we can be sure that the token restrictions
// are fulfilled. In the future it will be possible to evaluate() a token,
// and we will be able to get rid of this risky if.
return true;
}
raw_value binop_eval_result = evaluate(opr, inputs);
if (binop_eval_result.is_null()) {
return false;
}
if (binop_eval_result.is_empty_value()) {
on_internal_error(expr_logger, format("is_satisfied_by: binary operator evaluated to EMPTY_VALUE: {}", opr));
}
return binop_eval_result.view().deserialize<bool>(*boolean_type);
}
} // anonymous namespace
bool is_satisfied_by(const expression& restr, const evaluation_inputs& inputs) {
return expr::visit(overloaded_functor{
[] (const constant& constant_val) {
std::optional<bool> bool_val = get_bool_value(constant_val);
if (bool_val.has_value()) {
return *bool_val;
}
on_internal_error(expr_logger,
"is_satisfied_by: a constant that is not a bool value cannot serve as a restriction by itself");
},
[&] (const conjunction& conj) {
return boost::algorithm::all_of(conj.children, [&] (const expression& c) {
return is_satisfied_by(c, inputs);
});
},
[&] (const binary_operator& opr) { return is_satisfied_by(opr, inputs); },
[] (const column_value&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a column cannot serve as a restriction by itself");
},
[] (const subscript&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a subscript cannot serve as a restriction by itself");
},
[] (const token&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: the token function cannot serve as a restriction by itself");
},
[] (const unresolved_identifier&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: an unresolved identifier cannot serve as a restriction");
},
[] (const column_mutation_attribute&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: the writetime/ttl cannot serve as a restriction by itself");
},
[] (const function_call&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a function call cannot serve as a restriction by itself");
},
[] (const cast&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a a type cast cannot serve as a restriction by itself");
},
[] (const field_selection&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a field selection cannot serve as a restriction by itself");
},
[] (const bind_variable&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a bind variable cannot serve as a restriction by itself");
},
[] (const untyped_constant&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: an untyped constant cannot serve as a restriction by itself");
},
[] (const tuple_constructor&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a tuple constructor cannot serve as a restriction by itself");
},
[] (const collection_constructor&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a collection constructor cannot serve as a restriction by itself");
},
[] (const usertype_constructor&) -> bool {
on_internal_error(expr_logger, "is_satisfied_by: a user type constructor cannot serve as a restriction by itself");
},
}, restr);
static auto true_value = managed_bytes_opt(data_value(true).serialize_nonnull());
return evaluate(restr, inputs).to_managed_bytes_opt() == true_value;
}
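
Note on the rewritten is_satisfied_by above: the restriction now counts as satisfied only when the evaluated expression is exactly the serialized boolean true, so both a null result and a false result reject the row. A rough sketch of that comparison, assuming a single-byte boolean encoding and using hypothetical `bytes`, `bytes_opt` and `serialize_bool` helpers in place of managed_bytes_opt and data_value:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the serialized value types.
using bytes = std::vector<unsigned char>;
using bytes_opt = std::optional<bytes>;  // std::nullopt models a CQL null

// Assumed encoding for illustration: one byte, 1 for true and 0 for false.
inline bytes serialize_bool(bool b) {
    return bytes{static_cast<unsigned char>(b ? 1 : 0)};
}

// Satisfied only if the evaluated value equals serialized `true`.
// A null (empty optional) compares unequal and is therefore rejected.
inline bool is_satisfied(const bytes_opt& evaluated) {
    static const bytes_opt true_value = serialize_bool(true);
    return evaluated == true_value;
}
```

Comparing against a cached true value folds the null check and the boolean check into one expression, which appears to be what lets the long per-expression-type visitor be removed.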
namespace {
@@ -767,7 +667,15 @@ nonwrapping_range<clustering_key_prefix> to_range(oper_t op, const clustering_ke
return to_range<const clustering_key_prefix&>(op, val);
}
value_set possible_lhs_values(const column_definition* cdef, const expression& expr, const query_options& options) {
// When cdef == nullptr it finds possible token values instead of column values.
// When finding token values the table_schema_opt argument has to point to a valid schema,
// but it isn't used when finding values for column.
// The schema is needed to find out whether a call to token() function represents
// the partition token.
static value_set possible_lhs_values(const column_definition* cdef,
const expression& expr,
const query_options& options,
const schema* table_schema_opt) {
const auto type = cdef ? &cdef->type->without_reversed() : long_type.get();
return expr::visit(overloaded_functor{
[] (const constant& constant_val) {
@@ -783,7 +691,7 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
return boost::accumulate(conj.children, unbounded_value_set,
[&] (const value_set& acc, const expression& child) {
return intersection(
std::move(acc), possible_lhs_values(cdef, child, options), type);
std::move(acc), possible_lhs_values(cdef, child, options, table_schema_opt), type);
});
},
[&] (const binary_operator& oper) -> value_set {
@@ -863,7 +771,11 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
}
return unbounded_value_set;
},
[&] (token) -> value_set {
[&] (const function_call& token_fun_call) -> value_set {
if (!is_partition_token_for_schema(token_fun_call, *table_schema_opt)) {
on_internal_error(expr_logger, "possible_lhs_values: function calls are not supported as the LHS of a binary expression");
}
if (cdef) {
return unbounded_value_set;
}
@@ -905,9 +817,6 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
[] (const column_mutation_attribute&) -> value_set {
on_internal_error(expr_logger, "possible_lhs_values: writetime/ttl are not supported as the LHS of a binary expression");
},
[] (const function_call&) -> value_set {
on_internal_error(expr_logger, "possible_lhs_values: function calls are not supported as the LHS of a binary expression");
},
[] (const cast&) -> value_set {
on_internal_error(expr_logger, "possible_lhs_values: typecasts are not supported as the LHS of a binary expression");
},
@@ -934,11 +843,8 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
[] (const subscript&) -> value_set {
on_internal_error(expr_logger, "possible_lhs_values: a subscript cannot serve as a restriction by itself");
},
[] (const token&) -> value_set {
on_internal_error(expr_logger, "possible_lhs_values: the token function cannot serve as a restriction by itself");
},
[] (const unresolved_identifier&) -> value_set {
on_internal_error(expr_logger, "is_satisfied_by: an unresolved identifier cannot serve as a restriction");
on_internal_error(expr_logger, "possible_lhs_values: an unresolved identifier cannot serve as a restriction");
},
[] (const column_mutation_attribute&) -> value_set {
on_internal_error(expr_logger, "possible_lhs_values: the writetime/ttl functions cannot serve as a restriction by itself");
@@ -970,6 +876,14 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
}, expr);
}
value_set possible_column_values(const column_definition* col, const expression& e, const query_options& options) {
return possible_lhs_values(col, e, options, nullptr);
}
value_set possible_partition_token_values(const expression& e, const query_options& options, const schema& table_schema) {
return possible_lhs_values(nullptr, e, options, &table_schema);
}
nonwrapping_range<managed_bytes> to_range(const value_set& s) {
return std::visit(overloaded_functor{
[] (const nonwrapping_range<managed_bytes>& r) { return r; },
@@ -1017,7 +931,7 @@ secondary_index::index::supports_expression_v is_supported_by_helper(const expre
// We don't use index table for multi-column restrictions, as it cannot avoid filtering.
return index::supports_expression_v::from_bool(false);
},
[&] (const token&) { return index::supports_expression_v::from_bool(false); },
[&] (const function_call&) { return index::supports_expression_v::from_bool(false); },
[&] (const subscript& s) -> ret_t {
const column_value& col = get_subscripted_column(s);
return idx.supports_subscript_expression(*col.col, oper.op);
@@ -1037,9 +951,6 @@ secondary_index::index::supports_expression_v is_supported_by_helper(const expre
[&] (const column_mutation_attribute&) -> ret_t {
on_internal_error(expr_logger, "is_supported_by: writetime/ttl are not supported as the LHS of a binary expression");
},
[&] (const function_call&) -> ret_t {
on_internal_error(expr_logger, "is_supported_by: function calls are not supported as the LHS of a binary expression");
},
[&] (const cast&) -> ret_t {
on_internal_error(expr_logger, "is_supported_by: typecasts are not supported as the LHS of a binary expression");
},
@@ -1106,7 +1017,7 @@ std::ostream& operator<<(std::ostream& os, const column_value& cv) {
std::ostream& operator<<(std::ostream& os, const expression& expr) {
expression::printer pr {
.expr_to_print = expr,
.debug_mode = true
.debug_mode = false
};
return os << pr;
@@ -1163,9 +1074,6 @@ std::ostream& operator<<(std::ostream& os, const expression::printer& pr) {
}
}
},
[&] (const token& t) {
fmt::print(os, "token({})", fmt::join(t.args | transformed(to_printer), ", "));
},
[&] (const column_value& col) {
fmt::print(os, "{}", cql3::util::maybe_quote(col.col->name_as_text()));
},
@@ -1185,14 +1093,18 @@ std::ostream& operator<<(std::ostream& os, const expression::printer& pr) {
to_printer(cma.column));
},
[&] (const function_call& fc) {
std::visit(overloaded_functor{
[&] (const functions::function_name& named) {
fmt::print(os, "{}({})", named, fmt::join(fc.args | transformed(to_printer), ", "));
},
[&] (const shared_ptr<functions::function>& anon) {
fmt::print(os, "<anonymous function>({})", fmt::join(fc.args | transformed(to_printer), ", "));
},
}, fc.func);
if (is_token_function(fc)) {
fmt::print(os, "token({})", fmt::join(fc.args | transformed(to_printer), ", "));
} else {
std::visit(overloaded_functor{
[&] (const functions::function_name& named) {
fmt::print(os, "{}({})", named, fmt::join(fc.args | transformed(to_printer), ", "));
},
[&] (const shared_ptr<functions::function>& anon) {
fmt::print(os, "<anonymous function>({})", fmt::join(fc.args | transformed(to_printer), ", "));
},
}, fc.func);
}
},
[&] (const cast& c) {
std::visit(overloaded_functor{
@@ -1367,9 +1279,9 @@ expression replace_column_def(const expression& expr, const column_definition* n
});
}
expression replace_token(const expression& expr, const column_definition* new_cdef) {
expression replace_partition_token(const expression& expr, const column_definition* new_cdef, const schema& table_schema) {
return search_and_replace(expr, [&] (const expression& expr) -> std::optional<expression> {
if (expr::is<token>(expr)) {
if (is_partition_token_for_schema(expr, table_schema)) {
return column_value{new_cdef};
} else {
return std::nullopt;
@@ -1443,14 +1355,6 @@ bool recurse_until(const expression& e, const noncopyable_function<bool (const e
}
return false;
},
[&] (const token& tok) {
for (auto& a : tok.args) {
if (auto found = recurse_until(a, predicate_fun)) {
return found;
}
}
return false;
},
[](LeafExpression auto const&) {
return false;
}
@@ -1526,13 +1430,6 @@ expression search_and_replace(const expression& e,
.type = s.type,
};
},
[&](const token& tok) -> expression {
return token {
boost::copy_range<std::vector<expression>>(
tok.args | boost::adaptors::transformed(recurse)
)
};
},
[&] (LeafExpression auto const& e) -> expression {
return e;
},
@@ -1607,7 +1504,6 @@ std::vector<expression> extract_single_column_restrictions_for_column(const expr
}
}
void operator()(const token&) {}
void operator()(const unresolved_identifier&) {}
void operator()(const column_mutation_attribute&) {}
void operator()(const function_call&) {}
@@ -1771,9 +1667,6 @@ cql3::raw_value evaluate(const expression& e, const evaluation_inputs& inputs) {
[&](const conjunction& conj) -> cql3::raw_value {
return evaluate(conj, inputs);
},
[](const token&) -> cql3::raw_value {
on_internal_error(expr_logger, "Can't evaluate token");
},
[](const unresolved_identifier&) -> cql3::raw_value {
on_internal_error(expr_logger, "Can't evaluate unresolved_identifier");
},
@@ -2313,11 +2206,6 @@ void fill_prepare_context(expression& e, prepare_context& ctx) {
fill_prepare_context(child, ctx);
}
},
[&](token& tok) {
for (expression& arg : tok.args) {
fill_prepare_context(arg, ctx);
}
},
[](unresolved_identifier&) {},
[&](column_mutation_attribute& a) {
fill_prepare_context(a.column, ctx);
@@ -2367,9 +2255,6 @@ type_of(const expression& e) {
[] (const column_value& e) {
return e.col->type;
},
[] (const token& e) {
return long_type;
},
[] (const unresolved_identifier& e) -> data_type {
on_internal_error(expr_logger, "evaluating type of unresolved_identifier");
},
@@ -2550,7 +2435,7 @@ sstring get_columns_in_commons(const expression& a, const expression& b) {
}
bytes_opt value_for(const column_definition& cdef, const expression& e, const query_options& options) {
value_set possible_vals = possible_lhs_values(&cdef, e, options);
value_set possible_vals = possible_column_values(&cdef, e, options);
return std::visit(overloaded_functor {
[&](const value_list& val_list) -> bytes_opt {
if (val_list.empty()) {
@@ -2694,5 +2579,69 @@ adjust_for_collection_as_maps(const expression& e) {
});
}
bool is_token_function(const function_call& fun_call) {
static thread_local const functions::function_name token_function_name =
functions::function_name::native_function("token");
// Check that function name is "token"
const functions::function_name& fun_name =
std::visit(overloaded_functor{[](const functions::function_name& fname) { return fname; },
[](const shared_ptr<functions::function>& fun) { return fun->name(); }},
fun_call.func);
return fun_name.has_keyspace() ? fun_name == token_function_name : fun_name.name == token_function_name.name;
}
bool is_token_function(const expression& e) {
const function_call* fun_call = as_if<function_call>(&e);
if (fun_call == nullptr) {
return false;
}
return is_token_function(*fun_call);
}
bool is_partition_token_for_schema(const function_call& fun_call, const schema& table_schema) {
if (!is_token_function(fun_call)) {
return false;
}
if (fun_call.args.size() != table_schema.partition_key_size()) {
return false;
}
auto arguments_iter = fun_call.args.begin();
for (const column_definition& partition_key_col : table_schema.partition_key_columns()) {
const expression& cur_argument = *arguments_iter;
const column_value* cur_col = as_if<column_value>(&cur_argument);
if (cur_col == nullptr) {
// A sanity check that we didn't call the function on an unprepared expression.
if (is<unresolved_identifier>(cur_argument)) {
on_internal_error(expr_logger,
format("called is_partition_token with unprepared expression: {}", fun_call));
}
return false;
}
if (cur_col->col != &partition_key_col) {
return false;
}
arguments_iter++;
}
return true;
}
bool is_partition_token_for_schema(const expression& maybe_token, const schema& table_schema) {
const function_call* fun_call = as_if<function_call>(&maybe_token);
if (fun_call == nullptr) {
return false;
}
return is_partition_token_for_schema(*fun_call, table_schema);
}
} // namespace expr
} // namespace cql3
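
The check added in `is_partition_token_for_schema` can be sketched in isolation. This is a minimal illustration, not the ScyllaDB implementation: `column_def`, `column_ref`, `literal`, `call` and `table_schema` are hypothetical stand-ins for `column_definition`, `column_value`, `constant`, `function_call` and `schema`. It assumes that a call counts as the partition token only when it is named `token` and its arguments are exactly the partition key columns, in order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-ins for the cql3 expression types.
struct column_def { std::string name; };
struct column_ref { const column_def* col; };  // plays the role of column_value
struct literal { long value; };                // plays the role of constant
using argument = std::variant<column_ref, literal>;

struct call {
    std::string function_name;
    std::vector<argument> args;
};

struct table_schema {
    std::vector<column_def> partition_key;     // partition key columns, in order
};

// Any call named "token", whatever its arguments are.
bool is_token_call(const call& c) {
    return c.function_name == "token";
}

// True only when the call is token(pk1, pk2, ...) over this schema's
// partition key columns, compared by identity and in declaration order.
bool is_partition_token(const call& c, const table_schema& s) {
    if (!is_token_call(c) || c.args.size() != s.partition_key.size()) {
        return false;
    }
    for (std::size_t i = 0; i < c.args.size(); ++i) {
        const column_ref* ref = std::get_if<column_ref>(&c.args[i]);
        if (ref == nullptr || ref->col != &s.partition_key[i]) {
            return false;
        }
    }
    return true;
}

// Usage example: only the first call describes the partition token.
void demo_partition_token() {
    table_schema s;
    s.partition_key.push_back(column_def{"p1"});
    s.partition_key.push_back(column_def{"p2"});
    call on_columns{"token", {column_ref{&s.partition_key[0]}, column_ref{&s.partition_key[1]}}};
    call on_literals{"token", {literal{1}, literal{2}}};
    assert(is_partition_token(on_columns, s));
    assert(is_token_call(on_literals) && !is_partition_token(on_literals, s));
}
```

Keeping the name check separate from the schema check mirrors the split between `is_token_function` and `is_partition_token_for_schema` in the diff: `token(1, 2)` is still a token call, but it does not restrict the partition token.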


@@ -70,7 +70,6 @@ struct binary_operator;
struct conjunction;
struct column_value;
struct subscript;
struct token;
struct unresolved_identifier;
struct column_mutation_attribute;
struct function_call;
@@ -89,7 +88,6 @@ concept ExpressionElement
|| std::same_as<T, binary_operator>
|| std::same_as<T, column_value>
|| std::same_as<T, subscript>
|| std::same_as<T, token>
|| std::same_as<T, unresolved_identifier>
|| std::same_as<T, column_mutation_attribute>
|| std::same_as<T, function_call>
@@ -109,7 +107,6 @@ concept invocable_on_expression
&& std::invocable<Func, binary_operator>
&& std::invocable<Func, column_value>
&& std::invocable<Func, subscript>
&& std::invocable<Func, token>
&& std::invocable<Func, unresolved_identifier>
&& std::invocable<Func, column_mutation_attribute>
&& std::invocable<Func, function_call>
@@ -129,7 +126,6 @@ concept invocable_on_expression_ref
&& std::invocable<Func, binary_operator&>
&& std::invocable<Func, column_value&>
&& std::invocable<Func, subscript&>
&& std::invocable<Func, token&>
&& std::invocable<Func, unresolved_identifier&>
&& std::invocable<Func, column_mutation_attribute&>
&& std::invocable<Func, function_call&>
@@ -229,18 +225,6 @@ const column_value& get_subscripted_column(const subscript&);
/// Only columns can be subscripted in CQL, so we can expect that the subscripted expression is a column_value.
const column_value& get_subscripted_column(const expression&);
/// Represents token(c1, c2) function on LHS of an operator relation.
/// args contains arguments to the token function.
struct token {
std::vector<expression> args;
explicit token(std::vector<expression>);
explicit token(const std::vector<const column_definition*>&);
explicit token(const std::vector<::shared_ptr<column_identifier_raw>>&);
friend bool operator==(const token&, const token&) = default;
};
enum class oper_t { EQ, NEQ, LT, LTE, GTE, GT, IN, CONTAINS, CONTAINS_KEY, IS_NOT, LIKE };
/// Describes the nature of clustering-key comparisons. Useful for implementing SCYLLA_CLUSTERING_BOUND.
@@ -429,7 +413,7 @@ struct usertype_constructor {
// now that all expression types are fully defined, we can define expression::impl
struct expression::impl final {
using variant_type = std::variant<
conjunction, binary_operator, column_value, token, unresolved_identifier,
conjunction, binary_operator, column_value, unresolved_identifier,
column_mutation_attribute, function_call, cast, field_selection,
bind_variable, untyped_constant, constant, tuple_constructor, collection_constructor,
usertype_constructor, subscript>;
@@ -510,8 +494,8 @@ using value_list = std::vector<managed_bytes>; // Sorted and deduped using value
/// never singular and never has start > end. Universal set is a nonwrapping_range with both bounds null.
using value_set = std::variant<value_list, nonwrapping_range<managed_bytes>>;
/// A set of all column values that would satisfy an expression. If column is null, a set of all token values
/// that satisfy.
/// A set of all column values that would satisfy an expression. The _token_values variant finds
/// matching values for the partition token function call instead of the column.
///
/// An expression restricts possible values of a column or token:
/// - `A>5` restricts A from below
@@ -521,7 +505,8 @@ using value_set = std::variant<value_list, nonwrapping_range<managed_bytes>>;
/// - `A=1 AND A<=0` restricts A to an empty list; no value is able to satisfy the expression
/// - `A>=NULL` also restricts A to an empty list; all comparisons to NULL are false
/// - an expression without A "restricts" A to unbounded range
extern value_set possible_lhs_values(const column_definition*, const expression&, const query_options&);
extern value_set possible_column_values(const column_definition*, const expression&, const query_options&);
extern value_set possible_partition_token_values(const expression&, const query_options&, const schema& table_schema);
/// Turns value_set into a range, unless it's a multi-valued list (in which case this throws).
extern nonwrapping_range<managed_bytes> to_range(const value_set&);
@@ -642,8 +627,21 @@ inline bool is_multi_column(const binary_operator& op) {
return expr::is<tuple_constructor>(op.lhs);
}
inline bool has_token(const expression& e) {
return find_binop(e, [] (const binary_operator& o) { return expr::is<token>(o.lhs); });
// Check whether the given expression represents
// a call to the token() function.
bool is_token_function(const function_call&);
bool is_token_function(const expression&);
bool is_partition_token_for_schema(const function_call&, const schema&);
bool is_partition_token_for_schema(const expression&, const schema&);
/// Check whether the expression contains a binary_operator whose LHS is a call to the token
/// function representing a partition key token.
/// Examples:
/// For expression: "token(p1, p2, p3) < 123 AND c = 2" returns true
/// For expression: "p1 = token(1, 2, 3) AND c = 2" returns false
inline bool has_partition_token(const expression& e, const schema& table_schema) {
return find_binop(e, [&] (const binary_operator& o) { return is_partition_token_for_schema(o.lhs, table_schema); });
}
inline bool has_slice_or_needs_filtering(const expression& e) {
@@ -689,7 +687,8 @@ extern expression replace_column_def(const expression&, const column_definition*
// Replaces all occurrences of token(p1, p2) on the left hand side with the given column.
// For example this changes token(p1, p2) < token(1, 2) to my_column_name < token(1, 2).
extern expression replace_token(const expression&, const column_definition*);
// Schema is needed to find out which calls to token() describe the partition token.
extern expression replace_partition_token(const expression&, const column_definition*, const schema&);
// Recursively copies e and returns it. Calls replace_candidate() on all nodes. If it returns nullopt,
// continue with the copying. If it returns an expression, that expression replaces the current node.
@@ -829,12 +828,12 @@ bool has_only_eq_binops(const expression&);
} // namespace cql3
/// Custom formatter for an expression. Use {:user} for user-oriented
/// output, {:debug} for debug-oriented output. Debug is the default.
/// output, {:debug} for debug-oriented output. User is the default.
///
/// Required for fmt::join() to work on expression.
template <>
class fmt::formatter<cql3::expr::expression> {
bool _debug = true;
bool _debug = false;
private:
constexpr static bool try_match_and_advance(format_parse_context& ctx, std::string_view s) {
auto [ctx_end, s_end] = std::ranges::mismatch(ctx, s);


@@ -79,7 +79,7 @@ static
void
usertype_constructor_validate_assignable_to(const usertype_constructor& u, data_dictionary::database db, const sstring& keyspace, const column_specification& receiver) {
if (!receiver.type->is_user_type()) {
throw exceptions::invalid_request_exception(format("Invalid user type literal for {} of type {}", receiver.name, receiver.type->as_cql3_type()));
throw exceptions::invalid_request_exception(format("Invalid user type literal for {} of type {}", *receiver.name, receiver.type->as_cql3_type()));
}
auto ut = static_pointer_cast<const user_type_impl>(receiver.type);
@@ -91,7 +91,7 @@ usertype_constructor_validate_assignable_to(const usertype_constructor& u, data_
const expression& value = u.elements.at(field);
auto&& field_spec = usertype_field_spec_of(receiver, i);
if (!assignment_testable::is_assignable(test_assignment(value, db, keyspace, *field_spec))) {
throw exceptions::invalid_request_exception(format("Invalid user type literal for {}: field {} is not of type {}", receiver.name, field, field_spec->type->as_cql3_type()));
throw exceptions::invalid_request_exception(format("Invalid user type literal for {}: field {} is not of type {}", *receiver.name, field, field_spec->type->as_cql3_type()));
}
}
}
@@ -314,7 +314,7 @@ set_validate_assignable_to(const collection_constructor& c, data_dictionary::dat
return;
}
throw exceptions::invalid_request_exception(format("Invalid set literal for {} of type {}", receiver.name, receiver.type->as_cql3_type()));
throw exceptions::invalid_request_exception(format("Invalid set literal for {} of type {}", *receiver.name, receiver.type->as_cql3_type()));
}
auto&& value_spec = set_value_spec_of(receiver);
@@ -502,18 +502,18 @@ void
tuple_constructor_validate_assignable_to(const tuple_constructor& tc, data_dictionary::database db, const sstring& keyspace, const column_specification& receiver) {
auto tt = dynamic_pointer_cast<const tuple_type_impl>(receiver.type->underlying_type());
if (!tt) {
throw exceptions::invalid_request_exception(format("Invalid tuple type literal for {} of type {}", receiver.name, receiver.type->as_cql3_type()));
throw exceptions::invalid_request_exception(format("Invalid tuple type literal for {} of type {}", *receiver.name, receiver.type->as_cql3_type()));
}
for (size_t i = 0; i < tc.elements.size(); ++i) {
if (i >= tt->size()) {
throw exceptions::invalid_request_exception(format("Invalid tuple literal for {}: too many elements. Type {} expects {:d} but got {:d}",
receiver.name, tt->as_cql3_type(), tt->size(), tc.elements.size()));
*receiver.name, tt->as_cql3_type(), tt->size(), tc.elements.size()));
}
auto&& value = tc.elements[i];
auto&& spec = component_spec_of(receiver, i);
if (!assignment_testable::is_assignable(test_assignment(value, db, keyspace, *spec))) {
throw exceptions::invalid_request_exception(format("Invalid tuple literal for {}: component {:d} is not of type {}", receiver.name, i, spec->type->as_cql3_type()));
throw exceptions::invalid_request_exception(format("Invalid tuple literal for {}: component {:d} is not of type {}", *receiver.name, i, spec->type->as_cql3_type()));
}
}
}
@@ -817,17 +817,38 @@ cast_prepare_expression(const cast& c, data_dictionary::database db, const sstri
std::optional<expression>
prepare_function_call(const expr::function_call& fc, data_dictionary::database db, const sstring& keyspace, const schema* schema_opt, lw_shared_ptr<column_specification> receiver) {
if (!receiver) {
// TODO: It is possible to infer the type of a function call if there is only one overload, or if all overloads return the same type
return std::nullopt;
// Try to extract a column family name from the available information.
// Most functions can be prepared without information about the column family, usually just the keyspace is enough.
// One exception is the token() function - in order to prepare system.token() we have to know the partition key of the table,
// which can only be known when the column family is known.
// In cases when someone calls prepare_function_call on a token() function without a known column_family, an exception is thrown by functions::get.
std::optional<std::string_view> cf_name;
if (schema_opt != nullptr) {
cf_name = std::string_view(schema_opt->cf_name());
} else if (receiver.get() != nullptr) {
cf_name = receiver->cf_name;
}
// Prepare the arguments that can be prepared without a receiver.
// Prepared expressions have a known type, which helps with finding the right function.
std::vector<expression> partially_prepared_args;
for (const expression& argument : fc.args) {
std::optional<expression> prepared_arg_opt = try_prepare_expression(argument, db, keyspace, schema_opt, nullptr);
if (prepared_arg_opt.has_value()) {
partially_prepared_args.emplace_back(*prepared_arg_opt);
} else {
partially_prepared_args.push_back(argument);
}
}
auto&& fun = std::visit(overloaded_functor{
[] (const shared_ptr<functions::function>& func) {
return func;
},
[&] (const functions::function_name& name) {
auto args = boost::copy_range<std::vector<::shared_ptr<assignment_testable>>>(fc.args | boost::adaptors::transformed(expr::as_assignment_testable));
auto fun = functions::functions::get(db, keyspace, name, args, receiver->ks_name, receiver->cf_name, receiver.get());
auto args = boost::copy_range<std::vector<::shared_ptr<assignment_testable>>>(
partially_prepared_args | boost::adaptors::transformed(expr::as_assignment_testable));
auto fun = functions::functions::get(db, keyspace, name, args, keyspace, cf_name, receiver.get());
if (!fun) {
throw exceptions::invalid_request_exception(format("Unknown function {} called", name));
}
@@ -843,7 +864,7 @@ prepare_function_call(const expr::function_call& fc, data_dictionary::database d
// Functions.get() will complain if no function "name" type check with the provided arguments.
// We still have to validate that the return type matches however
if (!receiver->type->is_value_compatible_with(*scalar_fun->return_type())) {
if (receiver && !receiver->type->is_value_compatible_with(*scalar_fun->return_type())) {
throw exceptions::invalid_request_exception(format("Type error: cannot assign result of function {} (type {}) to {} (type {})",
fun->name(), fun->return_type()->as_cql3_type(),
receiver->name, receiver->type->as_cql3_type()));
@@ -855,11 +876,11 @@ prepare_function_call(const expr::function_call& fc, data_dictionary::database d
}
std::vector<expr::expression> parameters;
parameters.reserve(fc.args.size());
parameters.reserve(partially_prepared_args.size());
bool all_terminal = true;
for (size_t i = 0; i < fc.args.size(); ++i) {
expr::expression e = prepare_expression(fc.args[i], db, keyspace, schema_opt,
functions::functions::make_arg_spec(receiver->ks_name, receiver->cf_name, *scalar_fun, i));
for (size_t i = 0; i < partially_prepared_args.size(); ++i) {
expr::expression e = prepare_expression(partially_prepared_args[i], db, keyspace, schema_opt,
functions::functions::make_arg_spec(keyspace, cf_name, *scalar_fun, i));
if (!expr::is<expr::constant>(e)) {
all_terminal = false;
}
@@ -908,6 +929,17 @@ test_assignment_function_call(const cql3::expr::function_call& fc, data_dictiona
}
}
static assignment_testable::test_result expression_test_assignment(const data_type& expr_type,
const column_specification& receiver) {
if (receiver.type->underlying_type() == expr_type->underlying_type()) {
return assignment_testable::test_result::EXACT_MATCH;
} else if (receiver.type->is_value_compatible_with(*expr_type)) {
return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
} else {
return assignment_testable::test_result::NOT_ASSIGNABLE;
}
}
std::optional<expression> prepare_conjunction(const conjunction& conj,
data_dictionary::database db,
const sstring& keyspace,
@@ -958,8 +990,20 @@ std::optional<expression> prepare_conjunction(const conjunction& conj,
std::optional<expression>
try_prepare_expression(const expression& expr, data_dictionary::database db, const sstring& keyspace, const schema* schema_opt, lw_shared_ptr<column_specification> receiver) {
return expr::visit(overloaded_functor{
[] (const constant&) -> std::optional<expression> {
on_internal_error(expr_logger, "Can't prepare constant_value, it should not appear in parser output");
[&] (const constant& value) -> std::optional<expression> {
if (receiver && !is_assignable(expression_test_assignment(value.type, *receiver))) {
throw exceptions::invalid_request_exception(
format("cannot assign a constant {:user} of type {} to receiver {} of type {}", value,
value.type->as_cql3_type(), receiver->name, receiver->type->as_cql3_type()));
}
constant result = value;
if (receiver) {
// The receiver might have a different type from the constant, but this is allowed if the types are compatible.
// In such a case the type is implicitly converted to the receiver type.
result.type = receiver->type;
}
return result;
},
[&] (const binary_operator& binop) -> std::optional<expression> {
if (receiver.get() != nullptr && &receiver->type->without_reversed() != boolean_type.get()) {
@@ -1013,24 +1057,6 @@ try_prepare_expression(const expression& expr, data_dictionary::database db, con
.type = static_cast<const collection_type_impl&>(sub_col_type).value_comparator(),
};
},
[&] (const token& tk) -> std::optional<expression> {
if (!schema_opt) {
throw exceptions::invalid_request_exception("cannot process token() function without schema");
}
std::vector<expression> prepared_token_args;
prepared_token_args.reserve(tk.args.size());
for (const expression& arg : tk.args) {
auto prepared_arg_opt = try_prepare_expression(arg, db, keyspace, schema_opt, receiver);
if (!prepared_arg_opt) {
return std::nullopt;
}
prepared_token_args.emplace_back(std::move(*prepared_arg_opt));
}
return token(std::move(prepared_token_args));
},
[&] (const unresolved_identifier& unin) -> std::optional<expression> {
if (!schema_opt) {
throw exceptions::invalid_request_exception(fmt::format("Cannot resolve column {} without schema", unin.ident->to_cql_string()));
@@ -1076,9 +1102,8 @@ assignment_testable::test_result
test_assignment(const expression& expr, data_dictionary::database db, const sstring& keyspace, const column_specification& receiver) {
using test_result = assignment_testable::test_result;
return expr::visit(overloaded_functor{
[&] (const constant&) -> test_result {
// constants shouldn't appear in parser output, only untyped_constants
on_internal_error(expr_logger, "constants are not yet reachable via test_assignment()");
[&] (const constant& value) -> test_result {
return expression_test_assignment(value.type, receiver);
},
[&] (const binary_operator&) -> test_result {
on_internal_error(expr_logger, "binary_operators are not yet reachable via test_assignment()");
@@ -1086,15 +1111,12 @@ test_assignment(const expression& expr, data_dictionary::database db, const sstr
[&] (const conjunction&) -> test_result {
on_internal_error(expr_logger, "conjunctions are not yet reachable via test_assignment()");
},
[&] (const column_value&) -> test_result {
on_internal_error(expr_logger, "column_values are not yet reachable via test_assignment()");
[&] (const column_value& col_val) -> test_result {
return expression_test_assignment(col_val.col->type, receiver);
},
[&] (const subscript&) -> test_result {
on_internal_error(expr_logger, "subscripts are not yet reachable via test_assignment()");
},
[&] (const token&) -> test_result {
on_internal_error(expr_logger, "tokens are not yet reachable via test_assignment()");
},
[&] (const unresolved_identifier&) -> test_result {
on_internal_error(expr_logger, "unresolved_identifiers are not yet reachable via test_assignment()");
},
@@ -1221,11 +1243,30 @@ static lw_shared_ptr<column_specification> get_lhs_receiver(const expression& pr
data_type tuple_type = tuple_type_impl::get_instance(tuple_types);
return make_lw_shared<column_specification>(schema.ks_name(), schema.cf_name(), std::move(identifier), std::move(tuple_type));
},
[&](const token& col_val) -> lw_shared_ptr<column_specification> {
return make_lw_shared<column_specification>(schema.ks_name(),
schema.cf_name(),
::make_shared<column_identifier>("partition key token", true),
dht::token::get_token_validator());
[&](const function_call& fun_call) -> lw_shared_ptr<column_specification> {
// In case of an expression like `token(p1, p2, p3) = ?` the receiver name should be "partition key token".
// This is required for compatibility with the Java driver, which breaks with a receiver name like "token(p1, p2, p3)".
if (is_partition_token_for_schema(fun_call, schema)) {
return make_lw_shared<column_specification>(
schema.ks_name(),
schema.cf_name(),
::make_shared<column_identifier>("partition key token", true),
long_type);
}
data_type return_type = std::visit(
overloaded_functor{
[](const shared_ptr<db::functions::function>& fun) -> data_type { return fun->return_type(); },
[&](const functions::function_name&) -> data_type {
on_internal_error(expr_logger,
format("get_lhs_receiver: unprepared function call {:debug}", fun_call));
}},
fun_call.func);
return make_lw_shared<column_specification>(
schema.ks_name(), schema.cf_name(),
::make_shared<column_identifier>(format("{:user}", fun_call), true),
return_type);
},
[](const auto& other) -> lw_shared_ptr<column_specification> {
on_internal_error(expr_logger, format("get_lhs_receiver: unexpected expression: {}", other));
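
The three-way classification in `expression_test_assignment` can be illustrated with a toy type model. This is a sketch under stated assumptions: `toy_type` and its `accepts_values_of` list are hypothetical and stand in for `data_type`, `underlying_type()` and `is_value_compatible_with()`. The compatibility relation used in the example is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class test_result { EXACT_MATCH, WEAKLY_ASSIGNABLE, NOT_ASSIGNABLE };

// Hypothetical type description: a name plus the names of other types
// whose values this type can hold without conversion.
struct toy_type {
    std::string name;
    std::vector<std::string> accepts_values_of;
};

// Same type: exact match. Receiver can hold the expression's values: weak
// match. Anything else cannot be assigned.
test_result classify(const toy_type& expr_type, const toy_type& receiver_type) {
    if (receiver_type.name == expr_type.name) {
        return test_result::EXACT_MATCH;
    }
    const auto& ok = receiver_type.accepts_values_of;
    if (std::find(ok.begin(), ok.end(), expr_type.name) != ok.end()) {
        return test_result::WEAKLY_ASSIGNABLE;
    }
    return test_result::NOT_ASSIGNABLE;
}

// Usage example with made-up types "wide", "narrow" and "other".
void demo_classify() {
    toy_type wide{"wide", {"narrow"}};
    toy_type narrow{"narrow", {}};
    toy_type other{"other", {}};
    assert(classify(wide, wide) == test_result::EXACT_MATCH);
    assert(classify(narrow, wide) == test_result::WEAKLY_ASSIGNABLE);
    assert(classify(other, wide) == test_result::NOT_ASSIGNABLE);
}
```

The relation is deliberately one-directional: a `narrow` value fits a `wide` receiver, but not the other way round.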


@@ -152,7 +152,10 @@ void preliminary_binop_vaidation_checks(const binary_operator& binop) {
}
}
if (is<token>(binop.lhs)) {
// Right now a token() on the LHS means that there's a partition token there.
// In the future, with a relaxed grammar, this might no longer be true and this check will have to be revisited.
// Moving the check after preparation would break tests and Cassandra compatibility.
if (is_token_function(binop.lhs)) {
if (binop.op == oper_t::IN) {
throw exceptions::invalid_request_exception("IN cannot be used with the token function");
}
@@ -214,9 +217,9 @@ binary_operator validate_and_prepare_new_restriction(const binary_operator& rest
}
validate_multi_column_relation(lhs_cols, prepared_binop.op);
} else if (auto lhs_token = as_if<token>(&prepared_binop.lhs)) {
} else if (is_token_function(prepared_binop.lhs)) {
// Token restriction
std::vector<const column_definition*> column_defs = to_column_definitions(lhs_token->args);
std::vector<const column_definition*> column_defs = to_column_definitions(as<function_call>(prepared_binop.lhs).args);
validate_token_relation(column_defs, prepared_binop.op, *schema);
} else {
// Anything else


@@ -202,9 +202,10 @@ std::optional<function_name> functions::used_by_user_function(const ut_name& use
}
lw_shared_ptr<column_specification>
functions::make_arg_spec(const sstring& receiver_ks, const sstring& receiver_cf,
functions::make_arg_spec(const sstring& receiver_ks, std::optional<const std::string_view> receiver_cf_opt,
const function& fun, size_t i) {
auto&& name = fmt::to_string(fun.name());
const std::string_view receiver_cf = receiver_cf_opt.has_value() ? *receiver_cf_opt : "<unknown_col_family>";
std::transform(name.begin(), name.end(), name.begin(), ::tolower);
return make_lw_shared<column_specification>(receiver_ks,
receiver_cf,
@@ -322,7 +323,7 @@ functions::get(data_dictionary::database db,
const function_name& name,
const std::vector<shared_ptr<assignment_testable>>& provided_args,
const sstring& receiver_ks,
const sstring& receiver_cf,
std::optional<const std::string_view> receiver_cf,
const column_specification* receiver) {
static const function_name TOKEN_FUNCTION_NAME = function_name::native_function("token");
@@ -332,7 +333,11 @@ functions::get(data_dictionary::database db,
if (name.has_keyspace()
? name == TOKEN_FUNCTION_NAME
: name.name == TOKEN_FUNCTION_NAME.name) {
auto fun = ::make_shared<token_fct>(db.find_schema(receiver_ks, receiver_cf));
if (!receiver_cf.has_value()) {
throw exceptions::invalid_request_exception("functions::get for token doesn't have a known column family");
}
auto fun = ::make_shared<token_fct>(db.find_schema(receiver_ks, *receiver_cf));
validate_types(db, keyspace, fun, provided_args, receiver_ks, receiver_cf);
return fun;
}
@@ -504,7 +509,7 @@ functions::validate_types(data_dictionary::database db,
shared_ptr<function> fun,
const std::vector<shared_ptr<assignment_testable>>& provided_args,
const sstring& receiver_ks,
const sstring& receiver_cf) {
std::optional<const std::string_view> receiver_cf) {
if (provided_args.size() != fun->arg_types().size()) {
throw exceptions::invalid_request_exception(
format("Invalid number of arguments in call to function {}: {:d} required but {:d} provided",
@@ -534,7 +539,7 @@ functions::match_arguments(data_dictionary::database db, const sstring& keyspace
shared_ptr<function> fun,
const std::vector<shared_ptr<assignment_testable>>& provided_args,
const sstring& receiver_ks,
const sstring& receiver_cf) {
std::optional<const std::string_view> receiver_cf) {
if (provided_args.size() != fun->arg_types().size()) {
return assignment_testable::test_result::NOT_ASSIGNABLE;
}
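
The switch from `const sstring&` to an optional column family name leads to two behaviours, sketched below in isolation. The helper names `arg_spec_table_name` and `table_for_token` are hypothetical, and `std::invalid_argument` stands in for `exceptions::invalid_request_exception`. The placeholder string is the one used in `make_arg_spec` above.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Argument specs only need a name for display, so a placeholder is
// acceptable when the column family is unknown.
std::string arg_spec_table_name(std::optional<std::string_view> cf_name) {
    return std::string(cf_name.value_or("<unknown_col_family>"));
}

// token() has to look up the table's partition key, so a missing column
// family is an error rather than something to paper over.
std::string table_for_token(std::optional<std::string_view> cf_name) {
    if (!cf_name.has_value()) {
        throw std::invalid_argument("token() requires a known column family");
    }
    return std::string(*cf_name);
}

// Usage example: the placeholder is used for arg specs, token() throws.
void demo_optional_cf() {
    assert(arg_spec_table_name(std::nullopt) == "<unknown_col_family>");
    assert(table_for_token(std::string_view("users")) == "users");
    bool threw = false;
    try {
        table_for_token(std::nullopt);
    } catch (const std::invalid_argument&) {
        threw = true;
    }
    assert(threw);
}
```

Most functions can be resolved from the keyspace alone, which is why the diff makes only the `token` path fail when the column family is missing.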


@@ -40,7 +40,7 @@ class functions {
private:
static std::unordered_multimap<function_name, shared_ptr<function>> init() noexcept;
public:
static lw_shared_ptr<column_specification> make_arg_spec(const sstring& receiver_ks, const sstring& receiver_cf,
static lw_shared_ptr<column_specification> make_arg_spec(const sstring& receiver_ks, std::optional<const std::string_view> receiver_cf,
const function& fun, size_t i);
public:
static shared_ptr<function> get(data_dictionary::database db,
@@ -48,7 +48,7 @@ public:
const function_name& name,
const std::vector<shared_ptr<assignment_testable>>& provided_args,
const sstring& receiver_ks,
const sstring& receiver_cf,
std::optional<const std::string_view> receiver_cf,
const column_specification* receiver = nullptr);
template <typename AssignmentTestablePtrRange>
static shared_ptr<function> get(data_dictionary::database db,
@@ -56,7 +56,7 @@ public:
const function_name& name,
AssignmentTestablePtrRange&& provided_args,
const sstring& receiver_ks,
const sstring& receiver_cf,
std::optional<const std::string_view> receiver_cf,
const column_specification* receiver = nullptr) {
const std::vector<shared_ptr<assignment_testable>> args(std::begin(provided_args), std::end(provided_args));
return get(db, keyspace, name, args, receiver_ks, receiver_cf, receiver);
@@ -87,12 +87,12 @@ private:
shared_ptr<function> fun,
const std::vector<shared_ptr<assignment_testable>>& provided_args,
const sstring& receiver_ks,
const sstring& receiver_cf);
std::optional<const std::string_view> receiver_cf);
static assignment_testable::test_result match_arguments(data_dictionary::database db, const sstring& keyspace,
shared_ptr<function> fun,
const std::vector<shared_ptr<assignment_testable>>& provided_args,
const sstring& receiver_ks,
const sstring& receiver_cf);
std::optional<const std::string_view> receiver_cf);
static bool type_equals(const std::vector<data_type>& t1, const std::vector<data_type>& t2);


@@ -32,9 +32,9 @@ operation::set_element::prepare(data_dictionary::database db, const sstring& key
using exceptions::invalid_request_exception;
auto rtype = dynamic_pointer_cast<const collection_type_impl>(receiver.type);
if (!rtype) {
throw invalid_request_exception(format("Invalid operation ({}) for non collection column {}", to_string(receiver), receiver.name()));
throw invalid_request_exception(format("Invalid operation ({}) for non collection column {}", to_string(receiver), receiver.name_as_text()));
} else if (!rtype->is_multi_cell()) {
throw invalid_request_exception(format("Invalid operation ({}) for frozen collection column {}", to_string(receiver), receiver.name()));
throw invalid_request_exception(format("Invalid operation ({}) for frozen collection column {}", to_string(receiver), receiver.name_as_text()));
}
if (rtype->get_kind() == abstract_type::kind::list) {
@@ -47,7 +47,7 @@ operation::set_element::prepare(data_dictionary::database db, const sstring& key
return make_shared<lists::setter_by_index>(receiver, std::move(idx), std::move(lval));
}
} else if (rtype->get_kind() == abstract_type::kind::set) {
throw invalid_request_exception(format("Invalid operation ({}) for set column {}", to_string(receiver), receiver.name()));
throw invalid_request_exception(format("Invalid operation ({}) for set column {}", to_string(receiver), receiver.name_as_text()));
} else if (rtype->get_kind() == abstract_type::kind::map) {
auto key = prepare_expression(_selector, db, keyspace, nullptr, maps::key_spec_of(*receiver.column_specification));
auto mval = prepare_expression(_value, db, keyspace, nullptr, maps::value_spec_of(*receiver.column_specification));
@@ -136,11 +136,11 @@ operation::addition::prepare(data_dictionary::database db, const sstring& keyspa
auto ctype = dynamic_pointer_cast<const collection_type_impl>(receiver.type);
if (!ctype) {
if (!receiver.is_counter()) {
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for non counter column {}", to_string(receiver), receiver.name()));
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for non counter column {}", to_string(receiver), receiver.name_as_text()));
}
return make_shared<constants::adder>(receiver, std::move(v));
} else if (!ctype->is_multi_cell()) {
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for frozen collection column {}", to_string(receiver), receiver.name()));
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for frozen collection column {}", to_string(receiver), receiver.name_as_text()));
}
if (ctype->get_kind() == abstract_type::kind::list) {
@@ -169,14 +169,14 @@ operation::subtraction::prepare(data_dictionary::database db, const sstring& key
auto ctype = dynamic_pointer_cast<const collection_type_impl>(receiver.type);
if (!ctype) {
if (!receiver.is_counter()) {
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for non counter column {}", to_string(receiver), receiver.name()));
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for non counter column {}", to_string(receiver), receiver.name_as_text()));
}
auto v = prepare_expression(_value, db, keyspace, nullptr, receiver.column_specification);
return make_shared<constants::subtracter>(receiver, std::move(v));
}
if (!ctype->is_multi_cell()) {
throw exceptions::invalid_request_exception(
format("Invalid operation ({}) for frozen collection column {}", to_string(receiver), receiver.name()));
format("Invalid operation ({}) for frozen collection column {}", to_string(receiver), receiver.name_as_text()));
}
if (ctype->get_kind() == abstract_type::kind::list) {
@@ -211,9 +211,9 @@ operation::prepend::prepare(data_dictionary::database db, const sstring& keyspac
auto v = prepare_expression(_value, db, keyspace, nullptr, receiver.column_specification);
if (!dynamic_cast<const list_type_impl*>(receiver.type.get())) {
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for non list column {}", to_string(receiver), receiver.name()));
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for non list column {}", to_string(receiver), receiver.name_as_text()));
} else if (!receiver.type->is_multi_cell()) {
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for frozen list column {}", to_string(receiver), receiver.name()));
throw exceptions::invalid_request_exception(format("Invalid operation ({}) for frozen list column {}", to_string(receiver), receiver.name_as_text()));
}
return make_shared<lists::prepender>(receiver, std::move(v));
@@ -296,8 +296,6 @@ operation::set_counter_value_from_tuple_list::prepare(data_dictionary::database
auto clock = value_cast<int64_t>(tuple[2]);
auto value = value_cast<int64_t>(tuple[3]);
using namespace std::rel_ops;
if (id <= last) {
throw marshal_exception(
format("invalid counter id order, {} <= {}",
@@ -343,9 +341,9 @@ operation::element_deletion::affected_column() const {
shared_ptr<operation>
operation::element_deletion::prepare(data_dictionary::database db, const sstring& keyspace, const column_definition& receiver) const {
if (!receiver.type->is_collection()) {
throw exceptions::invalid_request_exception(format("Invalid deletion operation for non collection column {}", receiver.name()));
throw exceptions::invalid_request_exception(format("Invalid deletion operation for non collection column {}", receiver.name_as_text()));
} else if (!receiver.type->is_multi_cell()) {
throw exceptions::invalid_request_exception(format("Invalid deletion operation for frozen collection column {}", receiver.name()));
throw exceptions::invalid_request_exception(format("Invalid deletion operation for frozen collection column {}", receiver.name_as_text()));
}
auto ctype = static_pointer_cast<const collection_type_impl>(receiver.type);
if (ctype->get_kind() == abstract_type::kind::list) {

View File

@@ -58,13 +58,7 @@ public:
return key.key().second;
}
bool operator==(const prepared_cache_key_type& other) const {
return _key == other._key;
}
bool operator!=(const prepared_cache_key_type& other) const {
return !(*this == other);
}
bool operator==(const prepared_cache_key_type& other) const = default;
};
class prepared_statements_cache {

View File

@@ -729,65 +729,17 @@ bool query_processor::has_more_results(::shared_ptr<cql3::internal_query_state>
return false;
}
future<> query_processor::for_each_cql_result(
::shared_ptr<cql3::internal_query_state> state,
std::function<stop_iteration(const cql3::untyped_result_set::row&)>&& f) {
return do_with(seastar::shared_ptr<bool>(), [f, this, state](auto& is_done) mutable {
is_done = seastar::make_shared<bool>(false);
auto stop_when = [is_done]() {
return *is_done;
};
auto do_resuls = [is_done, state, f, this]() mutable {
return this->execute_paged_internal(
state).then([is_done, state, f, this](::shared_ptr<cql3::untyped_result_set> msg) mutable {
if (msg->empty()) {
*is_done = true;
} else {
if (!this->has_more_results(state)) {
*is_done = true;
}
for (auto& row : *msg) {
if (f(row) == stop_iteration::yes) {
*is_done = true;
break;
}
}
}
});
};
return do_until(stop_when, do_resuls);
});
}
future<> query_processor::for_each_cql_result(
::shared_ptr<cql3::internal_query_state> state,
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set::row&)>&& f) {
// repeat can move the lambda's capture, so we need to hold f and it so the internal loop
// will be able to use it.
return do_with(noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set::row&)>(std::move(f)),
untyped_result_set::rows_type::const_iterator(),
[state, this](noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set::row&)>& f,
untyped_result_set::rows_type::const_iterator& it) mutable {
return repeat([state, &f, &it, this]() mutable {
return this->execute_paged_internal(state).then([state, &f, &it, this](::shared_ptr<cql3::untyped_result_set> msg) mutable {
it = msg->begin();
return repeat_until_value([&it, &f, msg, state, this]() mutable {
if (it == msg->end()) {
return make_ready_future<std::optional<stop_iteration>>(std::optional<stop_iteration>(!this->has_more_results(state)));
}
return f(*it).then([&it, msg](stop_iteration i) {
if (i == stop_iteration::yes) {
return std::optional<stop_iteration>(i);
}
++it;
return std::optional<stop_iteration>();
});
});
});
});
});
do {
auto msg = co_await execute_paged_internal(state);
for (auto& row : *msg) {
if ((co_await f(row)) == stop_iteration::yes) {
co_return;
}
}
} while (has_more_results(state));
}
future<::shared_ptr<untyped_result_set>>
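
The coroutine form of `for_each_cql_result` above reduces to a fetch-a-page, visit-each-row loop with an early exit. The sketch below shows the same control flow synchronously in plain C++, without Seastar; `paged_source`, `for_each_row` and `visit_until` are hypothetical names introduced only for illustration, and the real function awaits futures at the two points marked in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins; none of these types come from the patch.
enum class stop_iteration { no, yes };

struct paged_source {
    std::vector<std::vector<int>> pages;  // result rows, pre-split into pages
    std::size_t next = 0;

    // Returns the next page, or an empty page once exhausted
    // (the patch awaits execute_paged_internal() here).
    std::vector<int> fetch_page() {
        return next < pages.size() ? pages[next++] : std::vector<int>{};
    }
    bool has_more_results() const { return next < pages.size(); }
};

// Fetch a page, visit each row, and return as soon as the callback asks to stop.
inline void for_each_row(paged_source& src, const std::function<stop_iteration(int)>& f) {
    do {
        auto page = src.fetch_page();
        for (int row : page) {
            // The patch awaits the callback's future at this point.
            if (f(row) == stop_iteration::yes) {
                return;
            }
        }
    } while (src.has_more_results());
}

// Collects the rows visited before the callback requests a stop at `stop_at`.
inline std::vector<int> visit_until(paged_source src, int stop_at) {
    std::vector<int> seen;
    for_each_row(src, [&](int row) {
        seen.push_back(row);
        return row == stop_at ? stop_iteration::yes : stop_iteration::no;
    });
    return seen;
}
```

An empty first page is handled by the `do`/`while` shape: the inner loop body never runs and the condition decides whether another page is fetched.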
@@ -948,6 +900,9 @@ void query_processor::migration_subscriber::on_update_view(
const sstring& view_name, bool columns_changed) {
}
void query_processor::migration_subscriber::on_update_tablet_metadata() {
}
void query_processor::migration_subscriber::on_drop_keyspace(const sstring& ks_name) {
remove_invalid_prepared_statements(ks_name, std::nullopt);
}

View File

@@ -294,6 +294,8 @@ public:
* page_size - maximum page size
* f - a function to be run on each row of the query result,
* if the function returns stop_iteration::yes the iteration will stop
*
* \note This function is optimized for convenience, not performance.
*/
future<> query_internal(
const sstring& query_string,
@@ -310,6 +312,8 @@ public:
* query_string - the cql string, can contain placeholders
* f - a function to be run on each row of the query result,
* if the function returns stop_iteration::yes the iteration will stop
*
* \note This function is optimized for convenience, not performance.
*/
future<> query_internal(
const sstring& query_string,
@@ -324,6 +328,8 @@ public:
// and schema changes will not be announced to other nodes.
// Because of that, changing global schema state (e.g. modifying non-local tables,
// creating namespaces, etc) is explicitly forbidden via this interface.
//
// note: optimized for convenience, not performance.
future<::shared_ptr<untyped_result_set>> execute_internal(
const sstring& query_string,
db::consistency_level,
@@ -429,18 +435,15 @@ private:
/*!
* \brief run a query using paging
*
* \note Optimized for convenience, not performance.
*/
future<::shared_ptr<untyped_result_set>> execute_paged_internal(::shared_ptr<internal_query_state> state);
/*!
* \brief iterate over all results using paging
*/
future<> for_each_cql_result(
::shared_ptr<cql3::internal_query_state> state,
std::function<stop_iteration(const cql3::untyped_result_set_row&)>&& f);
/*!
* \brief iterate over all results using paging, accept a function that returns a future
*
* \note Optimized for convenience, not performance.
*/
future<> for_each_cql_result(
::shared_ptr<cql3::internal_query_state> state,
@@ -522,6 +525,7 @@ public:
virtual void on_update_function(const sstring& ks_name, const sstring& function_name) override;
virtual void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override;
virtual void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override;
virtual void on_update_tablet_metadata() override;
virtual void on_drop_keyspace(const sstring& ks_name) override;
virtual void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override;

View File

@@ -88,7 +88,8 @@ void with_current_binary_operator(
static std::vector<expr::expression> extract_partition_range(
const expr::expression& where_clause, schema_ptr schema) {
using namespace expr;
struct {
struct extract_partition_range_visitor {
schema_ptr table_schema;
std::optional<expression> tokens;
std::unordered_map<const column_definition*, expression> single_column;
const binary_operator* current_binary_operator = nullptr;
@@ -106,7 +107,11 @@ static std::vector<expr::expression> extract_partition_range(
current_binary_operator = nullptr;
}
void operator()(const token&) {
void operator()(const function_call& token_fun_call) {
if (!is_partition_token_for_schema(token_fun_call, *table_schema)) {
on_internal_error(rlogger, "extract_partition_range(function_call)");
}
with_current_binary_operator(*this, [&] (const binary_operator& b) {
if (tokens) {
tokens = make_conjunction(std::move(*tokens), b);
@@ -159,10 +164,6 @@ static std::vector<expr::expression> extract_partition_range(
on_internal_error(rlogger, "extract_partition_range(column_mutation_attribute)");
}
void operator()(const function_call&) {
on_internal_error(rlogger, "extract_partition_range(function_call)");
}
void operator()(const cast&) {
on_internal_error(rlogger, "extract_partition_range(cast)");
}
@@ -186,7 +187,12 @@ static std::vector<expr::expression> extract_partition_range(
void operator()(const usertype_constructor&) {
on_internal_error(rlogger, "extract_partition_range(usertype_constructor)");
}
} v;
};
extract_partition_range_visitor v {
.table_schema = schema
};
expr::visit(v, where_clause);
if (v.tokens) {
return {std::move(*v.tokens)};
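
Naming the visitor struct, as this hunk does, lets it carry context (here the table schema) that is set with a designated initializer before `expr::visit` runs. A reduced sketch of that pattern using `std::variant` and `std::visit` follows; `column`, `call`, `counting_visitor` and `count_matches` are hypothetical and only mirror the shape of the change, not the real expression types.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical expression nodes.
struct column { std::string name; };
struct call { std::string function; };
using node = std::variant<column, call>;

// A named visitor can hold state that its overloads consult.
struct counting_visitor {
    std::string expected_function;  // context supplied at construction
    int matches = 0;

    void operator()(const column&) {}
    void operator()(const call& c) {
        if (c.function == expected_function) {
            ++matches;
        }
    }
};

inline int count_matches(const std::vector<node>& nodes, const std::string& fn) {
    // C++20 designated initializer; `matches` keeps its default of 0.
    counting_visitor v { .expected_function = fn };
    for (const auto& n : nodes) {
        std::visit(v, n);
    }
    return v.matches;
}
```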
@@ -207,6 +213,7 @@ static std::vector<expr::expression> extract_clustering_prefix_restrictions(
/// Collects all clustering-column restrictions from an expression. Presumes the expression only uses
/// conjunction to combine subexpressions.
struct visitor {
schema_ptr table_schema;
std::vector<expression> multi; ///< All multi-column restrictions.
/// All single-clustering-column restrictions, grouped by column. Each value is either an atom or a
/// conjunction of atoms.
@@ -266,8 +273,13 @@ static std::vector<expr::expression> extract_clustering_prefix_restrictions(
});
}
void operator()(const token&) {
// A token cannot be a clustering prefix restriction
void operator()(const function_call& fun_call) {
if (is_partition_token_for_schema(fun_call, *table_schema)) {
// A token cannot be a clustering prefix restriction
return;
}
on_internal_error(rlogger, "extract_clustering_prefix_restrictions(function_call)");
}
void operator()(const constant&) {}
@@ -280,10 +292,6 @@ static std::vector<expr::expression> extract_clustering_prefix_restrictions(
on_internal_error(rlogger, "extract_clustering_prefix_restrictions(column_mutation_attribute)");
}
void operator()(const function_call&) {
on_internal_error(rlogger, "extract_clustering_prefix_restrictions(function_call)");
}
void operator()(const cast&) {
on_internal_error(rlogger, "extract_clustering_prefix_restrictions(cast)");
}
@@ -307,7 +315,11 @@ static std::vector<expr::expression> extract_clustering_prefix_restrictions(
void operator()(const usertype_constructor&) {
on_internal_error(rlogger, "extract_clustering_prefix_restrictions(usertype_constructor)");
}
} v;
};
visitor v {
.table_schema = schema
};
expr::visit(v, where_clause);
if (!v.multi.empty()) {
@@ -358,7 +370,7 @@ statement_restrictions::statement_restrictions(data_dictionary::database db,
}
}
if (_where.has_value()) {
if (!has_token(_partition_key_restrictions)) {
if (!has_token_restrictions()) {
_single_column_partition_key_restrictions = expr::get_single_column_restrictions_map(_partition_key_restrictions);
}
if (!expr::contains_multi_column_restriction(_clustering_columns_restrictions)) {
@@ -488,7 +500,7 @@ std::pair<std::optional<secondary_index::index>, expr::expression> statement_res
for (const auto& index : sim.list_indexes()) {
auto cdef = _schema->get_column_definition(to_bytes(index.target_column()));
for (const expr::expression& restriction : index_restrictions()) {
if (has_token(restriction) || contains_multi_column_restriction(restriction)) {
if (has_partition_token(restriction, *_schema) || contains_multi_column_restriction(restriction)) {
continue;
}
@@ -516,7 +528,8 @@ bool statement_restrictions::has_eq_restriction_on_column(const column_definitio
std::vector<const column_definition*> statement_restrictions::get_column_defs_for_filtering(data_dictionary::database db) const {
std::vector<const column_definition*> column_defs_for_filtering;
if (need_filtering()) {
auto& sim = db.find_column_family(_schema).get_index_manager();
auto cf = db.find_column_family(_schema);
auto& sim = cf.get_index_manager();
auto opt_idx = std::get<0>(find_idx(sim));
auto column_uses_indexing = [&opt_idx] (const column_definition* cdef, const expr::expression* single_col_restr) {
return opt_idx && single_col_restr && is_supported_by(*single_col_restr, *opt_idx);
@@ -566,7 +579,7 @@ void statement_restrictions::add_restriction(const expr::binary_operator& restr,
} else if (expr::is_multi_column(restr)) {
// Multi column restrictions are only allowed on clustering columns
add_multi_column_clustering_key_restriction(restr);
} else if (has_token(restr)) {
} else if (has_partition_token(restr, *_schema)) {
// Token always restricts the partition key
add_token_partition_key_restriction(restr);
} else if (expr::is_single_column_restriction(restr)) {
@@ -610,7 +623,7 @@ void statement_restrictions::add_single_column_parition_key_restriction(const ex
"Only EQ and IN relation are supported on the partition key "
"(unless you use the token() function or allow filtering)");
}
if (has_token(_partition_key_restrictions)) {
if (has_token_restrictions()) {
throw exceptions::invalid_request_exception(
format("Columns \"{}\" cannot be restricted by both a normal relation and a token relation",
fmt::join(expr::get_sorted_column_defs(_partition_key_restrictions) |
@@ -625,7 +638,7 @@ void statement_restrictions::add_single_column_parition_key_restriction(const ex
}
void statement_restrictions::add_token_partition_key_restriction(const expr::binary_operator& restr) {
if (!partition_key_restrictions_is_empty() && !has_token(_partition_key_restrictions)) {
if (!partition_key_restrictions_is_empty() && !has_token_restrictions()) {
throw exceptions::invalid_request_exception(
format("Columns \"{}\" cannot be restricted by both a normal relation and a token relation",
fmt::join(expr::get_sorted_column_defs(_partition_key_restrictions) |
@@ -736,7 +749,7 @@ void statement_restrictions::process_partition_key_restrictions(bool for_view, b
// - Is it queriable without 2ndary index, which is always more efficient
// If a component of the partition key is restricted by a relation, all preceding
// components must have a EQ. Only the last partition key component can be in IN relation.
if (has_token(_partition_key_restrictions)) {
if (has_token_restrictions()) {
_is_key_range = true;
} else if (expr::is_empty_restriction(_partition_key_restrictions)) {
_is_key_range = true;
@@ -775,7 +788,7 @@ size_t statement_restrictions::partition_key_restrictions_size() const {
bool statement_restrictions::pk_restrictions_need_filtering() const {
return !expr::is_empty_restriction(_partition_key_restrictions)
&& !has_token(_partition_key_restrictions)
&& !has_token_restrictions()
&& (has_partition_key_unrestricted_components() || expr::has_slice_or_needs_filtering(_partition_key_restrictions));
}
@@ -886,7 +899,7 @@ bounds_slice statement_restrictions::get_clustering_slice() const {
bool statement_restrictions::parition_key_restrictions_have_supporting_index(const secondary_index::secondary_index_manager& index_manager,
expr::allow_local_index allow_local) const {
// Token restrictions can't be supported by an index
if (has_token(_partition_key_restrictions)) {
if (has_token_restrictions()) {
return false;
}
@@ -926,8 +939,10 @@ namespace {
using namespace expr;
/// Computes partition-key ranges from token atoms in ex.
dht::partition_range_vector partition_ranges_from_token(const expr::expression& ex, const query_options& options) {
auto values = possible_lhs_values(nullptr, ex, options);
dht::partition_range_vector partition_ranges_from_token(const expr::expression& ex,
const query_options& options,
const schema& table_schema) {
auto values = possible_partition_token_values(ex, options, table_schema);
if (values == expr::value_set(expr::value_list{})) {
return {};
}
@@ -975,7 +990,7 @@ dht::partition_range_vector partition_ranges_from_singles(
for (const auto& e : expressions) {
if (const auto arbitrary_binop = find_binop(e, [] (const binary_operator&) { return true; })) {
if (auto cv = expr::as_if<expr::column_value>(&arbitrary_binop->lhs)) {
const value_set vals = possible_lhs_values(cv->col, e, options);
const value_set vals = possible_column_values(cv->col, e, options);
if (auto lst = std::get_if<value_list>(&vals)) {
if (lst->empty()) {
return {};
@@ -1004,7 +1019,7 @@ dht::partition_range_vector partition_ranges_from_EQs(
std::vector<managed_bytes> pk_value(schema.partition_key_size());
for (const auto& e : eq_expressions) {
const auto col = expr::get_subscripted_column(find(e, oper_t::EQ)->lhs).col;
const auto vals = std::get<value_list>(possible_lhs_values(col, e, options));
const auto vals = std::get<value_list>(possible_column_values(col, e, options));
if (vals.empty()) { // Case of C=1 AND C=2.
return {};
}
@@ -1019,13 +1034,13 @@ dht::partition_range_vector statement_restrictions::get_partition_key_ranges(con
if (_partition_range_restrictions.empty()) {
return {dht::partition_range::make_open_ended_both_sides()};
}
if (has_token(_partition_range_restrictions[0])) {
if (has_partition_token(_partition_range_restrictions[0], *_schema)) {
if (_partition_range_restrictions.size() != 1) {
on_internal_error(
rlogger,
format("Unexpected size of token restrictions: {}", _partition_range_restrictions.size()));
}
return partition_ranges_from_token(_partition_range_restrictions[0], options);
return partition_ranges_from_token(_partition_range_restrictions[0], options, *_schema);
} else if (_partition_range_is_simple) {
// Special case to avoid extra allocations required for a Cartesian product.
return partition_ranges_from_EQs(_partition_range_restrictions, options, *_schema);
@@ -1233,10 +1248,6 @@ struct multi_column_range_accumulator {
on_internal_error(rlogger, "Subscript encountered outside binary operator");
}
void operator()(const token&) {
on_internal_error(rlogger, "Token encountered outside binary operator");
}
void operator()(const unresolved_identifier&) {
on_internal_error(rlogger, "Unresolved identifier encountered outside binary operator");
}
@@ -1340,7 +1351,7 @@ std::vector<query::clustering_range> get_single_column_clustering_bounds(
size_t product_size = 1;
std::vector<std::vector<managed_bytes>> prior_column_values; // Equality values of columns seen so far.
for (size_t i = 0; i < single_column_restrictions.size(); ++i) {
auto values = possible_lhs_values(
auto values = possible_column_values(
&schema.clustering_column_at(i), // This should be the LHS of restrictions[i].
single_column_restrictions[i],
options);
@@ -1410,10 +1421,10 @@ static std::vector<query::clustering_range> get_index_v1_token_range_clustering_
const column_definition& token_column,
const expression& token_restriction) {
// A workaround in order to make possible_lhs_values work properly.
// possible_lhs_values looks at the column type and uses this type's comparator.
// A workaround in order to make possible_column_values work properly.
// possible_column_values looks at the column type and uses this type's comparator.
// This is a problem because when using blob's comparator, -4 is greater than 4.
// This makes possible_lhs_values think that an expression like token(p) > -4 and token(p) < 4
// This makes possible_column_values think that an expression like token(p) > -4 and token(p) < 4
// is impossible to fulfill.
// Create a fake token column with the type set to bigint, translate the restriction to use this column
// and use this restriction to calculate possible lhs values.
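
The comment above says that under a blob comparator -4 sorts after 4. The sketch below reproduces that effect, assuming the values are serialized as 8-byte big-endian two's complement and compared bytewise as unsigned bytes; the helper names are hypothetical and this is not the comparator the codebase uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Serialize as 8 big-endian bytes (assumed encoding for this illustration).
inline std::array<unsigned char, 8> to_big_endian(std::int64_t v) {
    std::array<unsigned char, 8> out{};
    auto u = static_cast<std::uint64_t>(v);
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<unsigned char>(u & 0xff);
        u >>= 8;
    }
    return out;
}

// Bytewise, lexicographic comparison of the serialized forms,
// the way a blob-like type would order them.
inline bool blob_less(std::int64_t a, std::int64_t b) {
    return to_big_endian(a) < to_big_endian(b);
}

// True when the open interval (lo, hi) looks empty under bytewise ordering.
inline bool range_looks_empty_bytewise(std::int64_t lo, std::int64_t hi) {
    return !blob_less(lo, hi);
}
```

With this ordering `-4` serializes to `ff ff ff ff ff ff ff fc`, which sorts after `00 00 00 00 00 00 00 04`, so the range `(-4, 4)` appears empty. That is the situation the bigint-typed fake column works around.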
@@ -1422,7 +1433,7 @@ static std::vector<query::clustering_range> get_index_v1_token_range_clustering_
expression new_token_restrictions = replace_column_def(token_restriction, &token_column_bigint);
std::variant<value_list, nonwrapping_range<managed_bytes>> values =
possible_lhs_values(&token_column_bigint, new_token_restrictions, options);
possible_column_values(&token_column_bigint, new_token_restrictions, options);
return std::visit(overloaded_functor {
[](const value_list& list) {
@@ -1690,7 +1701,7 @@ bool token_known(const statement_restrictions& r) {
bool statement_restrictions::need_filtering() const {
using namespace expr;
if (_uses_secondary_indexing && has_token(_partition_key_restrictions)) {
if (_uses_secondary_indexing && has_token_restrictions()) {
// If there is a token(p1, p2) restriction, no p1, p2 restrictions are allowed in the query.
// All other restrictions must be on clustering or regular columns.
int64_t non_pk_restrictions_count = clustering_columns_restrictions_size();
@@ -1787,11 +1798,11 @@ void statement_restrictions::prepare_indexed_global(const schema& idx_tbl_schema
const column_definition* token_column = &idx_tbl_schema.clustering_column_at(0);
if (has_token(_partition_key_restrictions)) {
if (has_token_restrictions()) {
// When there is a token(p1, p2) >/</= ? restriction, it is not allowed to have restrictions on p1 or p2.
// This means that p1 and p2 can have many different values (token is a hash, can have collisions).
// Clustering prefix ends after token_restriction, all further restrictions have to be filtered.
expr::expression token_restriction = replace_token(_partition_key_restrictions, token_column);
expr::expression token_restriction = replace_partition_token(_partition_key_restrictions, token_column, *_schema);
_idx_tbl_ck_prefix = std::vector{std::move(token_restriction)};
return;
@@ -1899,7 +1910,7 @@ std::vector<query::clustering_range> statement_restrictions::get_global_index_cl
std::vector<managed_bytes> pk_value(_schema->partition_key_size());
for (const auto& e : _partition_range_restrictions) {
const auto col = expr::as<column_value>(find(e, oper_t::EQ)->lhs).col;
const auto vals = std::get<value_list>(possible_lhs_values(col, e, options));
const auto vals = std::get<value_list>(possible_column_values(col, e, options));
if (vals.empty()) { // Case of C=1 AND C=2.
return {};
}

View File

@@ -181,7 +181,7 @@ public:
}
bool has_token_restrictions() const {
return has_token(_partition_key_restrictions);
return has_partition_token(_partition_key_restrictions, *_schema);
}
// Checks whether the given column has an EQ restriction.
@@ -478,7 +478,7 @@ public:
// If token restrictions are present in an indexed query, then all other restrictions need to be filtered.
// A single token restriction can have multiple matching partition key values.
// Because of this we can't create a clustering prefix with more than token restriction.
|| (_uses_secondary_indexing && has_token(_partition_key_restrictions));
|| (_uses_secondary_indexing && has_token_restrictions());
}
bool clustering_key_restrictions_need_filtering() const;

View File

@@ -10,6 +10,7 @@
#include "selection/selection.hh"
#include "stats.hh"
#include "utils/buffer_view-to-managed_bytes_view.hh"
namespace cql3 {
class untyped_result_set;
@@ -34,10 +35,10 @@ private:
private:
void accept_cell_value(const column_definition& def, query::result_row_view::iterator_type& i) {
if (def.is_multi_cell()) {
_visitor.accept_value(i.next_collection_cell());
_visitor.accept_value(utils::buffer_view_to_managed_bytes_view(i.next_collection_cell()));
} else {
auto cell = i.next_atomic_cell();
_visitor.accept_value(cell ? std::optional<query::result_bytes_view>(cell->value()) : std::optional<query::result_bytes_view>());
_visitor.accept_value(cell ? utils::buffer_view_to_managed_bytes_view(cell->value()) : managed_bytes_view_opt());
}
}
public:
@@ -65,11 +66,11 @@ private:
for (auto&& def : _selection.get_columns()) {
switch (def->kind) {
case column_kind::partition_key:
_visitor.accept_value(query::result_bytes_view(bytes_view(_partition_key[def->component_index()])));
_visitor.accept_value(bytes_view(_partition_key[def->component_index()]));
break;
case column_kind::clustering_key:
if (_clustering_key.size() > def->component_index()) {
_visitor.accept_value(query::result_bytes_view(bytes_view(_clustering_key[def->component_index()])));
_visitor.accept_value(bytes_view(_clustering_key[def->component_index()]));
} else {
_visitor.accept_value(std::nullopt);
}
@@ -92,7 +93,7 @@ private:
auto static_row_iterator = static_row.iterator();
for (auto&& def : _selection.get_columns()) {
if (def->is_partition_key()) {
_visitor.accept_value(query::result_bytes_view(bytes_view(_partition_key[def->component_index()])));
_visitor.accept_value(bytes_view(_partition_key[def->component_index()]));
} else if (def->is_static()) {
accept_cell_value(*def, static_row_iterator);
} else {

View File

@@ -113,14 +113,23 @@ bool result_set::empty() const {
return _rows.empty();
}
void result_set::add_row(std::vector<bytes_opt> row) {
void result_set::add_row(std::vector<managed_bytes_opt> row) {
assert(row.size() == _metadata->value_count());
_rows.emplace_back(std::move(row));
}
void result_set::add_column_value(bytes_opt value) {
void result_set::add_row(std::vector<bytes_opt> row) {
row_type new_row;
new_row.reserve(row.size());
for (auto& bo : row) {
new_row.emplace_back(bo ? managed_bytes_opt(*bo) : managed_bytes_opt());
}
add_row(std::move(new_row));
}
void result_set::add_column_value(managed_bytes_opt value) {
if (_rows.empty() || _rows.back().size() == _metadata->value_count()) {
std::vector<bytes_opt> row;
std::vector<managed_bytes_opt> row;
row.reserve(_metadata->value_count());
_rows.emplace_back(std::move(row));
}
@@ -128,6 +137,10 @@ void result_set::add_column_value(bytes_opt value) {
_rows.back().emplace_back(std::move(value));
}
void result_set::add_column_value(bytes_opt value) {
add_column_value(to_managed_bytes_opt(value));
}
void result_set::reverse() {
std::reverse(_rows.begin(), _rows.end());
}
@@ -146,7 +159,7 @@ const metadata& result_set::get_metadata() const {
return *_metadata;
}
const utils::chunked_vector<std::vector<bytes_opt>>& result_set::rows() const {
const utils::chunked_vector<std::vector<managed_bytes_opt>>& result_set::rows() const {
return _rows;
}

View File

@@ -129,14 +129,14 @@ public:
};
template<typename Visitor>
concept ResultVisitor = requires(Visitor& visitor) {
concept ResultVisitor = requires(Visitor& visitor, managed_bytes_view_opt val) {
visitor.start_row();
visitor.accept_value(std::optional<query::result_bytes_view>());
visitor.accept_value(std::move(val));
visitor.end_row();
};
class result_set {
using col_type = bytes_opt;
using col_type = managed_bytes_opt;
using row_type = std::vector<col_type>;
using rows_type = utils::chunked_vector<row_type>;
@@ -157,8 +157,10 @@ public:
bool empty() const;
void add_row(row_type row);
void add_row(std::vector<bytes_opt> row);
void add_column_value(col_type value);
void add_column_value(bytes_opt value);
void reverse();
@@ -187,7 +189,7 @@ public:
visitor.start_row();
for (auto i = 0u; i < column_count; i++) {
auto& cell = row[i];
visitor.accept_value(cell ? std::optional<query::result_bytes_view>(*cell) : std::optional<query::result_bytes_view>());
visitor.accept_value(cell ? managed_bytes_view_opt(*cell) : managed_bytes_view_opt());
}
visitor.end_row();
}
@@ -204,12 +206,12 @@ public:
: _result(std::move(mtd)) { }
void start_row() { }
void accept_value(std::optional<query::result_bytes_view> value) {
void accept_value(managed_bytes_view_opt value) {
if (!value) {
_current_row.emplace_back();
return;
}
_current_row.emplace_back(value->linearize());
_current_row.emplace_back(value);
}
void end_row() {
_result.add_row(std::exchange(_current_row, { }));

View File

@@ -31,16 +31,17 @@ public:
for (size_t i = 0; i < m; ++i) {
auto&& s = _arg_selectors[i];
s->add_input(rs);
_args[i + 1] = s->get_output();
_args[i + 1] = to_bytes_opt(s->get_output());
s->reset();
}
_accumulator = _aggregate.aggregation_function->execute(_args);
}
virtual bytes_opt get_output() override {
return _aggregate.state_to_result_function
virtual managed_bytes_opt get_output() override {
return to_managed_bytes_opt(
_aggregate.state_to_result_function
? _aggregate.state_to_result_function->execute(std::span(&_accumulator, 1))
: std::move(_accumulator);
: std::move(_accumulator));
}
virtual void reset() override {

View File

@@ -62,12 +62,12 @@ public:
_selected->add_input(rs);
}
virtual bytes_opt get_output() override {
virtual managed_bytes_opt get_output() override {
auto&& value = _selected->get_output();
if (!value) {
return std::nullopt;
}
return get_nth_tuple_element(single_fragmented_view(*value), _field);
return get_nth_tuple_element(managed_bytes_view(*value), _field);
}
virtual data_type get_type() const override {

@@ -38,14 +38,14 @@ public:
virtual void reset() override {
}
virtual bytes_opt get_output() override {
virtual managed_bytes_opt get_output() override {
size_t m = _arg_selectors.size();
for (size_t i = 0; i < m; ++i) {
auto&& s = _arg_selectors[i];
_args[i] = s->get_output();
_args[i] = to_bytes_opt(s->get_output());
s->reset();
}
return fun()->execute(_args);
return to_managed_bytes_opt(fun()->execute(_args));
}
virtual bool requires_thread() const override;

@@ -173,18 +173,6 @@ prepare_selectable(const schema& s, const expr::expression& raw_selectable) {
[&] (const expr::subscript& sub) -> shared_ptr<selectable> {
on_internal_error(slogger, "no way to express 'SELECT a[b]' in the grammar yet");
},
[&] (const expr::token& tok) -> shared_ptr<selectable> {
// expr::token implicitly takes the partition key as arguments, but
// the selectable equivalent (with_function) needs explicit arguments,
// so construct them here.
auto name = functions::function_name("system", "token");
auto args = boost::copy_range<std::vector<shared_ptr<selectable>>>(
s.partition_key_columns()
| boost::adaptors::transformed([&] (const column_definition& cdef) {
return ::make_shared<selectable_column>(column_identifier(cdef.name(), cdef.name_as_text()));
}));
return ::make_shared<selectable::with_function>(std::move(name), std::move(args));
},
[&] (const expr::unresolved_identifier& ui) -> shared_ptr<selectable> {
return make_shared<selectable_column>(*ui.ident->prepare(s));
},
@@ -260,11 +248,6 @@ selectable_processes_selection(const expr::expression& raw_selectable) {
// so bridge them.
return false;
},
[&] (const expr::token&) -> bool {
// Arguably, should return false, because it only processes the partition key.
// But selectable::with_function considers it true now, so return that.
return true;
},
[&] (const expr::unresolved_identifier& ui) -> bool {
return ui.ident->processes_selection();
},

@@ -122,7 +122,7 @@ public:
protected:
class simple_selectors : public selectors {
private:
std::vector<bytes_opt> _current;
std::vector<managed_bytes_opt> _current;
bool _first = true; ///< Whether the next row we receive is the first in its group.
public:
virtual void reset() override {
@@ -132,7 +132,7 @@ protected:
virtual bool requires_thread() const override { return false; }
virtual std::vector<bytes_opt> get_output_row() override {
virtual std::vector<managed_bytes_opt> get_output_row() override {
return std::move(_current);
}
@@ -234,8 +234,8 @@ protected:
return _factories->does_aggregation();
}
virtual std::vector<bytes_opt> get_output_row() override {
std::vector<bytes_opt> output_row;
virtual std::vector<managed_bytes_opt> get_output_row() override {
std::vector<managed_bytes_opt> output_row;
output_row.reserve(_selectors.size());
for (auto&& s : _selectors) {
output_row.emplace_back(s->get_output());

@@ -52,7 +52,7 @@ public:
*/
virtual void add_input_row(result_set_builder& rs) = 0;
virtual std::vector<bytes_opt> get_output_row() = 0;
virtual std::vector<managed_bytes_opt> get_output_row() = 0;
virtual void reset() = 0;
};
@@ -189,10 +189,10 @@ private:
std::unique_ptr<result_set> _result_set;
std::unique_ptr<selectors> _selectors;
const std::vector<size_t> _group_by_cell_indices; ///< Indices in \c current of cells holding GROUP BY values.
std::vector<bytes_opt> _last_group; ///< Previous row's group: all of GROUP BY column values.
std::vector<managed_bytes_opt> _last_group; ///< Previous row's group: all of GROUP BY column values.
bool _group_began; ///< Whether a group began being formed.
public:
std::optional<std::vector<bytes_opt>> current;
std::optional<std::vector<managed_bytes_opt>> current;
private:
std::vector<api::timestamp_type> _timestamps;
std::vector<int32_t> _ttls;

@@ -49,7 +49,7 @@ public:
* @return the selector output
* @throws InvalidRequestException if a problem occurs while computing the output value
*/
virtual bytes_opt get_output() = 0;
virtual managed_bytes_opt get_output() = 0;
/**
* Returns the <code>selector</code> output type.

@@ -49,7 +49,7 @@ private:
const sstring _column_name;
const uint32_t _idx;
data_type _type;
bytes_opt _current;
managed_bytes_opt _current;
bool _first; ///< Whether the next row we receive is the first in its group.
public:
static ::shared_ptr<factory> new_factory(const sstring& column_name, uint32_t idx, data_type type) {
@@ -74,7 +74,7 @@ public:
}
}
virtual bytes_opt get_output() override {
virtual managed_bytes_opt get_output() override {
return std::move(_current);
}

@@ -21,7 +21,7 @@ class writetime_or_ttl_selector : public selector {
sstring _column_name;
int _idx;
bool _is_writetime;
bytes_opt _current;
managed_bytes_opt _current;
public:
static shared_ptr<selector::factory> new_factory(sstring column_name, int idx, bool is_writetime) {
class wtots_factory : public selector::factory {
@@ -60,25 +60,27 @@ public:
if (_is_writetime) {
int64_t ts = rs.timestamp_of(_idx);
if (ts != api::missing_timestamp) {
_current = bytes(bytes::initialized_later(), 8);
auto i = _current->begin();
auto tmp = bytes(bytes::initialized_later(), 8);
auto i = tmp.begin();
serialize_int64(i, ts);
_current = managed_bytes(tmp);
} else {
_current = std::nullopt;
}
} else {
int ttl = rs.ttl_of(_idx);
if (ttl > 0) {
_current = bytes(bytes::initialized_later(), 4);
auto i = _current->begin();
auto tmp = bytes(bytes::initialized_later(), 4);
auto i = tmp.begin();
serialize_int32(i, ttl);
_current = managed_bytes(tmp);
} else {
_current = std::nullopt;
}
}
}
virtual bytes_opt get_output() override {
virtual managed_bytes_opt get_output() override {
return _current;
}

@@ -60,14 +60,14 @@ future<std::vector<mutation>> alter_type_statement::prepare_announcement_mutatio
auto to_update = all_types.find(_name.get_user_type_name());
// Shouldn't happen, unless we race with a drop
if (to_update == all_types.end()) {
throw exceptions::invalid_request_exception(format("No user type named {} exists.", _name.to_string()));
throw exceptions::invalid_request_exception(format("No user type named {} exists.", _name.to_cql_string()));
}
for (auto&& schema : ks.metadata()->cf_meta_data() | boost::adaptors::map_values) {
for (auto&& column : schema->partition_key_columns()) {
if (column.type->references_user_type(_name.get_keyspace(), _name.get_user_type_name())) {
throw exceptions::invalid_request_exception(format("Cannot add new field to type {} because it is used in the partition key column {} of table {}.{}",
_name.to_string(), column.name_as_text(), schema->ks_name(), schema->cf_name()));
_name.to_cql_string(), column.name_as_text(), schema->ks_name(), schema->cf_name()));
}
}
}
@@ -134,7 +134,7 @@ user_type alter_type_statement::add_or_alter::do_add(data_dictionary::database d
{
if (to_update->idx_of_field(_field_name->name())) {
throw exceptions::invalid_request_exception(format("Cannot add new field {} to type {}: a field of the same name already exists",
_field_name->to_string(), _name.to_string()));
_field_name->to_string(), _name.to_cql_string()));
}
if (to_update->size() == max_udt_fields) {
@@ -147,7 +147,7 @@ user_type alter_type_statement::add_or_alter::do_add(data_dictionary::database d
auto&& add_type = _field_type->prepare(db, keyspace()).get_type();
if (add_type->references_user_type(to_update->_keyspace, to_update->_name)) {
throw exceptions::invalid_request_exception(format("Cannot add new field {} of type {} to type {} as this would create a circular reference",
*_field_name, *_field_type, _name.to_string()));
*_field_name, *_field_type, _name.to_cql_string()));
}
new_types.push_back(std::move(add_type));
return user_type_impl::get_instance(to_update->_keyspace, to_update->_name, std::move(new_names), std::move(new_types), to_update->is_multi_cell());
@@ -157,7 +157,7 @@ user_type alter_type_statement::add_or_alter::do_alter(data_dictionary::database
{
auto idx = to_update->idx_of_field(_field_name->name());
if (!idx) {
throw exceptions::invalid_request_exception(format("Unknown field {} in type {}", _field_name->to_string(), _name.to_string()));
throw exceptions::invalid_request_exception(format("Unknown field {} in type {}", _field_name->to_string(), _name.to_cql_string()));
}
auto previous = to_update->field_types()[*idx];
@@ -194,7 +194,7 @@ user_type alter_type_statement::renames::make_updated_type(data_dictionary::data
auto&& from = rename.first;
auto idx = to_update->idx_of_field(from->name());
if (!idx) {
throw exceptions::invalid_request_exception(format("Unknown field {} in type {}", from->to_string(), _name.to_string()));
throw exceptions::invalid_request_exception(format("Unknown field {} in type {}", from->to_string(), _name.to_cql_string()));
}
new_names[*idx] = rename.second->name();
}

@@ -65,7 +65,8 @@ void cql3::statements::authorization_statement::maybe_correct_resource(auth::res
// This is an "ALL FUNCTIONS IN KEYSPACE" resource.
return;
}
const auto& utm = qp.db().find_keyspace(*keyspace).user_types();
auto ks = qp.db().find_keyspace(*keyspace);
const auto& utm = ks.user_types();
auto function_name = *functions_view.function_name();
auto function_args = functions_view.function_args();
std::vector<data_type> parsed_types;
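The hunk above stores the keyspace in a named local before taking a reference to its user types. One plausible reason, assuming `find_keyspace` returns by value (its signature is not shown in this diff), is the lifetime of the temporary. The sketch below uses hypothetical stand-ins (`keyspace_handle`, `type_map`, a parameterless `find_keyspace`) and only counts live objects; it never reads through the dangling reference:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct type_map {
    std::vector<std::string> names;
};

struct keyspace_handle {
    static inline int alive = 0;  // number of live handles
    type_map types;
    keyspace_handle() : types{std::vector<std::string>{"my_udt"}} { ++alive; }
    keyspace_handle(const keyspace_handle& o) : types(o.types) { ++alive; }
    ~keyspace_handle() { --alive; }
    const type_map& user_types() const { return types; }
};

// Returns by value, so every call yields a temporary.
keyspace_handle find_keyspace() { return keyspace_handle{}; }

// Chained call: the temporary is destroyed at the end of the full
// expression, so the reference must not be used afterwards.
int live_handles_after_chained_call() {
    const type_map& utm = find_keyspace().user_types();
    (void)utm;  // deliberately not dereferenced
    return keyspace_handle::alive;
}

// Named local: the owner outlives the reference.
int live_handles_with_named_local() {
    auto ks = find_keyspace();
    const type_map& utm = ks.user_types();
    assert(utm.names.size() == 1);  // safe, ks is still in scope
    return keyspace_handle::alive;
}
```

If the real handle only points at shared state, the original code may have been safe already; the named local is then simply the more defensive form.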

@@ -90,6 +90,22 @@ create_aggregate_statement::prepare_schema_mutations(query_processor& qp, api::t
co_return std::make_pair(std::move(ret), std::move(m));
}
seastar::future<> create_aggregate_statement::check_access(query_processor &qp, const service::client_state &state) const {
co_await create_function_statement_base::check_access(qp, state);
auto&& ks = _name.has_keyspace() ? _name.keyspace : state.get_keyspace();
create_arg_types(qp);
std::vector<data_type> sfunc_args = _arg_types;
data_type stype = prepare_type(qp, *_stype);
sfunc_args.insert(sfunc_args.begin(), stype);
co_await state.has_function_access(qp.db(), ks, auth::encode_signature(_sfunc,sfunc_args), auth::permission::EXECUTE);
if (_rfunc) {
co_await state.has_function_access(qp.db(), ks, auth::encode_signature(*_rfunc,{stype, stype}), auth::permission::EXECUTE);
}
if (_ffunc) {
co_await state.has_function_access(qp.db(), ks, auth::encode_signature(*_ffunc,{stype}), auth::permission::EXECUTE);
}
}
create_aggregate_statement::create_aggregate_statement(functions::function_name name, std::vector<shared_ptr<cql3_type::raw>> arg_types,
sstring sfunc, shared_ptr<cql3_type::raw> stype, std::optional<sstring> rfunc, std::optional<sstring> ffunc, std::optional<expr::expression> ival, bool or_replace, bool if_not_exists)
: create_function_statement_base(std::move(name), std::move(arg_types), or_replace, if_not_exists)

@@ -26,6 +26,7 @@ namespace statements {
class create_aggregate_statement final : public create_function_statement_base {
virtual std::unique_ptr<prepared_statement> prepare(data_dictionary::database db, cql_stats& stats) override;
future<std::pair<::shared_ptr<cql_transport::event::schema_change>, std::vector<mutation>>> prepare_schema_mutations(query_processor& qp, api::timestamp_type) const override;
virtual future<> check_access(query_processor& qp, const service::client_state& state) const override;
virtual seastar::future<shared_ptr<db::functions::function>> create(query_processor& qp, db::functions::function* old) const override;

@@ -134,7 +134,7 @@ future<std::pair<::shared_ptr<cql_transport::event::schema_change>, std::vector<
_name.get_string_type_name());
} else {
if (!_if_not_exists) {
co_await coroutine::return_exception(exceptions::invalid_request_exception(format("A user type of name {} already exists", _name.to_string())));
co_await coroutine::return_exception(exceptions::invalid_request_exception(format("A user type of name {} already exists", _name.to_cql_string())));
}
}
} catch (data_dictionary::no_such_keyspace& e) {

@@ -52,7 +52,7 @@ drop_keyspace_statement::prepare_schema_mutations(query_processor& qp, api::time
::shared_ptr<cql_transport::event::schema_change> ret;
try {
m = qp.get_migration_manager().prepare_keyspace_drop_announcement(_keyspace, ts);
m = co_await qp.get_migration_manager().prepare_keyspace_drop_announcement(_keyspace, ts);
using namespace cql_transport;
ret = ::make_shared<event::schema_change>(

@@ -56,7 +56,7 @@ void drop_type_statement::validate_while_executing(query_processor& qp) const {
if (_if_exists) {
return;
} else {
throw exceptions::invalid_request_exception(format("No user type named {} exists.", _name.to_string()));
throw exceptions::invalid_request_exception(format("No user type named {} exists.", _name.to_cql_string()));
}
}

@@ -393,7 +393,7 @@ modification_statement::process_where_clause(data_dictionary::database db, expr:
_has_regular_column_conditions = true;
}
}
if (has_token(_restrictions->get_partition_key_restrictions())) {
if (_restrictions->has_token_restrictions()) {
throw exceptions::invalid_request_exception(format("The token function cannot be used in WHERE clauses for UPDATE and DELETE statements: {}",
to_string(_restrictions->get_partition_key_restrictions())));
}

@@ -11,6 +11,7 @@
#include <seastar/core/thread.hh>
#include "auth/service.hh"
#include "db/system_keyspace.hh"
#include "permission_altering_statement.hh"
#include "cql3/functions/functions.hh"
#include "cql3/functions/user_aggregate.hh"
@@ -49,15 +50,10 @@ future<> cql3::statements::permission_altering_statement::check_access(query_pro
return state.ensure_exists(_resource).then([this, &state] {
if (_resource.kind() == auth::resource_kind::functions) {
// Even if the function exists, it may be a builtin function, in which case we disallow altering permissions on it.
// Even if the resource exists, it may be a builtin function or all builtin functions, in which case we disallow altering permissions on it.
auth::functions_resource_view v(_resource);
if (v.function_signature()) {
// If the resource has a signature, it is a specific function and not "all functions"
auto [name, function_args] = auth::decode_signature(*v.function_signature());
auto fun = cql3::functions::functions::find(db::functions::function_name{sstring(*v.keyspace()), name}, function_args);
if (fun->is_native()) {
return make_exception_future<>(exceptions::invalid_request_exception("Altering permissions on builtin functions is not supported"));
}
if (v.keyspace() && *v.keyspace() == db::system_keyspace::NAME) {
return make_exception_future<>(exceptions::invalid_request_exception("Altering permissions on builtin functions is not supported"));
}
}
// check that the user has AUTHORIZE permission on the resource or its parents, otherwise reject
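The check in the hunk above no longer looks up an individual function. It rejects any functions resource whose keyspace is the system keyspace, which also covers "all functions in keyspace" resources. A minimal sketch of that rule, assuming the system keyspace is named `system`; `system_keyspace_name` and `check_functions_resource` are hypothetical names, not the project's API:

```cpp
#include <optional>
#include <string>
#include <string_view>

// Assumed name of the system keyspace; stands in for db::system_keyspace::NAME.
inline constexpr std::string_view system_keyspace_name = "system";

// Hypothetical helper: returns an error message when permissions on the
// resource may not be altered, or std::nullopt when the check passes.
// An empty keyspace models an "all functions" resource with no keyspace.
std::optional<std::string> check_functions_resource(std::optional<std::string_view> keyspace) {
    if (keyspace && *keyspace == system_keyspace_name) {
        return "Altering permissions on builtin functions is not supported";
    }
    return std::nullopt;
}
```

Keying the decision on the keyspace avoids dereferencing a function lookup result that might be empty.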

@@ -75,7 +75,7 @@ public:
template<typename T>
using compare_fn = std::function<bool(const T&, const T&)>;
using result_row_type = std::vector<bytes_opt>;
using result_row_type = std::vector<managed_bytes_opt>;
using ordering_comparator_type = compare_fn<result_row_type>;
private:
using prepared_orderings_type = std::vector<std::pair<const column_definition*, ordering>>;

@@ -46,6 +46,7 @@
#include "utils/result_combinators.hh"
#include "utils/result_loop.hh"
#include "service/forward_service.hh"
#include "replica/database.hh"
template<typename T = void>
using coordinator_result = cql3::statements::select_statement::coordinator_result<T>;
@@ -560,7 +561,9 @@ indexed_table_select_statement::do_execute_base_query(
auto cmd = prepare_command_for_base_query(qp, options, state, now, bool(paging_state));
auto timeout = db::timeout_clock::now() + get_timeout(state.get_client_state(), options);
uint32_t queried_ranges_count = partition_ranges.size();
query_ranges_to_vnodes_generator ranges_to_vnodes(qp.proxy().get_token_metadata_ptr(), _schema, std::move(partition_ranges));
auto&& table = qp.proxy().local_db().find_column_family(_schema);
auto erm = table.get_effective_replication_map();
query_ranges_to_vnodes_generator ranges_to_vnodes(erm->make_splitter(), _schema, std::move(partition_ranges));
struct base_query_state {
query::result_merger merger;
@@ -873,7 +876,7 @@ primary_key_select_statement::primary_key_select_statement(schema_ptr schema, ui
if (_ks_sel == ks_selector::NONSYSTEM) {
if (_restrictions->need_filtering() ||
_restrictions->partition_key_restrictions_is_empty() ||
(has_token(_restrictions->get_partition_key_restrictions()) &&
(_restrictions->has_token_restrictions() &&
!find(_restrictions->get_partition_key_restrictions(), expr::oper_t::EQ))) {
_range_scan = true;
if (!_parameters->bypass_cache())
@@ -897,7 +900,8 @@ indexed_table_select_statement::prepare(data_dictionary::database db,
cql_stats &stats,
std::unique_ptr<attributes> attrs)
{
auto& sim = db.find_column_family(schema).get_index_manager();
auto cf = db.find_column_family(schema);
auto& sim = cf.get_index_manager();
auto [index_opt, used_index_restrictions] = restrictions->find_idx(sim);
if (!index_opt) {
throw std::runtime_error("No index found.");
@@ -1208,7 +1212,7 @@ query::partition_slice indexed_table_select_statement::get_partition_slice_for_g
partition_slice_builder partition_slice_builder{*_view_schema};
if (!_restrictions->has_partition_key_unrestricted_components()) {
bool pk_restrictions_is_single = !has_token(_restrictions->get_partition_key_restrictions());
bool pk_restrictions_is_single = !_restrictions->has_token_restrictions();
// Only EQ restrictions on base partition key can be used in an index view query
if (pk_restrictions_is_single && _restrictions->partition_key_restrictions_is_all_eq()) {
partition_slice_builder.with_ranges(
@@ -2024,7 +2028,7 @@ static bool needs_allow_filtering_anyway(
const auto& pk_restrictions = restrictions.get_partition_key_restrictions();
// Even if no filtering happens on the coordinator, we still warn about poor performance when partition
// slice is defined but in potentially unlimited number of partitions (see #7608).
if ((expr::is_empty_restriction(pk_restrictions) || has_token(pk_restrictions)) // Potentially unlimited partitions.
if ((expr::is_empty_restriction(pk_restrictions) || restrictions.has_token_restrictions()) // Potentially unlimited partitions.
&& !expr::is_empty_restriction(ck_restrictions) // Slice defined.
&& !restrictions.uses_secondary_indexing()) { // Base-table is used. (Index-table use always limits partitions.)
if (strict_allow_filtering == flag_t::WARN) {

@@ -56,13 +56,7 @@ public:
return size_t(_type);
}
bool operator==(const statement_type& other) const {
return _type == other._type;
}
bool operator!=(const statement_type& other) const {
return !(_type == other._type);
}
bool operator==(const statement_type&) const = default;
friend std::ostream &operator<<(std::ostream &os, const statement_type& t) {
switch (t._type) {

@@ -119,7 +119,7 @@ strongly_consistent_select_statement::execute_without_checking_exception_message
});
if (query_result->value) {
result_set->add_row({ query_result->value });
result_set->add_row({ managed_bytes_opt(query_result->value) });
}
co_return ::make_shared<cql_transport::messages::result_message::rows>(cql3::result{std::move(result_set)});

@@ -85,7 +85,7 @@ private:
uint64_t _query_cnt[(size_t)source_selector::SIZE]
[(size_t)ks_selector::SIZE]
[(size_t)cond_selector::SIZE]
[statements::statement_type::MAX_VALUE + 1] = {0ul};
[statements::statement_type::MAX_VALUE + 1] = {};
};
}

@@ -30,17 +30,17 @@ size_t cql3::untyped_result_set_row::index(const std::string_view& name) const {
bool cql3::untyped_result_set_row::has(std::string_view name) const {
auto i = index(name);
if (i < _data.size()) {
return !std::holds_alternative<std::monostate>(_data.at(i));
return _data.at(i).has_value();
}
return false;
}
cql3::untyped_result_set_row::view_type cql3::untyped_result_set_row::get_view(std::string_view name) const {
return std::visit(make_visitor(
[](std::monostate) -> view_type { throw std::bad_variant_access(); },
[](const view_type& v) -> view_type { return v; },
[](const bytes& b) -> view_type { return view_type(b); }
), _data.at(index(name)));
auto& data = _data.at(index(name));
if (!data) {
throw std::bad_variant_access();
}
return *data;
}
const std::vector<lw_shared_ptr<cql3::column_specification>>& cql3::untyped_result_set_row::get_columns() const {
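The hunk above replaces the three-state variant holder with an optional value, so "is the cell present" becomes a plain `has_value()` test while the thrown exception type stays the same. A minimal sketch of the resulting access pattern, using `std::string` as a hypothetical stand-in for the cell type and an index instead of a column name:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for an optional cell value.
using cell = std::optional<std::string>;

// True only when the index is in range and the cell is non-null.
bool has(const std::vector<cell>& row, std::size_t i) {
    return i < row.size() && row[i].has_value();
}

// Returns the cell contents; a null cell throws std::bad_variant_access,
// matching the exception type kept by the hunk. An out-of-range index
// throws std::out_of_range from at().
const std::string& get_view(const std::vector<cell>& row, std::size_t i) {
    const cell& data = row.at(i);
    if (!data) {
        throw std::bad_variant_access();
    }
    return *data;
}
```

Keeping `std::bad_variant_access` means callers that already catch it do not need to change, even though no variant is involved any more.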
@@ -74,12 +74,12 @@ struct cql3::untyped_result_set::visitor {
void start_row() {
tmp.reserve(index.size());
}
void accept_value(std::optional<query::result_bytes_view>&& v) {
void accept_value(managed_bytes_view_opt&& v) {
if (v) {
tmp.emplace_back(std::move(*v));
tmp.emplace_back(*v);
} else {
tmp.emplace_back(std::monostate{});
}
tmp.emplace_back(std::nullopt);
}
}
// somewhat weird dispatch, but when visiting directly via
// result_generator, pk:s will be temporary - and sent

@@ -41,13 +41,12 @@ class metadata;
class untyped_result_set_row {
public:
using view_type = query::result_bytes_view;
using view_type = managed_bytes_view;
using opt_view_type = std::optional<view_type>;
using view_holder = std::variant<std::monostate, view_type, bytes>;
private:
friend class untyped_result_set;
using index_map = std::unordered_map<std::string_view, size_t>;
using data_views = std::vector<view_holder>;
using data_views = std::vector<managed_bytes_opt>;
const index_map& _name_to_index;
const cql3::metadata& _metadata;
@@ -62,7 +61,7 @@ public:
bool has(std::string_view) const;
view_type get_view(std::string_view name) const;
bytes get_blob(std::string_view name) const {
return get_view(name).linearize();
return to_bytes(get_view(name));
}
managed_bytes get_blob_fragmented(std::string_view name) const {
return managed_bytes(get_view(name));
@@ -150,6 +149,8 @@ public:
class result_set;
/// A tabular result. Unlike result_set, untyped_result_set is optimized for safety
/// and convenience, not performance.
class untyped_result_set {
public:
using row = untyped_result_set_row;

@@ -39,8 +39,8 @@ sstring ut_name::get_string_type_name() const
return _ut_name->to_string();
}
sstring ut_name::to_string() const {
return (has_keyspace() ? (_ks_name.value() + ".") : "") + _ut_name->to_string();
sstring ut_name::to_cql_string() const {
return (has_keyspace() ? (_ks_name.value() + ".") : "") + _ut_name->to_cql_string();
}
}

Some files were not shown because too many files have changed in this diff.