Compare commits

..

156 Commits

Author SHA1 Message Date
Amos Kong
44ec73cfc4 schema.cc/describe: fix invalid compaction options in schema
There is a typo in schema.cql of snapshot, lack of comma after
compaction strategy. It will fail to restore schema by the file.

    AND compaction = {'class': 'SizeTieredCompactionStrategy''max_compaction_threshold': '32'}

map_as_cql_param() function has a `first` parameter to smartly add
comma, the compaction_strategy_options is always not the first.

Fixes #7741

Signed-off-by: Amos Kong <amos@scylladb.com>

Closes #7734

(cherry picked from commit 6b1659ee80)
2021-03-24 12:58:11 +02:00
Tomasz Grabiec
df6f9a200f sstable: writer: ka/la: Write row marker cell after row tombstone
Row marker has a cell name which sorts after the row tombstone's start
bound. The old code was writing the marker first, then the row
tombstone, which is incorrect.

This was harmeless to our sstable reader, which recognized both as
belonging to the current clustering row fragment, and collects both
fine.

However, if both atoms trigger creation of promoted index blocks, the
writer will create a promoted index with entries wich violate the cell
name ordering. It's very unlikely to run into in practice, since to
trigger promoted index entries for both atoms, the clustering key
would be so large so that the size of the marker cell exceeds the
desired promoted index block size, which is 64KB by default (but
user-controlled via column_index_size_in_kb option). 64KB is also the
limit on clustering key size accepted by the system.

This was caught by one of our unit tests:

  sstable_conforms_to_mutation_source_test

...which runs a battery of mutation reader tests with various
desired promoted index block sizes, including the target size of 1
byte, which triggers an entry for every atom.

The test started to fail for some random seeds after commit ecb6abe
inside the
test_streamed_mutation_forwarding_is_consistent_with_slicing test
case, reporting a mutation mismatch in the following line:

    assert_that(*sliced_m).is_equal_to(*fwd_m, slice_with_ranges.row_ranges(*m.schema(), m.key()));

It compares mutations read from the same sstable using different
methods, slicing using clustering key restricitons, and fast
forwarding. The reported mismatch was that fwd_m contained the row
marker, but sliced_m did not. The sstable does contain the marker, so
both reads should return it.

After reverting the commit which introduced dynamic adjustments, the
test passes, but both mutations are missing the marker, both are
wrong!

They are wrong because the promoted index contians entries whose
starting positions violate the ordering, so binary search gets confused
and selects the row tombstone's position, which is emitted after the
marker, thus skipping over the row marker.

The explanation for why the test started to fail after dynamic
adjustements is the following. The promoted index cursor works by
incrementally parsing buffers fed by the file input stream. It first
parses the whole block and then does a binary search within the parsed
array. The entries which cursor touches during binary search depend on
the size of the block read from the file. The commit which enabled
dynamic adjustements causes the block size to be different for
subsequent reads, which allows one of the reads to walk over the
corrupted entries and read the correct data by selecting the entry
corresponding to the row marker.

Fixes #8324
Message-Id: <20210322235812.1042137-1-tgrabiec@scylladb.com>

(cherry picked from commit 9272e74e8c)
2021-03-24 10:42:11 +02:00
Nadav Har'El
2f4a3c271c storage_service: correct missing exception in logging rebuild failure
When failing to rebuild a node, we would print the error with the useless
explanation "<no exception>". The problem was a typo in the logging command
which used std::current_exception() - which wasn't relevant in that point -
instead of "ep".

Refs #8089

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210314113118.1690132-1-nyh@scylladb.com>
(cherry picked from commit d73934372d)
2021-03-21 10:51:36 +02:00
Raphael S. Carvalho
6a11c20b4a LCS: reshape: tolerate more sstables in level 0 with relaxed mode
Relaxed mode, used during initialization, of reshape only tolerates min_threshold
(default: 4) L0 sstables. However, relaxed mode should tolerate more sstables in
level 0, otherwise boot will have to reshape level 0 every time it crosses the
min threshold. So let's make LCS reshape tolerate a max of max_threshold and 32.
This change is beneficial because once table is populated, LCS regular compaction
can decide to merge those sstables in level 0 into level 1 instead, therefore
reducing WA.

Refs #8297.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210318131442.17935-1-raphaelsc@scylladb.com>
(cherry picked from commit e53cedabb1)
2021-03-18 19:20:10 +02:00
Raphael S. Carvalho
cccdd6aaae compaction_manager: Fix performance of cleanup compaction due to unlimited parallelism
Prior to 463d0ab, only one table could be cleaned up at a time on a given shard.
Since then, all tables belonging to a given keyspace are cleaned up in parallel.
Cleanup serialization on each shard was enforced with a semaphore, which was
incorrectly removed by the patch aforementioned.

So space requirement for cleanup to succeed can be up to the size of keyspace,
increasing the chances of node running out of space.

Node could also run out of memory if there are tons of tables in the keyspace.
Memory requirement is at least #_of_tables * 128k (not taking into account write
behind, etc). With 5k tables, it's ~0.64G per shard.

Also all tables being cleaned up in parallel will compete for the same
disk and cpu bandwidth, so making them all much slower, and consequently
the operation time is significantly higher.

This problem was detected with cleanup, but scrub and upgrade go through the
same rewrite procedure, so they're affected by exact the same problem.

Fixes #8247.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210312162223.149993-1-raphaelsc@scylladb.com>
(cherry picked from commit 7171244844)
2021-03-18 14:29:38 +02:00
Raphael S. Carvalho
92871a88c3 compaction: Prevent cleanup and regular from compacting the same sstable
Due to regression introduced by 463d0ab, regular can compact in parallel a sstable
being compacted by cleanup, scrub or upgrade.

This redundancy causes resources to be wasted, write amplification is increased
and so does the operation time, etc.

That's a potential source of data resurrection because the now-owned data from
a sstable being compacted by both cleanup and regular will still exist in the
node afterwards, so resurrection can happen if node regains ownership.

Fixes #8155.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210225172641.787022-1-raphaelsc@scylladb.com>
(cherry picked from commit 2cf0c4bbf1)

Includes fixup patch:

compaction_manager: Fix use-after-free in rewrite_sstables()

Use-after-free introduced by 2cf0c4bbf1.
That's because compacting is moved into then_wrapped() lambda, so it's
potentially freed on the next iteration of repeat().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210309232940.433490-1-raphaelsc@scylladb.com>
(cherry picked from commit f7cc431477)
2021-03-11 08:24:56 +02:00
Benny Halevy
85bbf6751d repair: repair_writer: do not capture lw_shared_ptr cross-shard
The shared_from_this lw_shared_ptr must not be accessed
across shards.  Capturing it in the lambda passed to
mutation_writer::distribute_reader_and_consume_on_shards
causes exactly that since the captured lw_shared_ptr
is copied on other shards, and ends up in memory corruption
as seen in #7535 (probably due to lw_shared_ptr._count
going out-of-sync when incremented/decremented in parallel
on other shards with no synchronization.

This was introduced in 289a08072a.

The writer is not needed in the body of this lambda anyways
so it doesn't need to capture it.  It is already held
by the continuations until the end of the chain.

Fixes #7535

Test: repair_additional_test:RepairAdditionalTest.repair_disjoint_row_3nodes_diff_shard_count_test (dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201104142216.125249-1-bhalevy@scylladb.com>
(cherry picked from commit f93fb55726)
2021-03-03 21:27:44 +02:00
Hagit Segev
0ac069fdcc release: prepare for 4.2.4 2021-03-02 14:52:31 +02:00
Avi Kivity
738f8eaccd Update seastar submodule
* seastar 1266e42c82...0fba7da929 (1):
  > io_queue: Fix "delay" metrics

Fixes #8166.
2021-03-01 13:59:02 +02:00
Avi Kivity
5d32e91e16 Update seastar submodule
* seastar f760efe0a0...1266e42c82 (1):
  > rpc: streaming sink: order outgoing messages

Fixes #7552.
2021-03-01 12:22:17 +02:00
Benny Halevy
6c5f6b3f69 large_data_handler: disable deletion of large data entries
Currently we decide whether to delete large data entries
based on the overall sstable data_size, since the entries
themselves are typically much smaller than the whole sstable
(especially cells and rows), this causes overzealous
deletions (#7668) and inefficiency in the rows cache
due to the large number of range tombstones created.

Refs #7575

Test: sstable_3_x_test(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

This patch is targetted for branch-4.3 or earlier.
In 4.4, the problem was fixed in #7669, but the fix
is out of scope for backporting.

Branch: 4.3
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201203130018.1920271-1-bhalevy@scylladb.com>
(cherry picked from commit bb99d7ced6)
2021-03-01 10:54:33 +02:00
Raphael S. Carvalho
fba26b78d2 sstables: Fix TWCS reshape for windows with at least min_threshold sstables
TWCS reshape was silently ignoring windows which contain at least
min_threshold sstables (can happen with data segregation).
When resizing candidates, size of multi_window was incorrectly used and
it was always empty in this path, which means candidates was always
cleared.

Fixes #8147.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210224125322.637128-1-raphaelsc@scylladb.com>
(cherry picked from commit 21608bd677)
2021-02-28 16:43:02 +02:00
Pavel Solodovnikov
06e785994f large_data_handler: fix segmentation fault when constructing data_value from a nullptr
It turns out that `cql_table_large_data_handler::record_large_rows`
and `cql_table_large_data_handler::record_large_cells` were broken
for reporting static cells and static rows from the very beginning:

In case a large static cell or a large static row is encountered,
it tries to execute `db::try_record` with `nullptr` additional values,
denoting that there is no clustering key to be recorded.

These values are next passed to `qctx.execute_cql()`, which
creates `data_value` instances for each statement parameter,
hence invoking `data_value(nullptr)`.

This uses `const char*` overload which delegates to
`std::string_view` ctor overload. It is UB to pass `nullptr`
pointer to `std::string_view` ctor. Hence leading to
segmentation faults in the aforementioned large data reporting
code.

What we want here is to make a null `data_value` instead, so
just add an overload specifically for `std::nullptr_t`, which
will create a null `data_value` with `text` type.

A regression test is provided for the issue (written in
`cql-pytest` framework).

Tests: test/cql-pytest/test_large_cells_rows.py

Fixes: #6780

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20201223204552.61081-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit 219ac2bab5)
2021-02-23 12:14:12 +02:00
Takuya ASADA
5bc48673aa scylla_util.py: resolve /dev/root to get actual device on aws
When psutil.disk_paritions() reports / is /dev/root, aws_instance mistakenly
reports root partition is part of ephemeral disks, and RAID construction will
fail.
This prevents the error and reports correct free disks.

Fixes #8055

Closes #8040

(cherry picked from commit 32d4ec6b8a)
2021-02-21 16:23:45 +02:00
Nadav Har'El
59a01b2981 alternator: fix ValidationException in FilterExpression - and more
The first condition expressions we implemented in Alternator were the old
"Expected" syntax of conditional updates. That implementation had some
specific assumptions on how it handles errors: For example, in the "LT"
operator in "Expected", the second operand is always part of the query, so
an error in it (e.g., an unsupported type) resulted it a ValidationException
error.

When we implemented ConditionExpression and FilterExpression, we wrongly
used the same functions check_compare(), check_BETWEEN(), etc., to implement
them. This results in some inaccurate error handling. The worst example is
what happens when you use a FilterExpression with an expression such as
"x < y" - this filter is supposed to silently skip items whose "x" and "y"
attributes have unsupported or different types, but in our implementation
a bad type (e.g., a list) for y resulted in a ValidationException which
aborted the entire scan! Interestingly, in once case (that of BEGINS_WITH)
we actually noticed the slightly different behavior needed and implemented
the same operator twice - with ugly code duplication. But in other operators
we missed this problem completely.

This patch first adds extensive tests of how the different expressions
(Expected, QueryFilter, FilterExpression, ConditionExpression) and the
different operators handle various input errors - unsupported types,
missing items, incompatible types, etc. Importantly, the tests demonstrate
that there is often different behavior depending on whether the bad
input comes from the query, or from the item. Some of the new tests
fail before this patch, but others pass and were useful to verify that
the patch doesn't break anything that already worked correctly previously.
As usual, all the tests pass on Cassandra.

Finally, this patch *fixes* all these problems. The comparison functions
like check_compare() and check_BETWEEN() now not only take the operands,
they also take booleans saying if each of the operands came from the
query or from an item. The old-syntax caller (Expected or QueryFilter)
always say that the first operand is from the item and the second is
from the query - but in the new-syntax caller (ConditionExpression or
FilterExpression) any or all of the operands can come from the query
and need verification.

The old duplicated code for check_BEGINS_WITH() - which a TODO to remove
it - is finally removed. Instead we use the same idea of passing booleans
saying if each of its operands came from an item or from the query.

Fixes #8043

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 653610f4bc)
2021-02-21 10:06:50 +02:00
Nadav Har'El
5dd49788c1 alternator: fix UpdateItem ADD for non-existent attribute
UpdateItem's "ADD" operation usually adds elements to an existing set
or adds a number to an existing counter. But it can *also* be used
to create a new set or counter (as if adding to an empty set or zero).

We unfortunately did not have a test for this case (creating a new set
or counter), and when I wrote such a test now, I discovered the
implementation was missing. So this patch adds both the test and the
implementation. The new test used to fail before this patch, and passes
with it - and passes on DynamoDB.

Note that we only had this bug for the newer UpdateItem syntax.
For the old AttributeUpdates syntax, we already support ADD actions
on missing attributes, and already tested it in test_update_item_add().
I just forgot to test the same thing for the newer syntax, so I missed
this bug :-(

Fixes #7763.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201207085135.2551845-1-nyh@scylladb.com>
(cherry picked from commit a8fdbf31cd)
2021-02-21 08:58:49 +02:00
Benny Halevy
56cbc9f3ed stream_session: prepare: fix missing string format argument
As seen in
mv_populating_from_existing_data_during_node_decommission_test dtest:
```
ERROR 2021-02-11 06:01:32,804 [shard 0] stream_session - failed to log message: fmt::v7::format_error (argument not found)
```

Fixes #8067

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210211100158.543952-1-bhalevy@scylladb.com>
(cherry picked from commit d01e7e7b58)
2021-02-14 13:11:43 +02:00
Avi Kivity
7469896017 table: fix on_compaction_completion corrupting _sstables_compacted_but_not_deleted during self-race
on_compaction_completion() updates _sstables_compacted_but_not_deleted
through a temporary to avoid an exception causing a partial update:

  1. copy _sstables_compacted_but_not_deleted to a temporary
  2. update temporary
  3. do dangerous stuff
  4. move temporary to _sstables_compacted_but_not_deleted

This is racy when we have parallel compactions, since step 3 yields.
We can have two invocations running in parallel, taking snapshots
of the same _sstables_compacted_but_not_deleted in step 1, each
modifying it in different ways, and only one of them winning the
race and assigning in step 4. With the right timing we can end
with extra sstables in _sstables_compacted_but_not_deleted.

Before a5369881b3, this was a benign race (only resulting in
deleted file space not being reclaimed until the service is shut
down), but afterwards, extra sstable references result in the service
refusing to shut down. This was observed in database_test in debug
mode, where the race more or less reliably happens for system.truncated.

Fix by using a different method to protect
_sstables_compacted_but_not_deleted. We unconditionally update it,
and also unconditionally fix it up (on success or failure) using
seastar::defer(). The fixup includes a call to rebuild_statistics()
which must happen every time we touch the sstable list.

Ref #7331.
Fixes #8038.

BACKPORT NOTES:
- Turns out this race prevented deletion of expired sstables because the leaked
deleted sstables would be accounted when checking if an expired sstable can
be purged.
- Switch to unordered_set<>::count() as it's not supported by older compilers.

(cherry picked from commit a43d5079f3)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210212203832.45846-1-raphaelsc@scylladb.com>
2021-02-14 11:35:57 +02:00
Piotr Wojtczak
c7e2711dd4 Validate ascii values when creating from CQL
Although the code for it existed already, the validation function
hasn't been invoked properly. This change fixes that, adding
a validating check when converting from text to specific value
type and throwing a marshal exception if some characters
are not ASCII.

Fixes #5421

Closes #7532

(cherry picked from commit caa3c471c0)
2021-02-10 19:37:56 +02:00
Piotr Dulikowski
a2355a35db hinted handoff: use default timeout for sending orphaned hints
This patch causes orphaned hints (hints that were written towards a node
that is no longer their replica) to be sent with a default write
timeout. This is what is currently done for non-orphaned hints.

Previously, the timeout was hardcoded to one hour. This could cause a
long delay while shutting down, as hints manager waits until all ongoing
hint sending operation finish before stopping itself.

Fixes: #7051
(cherry picked from commit b111fa98ca)
2021-02-10 10:15:01 +02:00
Piotr Sarna
9e225ab447 Merge 'select_statement: Fix aggregate results on indexed selects (timeouts fixed) ' from Piotr Grabowski
Overview
Fixes #7355.

Before this changes, there were a few invalid results of aggregates/GROUP BY on tables with secondary indexes (see below).

Unfortunately, it still does NOT fix the problem in issue #7043. Although this PR moves forward fixing of that issue, there is still a bug with `TOKEN(...)` in `WHERE` clauses of indexed selects that is not addressed in this PR. It will be fixed in my next PR.

It does NOT fix the problems in issues #7432, #7431 as those are out-of-scope of this PR and do not affect the correctness of results (only return a too large page).

GROUP BY (first commit)
Before the change, `GROUP BY` `SELECT`s with some `WHERE` restrictions on an indexed column would return invalid results (same grouped column values appearing multiple times):
```
CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck));
CREATE INDEX ks_t on ks.t(v);
INSERT INTO ks.t(pk, ck, v) VALUES (1, 2, 3);
INSERT INTO ks.t(pk, ck, v) VALUES (1, 4, 3);
SELECT pk FROM ks.t WHERE v=3 GROUP BY pk;
 pk
----
  1
  1
```
This is fixed by correctly passing `_group_by_cell_indices` to `result_set_builder`. Fixes the third failing example from issue #7355.

Paging (second commit)
Fixes two issues related to improper paging on indexed `SELECT`s. As those two issues are closely related (fixing one without fixing the other causes invalid results of queries), they are in a single commit (second commit).

The first issue is that when using `slice.set_range`, the existing `_row_ranges` (which specify clustering key prefixes) are not taken into account. This caused the wrong rows to be included in the result, as the clustering key bound was set to a half-open range:
```
CREATE TABLE ks.t(a int, b int, c int, PRIMARY KEY ((a, b), c));
CREATE INDEX kst_index ON ks.t(c);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 3);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 4);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 5);
SELECT COUNT(*) FROM ks.t WHERE c = 3;
 count
-------
     2
```
The second commit fixes this issue by properly trimming `row_ranges`.

The second fixed problem is related to setting the `paging_state` to `internal_options`. It was improperly set to the value just after reading from index, making the base query start from invalid `paging_state`.

The second commit fixes this issue by setting the `paging_state` after both index and base table queries are done. Moreover, the `paging_state` is now set based on `paging_state` of index query and the results of base table query (as base query can return more rows than index query).

The second commit fixes the first two failing examples from issue #7355.

Tests (fourth commit)
Extensively tests queries on tables with secondary indices with  aggregates and `GROUP BY`s.

Tests three cases that are implemented in `indexed_table_select_statement::do_execute` - `partition_slices`,
`whole_partitions` and (non-`partition_slices` and non-`whole_partitions`). As some of the issues found were related to paging, the tests check scenarios where the inserted data is smaller than a page, larger than a page and larger than two pages (and some in-between page boundaries scenarios).

I found all those parameters (case of `do_execute`, number of inserted rows) to have an impact of those fixed bugs, therefore the tests validate a large number of those scenarios.

Configurable internal_paging_size (third commit)
Before this change, internal `page_size` when doing aggregate, `GROUP BY` or nonpaged filtering queries was hard-coded to `DEFAULT_COUNT_PAGE_SIZE` (10,000).  This change adds new internal_paging_size variable, which is configurable by `set_internal_paging_size` and `reset_internal_paging_size` free functions. This functionality is only meant for testing purposes.

Closes #7497

* github.com:scylladb/scylla:
  tests: Add secondary index aggregates tests
  select_statement: Introduce internal_paging_size
  select_statement: Fix paging on indexed selects
  select_statement: Fix GROUP BY on indexed select

(cherry picked from commit 8c645f74ce)
2021-02-08 20:32:36 +02:00
Amnon Heiman
e1205d1d5b API: Fix aggregation in column_familiy
Few method in column_familiy API were doing the aggregation wrong,
specifically, bloom filter disk size.

The issue is not always visible, it happens when there are multiple
filter files per shard.

Fixes #4513

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes #8007

(cherry picked from commit 4498bb0a48)
2021-02-08 17:04:27 +02:00
Avi Kivity
a78402efae Merge 'Add waiting for flushes on table drops' from Piotr Sarna
This series makes sure that before the table is dropped, all pending memtable flushes related to its memtables would finish.
Normally, flushes are not problematic in Scylla, because all tables are by default `auto_snapshot=true`, which also implies that a table is flushed before being dropped. However, with `auto_snapshot=false` the flush is not attempted at all. It leads to the following race:
1. Run a node with `auto_snapshot=false`
2. Schedule a memtable flush  (e.g. via nodetool)
3. Get preempted in the middle of the flush
4. Drop the table
5. The flush that already started wakes up and starts operating on freed memory, which causes a segfault

Tests: manual(artificially preempting for a long time in bullet point 2. to ensure that the race occurs; segfaults were 100% reproducible before the series and do not happen anymore after the series is applied)

Fixes #7792

Closes #7798

* github.com:scylladb/scylla:
  database: add flushes to waiting for pending operations
  table: unify waiting for pending operations
  database: add a phaser for flush operations
  database: add waiting for pending streams on table drop

(cherry picked from commit 7636799b18)
2021-02-02 17:23:34 +02:00
Avi Kivity
9fcf790234 row_cache: linearize key in cache_entry::do_read()
do_read() does not linearize cache_entry::_key; this can cause a crash
with keys larger than 13k.

Fixes #7897.

Closes #7898

(cherry picked from commit d508a63d4b)
2021-01-17 09:30:44 +02:00
Hagit Segev
24346215c2 release: prepare for 4.2.3 2021-01-04 19:51:12 +02:00
Benny Halevy
918ec5ecb3 compaction: compaction_writer: destroy shared_sstable after the sstable_writer
sstable_writer may depend on the sstable throughout its whole lifecycle.
If the sstable is freed before the sstable_writer we might hit use-after-free
as in the follwing case:
```
std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket*>::operator+=(long) at /usr/include/c++/10/bits/stl_deque.h:240
 (inlined by) std::operator+(std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket*> const&, long) at /usr/include/c++/10/bits/stl_deque.h:378
 (inlined by) std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket*>::operator[](long) const at /usr/include/c++/10/bits/stl_deque.h:252
 (inlined by) std::deque<sstables::compression::segmented_offsets::bucket, std::allocator<sstables::compression::segmented_offsets::bucket> >::operator[](unsigned long) at /usr/include/c++/10/bits/stl_deque.h:1327
 (inlined by) sstables::compression::segmented_offsets::push_back(unsigned long, sstables::compression::segmented_offsets::state&) at ./sstables/compress.cc:214
sstables::compression::segmented_offsets::writer::push_back(unsigned long) at ./sstables/compress.hh:123
 (inlined by) compressed_file_data_sink_impl<crc32_utils, (compressed_checksum_mode)1>::put(seastar::temporary_buffer<char>) at ./sstables/compress.cc:519
seastar::output_stream<char>::put(seastar::temporary_buffer<char>) at table.cc:?
 (inlined by) seastar::output_stream<char>::put(seastar::temporary_buffer<char>) at ././seastar/include/seastar/core/iostream-impl.hh:432
seastar::output_stream<char>::flush() at table.cc:?
seastar::output_stream<char>::close() at table.cc:?
sstables::file_writer::close() at sstables.cc:?
sstables::mc::writer::~writer() at writer.cc:?
 (inlined by) sstables::mc::writer::~writer() at ./sstables/mx/writer.cc:790
sstables::mc::writer::~writer() at writer.cc:?
flat_mutation_reader::impl::consumer_adapter<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~consumer_adapter() at compaction.cc:?
 (inlined by) std::_Optional_payload_base<sstables::compaction_writer>::_M_destroy() at /usr/include/c++/10/optional:260
 (inlined by) std::_Optional_payload_base<sstables::compaction_writer>::_M_reset() at /usr/include/c++/10/optional:280
 (inlined by) std::_Optional_payload<sstables::compaction_writer, false, false, false>::~_Optional_payload() at /usr/include/c++/10/optional:401
 (inlined by) std::_Optional_base<sstables::compaction_writer, false, false>::~_Optional_base() at /usr/include/c++/10/optional:474
 (inlined by) std::optional<sstables::compaction_writer>::~optional() at /usr/include/c++/10/optional:659
 (inlined by) sstables::compacting_sstable_writer::~compacting_sstable_writer() at ./sstables/compaction.cc:229
 (inlined by) compact_mutation<(emit_only_live_rows)0, (compact_for_sstables)1, sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>::~compact_mutation() at ././mutation_compactor.hh:468
 (inlined by) compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>::~compact_for_compaction() at ././mutation_compactor.hh:538
 (inlined by) std::default_delete<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >::operator()(compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>*) const at /usr/include/c++/10/bits/unique_ptr.h:85
 (inlined by) std::unique_ptr<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>, std::default_delete<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~unique_ptr() at /usr/include/c++/10/bits/unique_ptr.h:361
 (inlined by) stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >::~stable_flattened_mutations_consumer() at ././mutation_reader.hh:342
 (inlined by) flat_mutation_reader::impl::consumer_adapter<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~consumer_adapter() at ././flat_mutation_reader.hh:201
auto flat_mutation_reader::impl::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter>(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:272
 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter>(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:383
 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:389
 (inlined by) seastar::future<void> sstables::compaction::setup<noop_compacted_fragments_consumer>(noop_compacted_fragments_consumer)::{lambda(flat_mutation_reader)#1}::operator()(flat_mutation_reader)::{lambda()#1}::operator()() at ./sstables/compaction.cc:612
```

What happens here is that:

    compressed_file_data_sink_impl(output_stream<char> out, sstables::compression* cm, sstables::local_compression lc)
            : _out(std::move(out))
            , _compression_metadata(cm)
            , _offsets(_compression_metadata->offsets.get_writer())
            , _compression(lc)
            , _full_checksum(ChecksumType::init_checksum())

_compression_metadata points to a buffer held by the sstable object.
and _compression_metadata->offsets.get_writer returns a writer that keeps
a reference to the segmented_offsets in the sstables::compression
that is used in the ~writer -> close path.

Fixes #7821

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201227145726.33319-1-bhalevy@scylladb.com>
(cherry picked from commit 8a745a0ee0)
2021-01-04 15:04:34 +02:00
Avi Kivity
7457683328 Revert "Merge 'Move temporaries to value view' from Piotr S"
This reverts commit d1fa0adcbe. It causes
a regression when processing some bind variables.

Fixes #7761.
2020-12-24 12:40:46 +02:00
Gleb Natapov
567889d283 mutation_writer: pass exceptions through feed_writer
feed_writer() eats exception and transforms it into an end of stream
instead. Downstream validators hate when this happens.

Fixes #7482
Message-Id: <20201216090038.GB3244976@scylladb.com>

(cherry picked from commit 61520a33d6)
2020-12-16 17:20:11 +02:00
Aleksandr Bykov
c605ed73bf dist: scylla_util: fix aws_instance.ebs_disks method
aws_instance.ebs_disks() method should return ebs disk
instead of ephemeral

Signed-off-by: Aleksandr Bykov <alex.bykov@scylladb.com>

Closes #7780

(cherry picked from commit e74dc311e7)
2020-12-16 11:58:47 +02:00
Takuya ASADA
d0530d8ac2 node_exporter_install: stop service before force installing
Stop node-exporter.service before re-install it, to avoid 'Text file busy' error.

Fixes #6782

(cherry picked from commit ef05ea8e91)
2020-12-15 16:28:25 +02:00
Hagit Segev
696ef24226 release: prepare for 4.2.2 2020-12-13 20:34:03 +02:00
Avi Kivity
b8fe144301 dist: rpm: uninstall tuned when installing scylla-kernel-conf
tuned 2.11.0-9 and later writes to kerned.sched_wakeup_granularity_ns
and other sysctl tunables that we so laboriously tuned, dropping
performance by a factor of 5 (due to increased latency). Fix by
obsoleting tuned during install (in effect, we are a better tuned,
at least for us).

Not needed for .deb, since debian/ubunto do not install tuned by
default.

Fixes #7696

Closes #7776

(cherry picked from commit 615b8e8184)
2020-12-12 14:30:38 +02:00
Nadav Har'El
62f783be87 alternator: fix broken Scan/Query paging with bytes keys
When an Alternator table has partition keys or sort keys of type "bytes"
(blobs), a Scan or Query which required paging used to fail - we used
an incorrect function to output LastEvaluatedKey (which tells the user
where to continue at the next page), and this incorrect function was
correct for strings and numbers - but NOT for bytes (for bytes, we
need to encode them as base-64).

This patch also includes two tests - for bytes partition key and
for bytes sort key - that failed before this patch and now pass.
The test test_fetch_from_system_tables also used to fail after a
Limit was added to it, because one of the tables it scans had a bytes
key. That test is also fixed by this patch.

Fixes #7768

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201207175957.2585456-1-nyh@scylladb.com>
(cherry picked from commit 86779664f4)
2020-12-09 15:16:41 +02:00
Piotr Sarna
863e784951 db: fix getting local ranges for size estimates table
When getting local ranges, an assumption is made that
if a range does not contain an end or when its end is a maximum token,
then it must contain a start. This assumption proven not true
during manual tests, so it's now fortified with an additional check.

Here's a gdb output for a set of local ranges which causes an assertion
failure when calling `get_local_ranges` on it:

(gdb) p ranges
$1 = std::vector of length 2, capacity 2 = {{_interval = {_start = std::optional<interval_bound<dht::token>> = {[contained value] = {_value = {_kind = dht::token_kind::before_all_keys,
            _data = 0}, _inclusive = false}}, _end = std::optional<interval_bound<dht::token>> [no contained value], _singular = false}}, {_interval = {
      _start = std::optional<interval_bound<dht::token>> [no contained value], _end = std::optional<interval_bound<dht::token>> = {[contained value] = {_value = {
            _kind = dht::token_kind::before_all_keys, _data = 0}, _inclusive = true}}, _singular = false}}}

Closes #7764

(cherry picked from commit 1cc4ed50c1)
2020-12-09 15:16:14 +02:00
Nadav Har'El
e5a6199b4d alternator, test: make test_fetch_from_system_tables faster
The test test_fetch_from_system_tables tests Alternator's system-table
feature by reading from all system tables. The intention was to confirm
we don't crash reading any of them - as they have different schemas and
can run into different problems (we had such problems in the initial
implementation). The intention was not to read *a lot* from each table -
we only make a single "Scan" call on each, to read one page of data.
However, the Scan call did not set a Limit, so the single page can get
pretty big.

This is not normally a problem, but in extremely slow runs - such as when
running the debug build on an extremely overcommitted test machine (e.g.,
issue #7706) reading this large page may take longer than our default
timeout. I'll send a separate patch for the timeout issue, but for now,
there is really no reason why we need to read a big page. It is good
enough to just read 50 rows (with Limit=50). This will still read all
the different types and make the test faster.

As an example, in the debug run on my laptop, this test spent 2.4
seconds to read the "compaction_history" table before this patch,
and only 0.1 seconds after this patch. 2.4 seconds is close to our
default timeout (10 seconds), 0.1 is very far.

Fixes #7706

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201207075112.2548178-1-nyh@scylladb.com>
(cherry picked from commit 220d6dde17)
2020-12-09 15:15:15 +02:00
Nadav Har'El
abaf6c192a alternator: fix query with both projection and filtering
We had a bug when a Query/Scan had both projection (ProjectionExpression
or AttributesToGet) and filtering (FilterExpression or Query/ScanFilter).
The problem was that projection left only the requested attributes, and
the filter might have needed - and not got - additional attributes.

The solution in this patch is to add the generated JSON item also
the extra attributes needed by filtering (if any), run the filter on
that, and only at the end remove the extra filtering attributes from
the item to be returned.

The two tests

 test_query_filter.py::test_query_filter_and_attributes_to_get
 test_filter_expression.py::test_filter_expression_and_projection_expression

Which failed before this patch now pass so we drop their "xfail" tag.

Fixes #6951.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 282742a469)
2020-12-09 14:39:17 +02:00
Eliran Sinvani
ef2f5ed434 consistency level: fix wrong quorum calculation whe RF = 0
We used to calculate the number of endpoints for quorum and local_quorum
unconditionally as ((rf / 2) + 1). This formula doesn't take into
account the corner case where RF = 0, in this situation quorum should
also be 0.
This commit adds the missing corner case.

Tests: Unit Tests (dev)
Fixes #6905

Closes #7296

(cherry picked from commit 925cdc9ae1)
2020-11-29 16:45:14 +02:00
Raphael S. Carvalho
bac40e2512 sstable_directory: Fix 50% space requirement for resharding
This is a regression caused by aebd965f0.

After the sstable_directory changes, resharding now waits for all sstables
to be exhausted before releasing reference to them, which prevents their
resources like disk space and fd from being released. Let's restore the
old behavior of incrementally releasing resources, reducing the space
requirement significantly.

Fixes #7463.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20201020140939.118787-1-raphaelsc@scylladb.com>
(cherry picked from commit 6f805bd123)
2020-11-29 15:26:14 +02:00
Asias He
681c4d77bb repair: Make repair_writer a shared pointer
The future of the fiber that writes data into sstables inside
the repair_writer is stored in _writer_done like below:

class repair_writer {
   _writer_done[node_idx] =
      mutation_writer::distribute_reader_and_consume_on_shards().then([this] {
         ...
      }).handle_exception([this] {
         ...
      });
}

The fiber access repair_writer object in the error handling path. We
wait for the _writer_done to finish before we destroy repair_meta
object which contains the repair_writer object to avoid the fiber
accessing already freed repair_writer object.

To be safer, we can make repair_writer a shared pointer and take a
reference in the distribute_reader_and_consume_on_shards code path.

Fixes #7406

Closes #7430

(cherry picked from commit 289a08072a)
2020-11-29 13:30:49 +02:00
Pavel Emelyanov
8572ee9da2 query_pager: Fix continuation handling for noop visitor
Before updating the _last_[cp]key (for subsequent .fetch_page())
the pager checks is 'if the pager is not exhausted OR the result
has data'.

The check seems broken: if the pager is not exhausted, but the
result is empty the call for keys will unconditionally try to
reference the last element from empty vector. The not exhausted
condition for empty result can happen if the short_read is set,
which, in turn, unconditionally happens upon meeting partition
end when visiting the partition with result builder.

The correct check should be 'if the pager is not exhausted AND
the result has data': the _last_[pc]key-s should be taken for
continuation (not exhausted), but can be taken if the result is
not empty (has data).

fixes: #7263
tests: unit(dev), but tests don't trigger this corner case

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200921124329.21209-1-xemul@scylladb.com>
(cherry picked from commit 550fc734d9)
2020-11-29 12:01:37 +02:00
Takuya ASADA
95dbac56e5 install.sh: set PATH for relocatable CLI tools in python thunk
We currently set PATH for relocatable CLI tools in scylla_util.run() and
scylla_util.out(), but it doesn't work for perftune.py, since it's not part of
Scylla, does not use scylla_util module.
We can set PATH in python thunk instead, it can set PATH for all python scripts.

Fixes #7350

(cherry picked from commit 5867af4edd)
2020-11-29 11:54:42 +02:00
Bentsi Magidovich
eeadeff0dc scylla_util.py: fix exception handling in curl
Retry mechanism didn't work when URLError happend. For example:

  urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>

Let's catch URLError instead of HTTP since URLError is a base exception
for all exceptions in the urllib module.

Fixes: #7569

Closes #7567

(cherry picked from commit 956b97b2a8)
2020-11-29 11:48:30 +02:00
Takuya ASADA
62f3caab18 dist/redhat: packaging dependencies.conf as normal file, not ghost
When we introduced dependencies.conf, we mistakenly added it on rpm as %ghost,
but it should be normal file, should be installed normally on package installation.

Fixes #7703

Closes #7704

(cherry picked from commit ba4d54efa3)
2020-11-29 11:40:22 +02:00
Takuya ASADA
1a4869231a install.sh: apply sysctl.d files on non-packaging installation
We don't apply sysctl.d files on non-packaging installation, apply them
just like rpm/deb taking care of that.

Fixes #7702

Closes #7705

(cherry picked from commit 5f81f97773)
2020-11-29 11:35:37 +02:00
Avi Kivity
3568d0cbb6 dist: sysctl: configure more inotify instances
Since f3bcd4d205 ("Merge 'Support SSL Certificate Hot
Reloading' from Calle"), we reload certificates as they are
modified on disk. This uses inotify, which is limited by a
sysctl fs.inotify.max_user_instances, with a default of 128.

This is enough for 64 shards only, if both rpc and cql are
encrypted; above that startup fails.

Increase to 1200, which is enough for 6 instances * 200 shards.

Fixes #7700.

Closes #7701

(cherry picked from commit 390e07d591)
2020-11-29 11:04:45 +02:00
Raphael S. Carvalho
030c2e3270 compaction: Make sure a partition is filtered out only by producer
If interposer consumer is enabled, partition filtering will be done by the
consumer instead, but that's not possible because only the producer is able
to skip to the next partition if the current one is filtered out, so scylla
crashes when that happens with a bad function call in queue_reader.
This is a regression which started here: 55a8b6e3c9

To fix this problem, let's make sure that partition filtering will only
happen on the producer side.

Fixes #7590.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20201111221513.312283-1-raphaelsc@scylladb.com>
(cherry picked from commit 13fa2bec4c)
2020-11-19 14:08:25 +02:00
Piotr Dulikowski
37a5e9ab15 hints: don't read hint files when it's not allowed to send
When there are hint files to be sent and the target endpoint is DOWN,
end_point_hints_manager works in the following loop:

- It reads the first hint file in the queue,
- For each hint in the file it decides that it won't be sent because the
  target endpoint is DOWN,
- After realizing that there are some unsent hints, it decides to retry
  this operation after sleeping 1 second.

This causes the first segment to be wholly read over and over again,
with 1 second pauses, until the target endpoint becomes UP or leaves the
cluster. This causes unnecessary I/O load in the streaming scheduling
group.

This patch adds a check which prevents end_point_hints_manager from
reading the first hint file at all when it is not allowed to send hints.

First observed in #6964

Tests:
- unit(dev)
- hinted handoff dtests

Closes #7407

(cherry picked from commit 77a0f1a153)
2020-11-16 14:30:07 +02:00
Botond Dénes
a15b5d514d mutation_reader: queue_reader: don't set EOS flag on abort
If the consumer happens to check the EOS flag before it hits the
exception injected by the abort (by calling fill_buffer()), they can
think the stream ended normally and expect it to be valid. However this
is not guaranteed when the reader is aborted. To avoid consumers falsely
thinking the stream ended normally, don't set the EOS flag on abort at
all.

Additionally make sure the producer is aborted too on abort. In theory
this is not needed as they are the one initiating the abort, but better
to be safe then sorry.

Fixes: #7411
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201102100732.35132-1-bdenes@scylladb.com>
(cherry picked from commit f5323b29d9)
2020-11-15 11:07:38 +02:00
Botond Dénes
064f8f8bcf types: validate(): linearize values lazily
Instead of eagerly linearizing all values as they are passed to
validate(), defer linearization to those validators that actually need
linearized values. Linearizing large values puts pressure on the memory
allocator with large contiguous allocation requests. This is something
we are trying to actively avoid, especially if it is not really neaded.
Turns out the types, whose validators really want linearized values are
a minority, as most validators just look at the size of the value, and
some like bytes don't need validation at all, while usually having large
values.

This is achieved by templating the validator struct on the view and
using the FragmentedRange concept to treat all passed in views
(`bytes_view` and `fragmented_temporary_buffer_view`) uniformly.
This patch makes no attempt at converting existing validators to work
with fragmented buffers, only trivial cases are converted. The major
offenders still left are ascii/utf8 and collections.

Fixes: #7318

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201007054524.909420-1-bdenes@scylladb.com>
(cherry picked from commit db56ae695c)
2020-11-11 10:55:54 +02:00
Amnon Heiman
04fe0a7395 scyllatop/livedata.py: Safe iteration over metrics
This patch change the code that iterates over the metrics to use a copy
of the metrics names to make it safe to remove the metrics from the
metrics object.

Fixes #7488

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 52db99f25f)
2020-11-08 19:16:13 +02:00
Calle Wilund
ef26d90868 partition_version: Change range_tombstones() to return chunked_vector
Refs #7364

The number of tombstones can be large. As a stopgap measure to
just returning a source range (with keepalive), we can at least
alleviate the problem by using a chunked vector.

Closes #7433

(cherry picked from commit 4b65d67a1a)
2020-11-08 14:38:32 +02:00
Tomasz Grabiec
790f51c210 sstables: ka/la: Fix abort when next_partition() is called with certain reader state
Cleanup compaction is using consume_pausable_in_thread() to skip over
disowned partitions, which uses flat_mutation_reader::next_partition().

The implementation of next_partition() for the sstable reader has a
bug which may cause the following assertion failure:

  scylla: sstables/mp_row_consumer.hh:422: row_consumer::proceed sstables::mp_row_consumer_k_l::flush(): Assertion `!_ready' failed.

This happens when the sstable reader's buffer gets full when we reach
the partition end. The last fragment of the partition won't be pushed
into the buffer but will stay in the _ready variable. When
next_partition() is called in this state, _ready will not be cleared
and the fragment will be carried over to the next partition. This will
cause assertion failure when the reader attempts to emit the first
fragment of the next partition.

The fix is to clear _ready when entering a partition, just like we
clear _range_tombstones there.

Fixes #7553.
Message-Id: <1604534702-12777-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit fb9b5cae05)
2020-11-08 14:25:47 +02:00
Yaron Kaikov
4fb8ebccff release: prepare for 4.2.1 2020-11-08 12:41:06 +02:00
Avi Kivity
d1fa0adcbe Merge 'Move temporaries to value view' from Piotr S
"
Issue https://github.com/scylladb/scylla/issues/7019 describes a problem of an ever-growing map of temporary values stored in query_options. In order to mitigate this kind of problems, the storage for temporary values is moved from an external data structure to the value views itself. This way, the temporary lives only as long as it's accessible and is automatically destroyed once a request finishes. The downside is that each temporary is now allocated separately, while previously they were bundled in a single byte stream.

Tests: unit(dev)
Fixes https://github.com/scylladb/scylla/issues/7019
"

7055297649 ("cql3: remove query_options::linearize and _temporaries")
is reverted from this backport since linearize() is still used in
this branch.

* psarna-move_temporaries_to_value_view:
  cql3: remove query_options::linearize and _temporaries
  cql3: remove make_temporary helper function
  cql3: store temporaries in-place instead of in query_options
  cql3: add temporary_value to value view
  cql3: allow moving data out of raw_value
  cql3: split values.hh into a .cc file

(cherry picked from commit 2b308a973f)
2020-11-05 19:24:23 +02:00
Piotr Sarna
46b56d885e schema_tables: fix fixing old secondary index schemas
Old secondary index schemas did not have their idx_token column
marked as computed, and there already exists code which updates
them. Unfortunately, the fix itself contains an error and doesn't
fire if computed columns are not yet supported by the whole cluster,
which is a very common situation during upgrades.

Fixes #7515

Closes #7516

(cherry picked from commit b66c285f94)
2020-11-05 17:53:08 +02:00
Yaron Kaikov
94597e38e2 release: prepare for 4.2.0 2020-10-25 09:12:38 +02:00
Piotr Sarna
c74ba1bc36 Merge 'Backport PR #7469 to 4.2' from Eliran Sinvani
This is a backport of PR #7469 that did not apply cleanly to 4.2 with a trivial conflict, another commit that touched one of the files but in a completely different region.

Closes #7480

* github.com:scylladb/scylla:
  materialized views: add a base table reference if missing
  view info: support partial match between base and view for only reading from view.
  view info: guard against null dereference of the base info
2020-10-23 17:18:02 +02:00
Eliran Sinvani
06cfc63c59 materialized views: add a base table reference if missing
schema pointers can be obtained from two distinct entities,
one is the database, those schema are obtained from the table
objects and the other is from the schema registry.
When a schema or a new schema is attached to a table object that
represents a base table for views, all of the corresponding attached
view schemas are guarantied to have their base info in sync.
However if an older schema is inserted into the registry by the
migratrion manager i.e loaded from other node, it will be
missing this info.
This becomes a problem when this schema is published through the
schema registry as it can be obtained for an obsolete read command
for example and then eventually cause a segmentation fault by null
dereferencing the _base_info ptr.

Refs #7420
2020-10-23 18:09:45 +03:00
Eliran Sinvani
56d25930ec view info: support partial match between base and view for
only reading from view.

The current implementation of materialized views does
no keep the version to which a specific version of materialized
view schema corresponds to. This complicate things especially on
old views versions that the schema doesn't support anymore. However,
the views, being also an independent table should allow reading from
them as long as they exist even if the base table changed since then.
For the reading purpose, we don't need to know the exact composition
of view primary key columns that are not part of the base primary
key, we only need to know that there are any, and this is a much
looser constrain on the schema.
We can rely on a table invariants such as the fact that pk columns are
not going to disappear on newer version of the table.
This means that if we don't find a view column in the base table, it is
not a part of the base table primary key.
This information is enough for us to perform read on the view.
This commit adds support for being able to rely on such partial
information along with a validation that it is not going to be used for
writes. If it is, we simply abort since this means that our schema
integrity is compromised.
2020-10-23 18:08:56 +03:00
Eliran Sinvani
fa1cd048d7 view info: guard against null dereference of the base info
The change's purpose is to guard against segfault that is the
result of dereferencing the _base_info member when it is
uninitialized. We already know this can happen (#7420).
The only purpose of this change is to treat this condition as
an internal error, the reason is that it indicates a schema integrity
problem.
Besides this change, other measures should be taken to ensure that
the _base_table member is initialized before calling methods that
rely on it.
We call the internal_error as a last resort.
2020-10-23 18:08:56 +03:00
Nadav Har'El
94b754eee5 alternator: change name of Alternator's SSL options
When Alternator is enabled over HTTPS - by setting the
"alternator_https_port" option - it needs to know some SSL-related options,
most importantly where to pick up the certificate and key.

Before this patch, we used the "server_encryption_options" option for that.
However, this was a mistake: Although it sounds like these are the "server's
options", in fact prior to Alternator this option was only used when
communicating with other servers - i.e., connections between Scylla nodes.
For CQL connections with the client, we used a different option -
"client_encryption_options".

This patch introduces a third option "alternator_encryption_options", which
controls only Alternator's HTTPS server. Making it separate from the
existing CQL "client_encryption_options" allows both Alternator and CQL to
be active at the same time but with different certificates (if the user
so wishes).

For backward compatibility, we temporarily continue to allow
server_encryption_options to control the Alternator HTTPS server if
alternator_encryption_options is not specified. However, this generates
a warning in the log, urging the user to switch. This temporary workaround
should be removed in a future version.

This patch also:
1. fixes the test run code (which has an "--https" option to test over
   https) to use the new name of the option.
2. Adds documentation of the new option in alternator.md and protocols.md -
   previously the information on how to control the location of the
   certificate was missing from these documents.

Fixes #7204.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200930123027.213587-1-nyh@scylladb.com>
(cherry picked from commit 509a41db04)
2020-10-18 18:05:04 +03:00
Takuya ASADA
e02defb3bf install.sh: set LC_ALL=en_US.UTF-8 on python3 thunk
scylla-python3 causes segfault when non-default locale specified.
As workaround for this, we need to set LC_ALL=en_US.UTF_8 on python3 thunk.

Fixes #7408

Closes #7414

(cherry picked from commit ff129ee030)
2020-10-18 15:02:32 +03:00
Botond Dénes
95e712e244 reader_permit: reader_resources: make true RAII class
Currently in all cases we first deduct the to-be-consumed resources,
then construct the `reader_resources` class to protect it (release it on
destruction). This is error prone as it relies on no exception being
thrown while constructing the `reader_resources`. Albeit the
`reader_resources` constructor is `noexcept` right now this might change
in the future and as the call sites relying on this are disconnected
from the declaration, the one modifying them might not notice.
To make this safe going forward, make the `reader_resources` a true RAII
class, consuming the units in its constructor and releasing them in its
destructor.

Fixes: #7256

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200922150625.1253798-1-bdenes@scylladb.com>
(cherry picked from commit a0107ba1c6)
2020-10-18 15:00:47 +03:00
Avi Kivity
a9109f068c Update seastar submodule
* seastar 61b88d1da4...f760efe0a0 (1):
  > append_challenged_posix_file_impl: allow destructing file with no queued work

Fixes #7285.
2020-10-12 15:11:46 +03:00
Piotr Sarna
2e71779970 Merge 'Fix view_builder lockup and crash on shutdown' from Pavel
The lockup:

When view_builder starts all shards at some point get to a
barrier waiting for each other to pass. If any shard misses
this checkpoint, all others stuck forever. As this barrier
lives inside the _started future, which in turn is waited
on stop, the stop stucks as well.

Reasons to miss the barrier -- exception in the middle of the
fun^w start or explicit abort request while waiting for the
schema agreement.

Fix the "exception" case by unlocking the barrier promise with
exception and fix the "abort request" case by turning it into
an exception.

The bug can be reproduced by hands if making one shard never
see the schema agreement and continue looping until the abort
request.

The crash:

If the background start up fails, then the _started future is
resolved into exception. The view_builder::stop then turns this
future into a real exception caught-and-rethrown by main.cc.

This seems wrong that a failure in a background fiber aborts
the regular shutdown that may proceed otherwise.

tests: unit(dev), manual start-stop
branch: https://github.com/xemul/scylla/tree/br-view-builder-shutdown-fix-3
fixes: #7077

Patch #5 leaves the seastar::async() in the 1-st phase of the
start() although can also be tuned not to produce a thread.
However, there's one more (painless) issue with the _sem usage,
so this change appears too large for the part of the bug-fix
and will come as a followup.

* 'br-view-builder-shutdown-fix-3' of git://github.com/xemul/scylla:
  view_builder: Add comment about builder instances life-times
  view_builder: Do sleep abortable
  view_builder: Wakeup barrier on exception
  view_builder: Always resolve started future to success
  view_builder: Re-futurize start
  view_builder: Split calculate_shard_build_step into two
  view_builder: Populate the view_builder_init_state
  view_builder: Fix indentation after previous patch
  view_builder: Introduce view_builder_init_state

(cherry picked from commit ca9422ca73)
2020-10-07 15:05:12 +03:00
Gleb Natapov
5ed7de81ad lwt: do not return unavailable exception from the 'learn' stage
Unavailable exception means that operation was not started and it can be
retried safely. If lwt fails in the learn stage though it most
certainly means that its effect will be observable already. The patch
returns timeout exception instead which means uncertainty.

Fixes #7258

Message-Id: <20201001130724.GA2283830@scylladb.com>
(cherry picked from commit 3e8dbb3c09)
2020-10-07 10:59:30 +02:00
Juliusz Stasiewicz
5e21c9bd8a tracing: Fix error on slow batches
`trace_keyspace_helper::make_slow_query_mutation_data` expected a
"query" key in its parameters, which does not appear in case of
e.g. batches of prepared statements. This is example of failing
`record.parameters`:
```
...{"query[0]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"},
{"query[1]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}...
```

In such case Scylla recorded no trace and said:
```
ERROR 2020-09-28 10:09:36,696 [shard 3] trace_keyspace_helper - No
"query" parameter set for a session requesting a slow_query_log record
```

Fix here is to leave query empty if not found. The users can still
retrieve the query contents from existing info.

Fixes #5843

Closes #7293

(cherry picked from commit 0afa738a8f)
2020-10-04 18:04:22 +03:00
Avi Kivity
8e22fddc9e Merge 'Fix ignoring cells after null in appending hash' from Piotr Sarna
"
This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue.

Fixes #4567
Based on #4574
"

* psarna-fix_ignoring_cells_after_null_in_appending_hash:
  test: extend mutation_test for NULL values
  tests/mutation: add reproducer for #4567
  gms: add a cluster feature for fixed hashing
  digest: add null values to row digest
  mutation_partition: fix formatting
  appending_hash<row>: make publicly visible

(cherry picked from commit 0e03c979d2)
2020-10-01 23:23:00 +02:00
Avi Kivity
6cf2d998e3 Merge "Fix race in schema version recalculation leading to stale schema version in gossip" from Tomasz
"
Migration manager installs several cluster feature change listeners.
The listeners will call update_schema_version_and_announce() when cluster
features are enabled, which does this:

    return update_schema_version(proxy, features).then([] (utils::UUID uuid) {
        return announce_schema_version(uuid);
    });

It first updates the schema version and then publishes it via
gossip in announce_schema_version(). It is possible that the
announce_schema_version() part of the first schema change will be
deferred and will execute after the other four calls to
update_schema_version_and_announce(). It will install the old schema
version in gossip instead of the more recent one.

The fix is to serialize schema digest calculation and publishing.

Refs #7200

This problem also brought my attention to initialization code, which could be
prone to the same problem.

The storage service computes gossiper states before it starts the
gossiper. Among them, node's schema version. There are two problems with that.

First is that computing the schema version and publishing it is not
atomic, so is not safe against concurrent schema changes or schema
version recalculations. It will not exclude with
recalculate_schema_version() calls, and we could end up with the old
(and incorrect) schema version being advertised in gossip.

Second problem is that we should not allow the database layer to call
into the gossiper layer before it is fully initialized, as this may
produce undefined behavior.

Maybe we're not doing concurrent schema changes/recalculations now,
but it is easy to imagine that this could change for whatever reason
in the future.

The solution for both problems is to break the cyclic dependency
between the database layer and the storage_service layer by having the
database layer not use the gossiper at all. The database layer
publishes schema version inside the database class and allows
installing listeners on changes. The storage_service layer asks the
database layer for the current version when it initializes, and only
after that installs a listener which will update the gossiper.

Tests:

  - unit (dev)
  - manual (3 node ccm)
"

Fixes #7291

* tag 'fix-schema-digest-calculation-race-v1' of github.com:tgrabiec/scylla:
  db, schema: Hide update_schema_version_and_announce()
  db, storage_service: Do not call into gossiper from the database layer
  db: Make schema version observable
  utils: updateable_value_source: Introduce as_observable()
  schema: Fix race in schema version recalculation leading to stale schema version in gossip

(cherry picked from commit dcaf4ea4dd)
2020-10-01 17:44:37 +02:00
Hagit Segev
5fcc1f205c release: prepare for 4.2.rc5 2020-09-30 20:40:44 +03:00
Avi Kivity
08c35c1aad Revert "Revert "config: Do not enable repair based node operations by default""
This reverts commit 71d0d58f8c. Repair-based
node operations stil have a significant regression (See #7249).
2020-09-30 14:18:37 +03:00
Tomasz Grabiec
54a913d452 Merge "evictable_reader: validate buffer on reader recreation" from Botond
The reader recreation mechanism is a very delicate and error-prone one,
as proven by the countless bugs it had. Most of these bugs were related
to the recreated reader not continuing the read from the expected
position, inserting out-of-order fragments into the stream.
This patch adds a defense mechanism against such bugs by validating the
start position of the recreated reader.
The intent is to prevent corrupt data from getting into the system as
well as to help catch these bugs as close to the source as possible.

Fixes: #7208

Tests: unit(dev), mutation_reader_test:debug (v4)

* botond/evictable-reader-validate-buffer/v5:
  mutation_reader_test: add unit test for evictable reader self-validation
  evictable_reader: validate buffer after recreation the underlying
  evictable_reader: update_next_position(): only use peek'd position on partition boundary
  mutation_reader_test: add unit test for evictable reader range tombstone trimming
  evictable_reader: trim range tombstones to the read clustering range
  position_in_partition_view: add position_in_partition_view before_key() overload
  flat_mutation_reader: add buffer() accessor

(cherry picked from commit 97c99ea9f3)
2020-09-30 13:13:09 +02:00
Piotr Dulikowski
8f9cd98c45 hinted handoff: fix race - decomission vs. endpoint mgr init
This patch fixes a race between two methods in hints manager: drain_for
and store_hint.

The first method is called when a node leaves the cluster, and it
'drains' end point hints manager for that node (sends out all hints for
that node). If this method is called when the local node is being
decomissioned or removed, it instead drains hints managers for all
endpoints.

In the case of decomission/remove, drain_for first calls
parallel_for_each on all current ep managers and tells them to drain
their hints. Then, after all of them complete, _ep_managers.clear() is
called.

End point hints managers are created lazily and inserted into
_ep_managers map the first time a hint is stored for that node. If
this happens between parallel_for_each and _ep_managers.clear()
described above, the clear operation will destroy the new ep manager
without draining it first. This is a bug and will trigger an assert in
ep manager's destructor.

To solve this, a new flag for the hints manager is added which is set
when it drains all ep managers on removenode/decommission, and prevents
further hints from being written.

Fixes #7257

Closes #7278

(cherry picked from commit 39771967bb)
2020-09-29 14:18:48 +03:00
Avi Kivity
2893f6e43b Update seastar submodule
* seastar 0c289412a9...61b88d1da4 (1):
  > lz4_fragmented_compressor: Fix buffer requirements

Fixes #6925.
2020-09-23 11:04:22 +03:00
Avi Kivity
18d6c27b05 Merge 'storage_proxy: add a separate smp_group for hints' from Eliran
Hints writes are handled by storage_proxy in the exact same way
regular writes are, which in turn means that the same smp service
group is used for both. The problem is that it can lead to a priority
inversion where writes of the lower priority  kind occupies a lot of
the semaphores units making the higher priority writes wait for an
empty slot.
This series adds a separate smp group for hints as well as a field
to pass the correct smp group to mutate_locally functions, and
then uses this field to properly classify the writes.

Fixes #7177

* eliransin-hint_priority_inversion:
  Storage proxy: use hints smp group in mutate locally
  Storage proxy: add a dedicated smp group for hints

(cherry picked from commit c075539fea)
2020-09-22 14:06:14 +03:00
Pavel Solodovnikov
97d7f6990c storage_proxy: un-hardcode force sync flag for mutate_locally(mutation) overload
Corresponding overload of `storage_proxy::mutate_locally`
was hardcoded to pass `db::commitlog::force_sync::no` to the
`database::apply`. Unhardcode it and substitute `force_sync::no`
to all existing call sites (as it were before).

`force_sync::yes` will be used later for paxos learn writes
when trying to apply mutations upgraded from an obsolete
schema version (similar to the current case when applying
locally a `frozen_mutation` stored in accepted proposal).

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200716124915.464789-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit 5ff5df1afd)

Prerequisite for #7177.
2020-09-22 14:05:39 +03:00
Nadav Har'El
9855e18c0d alternator: fix corruption of PutItem operation in case of contention
This patch fixes a bug noted in issue #7218 - where PutItem operations
sometimes lose part of the item's data - some attributes were lost,
and the name of other attributes replaced by empty strings. The problem
happened when the write-isolation policy was LWT and there was contention
of writes to the same partition (not necessarily the same item).

To use CAS (a.k.a. LWT), Alternator builds an alternator::rmw_operation
object with an apply() function which takes the old contents of the item
(if needed) and a timestamp, and builds a mutation that the CAS should
apply. In the case of the PutItem operation, we wrongly assumed that apply()
will be called only once - so as an optimization the strings saved in the
put_item_operation were moved into the returned mutation. But this
optimization is wrong - when there is contention, apply() may be called
again when the changed proposed by the previous one was not accepted by
the Paxos protocol.

The fix is to change the one place where put_item_operation *moved* strings
out of the saved operations into the mutations, to be a copy. But to prevent
this sort of bug from reoccuring in future code, this patch enlists the
compiler to help us verify that it can't happen: The apply() function is
marked "const" - it can use the information in the operation to build the
mutation, but it can never modify this information or move things out of it,
so it will be fine to call this function twice.

The single output field that apply() does write (_return_attributes) is
marked "mutable" to allow the const apply() to write to it anyway. Because
apply() might be called twice, it is important that if some apply()
implementation sometimes sets _return_attributes, then it must always
set it (even if to the default, empty, value) on every call to apply().

The const apply() means that the compiler verfies for us that I didn't
forget to fix additional wrong std::move()s. Additionally, a test I wrote
to easily reproduce issue #7218 (which I will submit as a dtest later)
passes after this fix.

Fixes #7218.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200916064906.333420-1-nyh@scylladb.com>
(cherry picked from commit 5e8bdf6877)
2020-09-16 19:17:52 +03:00
Avi Kivity
5d5ddd3539 Merge "materialized views: Fix undefined behavior on base table schema changes" from Tomasz
"
The view_info object, which is attached to the schema object of the
view, contains a data structure called
"base_non_pk_columns_in_view_pk". This data structure contains column
ids of the base table so is valid only for a particular version of the
base table schema. This data structure is used by materialized view
code to interpret mutations of the base table, those coming from base
table writes, or reads of the base table done as part of view updates
or view building.

The base table schema version of that data structure must match the
schema version of the mutation fragments, otherwise we hit undefined
behavior. This may include aborts, exceptions, segfaults, or data
corruption (e.g. writes landing in the wrong column in the view).

Before this patch, we could get schema version mismatch here after the
base table was altered. That's because the view schema did not change
when the base table was altered.

Another problem was that view building was using the current table's schema
to interpret the fragments and invoke view building. That's incorrect for two
reasons. First, fragments generated by a reader must be accessed only using
the reader's schema. Second, base_non_pk_columns_in_view_pk of the recorded
view ptrs may not longer match the current base table schema, which is used
to generate the view updates.

Part of the fix is to extract base_non_pk_columns_in_view_pk into a
third entity called base_dependent_view_info, which changes both on
base table schema changes and view schema changes.

It is managed by a shared pointer so that we can take immutable
snapshots of it, just like with schema_ptr. When starting the view
update, the base table schema_ptr and the corresponding
base_dependent_view_info have to match. So we must obtain them
atomically, and base_dependent_view_info cannot change during update.

Also, whenever the base table schema changes, we must update
base_dependent_view_infos of all attached views (atomically) so that
it matches the base table schema.

Fixes #7061.

Tests:

  - unit (dev)
  - [v1] manual (reproduced using scylla binary and cqlsh)
"

* tag 'mv-schema-mismatch-fix-v2' of github.com:tgrabiec/scylla:
  db: view: Refactor view_info::initialize_base_dependent_fields()
  tests: mv: Test dropping columns from base table
  db: view: Fix incorrect schema access during view building after base table schema changes
  schema: Call on_internal_error() when out of range id is passed to column_at()
  db: views: Fix undefined behavior on base table schema changes
  db: views: Introduce has_base_non_pk_columns_in_view_pk()

(cherry picked from commit 3daa49f098)
2020-09-16 16:42:02 +03:00
Benny Halevy
0a72893fef test: cql_query_test: test_cache_bypass: use table stats
test is currently flaky since system reads can happen
in the background and disturb the global row cache stats.

Use the table's row_cache stats instead.

Fixes #6773

Test: cql_query_test.test_cache_bypass(dev, debug)

Credit-to: Botond Dénes <bdenes@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200811140521.421813-1-bhalevy@scylladb.com>
(cherry picked from commit 6deba1d0b4)
2020-09-16 16:05:53 +03:00
Dejan Mircevski
1e45557d2a cql3: Fix NULL reference in get_column_defs_for_filtering
There was a typo in get_column_defs_for_filtering(): it checked the
wrong pointer before dereferencing.  Add a test exposing the NULL
dereference and fix the typo.

Tests: unit (dev)

Fixes #7198.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
(cherry picked from commit 9d02f10c71)
2020-09-16 15:46:58 +03:00
Avi Kivity
96e1e95c1d reconcilable_result_builder: don't aggrevate out-of-memory condition during recovery
Consider an unpaged query that consumes all of available memory, despite
fea5067dfa which limits them (perhaps the
user raised the limit, or this is a system query). Eventually we will see a
bad_alloc which will abort the query and destroy this reconcilable_result_builder.

During destruction, we first destroy _memory_accounter, and then _result.
Destroying _memory_accounter resumes some continuations which can then
allocate memory synchronously when increasing the task queue to accomodate
them. We will then crash. Had we not crashed, we would immediately afterwards
release _result, freeing all the memory that we would ever need.

Fix by making _result the last member, so it is freed first.

Fixes #7240.

(cherry picked from commit 9421cfded4)
2020-09-16 15:40:40 +03:00
Raphael S. Carvalho
338196eab6 storage_service: Fix use-after-free when calculating effective ownership
Use-after-free happens because we take a ref to keyspace_name, which
is stack allocated, and ceases to exist after the next deferring
action.

Fixes #7209.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200909210741.104397-1-raphaelsc@scylladb.com>
(cherry picked from commit 86b9ea6fb2)
2020-09-12 13:58:45 +03:00
Asias He
71cbec966b storage_service: Fix a TOKENS update race for replace operation
In commit 7d86a3b208 (storage_service:
Make replacing node take writes), application state of TOKENS of the
replacing node is added into gossip and propagated to the cluster after
the initial start of gossip service. This can cause a race below

1. The replacing node replaces the old dead node with the same ip address
2. The replacing node starts gossip without application state of the TOKENS
3. Other nodes in the cluster replace the application states of old dead node's
   version with the new replacing node's version
4. replacing node dies
5. replace operation is performed again, the TOKENS application state is
   not preset and replace operation fails.

To fix, we can always add TOKENS application state when the
gossip service starts.

Fixes: #7166
Backports: 4.1 and 4.2
(cherry picked from commit 3ba6e3d264)
2020-09-10 13:12:56 +03:00
Avi Kivity
067a065553 Merge "Fix repair stalls in get_sync_boundary and apply_rows_on_master_in_thread" from Asias
"
This path set fixes stalls in repair that are caused by std::list merge and clear operations during test_latency_read_with_nemesis test.

Fixes #6940
Fixes #6975
Fixes #6976
"

* 'fix_repair_list_stall_merge_clear_v2' of github.com:asias/scylla:
  repair: Fix stall in apply_rows_on_master_in_thread and apply_rows_on_follower
  repair: Use clear_gently in get_sync_boundary to avoid stall
  utils: Add clear_gently
  repair: Use merge_to_gently to merge two lists
  utils: Add merge_to_gently

(cherry picked from commit 4547949420)
2020-09-10 13:12:53 +03:00
Avi Kivity
e00bdc4f57 repair: apply_rows_on_follower(): remove copy of repair_rows list
We copy a list, which was reported to generate a 15ms stall.

This is easily fixed by moving it instead, which is safe since this is
the last use of the variable.

Fixes #7115.

(cherry picked from commit 6ff12b7f79)
2020-09-10 11:53:05 +03:00
Juliusz Stasiewicz
ad40f9222c cdc: Retry generation fetching after read_failure_exception
While fetching CDC generations, various exceptions can occur. They
are divided into "fatal" and "nonfatal", where "fatal" ones prevent
retrying of the fetch operation.

This patch makes `read_failure_exception` "non-fatal", because such
error may appear during restart. In general this type of error can
mean a few different things (e.g. an error code in a response from
replica, but also a broken connection) so retrying seems reasonable.

Fixes #6804

(cherry picked from commit d1dec3fcd7)
2020-09-09 15:10:50 +03:00
Kamil Braun
5d90fa17d6 cdc: fix deadlock inside check_and_repair_cdc_streams
check_and_repair_cdc_streams, in case it decides to create a new CDC
generation, updates the STATUS application state so that other nodes
gossiped with pick up the generation change.

The node which runs check_and_repair_cdc_streams also learns about a
generation change: the STATUS update causes a notification change.
This happens during add_local_application_state call
which caused the STATUS update; it would lead to calling
handle_cdc_generation, which detects a generation change and calls
add_local_application_state with the new generation's timestamp.

Thus, we get a recursive add_local_application_state call. Unforunately,
the function takes a lock before doing on_change notifications, so we
get a deadlock.

This commit prevents the deadlock.
We update the local variable which stores the generation timestamp
before updating STATUS, so handle_cdc_generation won't consider
the observed generation to be new, hence it won't perform the recursive
add_local_application_state call.

(cherry picked from commit 42fb4fe37c)
2020-09-09 10:14:18 +03:00
Yaron Kaikov
bf0c493c28 release: prepare for 4.2.rc4 2020-09-07 14:56:32 +03:00
Raphael S. Carvalho
26cb0935f0 sstables/LCS: increase per-level overlapping tolerance in reshape
LCS can have its overlapping invariant broken after operations that can
proceed in parallel to regular compaction like cleanup. That's because
there could be two compactions in parallel placing data in overlapping
token ranges of a given level > 0.
After reshape, the whole table will be rewritten, on restart, if a
given level has more than (fan_out*2)=20 overlaps.
That may sound like enough, but that's not taking into account the
exponential growth in # of SSTables per level, so 20 overlaps may
sound like a lot for level 2 which can afford 100 sstables, but it's
only 2% of level 3, and 0.2% of level 4. So let's change the
overlapping tolerance from the constant of fan_out*2 to 10% of level
limit on # of SSTables, or fan_out, whichever is higher.

Refs #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200810154510.32794-1-raphaelsc@scylladb.com>
(cherry picked from commit 7d7f9e1c54)
2020-09-06 18:28:55 +03:00
Raphael S. Carvalho
4e97d562eb compaction: Prevent non-regular compaction from picking compacting SSTables
After 8014c7124, cleanup can potentially pick a compacting SSTable.
Upgrade and scrub can also pick a compacting SSTable.
The problem is that table::candidates_for_compaction() was badly named.
It misleads the user into thinking that the SSTables returned are perfect
candidates for compaction, but manager still need to filter out the
compacting SSTables from the returned set. So it's being renamed.

When the same SSTable is compacted in parallel, the strategy invariant
can be broken like overlapping being introduced in LCS, and also
some deletion failures as more than one compaction process would try
to delete the same files.

Let's fix scrub, cleanup and ugprade by calling the manager function
which gets the correct candidates for compaction.

Fixes #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200811200135.25421-1-raphaelsc@scylladb.com>
(cherry picked from commit 11df96718a)
2020-09-06 18:26:43 +03:00
Takuya ASADA
3f1b932c04 aws: update enhanced networking supported instance list
Sync enhanced networking supported instance list to latest one.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Fixes #6991

(cherry picked from commit 7cccb018b8)
2020-09-06 18:21:12 +03:00
Avi Kivity
b9498ab947 Update seastar submodule
* seastar 7816796dd1...0c289412a9 (1):
  > TLS: Use "known" (precalculated) DH parameters if available

Fixes #6191.
2020-09-06 17:38:05 +03:00
Avi Kivity
e1b3d6d0a2 Update seastar submodule
* seastar adaabdfbc...7816796dd (1):
  > core/reactor: complete_timers(): restore previous scheduling group

Fixes #7117.
2020-09-03 23:47:22 +03:00
Avi Kivity
67378cda03 Merge "Fix TWCS compaction aggressiveness due to data segregation" from Raphael
"
After data segregation feature, anything that cause out-of-order writes,
like read repair, can result in small updates to past time windows.
This causes compaction to be very aggressive because whenever a past time
window is updated like that, that time window is recompacted into a
single SSTable.
Users expect that once a window is closed, it will no longer be written
to, but that has changed since the introduction of the data segregation
future. We didn't anticipate the write amplification issues that the
feature would cause. To fix this problem, let's perform size-tiered
compaction on the windows that are no longer active and were updated
because data was segregated. The current behavior where the last active
window is merged into one file is kept. But thereafter, that same
window will only be compacted using STCS.

Fixes #6928.
"

* 'fix_twcs_agressiveness_after_data_segregation_v2' of github.com:raphaelsc/scylla:
  compaction/twcs: improve further debug messages
  compaction/twcs: Improve debug log which shows all windows
  test: Check that TWCS properly performs size-tiered compaction on past windows
  compaction/twcs: Make task estimation take into account the size-tiered behavior
  compaction/stcs: Export static function that estimates pending tasks
  compaction/stcs: Make get_buckets() static
  compact/twcs: Perform size-tiered compaction on past time windows
  compaction/twcs: Make strategy easier to extend by removing duplicated knowledge
  compaction/twcs: Make newest_bucket() non-static
  compaction/twcs: Move TWCS implementation into source file

(cherry picked from commit 6f986df458)
2020-09-02 12:53:45 +03:00
Nadav Har'El
6ab3965465 redis: fix another use-after-free crash in "exists" command
Never trust Occam's Razor - it turns out that the use-after-free bug in the
"exists" command was caused by two separate bugs. We fixed one in commit
9636a33993, but there is a second one fixed in
this patch.

The problem fixed here was that a "service_permit" object, which is designed to
be copied around from place to place (it contains a shared pointer, so is cheap
to copy), was saved by reference, and the reference was to a function argument
and was destroyed prematurely.

This time I tested *many times* that that test_strings.py passes on both dev and
debug builds.

Note that test/run/redis still fails in a debug build, but due to a different
problem.

Fixes #6469

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200825183313.120331-1-nyh@scylladb.com>
(cherry picked from commit 868194cd17)
2020-08-27 12:16:19 +03:00
Nadav Har'El
ca22461a9b redis: fix use-after-free crash in "exists" command
A missing "&" caused the key stored in a long-living command to be copied
and the copy quickly freed - and then used after freed.
This caused the test test_strings.py::test_exists_multiple_existent_key for
this feature to frequently crash.

Fixes #6469

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200823190141.88816-1-nyh@scylladb.com>
(cherry picked from commit 9636a33993)
2020-08-27 12:16:19 +03:00
Asias He
b3d83ad073 compaction_manager: Avoid stall in perform_cleanup
The following stall was seen during a cleanup operation:

scylla: Reactor stalled for 16262 ms on shard 4.

| std::_MakeUniq<locator::tokens_iterator_impl>::__single_object std::make_unique<locator::tokens_iterator_impl, locator::tokens_iterator_impl&>(locator::tokens_iterator_impl&) at /usr/include/fmt/format.h:1158
|  (inlined by) locator::token_metadata::tokens_iterator::tokens_iterator(locator::token_metadata::tokens_iterator const&) at ./locator/token_metadata.cc:1602
| locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at simple_strategy.cc:?
|  (inlined by) locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at ./locator/simple_strategy.cc:56
| locator::abstract_replication_strategy::get_ranges(gms::inet_address, locator::token_metadata&) const at /usr/include/fmt/format.h:1158
| locator::abstract_replication_strategy::get_ranges(gms::inet_address) const at /usr/include/fmt/format.h:1158
| service::storage_service::get_ranges_for_endpoint(seastar::basic_sstring<char, unsigned int, 15u, true> const&, gms::inet_address const&) const at /usr/include/fmt/format.h:1158
| service::storage_service::get_local_ranges(seastar::basic_sstring<char, unsigned int, 15u, true> const&) const at /usr/include/fmt/format.h:1158
|  (inlined by) operator() at ./sstables/compaction_manager.cc:691
|  (inlined by) _M_invoke at /usr/include/c++/9/bits/std_function.h:286
| std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>::operator()(table const&) const at /usr/include/fmt/format.h:1158
|  (inlined by) compaction_manager::rewrite_sstables(table*, sstables::compaction_options, std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>) at ./sstables/compaction_manager.cc:604
| compaction_manager::perform_cleanup(table*) at /usr/include/fmt/format.h:1158

To fix, we furturize the function to get local ranges and sstables.

In addition, this patch removes the dependency to global storage_service object.

Fixes #6662

(cherry picked from commit 07e253542d)
2020-08-27 12:16:19 +03:00
Raphael S. Carvalho
7e6f47fbce sstables: optimize procedure that checks if a sstable needs cleanup
needs_cleanup() returns true if a sstable needs cleanup.

Turns out it's very slow because it iterates through all the local
ranges for all sstables in the set, making its complexity:
	O(num_sstables * local_ranges)

We can optimize it by taking into account that abstract_replication_strategy
documents that get_ranges() will return a list of ranges that is sorted
and non-overlapping. Compaction for cleanup already takes advantage of that
when checking if a given partition can be actually purged.

So needs_cleanup() can be optimized into O(num_sstables * log(local_ranges)).

With num_sstables=1000, RF=3, then local_ranges=256(num_tokens)*3, it means
the max # of checks performed will go from 768000 to ~9584.

Fixes #6730.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200629171355.45118-2-raphaelsc@scylladb.com>
(cherry picked from commit cf352e7c14)
2020-08-27 12:16:16 +03:00
Asias He
9ca49cba6b abstract_replication_strategy: Add get_ranges_in_thread
Add a version that runs inside a seastar thread. The benefit is that
get_ranges can yield to avoid stalls.

Refs #6662

(cherry picked from commit 94995acedb)
2020-08-27 12:15:33 +03:00
Raphael S. Carvalho
6da8ba2d3f sstables: export needs_cleanup()
May be needed elsewhere, like in an unit test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200629171355.45118-1-raphaelsc@scylladb.com>
(cherry picked from commit a9eebdc778)
2020-08-27 12:15:29 +03:00
Asias He
366c1c2c59 gossip: Fix race between shutdown message handler and apply_state_locally
1. The node1 is shutdown
2. The node1 sends shutdown message to node2
3. The node2 receives gossip shutdown message but the handler yields
4. The node1 is restarted
5. The node1 sends new gossip endpoint_state to node2, node2 applies the state
   in apply_state_locally and calls gossiper::handle_major_state_change
   and then calls gossiper::mark_alive
6. The shutdown message handler in step 3 resumes and sets status of node1 to SHUTDOWN
7. The gossiper::mark_alive fiber in step 5 resumes and calls gossiper::real_mark_alive,
   node2 will skip to mark node1 as alive because the status of node1 is
   SHUTDOWN. As a result, node1 is alive but it is not marked as UP by node2.

To fix, we serialize the two operations.

Fixes #7032

(cherry picked from commit e6ceec1685)
2020-08-27 11:15:48 +03:00
Nadav Har'El
05cdb173f3 Alternator: allow CreateTable with SSESpecification explicitly disabled
While Alternator doesn't yet support creating a table with a different
"server-side encryption" (a.k.a. encryption-at-rest) parameters, the
SSESpecification option with Enabled=false should still be allowed, as
it is just the default, and means exactly the same as would a missing
SSESpecification.

This patch also adds a test for this case, which failed on Alternator
before this patch.

Fixes #7031.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200812205853.173846-1-nyh@scylladb.com>
(cherry picked from commit 4c73d43153)
2020-08-26 20:15:19 +03:00
Nadav Har'El
8c929a96cf alternator: CreateTable with bad Tags shouldn't create a table
Currently, if a user tries to CreateTable with a forbidden set of tags,
e.g., the Tags list is too long or contains an invalid value for
system:write_isolation, then the CreateTable request fails but the table
is still created. Without the tag of course.

This patch fixes this bug, and adds two test cases for it that fail
before this patch, and succeed with it. One of the test cases is
scylla_only because it checks the Scylla-specific system:write_isolation
tag, but the second test case works on DynamoDB as well.

What this patch does is to split the update_tags() function into two
parts - the first part just parses the Tags, validates them, and builds
a map. Only the second part actually writes the tags to the schema.
CreateTable now does the first part early, before creating the table,
so failure in parsing or validating the Tags will not leave a created
table behind.

Fixes #6809.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200713120611.767736-1-nyh@scylladb.com>
(cherry picked from commit 35f7048228)
2020-08-26 19:52:49 +03:00
Avi Kivity
a9aa10e8de Merge "Unregister RPC verbs on stop" from Pavel E
"
There are 5 services, that register their RPC handlers in messaging
service, but quite a few of them unregister them on stop.

Unregistering is somewhat critical, not just because it makes the
code look clean, but also because unregistration does wait for the
message processing to complete, thus avoiding use-after-free's in
the handlers.

In particular, several handlers call service::get_schema_for_write()
which, in turn, may end up in service::maybe_sync() calling for
the local migration manager instance. All those handlers' processing
must be waited for before stopping the migration manager.

The set brings the RPC handlers unregistration in sync with the
registration part.

tests: unit (dev)
       dtest (dev: simple_boot_shutdown, repair)
       start-stop by hands (dev)
fixes: #6904
"

* 'br-rpc-unregister-verbs' of https://github.com/xemul/scylla:
  main: Add missing calls to unregister RPC hanlers
  messaging: Add missing per-service unregistering methods
  messaging: Add missing handlers unregistration helpers
  streaming: Do not use db->invoke_on_all in vain
  storage_proxy: Detach rpc unregistration from stop
  main: Shorten call to storage_proxy::init_messaging_service

(cherry picked from commit 01b838e291)
2020-08-26 14:41:04 +03:00
Raphael S. Carvalho
989d8fe636 cql3/statements: verify that counter column cannot be added into non-counter table
A check, to validate that counter column cannot be added into non-counter table,
is missing for alter table statement. Validation is performed when building new
schema, but it's limited to checking that a schema will not contain both counter
and non-counter columns.

Due to lack of validation, the added counter column could be incorrectly
persisted to the schema, but this results in a crash when setting the new
schema to its table. On restart, it can be confirmed that the schema change
was indeed persisted when describing the table.
This problem is fixed by doing proper validation for the alter table statement,
which consists of making sure a new counter column cannot be added to a
non-counter table.

The test cdc_disallow_cdc_for_counters_test is adjusted because one of its tests
was built on the assumption that counter column can be added into a non-counter
table.

Fixes #7065.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200824155709.34743-1-raphaelsc@scylladb.com>
(cherry picked from commit 1c29f0a43d)
2020-08-25 18:44:42 +03:00
Takuya ASADA
48d79a1d9f dist/debian: disable debuginfo compression on .deb
Since older binutils on some distribution does not able to handle
compressed debuginfo generated on Fedora, we need to disable it.
However, debian packager force debuginfo compression since debian/compat = 9,
we have to uncompress them after compressed automatically.

Fixes #6982

(cherry picked from commit 75c2362c95)
2020-08-23 19:01:00 +03:00
Botond Dénes
4c65413413 scylla-gdb.py: find_db(): don't return current shard's database for shard=0
The `shard` parameter of `find_db()` is optional and is defaulted to
`None`. When missing, the current shard's database instance is returned.
The problem is that the if condition checking this uses `not shard`,
which also evaluates to `True` if `shard == 0`, resulting in returning
the current shard's database instance for shard 0. Change the condition
to `shard is None` to avoid this.

Fixes: #7016
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200812091546.1704016-1-bdenes@scylladb.com>
(cherry picked from commit 4cfab59eb1)
2020-08-23 18:56:19 +03:00
Hagit Segev
e931d28673 release: prepare for 4.2.rc3 2020-08-19 14:39:08 +03:00
Botond Dénes
ec71688ff2 view_update_generator: fix race between registering and processing sstables
fea83f6 introduced a race between processing (and hence removing)
sstables from `_sstables_with_tables` and registering new ones. This
manifested in sstables that were added concurrently with processing a
batch for the same sstables being dropped and the semaphore units
associated with them not returned. This resulted in repairs being
blocked indefinitely as the units of the semaphore were effectively
leaked.

This patch fixes this by moving the contents of `_sstables_with_tables`
to a local variable before starting the processing. A unit test
reproducing the problem is also added.

Fixes: #6892

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200817160913.2296444-1-bdenes@scylladb.com>
(cherry picked from commit 22a6493716)
2020-08-19 00:11:48 +03:00
Botond Dénes
9710a91100 table: get_sstables_by_partition_key(): don't make a copy of selected sstables
Currently we assign the reference to the vector of selected sstables to
`auto sst`. This makes a copy and we pass this local variable to
`do_for_each()`, which will result in a use-after-free if the latter
defers.
Fix by not making a copy and instead just keep the reference.

Fixes: #7060

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200818091241.2341332-1-bdenes@scylladb.com>
(cherry picked from commit 78f94ba36a)
2020-08-19 00:01:36 +03:00
Nadav Har'El
b052f3f5ce Update Seastar submodule module
> http: add "Expect: 100-continue" handling

Refs #6844.
2020-08-11 13:06:03 +03:00
Calle Wilund
d70cab0444 database: Do not assert on replay positions if truncate does not flush
Fixes #6995

In c2c6c71 the assert on replay positions in flushed sstables discarded by
truncate was broken, by the fact that we no longer flush all sstables
unless auto snapshot is enabled.

This means the low_mark assertion does not hold, because we maybe/probably
never got around to creating the sstables that would hold said mark.

Note that the (old) change to not create sstables and then just delete
them is in itself good. But in that case we should not try to verify
the rp mark.

(cherry picked from commit 9620755c7f)
2020-08-11 00:00:43 +03:00
Avi Kivity
0ce3799187 Update seastar submodule
* seastar 4641f4f2d3...2775a54dcb (1):
  > memory: fix small aligned free memory corruption

Fixes #6831
2020-08-09 18:35:44 +03:00
Avi Kivity
ee113eca52 Merge 'hinted handoff: fix commitlog memory leak' from Piotr D
"
When commitlog is recreated in hints manager, only shutdown() method is
called, but not release(). Because of that, some internal commitlog
objects (`segment_manager` and `segment`s) may be left pointing to each
other through shared_ptr reference cycles, which may result in memory
leak when the parent commitlog object is destroyed.

This PR prevents memory leaks that may happen this way by calling
release() after shutdown() from the hints manager.

Fixes: #6409, Fixes #6776
"

* piodul-fix-commitlog-memory-leak-in-hinted-handoff:
  hinted handoff: disable warnings about segments left on disk
  hinted handoff: release memory on commitlog termination

(cherry picked from commit 4c221855a1)
2020-08-09 17:25:20 +03:00
Tomasz Grabiec
be11514985 thrift: Fix crash on unsorted column names in SlicePredicate
The column names in SlicePredicate can be passed in arbitrary order.
We converted them to clustering ranges in read_command preserving the
original order. As a result, the clustering ranges in read command may
appear out of order. This violates storage engine's assumptions and
lead to undefined behavior.

It was seen manifesting as a SIGSEGV or an abort in sstable reader
when executing a get_slice() thrift verb:

scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.

Fixes #6486.

Tests:

   - added a new dtest to thrift_tests.py which reproduces the problem

Message-Id: <1596725657-15802-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit bfd129cffe)
2020-08-08 19:47:57 +03:00
Rafael Ávila de Espíndola
ec874bdc31 alternator: Fix use after return
Avoid a copy of timeout so that we don't end up with a reference to a
stack allocated variable.

Fixes #6897

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200721184939.111665-1-espindola@scylladb.com>
(cherry picked from commit e83e91e352)
2020-08-03 22:24:12 +03:00
Nadav Har'El
43169ffa2c alternator: fix Expected's "NULL" operator with missing AttributeValueList
The "NULL" operator in Expected (old-style conditional operations) doesn't
have any parameters, so we insisted that the AttributeValueList be empty.
However, we forgot to allow it to also be missing - a possibility which
DynamoDB allows.

This patch adds a test to reproduce this case (the test passes on DyanmoDB,
fails on Alternator before this patch, and succeeds after this patch), and
a fix.

Fixes #6816.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200709161254.618755-1-nyh@scylladb.com>
(cherry picked from commit f549d147ea)
2020-08-03 20:39:01 +03:00
Yaron Kaikov
c5ed14bff6 release: prepare for 4.2.rc2 2020-08-03 16:50:38 +03:00
Takuya ASADA
8366eda943 scylla_util.py: always use relocatable CLI tools
On some CLI tools, command options may different between latest version
vs older version.
To maximize compatibility of setup scripts, we should always use
relocatable CLI tools instead of distribution version of the tool.

Related #6954

(cherry picked from commit a19a62e6f6)
2020-08-03 10:39:26 +03:00
Takuya ASADA
5d0b0dd4c4 create-relocatable-package.py: add lsblk for relocatable CLI tools
We need latest version of lsblk that supported partition type UUID.

Fixes #6954

(cherry picked from commit 6ba2a6c42e)
2020-08-03 10:39:07 +03:00
Juliusz Stasiewicz
6f259be5f1 aggregate_fcts: Use per-type comparators for dynamic types
For collections and UDTs the `MIN()` and `MAX()` functions are
generated on the fly. Until now they worked by comparing just the
byte representations of arguments.

This patch uses specific per-type comparators to provide semantically
sensible, dynamically created aggregates.

Fixes #6768

(cherry picked from commit 5b438e79be)
2020-08-03 10:26:02 +03:00
Calle Wilund
16e512e21c cql3::lists: Fix setter_by_uuid not handing null value
Fixes #6828

When using the scylla list index from UUID extension,
null values were not handled properly causing throws
from underlying layer.

(cherry picked from commit 3b74b9585f)
2020-08-03 10:19:13 +03:00
Avi Kivity
c61dc4e87d tools: toolchain: regenerate for gcc 10.2
Fixes #6813.

As a side effect, this also brings in xxhash 0.7.4.

(matches commit 66c2b4c8bf)
2020-07-31 08:48:12 +03:00
Takuya ASADA
af76a3ba79 scylla_post_install.sh: generate memory.conf for CentOS7
On CentOS7, systemd does not support percentage-based parameter.
To apply memory parameter on CentOS7, we need to override the parameter
in bytes, instead of percentage.

Fixes #6783

(cherry picked from commit 3a25e7285b)
2020-07-30 16:41:10 +03:00
Tomasz Grabiec
8fb5ebb2c6 commitlog: Fix use-after-free on mutation object during replay
The mutation object may be freed prematurely during commitlog replay
in the schema upgrading path. We will hit the problem if the memtable
is full and apply_in_memory() needs to defer.

This will typically manifest as a segfault.

Fixes #6953

Introduced in 79935df

Tests:
  - manual using scylla binary. Reproduced the problem then verified the fix makes it go away

Message-Id: <1596044010-27296-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 3486eba1ce)
2020-07-30 16:36:42 +03:00
Takuya ASADA
bfb11defdd scylla_setup: skip boot partition
On GCE, /dev/sda14 reported as unused disk but it's BIOS boot partition,
should not use for scylla data partition, also cannot use for it since it's
too small.

It's better to exclude such partiotion from unsed disk list.

Fixes #6636

(cherry picked from commit d7de9518fe)
2020-07-29 09:48:10 +03:00
Asias He
2d1ddcbb6a repair: Fix race between create_writer and wait_for_writer_done
We saw scylla hit user after free in repair with the following procedure during tests:

- n1 and n2 in the cluster

- n2 ran decommission

- n2 sent data to n1 using repair

- n2 was killed forcely

- n1 tried to remove repair_meta for n1

- n1 hit use after free on repair_meta object

This was what happened on n1:

1) data was received -> do_apply_rows was called -> yield before create_writer() was called

2) repair_meta::stop() was called -> wait_for_writer_done() / do_wait_for_writer_done was called
   with _writer_done[node_idx] not engaged

3) step 1 resumed, create_writer() was called and _repair_writer object was referenced

4) repair_meta::stop() finished, repair_meta object and its member _repair_writer was destroyed

5) The fiber created by create_writer() at step 3 hit use after free on _repair_writer object

To fix, we should call wait_for_writer_done() after any pending
operations were done which were protected by repair_meta::_gate. This
prevents wait for writer done finishes before the writer is in the
process of being created.

Fixes: #6853
Fixes: #6868
Backports: 4.0, 4.1, 4.2
(cherry picked from commit e6f640441a)
2020-07-29 09:48:10 +03:00
Raphael S. Carvalho
4c560b63f0 sstable: index_reader: Make sure streams are all properly closed on failure
Turns out the fix f591c9c710 wasn't enough to make sure all input streams
are properly closed on failure.
It only closes the main input stream that belongs to context, but it misses
all the input streams that can be opened in the consumer for promote index
reading. Consumer stores a list of indexes, where each of them has its own
input stream. On failure, we need to make sure that every single one of
them is properly closed before destroying the indexes as that could cause
memory corruption due to read ahead.

Fixes #6924.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200727182214.377140-1-raphaelsc@scylladb.com>
(cherry picked from commit 0d70efa58e)
2020-07-29 09:48:10 +03:00
Nadav Har'El
00155e32b1 merge: db/view: view_update_generator: make staging reader evictable
Merged patch set by Botond Dénes:

The view update generation process creates two readers. One is used to
read the staging sstables, the data which needs view updates to be
generated for, and another reader for each processed mutation, which
reads the current value (pre-image) of each row in said mutation. The

staging reader is created first and is kept alive until all staging data
is processed. The pre-image reader is created separately for each
processed mutation. The staging reader is not restricted, meaning it
does not wait for admission on the relevant reader concurrency
semaphore, but it does register its resource usage on it. The pre-image
reader however *is* restricted. This creates a situation, where the
staging reader possibly consumes all resources from the semaphore,
leaving none for the later created pre-image reader, which will not be
able to start reading. This will block the view building process meaning
that the staging reader will not be destroyed, causing a deadlock.

This patch solves this by making the staging reader restricted and
making it evictable. To prevent thrashing -- evicting the staging reader
after reading only a really small partition -- we only make the staging
reader evictable after we have read at least 1MB worth of data from it.

  test/boost: view_build_test: add test_view_update_generator_buffering
  test/boost: view_build_test: add test test_view_update_generator_deadlock
  reader_permit: reader_resources: add operator- and operator+
  reader_concurrency_semaphore: add initial_resources()
  test: cql_test_env: allow overriding database_config
  mutation_reader: expose new_reader_base_cost
  db/view: view_updating_consumer: allow passing custom update pusher
  db/view: view_update_generator: make staging reader evictable
  db/view: view_updating_consumer: move implementation from table.cc to view.cc
  database: add make_restricted_range_sstable_reader()

Signed-off-by: Botond Dénes <bdenes@scylladb.com>

(cherry picked from commit f488eaebaf)

Fixes #6892.
2020-07-28 17:02:09 +03:00
Avi Kivity
b06dffcc19 Merge "messaging: make verb handler registering independent of current scheduling group" from Botond
"
0c6bbc8 refactored `get_rpc_client_idx()` to select different clients
for statement verbs depending on the current scheduling group.
The goal was to allow statement verbs to be sent on different
connections depending on the current scheduling group. The new
connections use per-connection isolation. For backward compatibility the
already existing connections fall-back to per-handler isolation used
previously. The old statement connection, called the default statement
connection, also used this. `get_rpc_client_idx()` was changed to select
the default statement connection when the current scheduling group is
the statement group, and a non-default connection otherwise.

This inadvertently broke `scheduling_group_for_verb()` which also used
this method to get the scheduling group to be used to isolate a verb at
handle register time. This method needs the default client idx for each
verb, but if verb registering is run under the system group it instead
got the non-default one, resulting in the per-handler isolation not
being set-up for the default statement connection, resulting in default
statement verb handlers running in whatever scheduling group the process
loop of the rpc is running in, which is the system scheduling group.

This caused all sorts of problems, even beyond user queries running in
the system group. Also as of 0c6bbc8 queries on the replicas are
classified based on the scheduling group they are running on, so user
reads also ended up using the system concurrency semaphore.

In particular this caused severe problems with ranges scans, which in
some cases ended up using different semaphores per page resulting in a
crash. This could happen because when the page was read locally the code
would run in the statement scheduling group, but when the request
arrived from a remote coordinator via rpc, it was read in a system
scheduling group. This caused a mismatch between the semaphore the saved
reader was created with and the one the new page was read with. The
result was that in some cases when looking up a paused reader from the
wrong semaphore, a reader belonging to another read was returned,
creating a disconnect between the lifecycle between readers and that of
the slice and range they were referencing.

This series fixes the underlying problem of the scheduling group
influencing the verb handler registration, as well as adding some
additional defenses if this semaphore mismatch ever happens in the
future. Inactive read handles are now unique across all semaphores,
meaning that it is not possible anymore that a handle succeeds in
looking up a reader when used with the wrong semaphore. The range scan
algorithm now also makes sure there is no semaphore mismatch between the
one used for the current page and that of the saved reader from the
previous page.

I manually checked that each individual defense added is already
preventing the crash from happening.

Fixes: #6613
Fixes: #6907
Fixes: #6908

Tests: unit(dev), manual(run the crash reproducer, observe no crash)
"

* 'query-classification-regressions/v1' of https://github.com/denesb/scylla:
  multishard_mutation_query: use cached semaphore
  messaging: make verb handler registering independent of current scheduling group
  multishard_mutation_query: validate the semaphore of the looked-up reader
  reader_concurrency_semaphore: make inactive read handles unique across semaphores
  reader_concurrency_semaphore: add name() accessor
  reader_concurrency_semaphore: allow passing name to no-limit constructor

(cherry picked from commit 3f84d41880)
2020-07-27 17:41:51 +03:00
Botond Dénes
508e58ef9e sstables: clamp estimated_partitions to [1, +inf) in writers
In some cases estimated number of partitions can be 0, which is albeit a
legit estimation result, breaks many low-level sstable writer code, so
some of these have assertions to ensure estimated partitions is > 0.
To avoid hitting this assert all users of the sstable writers do the
clamping, to ensure estimated partitions is at least 1. However leaving
this to the callers is error prone as #6913 has shown it. As this
clamping is standard practice, it is better to do it in the writers
themselves, avoiding this problem altogether. This is exactly what this
patch does. It also adds two unit tests, one that reproduces the crash
in #6913, and another one that ensures all sstable writers are fine with
estimated partitions being 0 now. Call sites previously doing the
clamping are changed to not do it, it is unnecessary now as the writer
does it itself.

Fixes #6913

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200724120227.267184-1-bdenes@scylladb.com>
(cherry picked from commit fe127a2155)
2020-07-27 15:00:00 +03:00
Piotr Sarna
776faa809f Merge 'view_update_generator: use partitioned sstable set'
from Botond.

Recently it was observed (#6603) that since 4e6400293ea, the staging
reader is reading from a lot of sstables (200+). This consumes a lot of
memory, and after this reaches a certain threshold -- the entire memory
amount of the streaming reader concurrency semaphore -- it can cause a
deadlock within the view update generation. To reduce this memory usage,
we exploit the fact that the staging sstables are usually disjoint, and
use the partitioned sstable set to create the staging reader. This
should ensure that only the minimum number of sstable readers will be
opened at any time.

Refs: #6603
Fixes: #6707

Tests: unit(dev)

* 'view-update-generator-use-partitioned-set/v1' of https://github.com/denesb/scylla:
  db/view: view_update_generator: use partitioned sstable set
  sstables: make_partitioned_sstable_set(): return an sstable_set

(cherry picked from commit e4b74356bb)
2020-07-21 15:40:02 +03:00
Raphael S. Carvalho
7037f43a17 table: Fix Staging SSTables being incorrectly added or removed from the backlog tracker
Staging SSTables can be incorrectly added or removed from the backlog tracker,
after an ALTER TABLE or TRUNCATE, because the add and removal don't take
into account if the SSTable requires view building, so a Staging SSTable can
be added to the tracker after a ALTER table, or removed after a TRUNCATE,
even though not added previously, potentially causing the backlog to
become negative.

Fixes #6798.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200716180737.944269-1-raphaelsc@scylladb.com>
(cherry picked from commit b67066cae2)
2020-07-21 12:57:09 +03:00
Avi Kivity
bd713959ce Update seastar submodule
* seastar 8aad24a5f8...4641f4f2d3 (4):
  > httpd: Don't warn on ECONNABORTED
  > httpd: Avoid calling future::then twice on the same future
Fixes #6709.
  > httpd: Use handle_exception instead of then_wrapped
  > httpd: Use std::unique_ptr instead of a raw pointer
2020-07-19 11:49:02 +03:00
Rafael Ávila de Espíndola
b7c5a918cb mutation_reader_test: Wait for a future
Nothing was waiting for this future. Found while testing another
patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200630183929.1704908-1-espindola@scylladb.com>
(cherry picked from commit 6fe7706fce)

Fixes #6858.
2020-07-16 14:44:31 +03:00
Asias He
fb2ae9e66b repair: Relax node selection in bootstrap when nodes are less than RF
Consider a cluster with two nodes:

 - n1 (dc1)
 - n2 (dc2)

A third node is bootstrapped:

 - n3 (dc2)

The n3 fails to bootstrap as follows:

 [shard 0] init - Startup failed: std::runtime_error
 (bootstrap_with_repair: keyspace=system_distributed,
 range=(9183073555191895134, 9196226903124807343], no existing node in
 local dc)

The system_distributed keyspace is using SimpleStrategy with RF 3. For
the keyspace that does not use NetworkTopologyStrategy, we should not
require the source node to be in the same DC.

Fixes: #6744
Backports: 4.0 4.1, 4.2
(cherry picked from commit 38d964352d)
2020-07-16 12:02:38 +03:00
Asias He
7a7ed8c65d repair: Relax size check of get_row_diff and set_diff
In case a row hash conflict, a hash in set_diff will get more than one
row from get_row_diff.

For example,

Node1 (Repair master):
row1  -> hash1
row2  -> hash2
row3  -> hash3
row3' -> hash3

Node2 (Repair follower):
row1  -> hash1
row2  -> hash2

We will have set_diff = {hash3} between node1 and node2, while
get_row_diff({hash3}) will return two rows: row3 and row3'. And the
error below was observed:

   repair - Got error in row level repair: std::runtime_error
   (row_diff.size() != set_diff.size())

In this case, node1 should send both row3 and row3' to peer node
instead of fail the whole repair. Because node2 does not have row3 or
row3', otherwise node1 won't send row with hash3 to node1 in the first
place.

Refs: #6252
(cherry picked from commit a00ab8688f)
2020-07-15 14:48:49 +03:00
Nadav Har'El
7b9be752ec alternator test: configurable temporary directory
The test/alternator/run script creates a temporary directory for the Scylla
database in /tmp. The assumption was that this is the fastest disk (usually
even a ramdisk) on the test machine, and we didn't need anything else from
it.

But it turns out that on some systems, /tmp is actually a slow disk, so
this patch adds a way to configure the temporary directory - if the TMPDIR
environment variable exists, it is used instead of /tmp. As before this
patch, a temporary subdirectry is created in $TMPDIR, and this subdirectory
is automatically deleted when the test ends.

The test.py script already passes an appropriate TMPDIR (testlog/$mode),
which after this patch the Alternator test will use instead of /tmp.

Fixes #6750

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200713193023.788634-1-nyh@scylladb.com>
(cherry picked from commit 8e3be5e7d6)
2020-07-14 12:34:26 +03:00
Konstantin Osipov
903e967a16 Export TMPDIR pointing at subdir of testlog/
Export TMPDIR environment variable pointing at a subdir of testlog.
This variable is used by seastar/scylla tests to create a
a subdirectory with temporary test data. Normally a test cleans
up the temporary directory, but if it crashes or is killed the
directory remains.

By resetting the default location from /tmp to testlog/{mode}
we allow test.py we consolidate all test artefacts in a single
place.

Fixes #6062, "test.py uses tmpfs"

(cherry picked from commit e628da863d)
2020-07-14 12:34:06 +03:00
Avi Kivity
b84946895c Update seastar submodule
* seastar 1e762652c4...8aad24a5f8 (2):
  > futures: Add a test for a broken promise in a parallel_for_each
  > future: Call set_to_broken_promise earlier

Fixes #6749 (probably).
2020-07-13 20:08:16 +03:00
Asias He
a27188886a repair: Switch to btree_set for repair_hash.
In one of the longevity tests, we observed 1.3s reactor stall which came from
repair_meta::get_full_row_hashes_source_op. It traced back to a call to
std::unordered_set::insert() which triggered big memory allocation and
reclaim.

I measured std::unordered_set, absl::flat_hash_set, absl::node_hash_set
and absl::btree_set. The absl::btree_set was the only one that seastar
oversized allocation checker did not warn in my tests where around 300K
repair hashes were inserted into the container.

- unordered_set:
hash_sets=295634, time=333029199 ns

- flat_hash_set:
hash_sets=295634, time=312484711 ns

- node_hash_set:
hash_sets=295634, time=346195835 ns

- btree_set:
hash_sets=295634, time=341379801 ns

The btree_set is a bit slower than unordered_set but it does not have
huge memory allocation. I do not measure real difference of total time
to finish repair of the same dataset with unordered_set and btree_set.

To fix, switch to absl btree_set container.

Fixes #6190

(cherry picked from commit 67f6da6466)
2020-07-13 10:09:23 +03:00
Dmitry Kropachev
51d4efc321 dist/common/scripts/scylla-housekeeping: wrap urllib.request with try ... except
We could hit "cannot serialize '_io.BufferedReader' object" when request get 404 error from the server
	Now you will get legit error message in the case.

	Fixes #6690

(cherry picked from commit de82b3efae)
2020-07-09 18:24:55 +03:00
Avi Kivity
0847eea8d6 Update seastar submodule
* seastar 11e86172ba...1e762652c4 (1):
  > sharded: Do not hang on never set freed promise

Fixes #6606.
2020-07-09 15:52:26 +03:00
Avi Kivity
35ad57cb9c Point seastar submodule at scylla-seastar.git
This allows us to backport seastar patches to 4.2.
2020-07-09 15:50:25 +03:00
Hagit Segev
42b0b9ad08 release: prepare for 4.2.rc1 2020-07-08 23:01:10 +03:00
Dejan Mircevski
68b95bf2ac cql/restrictions: Handle WHERE a>0 AND a<0
WHERE clauses with start point above the end point were handled
incorrectly.  When the slice bounds are transformed to interval
bounds, the resulting interval is interpreted as wrap-around (because
start > end), so it contains all values above 0 and all values below
0.  This is clearly incorrect, as the user's intent was to filter out
all possible values of a.

Fix it by explicitly short-circuiting to false when start > end.  Add
a test case.

Fixes #5799.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
(cherry picked from commit 921dbd0978)
2020-07-08 13:20:10 +03:00
Botond Dénes
fea83f6ae0 db/view: view_update_generator: re-balance wait/signal on the register semaphore
The view update generator has a semaphore to limit concurrency. This
semaphore is waited on in `register_staging_sstable()` and later the
unit is returned after the sstable is processed in the loop inside
`start()`.
This was broken by 4e64002, which changed the loop inside `start()` to
process sstables in per table batches, however didn't change the
`signal()` call to return the amount of units according to the number of
sstables processed. This can cause the semaphore units to dry up, as the
loop can process multiple sstables per table but return just a single
unit. This can also block callers of `register_staging_sstable()`
indefinitely as some waiters will never be released as under the right
circumstances the units on the semaphore can permanently go below 0.
In addition to this, 4e64002 introduced another bug: table entries from
the `_sstables_with_tables` are never removed, so they are processed
every turn. If the sstable list is empty, there won't be any update
generated but due to the unconditional `signal()` described above, this
can cause the units on the semaphore to grow to infinity, allowing
future staging sstables producers to register a huge amount of sstables,
causing memory problems due to the amount of sstable readers that have
to be opened (#6603, #6707).
Both outcomes are equally bad. This patch fixes both issues and modifies
the `test_view_update_generator` unit test to reproduce them and hence
to verify that this doesn't happen in the future.

Fixes: #6774
Refs: #6707
Refs: #6603

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200706135108.116134-1-bdenes@scylladb.com>
(cherry picked from commit 5ebe2c28d1)
2020-07-08 11:13:24 +03:00
Takuya ASADA
76618a7e06 scylla_setup: don't add same disk device twice
We shouldn't accept adding same disk twice for RAID prompt.

Fixes #6711

(cherry picked from commit 835e76fdfc)
2020-07-07 13:07:59 +03:00
Takuya ASADA
189a08ac72 scylla_setup: follow hugepages package name change on Ubuntu 20.04LTS
hugepages package now renamed to libhugetlbfs-bin, we need to follow
the change.

Fixes #6673

(cherry picked from commit 03ce19d53a)
2020-07-05 14:41:33 +03:00
Takuya ASADA
a3e9915a83 dist/debian: apply generated package version for .orig.tar.gz file
We currently does not able to apply version number fixup for .orig.tar.gz file,
even we applied correct fixup on debian/changelog, becuase it just reading
SCYLLA-VERSION-FILE.
We should parse debian/{changelog,control} instead.

Fixes #6736

(cherry picked from commit a107f086bc)
2020-07-05 14:08:37 +03:00
Asias He
e4bc14ec1a boot_strapper: Ignore node to be replaced explicitly as stream source
After commit 7d86a3b208 (storage_service:
Make replacing node take writes), during replace operation, tokens in
_token_metadata for node being replaced are updated only after the replace
operation is finished. As a result, in range_streamer::add_ranges, the
node being replaced will be considered as a source to stream data from.

Before commit 7d86a3b208, the node being
replaced will not be considered as a source node because it is already
replaced by the replacing node before the replace operation is finished.
This is the reason why it works in the past.

To fix, filter out the node being replaced as a source node explicitly.

Tests: replace_first_boot_test and replace_stopped_node_test
Backports: 4.1
Fixes: #6728
(cherry picked from commit e338028b7e22b0a80be7f80c337c52f958bfe1d7)
2020-07-01 14:36:43 +03:00
Takuya ASADA
972acb6d56 scylla_swap_setup: handle <1GB environment
Show better error message and exit with non-zero status when memory size <1GB.

Fixes #6659

(cherry picked from commit a9de438b1f)
2020-07-01 12:40:25 +03:00
Yaron Kaikov
7fbfedf025 dist/docker/redhat/Dockerfile: update 4.2 params
Set SCYLLA_REPO and VERSION values for scylla-4.2
2020-06-30 13:09:06 +03:00
Avi Kivity
5f175f8103 Merge "Fix handling of decimals with negative scales" from Rafael
"
Before this series scylla would effectively infinite loop when, for
example, casting a decimal with a negative scale to float.

Fixes #6720
"

* 'espindola/fix-decimal-issue' of https://github.com/espindola/scylla:
  big_decimal: Add a test for a corner case
  big_decimal: Correctly handle negative scales
  big_decimal: Add a as_rational member function
  big_decimal: Move constructors out of line

(cherry picked from commit 3e2eeec83a)
2020-06-29 12:05:17 +03:00
Benny Halevy
674ad6656a comapction: restore % in compaction completion message
The % sign fell off in c4841fa735

Fixes #6727.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200625151352.736561-1-bhalevy@scylladb.com>
(cherry picked from commit a843945115)
2020-06-28 12:10:21 +03:00
Hagit Segev
58498b4b6c release: prepare for 4.2.rc0 2020-06-26 13:06:07 +03:00
4819 changed files with 24656 additions and 46588 deletions

87
.github/CODEOWNERS vendored
View File

@@ -1,87 +0,0 @@
# AUTH
auth/* @elcallio @vladzcloudius
# CACHE
row_cache* @tgrabiec @haaawk
*mutation* @tgrabiec @haaawk
tests/mvcc* @tgrabiec @haaawk
# CDC
cdc/* @haaawk @kbr- @elcallio @piodul @jul-stas
test/cql/cdc_* @haaawk @kbr- @elcallio @piodul @jul-stas
test/boost/cdc_* @haaawk @kbr- @elcallio @piodul @jul-stas
# COMMITLOG / BATCHLOG
db/commitlog/* @elcallio
db/batch* @elcallio
# COORDINATOR
service/storage_proxy* @gleb-cloudius
# COMPACTION
sstables/compaction* @raphaelsc @nyh
# CQL TRANSPORT LAYER
transport/* @penberg
# CQL QUERY LANGUAGE
cql3/* @tgrabiec @penberg @psarna
# COUNTERS
counters* @haaawk @jul-stas
tests/counter_test* @haaawk @jul-stas
# GOSSIP
gms/* @tgrabiec @asias
# DOCKER
dist/docker/* @penberg
# LSA
utils/logalloc* @tgrabiec
# MATERIALIZED VIEWS
db/view/* @nyh @psarna
cql3/statements/*view* @nyh @psarna
test/boost/view_* @nyh @psarna
# PACKAGING
dist/* @syuu1228
# REPAIR
repair/* @tgrabiec @asias @nyh
# SCHEMA MANAGEMENT
db/schema_tables* @tgrabiec @nyh
db/legacy_schema_migrator* @tgrabiec @nyh
service/migration* @tgrabiec @nyh
schema* @tgrabiec @nyh
# SECONDARY INDEXES
db/index/* @nyh @penberg @psarna
cql3/statements/*index* @nyh @penberg @psarna
test/boost/*index* @nyh @penberg @psarna
# SSTABLES
sstables/* @tgrabiec @raphaelsc @nyh
# STREAMING
streaming/* @tgrabiec @asias
service/storage_service.* @tgrabiec @asias
# ALTERNATOR
alternator/* @nyh @psarna
test/alternator/* @nyh @psarna
# HINTED HANDOFF
db/hints/* @haaawk @piodul @vladzcloudius
# REDIS
redis/* @nyh @syuu1228
redis-test/* @nyh @syuu1228
# READERS
reader_* @denesb
querier* @denesb
test/boost/mutation_reader_test.cc @denesb
test/boost/querier_cache_test.cc @denesb

3
.gitignore vendored
View File

@@ -22,6 +22,5 @@ resources
.pytest_cache
/expressions.tokens
tags
testlog
testlog/*
test/*/*.reject
.vscode

9
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui
@@ -13,11 +13,8 @@
path = abseil
url = ../abseil-cpp
[submodule "scylla-jmx"]
path = tools/jmx
path = scylla-jmx
url = ../scylla-jmx
[submodule "scylla-tools"]
path = tools/java
path = scylla-tools
url = ../scylla-tools-java
[submodule "scylla-python3"]
path = tools/python3
url = ../scylla-python3

View File

@@ -1,5 +1,8 @@
cmake_minimum_required(VERSION 3.18)
##
## For best results, first compile the project using the Ninja build-system.
##
cmake_minimum_required(VERSION 3.7)
project(scylla)
if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
@@ -17,739 +20,136 @@ else()
set(BUILD_TYPE "release")
endif()
function(default_target_arch arch)
set(x86_instruction_sets i386 i686 x86_64)
if(CMAKE_SYSTEM_PROCESSOR IN_LIST x86_instruction_sets)
set(${arch} "westmere" PARENT_SCOPE)
elseif(CMAKE_SYSTEM_PROCESSOR EQUAL "aarch64")
set(${arch} "armv8-a+crc+crypto" PARENT_SCOPE)
else()
set(${arch} "" PARENT_SCOPE)
endif()
endfunction()
default_target_arch(target_arch)
if(target_arch)
set(target_arch_flag "-march=${target_arch}")
if (NOT DEFINED FOR_IDE AND NOT DEFINED ENV{FOR_IDE} AND NOT DEFINED ENV{CLION_IDE})
message(FATAL_ERROR "This CMakeLists.txt file is only valid for use in IDEs, please define FOR_IDE to acknowledge this.")
endif()
# Configure Seastar compile options to align with Scylla
set(Seastar_CXX_FLAGS -fcoroutines ${target_arch_flag} CACHE INTERNAL "" FORCE)
set(Seastar_CXX_DIALECT gnu++20 CACHE INTERNAL "" FORCE)
# These paths are always available, since they're included in the repository. Additional DPDK headers are placed while
# Seastar is built, and are captured in `SEASTAR_INCLUDE_DIRS` through parsing the Seastar pkg-config file (below).
set(SEASTAR_DPDK_INCLUDE_DIRS
seastar/dpdk/lib/librte_eal/common/include
seastar/dpdk/lib/librte_eal/common/include/generic
seastar/dpdk/lib/librte_eal/common/include/x86
seastar/dpdk/lib/librte_ether)
add_subdirectory(seastar)
add_subdirectory(abseil)
# Exclude absl::strerror from the default "all" target since it's not
# used in Scylla build and, moreover, makes use of deprecated glibc APIs,
# such as sys_nerr, which are not exposed from "stdio.h" since glibc 2.32,
# which happens to be the case for recent Fedora distribution versions.
#
# Need to use the internal "absl_strerror" target name instead of namespaced
# variant because `set_target_properties` does not understand the latter form,
# unfortunately.
set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)
find_package(PkgConfig REQUIRED)
# System libraries dependencies
find_package(Boost COMPONENTS filesystem program_options system thread regex REQUIRED)
find_package(Lua REQUIRED)
find_package(ZLIB REQUIRED)
find_package(ICU COMPONENTS uc REQUIRED)
set(ENV{PKG_CONFIG_PATH} "${CMAKE_SOURCE_DIR}/build/${BUILD_TYPE}/seastar:$ENV{PKG_CONFIG_PATH}")
pkg_check_modules(SEASTAR seastar)
set(scylla_build_dir "${CMAKE_BINARY_DIR}/build/${BUILD_TYPE}")
set(scylla_gen_build_dir "${scylla_build_dir}/gen")
file(MAKE_DIRECTORY "${scylla_build_dir}" "${scylla_gen_build_dir}")
if(NOT SEASTAR_INCLUDE_DIRS)
# Default value. A more accurate list is populated through `pkg-config` below if `seastar.pc` is available.
set(SEASTAR_INCLUDE_DIRS "seastar/include")
endif()
# Place libraries, executables and archives in ${buildroot}/build/${mode}/
foreach(mode RUNTIME LIBRARY ARCHIVE)
set(CMAKE_${mode}_OUTPUT_DIRECTORY "${scylla_build_dir}")
endforeach()
find_package(Boost COMPONENTS filesystem program_options system thread)
# Generate C++ source files from thrift definitions
function(scylla_generate_thrift)
set(one_value_args TARGET VAR IN_FILE OUT_DIR SERVICE)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
##
## Populate the names of all source and header files in the indicated paths in a designated variable.
##
## When RECURSIVE is specified, directories are traversed recursively.
##
## Use: scan_scylla_source_directories(VAR my_result_var [RECURSIVE] PATHS [path1 path2 ...])
##
function (scan_scylla_source_directories)
set(options RECURSIVE)
set(oneValueArgs VAR)
set(multiValueArgs PATHS)
cmake_parse_arguments(args "${options}" "${oneValueArgs}" "${multiValueArgs}" "${ARGN}")
get_filename_component(in_file_name ${args_IN_FILE} NAME_WE)
set(globs "")
set(aux_out_file_name ${args_OUT_DIR}/${in_file_name})
set(outputs
${aux_out_file_name}_types.cpp
${aux_out_file_name}_types.h
${aux_out_file_name}_constants.cpp
${aux_out_file_name}_constants.h
${args_OUT_DIR}/${args_SERVICE}.cpp
${args_OUT_DIR}/${args_SERVICE}.h)
foreach (dir ${args_PATHS})
list(APPEND globs "${dir}/*.cc" "${dir}/*.hh")
endforeach()
add_custom_command(
DEPENDS
${args_IN_FILE}
thrift
OUTPUT ${outputs}
COMMAND ${CMAKE_COMMAND} -E make_directory ${args_OUT_DIR}
COMMAND thrift -gen cpp:cob_style,no_skeleton -out "${args_OUT_DIR}" "${args_IN_FILE}")
if (args_RECURSIVE)
set(glob_kind GLOB_RECURSE)
else()
set(glob_kind GLOB)
endif()
add_custom_target(${args_TARGET}
DEPENDS ${outputs})
file(${glob_kind} var
${globs})
set(${args_VAR} ${outputs} PARENT_SCOPE)
set(${args_VAR} ${var} PARENT_SCOPE)
endfunction()
scylla_generate_thrift(
TARGET scylla_thrift_gen_cassandra
VAR scylla_thrift_gen_cassandra_files
IN_FILE interface/cassandra.thrift
OUT_DIR ${scylla_gen_build_dir}
SERVICE Cassandra)
## Although Seastar is an external project, it is common enough to explore the sources while doing
## Scylla development that we'll treat the Seastar sources as part of this project for easier navigation.
scan_scylla_source_directories(
VAR SEASTAR_SOURCE_FILES
RECURSIVE
# Parse antlr3 grammar files and generate C++ sources
function(scylla_generate_antlr3)
set(one_value_args TARGET VAR IN_FILE OUT_DIR)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
PATHS
seastar/core
seastar/http
seastar/json
seastar/net
seastar/rpc
seastar/testing
seastar/util)
get_filename_component(in_file_pure_name ${args_IN_FILE} NAME)
get_filename_component(stem ${in_file_pure_name} NAME_WE)
scan_scylla_source_directories(
VAR SCYLLA_ROOT_SOURCE_FILES
PATHS .)
set(outputs
"${args_OUT_DIR}/${stem}Lexer.hpp"
"${args_OUT_DIR}/${stem}Lexer.cpp"
"${args_OUT_DIR}/${stem}Parser.hpp"
"${args_OUT_DIR}/${stem}Parser.cpp")
scan_scylla_source_directories(
VAR SCYLLA_SUB_SOURCE_FILES
RECURSIVE
add_custom_command(
DEPENDS
${args_IN_FILE}
OUTPUT ${outputs}
# Remove #ifdef'ed code from the grammar source code
COMMAND sed -e "/^#if 0/,/^#endif/d" "${args_IN_FILE}" > "${args_OUT_DIR}/${in_file_pure_name}"
COMMAND antlr3 "${args_OUT_DIR}/${in_file_pure_name}"
# We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.
# Because we add such a variable to every function, and because `ExceptionBaseType` is not a global
# name, we also add a global typedef to avoid compilation errors.
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.hpp"
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.cpp"
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Parser.hpp"
COMMAND sed -i
-e "s/^\\( *\\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$/\\1const \\2/"
-e "/^.*On :.*$/d"
-e "1i using ExceptionBaseType = int;"
-e "s/^{/{ ExceptionBaseType\\* ex = nullptr;/; s/ExceptionBaseType\\* ex = new/ex = new/; s/exceptions::syntax_exception e/exceptions::syntax_exception\\& e/"
"${args_OUT_DIR}/${stem}Parser.cpp"
VERBATIM)
PATHS
api
auth
cql3
db
dht
exceptions
gms
index
io
locator
message
repair
service
sstables
streaming
test
thrift
tracing
transport
utils)
add_custom_target(${args_TARGET}
DEPENDS ${outputs})
scan_scylla_source_directories(
VAR SCYLLA_GEN_SOURCE_FILES
RECURSIVE
PATHS build/${BUILD_TYPE}/gen)
set(${args_VAR} ${outputs} PARENT_SCOPE)
endfunction()
set(antlr3_grammar_files
cql3/Cql.g
alternator/expressions.g)
set(antlr3_gen_files)
foreach(f ${antlr3_grammar_files})
get_filename_component(grammar_file_name "${f}" NAME_WE)
get_filename_component(f_dir "${f}" DIRECTORY)
scylla_generate_antlr3(
TARGET scylla_antlr3_gen_${grammar_file_name}
VAR scylla_antlr3_gen_${grammar_file_name}_files
IN_FILE ${f}
OUT_DIR ${scylla_gen_build_dir}/${f_dir})
list(APPEND antlr3_gen_files "${scylla_antlr3_gen_${grammar_file_name}_files}")
endforeach()
# Generate C++ sources from ragel grammar files
seastar_generate_ragel(
TARGET scylla_ragel_gen_protocol_parser
VAR scylla_ragel_gen_protocol_parser_file
IN_FILE redis/protocol_parser.rl
OUT_FILE ${scylla_gen_build_dir}/redis/protocol_parser.hh)
# Generate C++ sources from Swagger definitions
set(swagger_files
api/api-doc/cache_service.json
api/api-doc/collectd.json
api/api-doc/column_family.json
api/api-doc/commitlog.json
api/api-doc/compaction_manager.json
api/api-doc/config.json
api/api-doc/endpoint_snitch_info.json
api/api-doc/error_injection.json
api/api-doc/failure_detector.json
api/api-doc/gossiper.json
api/api-doc/hinted_handoff.json
api/api-doc/lsa.json
api/api-doc/messaging_service.json
api/api-doc/storage_proxy.json
api/api-doc/storage_service.json
api/api-doc/stream_manager.json
api/api-doc/system.json
api/api-doc/utils.json)
set(swagger_gen_files)
foreach(f ${swagger_files})
get_filename_component(fname "${f}" NAME_WE)
get_filename_component(dir "${f}" DIRECTORY)
seastar_generate_swagger(
TARGET scylla_swagger_gen_${fname}
VAR scylla_swagger_gen_${fname}_files
IN_FILE "${f}"
OUT_DIR "${scylla_gen_build_dir}/${dir}")
list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")
endforeach()
# Create C++ bindings for IDL serializers
function(scylla_generate_idl_serializer)
set(one_value_args TARGET VAR IN_FILE OUT_FILE)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(out_dir ${args_OUT_FILE} DIRECTORY)
set(idl_compiler "${CMAKE_SOURCE_DIR}/idl-compiler.py")
find_package(Python3 COMPONENTS Interpreter)
add_custom_command(
DEPENDS
${args_IN_FILE}
${idl_compiler}
OUTPUT ${args_OUT_FILE}
COMMAND ${CMAKE_COMMAND} -E make_directory ${out_dir}
COMMAND Python3::Interpreter ${idl_compiler} --ns ser -f ${args_IN_FILE} -o ${args_OUT_FILE})
add_custom_target(${args_TARGET}
DEPENDS ${args_OUT_FILE})
set(${args_VAR} ${args_OUT_FILE} PARENT_SCOPE)
endfunction()
set(idl_serializers
idl/cache_temperature.idl.hh
idl/commitlog.idl.hh
idl/consistency_level.idl.hh
idl/frozen_mutation.idl.hh
idl/frozen_schema.idl.hh
idl/gossip_digest.idl.hh
idl/idl_test.idl.hh
idl/keys.idl.hh
idl/messaging_service.idl.hh
idl/mutation.idl.hh
idl/paging_state.idl.hh
idl/partition_checksum.idl.hh
idl/paxos.idl.hh
idl/query.idl.hh
idl/range.idl.hh
idl/read_command.idl.hh
idl/reconcilable_result.idl.hh
idl/replay_position.idl.hh
idl/result.idl.hh
idl/ring_position.idl.hh
idl/streaming.idl.hh
idl/token.idl.hh
idl/tracing.idl.hh
idl/truncation_record.idl.hh
idl/uuid.idl.hh
idl/view.idl.hh)
set(idl_gen_files)
foreach(f ${idl_serializers})
get_filename_component(idl_name "${f}" NAME)
get_filename_component(idl_target "${idl_name}" NAME_WE)
get_filename_component(idl_dir "${f}" DIRECTORY)
string(REPLACE ".idl.hh" ".dist.hh" idl_out_hdr_name "${idl_name}")
scylla_generate_idl_serializer(
TARGET scylla_idl_gen_${idl_target}
VAR scylla_idl_gen_${idl_target}_files
IN_FILE ${f}
OUT_FILE ${scylla_gen_build_dir}/${idl_dir}/${idl_out_hdr_name})
list(APPEND idl_gen_files "${scylla_idl_gen_${idl_target}_files}")
endforeach()
set(scylla_sources
absl-flat_hash_map.cc
alternator/auth.cc
alternator/base64.cc
alternator/conditions.cc
alternator/executor.cc
alternator/expressions.cc
alternator/serialization.cc
alternator/server.cc
alternator/stats.cc
alternator/streams.cc
api/api.cc
api/cache_service.cc
api/collectd.cc
api/column_family.cc
api/commitlog.cc
api/compaction_manager.cc
api/config.cc
api/endpoint_snitch.cc
api/error_injection.cc
api/failure_detector.cc
api/gossiper.cc
api/hinted_handoff.cc
api/lsa.cc
api/messaging_service.cc
api/storage_proxy.cc
api/storage_service.cc
api/stream_manager.cc
api/system.cc
atomic_cell.cc
auth/allow_all_authenticator.cc
auth/allow_all_authorizer.cc
auth/authenticated_user.cc
auth/authentication_options.cc
auth/authenticator.cc
auth/common.cc
auth/default_authorizer.cc
auth/password_authenticator.cc
auth/passwords.cc
auth/permission.cc
auth/permissions_cache.cc
auth/resource.cc
auth/role_or_anonymous.cc
auth/roles-metadata.cc
auth/sasl_challenge.cc
auth/service.cc
auth/standard_role_manager.cc
auth/transitional.cc
bytes.cc
canonical_mutation.cc
cdc/cdc_partitioner.cc
cdc/generation.cc
cdc/log.cc
cdc/metadata.cc
cdc/split.cc
clocks-impl.cc
collection_mutation.cc
compress.cc
connection_notifier.cc
converting_mutation_partition_applier.cc
counters.cc
cql3/abstract_marker.cc
cql3/attributes.cc
cql3/cf_name.cc
cql3/column_condition.cc
cql3/column_identifier.cc
cql3/column_specification.cc
cql3/constants.cc
cql3/cql3_type.cc
cql3/expr/expression.cc
cql3/functions/aggregate_fcts.cc
cql3/functions/castas_fcts.cc
cql3/functions/error_injection_fcts.cc
cql3/functions/functions.cc
cql3/functions/user_function.cc
cql3/index_name.cc
cql3/keyspace_element_name.cc
cql3/lists.cc
cql3/maps.cc
cql3/operation.cc
cql3/query_options.cc
cql3/query_processor.cc
cql3/relation.cc
cql3/restrictions/statement_restrictions.cc
cql3/result_set.cc
cql3/role_name.cc
cql3/selection/abstract_function_selector.cc
cql3/selection/selectable.cc
cql3/selection/selection.cc
cql3/selection/selector.cc
cql3/selection/selector_factories.cc
cql3/selection/simple_selector.cc
cql3/sets.cc
cql3/single_column_relation.cc
cql3/statements/alter_keyspace_statement.cc
cql3/statements/alter_table_statement.cc
cql3/statements/alter_type_statement.cc
cql3/statements/alter_view_statement.cc
cql3/statements/authentication_statement.cc
cql3/statements/authorization_statement.cc
cql3/statements/batch_statement.cc
cql3/statements/cas_request.cc
cql3/statements/cf_prop_defs.cc
cql3/statements/cf_statement.cc
cql3/statements/create_function_statement.cc
cql3/statements/create_index_statement.cc
cql3/statements/create_keyspace_statement.cc
cql3/statements/create_table_statement.cc
cql3/statements/create_type_statement.cc
cql3/statements/create_view_statement.cc
cql3/statements/delete_statement.cc
cql3/statements/drop_function_statement.cc
cql3/statements/drop_index_statement.cc
cql3/statements/drop_keyspace_statement.cc
cql3/statements/drop_table_statement.cc
cql3/statements/drop_type_statement.cc
cql3/statements/drop_view_statement.cc
cql3/statements/function_statement.cc
cql3/statements/grant_statement.cc
cql3/statements/index_prop_defs.cc
cql3/statements/index_target.cc
cql3/statements/ks_prop_defs.cc
cql3/statements/list_permissions_statement.cc
cql3/statements/list_users_statement.cc
cql3/statements/modification_statement.cc
cql3/statements/permission_altering_statement.cc
cql3/statements/property_definitions.cc
cql3/statements/raw/parsed_statement.cc
cql3/statements/revoke_statement.cc
cql3/statements/role-management-statements.cc
cql3/statements/schema_altering_statement.cc
cql3/statements/select_statement.cc
cql3/statements/truncate_statement.cc
cql3/statements/update_statement.cc
cql3/statements/use_statement.cc
cql3/token_relation.cc
cql3/tuples.cc
cql3/type_json.cc
cql3/untyped_result_set.cc
cql3/update_parameters.cc
cql3/user_types.cc
cql3/ut_name.cc
cql3/util.cc
cql3/values.cc
cql3/variable_specifications.cc
data/cell.cc
database.cc
db/batchlog_manager.cc
db/commitlog/commitlog.cc
db/commitlog/commitlog_entry.cc
db/commitlog/commitlog_replayer.cc
db/config.cc
db/consistency_level.cc
db/cql_type_parser.cc
db/data_listeners.cc
db/extensions.cc
db/heat_load_balance.cc
db/hints/manager.cc
db/hints/resource_manager.cc
db/large_data_handler.cc
db/legacy_schema_migrator.cc
db/marshal/type_parser.cc
db/schema_tables.cc
db/size_estimates_virtual_reader.cc
db/snapshot-ctl.cc
db/sstables-format-selector.cc
db/system_distributed_keyspace.cc
db/system_keyspace.cc
db/view/row_locking.cc
db/view/view.cc
db/view/view_update_generator.cc
dht/boot_strapper.cc
dht/i_partitioner.cc
dht/murmur3_partitioner.cc
dht/range_streamer.cc
dht/token.cc
distributed_loader.cc
duration.cc
exceptions/exceptions.cc
flat_mutation_reader.cc
frozen_mutation.cc
frozen_schema.cc
gms/application_state.cc
gms/endpoint_state.cc
gms/failure_detector.cc
gms/feature_service.cc
gms/gossip_digest_ack.cc
gms/gossip_digest_ack2.cc
gms/gossip_digest_syn.cc
gms/gossiper.cc
gms/inet_address.cc
gms/version_generator.cc
gms/versioned_value.cc
hashers.cc
index/secondary_index.cc
index/secondary_index_manager.cc
init.cc
keys.cc
lister.cc
locator/abstract_replication_strategy.cc
locator/ec2_multi_region_snitch.cc
locator/ec2_snitch.cc
locator/everywhere_replication_strategy.cc
locator/gce_snitch.cc
locator/gossiping_property_file_snitch.cc
locator/local_strategy.cc
locator/network_topology_strategy.cc
locator/production_snitch_base.cc
locator/rack_inferring_snitch.cc
locator/simple_snitch.cc
locator/simple_strategy.cc
locator/snitch_base.cc
locator/token_metadata.cc
lua.cc
main.cc
memtable.cc
message/messaging_service.cc
multishard_mutation_query.cc
mutation.cc
raft/fsm.cc
raft/log.cc
raft/progress.cc
raft/raft.cc
raft/server.cc
mutation_fragment.cc
mutation_partition.cc
mutation_partition_serializer.cc
mutation_partition_view.cc
mutation_query.cc
mutation_reader.cc
mutation_writer/multishard_writer.cc
mutation_writer/shard_based_splitting_writer.cc
mutation_writer/timestamp_based_splitting_writer.cc
partition_slice_builder.cc
partition_version.cc
querier.cc
query-result-set.cc
query.cc
range_tombstone.cc
range_tombstone_list.cc
reader_concurrency_semaphore.cc
redis/abstract_command.cc
redis/command_factory.cc
redis/commands.cc
redis/keyspace_utils.cc
redis/lolwut.cc
redis/mutation_utils.cc
redis/options.cc
redis/query_processor.cc
redis/query_utils.cc
redis/server.cc
redis/service.cc
redis/stats.cc
repair/repair.cc
repair/row_level.cc
row_cache.cc
schema.cc
schema_mutations.cc
schema_registry.cc
service/client_state.cc
service/migration_manager.cc
service/migration_task.cc
service/misc_services.cc
service/pager/paging_state.cc
service/pager/query_pagers.cc
service/paxos/paxos_state.cc
service/paxos/prepare_response.cc
service/paxos/prepare_summary.cc
service/paxos/proposal.cc
service/priority_manager.cc
service/storage_proxy.cc
service/storage_service.cc
sstables/compaction.cc
sstables/compaction_manager.cc
sstables/compaction_strategy.cc
sstables/compress.cc
sstables/integrity_checked_file_impl.cc
sstables/kl/writer.cc
sstables/leveled_compaction_strategy.cc
sstables/m_format_read_helpers.cc
sstables/metadata_collector.cc
sstables/mp_row_consumer.cc
sstables/mx/writer.cc
sstables/partition.cc
sstables/prepended_input_stream.cc
sstables/random_access_reader.cc
sstables/size_tiered_compaction_strategy.cc
sstables/sstable_directory.cc
sstables/sstable_version.cc
sstables/sstables.cc
sstables/sstables_manager.cc
sstables/time_window_compaction_strategy.cc
sstables/writer.cc
streaming/progress_info.cc
streaming/session_info.cc
streaming/stream_coordinator.cc
streaming/stream_manager.cc
streaming/stream_plan.cc
streaming/stream_reason.cc
streaming/stream_receive_task.cc
streaming/stream_request.cc
streaming/stream_result_future.cc
streaming/stream_session.cc
streaming/stream_session_state.cc
streaming/stream_summary.cc
streaming/stream_task.cc
streaming/stream_transfer_task.cc
table.cc
table_helper.cc
thrift/controller.cc
thrift/handler.cc
thrift/server.cc
thrift/thrift_validation.cc
timeout_config.cc
tracing/trace_keyspace_helper.cc
tracing/trace_state.cc
tracing/traced_file.cc
tracing/tracing.cc
tracing/tracing_backend_registry.cc
transport/controller.cc
transport/cql_protocol_extension.cc
transport/event.cc
transport/event_notifier.cc
transport/messages/result_message.cc
transport/server.cc
types.cc
unimplemented.cc
utils/UUID_gen.cc
utils/arch/powerpc/crc32-vpmsum/crc32_wrapper.cc
utils/array-search.cc
utils/ascii.cc
utils/big_decimal.cc
utils/bloom_calculations.cc
utils/bloom_filter.cc
utils/buffer_input_stream.cc
utils/build_id.cc
utils/config_file.cc
utils/directories.cc
utils/disk-error-handler.cc
utils/dynamic_bitset.cc
utils/error_injection.cc
utils/exceptions.cc
utils/file_lock.cc
utils/generation-number.cc
utils/gz/crc_combine.cc
utils/human_readable.cc
utils/i_filter.cc
utils/large_bitset.cc
utils/like_matcher.cc
utils/limiting_data_source.cc
utils/logalloc.cc
utils/managed_bytes.cc
utils/multiprecision_int.cc
utils/murmur_hash.cc
utils/rate_limiter.cc
utils/rjson.cc
utils/runtime.cc
utils/updateable_value.cc
utils/utf8.cc
utils/uuid.cc
validation.cc
vint-serialization.cc
zstd.cc
release.cc)
set(scylla_gen_sources
"${scylla_thrift_gen_cassandra_files}"
"${scylla_ragel_gen_protocol_parser_file}"
"${swagger_gen_files}"
"${idl_gen_files}"
"${antlr3_gen_files}")
set(SCYLLA_SOURCE_FILES
${SCYLLA_ROOT_SOURCE_FILES}
${SCYLLA_GEN_SOURCE_FILES}
${SCYLLA_SUB_SOURCE_FILES})
add_executable(scylla
${scylla_sources}
${scylla_gen_sources})
${SEASTAR_SOURCE_FILES}
${SCYLLA_SOURCE_FILES})
target_link_libraries(scylla PRIVATE
seastar
# Boost dependencies
Boost::filesystem
Boost::program_options
Boost::system
Boost::thread
Boost::regex
Boost::headers
# Abseil libs
absl::hashtablez_sampler
absl::raw_hash_set
absl::synchronization
absl::graphcycles_internal
absl::stacktrace
absl::symbolize
absl::debugging_internal
absl::demangle_internal
absl::time
absl::time_zone
absl::int128
absl::city
absl::hash
absl::malloc_internal
absl::spinlock_wait
absl::base
absl::dynamic_annotations
absl::raw_logging_internal
absl::exponential_biased
absl::throw_delegate
# System libs
ZLIB::ZLIB
ICU::uc
systemd
zstd
snappy
${LUA_LIBRARIES}
thrift
crypt)
# If the Seastar pkg-config information is available, append to the default flags.
#
# For ease of browsing the source code, we always pretend that DPDK is enabled.
target_compile_options(scylla PUBLIC
-std=gnu++20
-DHAVE_DPDK
-DHAVE_HWLOC
"${SEASTAR_CFLAGS}")
target_link_libraries(scylla PRIVATE
-Wl,--build-id=sha1 # Force SHA1 build-id generation
# TODO: Use lld linker if it's available, otherwise gold, else bfd
-fuse-ld=lld)
# TODO: patch dynamic linker to match configure.py behavior
target_compile_options(scylla PRIVATE
-std=gnu++20
-fcoroutines # TODO: Clang does not have this flag, adjust to both variants
${target_arch_flag})
# Hacks needed to expose internal APIs for xxhash dependencies
target_compile_definitions(scylla PRIVATE XXH_PRIVATE_API HAVE_LZ4_COMPRESS_DEFAULT)
target_include_directories(scylla PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}"
libdeflate
abseil
"${scylla_gen_build_dir}")
###
### Create crc_combine_table helper executable.
### Use it to generate crc_combine_table.cc to be used in scylla at build time.
###
add_executable(crc_combine_table utils/gz/gen_crc_combine_table.cc)
target_link_libraries(crc_combine_table PRIVATE seastar)
target_include_directories(crc_combine_table PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")
target_compile_options(crc_combine_table PRIVATE
-std=gnu++20
-fcoroutines
${target_arch_flag})
add_dependencies(scylla crc_combine_table)
# Generate an additional source file at build time that is needed for Scylla compilation
add_custom_command(OUTPUT "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"
COMMAND $<TARGET_FILE:crc_combine_table> > "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"
DEPENDS crc_combine_table)
target_sources(scylla PRIVATE "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc")
###
### Generate version file and supply appropriate compile definitions for release.cc
###
execute_process(COMMAND ${CMAKE_SOURCE_DIR}/SCYLLA-VERSION-GEN RESULT_VARIABLE scylla_version_gen_res)
if(scylla_version_gen_res)
message(SEND_ERROR "Version file generation failed. Return code: ${scylla_version_gen_res}")
endif()
file(READ build/SCYLLA-VERSION-FILE scylla_version)
string(STRIP "${scylla_version}" scylla_version)
file(READ build/SCYLLA-RELEASE-FILE scylla_release)
string(STRIP "${scylla_release}" scylla_release)
get_property(release_cdefs SOURCE "${CMAKE_SOURCE_DIR}/release.cc" PROPERTY COMPILE_DEFINITIONS)
list(APPEND release_cdefs "SCYLLA_VERSION=\"${scylla_version}\"" "SCYLLA_RELEASE=\"${scylla_release}\"")
set_source_files_properties("${CMAKE_SOURCE_DIR}/release.cc" PROPERTIES COMPILE_DEFINITIONS "${release_cdefs}")
###
### Custom command for building libdeflate. Link the library to scylla.
###
set(libdeflate_lib "${scylla_build_dir}/libdeflate/libdeflate.a")
add_custom_command(OUTPUT "${libdeflate_lib}"
COMMAND make -C libdeflate
BUILD_DIR=../build/${BUILD_TYPE}/libdeflate/
CC=${CMAKE_C_COMPILER}
"CFLAGS=${target_arch_flag}"
../build/${BUILD_TYPE}/libdeflate//libdeflate.a) # Two backslashes are important!
# Hack to force generating custom command to produce libdeflate.a
add_custom_target(libdeflate DEPENDS "${libdeflate_lib}")
target_link_libraries(scylla PRIVATE "${libdeflate_lib}")
# TODO: create cmake/ directory and move utilities (generate functions etc) there
# TODO: Build tests if BUILD_TESTING=on (using CTest module)
# The order matters here: prefer the "static" DPDK directories to any dynamic paths from pkg-config. Some files are only
# available dynamically, though.
target_include_directories(scylla PUBLIC
.
${SEASTAR_DPDK_INCLUDE_DIRS}
${SEASTAR_INCLUDE_DIRS}
${Boost_INCLUDE_DIRS}
xxhash
libdeflate
build/${BUILD_TYPE}/gen)

114
MAINTAINERS Normal file
View File

@@ -0,0 +1,114 @@
M: Maintainer with commit access
R: Reviewer with subsystem expertise
F: Filename, directory, or pattern for the subsystem
---
AUTH
R: Calle Wilund <calle@scylladb.com>
R: Vlad Zolotarov <vladz@scylladb.com>
R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
F: auth/*
CACHE
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Piotr Jastrzebski <piotr@scylladb.com>
F: row_cache*
F: *mutation*
F: tests/mvcc*
COMMITLOG / BATCHLOGa
R: Calle Wilund <calle@scylladb.com>
F: db/commitlog/*
F: db/batch*
COORDINATOR
R: Gleb Natapov <gleb@scylladb.com>
F: service/storage_proxy*
COMPACTION
R: Raphael S. Carvalho <raphaelsc@scylladb.com>
R: Glauber Costa <glauber@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: sstables/compaction*
CQL TRANSPORT LAYER
M: Pekka Enberg <penberg@scylladb.com>
F: transport/*
CQL QUERY LANGUAGE
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
F: cql3/*
COUNTERS
F: counters*
F: tests/counter_test*
GOSSIP
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Asias He <asias@scylladb.com>
F: gms/*
DOCKER
M: Pekka Enberg <penberg@scylladb.com>
F: dist/docker/*
LSA
M: Tomasz Grabiec <tgrabiec@scylladb.com>
F: utils/logalloc*
MATERIALIZED VIEWS
M: Pekka Enberg <penberg@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
F: db/view/*
F: cql3/statements/*view*
PACKAGING
R: Takuya ASADA <syuu@scylladb.com>
F: dist/*
REPAIR
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Asias He <asias@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: repair/*
SCHEMA MANAGEMENT
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
F: db/schema_tables*
F: db/legacy_schema_migrator*
F: service/migration*
F: schema*
SECONDARY INDEXES
M: Pekka Enberg <penberg@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
R: Pekka Enberg <penberg@scylladb.com>
F: db/index/*
F: cql3/statements/*index*
SSTABLES
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Raphael S. Carvalho <raphaelsc@scylladb.com>
R: Glauber Costa <glauber@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: sstables/*
STREAMING
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Asias He <asias@scylladb.com>
F: streaming/*
F: service/storage_service.*
ALTERNATOR
M: Nadav Har'El <nyh@scylladb.com>
F: alternator/*
F: alternator-test/*
THE REST
M: Avi Kivity <avi@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
F: *

102
README.md
View File

@@ -1,66 +1,43 @@
# Scylla
[![Slack](https://img.shields.io/badge/slack-scylla-brightgreen.svg?logo=slack)](http://slack.scylladb.com)
[![Twitter](https://img.shields.io/twitter/follow/ScyllaDB.svg?style=social&label=Follow)](https://twitter.com/intent/follow?screen_name=ScyllaDB)
## What is Scylla?
Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB.
Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.
For more information, please see the [ScyllaDB web site].
[ScyllaDB web site]: https://www.scylladb.com
## Build Prerequisites
## Quick-start
Scylla is fairly fussy about its build environment, requiring very recent
versions of the C++20 compiler and of many libraries to build. The document
[HACKING.md](HACKING.md) includes detailed information on building and
developing Scylla, but to get Scylla building quickly on (almost) any build
machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md),
machine, Scylla offers offers a [frozen toolchain](tools/toolchain/README.md),
This is a pre-configured Docker image which includes recent versions of all
the required compilers, libraries and build tools. Using the frozen toolchain
allows you to avoid changing anything in your build machine to meet Scylla's
requirements - you just need to meet the frozen toolchain's prerequisites
(mostly, Docker or Podman being available).
## Building Scylla
Building Scylla with the frozen toolchain `dbuild` is as easy as:
Building and running Scylla with the frozen toolchain is as easy as:
```bash
$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
```
For further information, please see:
* [Developer documentation] for more information on building Scylla.
* [Build documentation] on how to build Scylla binaries, tests, and packages.
* [Docker image build documentation] for information on how to build Docker images.
[developer documentation]: HACKING.md
[build documentation]: docs/building.md
[docker image build documentation]: dist/docker/redhat/README.md
## Running Scylla
To start Scylla server, run:
* Run Scylla
```
./build/release/scylla
```bash
$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1
```
This will start a Scylla node with one CPU core allocated to it and data files stored in the `tmp` directory.
The `--developer-mode` is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations).
Please note that you need to run Scylla with `dbuild` if you built it with the frozen toolchain.
* run Scylla with one CPU and ./tmp as work directory
For more run options, run:
```
./build/release/scylla --workdir tmp --smp 1
```
```bash
$ ./tools/toolchain/dbuild ./build/release/scylla --help
* For more run options:
```
./build/release/scylla --help
```
## Testing
@@ -69,10 +46,10 @@ See [test.py manual](docs/testing.md).
## Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
Thrift. There is also support for the API of Amazon DynamoDB,
which needs to be enabled and configured in order to be used. For more
information on how to enable the DynamoDB™ API in Scylla,
and the current compatibility of this feature as well as Scylla-specific extensions, see
Thrift. There is also experimental support for the API of Amazon DynamoDB,
but being experimental it needs to be explicitly enabled to be used. For more
information on how to enable the experimental DynamoDB compatibility in Scylla,
and the current limitations of this feature, see
[Alternator](docs/alternator/alternator.md) and
[Getting started with Alternator](docs/alternator/getting-started.md).
@@ -92,22 +69,27 @@ The courses are free, self-paced and include hands-on examples. They cover a var
administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions,
multi-datacenters and how Scylla integrates with third-party applications.
## Building a CentOS-based Docker image
Build a Docker image with:
```
cd dist/docker/redhat
docker build -t <image-name> .
```
This build is based on executables downloaded from downloads.scylladb.com,
**not** on the executables built in this source directory. See further
instructions in dist/docker/redhat/README.md to build a docker image from
your own executables.
Run the image with:
```
docker run -p $(hostname -i):9042:9042 -i -t <image name>
```
## Contributing to Scylla
If you want to report a bug or submit a pull request or a patch, please read the [contribution guidelines].
If you are a developer working on Scylla, please read the [developer guidelines].
[contribution guidelines]: CONTRIBUTING.md
[developer guidelines]: HACKING.md
## Contact
* The [users mailing list] and [Slack channel] are for users to discuss configuration, management, and operations of the ScyllaDB open source.
* The [developers mailing list] is for developers and people interested in following the development of ScyllaDB to discuss technical topics.
[Users mailing list]: https://groups.google.com/forum/#!forum/scylladb-users
[Slack channel]: http://slack.scylladb.com/
[Developers mailing list]: https://groups.google.com/forum/#!forum/scylladb-dev
[Hacking howto](HACKING.md)
[Guidelines for contributing](CONTRIBUTING.md)

View File

@@ -1,7 +1,7 @@
#!/bin/sh
PRODUCT=scylla
VERSION=4.4.dev
VERSION=4.2.4
if test -f version
then

2
abseil

Submodule abseil updated: 1e3d25b265...2069dc796a

View File

@@ -78,12 +78,12 @@ void check_expiry(std::string_view signature_date) {
std::string expiration_str = format_time_point(db_clock::now() - 15min);
std::string validity_str = format_time_point(db_clock::now() + 15min);
if (signature_date < expiration_str) {
throw api_error::invalid_signature(
throw api_error("InvalidSignatureException",
fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",
signature_date, expiration_str));
}
if (signature_date > validity_str) {
throw api_error::invalid_signature(
throw api_error("InvalidSignatureException",
fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",
signature_date, validity_str));
}
@@ -94,13 +94,13 @@ std::string get_signature(std::string_view access_key_id, std::string_view secre
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {
auto amz_date_it = signed_headers_map.find("x-amz-date");
if (amz_date_it == signed_headers_map.end()) {
throw api_error::invalid_signature("X-Amz-Date header is mandatory for signature verification");
throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");
}
std::string_view amz_date = amz_date_it->second;
check_expiry(amz_date);
std::string_view datestamp = amz_date.substr(0, 8);
if (datestamp != orig_datestamp) {
throw api_error::invalid_signature(
throw api_error("InvalidSignatureException",
format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",
orig_datestamp, datestamp));
}
@@ -126,18 +126,19 @@ std::string get_signature(std::string_view access_key_id, std::string_view secre
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {
static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",
auth::meta::roles_table::qualified_name, auth::meta::roles_table::role_col_name);
auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);
auto cl = auth::password_authenticator::consistency_for_user(username);
return qp.execute_internal(query, cl, auth::internal_distributed_query_state(), {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
auto& timeout = auth::internal_distributed_timeout_config();
return qp.execute_internal(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
auto res = f.get0();
auto salted_hash = std::optional<sstring>();
if (res->empty()) {
throw api_error::unrecognized_client(fmt::format("User not found: {}", username));
throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));
}
salted_hash = res->one().get_opt<sstring>("salted_hash");
if (!salted_hash) {
throw api_error::unrecognized_client(fmt::format("No password found for user: {}", username));
throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));
}
return make_ready_future<std::string>(*salted_hash);
});

View File

@@ -32,13 +32,13 @@
// and the character used in base64 encoding to represent it.
static class base64_chars {
public:
static constexpr const char to[] =
static constexpr const char* to =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
int8_t from[255];
base64_chars() {
static_assert(sizeof(to) == 64 + 1);
static_assert(strlen(to) == 64);
for (int i = 0; i < 255; i++) {
from[i] = -1; // signal invalid character
from[i] = 255; // signal invalid character
}
for (int i = 0; i < 64; i++) {
from[(unsigned) to[i]] = i;

View File

@@ -23,7 +23,7 @@
#include <string_view>
#include "bytes.hh"
#include "utils/rjson.hh"
#include "rjson.hh"
std::string base64_encode(bytes_view);

View File

@@ -26,7 +26,7 @@
#include "alternator/error.hh"
#include "cql3/constants.hh"
#include <unordered_map>
#include "utils/rjson.hh"
#include "rjson.hh"
#include "serialization.hh"
#include "base64.hh"
#include <stdexcept>
@@ -57,12 +57,12 @@ comparison_operator_type get_comparison_operator(const rjson::value& comparison_
{"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
};
if (!comparison_operator.IsString()) {
throw api_error::validation(format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
std::string op = comparison_operator.GetString();
auto it = ops.find(op);
if (it == ops.end()) {
throw api_error::validation(format("Unsupported comparison operator {}", op));
throw api_error("ValidationException", format("Unsupported comparison operator {}", op));
}
return it->second;
}
@@ -104,10 +104,10 @@ static void verify_operand_count(const rjson::value* array, const size_check& ex
return;
}
if (!array || !array->IsArray()) {
throw api_error::validation("With ComparisonOperator, AttributeValueList must be given and an array");
throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");
}
if (!expected(array->Size())) {
throw api_error::validation(
throw api_error("ValidationException",
format("{} operator requires AttributeValueList {}, instead found list size {}",
op, expected.what(), array->Size()));
}
@@ -131,7 +131,7 @@ static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2
set1_raw.insert(&*it);
}
for (const auto& a : set2.GetArray()) {
if (!set1_raw.contains(&a)) {
if (set1_raw.count(&a) == 0) {
return false;
}
}
@@ -159,23 +159,40 @@ static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
}
// Check if two JSON-encoded values match with the BEGINS_WITH relation
static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
// BEGINS_WITH requires that its single operand (v2) be a string or
// binary - otherwise it's a validation error. However, problems with
// the stored attribute (v1) will just return false (no match).
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error::validation(format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
}
auto it2 = v2.MemberBegin();
if (it2->name != "S" && it2->name != "B") {
throw api_error::validation(format("BEGINS_WITH operator requires String or Binary type in AttributeValue, got {}", it2->name));
}
bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2,
bool v1_from_query, bool v2_from_query) {
bool bad = false;
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
if (v1_from_query) {
throw api_error("ValidationException", "begins_with() encountered malformed argument");
} else {
bad = true;
}
} else if (v1->MemberBegin()->name != "S" && v1->MemberBegin()->name != "B") {
if (v1_from_query) {
throw api_error("ValidationException", format("begins_with supports only string or binary type, got: {}", *v1));
} else {
bad = true;
}
}
if (!v2.IsObject() || v2.MemberCount() != 1) {
if (v2_from_query) {
throw api_error("ValidationException", "begins_with() encountered malformed argument");
} else {
bad = true;
}
} else if (v2.MemberBegin()->name != "S" && v2.MemberBegin()->name != "B") {
if (v2_from_query) {
throw api_error("ValidationException", format("begins_with() supports only string or binary type, got: {}", v2));
} else {
bad = true;
}
}
if (bad) {
return false;
}
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if (it1->name != it2->name) {
return false;
}
@@ -233,12 +250,12 @@ static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
static bool check_IN(const rjson::value* val, const rjson::value& array) {
if (!array[0].IsObject() || array[0].MemberCount() != 1) {
throw api_error::validation(
throw api_error("ValidationException",
format("IN operator encountered malformed AttributeValue: {}", array[0]));
}
const auto& type = array[0].MemberBegin()->name;
if (type != "S" && type != "N" && type != "B") {
throw api_error::validation(
throw api_error("ValidationException",
"IN operator requires AttributeValueList elements to be of type String, Number, or Binary ");
}
if (!val) {
@@ -247,7 +264,7 @@ static bool check_IN(const rjson::value* val, const rjson::value& array) {
bool have_match = false;
for (const auto& elem : array.GetArray()) {
if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {
throw api_error::validation(
throw api_error("ValidationException",
"IN operator requires all AttributeValueList elements to have the same type ");
}
if (!have_match && *val == elem) {
@@ -279,24 +296,38 @@ static bool check_NOT_NULL(const rjson::value* val) {
return val != nullptr;
}
// Only types S, N or B (string, number or bytes) may be compared by the
// various comparion operators - lt, le, gt, ge, and between.
static bool check_comparable_type(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return false;
}
const rjson::value& type = v.MemberBegin()->name;
return type == "S" || type == "N" || type == "B";
}
// Check if two JSON-encoded values match with cmp.
template <typename Comparator>
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error::validation(
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp,
bool v1_from_query, bool v2_from_query) {
bool bad = false;
if (!v1 || !check_comparable_type(*v1)) {
if (v1_from_query) {
throw api_error("ValidationException", format("{} allow only the types String, Number, or Binary", cmp.diagnostic));
}
bad = true;
}
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error::validation(
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
if (!check_comparable_type(v2)) {
if (v2_from_query) {
throw api_error("ValidationException", format("{} allow only the types String, Number, or Binary", cmp.diagnostic));
}
bad = true;
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
if (bad) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
const auto& kv2 = *v2.MemberBegin();
if (kv1.name != kv2.name) {
return false;
}
@@ -310,7 +341,8 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara
if (kv1.name == "B") {
return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
}
clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
// cannot reach here, as check_comparable_type() verifies the type is one
// of the above options.
return false;
}
@@ -341,56 +373,71 @@ struct cmp_gt {
static constexpr const char* diagnostic = "GT operator";
};
// True if v is between lb and ub, inclusive. Throws if lb > ub.
// True if v is between lb and ub, inclusive. Throws or returns false
// (depending on bounds_from_query parameter) if lb > ub.
template <typename T>
static bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
static bool check_BETWEEN(const T& v, const T& lb, const T& ub, bool bounds_from_query) {
if (cmp_lt()(ub, lb)) {
throw api_error::validation(
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
if (bounds_from_query) {
throw api_error("ValidationException",
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
} else {
return false;
}
}
return cmp_ge()(v, lb) && cmp_le()(v, ub);
}
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
if (!v) {
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub,
bool v_from_query, bool lb_from_query, bool ub_from_query) {
if ((v && v_from_query && !check_comparable_type(*v)) ||
(lb_from_query && !check_comparable_type(lb)) ||
(ub_from_query && !check_comparable_type(ub))) {
throw api_error("ValidationException", "between allow only the types String, Number, or Binary");
}
if (!v || !v->IsObject() || v->MemberCount() != 1 ||
!lb.IsObject() || lb.MemberCount() != 1 ||
!ub.IsObject() || ub.MemberCount() != 1) {
return false;
}
if (!v->IsObject() || v->MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
}
if (!lb.IsObject() || lb.MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
}
if (!ub.IsObject() || ub.MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
}
const auto& kv_v = *v->MemberBegin();
const auto& kv_lb = *lb.MemberBegin();
const auto& kv_ub = *ub.MemberBegin();
bool bounds_from_query = lb_from_query && ub_from_query;
if (kv_lb.name != kv_ub.name) {
throw api_error::validation(
if (bounds_from_query) {
throw api_error("ValidationException",
format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
kv_lb.name, kv_ub.name));
} else {
return false;
}
}
if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
return false;
}
if (kv_v.name == "N") {
const char* diag = "BETWEEN operator";
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag), bounds_from_query);
}
if (kv_v.name == "S") {
return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()),
bounds_from_query);
}
if (kv_v.name == "B") {
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value), bounds_from_query);
}
throw api_error::validation(
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
if (v_from_query) {
throw api_error("ValidationException",
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
kv_lb.name));
} else {
return false;
}
}
// Verify one Expect condition on one attribute (whose content is "got")
@@ -408,24 +455,24 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
// and requires a different combinations of parameters in the request
if (value) {
if (exists && (!exists->IsBool() || exists->GetBool() != true)) {
throw api_error::validation("Cannot combine Value with Exists!=true");
throw api_error("ValidationException", "Cannot combine Value with Exists!=true");
}
if (comparison_operator) {
throw api_error::validation("Cannot combine Value with ComparisonOperator");
throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");
}
return check_EQ(got, *value);
} else if (exists) {
if (comparison_operator) {
throw api_error::validation("Cannot combine Exists with ComparisonOperator");
throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");
}
if (!exists->IsBool() || exists->GetBool() != false) {
throw api_error::validation("Exists!=false requires Value");
throw api_error("ValidationException", "Exists!=false requires Value");
}
// Remember Exists=false, so we're checking that the attribute does *not* exist:
return !got;
} else {
if (!comparison_operator) {
throw api_error::validation("Missing ComparisonOperator, Value or Exists");
throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");
}
comparison_operator_type op = get_comparison_operator(*comparison_operator);
switch (op) {
@@ -437,19 +484,19 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
return check_NE(got, (*attribute_value_list)[0]);
case comparison_operator_type::LT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
return check_compare(got, (*attribute_value_list)[0], cmp_lt{}, false, true);
case comparison_operator_type::LE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_le{});
return check_compare(got, (*attribute_value_list)[0], cmp_le{}, false, true);
case comparison_operator_type::GT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
return check_compare(got, (*attribute_value_list)[0], cmp_gt{}, false, true);
case comparison_operator_type::GE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
return check_compare(got, (*attribute_value_list)[0], cmp_ge{}, false, true);
case comparison_operator_type::BEGINS_WITH:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
return check_BEGINS_WITH(got, (*attribute_value_list)[0], false, true);
case comparison_operator_type::IN:
verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
return check_IN(got, *attribute_value_list);
@@ -461,7 +508,8 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
return check_NOT_NULL(got);
case comparison_operator_type::BETWEEN:
verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1],
false, true, true);
case comparison_operator_type::CONTAINS:
{
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
@@ -470,7 +518,7 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
const rjson::value& arg = (*attribute_value_list)[0];
const auto& argtype = (*arg.MemberBegin()).name;
if (argtype != "S" && argtype != "N" && argtype != "B") {
throw api_error::validation(
throw api_error("ValidationException",
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", argtype));
}
@@ -484,7 +532,7 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
const rjson::value& arg = (*attribute_value_list)[0];
const auto& argtype = (*arg.MemberBegin()).name;
if (argtype != "S" && argtype != "N" && argtype != "B") {
throw api_error::validation(
throw api_error("ValidationException",
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", argtype));
}
@@ -501,7 +549,7 @@ conditional_operator_type get_conditional_operator(const rjson::value& req) {
return conditional_operator_type::MISSING;
}
if (!conditional_operator->IsString()) {
throw api_error::validation("'ConditionalOperator' parameter, if given, must be a string");
throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");
}
auto s = rjson::to_string_view(*conditional_operator);
if (s == "AND") {
@@ -509,7 +557,7 @@ conditional_operator_type get_conditional_operator(const rjson::value& req) {
} else if (s == "OR") {
return conditional_operator_type::OR;
} else {
throw api_error::validation(
throw api_error("ValidationException",
format("'ConditionalOperator' parameter must be AND, OR or missing. Found {}.", s));
}
}
@@ -524,13 +572,13 @@ bool verify_expected(const rjson::value& req, const rjson::value* previous_item)
auto conditional_operator = get_conditional_operator(req);
if (conditional_operator != conditional_operator_type::MISSING &&
(!expected || (expected->IsObject() && expected->GetObject().ObjectEmpty()))) {
throw api_error::validation("'ConditionalOperator' parameter cannot be specified for missing or empty Expression");
throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for missing or empty Expression");
}
if (!expected) {
return true;
}
if (!expected->IsObject()) {
throw api_error::validation("'Expected' parameter, if given, must be an object");
throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");
}
bool require_all = conditional_operator != conditional_operator_type::OR;
return verify_condition(*expected, require_all, previous_item);
@@ -573,7 +621,8 @@ static bool calculate_primitive_condition(const parsed::primitive_condition& con
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error(format("Wrong number of values {} in BETWEEN primitive_condition", cond._values.size()));
}
return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2]);
return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2],
cond._values[0].is_constant(), cond._values[1].is_constant(), cond._values[2].is_constant());
case parsed::primitive_condition::type::IN:
return check_IN(calculated_values);
case parsed::primitive_condition::type::VALUE:
@@ -588,7 +637,7 @@ static bool calculate_primitive_condition(const parsed::primitive_condition& con
return it->value.GetBool();
}
}
throw api_error::validation(
throw api_error("ValidationException",
format("ConditionExpression: condition results in a non-boolean value: {}",
calculated_values[0]));
default:
@@ -604,13 +653,17 @@ static bool calculate_primitive_condition(const parsed::primitive_condition& con
case parsed::primitive_condition::type::NE:
return check_NE(&calculated_values[0], calculated_values[1]);
case parsed::primitive_condition::type::GT:
return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{});
return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{},
cond._values[0].is_constant(), cond._values[1].is_constant());
case parsed::primitive_condition::type::GE:
return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{});
return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{},
cond._values[0].is_constant(), cond._values[1].is_constant());
case parsed::primitive_condition::type::LT:
return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{});
return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{},
cond._values[0].is_constant(), cond._values[1].is_constant());
case parsed::primitive_condition::type::LE:
return check_compare(&calculated_values[0], calculated_values[1], cmp_le{});
return check_compare(&calculated_values[0], calculated_values[1], cmp_le{},
cond._values[0].is_constant(), cond._values[1].is_constant());
default:
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error(format("Unknown type {} in primitive_condition object", (int)(cond._op)));

View File

@@ -52,6 +52,7 @@ bool verify_expected(const rjson::value& req, const rjson::value* previous_item)
bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item);
bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);
bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query);
bool verify_condition_expression(
const parsed::condition_expression& condition_expression,

View File

@@ -26,15 +26,12 @@
namespace alternator {
// api_error contains a DynamoDB error message to be returned to the user.
// It can be returned by value (see executor::request_return_type) or thrown.
// The DynamoDB's error messages are described in detail in
// DynamoDB's error messages are described in detail in
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// An error message has an HTTP code (almost always 400), a type, e.g.,
// "ResourceNotFoundException", and a human readable message.
// Eventually alternator::api_handler will convert a returned or thrown
// api_error into a JSON object, and that is returned to the user.
class api_error final {
// Ah An error message has a "type", e.g., "ResourceNotFoundException", a coarser
// HTTP code (almost always, 400), and a human readable message. Eventually these
// will be wrapped into a JSON object returned to the client.
class api_error : public std::exception {
public:
using status_type = httpd::reply::status_type;
status_type _http_code;
@@ -45,41 +42,8 @@ public:
, _type(std::move(type))
, _msg(std::move(msg))
{ }
// Factory functions for some common types of DynamoDB API errors
static api_error validation(std::string msg) {
return api_error("ValidationException", std::move(msg));
}
static api_error resource_not_found(std::string msg) {
return api_error("ResourceNotFoundException", std::move(msg));
}
static api_error resource_in_use(std::string msg) {
return api_error("ResourceInUseException", std::move(msg));
}
static api_error invalid_signature(std::string msg) {
return api_error("InvalidSignatureException", std::move(msg));
}
static api_error unrecognized_client(std::string msg) {
return api_error("UnrecognizedClientException", std::move(msg));
}
static api_error unknown_operation(std::string msg) {
return api_error("UnknownOperationException", std::move(msg));
}
static api_error access_denied(std::string msg) {
return api_error("AccessDeniedException", std::move(msg));
}
static api_error conditional_check_failed(std::string msg) {
return api_error("ConditionalCheckFailedException", std::move(msg));
}
static api_error expired_iterator(std::string msg) {
return api_error("ExpiredIteratorException", std::move(msg));
}
static api_error trimmed_data_access_exception(std::string msg) {
return api_error("TrimmedDataAccessException", std::move(msg));
}
static api_error internal(std::string msg) {
return api_error("InternalServerError", std::move(msg), reply::status_type::internal_server_error);
}
api_error() = default;
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}

File diff suppressed because it is too large Load Diff

View File

@@ -30,51 +30,16 @@
#include "service/storage_proxy.hh"
#include "service/migration_manager.hh"
#include "service/client_state.hh"
#include "db/timeout_clock.hh"
#include "alternator/error.hh"
#include "stats.hh"
#include "utils/rjson.hh"
namespace db {
class system_distributed_keyspace;
}
namespace query {
class partition_slice;
class result;
}
namespace cql3::selection {
class selection;
}
namespace service {
class storage_service;
}
#include "rjson.hh"
namespace alternator {
class rmw_operation;
struct make_jsonable : public json::jsonable {
rjson::value _value;
public:
explicit make_jsonable(rjson::value&& value);
std::string to_json() const override;
};
struct json_string : public json::jsonable {
std::string _value;
public:
explicit json_string(std::string&& value);
std::string to_json() const override;
};
class executor : public peering_sharded_service<executor> {
service::storage_proxy& _proxy;
service::migration_manager& _mm;
db::system_distributed_keyspace& _sdks;
service::storage_service& _ss;
// An smp_service_group to be used for limiting the concurrency when
// forwarding Alternator request between shards - if necessary for LWT.
smp_service_group _ssg;
@@ -87,13 +52,12 @@ public:
static constexpr auto KEYSPACE_NAME_PREFIX = "alternator_";
static constexpr std::string_view INTERNAL_TABLE_PREFIX = ".scylla.alternator.";
executor(service::storage_proxy& proxy, service::migration_manager& mm, db::system_distributed_keyspace& sdks, service::storage_service& ss, smp_service_group ssg)
: _proxy(proxy), _mm(mm), _sdks(sdks), _ss(ss), _ssg(ssg) {}
executor(service::storage_proxy& proxy, service::migration_manager& mm, smp_service_group ssg)
: _proxy(proxy), _mm(mm), _ssg(ssg) {}
future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> update_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
@@ -107,10 +71,6 @@ public:
future<request_return_type> tag_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> untag_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> list_tags_of_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> list_streams(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> describe_stream(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> get_shard_iterator(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> get_records(client_state& client_state, tracing::trace_state_ptr, service_permit permit, rjson::value request);
future<> start();
future<> stop() { return make_ready_future<>(); }
@@ -118,37 +78,6 @@ public:
future<> create_keyspace(std::string_view keyspace_name);
static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
static sstring table_name(const schema&);
static db::timeout_clock::time_point default_timeout();
static schema_ptr find_table(service::storage_proxy&, const rjson::value& request);
private:
friend class rmw_operation;
static bool is_alternator_keyspace(const sstring& ks_name);
static sstring make_keyspace_name(const sstring& table_name);
static void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string,std::string> * = nullptr);
static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>&);
public:
static std::optional<rjson::value> describe_single_item(schema_ptr,
const query::partition_slice&,
const cql3::selection::selection&,
const query::result&,
const std::unordered_set<std::string>&);
static void describe_single_item(const cql3::selection::selection&,
const std::vector<bytes_opt>&,
const std::unordered_set<std::string>&,
rjson::value&,
bool = false);
void add_stream_options(const rjson::value& stream_spec, schema_builder&) const;
void supplement_table_info(rjson::value& descr, const schema& schema) const;
void supplement_table_stream_info(rjson::value& descr, const schema& schema) const;
};
}

View File

@@ -157,12 +157,12 @@ static void resolve_path(parsed::path& p,
const std::string& column_name = p.root();
if (column_name.size() > 0 && column_name.front() == '#') {
if (!expression_attribute_names) {
throw api_error::validation(
throw api_error("ValidationException",
format("ExpressionAttributeNames missing, entry '{}' required by expression", column_name));
}
const rjson::value* value = rjson::find(*expression_attribute_names, column_name);
if (!value || !value->IsString()) {
throw api_error::validation(
throw api_error("ValidationException",
format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));
}
used_attribute_names.emplace(column_name);
@@ -176,16 +176,16 @@ static void resolve_constant(parsed::constant& c,
std::visit(overloaded_functor {
[&] (const std::string& valref) {
if (!expression_attribute_values) {
throw api_error::validation(
throw api_error("ValidationException",
format("ExpressionAttributeValues missing, entry '{}' required by expression", valref));
}
const rjson::value* value = rjson::find(*expression_attribute_values, valref);
if (!value) {
throw api_error::validation(
throw api_error("ValidationException",
format("ExpressionAttributeValues missing entry '{}' required by expression", valref));
}
if (value->IsNull()) {
throw api_error::validation(
throw api_error("ValidationException",
format("ExpressionAttributeValues null value for entry '{}' required by expression", valref));
}
validate_value(*value, "ExpressionAttributeValues");
@@ -392,7 +392,7 @@ static rjson::value list_concatenate(const rjson::value& v1, const rjson::value&
const rjson::value* list1 = unwrap_list(v1);
const rjson::value* list2 = unwrap_list(v2);
if (!list1 || !list2) {
throw api_error::validation("UpdateExpression: list_append() given a non-list");
throw api_error("ValidationException", "UpdateExpression: list_append() given a non-list");
}
rjson::value cat = rjson::copy(*list1);
for (const auto& a : list2->GetArray()) {
@@ -413,28 +413,28 @@ static rjson::value calculate_size(const rjson::value& v) {
// must come from the request itself, not from the database, so it makes
// sense to throw a ValidationException if we see such a problem.
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error::validation(format("invalid object: {}", v));
throw api_error("ValidationException", format("invalid object: {}", v));
}
auto it = v.MemberBegin();
int ret;
if (it->name == "S") {
if (!it->value.IsString()) {
throw api_error::validation(format("invalid string: {}", v));
throw api_error("ValidationException", format("invalid string: {}", v));
}
ret = it->value.GetStringLength();
} else if (it->name == "NS" || it->name == "SS" || it->name == "BS" || it->name == "L") {
if (!it->value.IsArray()) {
throw api_error::validation(format("invalid set: {}", v));
throw api_error("ValidationException", format("invalid set: {}", v));
}
ret = it->value.Size();
} else if (it->name == "M") {
if (!it->value.IsObject()) {
throw api_error::validation(format("invalid map: {}", v));
throw api_error("ValidationException", format("invalid map: {}", v));
}
ret = it->value.MemberCount();
} else if (it->name == "B") {
if (!it->value.IsString()) {
throw api_error::validation(format("invalid byte string: {}", v));
throw api_error("ValidationException", format("invalid byte string: {}", v));
}
ret = base64_decoded_len(rjson::to_string_view(it->value));
} else {
@@ -478,11 +478,11 @@ static const
std::unordered_map<std::string_view, function_handler_type*> function_handlers {
{"list_append", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::UpdateExpression) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: list_append() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: list_append() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
@@ -492,15 +492,15 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
},
{"if_not_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::UpdateExpression) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: if_not_exists() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: if_not_exists() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: if_not_exists() must include path as its first argument", caller));
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
@@ -510,11 +510,11 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
},
{"size", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpression) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: size() not allowed here", caller));
}
if (f._parameters.size() != 1) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: size() accepts 1 parameter, got {}", caller, f._parameters.size()));
}
rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
@@ -523,15 +523,15 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
},
{"attribute_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_exists() not allowed here", caller));
}
if (f._parameters.size() != 1) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_exists() accepts 1 parameter, got {}", caller, f._parameters.size()));
}
if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_exists()'s parameter must be a path", caller));
}
rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
@@ -540,15 +540,15 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
},
{"attribute_not_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_not_exists() not allowed here", caller));
}
if (f._parameters.size() != 1) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_not_exists() accepts 1 parameter, got {}", caller, f._parameters.size()));
}
if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_not_exists()'s parameter must be a path", caller));
}
rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
@@ -557,18 +557,18 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
},
{"attribute_type", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_type() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_type() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
// There is no real reason for the following check (not
// allowing the type to come from a document attribute), but
// DynamoDB does this check, so we do too...
if (!f._parameters[1].is_constant()) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_types()'s first parameter must be an expression attribute", caller));
}
rjson::value v0 = calculate_value(f._parameters[0], caller, previous_item);
@@ -577,7 +577,7 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
// If the type parameter is not one of the legal types
// we should generate an error, not a failed condition:
if (!known_type(rjson::to_string_view(v1.MemberBegin()->value))) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_types()'s second parameter, {}, is not a known type",
caller, v1.MemberBegin()->value));
}
@@ -587,77 +587,33 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
return to_bool_json(false);
}
} else {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: attribute_type() second parameter must refer to a string, got {}", caller, v1));
}
}
},
{"begins_with", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: begins_with() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: begins_with() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
// TODO: There's duplication here with check_BEGINS_WITH().
// But unfortunately, the two functions differ a bit.
// If one of v1 or v2 is malformed or has an unsupported type
// (not B or S), what we do depends on whether it came from
// the user's query (is_constant()), or the item. Unsupported
// values in the query result in an error, but if they are in
// the item, we silently return false (no match).
bool bad = false;
if (!v1.IsObject() || v1.MemberCount() != 1) {
bad = true;
if (f._parameters[0].is_constant()) {
throw api_error::validation(format("{}: begins_with() encountered malformed AttributeValue: {}", caller, v1));
}
} else if (v1.MemberBegin()->name != "S" && v1.MemberBegin()->name != "B") {
bad = true;
if (f._parameters[0].is_constant()) {
throw api_error::validation(format("{}: begins_with() supports only string or binary in AttributeValue: {}", caller, v1));
}
}
if (!v2.IsObject() || v2.MemberCount() != 1) {
bad = true;
if (f._parameters[1].is_constant()) {
throw api_error::validation(format("{}: begins_with() encountered malformed AttributeValue: {}", caller, v2));
}
} else if (v2.MemberBegin()->name != "S" && v2.MemberBegin()->name != "B") {
bad = true;
if (f._parameters[1].is_constant()) {
throw api_error::validation(format("{}: begins_with() supports only string or binary in AttributeValue: {}", caller, v2));
}
}
bool ret = false;
if (!bad) {
auto it1 = v1.MemberBegin();
auto it2 = v2.MemberBegin();
if (it1->name == it2->name) {
if (it2->name == "S") {
std::string_view val1 = rjson::to_string_view(it1->value);
std::string_view val2 = rjson::to_string_view(it2->value);
ret = val1.starts_with(val2);
} else /* it2->name == "B" */ {
ret = base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));
}
}
}
return to_bool_json(ret);
return to_bool_json(check_BEGINS_WITH(v1.IsNull() ? nullptr : &v1, v2,
f._parameters[0].is_constant(), f._parameters[1].is_constant()));
}
},
{"contains", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: contains() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
throw api_error("ValidationException",
format("{}: contains() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
@@ -683,7 +639,7 @@ rjson::value calculate_value(const parsed::value& v,
[&] (const parsed::value::function_call& f) -> rjson::value {
auto function_it = function_handlers.find(std::string_view(f._function_name));
if (function_it == function_handlers.end()) {
throw api_error::validation(
throw api_error("ValidationException",
format("UpdateExpression: unknown function '{}' called.", f._function_name));
}
return function_it->second(caller, previous_item, f);
@@ -695,7 +651,7 @@ rjson::value calculate_value(const parsed::value& v,
std::string update_path = p.root();
if (p.has_operators()) {
// FIXME: support this
throw api_error::validation("Reading attribute paths not yet implemented");
throw api_error("ValidationException", "Reading attribute paths not yet implemented");
}
const rjson::value* previous_value = rjson::find(*previous_item, update_path);
return previous_value ? rjson::copy(*previous_value) : rjson::null_value();
@@ -707,7 +663,7 @@ rjson::value calculate_value(const parsed::value& v,
// either a single value, or v1+v2 or v1-v2.
rjson::value calculate_value(const parsed::set_rhs& rhs,
const rjson::value* previous_item) {
switch (rhs._op) {
switch(rhs._op) {
case 'v':
return calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);
case '+': {

View File

@@ -30,7 +30,7 @@
#include <seastar/util/noncopyable_function.hh>
#include "expressions_types.hh"
#include "utils/rjson.hh"
#include "rjson.hh"
namespace alternator {

View File

@@ -27,7 +27,7 @@
#include <seastar/core/shared_ptr.hh>
#include "utils/rjson.hh"
#include "rjson.hh"
/*
* Parsed representation of expressions and their components.

View File

@@ -20,12 +20,13 @@
*/
#include "rjson.hh"
#include "error.hh"
#include <seastar/core/print.hh>
#include <seastar/core/thread.hh>
namespace rjson {
allocator the_allocator;
static allocator the_allocator;
/*
* This wrapper class adds nested level checks to rapidjson's handlers.
@@ -127,21 +128,6 @@ std::string print(const rjson::value& value) {
return std::string(buffer.GetString());
}
rjson::malformed_value::malformed_value(std::string_view name, const rjson::value& value)
: malformed_value(name, print(value))
{}
rjson::malformed_value::malformed_value(std::string_view name, std::string_view value)
: error(format("Malformed value {} : {}", name, value))
{}
rjson::missing_value::missing_value(std::string_view name)
// TODO: using old message here, but as pointed out.
// "parameter" is not really a JSON concept. It is a value
// missing according to (implicit) schema.
: error(format("JSON parameter {} not found", name))
{}
rjson::value copy(const rjson::value& value) {
return rjson::value(value, the_allocator);
}
@@ -156,20 +142,6 @@ rjson::value parse(std::string_view str) {
return std::move(v);
}
std::optional<rjson::value> try_parse(std::string_view str) {
guarded_yieldable_json_handler<document, false> d(78);
try {
d.Parse(str.data(), str.size());
} catch (const rjson::error&) {
return std::nullopt;
}
if (d.HasParseError()) {
return std::nullopt;
}
rjson::value& v = d;
return std::move(v);
}
rjson::value parse_yieldable(std::string_view str) {
guarded_yieldable_json_handler<document, true> d(78);
d.Parse(str.data(), str.size());
@@ -186,18 +158,20 @@ rjson::value& get(rjson::value& value, std::string_view name) {
// Luckily, the variant taking a GenericValue doesn't share this bug,
// and we can create a string GenericValue without copying the string.
auto member_it = value.FindMember(rjson::value(name.data(), name.size()));
if (member_it != value.MemberEnd()) {
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
throw missing_value(name);
}
const rjson::value& get(const rjson::value& value, std::string_view name) {
auto member_it = value.FindMember(rjson::value(name.data(), name.size()));
if (member_it != value.MemberEnd()) {
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
throw missing_value(name);
}
rjson::value from_string(const std::string& str) {
@@ -319,66 +293,6 @@ bool single_value_comp::operator()(const rjson::value& r1, const rjson::value& r
}
}
rjson::value from_string_map(const std::map<sstring, sstring>& map) {
rjson::value v = rjson::empty_object();
for (auto& entry : map) {
rjson::set_with_string_name(v, std::string_view(entry.first), rjson::from_string(entry.second));
}
return v;
}
static inline bool is_control_char(char c) {
return c >= 0 && c <= 0x1F;
}
static inline bool needs_escaping(const sstring& s) {
return std::any_of(s.begin(), s.end(), [](char c) {return is_control_char(c) || c == '"' || c == '\\';});
}
sstring quote_json_string(const sstring& value) {
if (!needs_escaping(value)) {
return format("\"{}\"", value);
}
std::ostringstream oss;
oss << std::hex << std::uppercase << std::setfill('0');
oss.put('"');
for (char c : value) {
switch (c) {
case '"':
oss.put('\\').put('"');
break;
case '\\':
oss.put('\\').put('\\');
break;
case '\b':
oss.put('\\').put('b');
break;
case '\f':
oss.put('\\').put('f');
break;
case '\n':
oss.put('\\').put('n');
break;
case '\r':
oss.put('\\').put('r');
break;
case '\t':
oss.put('\\').put('t');
break;
default:
if (is_control_char(c)) {
oss.put('\\').put('u') << std::setw(4) << static_cast<int>(c);
} else {
oss.put(c);
}
break;
}
}
oss.put('"');
return oss.str();
}
} // end namespace rjson
std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {

177
alternator/rjson.hh Normal file
View File

@@ -0,0 +1,177 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
/*
* rjson is a wrapper over rapidjson library, providing fast JSON parsing and generation.
*
* rapidjson has strict copy elision policies, which, among other things, involves
* using provided char arrays without copying them and allows copying objects only explicitly.
* As such, one should be careful when passing strings with limited liveness
* (e.g. data underneath local std::strings) to rjson functions, because created JSON objects
* may end up relying on dangling char pointers. All rjson functions that create JSONs from strings
* by rjson have both APIs for string_ref_type (more optimal, used when the string is known to live
* at least as long as the object, e.g. a static char array) and for std::strings. The more optimal
* variants should be used *only* if the liveness of the string is guaranteed, otherwise it will
* result in undefined behaviour.
* Also, bear in mind that methods exposed by rjson::value are generic, but some of them
* work fine only for specific types. In case the type does not match, an rjson::error will be thrown.
* Examples of such mismatched usages is calling MemberCount() on a JSON value not of object type
* or calling Size() on a non-array value.
*/
#include <string>
#include <stdexcept>
namespace rjson {
class error : public std::exception {
std::string _msg;
public:
error() = default;
error(const std::string& msg) : _msg(msg) {}
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}
// rapidjson configuration macros
#define RAPIDJSON_HAS_STDSTRING 1
// Default rjson policy is to use assert() - which is dangerous for two reasons:
// 1. assert() can be turned off with -DNDEBUG
// 2. assert() crashes a program
// Fortunately, the default policy can be overridden, and so rapidjson errors will
// throw an rjson::error exception instead.
#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)
#include <rapidjson/document.h>
#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include <rapidjson/error/en.h>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace rjson {
using allocator = rapidjson::CrtAllocator;
using encoding = rapidjson::UTF8<>;
using document = rapidjson::GenericDocument<encoding, allocator>;
using value = rapidjson::GenericValue<encoding, allocator>;
using string_ref_type = value::StringRefType;
using string_buffer = rapidjson::GenericStringBuffer<encoding>;
using writer = rapidjson::Writer<string_buffer, encoding>;
using type = rapidjson::Type;
// Returns an object representing JSON's null
inline rjson::value null_value() {
return rjson::value(rapidjson::kNullType);
}
// Returns an empty JSON object - {}
inline rjson::value empty_object() {
return rjson::value(rapidjson::kObjectType);
}
// Returns an empty JSON array - []
inline rjson::value empty_array() {
return rjson::value(rapidjson::kArrayType);
}
// Returns an empty JSON string - ""
inline rjson::value empty_string() {
return rjson::value(rapidjson::kStringType);
}
// Convert the JSON value to a string with JSON syntax, the opposite of parse().
// The representation is dense - without any redundant indentation.
std::string print(const rjson::value& value);
// Returns a string_view to the string held in a JSON value (which is
// assumed to hold a string, i.e., v.IsString() == true). This is a view
// to the existing data - no copying is done.
inline std::string_view to_string_view(const rjson::value& v) {
return std::string_view(v.GetString(), v.GetStringLength());
}
// Copies given JSON value - involves allocation
rjson::value copy(const rjson::value& value);
// Parses a JSON value from given string or raw character array.
// The string/char array liveness does not need to be persisted,
// as parse() will allocate member names and values.
// Throws rjson::error if parsing failed.
rjson::value parse(std::string_view str);
// Needs to be run in thread context
rjson::value parse_yieldable(std::string_view str);
// Creates a JSON value (of JSON string type) out of internal string representations.
// The string value is copied, so str's liveness does not need to be persisted.
rjson::value from_string(const std::string& str);
rjson::value from_string(const sstring& str);
rjson::value from_string(const char* str, size_t size);
rjson::value from_string(std::string_view view);
// Returns a pointer to JSON member if it exists, nullptr otherwise
rjson::value* find(rjson::value& value, std::string_view name);
const rjson::value* find(const rjson::value& value, std::string_view name);
// Returns a reference to JSON member if it exists, throws otherwise
rjson::value& get(rjson::value& value, std::string_view name);
const rjson::value& get(const rjson::value& value, std::string_view name);
// Sets a member in given JSON object by moving the member - allocates the name.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);
void set_with_string_name(rjson::value& base, std::string_view name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference - allocates the name.
// NOTICE: member string liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);
void set_with_string_name(rjson::value& base, std::string_view name, rjson::string_ref_type member);
// Sets a member in given JSON object by moving the member.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// NOTICE: member liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);
// Adds a value to a JSON list by moving the item to its end.
// Throws if base_array is not a JSON array.
void push_back(rjson::value& base_array, rjson::value&& item);
// Remove a member from a JSON object. Throws if value isn't an object.
bool remove_member(rjson::value& value, std::string_view name);
struct single_value_comp {
bool operator()(const rjson::value& r1, const rjson::value& r2) const;
};
} // end namespace rjson
namespace std {
std::ostream& operator<<(std::ostream& os, const rjson::value& v);
}

View File

@@ -24,7 +24,7 @@
#include "seastarx.hh"
#include "service/storage_proxy.hh"
#include "service/storage_proxy.hh"
#include "utils/rjson.hh"
#include "rjson.hh"
#include "executor.hh"
namespace alternator {

View File

@@ -65,7 +65,7 @@ struct from_json_visitor {
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };
void operator()(const string_type_impl& t) {
bo.write(t.from_string(rjson::to_string_view(v)));
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
void operator()(const bytes_type_impl& t) const {
bo.write(base64_decode(v));
@@ -74,27 +74,23 @@ struct from_json_visitor {
bo.write(boolean_type->decompose(v.GetBool()));
}
void operator()(const decimal_type_impl& t) const {
try {
bo.write(t.from_string(rjson::to_string_view(v)));
} catch (const marshal_exception& e) {
throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", v));
}
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
// default
void operator()(const abstract_type& t) const {
bo.write(from_json_object(t, v, cql_serialization_format::internal()));
bo.write(from_json_object(t, Json::Value(rjson::print(v)), cql_serialization_format::internal()));
}
};
bytes serialize_item(const rjson::value& item) {
if (item.IsNull() || item.MemberCount() != 1) {
throw api_error::validation(format("An item can contain only one attribute definition: {}", item));
throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));
}
auto it = item.MemberBegin();
type_info type_info = type_info_from_string(rjson::to_string_view(it->name)); // JSON keys are guaranteed to be strings
if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal serialization of type {}", it->name);
slogger.trace("Non-optimal serialization of type {}", it->name.GetString());
return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));
}
@@ -132,7 +128,7 @@ struct to_json_visitor {
rjson::value deserialize_item(bytes_view bv) {
rjson::value deserialized(rapidjson::kObjectType);
if (bv.empty()) {
throw api_error::validation("Serialized value empty");
throw api_error("ValidationException", "Serialized value empty");
}
alternator_type atype = alternator_type(bv[0]);
@@ -168,7 +164,7 @@ bytes get_key_column_value(const rjson::value& item, const column_definition& co
std::string column_name = column.name_as_text();
const rjson::value* key_typed_value = rjson::find(item, column_name);
if (!key_typed_value) {
throw api_error::validation(format("Key column {} not found", column_name));
throw api_error("ValidationException", format("Key column {} not found", column_name));
}
return get_key_from_typed_value(*key_typed_value, column);
}
@@ -179,20 +175,20 @@ bytes get_key_column_value(const rjson::value& item, const column_definition& co
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column) {
if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1 ||
!key_typed_value.MemberBegin()->value.IsString()) {
throw api_error::validation(
throw api_error("ValidationException",
format("Malformed value object for key column {}: {}",
column.name_as_text(), key_typed_value));
}
auto it = key_typed_value.MemberBegin();
if (it->name != type_to_string(column.type)) {
throw api_error::validation(
throw api_error("ValidationException",
format("Type mismatch: expected type {} for key column {}, got type {}",
type_to_string(column.type), column.name_as_text(), it->name));
type_to_string(column.type), column.name_as_text(), it->name.GetString()));
}
std::string_view value_view = rjson::to_string_view(it->value);
if (value_view.empty()) {
throw api_error::validation(
throw api_error("ValidationException",
format("The AttributeValue for a key attribute cannot contain an empty string value. Key: {}", column.name_as_text()));
}
if (column.type == bytes_type) {
@@ -251,24 +247,20 @@ clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error::validation(format("{}: invalid number object", diagnostic));
throw api_error("ValidationException", format("{}: invalid number object", diagnostic));
}
auto it = v.MemberBegin();
if (it->name != "N") {
throw api_error::validation(format("{}: expected number, found type '{}'", diagnostic, it->name));
throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));
}
try {
if (it->value.IsNumber()) {
// FIXME(sarna): should use big_decimal constructor with numeric values directly:
return big_decimal(rjson::print(it->value));
}
if (!it->value.IsString()) {
throw api_error::validation(format("{}: improperly formatted number constant", diagnostic));
}
return big_decimal(rjson::to_string_view(it->value));
} catch (const marshal_exception& e) {
throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", it->value));
if (it->value.IsNumber()) {
// FIXME(sarna): should use big_decimal constructor with numeric values directly:
return big_decimal(rjson::print(it->value));
}
if (!it->value.IsString()) {
throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));
}
return big_decimal(it->value.GetString());
}
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
@@ -320,10 +312,10 @@ rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {
auto [set1_type, set1] = unwrap_set(v1);
auto [set2_type, set2] = unwrap_set(v2);
if (set1_type != set2_type) {
throw api_error::validation(format("Mismatched set types: {} and {}", set1_type, set2_type));
throw api_error("ValidationException", format("Mismatched set types: {} and {}", set1_type, set2_type));
}
if (!set1 || !set2) {
throw api_error::validation("UpdateExpression: ADD operation for sets must be given sets as arguments");
throw api_error("ValidationException", "UpdateExpression: ADD operation for sets must be given sets as arguments");
}
rjson::value sum = rjson::copy(*set1);
std::set<rjson::value, rjson::single_value_comp> set1_raw;
@@ -331,7 +323,7 @@ rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {
set1_raw.insert(rjson::copy(*it));
}
for (const auto& a : set2->GetArray()) {
if (!set1_raw.contains(a)) {
if (set1_raw.count(a) == 0) {
rjson::push_back(sum, rjson::copy(a));
}
}
@@ -348,10 +340,10 @@ std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value&
auto [set1_type, set1] = unwrap_set(v1);
auto [set2_type, set2] = unwrap_set(v2);
if (set1_type != set2_type) {
throw api_error::validation(format("Mismatched set types: {} and {}", set1_type, set2_type));
throw api_error("ValidationException", format("Mismatched set types: {} and {}", set1_type, set2_type));
}
if (!set1 || !set2) {
throw api_error::validation("UpdateExpression: DELETE operation can only be performed on a set");
throw api_error("ValidationException", "UpdateExpression: DELETE operation can only be performed on a set");
}
std::set<rjson::value, rjson::single_value_comp> set1_raw;
for (auto it = set1->Begin(); it != set1->End(); ++it) {

View File

@@ -26,7 +26,7 @@
#include "types.hh"
#include "schema_fwd.hh"
#include "keys.hh"
#include "utils/rjson.hh"
#include "rjson.hh"
#include "utils/big_decimal.hh"
namespace alternator {

View File

@@ -25,7 +25,7 @@
#include <seastar/json/json_elements.hh>
#include "seastarx.hh"
#include "error.hh"
#include "utils/rjson.hh"
#include "rjson.hh"
#include "auth.hh"
#include <cctype>
#include "cql3/query_processor.hh"
@@ -75,17 +75,20 @@ public:
// returned to the client as expected. Other types of
// exceptions are unexpected, and returned to the user
// as an internal server error:
api_error ret;
try {
resf.get();
} catch (api_error &ae) {
generate_error_reply(*rep, ae);
ret = ae;
} catch (rjson::error & re) {
generate_error_reply(*rep,
api_error::validation(re.what()));
ret = api_error("ValidationException", re.what());
} catch (...) {
generate_error_reply(*rep,
api_error::internal(format("Internal server error: {}", std::current_exception())));
ret = api_error(
"Internal Server Error",
format("Internal server error: {}", std::current_exception()),
reply::status_type::internal_server_error);
}
generate_error_reply(*rep, ret);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
auto res = resf.get0();
@@ -185,11 +188,11 @@ future<> server::verify_signature(const request& req) {
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
throw api_error::invalid_signature("Host header is mandatory for signature verification");
throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");
}
auto authorization_it = req._headers.find("Authorization");
if (authorization_it == req._headers.end()) {
throw api_error::invalid_signature("Authorization header is mandatory for signature verification");
throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");
}
std::string host = host_it->second;
std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');
@@ -201,7 +204,7 @@ future<> server::verify_signature(const request& req) {
std::vector<std::string_view> entry_split = split(entry, '=');
if (entry_split.size() != 2) {
if (entry != "AWS4-HMAC-SHA256") {
throw api_error::invalid_signature(format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));
throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));
}
continue;
}
@@ -222,7 +225,7 @@ future<> server::verify_signature(const request& req) {
}
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
throw api_error::validation(format("Incorrect credential information format: {}", credential));
throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
@@ -243,8 +246,8 @@ future<> server::verify_signature(const request& req) {
}
}
auto cache_getter = [&qp = _qp] (std::string username) {
return get_key_from_roles(qp, std::move(username));
auto cache_getter = [] (std::string username) {
return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req,
user = std::move(user),
@@ -260,7 +263,7 @@ future<> server::verify_signature(const request& req) {
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);
throw api_error::unrecognized_client("The security token included in the request is invalid.");
throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");
}
});
}
@@ -271,12 +274,13 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
std::vector<std::string_view> split_target = split(target, '.');
//NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
std::string op = split_target.empty() ? std::string() : std::string(split_target.back());
slogger.trace("Request: {} {} {}", op, req->content, req->_headers);
slogger.trace("Request: {} {}", op, req->content);
return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
_executor._stats.unsupported_operations++;
throw api_error::unknown_operation(format("Unsupported operation {}", op));
throw api_error("UnknownOperationException",
format("Unsupported operation {}", op));
}
return with_gate(_pending_requests, [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] () mutable {
//FIXME: Client state can provide more context, e.g. client's endpoint address
@@ -328,11 +332,10 @@ void server::set_routes(routes& r) {
//FIXME: A way to immediately invalidate the cache should be considered,
// e.g. when the system table which stores the keys is changed.
// For now, this propagation may take up to 1 minute.
server::server(executor& exec, cql3::query_processor& qp)
server::server(executor& exec)
: _http_server("http-alternator")
, _https_server("https-alternator")
, _executor(exec)
, _qp(qp)
, _key_cache(1024, 1min, slogger)
, _enforce_authorization(false)
, _enabled_servers{}
@@ -347,9 +350,6 @@ server::server(executor& exec, cql3::query_processor& qp)
{"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.delete_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"UpdateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.update_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.put_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
@@ -389,18 +389,6 @@ server::server(executor& exec, cql3::query_processor& qp)
{"ListTagsOfResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.list_tags_of_resource(client_state, std::move(permit), std::move(json_request));
}},
{"ListStreams", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.list_streams(client_state, std::move(permit), std::move(json_request));
}},
{"DescribeStream", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_stream(client_state, std::move(permit), std::move(json_request));
}},
{"GetShardIterator", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.get_shard_iterator(client_state, std::move(permit), std::move(json_request));
}},
{"GetRecords", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.get_records(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
} {
}

View File

@@ -41,7 +41,6 @@ class server {
http_server _http_server;
http_server _https_server;
executor& _executor;
cql3::query_processor& _qp;
key_cache _key_cache;
bool _enforce_authorization;
@@ -69,7 +68,7 @@ class server {
json_parser _json_parser;
public:
server(executor& executor, cql3::query_processor& qp);
server(executor& executor);
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
bool enforce_authorization, semaphore* memory_limiter);

View File

@@ -20,7 +20,7 @@
*/
#include "stats.hh"
#include "utils/histogram_metrics_helper.hh"
#include <seastar/core/metrics.hh>
namespace alternator {
@@ -37,7 +37,7 @@ stats::stats() : api_operations{} {
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name);}),
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")
OPERATION(create_global_table, "CreateGlobalTable")
@@ -77,11 +77,6 @@ stats::stats() : api_operations{} {
OPERATION_LATENCY(get_item_latency, "GetItem")
OPERATION_LATENCY(delete_item_latency, "DeleteItem")
OPERATION_LATENCY(update_item_latency, "UpdateItem")
OPERATION(list_streams, "ListStreams")
OPERATION(describe_stream, "DescribeStream")
OPERATION(get_shard_iterator, "GetShardIterator")
OPERATION(get_records, "GetRecords")
OPERATION_LATENCY(get_records_latency, "GetRecords")
});
_metrics.add_group("alternator", {
seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,

View File

@@ -74,16 +74,11 @@ public:
uint64_t update_item = 0;
uint64_t update_table = 0;
uint64_t update_time_to_live = 0;
uint64_t list_streams = 0;
uint64_t describe_stream = 0;
uint64_t get_shard_iterator = 0;
uint64_t get_records = 0;
utils::time_estimated_histogram put_item_latency;
utils::time_estimated_histogram get_item_latency;
utils::time_estimated_histogram delete_item_latency;
utils::time_estimated_histogram update_item_latency;
utils::time_estimated_histogram get_records_latency;
utils::estimated_histogram put_item_latency;
utils::estimated_histogram get_item_latency;
utils::estimated_histogram delete_item_latency;
utils::estimated_histogram update_item_latency;
} api_operations;
// Miscellaneous event counters
uint64_t total_operations = 0;

File diff suppressed because it is too large Load Diff

View File

@@ -249,7 +249,7 @@
"MIGRATION_REQUEST",
"PREPARE_MESSAGE",
"PREPARE_DONE_MESSAGE",
"UNUSED__STREAM_MUTATION",
"STREAM_MUTATION",
"STREAM_MUTATION_DONE",
"COMPLETE_MESSAGE",
"REPAIR_CHECKSUM_RANGE",

View File

@@ -68,7 +68,7 @@
"summary":"Get the hinted handoff enabled by dc",
"type":"array",
"items":{
"type":"array"
"type":"mapper_list"
},
"nickname":"get_hinted_handoff_enabled_by_dc",
"produces":[

View File

@@ -833,43 +833,6 @@
}
]
},
{
"path":"/storage_service/repair_status/",
"operations":[
{
"method":"GET",
"summary":"Query the repair status and return when the repair is finished or timeout",
"type":"string",
"enum":[
"RUNNING",
"SUCCESSFUL",
"FAILED"
],
"nickname":"repair_await_completion",
"produces":[
"application/json"
],
"parameters":[
{
"name":"id",
"description":"The repair ID to check for status",
"required":true,
"allowMultiple":false,
"type": "long",
"paramType":"query"
},
{
"name":"timeout",
"description":"Seconds to wait before the query returns even if the repair is not finished. The value -1 or not providing this parameter means no timeout",
"required":false,
"allowMultiple":false,
"type": "long",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/repair_async/{keyspace}",
"operations":[
@@ -2468,7 +2431,7 @@
"version":{
"type":"string",
"enum":[
"ka", "la", "mc", "md"
"ka", "la", "mc"
],
"description":"SSTable version"
},

View File

@@ -113,20 +113,8 @@ future<> set_server_storage_service(http_context& ctx) {
return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
}
future<> set_server_repair(http_context& ctx, sharded<netw::messaging_service>& ms) {
return ctx.http_server.set_routes([&ctx, &ms] (routes& r) { set_repair(ctx, r, ms); });
}
future<> unset_server_repair(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_repair(ctx, r); });
}
future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl) {
return ctx.http_server.set_routes([&ctx, &snap_ctl] (routes& r) { set_snapshot(ctx, r, snap_ctl); });
}
future<> unset_server_snapshot(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_snapshot(ctx, r); });
future<> set_server_snapshot(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { set_snapshot(ctx, r); });
}
future<> set_server_snitch(http_context& ctx) {
@@ -143,14 +131,9 @@ future<> set_server_load_sstable(http_context& ctx) {
"The column family API", set_column_family);
}
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms) {
future<> set_server_messaging_service(http_context& ctx) {
return register_api(ctx, "messaging_service",
"The messaging service API", [&ms] (http_context& ctx, routes& r) {
set_messaging_service(ctx, r, ms);
});
}
future<> unset_server_messaging_service(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_messaging_service(ctx, r); });
"The messaging service API", set_messaging_service);
}
future<> set_server_storage_proxy(http_context& ctx) {

View File

@@ -256,6 +256,4 @@ public:
operator T() const { return value; }
};
utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val);
}

View File

@@ -24,11 +24,9 @@
#include <seastar/http/httpd.hh>
namespace service { class load_meter; }
namespace locator { class shared_token_metadata; }
namespace locator { class token_metadata; }
namespace cql_transport { class controller; }
class thrift_controller;
namespace db { class snapshot_ctl; }
namespace netw { class messaging_service; }
namespace api {
@@ -39,33 +37,27 @@ struct http_context {
distributed<database>& db;
distributed<service::storage_proxy>& sp;
service::load_meter& lmeter;
const sharded<locator::shared_token_metadata>& shared_token_metadata;
sharded<locator::token_metadata>& token_metadata;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp,
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm)
: db(_db), sp(_sp), lmeter(_lm), shared_token_metadata(_stm) {
service::load_meter& _lm, sharded<locator::token_metadata>& _tm)
: db(_db), sp(_sp), lmeter(_lm), token_metadata(_tm) {
}
const locator::token_metadata& get_token_metadata();
};
future<> set_server_init(http_context& ctx);
future<> set_server_config(http_context& ctx);
future<> set_server_snitch(http_context& ctx);
future<> set_server_storage_service(http_context& ctx);
future<> set_server_repair(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_repair(http_context& ctx);
future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl);
future<> unset_transport_controller(http_context& ctx);
future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl);
future<> unset_rpc_controller(http_context& ctx);
future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_snapshot(http_context& ctx);
future<> set_server_snapshot(http_context& ctx);
future<> set_server_gossip(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_messaging_service(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx);

View File

@@ -249,12 +249,6 @@ static future<json::json_return_type> sum_sstable(http_context& ctx, bool total)
});
}
future<json::json_return_type> map_reduce_cf_time_histogram(http_context& ctx, const sstring& name, std::function<utils::time_estimated_histogram(const column_family&)> f) {
return map_reduce_cf_raw(ctx, name, utils::time_estimated_histogram(), f, utils::time_estimated_histogram_merge).then([](const utils::time_estimated_histogram& res) {
return make_ready_future<json::json_return_type>(time_to_json_histogram(res));
});
}
template <typename T>
class sum_ratio {
uint64_t _n = 0;
@@ -656,7 +650,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst->filter_size();
return s + sst->filter_size();
});
}, std::plus<uint64_t>());
});
@@ -664,7 +658,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst->filter_size();
return s + sst->filter_size();
});
}, std::plus<uint64_t>());
});
@@ -672,7 +666,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst->filter_memory_size();
return s + sst->filter_memory_size();
});
}, std::plus<uint64_t>());
});
@@ -680,7 +674,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst->filter_memory_size();
return s + sst->filter_memory_size();
});
}, std::plus<uint64_t>());
});
@@ -688,7 +682,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst->get_summary().memory_footprint();
return s + sst->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
@@ -696,7 +690,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst->get_summary().memory_footprint();
return s + sst->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
@@ -802,21 +796,24 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_prepare;
});
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_accept;
});
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_learn;
});
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -865,9 +862,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {
auto ks_cf = parse_fully_qualified_cf_name(req->param["name"]);
auto&& ks = std::get<0>(ks_cf);
auto&& cf_name = std::get<1>(ks_cf);
auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);
return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {
std::set<sstring> vp;
for (auto b : vb) {
@@ -880,7 +875,7 @@ void set_column_family(http_context& ctx, routes& r) {
column_family& cf = ctx.db.local().find_column_family(uuid);
res.reserve(cf.get_index_manager().list_indexes().size());
for (auto&& i : cf.get_index_manager().list_indexes()) {
if (!vp.contains(secondary_index::index_table_name(i.metadata().name()))) {
if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {
res.emplace_back(i.metadata().name());
}
}
@@ -914,15 +909,17 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_read;
});
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_write;
});
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<request> req) {

View File

@@ -68,8 +68,6 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n
});
}
future<json::json_return_type> map_reduce_cf_time_histogram(http_context& ctx, const sstring& name, std::function<utils::time_estimated_histogram(const column_family&)> f);
struct map_reduce_column_families_locally {
std::any init;
std::function<std::unique_ptr<std::any>(column_family&)> mapper;

View File

@@ -53,8 +53,8 @@ std::vector<message_counter> map_to_message_counters(
* according to a function that it gets as a parameter.
*
*/
future_json_function get_client_getter(sharded<netw::messaging_service>& ms, std::function<uint64_t(const shard_info&)> f) {
return [&ms, f](std::unique_ptr<request> req) {
future_json_function get_client_getter(std::function<uint64_t(const shard_info&)> f) {
return [f](std::unique_ptr<request> req) {
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
@@ -63,15 +63,15 @@ future_json_function get_client_getter(sharded<netw::messaging_service>& ms, std
});
return map;
};
return ms.map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
return get_messaging_service().map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
then([](map_type&& map) {
return make_ready_future<json::json_return_type>(map_to_message_counters(map));
});
};
}
future_json_function get_server_getter(sharded<netw::messaging_service>& ms, std::function<uint64_t(const rpc::stats&)> f) {
return [&ms, f](std::unique_ptr<request> req) {
future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)> f) {
return [f](std::unique_ptr<request> req) {
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
@@ -80,53 +80,53 @@ future_json_function get_server_getter(sharded<netw::messaging_service>& ms, std
});
return map;
};
return ms.map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
return get_messaging_service().map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
then([](map_type&& map) {
return make_ready_future<json::json_return_type>(map_to_message_counters(map));
});
};
}
void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms) {
get_timeout_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
void set_messaging_service(http_context& ctx, routes& r) {
get_timeout_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().timeout;
}));
get_sent_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
get_sent_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().sent_messages;
}));
get_dropped_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
get_dropped_messages.set(r, get_client_getter([](const shard_info& c) {
// We don't have the same drop message mechanism
// as origin has.
// hence we can always return 0
return 0;
}));
get_exception_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
get_exception_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().exception_received;
}));
get_pending_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
get_pending_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().pending;
}));
get_respond_pending_messages.set(r, get_server_getter(ms, [](const rpc::stats& c) {
get_respond_pending_messages.set(r, get_server_getter([](const rpc::stats& c) {
return c.pending;
}));
get_respond_completed_messages.set(r, get_server_getter(ms, [](const rpc::stats& c) {
get_respond_completed_messages.set(r, get_server_getter([](const rpc::stats& c) {
return c.sent_messages;
}));
get_version.set(r, [&ms](const_req req) {
return ms.local().get_raw_version(req.get_query_param("addr"));
get_version.set(r, [](const_req req) {
return netw::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));
});
get_dropped_messages_by_ver.set(r, [&ms](std::unique_ptr<request> req) {
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return ms.map_reduce([map](const uint64_t* local_map) mutable {
return netw::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
(*map)[i]+= local_map[i];
}
@@ -151,18 +151,5 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging
});
});
}
void unset_messaging_service(http_context& ctx, routes& r) {
get_timeout_messages.unset(r);
get_sent_messages.unset(r);
get_dropped_messages.unset(r);
get_exception_messages.unset(r);
get_pending_messages.unset(r);
get_respond_pending_messages.unset(r);
get_respond_completed_messages.unset(r);
get_version.unset(r);
get_dropped_messages_by_ver.unset(r);
}
}

View File

@@ -23,11 +23,8 @@
#include "api.hh"
namespace netw { class messaging_service; }
namespace api {
void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms);
void unset_messaging_service(http_context& ctx, routes& r);
void set_messaging_service(http_context& ctx, routes& r);
}

View File

@@ -201,39 +201,29 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req) {
const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();
return make_ready_future<json::json_return_type>(!filter.is_disabled_for_all());
auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();
return make_ready_future<json::json_return_type>(enabled);
});
sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("enable");
auto filter = (enable == "true" || enable == "1")
? db::hints::host_filter(db::hints::host_filter::enabled_for_all_tag {})
: db::hints::host_filter(db::hints::host_filter::disabled_for_all_tag {});
return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return sp.change_hints_host_filter(filter);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_hinted_handoff_enabled_by_dc.set(r, [](std::unique_ptr<request> req) {
std::vector<sstring> res;
const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();
const auto& dcs = filter.get_dcs();
res.reserve(res.size());
std::copy(dcs.begin(), dcs.end(), std::back_inserter(res));
//TBD
unimplemented();
std::vector<sp::mapper_list> res;
return make_ready_future<json::json_return_type>(res);
});
sp::set_hinted_handoff_enabled_by_dc_list.set(r, [](std::unique_ptr<request> req) {
auto dcs = req->get_query_param("dcs");
auto filter = db::hints::host_filter::parse_from_dc_list(std::move(dcs));
return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return sp.change_hints_host_filter(filter);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
//TBD
unimplemented();
auto enable = req->get_query_param("dcs");
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_max_hint_window.set(r, [](std::unique_ptr<request> req) {

View File

@@ -22,7 +22,6 @@
#include "storage_service.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/config.hh"
#include "db/schema_tables.hh"
#include <optional>
#include <time.h>
#include <boost/range/adaptor/map.hpp>
@@ -42,17 +41,11 @@
#include "sstables/sstables.hh"
#include "database.hh"
#include "db/extensions.hh"
#include "db/snapshot-ctl.hh"
#include "transport/controller.hh"
#include "thrift/controller.hh"
#include "locator/token_metadata.hh"
namespace api {
const locator::token_metadata& http_context::get_token_metadata() {
return *shared_token_metadata.local().get();
}
namespace ss = httpd::storage_service_json;
using namespace json;
@@ -156,104 +149,6 @@ void unset_rpc_controller(http_context& ctx, routes& r) {
ss::is_rpc_server_running.unset(r);
}
void set_repair(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms) {
ss::repair_async.set(r, [&ctx, &ms](std::unique_ptr<request> req) {
static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",
"jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace",
"startToken", "endToken" };
std::unordered_map<sstring, sstring> options_map;
for (auto o : options) {
auto s = req->get_query_param(o);
if (s != "") {
options_map[o] = s;
}
}
// The repair process is asynchronous: repair_start only starts it and
// returns immediately, not waiting for the repair to finish. The user
// then has other mechanisms to track the ongoing repair's progress,
// or stop it.
return repair_start(ctx.db, ms, validate_keyspace(ctx, req->param),
options_map).then([] (int i) {
return make_ready_future<json::json_return_type>(i);
});
});
ss::get_active_repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
return get_active_repairs(ctx.db).then([] (std::vector<int> res){
return make_ready_future<json::json_return_type>(res);
});
});
ss::repair_async_status.set(r, [&ctx](std::unique_ptr<request> req) {
return repair_get_status(ctx.db, boost::lexical_cast<int>( req->get_query_param("id")))
.then_wrapped([] (future<repair_status>&& fut) {
ss::ns_repair_async_status::return_type_wrapper res;
try {
res = fut.get0();
} catch(std::runtime_error& e) {
throw httpd::bad_param_exception(e.what());
}
return make_ready_future<json::json_return_type>(json::json_return_type(res));
});
});
ss::repair_await_completion.set(r, [&ctx](std::unique_ptr<request> req) {
int id;
using clock = std::chrono::steady_clock;
clock::time_point expire;
try {
id = boost::lexical_cast<int>(req->get_query_param("id"));
// If timeout is not provided, it means no timeout.
sstring s = req->get_query_param("timeout");
int64_t timeout = s.empty() ? int64_t(-1) : boost::lexical_cast<int64_t>(s);
if (timeout < 0 && timeout != -1) {
return make_exception_future<json::json_return_type>(
httpd::bad_param_exception("timeout can only be -1 (means no timeout) or non negative integer"));
}
if (timeout < 0) {
expire = clock::time_point::max();
} else {
expire = clock::now() + std::chrono::seconds(timeout);
}
} catch (std::exception& e) {
return make_exception_future<json::json_return_type>(httpd::bad_param_exception(e.what()));
}
return repair_await_completion(ctx.db, id, expire)
.then_wrapped([] (future<repair_status>&& fut) {
ss::ns_repair_async_status::return_type_wrapper res;
try {
res = fut.get0();
} catch (std::exception& e) {
return make_exception_future<json::json_return_type>(httpd::server_error_exception(e.what()));
}
return make_ready_future<json::json_return_type>(json::json_return_type(res));
});
});
ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::force_terminate_all_repair_sessions_new.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
}
void unset_repair(http_context& ctx, routes& r) {
ss::repair_async.unset(r);
ss::get_active_repair_async.unset(r);
ss::repair_async_status.unset(r);
ss::repair_await_completion.unset(r);
ss::force_terminate_all_repair_sessions.unset(r);
ss::force_terminate_all_repair_sessions_new.unset(r);
}
void set_storage_service(http_context& ctx, routes& r) {
ss::local_hostid.set(r, [](std::unique_ptr<request> req) {
return db::system_keyspace::get_local_host_id().then([](const utils::UUID& id) {
@@ -262,14 +157,14 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.get_token_metadata().sorted_tokens(), [](const dht::token& i) {
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.token_metadata.local().sorted_tokens(), [](const dht::token& i) {
return boost::lexical_cast<std::string>(i);
}));
});
ss::get_node_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {
gms::inet_address addr(req->param["endpoint"]);
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.get_token_metadata().get_tokens(addr), [](const dht::token& i) {
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.token_metadata.local().get_tokens(addr), [](const dht::token& i) {
return boost::lexical_cast<std::string>(i);
}));
});
@@ -288,7 +183,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_leaving_nodes.set(r, [&ctx](const_req req) {
return container_to_vec(ctx.get_token_metadata().get_leaving_endpoints());
return container_to_vec(ctx.token_metadata.local().get_leaving_endpoints());
});
ss::get_moving_nodes.set(r, [](const_req req) {
@@ -297,7 +192,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_joining_nodes.set(r, [&ctx](const_req req) {
auto points = ctx.get_token_metadata().get_bootstrap_tokens();
auto points = ctx.token_metadata.local().get_bootstrap_tokens();
std::unordered_set<sstring> addr;
for (auto i: points) {
addr.insert(boost::lexical_cast<std::string>(i.second));
@@ -325,26 +220,11 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_range_to_endpoint_map.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
std::vector<ss::maplist_mapper> res;
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_range_to_address_map(keyspace),
[](const std::pair<dht::token_range, std::vector<gms::inet_address>>& entry){
ss::maplist_mapper m;
if (entry.first.start()) {
m.key.push(entry.first.start().value().value().to_sstring());
} else {
m.key.push("");
}
if (entry.first.end()) {
m.key.push(entry.first.end().value().value().to_sstring());
} else {
m.key.push("");
}
for (const gms::inet_address& address : entry.second) {
m.value.push(address.to_sstring());
}
return m;
}));
return make_ready_future<json::json_return_type>(res);
});
ss::get_pending_range_to_endpoint_map.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -366,7 +246,7 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::get_host_id_map.set(r, [&ctx](const_req req) {
std::vector<ss::mapper> res;
return map_to_key_value(ctx.get_token_metadata().get_endpoint_to_host_id_map_for_reading(), res);
return map_to_key_value(ctx.token_metadata.local().get_endpoint_to_host_id_map_for_reading(), res);
});
ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -458,7 +338,7 @@ void set_storage_service(http_context& ctx, routes& r) {
return do_for_each(column_families, [=, &db](sstring cfname) {
auto& cm = db.get_compaction_manager();
auto& cf = db.find_column_family(keyspace, cfname);
return cm.perform_sstable_upgrade(db, &cf, exclude_current_version);
return cm.perform_sstable_upgrade(&cf, exclude_current_version);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
@@ -481,6 +361,59 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",
"jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace",
"startToken", "endToken" };
std::unordered_map<sstring, sstring> options_map;
for (auto o : options) {
auto s = req->get_query_param(o);
if (s != "") {
options_map[o] = s;
}
}
// The repair process is asynchronous: repair_start only starts it and
// returns immediately, not waiting for the repair to finish. The user
// then has other mechanisms to track the ongoing repair's progress,
// or stop it.
return repair_start(ctx.db, validate_keyspace(ctx, req->param),
options_map).then([] (int i) {
return make_ready_future<json::json_return_type>(i);
});
});
ss::get_active_repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
return get_active_repairs(ctx.db).then([] (std::vector<int> res){
return make_ready_future<json::json_return_type>(res);
});
});
ss::repair_async_status.set(r, [&ctx](std::unique_ptr<request> req) {
return repair_get_status(ctx.db, boost::lexical_cast<int>( req->get_query_param("id")))
.then_wrapped([] (future<repair_status>&& fut) {
ss::ns_repair_async_status::return_type_wrapper res;
try {
res = fut.get0();
} catch(std::runtime_error& e) {
throw httpd::bad_param_exception(e.what());
}
return make_ready_future<json::json_return_type>(json::json_return_type(res));
});
});
ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::force_terminate_all_repair_sessions_new.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::decommission.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().decommission().then([] {
return make_ready_future<json::json_return_type>(json_void());
@@ -738,12 +671,9 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::reset_local_schema.set(r, [](std::unique_ptr<request> req) {
// FIXME: We should truncate schema tables if more than one node in the cluster.
auto& sp = service::get_storage_proxy();
auto& fs = service::get_local_storage_service().features();
return db::schema_tables::recalculate_schema_version(sp, fs).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {
@@ -991,7 +921,7 @@ void set_storage_service(http_context& ctx, routes& r) {
e.value = p.second;
nm.attributes.push(std::move(e));
}
if (!cp->options().contains(compression_parameters::SSTABLE_COMPRESSION)) {
if (!cp->options().count(compression_parameters::SSTABLE_COMPRESSION)) {
ss::mapper e;
e.key = compression_parameters::SSTABLE_COMPRESSION;
e.value = cp->name();
@@ -1049,29 +979,31 @@ void set_storage_service(http_context& ctx, routes& r) {
}
void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl) {
ss::get_snapshot_details.set(r, [&snap_ctl](std::unique_ptr<request> req) {
return snap_ctl.local().get_snapshot_details().then([] (std::unordered_map<sstring, std::vector<db::snapshot_ctl::snapshot_details>>&& result) {
std::function<future<>(output_stream<char>&&)> f = [result = std::move(result)](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [&result] (output_stream<char>& s, bool& first){
return s.write("[").then([&s, &first, &result] {
return do_for_each(result, [&s, &first](std::tuple<sstring, std::vector<db::snapshot_ctl::snapshot_details>>&& map){
return do_with(ss::snapshots(), [&s, &first, &map](ss::snapshots& all_snapshots) {
all_snapshots.key = std::get<0>(map);
future<> f = first ? make_ready_future<>() : s.write(", ");
first = false;
std::vector<ss::snapshot> snapshot;
for (auto& cf: std::get<1>(map)) {
ss::snapshot snp;
snp.ks = cf.ks;
snp.cf = cf.cf;
snp.live = cf.live;
snp.total = cf.total;
snapshot.push_back(std::move(snp));
}
all_snapshots.value = std::move(snapshot);
return f.then([&s, &all_snapshots] {
return all_snapshots.write(s);
void set_snapshot(http_context& ctx, routes& r) {
ss::get_snapshot_details.set(r, [](std::unique_ptr<request> req) {
std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){
return s.write("[").then([&s, &first] {
return service::get_local_storage_service().get_snapshot_details().then([&s, &first] (std::unordered_map<sstring, std::vector<service::storage_service::snapshot_details>>&& result) {
return do_with(std::move(result), [&s, &first](const std::unordered_map<sstring, std::vector<service::storage_service::snapshot_details>>& result) {
return do_for_each(result, [&s, &result,&first](std::tuple<sstring, std::vector<service::storage_service::snapshot_details>>&& map){
return do_with(ss::snapshots(), [&s, &first, &result, &map](ss::snapshots& all_snapshots) {
all_snapshots.key = std::get<0>(map);
future<> f = first ? make_ready_future<>() : s.write(", ");
first = false;
std::vector<ss::snapshot> snapshot;
for (auto& cf: std::get<1>(map)) {
ss::snapshot snp;
snp.ks = cf.ks;
snp.cf = cf.cf;
snp.live = cf.live;
snp.total = cf.total;
snapshot.push_back(std::move(snp));
}
all_snapshots.value = std::move(snapshot);
return f.then([&s, &all_snapshots] {
return all_snapshots.write(s);
});
});
});
});
@@ -1081,13 +1013,12 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
});
});
});
};
return make_ready_future<json::json_return_type>(std::move(f));
});
});
};
return make_ready_future<json::json_return_type>(std::move(f));
});
ss::take_snapshot.set(r, [&snap_ctl](std::unique_ptr<request> req) {
ss::take_snapshot.set(r, [](std::unique_ptr<request> req) {
auto tag = req->get_query_param("tag");
auto column_families = split(req->get_query_param("cf"), ",");
@@ -1095,7 +1026,7 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
auto resp = make_ready_future<>();
if (column_families.empty()) {
resp = snap_ctl.local().take_snapshot(tag, keynames);
resp = service::get_local_storage_service().take_snapshot(tag, keynames);
} else {
if (keynames.empty()) {
throw httpd::bad_param_exception("The keyspace of column families must be specified");
@@ -1103,37 +1034,37 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
resp = snap_ctl.local().take_column_family_snapshot(keynames[0], column_families, tag);
resp = service::get_local_storage_service().take_column_family_snapshot(keynames[0], column_families, tag);
}
return resp.then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::del_snapshot.set(r, [&snap_ctl](std::unique_ptr<request> req) {
ss::del_snapshot.set(r, [](std::unique_ptr<request> req) {
auto tag = req->get_query_param("tag");
auto column_family = req->get_query_param("cf");
std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");
return snap_ctl.local().clear_snapshot(tag, keynames, column_family).then([] {
return service::get_local_storage_service().clear_snapshot(tag, keynames, column_family).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::true_snapshots_size.set(r, [&snap_ctl](std::unique_ptr<request> req) {
return snap_ctl.local().true_snapshots_size().then([] (int64_t size) {
ss::true_snapshots_size.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().true_snapshots_size().then([] (int64_t size) {
return make_ready_future<json::json_return_type>(size);
});
});
ss::scrub.set(r, wrap_ks_cf(ctx, [&snap_ctl] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
ss::scrub.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
const auto skip_corrupted = req_param<bool>(*req, "skip_corrupted", false);
auto f = make_ready_future<>();
if (!req_param<bool>(*req, "disable_snapshot", false)) {
auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());
f = parallel_for_each(column_families, [&snap_ctl, keyspace, tag](sstring cf) {
return snap_ctl.local().take_column_family_snapshot(keyspace, cf, tag);
f = parallel_for_each(column_families, [keyspace, tag](sstring cf) {
return service::get_local_storage_service().take_column_family_snapshot(keyspace, cf, tag);
});
}
@@ -1151,12 +1082,4 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
}));
}
void unset_snapshot(http_context& ctx, routes& r) {
ss::get_snapshot_details.unset(r);
ss::take_snapshot.unset(r);
ss::del_snapshot.unset(r);
ss::true_snapshots_size.unset(r);
ss::scrub.unset(r);
}
}

View File

@@ -21,24 +21,18 @@
#pragma once
#include <seastar/core/sharded.hh>
#include "api.hh"
namespace cql_transport { class controller; }
class thrift_controller;
namespace db { class snapshot_ctl; }
namespace netw { class messaging_service; }
namespace api {
void set_storage_service(http_context& ctx, routes& r);
void set_repair(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms);
void unset_repair(http_context& ctx, routes& r);
void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl);
void unset_transport_controller(http_context& ctx, routes& r);
void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl);
void unset_rpc_controller(http_context& ctx, routes& r);
void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl);
void unset_snapshot(http_context& ctx, routes& r);
void set_snapshot(http_context& ctx, routes& r);
}

View File

@@ -208,7 +208,7 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
external_value_size = cell_view.value_size();
}
// Add overhead of chunk headers. The last one is a special case.
external_value_size += (external_value_size - 1) / data::cell::effective_external_chunk_length * data::cell::external_chunk_overhead;
external_value_size += (external_value_size - 1) / data::cell::maximum_external_chunk_length * data::cell::external_chunk_overhead;
external_value_size += data::cell::external_last_chunk_overhead;
}
return data::cell::structure::serialized_object_size(_data.get(), ctx)

View File

@@ -38,7 +38,6 @@
class abstract_type;
class collection_type_impl;
class atomic_cell_or_collection;
using atomic_cell_value_view = data::value_view;
using atomic_cell_value_mutable_view = data::value_mutable_view;

View File

@@ -26,7 +26,10 @@
namespace auth {
constexpr std::string_view allow_all_authenticator_name("org.apache.cassandra.auth.AllowAllAuthenticator");
const sstring& allow_all_authenticator_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthenticator";
return name;
}
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<

View File

@@ -37,7 +37,7 @@ class migration_manager;
namespace auth {
extern const std::string_view allow_all_authenticator_name;
const sstring& allow_all_authenticator_name();
class allow_all_authenticator final : public authenticator {
public:
@@ -53,7 +53,7 @@ public:
}
virtual std::string_view qualified_java_name() const override {
return allow_all_authenticator_name;
return allow_all_authenticator_name();
}
virtual bool require_authentication() const override {

View File

@@ -26,7 +26,10 @@
namespace auth {
constexpr std::string_view allow_all_authorizer_name("org.apache.cassandra.auth.AllowAllAuthorizer");
const sstring& allow_all_authorizer_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthorizer";
return name;
}
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<

View File

@@ -34,7 +34,7 @@ class migration_manager;
namespace auth {
extern const std::string_view allow_all_authorizer_name;
const sstring& allow_all_authorizer_name();
class allow_all_authorizer final : public authorizer {
public:
@@ -50,7 +50,7 @@ public:
}
virtual std::string_view qualified_java_name() const override {
return allow_all_authorizer_name;
return allow_all_authorizer_name();
}
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override {

View File

@@ -34,9 +34,10 @@ namespace auth {
namespace meta {
constexpr std::string_view AUTH_KS("system_auth");
constexpr std::string_view USERS_CF("users");
constexpr std::string_view AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");
const sstring DEFAULT_SUPERUSER_NAME("cassandra");
const sstring AUTH_KS("system_auth");
const sstring USERS_CF("users");
const sstring AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");
}
@@ -108,17 +109,10 @@ future<> wait_for_schema_agreement(::service::migration_manager& mm, const datab
});
}
::service::query_state& internal_distributed_query_state() noexcept {
#ifdef DEBUG
// Give the much slower debug tests more headroom for completing auth queries.
static const auto t = 30s;
#else
const timeout_config& internal_distributed_timeout_config() noexcept {
static const auto t = 5s;
#endif
static const timeout_config tc{t, t, t, t, t, t, t};
static thread_local ::service::client_state cs(::service::client_state::internal_tag{}, tc);
static thread_local ::service::query_state qs(cs, empty_service_permit());
return qs;
return tc;
}
}

View File

@@ -35,7 +35,6 @@
#include "log.hh"
#include "seastarx.hh"
#include "utils/exponential_backoff_retry.hh"
#include "service/query_state.hh"
using namespace std::chrono_literals;
@@ -54,10 +53,10 @@ namespace auth {
namespace meta {
constexpr std::string_view DEFAULT_SUPERUSER_NAME("cassandra");
extern const std::string_view AUTH_KS;
extern const std::string_view USERS_CF;
extern const std::string_view AUTH_PACKAGE_NAME;
extern const sstring DEFAULT_SUPERUSER_NAME;
extern const sstring AUTH_KS;
extern const sstring USERS_CF;
extern const sstring AUTH_PACKAGE_NAME;
}
@@ -88,6 +87,6 @@ future<> wait_for_schema_agreement(::service::migration_manager&, const database
///
/// Time-outs for internal, non-local CQL queries.
///
::service::query_state& internal_distributed_query_state() noexcept;
const timeout_config& internal_distributed_timeout_config() noexcept;
}

View File

@@ -65,14 +65,15 @@ extern "C" {
namespace auth {
std::string_view default_authorizer::qualified_java_name() const {
return "org.apache.cassandra.auth.CassandraAuthorizer";
const sstring& default_authorizer_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "CassandraAuthorizer";
return name;
}
static constexpr std::string_view ROLE_NAME = "role";
static constexpr std::string_view RESOURCE_NAME = "resource";
static constexpr std::string_view PERMISSIONS_NAME = "permissions";
static constexpr std::string_view PERMISSIONS_CF = "role_permissions";
static const sstring ROLE_NAME = "role";
static const sstring RESOURCE_NAME = "resource";
static const sstring PERMISSIONS_NAME = "permissions";
static const sstring PERMISSIONS_CF = "role_permissions";
static logging::logger alogger("default_authorizer");
@@ -103,6 +104,7 @@ future<bool> default_authorizer::any_granted() const {
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{},
true).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return !results->empty();
@@ -115,7 +117,8 @@ future<> default_authorizer::migrate_legacy_metadata() const {
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE).then([this](::shared_ptr<cql3::untyped_result_set> results) {
db::consistency_level::LOCAL_ONE,
infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
return do_with(
row.get_as<sstring>("username"),
@@ -195,6 +198,7 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{*maybe_role.name, r.name()}).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return permissions::NONE;
@@ -223,7 +227,7 @@ default_authorizer::modify(
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{permissions::to_strings(set), sstring(role_name), resource.name()}).discard_result();
});
}
@@ -248,7 +252,7 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
std::vector<permission_details> all_details;
@@ -275,7 +279,7 @@ future<> default_authorizer::revoke_all(std::string_view role_name) const {
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(role_name)}).discard_result().handle_exception([role_name](auto ep) {
try {
std::rethrow_exception(ep);
@@ -295,6 +299,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{resource.name()}).then_wrapped([this, resource](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
@@ -311,6 +316,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{r.get_as<sstring>(ROLE_NAME), resource.name()}).discard_result().handle_exception(
[resource](auto ep) {
try {

View File

@@ -51,6 +51,8 @@
namespace auth {
const sstring& default_authorizer_name();
class default_authorizer : public authorizer {
cql3::query_processor& _qp;
@@ -69,7 +71,9 @@ public:
virtual future<> stop() override;
virtual std::string_view qualified_java_name() const override;
virtual std::string_view qualified_java_name() const override {
return default_authorizer_name();
}
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;

View File

@@ -62,13 +62,15 @@
namespace auth {
constexpr std::string_view password_authenticator_name("org.apache.cassandra.auth.PasswordAuthenticator");
const sstring& password_authenticator_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "PasswordAuthenticator";
return name;
}
// name of the hash column.
static constexpr std::string_view SALTED_HASH = "salted_hash";
static constexpr std::string_view OPTIONS = "options";
static constexpr std::string_view DEFAULT_USER_NAME = meta::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = sstring(meta::DEFAULT_SUPERUSER_NAME);
static const sstring SALTED_HASH = "salted_hash";
static const sstring DEFAULT_USER_NAME = meta::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = meta::DEFAULT_SUPERUSER_NAME;
static logging::logger plogger("password_authenticator");
@@ -96,7 +98,7 @@ static bool has_salted_hash(const cql3::untyped_result_set_row& row) {
static const sstring& update_row_query() {
static const sstring update_row_query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
SALTED_HASH,
meta::roles_table::role_col_name);
return update_row_query;
@@ -115,7 +117,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
auto username = row.get_as<sstring>("username");
auto salted_hash = row.get_as<sstring>(SALTED_HASH);
@@ -123,7 +125,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {
return _qp.execute_internal(
update_row_query(),
consistency_for_user(username),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{std::move(salted_hash), username}).discard_result();
}).finally([results] {});
}).then([] {
@@ -140,7 +142,7 @@ future<> password_authenticator::create_default_if_missing() const {
return _qp.execute_internal(
update_row_query(),
db::consistency_level::QUORUM,
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt), DEFAULT_USER_NAME}).then([](auto&&) {
plogger.info("Created default superuser authentication record.");
});
@@ -196,7 +198,7 @@ db::consistency_level password_authenticator::consistency_for_user(std::string_v
}
std::string_view password_authenticator::qualified_java_name() const {
return password_authenticator_name;
return password_authenticator_name();
}
bool password_authenticator::require_authentication() const {
@@ -204,19 +206,19 @@ bool password_authenticator::require_authentication() const {
}
authentication_option_set password_authenticator::supported_options() const {
return authentication_option_set{authentication_option::password, authentication_option::options};
return authentication_option_set{authentication_option::password};
}
authentication_option_set password_authenticator::alterable_options() const {
return authentication_option_set{authentication_option::password, authentication_option::options};
return authentication_option_set{authentication_option::password};
}
future<authenticated_user> password_authenticator::authenticate(
const credentials_map& credentials) const {
if (!credentials.contains(USERNAME_KEY)) {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(format("Required key '{}' is missing", USERNAME_KEY));
}
if (!credentials.contains(PASSWORD_KEY)) {
if (!credentials.count(PASSWORD_KEY)) {
throw exceptions::authentication_exception(format("Required key '{}' is missing", PASSWORD_KEY));
}
@@ -231,13 +233,13 @@ future<authenticated_user> password_authenticator::authenticate(
return futurize_invoke([this, username, password] {
static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",
SALTED_HASH,
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.execute_internal(
query,
consistency_for_user(username),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{username},
true);
}).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
@@ -263,91 +265,49 @@ future<authenticated_user> password_authenticator::authenticate(
});
}
future<> password_authenticator::maybe_update_custom_options(std::string_view role_name, const authentication_options& options) const {
static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name,
OPTIONS,
meta::roles_table::role_col_name);
if (!options.options) {
return make_ready_future<>();
}
std::vector<std::pair<data_value, data_value>> entries;
for (const auto& entry : *options.options) {
entries.push_back({data_value(entry.first), data_value(entry.second)});
}
auto map_value = make_map_value(map_type_impl::get_instance(utf8_type, utf8_type, false), entries);
return _qp.execute_internal(
query,
consistency_for_user(role_name),
internal_distributed_query_state(),
{std::move(map_value), sstring(role_name)}).discard_result();
}
future<> password_authenticator::create(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return maybe_update_custom_options(role_name, options);
return make_ready_future<>();
}
return _qp.execute_internal(
update_row_query(),
consistency_for_user(role_name),
internal_distributed_query_state(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result().then([this, role_name, &options] {
return maybe_update_custom_options(role_name, options);
});
internal_distributed_timeout_config(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
}
future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return maybe_update_custom_options(role_name, options);
return make_ready_future<>();
}
static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
SALTED_HASH,
meta::roles_table::role_col_name);
return _qp.execute_internal(
query,
consistency_for_user(role_name),
internal_distributed_query_state(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result().then([this, role_name, &options] {
return maybe_update_custom_options(role_name, options);
}).discard_result();
internal_distributed_timeout_config(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
}
future<> password_authenticator::drop(std::string_view name) const {
static const sstring query = format("DELETE {} FROM {} WHERE {} = ?",
SALTED_HASH,
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.execute_internal(
query, consistency_for_user(name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(name)}).discard_result();
}
future<custom_options> password_authenticator::query_custom_options(std::string_view role_name) const {
static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",
OPTIONS,
meta::roles_table::qualified_name,
meta::roles_table::role_col_name);
return _qp.execute_internal(
query, consistency_for_user(role_name),
internal_distributed_query_state(),
{sstring(role_name)}).then([](::shared_ptr<cql3::untyped_result_set> rs) {
custom_options opts;
const auto& row = rs->one();
if (row.has(OPTIONS)) {
row.get_map_data<sstring, sstring>(OPTIONS, std::inserter(opts, opts.end()), utf8_type, utf8_type);
}
return opts;
});
return make_ready_future<custom_options>();
}
const resource_set& password_authenticator::protected_resources() const {

View File

@@ -52,7 +52,7 @@ class migration_manager;
namespace auth {
extern const std::string_view password_authenticator_name;
const sstring& password_authenticator_name();
class password_authenticator : public authenticator {
cql3::query_processor& _qp;
@@ -94,8 +94,6 @@ public:
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
private:
future<> maybe_update_custom_options(std::string_view role_name, const authentication_options& options) const;
bool legacy_metadata_exists() const;
future<> migrate_legacy_metadata() const;

View File

@@ -43,16 +43,18 @@ std::string_view creation_query() {
" can_login boolean,"
" is_superuser boolean,"
" member_of set<text>,"
" salted_hash text,"
" options frozen<map<text, text>>,"
" salted_hash text"
")",
qualified_name,
qualified_name(),
role_col_name);
return instance;
}
constexpr std::string_view qualified_name("system_auth.roles");
std::string_view qualified_name() noexcept {
static const sstring instance = AUTH_KS + "." + sstring(name);
return instance;
}
}
@@ -62,20 +64,21 @@ future<bool> default_role_row_satisfies(
cql3::query_processor& qp,
std::function<bool(const cql3::untyped_result_set_row&)> p) {
static const sstring query = format("SELECT * FROM {} WHERE {} = ?",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return do_with(std::move(p), [&qp](const auto& p) {
return qp.execute_internal(
query,
db::consistency_level::ONE,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{meta::DEFAULT_SUPERUSER_NAME},
true).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
@@ -94,13 +97,13 @@ future<bool> default_role_row_satisfies(
future<bool> any_nondefault_role_row_satisfies(
cql3::query_processor& qp,
std::function<bool(const cql3::untyped_result_set_row&)> p) {
static const sstring query = format("SELECT * FROM {}", meta::roles_table::qualified_name);
static const sstring query = format("SELECT * FROM {}", meta::roles_table::qualified_name());
return do_with(std::move(p), [&qp](const auto& p) {
return qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state()).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_timeout_config()).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return false;
}

View File

@@ -43,7 +43,7 @@ std::string_view creation_query();
constexpr std::string_view name{"roles", 5};
extern const std::string_view qualified_name;
std::string_view qualified_name() noexcept;
constexpr std::string_view role_col_name{"role", 4};

View File

@@ -31,7 +31,9 @@
#include "auth/allow_all_authenticator.hh"
#include "auth/allow_all_authorizer.hh"
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/role_or_anonymous.hh"
#include "auth/standard_role_manager.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "db/consistency_level_type.hh"
@@ -123,7 +125,18 @@ service::service(
, _authorizer(std::move(z))
, _authenticator(std::move(a))
, _role_manager(std::move(r))
, _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer)) {}
, _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer)) {
// The password authenticator requires that the `standard_role_manager` is running so that the roles metadata table
// it manages is created and updated. This cross-module dependency is rather gross, but we have to maintain it for
// the sake of compatibility with Apache Cassandra and its choice of auth. schema.
if ((_authenticator->qualified_java_name() == password_authenticator_name())
&& (_role_manager->qualified_java_name() != standard_role_manager_name())) {
throw incompatible_module_combination(
format("The {} authenticator must be loaded alongside the {} role-manager.",
password_authenticator_name(),
standard_role_manager_name()));
}
}
service::service(
permissions_cache_config c,
@@ -210,6 +223,7 @@ future<bool> service::has_existing_legacy_users() const {
return _qp.execute_internal(
default_user_query,
db::consistency_level::ONE,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([this](auto results) {
if (!results->empty()) {
@@ -219,6 +233,7 @@ future<bool> service::has_existing_legacy_users() const {
return _qp.execute_internal(
default_user_query,
db::consistency_level::QUORUM,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([this](auto results) {
if (!results->empty()) {
@@ -227,7 +242,8 @@ future<bool> service::has_existing_legacy_users() const {
return _qp.execute_internal(
all_users_query,
db::consistency_level::QUORUM).then([](auto results) {
db::consistency_level::QUORUM,
infinite_timeout_config).then([](auto results) {
return make_ready_future<bool>(!results->empty());
});
});
@@ -360,28 +376,25 @@ future<permission_set> get_permissions(const service& ser, const authenticated_u
}
bool is_enforcing(const service& ser) {
const bool enforcing_authorizer = ser.underlying_authorizer().qualified_java_name() != allow_all_authorizer_name;
const bool enforcing_authorizer = ser.underlying_authorizer().qualified_java_name() != allow_all_authorizer_name();
const bool enforcing_authenticator = ser.underlying_authenticator().qualified_java_name()
!= allow_all_authenticator_name;
!= allow_all_authenticator_name();
return enforcing_authorizer || enforcing_authenticator;
}
bool is_protected(const service& ser, command_desc cmd) noexcept {
if (cmd.type_ == command_desc::type::ALTER_WITH_OPTS) {
return false; // Table attributes are OK to modify; see #7057.
}
return ser.underlying_role_manager().protected_resources().contains(cmd.resource)
|| ser.underlying_authenticator().protected_resources().contains(cmd.resource)
|| ser.underlying_authorizer().protected_resources().contains(cmd.resource);
bool is_protected(const service& ser, const resource& r) noexcept {
return ser.underlying_role_manager().protected_resources().count(r)
|| ser.underlying_authenticator().protected_resources().count(r)
|| ser.underlying_authorizer().protected_resources().count(r);
}
static void validate_authentication_options_are_supported(
const authentication_options& options,
const authentication_option_set& supported) {
const auto check = [&supported](authentication_option k) {
if (!supported.contains(k)) {
if (supported.count(k) == 0) {
throw unsupported_authentication_option(k);
}
};
@@ -461,7 +474,7 @@ future<bool> has_role(const service& ser, std::string_view grantee, std::string_
return when_all_succeed(
validate_role_exists(ser, name),
ser.get_roles(grantee)).then_unpack([name](role_set all_roles) {
return make_ready_future<bool>(all_roles.contains(sstring(name)));
return make_ready_future<bool>(all_roles.count(sstring(name)) != 0);
});
}
future<bool> has_role(const service& ser, const authenticated_user& u, std::string_view name) {
@@ -518,9 +531,14 @@ future<std::vector<permission_details>> list_filtered_permissions(
? auth::expand_resource_family(r)
: auth::resource_set{r};
std::erase_if(all_details, [&resources](const permission_details& pd) {
return !resources.contains(pd.resource);
});
all_details.erase(
std::remove_if(
all_details.begin(),
all_details.end(),
[&resources](const permission_details& pd) {
return resources.count(pd.resource) == 0;
}),
all_details.end());
}
std::transform(
@@ -533,9 +551,11 @@ future<std::vector<permission_details>> list_filtered_permissions(
});
// Eliminate rows with an empty permission set.
std::erase_if(all_details, [](const permission_details& pd) {
return pd.permissions.mask() == 0;
});
all_details.erase(
std::remove_if(all_details.begin(), all_details.end(), [](const permission_details& pd) {
return pd.permissions.mask() == 0;
}),
all_details.end());
if (!role_name) {
return make_ready_future<std::vector<permission_details>>(std::move(all_details));
@@ -547,9 +567,14 @@ future<std::vector<permission_details>> list_filtered_permissions(
return do_with(std::move(all_details), [&ser, role_name](auto& all_details) {
return ser.get_roles(*role_name).then([&all_details](role_set all_roles) {
std::erase_if(all_details, [&all_roles](const permission_details& pd) {
return !all_roles.contains(pd.role_name);
});
all_details.erase(
std::remove_if(
all_details.begin(),
all_details.end(),
[&all_roles](const permission_details& pd) {
return all_roles.count(pd.role_name) == 0;
}),
all_details.end());
return make_ready_future<std::vector<permission_details>>(std::move(all_details));
});

View File

@@ -181,21 +181,10 @@ future<permission_set> get_permissions(const service&, const authenticated_user&
///
bool is_enforcing(const service&);
/// A description of a CQL command from which auth::service can tell whether or not this command could endanger
/// internal data on which auth::service depends.
struct command_desc {
auth::permission permission; ///< Nature of the command's alteration.
const ::auth::resource& resource; ///< Resource impacted by this command.
enum class type {
ALTER_WITH_OPTS, ///< Command is ALTER ... WITH ...
OTHER
} type_ = type::OTHER;
};
///
/// Protected resources cannot be modified even if the performer has permissions to do so.
///
bool is_protected(const service&, command_desc) noexcept;
bool is_protected(const service&, const resource&) noexcept;
///
/// Create a role with optional authentication information.

View File

@@ -49,7 +49,11 @@ namespace meta {
namespace role_members_table {
constexpr std::string_view name{"role_members" , 12};
constexpr std::string_view qualified_name("system_auth.role_members");
static std::string_view qualified_name() noexcept {
static const sstring instance = AUTH_KS + "." + sstring(name);
return instance;
}
}
@@ -80,13 +84,13 @@ static db::consistency_level consistency_for_role(std::string_view role_name) no
static future<std::optional<record>> find_record(cql3::query_processor& qp, std::string_view role_name) {
static const sstring query = format("SELECT * FROM {} WHERE {} = ?",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return qp.execute_internal(
query,
consistency_for_role(role_name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(role_name)},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
@@ -120,8 +124,13 @@ static bool has_can_login(const cql3::untyped_result_set_row& row) {
return row.has("can_login") && !(boolean_type->deserialize(row.get_blob("can_login")).is_null());
}
std::string_view standard_role_manager_name() noexcept {
static const sstring instance = meta::AUTH_PACKAGE_NAME + "CassandraRoleManager";
return instance;
}
std::string_view standard_role_manager::qualified_java_name() const noexcept {
return "org.apache.cassandra.auth.CassandraRoleManager";
return standard_role_manager_name();
}
const resource_set& standard_role_manager::protected_resources() const {
@@ -139,7 +148,7 @@ future<> standard_role_manager::create_metadata_tables_if_missing() const {
" member text,"
" PRIMARY KEY (role, member)"
")",
meta::role_members_table::qualified_name);
meta::role_members_table::qualified_name());
return when_all_succeed(
@@ -159,13 +168,13 @@ future<> standard_role_manager::create_default_role_if_missing() const {
return default_role_row_satisfies(_qp, &has_can_login).then([this](bool exists) {
if (!exists) {
static const sstring query = format("INSERT INTO {} ({}, is_superuser, can_login) VALUES (?, true, true)",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{meta::DEFAULT_SUPERUSER_NAME}).then([](auto&&) {
log.info("Created default superuser role '{}'.", meta::DEFAULT_SUPERUSER_NAME);
return make_ready_future<>();
@@ -192,7 +201,7 @@ future<> standard_role_manager::migrate_legacy_metadata() const {
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
role_config config;
config.is_superuser = row.get_or<bool>("super", false);
@@ -247,13 +256,13 @@ future<> standard_role_manager::stop() {
future<> standard_role_manager::create_or_replace(std::string_view role_name, const role_config& c) const {
static const sstring query = format("INSERT INTO {} ({}, is_superuser, can_login) VALUES (?, ?, ?)",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.execute_internal(
query,
consistency_for_role(role_name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(role_name), c.is_superuser, c.can_login},
true).discard_result();
}
@@ -292,11 +301,11 @@ standard_role_manager::alter(std::string_view role_name, const role_config_updat
return _qp.execute_internal(
format("UPDATE {} SET {} WHERE {} = ?",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
build_column_assignments(u),
meta::roles_table::role_col_name),
consistency_for_role(role_name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(role_name)}).discard_result();
});
}
@@ -310,12 +319,12 @@ future<> standard_role_manager::drop(std::string_view role_name) const {
// First, revoke this role from all roles that are members of it.
const auto revoke_from_members = [this, role_name] {
static const sstring query = format("SELECT member FROM {} WHERE role = ?",
meta::role_members_table::qualified_name);
meta::role_members_table::qualified_name());
return _qp.execute_internal(
query,
consistency_for_role(role_name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(role_name)}).then([this, role_name](::shared_ptr<cql3::untyped_result_set> members) {
return parallel_for_each(
members->begin(),
@@ -348,13 +357,13 @@ future<> standard_role_manager::drop(std::string_view role_name) const {
// Finally, delete the role itself.
auto delete_role = [this, role_name] {
static const sstring query = format("DELETE FROM {} WHERE {} = ?",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.execute_internal(
query,
consistency_for_role(role_name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(role_name)}).discard_result();
};
@@ -374,14 +383,14 @@ standard_role_manager::modify_membership(
const auto modify_roles = [this, role_name, grantee_name, ch] {
const auto query = format(
"UPDATE {} SET member_of = member_of {} ? WHERE {} = ?",
meta::roles_table::qualified_name,
meta::roles_table::qualified_name(),
(ch == membership_change::add ? '+' : '-'),
meta::roles_table::role_col_name);
return _qp.execute_internal(
query,
consistency_for_role(grantee_name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{role_set{sstring(role_name)}, sstring(grantee_name)}).discard_result();
};
@@ -390,24 +399,24 @@ standard_role_manager::modify_membership(
case membership_change::add:
return _qp.execute_internal(
format("INSERT INTO {} (role, member) VALUES (?, ?)",
meta::role_members_table::qualified_name),
meta::role_members_table::qualified_name()),
consistency_for_role(role_name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(role_name), sstring(grantee_name)}).discard_result();
case membership_change::remove:
return _qp.execute_internal(
format("DELETE FROM {} WHERE role = ? AND member = ?",
meta::role_members_table::qualified_name),
meta::role_members_table::qualified_name()),
consistency_for_role(role_name),
internal_distributed_query_state(),
internal_distributed_timeout_config(),
{sstring(role_name), sstring(grantee_name)}).discard_result();
}
return make_ready_future<>();
};
return when_all_succeed(modify_roles(), modify_role_members).discard_result();
return when_all_succeed(modify_roles(), modify_role_members()).discard_result();
}
future<>
@@ -416,7 +425,7 @@ standard_role_manager::grant(std::string_view grantee_name, std::string_view rol
return this->query_granted(
grantee_name,
recursive_role_query::yes).then([role_name, grantee_name](role_set roles) {
if (roles.contains(sstring(role_name))) {
if (roles.count(sstring(role_name)) != 0) {
throw role_already_included(grantee_name, role_name);
}
@@ -428,7 +437,7 @@ standard_role_manager::grant(std::string_view grantee_name, std::string_view rol
return this->query_granted(
role_name,
recursive_role_query::yes).then([role_name, grantee_name](role_set roles) {
if (roles.contains(sstring(grantee_name))) {
if (roles.count(sstring(grantee_name)) != 0) {
throw role_already_included(role_name, grantee_name);
}
@@ -451,7 +460,7 @@ standard_role_manager::revoke(std::string_view revokee_name, std::string_view ro
return this->query_granted(
revokee_name,
recursive_role_query::no).then([revokee_name, role_name](role_set roles) {
if (!roles.contains(sstring(role_name))) {
if (roles.count(sstring(role_name)) == 0) {
throw revoke_ungranted_role(revokee_name, role_name);
}
@@ -495,7 +504,7 @@ future<role_set> standard_role_manager::query_granted(std::string_view grantee_n
future<role_set> standard_role_manager::query_all() const {
static const sstring query = format("SELECT {} FROM {}",
meta::roles_table::role_col_name,
meta::roles_table::qualified_name);
meta::roles_table::qualified_name());
// To avoid many copies of a view.
static const auto role_col_name_string = sstring(meta::roles_table::role_col_name);
@@ -503,7 +512,7 @@ future<role_set> standard_role_manager::query_all() const {
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state()).then([](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_timeout_config()).then([](::shared_ptr<cql3::untyped_result_set> results) {
role_set roles;
std::transform(

View File

@@ -42,6 +42,8 @@ class migration_manager;
namespace auth {
std::string_view standard_role_manager_name() noexcept;
class standard_role_manager final : public role_manager {
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;

View File

@@ -101,7 +101,7 @@ public:
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override {
auto i = credentials.find(authenticator::USERNAME_KEY);
if ((i == credentials.end() || i->second.empty())
&& (!credentials.contains(PASSWORD_KEY) || credentials.at(PASSWORD_KEY).empty())) {
&& (!credentials.count(PASSWORD_KEY) || credentials.at(PASSWORD_KEY).empty())) {
// return anon user
return make_ready_future<authenticated_user>(anonymous_user());
}

View File

@@ -20,7 +20,7 @@ static const Elf64_Nhdr* get_nt_build_id(dl_phdr_info* info) {
continue;
}
auto* p = reinterpret_cast<const char*>(base + h->p_vaddr);
auto* p = reinterpret_cast<const char*>(base) + h->p_vaddr;
auto* e = p + h->p_memsz;
while (p != e) {
const auto* n = reinterpret_cast<const Elf64_Nhdr*>(p);
@@ -49,17 +49,16 @@ static int callback(dl_phdr_info* info, size_t size, void* data) {
assert(strlen(info->dlpi_name) == 0);
auto* n = get_nt_build_id(info);
auto* p = reinterpret_cast<const unsigned char*>(n);
auto* p = reinterpret_cast<const char*>(n);
p += sizeof(Elf64_Nhdr);
p += n->n_namesz;
p = align_up(p, 4);
auto* desc = p;
auto* desc_end = p + n->n_descsz;
while (desc < desc_end) {
fmt::fprintf(os, "%02x", *desc++);
const char* desc = p;
for (unsigned i = 0; i < n->n_descsz; ++i) {
fmt::fprintf(os, "%02x", (unsigned char)*(desc + i));
}
ret = os.str();
return 1;

View File

@@ -100,7 +100,3 @@ std::ostream& operator<<(std::ostream& os, const bytes_view& b) {
}
}
std::ostream& operator<<(std::ostream& os, const fmt_hex& b) {
return os << to_hex(b.v);
}

View File

@@ -39,10 +39,6 @@ inline sstring_view to_sstring_view(bytes_view view) {
return {reinterpret_cast<const char*>(view.data()), view.size()};
}
inline bytes_view to_bytes_view(sstring_view view) {
return {reinterpret_cast<const int8_t*>(view.data()), view.size()};
}
namespace std {
template <>
@@ -54,13 +50,6 @@ struct hash<bytes_view> {
}
struct fmt_hex {
bytes_view& v;
fmt_hex(bytes_view& v) noexcept : v(v) {}
};
std::ostream& operator<<(std::ostream& os, const fmt_hex& hex);
bytes from_hex(sstring_view s);
sstring to_hex(bytes_view b);
sstring to_hex(const bytes& b);
@@ -95,12 +84,9 @@ struct appending_hash<bytes_view> {
};
inline int32_t compare_unsigned(bytes_view v1, bytes_view v2) {
auto size = std::min(v1.size(), v2.size());
if (size) {
auto n = memcmp(v1.begin(), v2.begin(), size);
auto n = memcmp(v1.begin(), v2.begin(), std::min(v1.size(), v2.size()));
if (n) {
return n;
}
}
return (int32_t) (v1.size() - v2.size());
}

View File

@@ -65,14 +65,7 @@ private:
size_type _size;
size_type _initial_chunk_size = default_chunk_size;
public:
class fragment_iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = bytes_view;
using difference_type = std::ptrdiff_t;
using pointer = bytes_view*;
using reference = bytes_view&;
private:
class fragment_iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
chunk* _current = nullptr;
public:
fragment_iterator() = default;

View File

@@ -28,6 +28,7 @@
#include "partition_version.hh"
#include "utils/logalloc.hh"
#include "query-request.hh"
#include "partition_snapshot_reader.hh"
#include "partition_snapshot_row_cursor.hh"
#include "read_context.hh"
#include "flat_mutation_reader.hh"
@@ -133,7 +134,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
void maybe_add_to_cache(const static_row& sr);
void maybe_set_static_row_continuous();
void finish_reader() {
push_mutation_fragment(*_schema, _permit, partition_end());
push_mutation_fragment(partition_end());
_end_of_stream = true;
_state = state::end_of_stream;
}
@@ -145,7 +146,7 @@ public:
lw_shared_ptr<read_context> ctx,
partition_snapshot_ptr snp,
row_cache& cache)
: flat_mutation_reader::impl(std::move(s), ctx->permit())
: flat_mutation_reader::impl(std::move(s))
, _snp(std::move(snp))
, _position_cmp(*_schema)
, _ck_ranges(std::move(crr))
@@ -157,8 +158,8 @@ public:
, _read_context(std::move(ctx))
, _next_row(*_schema, *_snp)
{
clogger.trace("csm {}: table={}.{}", fmt::ptr(this), _schema->ks_name(), _schema->cf_name());
push_mutation_fragment(*_schema, _permit, partition_start(std::move(dk), _snp->partition_tombstone()));
clogger.trace("csm {}: table={}.{}", this, _schema->ks_name(), _schema->cf_name());
push_mutation_fragment(partition_start(std::move(dk), _snp->partition_tombstone()));
}
cache_flat_mutation_reader(const cache_flat_mutation_reader&) = delete;
cache_flat_mutation_reader(cache_flat_mutation_reader&&) = delete;
@@ -187,7 +188,7 @@ future<> cache_flat_mutation_reader::process_static_row(db::timeout_clock::time_
return _snp->static_row(_read_context->digest_requested());
});
if (!sr.empty()) {
push_mutation_fragment(mutation_fragment(*_schema, _permit, std::move(sr)));
push_mutation_fragment(mutation_fragment(std::move(sr)));
}
return make_ready_future<>();
} else {
@@ -231,7 +232,7 @@ future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point t
return after_static_row();
}
}
clogger.trace("csm {}: fill_buffer(), range={}, lb={}", fmt::ptr(this), *_ck_ranges_curr, _lower_bound);
clogger.trace("csm {}: fill_buffer(), range={}, lb={}", this, *_ck_ranges_curr, _lower_bound);
return do_until([this] { return _end_of_stream || is_buffer_full(); }, [this, timeout] {
return do_fill_buffer(timeout);
});
@@ -276,7 +277,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin
// assert(_state == state::reading_from_cache)
return _lsa_manager.run_in_read_section([this] {
auto next_valid = _next_row.iterators_valid();
clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", fmt::ptr(this), _lower_bound,
clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", this, _lower_bound,
_upper_bound, _next_row.position(), next_valid);
// We assume that if there was eviction, and thus the range may
// no longer be continuous, the cursor was invalidated.
@@ -290,7 +291,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin
}
}
_next_row.maybe_refresh();
clogger.trace("csm {}: next={}, cont={}", fmt::ptr(this), _next_row.position(), _next_row.continuous());
clogger.trace("csm {}: next={}, cont={}", this, _next_row.position(), _next_row.continuous());
_lower_bound_changed = false;
while (_state == state::reading_from_cache) {
copy_from_cache_to_buffer();
@@ -356,7 +357,7 @@ future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::tim
e.release();
auto next = std::next(it);
it->set_continuous(next->continuous());
clogger.trace("csm {}: inserted dummy at {}, cont={}", fmt::ptr(this), it->position(), it->continuous());
clogger.trace("csm {}: inserted dummy at {}, cont={}", this, it->position(), it->continuous());
}
});
} else if (ensure_population_lower_bound()) {
@@ -367,11 +368,11 @@ future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::tim
auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
auto inserted = insert_result.second;
if (inserted) {
clogger.trace("csm {}: inserted dummy at {}", fmt::ptr(this), _upper_bound);
clogger.trace("csm {}: inserted dummy at {}", this, _upper_bound);
_snp->tracker()->insert(*e);
e.release();
} else {
clogger.trace("csm {}: mark {} as continuous", fmt::ptr(this), insert_result.first->position());
clogger.trace("csm {}: mark {} as continuous", this, insert_result.first->position());
insert_result.first->set_continuous(true);
}
});
@@ -412,7 +413,7 @@ bool cache_flat_mutation_reader::ensure_population_lower_bound() {
auto insert_result = rows.insert_check(rows.end(), *e, less);
auto inserted = insert_result.second;
if (inserted) {
clogger.trace("csm {}: inserted lower bound dummy at {}", fmt::ptr(this), e->position());
clogger.trace("csm {}: inserted lower bound dummy at {}", this, e->position());
_snp->tracker()->insert(*e);
e.release();
}
@@ -452,7 +453,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
_read_context->cache().on_mispopulate();
return;
}
clogger.trace("csm {}: populate({})", fmt::ptr(this), clustering_row::printer(*_schema, cr));
clogger.trace("csm {}: populate({})", this, clustering_row::printer(*_schema, cr));
_lsa_manager.run_in_update_section_with_allocator([this, &cr] {
mutation_partition& mp = _snp->version()->partition();
rows_entry::compare less(*_schema);
@@ -474,7 +475,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
rows_entry& e = *it;
if (ensure_population_lower_bound()) {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());
clogger.trace("csm {}: set_continuous({})", this, e.position());
e.set_continuous(true);
} else {
_read_context->cache().on_mispopulate();
@@ -493,14 +494,14 @@ bool cache_flat_mutation_reader::after_current_range(position_in_partition_view
inline
void cache_flat_mutation_reader::start_reading_from_underlying() {
clogger.trace("csm {}: start_reading_from_underlying(), range=[{}, {})", fmt::ptr(this), _lower_bound, _next_row_in_range ? _next_row.position() : _upper_bound);
clogger.trace("csm {}: start_reading_from_underlying(), range=[{}, {})", this, _lower_bound, _next_row_in_range ? _next_row.position() : _upper_bound);
_state = state::move_to_underlying;
_next_row.touch();
}
inline
void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", fmt::ptr(this), _next_row.position(), _next_row_in_range);
clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", this, _next_row.position(), _next_row_in_range);
_next_row.touch();
position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());
for (auto &&rts : _snp->range_tombstones(_lower_bound, _next_row_in_range ? next_lower_bound : _upper_bound)) {
@@ -508,7 +509,7 @@ void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
// This guarantees that rts starts after any emitted clustering_row
// and not before any emitted range tombstone.
if (!less(_lower_bound, rts.position())) {
rts.set_start(_lower_bound);
rts.set_start(*_schema, _lower_bound);
} else {
_lower_bound = position_in_partition(rts.position());
_lower_bound_changed = true;
@@ -516,7 +517,7 @@ void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
return;
}
}
push_mutation_fragment(*_schema, _permit, std::move(rts));
push_mutation_fragment(std::move(rts));
}
// We add the row to the buffer even when it's full.
// This simplifies the code. For more info see #3139.
@@ -532,7 +533,7 @@ void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
inline
void cache_flat_mutation_reader::move_to_end() {
finish_reader();
clogger.trace("csm {}: eos", fmt::ptr(this));
clogger.trace("csm {}: eos", this);
}
inline
@@ -557,7 +558,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con
_ck_ranges_curr = next_it;
auto adjacent = _next_row.advance_to(_lower_bound);
_next_row_in_range = !after_current_range(_next_row.position());
clogger.trace("csm {}: move_to_range(), range={}, lb={}, ub={}, next={}", fmt::ptr(this), *_ck_ranges_curr, _lower_bound, _upper_bound, _next_row.position());
clogger.trace("csm {}: move_to_range(), range={}, lb={}, ub={}, next={}", this, *_ck_ranges_curr, _lower_bound, _upper_bound, _next_row.position());
if (!adjacent && !_next_row.continuous()) {
// FIXME: We don't insert a dummy for singular range to avoid allocating 3 entries
// for a hit (before, at and after). If we supported the concept of an incomplete row,
@@ -567,7 +568,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con
// Insert dummy for lower bound
if (can_populate()) {
// FIXME: _lower_bound could be adjacent to the previous row, in which case we could skip this
clogger.trace("csm {}: insert dummy at {}", fmt::ptr(this), _lower_bound);
clogger.trace("csm {}: insert dummy at {}", this, _lower_bound);
auto it = with_allocator(_lsa_manager.region().allocator(), [&] {
auto& rows = _snp->version()->partition().clustered_rows();
auto new_entry = current_allocator().construct<rows_entry>(*_schema, _lower_bound, is_dummy::yes, is_continuous::no);
@@ -586,7 +587,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con
// _next_row must be inside the range.
inline
void cache_flat_mutation_reader::move_to_next_entry() {
clogger.trace("csm {}: move_to_next_entry(), curr={}", fmt::ptr(this), _next_row.position());
clogger.trace("csm {}: move_to_next_entry(), curr={}", this, _next_row.position());
if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) {
move_to_next_range();
} else {
@@ -595,7 +596,7 @@ void cache_flat_mutation_reader::move_to_next_entry() {
return;
}
_next_row_in_range = !after_current_range(_next_row.position());
clogger.trace("csm {}: next={}, cont={}, in_range={}", fmt::ptr(this), _next_row.position(), _next_row.continuous(), _next_row_in_range);
clogger.trace("csm {}: next={}, cont={}, in_range={}", this, _next_row.position(), _next_row.continuous(), _next_row_in_range);
if (!_next_row.continuous()) {
start_reading_from_underlying();
}
@@ -604,7 +605,7 @@ void cache_flat_mutation_reader::move_to_next_entry() {
inline
void cache_flat_mutation_reader::add_to_buffer(mutation_fragment&& mf) {
clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), mutation_fragment::printer(*_schema, mf));
clogger.trace("csm {}: add_to_buffer({})", this, mutation_fragment::printer(*_schema, mf));
if (mf.is_clustering_row()) {
add_clustering_row_to_buffer(std::move(mf));
} else {
@@ -617,7 +618,7 @@ inline
void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_cursor& row) {
if (!row.dummy()) {
_read_context->cache().on_row_hit();
add_clustering_row_to_buffer(mutation_fragment(*_schema, _permit, row.row(_read_context->digest_requested())));
add_clustering_row_to_buffer(row.row(_read_context->digest_requested()));
}
}
@@ -626,7 +627,7 @@ void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_curs
// (2) If _lower_bound > mf.position(), mf was emitted
inline
void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&& mf) {
clogger.trace("csm {}: add_clustering_row_to_buffer({})", fmt::ptr(this), mutation_fragment::printer(*_schema, mf));
clogger.trace("csm {}: add_clustering_row_to_buffer({})", this, mutation_fragment::printer(*_schema, mf));
auto& row = mf.as_clustering_row();
auto new_lower_bound = position_in_partition::after_key(row.key());
push_mutation_fragment(std::move(mf));
@@ -636,7 +637,7 @@ void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&
inline
void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {
clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), rt);
clogger.trace("csm {}: add_to_buffer({})", this, rt);
// This guarantees that rt starts after any emitted clustering_row
// and not before any emitted range tombstone.
position_in_partition::less_compare less(*_schema);
@@ -644,18 +645,18 @@ void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {
return;
}
if (!less(_lower_bound, rt.position())) {
rt.set_start(_lower_bound);
rt.set_start(*_schema, _lower_bound);
} else {
_lower_bound = position_in_partition(rt.position());
_lower_bound_changed = true;
}
push_mutation_fragment(*_schema, _permit, std::move(rt));
push_mutation_fragment(std::move(rt));
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone& rt) {
if (can_populate()) {
clogger.trace("csm {}: maybe_add_to_cache({})", fmt::ptr(this), rt);
clogger.trace("csm {}: maybe_add_to_cache({})", this, rt);
_lsa_manager.run_in_update_section_with_allocator([&] {
_snp->version()->partition().row_tombstones().apply_monotonically(*_schema, rt);
});
@@ -667,7 +668,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone& rt) {
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {
if (can_populate()) {
clogger.trace("csm {}: populate({})", fmt::ptr(this), static_row::printer(*_schema, sr));
clogger.trace("csm {}: populate({})", this, static_row::printer(*_schema, sr));
_read_context->cache().on_static_row_insert();
_lsa_manager.run_in_update_section_with_allocator([&] {
if (_read_context->digest_requested()) {
@@ -683,7 +684,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {
inline
void cache_flat_mutation_reader::maybe_set_static_row_continuous() {
if (can_populate()) {
clogger.trace("csm {}: set static row continuous", fmt::ptr(this));
clogger.trace("csm {}: set static row continuous", this);
_snp->version()->partition().set_static_row_continuous(true);
} else {
_read_context->cache().on_mispopulate();

View File

@@ -23,7 +23,7 @@
#include <seastar/core/sstring.hh>
#include <boost/lexical_cast.hpp>
#include "exceptions/exceptions.hh"
#include "utils/rjson.hh"
#include "json.hh"
#include "seastarx.hh"
class schema;
@@ -76,7 +76,7 @@ public:
}
sstring to_sstring() const {
return rjson::print(rjson::from_string_map(to_map()));
return json::to_json(to_map());
}
static caching_options get_disabled_caching_options() {
@@ -97,14 +97,13 @@ public:
} else if (p.first == "enabled") {
e = p.second == "true";
} else {
throw exceptions::configuration_exception(format("Invalid caching option: {}", p.first));
throw exceptions::configuration_exception("Invalid caching option: " + p.first);
}
}
return caching_options(k, r, e);
}
static caching_options from_sstring(const sstring& str) {
return from_map(rjson::parse_to_map<std::map<sstring, sstring>>(str));
return from_map(json::to_map(str));
}
bool operator==(const caching_options& other) const {

View File

@@ -33,13 +33,9 @@ template<typename T>
struct cartesian_product {
const std::vector<std::vector<T>>& _vec_of_vecs;
public:
class iterator {
class iterator : public std::iterator<std::forward_iterator_tag, std::vector<T>> {
public:
using iterator_category = std::forward_iterator_tag;
using value_type = std::vector<T>;
using difference_type = std::ptrdiff_t;
using pointer = std::vector<T>*;
using reference = std::vector<T>&;
private:
size_t _pos;
const std::vector<std::vector<T>>* _vec_of_vecs;

View File

@@ -20,16 +20,10 @@
#pragma once
#include <map>
#include <seastar/core/sstring.hh>
#include "bytes.hh"
#include "serializer.hh"
#include "db/extensions.hh"
#include "cdc/cdc_options.hh"
#include "schema.hh"
#include "serializer_impl.hh"
namespace cdc {
@@ -39,7 +33,6 @@ public:
static constexpr auto NAME = "cdc";
cdc_extension() = default;
cdc_extension(const options& opts) : _cdc_options(opts) {}
explicit cdc_extension(std::map<sstring, sstring> tags) : _cdc_options(std::move(tags)) {}
explicit cdc_extension(const bytes& b) : _cdc_options(cdc_extension::deserialize(b)) {}
explicit cdc_extension(const sstring& s) {

View File

@@ -27,32 +27,10 @@
namespace cdc {
enum class delta_mode : uint8_t {
keys,
full,
};
/**
* (for now only pre-) image collection mode.
* Stating how much info to record.
* off == none
* on == changed columns
* full == all (changed and unmodified columns)
*/
enum class image_mode : uint8_t {
off,
on,
full,
};
std::ostream& operator<<(std::ostream& os, delta_mode);
std::ostream& operator<<(std::ostream& os, image_mode);
class options final {
bool _enabled = false;
image_mode _preimage = image_mode::off;
bool _preimage = false;
bool _postimage = false;
delta_mode _delta_mode = delta_mode::full;
int _ttl = 86400; // 24h in seconds
public:
options() = default;
@@ -62,19 +40,10 @@ public:
sstring to_sstring() const;
bool enabled() const { return _enabled; }
bool preimage() const { return _preimage != image_mode::off; }
bool full_preimage() const { return _preimage == image_mode::full; }
bool preimage() const { return _preimage; }
bool postimage() const { return _postimage; }
delta_mode get_delta_mode() const { return _delta_mode; }
void set_delta_mode(delta_mode m) { _delta_mode = m; }
int ttl() const { return _ttl; }
void enabled(bool b) { _enabled = b; }
void preimage(bool b) { preimage(b ? image_mode::on : image_mode::off); }
void preimage(image_mode m) { _preimage = m; }
void postimage(bool b) { _postimage = b; }
void ttl(int v) { _ttl = v; }
bool operator==(const options& o) const;
bool operator!=(const options& o) const;
};

View File

@@ -1,283 +0,0 @@
/*
* Copyright (C) 2020 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "mutation.hh"
/*
* This file contains a general abstraction for walking over mutations,
* deconstructing them into ``atomic'' pieces, and consuming these pieces.
*
* The pieces considered atomic are:
* - atomic_cells, either in collections or in atomic columns
* (see `live_collection_cell`, `dead_collection_cell`, `live_atomic_cell`, `dead_atomic_cell`),
* - collection tombstones (see `collection_tombstone`)
* - row markers (see `marker`)
* - row tombstones (see `clustered_row_delete`),
* - range tombstones (see `range_delete`),
* - partition tombstones (see `partition_delete`).
* We use the term ``changes'' to refer to these atomic pieces, hence the name ``ChangeVisitor''.
*
* IMPORTANT: this doesn't understand all possible states that a mutation can have, e.g. it doesn't understand
* the concept of ``continuity''. However, it is sufficient for analyzing mutations created by a write coordinator,
* e.g. obtained by parsing a CQL statement.
*
* To analyze a mutation, create a visitor (described by the `ChangeVisitor` concept below) and pass it
* together with the mutation to `inspect_mutation`.
*
* To analyze certain fragments of the mutation, the inspecting code requires further visitors to be passed.
* For example, when it encounters a clustered row update, it calls `clustered_row_cells` on the visitor,
* passing it the row's key and the callback. The visitor can then decide:
* - if it's not interested in the row's cells, it can simply not call the callback,
* - otherwise, it can call the callback with a value of type that satisfies the ``RowCellsVisitor'' concept.
* If the callback is called, the inspector walks over the row and passes the changes into the ``row cells visitor''.
* In either case, it will then proceed to analyze further parts of the mutation, if any.
*
* Note that the type passed to the callbacks provided by the inspector (such as in the example above)
* can be decided at runtime. This can be especially useful with the callback passed to `collection_column`
* in RowCellsVisitor, if different collection types require different logic to handle.
*
* The dummy visitors below are there only to define the concepts.
* For example, in the RowCellsVisitor concept I wanted to express that `visit_collection` in RowCellsVisitor
* is a function that handles *any* type which satisfies CollectionVisitor. I didn't find a way to do that
* other than providing a ``most generic'' concrete type which satisfies the interface (`dummy_collection_visitor`).
* Unfortunately C++ is still not Haskell.
*
* The inspector calls `finished()` after visiting each change, and sometimes before (e.g. when it starts
* visiting a static row, but before it visits any of its cells). If it returns true, the inspector
* will stop the visitation. Thus, if at any point during the walk the visitor decides it's not interested
* in any more changes, it can inform the inspector by returning `true` from `finished()`.
*
* IMPORTANT: if the visitor returns `true` from `finished()`, it should keep returning `true`. This is because
* the inspector may call `finished()` multiple times when exiting some nested loops.
*
* The order of visitation is as follows:
* - First the static row is visited, if it has any cells.
* Within the row, its columns are visited in order of increasing column IDs.
*
* - Then, for each clustering key, if a change (row marker, cell, or tombstone) exists for this key:
* - The row marker is visited, if there is one.
* - Columns are visited in order of increasing column IDs.
* - The row tombstone is visited, if there is one.
*
* For both the static row and a clustering row, for each column:
* - If the column is atomic, a corresponding atomic_cell is visited (if there is one).
* - Otherwise (the column is non-atomic):
* - The collection tombstone is visited first.
* - Cells are visited in order of increasing keys
* (assuming that the mutation was correctly constructed, i.e. it stores cells in key order).
*
* WARNING: visited collection tombstone and cells
* are guaranteed to live only for the duration of `collection_column` call.
*
* - Then range tombstones are visited. The order is unspecified
* (more accurately: if it's specified, I don't know what it is)
*
* - Finally, the partition tombstone is visited, if it exists.
*/
namespace cdc {
template <typename V>
concept CollectionVisitor = requires(V v,
const tombstone& t,
bytes_view key,
const atomic_cell_view& cell) {
{ v.collection_tombstone(t) } -> std::same_as<void>;
{ v.live_collection_cell(key, cell) } -> std::same_as<void>;
{ v.dead_collection_cell(key, cell) } -> std::same_as<void>;
{ v.finished() } -> std::same_as<bool>;
};
struct dummy_collection_visitor {
void collection_tombstone(const tombstone&) {}
void live_collection_cell(bytes_view, const atomic_cell_view&) {}
void dead_collection_cell(bytes_view, const atomic_cell_view&) {}
bool finished() { return false; }
};
template <typename V>
concept RowCellsVisitor = requires(V v,
const column_definition& cdef,
const atomic_cell_view& cell,
noncopyable_function<void(dummy_collection_visitor&)> visit_collection) {
{ v.live_atomic_cell(cdef, cell) } -> std::same_as<void>;
{ v.dead_atomic_cell(cdef, cell) } -> std::same_as<void>;
{ v.collection_column(cdef, std::move(visit_collection)) } -> std::same_as<void>;
{ v.finished() } -> std::same_as<bool>;
};
struct dummy_row_cells_visitor {
void live_atomic_cell(const column_definition&, const atomic_cell_view&) {}
void dead_atomic_cell(const column_definition&, const atomic_cell_view&) {}
void collection_column(const column_definition&, auto&& visit_collection) {
dummy_collection_visitor v;
visit_collection(v);
}
bool finished() { return false; }
};
template <typename V>
concept ClusteredRowCellsVisitor = requires(V v,
const row_marker& rm) {
requires RowCellsVisitor<V>;
{ v.marker(rm) } -> std::same_as<void>;
};
struct dummy_clustered_row_cells_visitor : public dummy_row_cells_visitor {
void marker(const row_marker&) {}
};
template <typename V>
concept ChangeVisitor = requires(V v,
api::timestamp_type ts,
const clustering_key& ckey,
const range_tombstone& rt,
const tombstone& t,
noncopyable_function<void(dummy_clustered_row_cells_visitor&)> visit_clustered_row_cells,
noncopyable_function<void(dummy_row_cells_visitor&)> visit_row_cells) {
{ v.static_row_cells(std::move(visit_row_cells)) } -> std::same_as<void>;
{ v.clustered_row_cells(ckey, std::move(visit_clustered_row_cells)) } -> std::same_as<void>;
{ v.clustered_row_delete(ckey, t) } -> std::same_as<void>;
{ v.range_delete(rt) } -> std::same_as<void>;
{ v.partition_delete(t) } -> std::same_as<void>;
{ v.finished() } -> std::same_as<bool>;
};
template <RowCellsVisitor V>
void inspect_row_cells(const schema& s, column_kind ckind, const row& r, V& v) {
r.for_each_cell_until([&s, ckind, &v] (column_id id, const atomic_cell_or_collection& acoc) {
auto& cdef = s.column_at(ckind, id);
if (cdef.is_atomic()) {
auto cell = acoc.as_atomic_cell(cdef);
if (cell.is_live()) {
v.live_atomic_cell(cdef, cell);
} else {
v.dead_atomic_cell(cdef, cell);
}
return stop_iteration(v.finished());
}
acoc.as_collection_mutation().with_deserialized(*cdef.type, [&v, &cdef] (collection_mutation_view_description view) {
v.collection_column(cdef, [&view] (CollectionVisitor auto& cv) {
if (cv.finished()) {
return;
}
if (view.tomb) {
cv.collection_tombstone(view.tomb);
if (cv.finished()) {
return;
}
}
for (auto& [key, cell]: view.cells) {
if (cell.is_live()) {
cv.live_collection_cell(key, cell);
} else {
cv.dead_collection_cell(key, cell);
}
if (cv.finished()) {
return;
}
}
});
});
return stop_iteration(v.finished());
});
}
template <ChangeVisitor V>
void inspect_mutation(const mutation& m, V& v) {
auto& p = m.partition();
auto& s = *m.schema();
if (!p.static_row().empty()) {
v.static_row_cells([&s, &p] (RowCellsVisitor auto& srv) {
if (srv.finished()) {
return;
}
inspect_row_cells(s, column_kind::static_column, p.static_row().get(), srv);
});
if (v.finished()) {
return;
}
}
for (auto& cr: p.clustered_rows()) {
auto& r = cr.row();
if (r.marker().is_live() || !r.cells().empty()) {
v.clustered_row_cells(cr.key(), [&s, &r] (ClusteredRowCellsVisitor auto& crv) {
if (crv.finished()) {
return;
}
auto& rm = r.marker();
if (rm.is_live()) {
crv.marker(rm);
if (crv.finished()) {
return;
}
}
inspect_row_cells(s, column_kind::regular_column, r.cells(), crv);
});
if (v.finished()) {
return;
}
}
if (r.deleted_at()) {
auto t = r.deleted_at().tomb();
assert(t.timestamp != api::missing_timestamp);
v.clustered_row_delete(cr.key(), t);
if (v.finished()) {
return;
}
}
}
for (auto& rt: p.row_tombstones()) {
assert(rt.tomb.timestamp != api::missing_timestamp);
v.range_delete(rt);
if (v.finished()) {
return;
}
}
if (p.partition_tombstone()) {
v.partition_delete(p.partition_tombstone());
}
}
} // namespace cdc

View File

@@ -59,57 +59,14 @@ static void copy_int_to_bytes(int64_t i, size_t offset, bytes& b) {
std::copy_n(reinterpret_cast<int8_t*>(&i), sizeof(int64_t), b.begin() + offset);
}
static constexpr auto stream_id_version_bits = 4;
static constexpr auto stream_id_random_bits = 38;
static constexpr auto stream_id_index_bits = sizeof(uint64_t)*8 - stream_id_version_bits - stream_id_random_bits;
static constexpr auto stream_id_version_shift = 0;
static constexpr auto stream_id_index_shift = stream_id_version_shift + stream_id_version_bits;
static constexpr auto stream_id_random_shift = stream_id_index_shift + stream_id_index_bits;
/**
* Responsibilty for encoding stream_id moved from factory method to
* this constructor, to keep knowledge of composition in a single place.
* Note this is private and friended to topology_description_generator,
* because he is the one who defined the "order" we view vnodes etc.
*/
stream_id::stream_id(dht::token token, size_t vnode_index)
stream_id::stream_id(int64_t first, int64_t second)
: _value(bytes::initialized_later(), 2 * sizeof(int64_t))
{
static thread_local std::mt19937_64 rand_gen(std::random_device{}());
static thread_local std::uniform_int_distribution<uint64_t> rand_dist;
auto rand = rand_dist(rand_gen);
auto mask_shift = [](uint64_t val, size_t bits, size_t shift) {
return (val & ((1ull << bits) - 1u)) << shift;
};
/**
* Low qword:
* 0-4: version
* 5-26: vnode index as when created (see generation below). This excludes shards
* 27-64: random value (maybe to be replaced with timestamp)
*/
auto low_qword = mask_shift(version_1, stream_id_version_bits, stream_id_version_shift)
| mask_shift(vnode_index, stream_id_index_bits, stream_id_index_shift)
| mask_shift(rand, stream_id_random_bits, stream_id_random_shift)
;
copy_int_to_bytes(dht::token::to_int64(token), 0, _value);
copy_int_to_bytes(low_qword, sizeof(int64_t), _value);
// not a hot code path. make sure we did not mess up the shifts and masks.
assert(version() == version_1);
assert(index() == vnode_index);
copy_int_to_bytes(first, 0, _value);
copy_int_to_bytes(second, sizeof(int64_t), _value);
}
stream_id::stream_id(bytes b)
: _value(std::move(b))
{
// this is not a very solid check. Id:s previous to GA/versioned id:s
// have fully random bits in low qword, so this could go either way...
if (version() > version_1) {
throw std::invalid_argument("Unknown CDC stream id version");
}
}
stream_id::stream_id(bytes b) : _value(std::move(b)) { }
bool stream_id::is_set() const {
return !_value.empty();
@@ -119,10 +76,6 @@ bool stream_id::operator==(const stream_id& o) const {
return _value == o._value;
}
bool stream_id::operator!=(const stream_id& o) const {
return !(*this == o);
}
bool stream_id::operator<(const stream_id& o) const {
return _value < o._value;
}
@@ -134,26 +87,18 @@ static int64_t bytes_to_int64(bytes_view b, size_t offset) {
return net::ntoh(res);
}
dht::token stream_id::token() const {
return dht::token::from_int64(token_from_bytes(_value));
int64_t stream_id::first() const {
return token_from_bytes(_value);
}
int64_t stream_id::second() const {
return bytes_to_int64(_value, sizeof(int64_t));
}
int64_t stream_id::token_from_bytes(bytes_view b) {
return bytes_to_int64(b, 0);
}
static uint64_t unpack_value(bytes_view b, size_t off, size_t shift, size_t bits) {
return (uint64_t(bytes_to_int64(b, off)) >> shift) & ((1ull << bits) - 1u);
}
uint8_t stream_id::version() const {
return unpack_value(_value, sizeof(int64_t), stream_id_version_shift, stream_id_version_bits);
}
size_t stream_id::index() const {
return unpack_value(_value, sizeof(int64_t), stream_id_index_shift, stream_id_index_bits);
}
const bytes& stream_id::to_bytes() const {
return _value;
}
@@ -178,15 +123,22 @@ const std::vector<token_range_description>& topology_description::entries() cons
return _entries;
}
static stream_id create_stream_id(dht::token t) {
static thread_local std::mt19937_64 rand_gen(std::random_device().operator()());
static thread_local std::uniform_int_distribution<int64_t> rand_dist(std::numeric_limits<int64_t>::min());
return {dht::token::to_int64(t), rand_dist(rand_gen)};
}
class topology_description_generator final {
const db::config& _cfg;
const std::unordered_set<dht::token>& _bootstrap_tokens;
const locator::token_metadata_ptr _tmptr;
const locator::token_metadata& _token_metadata;
const gms::gossiper& _gossiper;
// Compute a set of tokens that split the token ring into vnodes
auto get_tokens() const {
auto tokens = _tmptr->sorted_tokens();
auto tokens = _token_metadata.sorted_tokens();
auto it = tokens.insert(
tokens.end(), _bootstrap_tokens.begin(), _bootstrap_tokens.end());
std::sort(it, tokens.end());
@@ -198,10 +150,10 @@ class topology_description_generator final {
// Fetch sharding parameters for a node that owns vnode ending with this.end
// Returns <shard_count, ignore_msb> pair.
std::pair<size_t, uint8_t> get_sharding_info(dht::token end) const {
if (_bootstrap_tokens.contains(end)) {
if (_bootstrap_tokens.count(end) > 0) {
return {smp::count, _cfg.murmur3_partitioner_ignore_msb_bits()};
} else {
auto endpoint = _tmptr->get_endpoint(end);
auto endpoint = _token_metadata.get_endpoint(end);
if (!endpoint) {
throw std::runtime_error(
format("Can't find endpoint for token {}", end));
@@ -211,7 +163,7 @@ class topology_description_generator final {
}
}
token_range_description create_description(size_t index, dht::token start, dht::token end) const {
token_range_description create_description(dht::token start, dht::token end) const {
token_range_description desc;
desc.token_range_end = end;
@@ -223,10 +175,7 @@ class topology_description_generator final {
dht::sharder sharder(shard_count, ignore_msb);
for (size_t shard_idx = 0; shard_idx < shard_count; ++shard_idx) {
auto t = dht::find_first_token_for_shard(sharder, start, end, shard_idx);
// compose the id from token and the "index" of the range end owning vnode
// as defined by token sort order. Basically grouping within this
// shard set.
desc.streams.emplace_back(stream_id(t, index));
desc.streams.push_back(create_stream_id(t));
}
return desc;
@@ -235,11 +184,11 @@ public:
topology_description_generator(
const db::config& cfg,
const std::unordered_set<dht::token>& bootstrap_tokens,
const locator::token_metadata_ptr tmptr,
const locator::token_metadata& token_metadata,
const gms::gossiper& gossiper)
: _cfg(cfg)
, _bootstrap_tokens(bootstrap_tokens)
, _tmptr(std::move(tmptr))
, _token_metadata(token_metadata)
, _gossiper(gossiper)
{}
@@ -264,10 +213,10 @@ public:
vnode_descriptions.reserve(tokens.size());
vnode_descriptions.push_back(
create_description(0, tokens.back(), tokens.front()));
create_description(tokens.back(), tokens.front()));
for (size_t idx = 1; idx < tokens.size(); ++idx) {
vnode_descriptions.push_back(
create_description(idx, tokens[idx - 1], tokens[idx]));
create_description(tokens[idx - 1], tokens[idx]));
}
return {std::move(vnode_descriptions)};
@@ -298,19 +247,18 @@ future<db_clock::time_point> get_local_streams_timestamp() {
db_clock::time_point make_new_cdc_generation(
const db::config& cfg,
const std::unordered_set<dht::token>& bootstrap_tokens,
const locator::token_metadata_ptr tmptr,
const locator::token_metadata& tm,
const gms::gossiper& g,
db::system_distributed_keyspace& sys_dist_ks,
std::chrono::milliseconds ring_delay,
bool add_delay) {
using namespace std::chrono;
auto gen = topology_description_generator(cfg, bootstrap_tokens, tmptr, g).generate();
bool for_testing) {
auto gen = topology_description_generator(cfg, bootstrap_tokens, tm, g).generate();
// Begin the race.
auto ts = db_clock::now() + (
(!add_delay || ring_delay == milliseconds(0)) ? milliseconds(0) : (
2 * ring_delay + duration_cast<milliseconds>(generation_leeway)));
sys_dist_ks.insert_cdc_topology_description(ts, std::move(gen), { tmptr->count_normal_token_owners() }).get();
for_testing ? std::chrono::milliseconds(0) : (
2 * ring_delay + std::chrono::duration_cast<std::chrono::milliseconds>(generation_leeway)));
sys_dist_ks.insert_cdc_topology_description(ts, std::move(gen), { tm.count_normal_token_owners() }).get();
return ts;
}

View File

@@ -40,7 +40,6 @@
#include "database_fwd.hh"
#include "db_clock.hh"
#include "dht/token.hh"
#include "locator/token_metadata.hh"
namespace seastar {
class abort_source;
@@ -56,31 +55,29 @@ namespace gms {
class gossiper;
} // namespace gms
namespace locator {
class token_metadata;
} // namespace locator
namespace cdc {
class stream_id final {
bytes _value;
public:
static constexpr uint8_t version_1 = 1;
stream_id() = default;
stream_id(int64_t, int64_t);
stream_id(bytes);
bool is_set() const;
bool operator==(const stream_id&) const;
bool operator!=(const stream_id&) const;
bool operator<(const stream_id&) const;
uint8_t version() const;
size_t index() const;
int64_t first() const;
int64_t second() const;
const bytes& to_bytes() const;
dht::token token() const;
partition_key to_partition_key(const schema& log_schema) const;
static int64_t token_from_bytes(bytes_view);
private:
friend class topology_description_generator;
stream_id(dht::token, size_t);
};
/* Describes a mapping of tokens to CDC streams in a token range.
@@ -116,23 +113,6 @@ public:
const std::vector<token_range_description>& entries() const;
};
/**
* The set of streams for a single topology version/generation
* I.e. the stream ids at a given time.
*/
class streams_version {
public:
std::vector<stream_id> streams;
db_clock::time_point timestamp;
std::optional<db_clock::time_point> expired;
streams_version(std::vector<stream_id> s, db_clock::time_point ts, std::optional<db_clock::time_point> exp)
: streams(std::move(s))
, timestamp(ts)
, expired(std::move(exp))
{}
};
/* Should be called when we're restarting and we noticed that we didn't save any streams timestamp in our local tables,
* which means that we're probably upgrading from a non-CDC/old CDC version (another reason could be
* that there's a bug, or the user messed with our local tables).
@@ -150,8 +130,8 @@ bool should_propose_first_generation(const gms::inet_address& me, const gms::gos
*/
future<db_clock::time_point> get_local_streams_timestamp();
/* Generate a new set of CDC streams and insert it into the distributed cdc_generation_descriptions table.
* Returns the timestamp of this new generation
/* Generate a new set of CDC streams and insert it into the distributed cdc_generations table.
* Returns the timestamp of this new generation.
*
* Should be called when starting the node for the first time (i.e., joining the ring).
*
@@ -165,11 +145,11 @@ future<db_clock::time_point> get_local_streams_timestamp();
db_clock::time_point make_new_cdc_generation(
const db::config& cfg,
const std::unordered_set<dht::token>& bootstrap_tokens,
const locator::token_metadata_ptr tmptr,
const locator::token_metadata& tm,
const gms::gossiper& g,
db::system_distributed_keyspace& sys_dist_ks,
std::chrono::milliseconds ring_delay,
bool add_delay);
bool for_testing);
/* Retrieves CDC streams generation timestamp from the given endpoint's application state (broadcasted through gossip).
* We might be during a rolling upgrade, so the timestamp might not be there (if the other node didn't upgrade yet),
@@ -181,7 +161,7 @@ std::optional<db_clock::time_point> get_streams_timestamp_for(const gms::inet_ad
/* Inform CDC users about a generation of streams (identified by the given timestamp)
* by inserting it into the cdc_streams table.
*
* Assumes that the cdc_generation_descriptions table contains this generation.
* Assumes that the cdc_generations table contains this generation.
*
* Returning from this function does not mean that the table update was successful: the function
* might run an asynchronous task in the background.

1511
cdc/log.cc

File diff suppressed because it is too large Load Diff

View File

@@ -41,6 +41,7 @@
#include "exceptions/exceptions.hh"
#include "timestamp.hh"
#include "tracing/trace_state.hh"
#include "cdc_options.hh"
#include "utils/UUID.hh"
class schema;
@@ -62,7 +63,6 @@ class query_state;
class mutation;
class partition_key;
class database;
namespace cdc {
@@ -100,19 +100,19 @@ public:
struct db_context final {
service::storage_proxy& _proxy;
service::migration_notifier& _migration_notifier;
const locator::token_metadata& _token_metadata;
locator::token_metadata& _token_metadata;
cdc::metadata& _cdc_metadata;
class builder final {
service::storage_proxy& _proxy;
std::optional<std::reference_wrapper<service::migration_notifier>> _migration_notifier;
std::optional<std::reference_wrapper<const locator::token_metadata>> _token_metadata;
std::optional<std::reference_wrapper<locator::token_metadata>> _token_metadata;
std::optional<std::reference_wrapper<cdc::metadata>> _cdc_metadata;
public:
builder(service::storage_proxy& proxy);
builder& with_migration_notifier(service::migration_notifier& migration_notifier);
builder& with_token_metadata(const locator::token_metadata& token_metadata);
builder& with_token_metadata(locator::token_metadata& token_metadata);
builder& with_cdc_metadata(cdc::metadata&);
db_context build();
@@ -129,12 +129,7 @@ enum class operation : int8_t {
};
bool is_log_for_some_table(const sstring& ks_name, const std::string_view& table_name);
schema_ptr get_base_table(const database&, const schema&);
schema_ptr get_base_table(const database&, sstring_view, std::string_view);
seastar::sstring base_name(std::string_view log_name);
seastar::sstring log_name(std::string_view table_name);
seastar::sstring log_name(const seastar::sstring& table_name);
seastar::sstring log_data_column_name(std::string_view column_name);
seastar::sstring log_meta_column_name(std::string_view column_name);
bytes log_data_column_name_bytes(const bytes& column_name);
@@ -146,8 +141,6 @@ bytes log_data_column_deleted_name_bytes(const bytes& column_name);
seastar::sstring log_data_column_deleted_elements_name(std::string_view column_name);
bytes log_data_column_deleted_elements_name_bytes(const bytes& column_name);
bool is_cdc_metacolumn_name(const sstring& name);
utils::UUID generate_timeuuid(api::timestamp_type t);
} // namespace cdc

View File

@@ -77,12 +77,6 @@ cdc::metadata::container_t::const_iterator cdc::metadata::gen_used_at(api::times
return std::prev(it);
}
bool cdc::metadata::streams_available() const {
auto now = api::new_timestamp();
auto it = gen_used_at(now);
return it != _gens.end();
}
cdc::stream_id cdc::metadata::get_stream(api::timestamp_type ts, dht::token tok) {
auto now = api::new_timestamp();
if (ts > now + generation_leeway.count()) {

View File

@@ -57,10 +57,6 @@ public:
/* Is a generation with the given timestamp already known or superseded by a newer generation? */
bool known_or_obsolete(db_clock::time_point) const;
/* Are there streams available. I.e. valid for time == now. If this is false, any writes to
* CDC logs will fail fast.
*/
bool streams_available() const;
/* Return the stream for the base partition whose token is `tok` to which a corresponding log write should go
* according to the generation used at time `ts` (i.e, the latest generation whose timestamp is less or equal to `ts`).
*

View File

@@ -22,14 +22,8 @@
#include "mutation.hh"
#include "schema.hh"
#include "concrete_types.hh"
#include "types/user.hh"
#include "split.hh"
#include "log.hh"
#include "change_visitor.hh"
#include <type_traits>
struct atomic_column_update {
column_id id;
@@ -76,37 +70,6 @@ struct partition_deletion {
tombstone t;
};
using clustered_column_set = std::map<clustering_key, cdc::one_kind_column_set, clustering_key::less_compare>;
template<typename Container>
concept EntryContainer = requires(Container& container) {
// Parenthesized due to https://bugs.llvm.org/show_bug.cgi?id=45088
{ (container.atomic_entries) } -> std::same_as<std::vector<atomic_column_update>&>;
{ (container.nonatomic_entries) } -> std::same_as<std::vector<nonatomic_column_update>&>;
};
template<EntryContainer Container>
static void add_columns_affected_by_entries(cdc::one_kind_column_set& cset, const Container& cont) {
for (const auto& entry : cont.atomic_entries) {
cset.set(entry.id);
}
for (const auto& entry : cont.nonatomic_entries) {
cset.set(entry.id);
}
}
/* Given a mutation with multiple timestamps/ttl/types of changes, we split it into multiple mutations
* before passing it into `process_change` (see comment above `should_split_visitor` for more details).
*
* The first step of the splitting is to walk over the mutation and put each change into an appropriate bucket
* (see `batch`). The buckets are sorted by timestamps (see `set_of_changes`), and within each bucket,
* the changes are split according to their types (`static_updates`, `clustered_inserts`, and so on).
* Within each type, the changes are sorted w.r.t TTLs. Changes without a TTL are treated as if they had TTL = 0.
*
* The function that puts changes into bucket is called `extract_changes`. Underneath, it uses
* `extract_changes_visitor`, `extract_collection_visitor` and `extract_row_visitor`.
*/
struct batch {
std::vector<static_row_update> static_updates;
std::vector<clustered_row_insert> clustered_inserts;
@@ -114,40 +77,6 @@ struct batch {
std::vector<clustered_row_deletion> clustered_row_deletions;
std::vector<clustered_range_deletion> clustered_range_deletions;
std::optional<partition_deletion> partition_deletions;
clustered_column_set get_affected_clustered_columns_per_row(const schema& s) const {
clustered_column_set ret{clustering_key::less_compare(s)};
if (!clustered_row_deletions.empty()) {
// When deleting a row, all columns are affected
cdc::one_kind_column_set all_columns{s.regular_columns_count()};
all_columns.set(0, s.regular_columns_count(), true);
for (const auto& change : clustered_row_deletions) {
ret.insert(std::make_pair(change.key, all_columns));
}
}
auto process_change_type = [&] (const auto& changes) {
for (const auto& change : changes) {
auto& cset = ret[change.key];
cset.resize(s.regular_columns_count());
add_columns_affected_by_entries(cset, change);
}
};
process_change_type(clustered_inserts);
process_change_type(clustered_updates);
return ret;
}
cdc::one_kind_column_set get_affected_static_columns(const schema& s) const {
cdc::one_kind_column_set ret{s.static_columns_count()};
for (const auto& change : static_updates) {
add_columns_affected_by_entries(ret, change);
}
return ret;
}
};
using set_of_changes = std::map<api::timestamp_type, batch>;
@@ -157,179 +86,100 @@ struct row_update {
std::vector<nonatomic_column_update> nonatomic_entries;
};
static gc_clock::duration get_ttl(const atomic_cell_view& acv) {
return acv.is_live_and_has_ttl() ? acv.ttl() : gc_clock::duration(0);
}
static gc_clock::duration get_ttl(const row_marker& rm) {
return rm.is_expiring() ? rm.ttl() : gc_clock::duration(0);
}
using change_key_t = std::pair<api::timestamp_type, gc_clock::duration>;
/* Visits the cells and tombstone of a collection, putting the encountered changes into buckets
* sorted by timestamp first and ttl second (see `_updates`).
*/
template <typename V>
struct extract_collection_visitor {
private:
const column_id _id;
std::map<change_key_t, row_update>& _updates;
nonatomic_column_update& get_or_append_entry(api::timestamp_type ts, gc_clock::duration ttl) {
auto& updates = this->_updates[std::pair(ts, ttl)].nonatomic_entries;
if (updates.empty() || updates.back().id != _id) {
updates.push_back({_id});
static
std::map<std::pair<api::timestamp_type, gc_clock::duration>, row_update>
extract_row_updates(const row& r, column_kind ckind, const schema& schema) {
std::map<std::pair<api::timestamp_type, gc_clock::duration>, row_update> result;
r.for_each_cell([&] (column_id id, const atomic_cell_or_collection& cell) {
auto& cdef = schema.column_at(ckind, id);
if (cdef.is_atomic()) {
auto view = cell.as_atomic_cell(cdef);
auto timestamp_and_ttl = std::pair(
view.timestamp(),
view.is_live_and_has_ttl() ? view.ttl() : gc_clock::duration(0)
);
result[timestamp_and_ttl].atomic_entries.push_back({id, atomic_cell(*cdef.type, view)});
return;
}
return updates.back();
}
/* To copy a value from a collection/non-frozen UDT (in order to put it into a bucket) we need to know the value's type.
* The method of obtaining the type depends on the collection type; in particular, for non-frozen UDT, each value
* might have a different type, thus in general we need a method that, given a key (identifying the value in the collection),
* returns the value' type.
*
* We use the `Curiously Recurring Template Pattern' to avoid performing a dynamic dispatch on the collection's type for each visited cell.
* Instead we perform a single dynamic dispatch at the beginning, when encountering the collection column;
* the dispatch provides us with a correct `get_value_type` method.
* See `extract_row_visitor::collection_column` where the dispatch is done.
data_type get_value_type(bytes_view);
*/
void cell(bytes_view key, const atomic_cell_view& c) {
auto& entry = get_or_append_entry(c.timestamp(), get_ttl(c));
entry.cells.emplace_back(to_bytes(key), atomic_cell(*static_cast<V&>(*this).get_value_type(key), c));
}
public:
extract_collection_visitor(column_id id, std::map<change_key_t, row_update>& updates)
: _id(id), _updates(updates) {}
void collection_tombstone(const tombstone& t) {
auto& entry = get_or_append_entry(t.timestamp + 1, gc_clock::duration(0));
entry.t = t;
}
void live_collection_cell(bytes_view key, const atomic_cell_view& c) {
cell(key, c);
}
void dead_collection_cell(bytes_view key, const atomic_cell_view& c) {
cell(key, c);
}
constexpr bool finished() const { return false; }
};
/* Visits all cells and tombstones in a row, putting the encountered changes into buckets
* sorted by timestamp first and ttl second (see `_updates`).
*/
struct extract_row_visitor {
std::map<change_key_t, row_update> _updates;
void cell(const column_definition& cdef, const atomic_cell_view& cell) {
_updates[std::pair(cell.timestamp(), get_ttl(cell))].atomic_entries.push_back({cdef.id, atomic_cell(*cdef.type, cell)});
}
void live_atomic_cell(const column_definition& cdef, const atomic_cell_view& c) {
cell(cdef, c);
}
void dead_atomic_cell(const column_definition& cdef, const atomic_cell_view& c) {
cell(cdef, c);
}
void collection_column(const column_definition& cdef, auto&& visit_collection) {
visit(*cdef.type, make_visitor(
[&] (const collection_type_impl& ctype) {
struct collection_visitor : public extract_collection_visitor<collection_visitor> {
data_type _value_type;
collection_visitor(column_id id, std::map<change_key_t, row_update>& updates, const collection_type_impl& ctype)
: extract_collection_visitor<collection_visitor>(id, updates), _value_type(ctype.value_comparator()) {}
data_type get_value_type(bytes_view) {
return _value_type;
cell.as_collection_mutation().with_deserialized(*cdef.type, [&] (collection_mutation_view_description mview) {
auto desc = mview.materialize(*cdef.type);
for (auto& [k, v]: desc.cells) {
auto timestamp_and_ttl = std::pair(
v.timestamp(),
v.is_live_and_has_ttl() ? v.ttl() : gc_clock::duration(0)
);
auto& updates = result[timestamp_and_ttl].nonatomic_entries;
if (updates.empty() || updates.back().id != id) {
updates.push_back({id, {}});
}
} v(cdef.id, _updates, ctype);
visit_collection(v);
},
[&] (const user_type_impl& utype) {
struct udt_visitor : public extract_collection_visitor<udt_visitor> {
const user_type_impl& _utype;
udt_visitor(column_id id, std::map<change_key_t, row_update>& updates, const user_type_impl& utype)
: extract_collection_visitor<udt_visitor>(id, updates), _utype(utype) {}
data_type get_value_type(bytes_view key) {
return _utype.type(deserialize_field_index(key));
}
} v(cdef.id, _updates, utype);
visit_collection(v);
},
[&] (const abstract_type& o) {
throw std::runtime_error(format("extract_changes: unknown collection type:", o.name()));
}
));
}
constexpr bool finished() const { return false; }
};
struct extract_changes_visitor {
set_of_changes _result;
void static_row_cells(auto&& visit_row_cells) {
extract_row_visitor v;
visit_row_cells(v);
for (auto& [ts_ttl, row_update]: v._updates) {
_result[ts_ttl.first].static_updates.push_back({
ts_ttl.second,
std::move(row_update.atomic_entries),
std::move(row_update.nonatomic_entries)
});
}
}
void clustered_row_cells(const clustering_key& ckey, auto&& visit_row_cells) {
struct clustered_cells_visitor : public extract_row_visitor {
api::timestamp_type _marker_ts;
gc_clock::duration _marker_ttl;
std::optional<row_marker> _marker;
void marker(const row_marker& rm) {
_marker_ts = rm.timestamp();
_marker_ttl = get_ttl(rm);
_marker = rm;
// make sure that an entry corresponding to the row marker's timestamp and ttl is in the map
(void)_updates[std::pair(_marker_ts, _marker_ttl)];
updates.back().cells.push_back({std::move(k), std::move(v)});
}
} v;
visit_row_cells(v);
for (auto& [ts_ttl, row_update]: v._updates) {
if (desc.tomb) {
auto timestamp_and_ttl = std::pair(desc.tomb.timestamp + 1, gc_clock::duration(0));
auto& updates = result[timestamp_and_ttl].nonatomic_entries;
if (updates.empty() || updates.back().id != id) {
updates.push_back({id, {}});
}
updates.back().t = std::move(desc.tomb);
}
});
});
return result;
};
set_of_changes extract_changes(const mutation& base_mutation, const schema& base_schema) {
set_of_changes res;
auto& p = base_mutation.partition();
auto sr_updates = extract_row_updates(p.static_row().get(), column_kind::static_column, base_schema);
for (auto& [k, up]: sr_updates) {
auto [timestamp, ttl] = k;
res[timestamp].static_updates.push_back({
ttl,
std::move(up.atomic_entries),
std::move(up.nonatomic_entries)
});
}
for (const rows_entry& cr : p.clustered_rows()) {
auto cr_updates = extract_row_updates(cr.row().cells(), column_kind::regular_column, base_schema);
const auto& marker = cr.row().marker();
auto marker_timestamp = marker.timestamp();
auto marker_ttl = marker.is_expiring() ? marker.ttl() : gc_clock::duration(0);
if (marker.is_live()) {
// make sure that an entry corresponding to the row marker's timestamp and ttl is in the map
(void)cr_updates[std::pair(marker_timestamp, marker_ttl)];
}
auto is_insert = [&] (api::timestamp_type timestamp, gc_clock::duration ttl) {
if (!marker.is_live()) {
return false;
}
return timestamp == marker_timestamp && ttl == marker_ttl;
};
for (auto& [k, up]: cr_updates) {
// It is important that changes in the resulting `set_of_changes` are listed
// in increasing TTL order. The reason is explained in a comment in cdc/log.cc,
// search for "#6070".
auto [ts, ttl] = ts_ttl;
auto [timestamp, ttl] = k;
if (v._marker && ts == v._marker_ts && ttl == v._marker_ttl) {
_result[ts].clustered_inserts.push_back({
if (is_insert(timestamp, ttl)) {
res[timestamp].clustered_inserts.push_back({
ttl,
ckey,
*v._marker,
std::move(row_update.atomic_entries),
cr.key(),
marker,
std::move(up.atomic_entries),
{}
});
auto& cr_insert = _result[ts].clustered_inserts.back();
auto& cr_insert = res[timestamp].clustered_inserts.back();
bool clustered_update_exists = false;
for (auto& nonatomic_up: row_update.nonatomic_entries) {
for (auto& nonatomic_up: up.nonatomic_entries) {
// Updating a collection column with an INSERT statement implies inserting a tombstone.
//
// For example, suppose that we have:
@@ -355,9 +205,9 @@ struct extract_changes_visitor {
cr_insert.nonatomic_entries.push_back(std::move(nonatomic_up));
} else {
if (!clustered_update_exists) {
_result[ts].clustered_updates.push_back({
res[timestamp].clustered_updates.push_back({
ttl,
ckey,
cr.key(),
{},
{}
});
@@ -378,239 +228,201 @@ struct extract_changes_visitor {
clustered_update_exists = true;
}
auto& cr_update = _result[ts].clustered_updates.back();
auto& cr_update = res[timestamp].clustered_updates.back();
cr_update.nonatomic_entries.push_back(std::move(nonatomic_up));
}
}
} else {
_result[ts].clustered_updates.push_back({
res[timestamp].clustered_updates.push_back({
ttl,
ckey,
std::move(row_update.atomic_entries),
std::move(row_update.nonatomic_entries)
cr.key(),
std::move(up.atomic_entries),
std::move(up.nonatomic_entries)
});
}
}
auto row_tomb = cr.row().deleted_at().regular();
if (row_tomb) {
res[row_tomb.timestamp].clustered_row_deletions.push_back({cr.key(), row_tomb});
}
}
void clustered_row_delete(const clustering_key& ckey, const tombstone& t) {
_result[t.timestamp].clustered_row_deletions.push_back({ckey, t});
for (const auto& rt: p.row_tombstones()) {
if (rt.tomb.timestamp != api::missing_timestamp) {
res[rt.tomb.timestamp].clustered_range_deletions.push_back({rt});
}
}
void range_delete(const range_tombstone& rt) {
_result[rt.tomb.timestamp].clustered_range_deletions.push_back({rt});
auto partition_tomb_timestamp = p.partition_tombstone().timestamp;
if (partition_tomb_timestamp != api::missing_timestamp) {
res[partition_tomb_timestamp].partition_deletions = {p.partition_tombstone()};
}
void partition_delete(const tombstone& t) {
_result[t.timestamp].partition_deletions = {t};
}
constexpr bool finished() const { return false; }
};
set_of_changes extract_changes(const mutation& m) {
extract_changes_visitor v;
cdc::inspect_mutation(m, v);
return std::move(v._result);
return res;
}
namespace cdc {
struct find_timestamp_visitor {
api::timestamp_type _ts = api::missing_timestamp;
bool should_split(const mutation& base_mutation, const schema& base_schema) {
auto& p = base_mutation.partition();
bool finished() const { return _ts != api::missing_timestamp; }
api::timestamp_type found_ts = api::missing_timestamp;
std::optional<gc_clock::duration> found_ttl; // 0 = "no ttl"
void visit(api::timestamp_type ts) { _ts = ts; }
void visit(const atomic_cell_view& cell) { visit(cell.timestamp()); }
auto check_or_set = [&] (api::timestamp_type ts, gc_clock::duration ttl) {
if (found_ts != api::missing_timestamp && found_ts != ts) {
return true;
}
found_ts = ts;
void live_atomic_cell(const column_definition&, const atomic_cell_view& cell) { visit(cell); }
void dead_atomic_cell(const column_definition&, const atomic_cell_view& cell) { visit(cell); }
void collection_tombstone(const tombstone& t) {
// A collection tombstone with timestamp T can be created with:
// UPDATE ks.t USING TIMESTAMP T + 1 SET X = null WHERE ...
// (where X is a collection column).
// This is, among others, the reason why we show it in the CDC log
// with cdc$time using timestamp T + 1 instead of T.
visit(t.timestamp + 1);
}
void live_collection_cell(bytes_view, const atomic_cell_view& cell) { visit(cell); }
void dead_collection_cell(bytes_view, const atomic_cell_view& cell) { visit(cell); }
void collection_column(const column_definition&, auto&& visit_collection) { visit_collection(*this); }
void marker(const row_marker& rm) { visit(rm.timestamp()); }
void static_row_cells(auto&& visit_row_cells) { visit_row_cells(*this); }
void clustered_row_cells(const clustering_key&, auto&& visit_row_cells) { visit_row_cells(*this); }
void clustered_row_delete(const clustering_key&, const tombstone& t) { visit(t.timestamp); }
void range_delete(const range_tombstone& t) { visit(t.tomb.timestamp); }
void partition_delete(const tombstone& t) { visit(t.timestamp); }
};
if (found_ttl && *found_ttl != ttl) {
return true;
}
found_ttl = ttl;
/* Find some timestamp inside the given mutation.
*
* If this mutation was created using a single insert/update/delete statement, then it will have a single,
* well-defined timestamp (even if this timestamp occurs multiple times, e.g. in a cell and row_marker).
*
* This function shouldn't be used for mutations that have multiple different timestamps: the function
* would only find one of them. When dealing with such mutations, the caller should first split the mutation
* into multiple ones, each with a single timestamp.
*/
api::timestamp_type find_timestamp(const mutation& m) {
find_timestamp_visitor v;
return false;
};
cdc::inspect_mutation(m, v);
bool had_static_row = false;
if (v._ts == api::missing_timestamp) {
throw std::runtime_error("cdc: could not find timestamp of mutation");
bool should_split = false;
p.static_row().get().for_each_cell([&] (column_id id, const atomic_cell_or_collection& cell) {
had_static_row = true;
auto& cdef = base_schema.column_at(column_kind::static_column, id);
if (cdef.is_atomic()) {
auto view = cell.as_atomic_cell(cdef);
if (check_or_set(view.timestamp(), view.is_live_and_has_ttl() ? view.ttl() : gc_clock::duration(0))) {
should_split = true;
}
return;
}
cell.as_collection_mutation().with_deserialized(*cdef.type, [&] (collection_mutation_view_description mview) {
auto desc = mview.materialize(*cdef.type);
for (auto& [k, v]: desc.cells) {
if (check_or_set(v.timestamp(), v.is_live_and_has_ttl() ? v.ttl() : gc_clock::duration(0))) {
should_split = true;
return;
}
}
if (desc.tomb) {
if (check_or_set(desc.tomb.timestamp + 1, gc_clock::duration(0))) {
should_split = true;
return;
}
}
});
});
if (should_split) {
return true;
}
return v._ts;
bool had_clustered_row = false;
if (!p.clustered_rows().empty() && had_static_row) {
return true;
}
for (const rows_entry& cr : p.clustered_rows()) {
had_clustered_row = true;
const auto& marker = cr.row().marker();
if (marker.is_live() && check_or_set(marker.timestamp(), marker.is_expiring() ? marker.ttl() : gc_clock::duration(0))) {
return true;
}
bool is_insert = marker.is_live();
bool had_cells = false;
cr.row().cells().for_each_cell([&] (column_id id, const atomic_cell_or_collection& cell) {
had_cells = true;
auto& cdef = base_schema.column_at(column_kind::regular_column, id);
if (cdef.is_atomic()) {
auto view = cell.as_atomic_cell(cdef);
if (check_or_set(view.timestamp(), view.is_live_and_has_ttl() ? view.ttl() : gc_clock::duration(0))) {
should_split = true;
}
return;
}
cell.as_collection_mutation().with_deserialized(*cdef.type, [&] (collection_mutation_view_description mview) {
for (auto& [k, v]: mview.cells) {
if (check_or_set(v.timestamp(), v.is_live_and_has_ttl() ? v.ttl() : gc_clock::duration(0))) {
should_split = true;
return;
}
if (is_insert) {
// nonatomic updates cannot be expressed with an INSERT.
should_split = true;
return;
}
}
if (mview.tomb) {
if (check_or_set(mview.tomb.timestamp + 1, gc_clock::duration(0))) {
should_split = true;
return;
}
}
});
});
if (should_split) {
return true;
}
auto row_tomb = cr.row().deleted_at().regular();
if (row_tomb) {
if (had_cells) {
return true;
}
// there were no cells, so no ttl
assert(!found_ttl);
if (found_ts != api::missing_timestamp && found_ts != row_tomb.timestamp) {
return true;
}
found_ts = row_tomb.timestamp;
}
}
if (!p.row_tombstones().empty() && (had_static_row || had_clustered_row)) {
return true;
}
for (const auto& rt: p.row_tombstones()) {
if (rt.tomb) {
if (found_ts != api::missing_timestamp && found_ts != rt.tomb.timestamp) {
return true;
}
found_ts = rt.tomb.timestamp;
}
}
if (p.partition_tombstone().timestamp != api::missing_timestamp
&& (!p.row_tombstones().empty() || had_static_row || had_clustered_row)) {
return true;
}
// A mutation with no timestamp will be split into 0 mutations
return found_ts == api::missing_timestamp;
}
/* If a mutation contains multiple timestamps, multiple ttls, or multiple types of changes
* (e.g. it was created from a batch that both updated a clustered row and deleted a clustered row),
* we split it into multiple mutations, each with exactly one timestamp, at most one ttl, and a single type of change.
* We also split if we find both a change with no ttl (e.g. a cell tombstone) and a change with ttl (e.g. a ttled cell update).
*
* The `should_split` function checks whether the mutation requires such splitting, using `should_split_visitor`.
* The visitor uses the order in which the mutation is being visited (see the documentation of ChangeVisitor),
* remembers a bunch of state based on whatever was visited until now (e.g. was there a static row update?
* Was there a clustered row update? Was there a clustered row delete? Was there a TTL?)
* and tells the caller to stop on the first occurence of a second timestamp/ttl/type of change.
*/
struct should_split_visitor {
bool _had_static_row = false;
bool _had_clustered_row = false;
bool _had_upsert = false;
bool _had_row_marker = false;
bool _had_range_delete = false;
bool _result = false;
// This becomes a valid (non-missing) timestamp after visiting the first change.
// Then, if we encounter any different timestamp, it means that we should split.
api::timestamp_type _ts = api::missing_timestamp;
// This becomes non-null after visiting the fist change.
// If the change did not have a ttl (e.g. a non-ttled cell, or a tombstone), we store gc_clock::duration(0) there,
// because specifying ttl = 0 is equivalent to not specifying a TTL.
// Otherwise we store the change's ttl.
std::optional<gc_clock::duration> _ttl = std::nullopt;
inline bool finished() const { return _result; }
inline void stop() { _result = true; }
void visit(api::timestamp_type ts, gc_clock::duration ttl = gc_clock::duration(0)) {
if (_ts != api::missing_timestamp && _ts != ts) {
return stop();
}
_ts = ts;
if (_ttl && *_ttl != ttl) {
return stop();
}
_ttl = { ttl };
}
void visit(const atomic_cell_view& cell) { visit(cell.timestamp(), get_ttl(cell)); }
void live_atomic_cell(const column_definition&, const atomic_cell_view& cell) { visit(cell); }
void dead_atomic_cell(const column_definition&, const atomic_cell_view& cell) { visit(cell); }
void collection_tombstone(const tombstone& t) { visit(t.timestamp + 1); }
void live_collection_cell(bytes_view, const atomic_cell_view& cell) {
if (_had_row_marker) {
// nonatomic updates cannot be expressed with an INSERT.
return stop();
}
visit(cell);
}
void dead_collection_cell(bytes_view, const atomic_cell_view& cell) { visit(cell); }
void collection_column(const column_definition&, auto&& visit_collection) { visit_collection(*this); }
void marker(const row_marker& rm) {
_had_row_marker = true;
visit(rm.timestamp(), get_ttl(rm));
}
void static_row_cells(auto&& visit_row_cells) {
_had_static_row = true;
visit_row_cells(*this);
}
void clustered_row_cells(const clustering_key&, auto&& visit_row_cells) {
if (_had_static_row) {
return stop();
}
_had_clustered_row = _had_upsert = true;
visit_row_cells(*this);
}
void clustered_row_delete(const clustering_key&, const tombstone& t) {
if (_had_static_row || _had_upsert) {
return stop();
}
_had_clustered_row = true;
visit(t.timestamp);
}
void range_delete(const range_tombstone& t) {
if (_had_static_row || _had_clustered_row) {
return stop();
}
_had_range_delete = true;
visit(t.tomb.timestamp);
}
void partition_delete(const tombstone&) {
if (_had_range_delete || _had_static_row || _had_clustered_row) {
return stop();
}
}
};
bool should_split(const mutation& m) {
should_split_visitor v;
cdc::inspect_mutation(m, v);
return v._result
// A mutation with no timestamp will be split into 0 mutations:
|| v._ts == api::missing_timestamp;
}
void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor,
bool enable_preimage, bool enable_postimage) {
const auto base_schema = base_mutation.schema();
auto changes = extract_changes(base_mutation);
void for_each_change(const mutation& base_mutation, const schema_ptr& base_schema,
seastar::noncopyable_function<void(mutation, api::timestamp_type, bytes, int&)> f) {
auto changes = extract_changes(base_mutation, *base_schema);
auto pk = base_mutation.key();
if (changes.empty()) {
return;
}
const auto last_timestamp = changes.rbegin()->first;
for (auto& [change_ts, btch] : changes) {
const bool is_last = change_ts == last_timestamp;
processor.begin_timestamp(change_ts, is_last);
clustered_column_set affected_clustered_columns_per_row{clustering_key::less_compare(*base_schema)};
one_kind_column_set affected_static_columns{base_schema->static_columns_count()};
if (enable_preimage || enable_postimage) {
affected_static_columns = btch.get_affected_static_columns(*base_schema);
affected_clustered_columns_per_row = btch.get_affected_clustered_columns_per_row(*base_mutation.schema());
}
if (enable_preimage) {
if (affected_static_columns.count() > 0) {
processor.produce_preimage(nullptr, affected_static_columns);
}
for (const auto& [ck, affected_row_cells] : affected_clustered_columns_per_row) {
processor.produce_preimage(&ck, affected_row_cells);
}
}
auto tuuid = timeuuid_type->decompose(generate_timeuuid(change_ts));
int batch_no = 0;
for (auto& sr_update : btch.static_updates) {
mutation m(base_schema, pk);
@@ -622,7 +434,7 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
auto& cdef = base_schema->column_at(column_kind::static_column, nonatomic_update.id);
m.set_static_cell(cdef, collection_mutation_description{nonatomic_update.t, std::move(nonatomic_update.cells)}.serialize(*cdef.type));
}
processor.process_change(m);
f(std::move(m), change_ts, tuuid, batch_no);
}
for (auto& cr_insert : btch.clustered_inserts) {
@@ -639,7 +451,7 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
}
row.apply(cr_insert.marker);
processor.process_change(m);
f(std::move(m), change_ts, tuuid, batch_no);
}
for (auto& cr_update : btch.clustered_updates) {
@@ -655,86 +467,27 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
row.apply(cdef, collection_mutation_description{nonatomic_update.t, std::move(nonatomic_update.cells)}.serialize(*cdef.type));
}
processor.process_change(m);
f(std::move(m), change_ts, tuuid, batch_no);
}
for (auto& cr_delete : btch.clustered_row_deletions) {
mutation m(base_schema, pk);
m.partition().apply_delete(*base_schema, cr_delete.key, cr_delete.t);
processor.process_change(m);
f(std::move(m), change_ts, tuuid, batch_no);
}
for (auto& crange_delete : btch.clustered_range_deletions) {
mutation m(base_schema, pk);
m.partition().apply_delete(*base_schema, crange_delete.rt);
processor.process_change(m);
f(std::move(m), change_ts, tuuid, batch_no);
}
if (btch.partition_deletions) {
mutation m(base_schema, pk);
m.partition().apply(btch.partition_deletions->t);
processor.process_change(m);
}
if (enable_postimage) {
if (affected_static_columns.count() > 0) {
processor.produce_postimage(nullptr);
}
for (const auto& [ck, crow] : affected_clustered_columns_per_row) {
processor.produce_postimage(&ck);
}
}
processor.end_record();
}
}
void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor,
bool enable_preimage, bool enable_postimage) {
auto ts = find_timestamp(base_mutation);
processor.begin_timestamp(ts, true);
const auto base_schema = base_mutation.schema();
if (enable_preimage) {
const auto& p = base_mutation.partition();
one_kind_column_set columns{base_schema->static_columns_count()};
if (!p.static_row().empty()) {
p.static_row().get().for_each_cell([&] (column_id id, const atomic_cell_or_collection& cell) {
columns.set(id);
});
processor.produce_preimage(nullptr, columns);
}
columns.resize(base_schema->regular_columns_count());
for (const rows_entry& cr : p.clustered_rows()) {
columns.reset();
if (cr.row().deleted_at().regular()) {
// Row deleted - include all columns in preimage
columns.set(0, base_schema->regular_columns_count(), true);
} else {
cr.row().cells().for_each_cell([&] (column_id id, const atomic_cell_or_collection& cell) {
columns.set(id);
});
}
processor.produce_preimage(&cr.key(), columns);
f(std::move(m), change_ts, tuuid, batch_no);
}
}
processor.process_change(base_mutation);
if (enable_postimage) {
const auto& p = base_mutation.partition();
if (!p.static_row().empty()) {
processor.produce_postimage(nullptr);
}
for (const rows_entry& cr : p.clustered_rows()) {
processor.produce_postimage(&cr.key());
}
}
processor.end_record();
}
} // namespace cdc

View File

@@ -22,7 +22,6 @@
#pragma once
#include <vector>
#include <boost/dynamic_bitset.hpp>
#include "schema_fwd.hh"
#include "timestamp.hh"
#include "bytes.hh"
@@ -32,61 +31,8 @@ class mutation;
namespace cdc {
// Represents a set of column ids of one kind (partition key, clustering key, regular row or static row).
// There already exists a column_set type, but it keeps ordinal_column_ids, not column_ids (ordinal column ids
// are unique across whole table, while kind-specific ids are unique only within one column kind).
// To avoid converting back and forth between ordinal and kind-specific ids, one_kind_column_set is used instead.
using one_kind_column_set = boost::dynamic_bitset<uint64_t>;
// An object that processes changes from a single, big mutation.
// It is intended to be used with process_changes_xxx_splitting. Those functions define the order and layout in which
// changes should appear in CDC log, and change_processor is responsible for producing CDC log rows from changes given
// by those two functions.
//
// The flow of calling its methods should go as follows:
// -> begin_timestamp #1
// -> produce_preimage (one call for each preimage row to be generated)
// -> process_change (one call for each part generated by the splitting function)
// -> produce_postimage (one call for each postimage row to be generated)
// -> begin_timestamp #2
// ...
class change_processor {
protected:
~change_processor() {};
public:
// Tells the processor that changes that follow from now on will be of given timestamp.
// This method must be called in increasing timestamp order.
// begin_timestamp can be called only once for a given timestamp and change_processor object.
// ts - timestamp of mutation parts
// is_last - determines if this will be the last timestamp to be processed by this change_processor instance.
virtual void begin_timestamp(api::timestamp_type ts, bool is_last) = 0;
// Tells the processor to produce a preimage for a given clustering/static row.
// ck - clustering key of the row for which to produce a preimage; if nullptr, static row preimage is requested
// columns_to_include - include information about the current state of those columns only, leave others as null
virtual void produce_preimage(const clustering_key* ck, const one_kind_column_set& columns_to_include) = 0;
// Tells the processor to produce a postimage for a given clustering/static row.
// Contrary to preimage, this requires data from all columns to be present.
// ck - clustering key of the row for which to produce a postimage; if nullptr, static row postimage is requested
virtual void produce_postimage(const clustering_key* ck) = 0;
// Processes a smaller mutation which is a subset of the big mutation.
// The mutation provided to process_change should be simple enough for it to be possible to convert it
// into CDC log rows - for example, it cannot represent a write to two columns of the same row, where
// both columns have different timestamp or TTL set.
// m - the small mutation to be converted into CDC log rows.
virtual void process_change(const mutation& m) = 0;
// Tells processor we have reached end of record - last part
// of a given timestamp batch
virtual void end_record() = 0;
};
bool should_split(const mutation& base_mutation);
void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor,
bool enable_preimage, bool enable_postimage);
void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor,
bool enable_preimage, bool enable_postimage);
bool should_split(const mutation& base_mutation, const schema& base_schema);
void for_each_change(const mutation& base_mutation, const schema_ptr& base_schema,
seastar::noncopyable_function<void(mutation, api::timestamp_type, bytes, int&)>);
}

View File

@@ -72,14 +72,7 @@ public:
}
return result;
}
class position_range_iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = const position_range;
using difference_type = std::ptrdiff_t;
using pointer = const position_range*;
using reference = const position_range&;
private:
class position_range_iterator : public std::iterator<std::input_iterator_tag, const position_range> {
set_type::iterator _i;
public:
position_range_iterator(set_type::iterator i) : _i(i) {}

View File

@@ -21,7 +21,8 @@
#pragma once
#include "utils/rjson.hh"
#include <json/json.h>
#include "bytes.hh"
class schema;
@@ -46,7 +47,7 @@ public:
virtual ~column_computation() = default;
static column_computation_ptr deserialize(bytes_view raw);
static column_computation_ptr deserialize(const rjson::value& json);
static column_computation_ptr deserialize(const Json::Value& json);
virtual column_computation_ptr clone() const = 0;
@@ -54,36 +55,6 @@ public:
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const = 0;
};
/*
* Computes token value of partition key and returns it as bytes.
*
* Should NOT be used (use token_column_computation), because ordering
* of bytes is different than ordering of tokens (signed vs unsigned comparison).
*
* The type name stored for computations of this class is "token" - this was
* the original implementation. (now depracated for new tables)
*/
class legacy_token_column_computation : public column_computation {
public:
virtual column_computation_ptr clone() const override {
return std::make_unique<legacy_token_column_computation>(*this);
}
virtual bytes serialize() const override;
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;
};
/*
* Computes token value of partition key and returns it as long_type.
* The return type means that it can be trivially sorted (for example
* if computed column using this computation is a clustering key),
* preserving the correct order of tokens (using signed comparisons).
*
* Please use this class instead of legacy_token_column_computation.
*
* The type name stored for computations of this class is "token_v2".
* (the name "token" refers to the depracated legacy_token_column_computation)
*/
class token_column_computation : public column_computation {
public:
virtual column_computation_ptr clone() const override {

View File

@@ -130,13 +130,7 @@ public:
bytes decompose_value(const value_type& values) const {
return serialize_value(values);
}
class iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = const bytes_view;
using difference_type = std::ptrdiff_t;
using pointer = const bytes_view*;
using reference = const bytes_view&;
class iterator : public std::iterator<std::input_iterator_tag, const bytes_view> {
private:
bytes_view _v;
bytes_view _current;

View File

@@ -61,14 +61,7 @@ public:
, _packed(packed)
{ }
class iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = bytes::value_type;
using difference_type = std::ptrdiff_t;
using pointer = bytes::value_type*;
using reference = bytes::value_type&;
private:
class iterator : public std::iterator<std::input_iterator_tag, bytes::value_type> {
bool _singular;
// Offset within virtual output space of a component.
//
@@ -155,8 +148,8 @@ public:
_type.begin(k1), _type.end(k1),
_type.begin(k2), _type.end(k2),
[] (const bytes_view& c1, const bytes_view& c2) -> int {
if (c1.size() != c2.size() || !c1.size()) {
return c1.size() < c2.size() ? -1 : c1.size() ? 1 : 0;
if (c1.size() != c2.size()) {
return c1.size() < c2.size() ? -1 : 1;
}
return memcmp(c1.begin(), c2.begin(), c1.size());
});
@@ -346,14 +339,7 @@ public:
return eoc_byte == 0 ? eoc::none : (eoc_byte < 0 ? eoc::start : eoc::end);
}
class iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = const component_view;
using difference_type = std::ptrdiff_t;
using pointer = const component_view*;
using reference = const component_view&;
private:
class iterator : public std::iterator<std::input_iterator_tag, const component_view> {
bytes_view _v;
component_view _current;
bool _strict_mode = true;

View File

@@ -205,7 +205,7 @@ void compression_parameters::validate_options(const std::map<sstring, sstring>&
ckw = _compressor->option_names();
}
for (auto&& opt : options) {
if (!keywords.contains(opt.first) && !ckw.contains(opt.first)) {
if (!keywords.count(opt.first) && !ckw.count(opt.first)) {
throw exceptions::configuration_exception(format("Unknown compression option '{}'.", opt.first));
}
}

View File

@@ -100,13 +100,8 @@ listen_address: localhost
# port for the CQL native transport to listen for clients on
# For security reasons, you should not expose this port to the internet. Firewall it if needed.
# To disable the CQL native transport, set this option to 0.
native_transport_port: 9042
# Like native_transport_port, but clients are forwarded to specific shards, based on the
# client-side port numbers.
native_shard_aware_transport_port: 19042
# Enabling native transport encryption in client_encryption_options allows you to either use
# encryption for the standard port or to use a dedicated, additional port along with the unencrypted
# standard native_transport_port.
@@ -116,10 +111,6 @@ native_shard_aware_transport_port: 19042
# keeping native_transport_port unencrypted.
#native_transport_port_ssl: 9142
# Like native_transport_port_ssl, but clients are forwarded to specific shards, based on the
# client-side port numbers.
#native_shard_aware_transport_port_ssl: 19142
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 5000
@@ -230,9 +221,6 @@ batch_size_fail_threshold_in_kb: 50
# - PasswordAuthenticator relies on username/password pairs to authenticate
# users. It keeps usernames and hashed passwords in system_auth.credentials table.
# Please increase system_auth keyspace replication factor if you use this authenticator.
# - com.scylladb.auth.TransitionalAuthenticator requires username/password pair
# to authenticate in the same manner as PasswordAuthenticator, but improper credentials
# result in being logged in as an anonymous user. Use for upgrading clusters' auth.
# authenticator: AllowAllAuthenticator
# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
@@ -242,9 +230,6 @@ batch_size_fail_threshold_in_kb: 50
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
# increase system_auth keyspace replication factor if you use this authorizer.
# - com.scylladb.auth.TransitionalAuthorizer wraps around the CassandraAuthorizer, using it for
# authorizing permission management. Otherwise, it allows all. Use for upgrading
# clusters' auth.
# authorizer: AllowAllAuthorizer
# initial_token allows you to specify tokens manually. While you can use # it with

View File

@@ -34,9 +34,7 @@ from distutils.spawn import find_executable
curdir = os.getcwd()
outdir = 'build'
tempfile.tempdir = f"{outdir}/tmp"
tempfile.tempdir = "./build/tmp"
configure_args = str.join(' ', [shlex.quote(x) for x in sys.argv[1:]])
@@ -58,7 +56,6 @@ i18n_xlat = {
},
}
python3_dependencies = subprocess.run('./install-dependencies.sh --print-python3-runtime-packages', shell=True, capture_output=True, encoding='utf-8').stdout.strip()
def pkgname(name):
if name in i18n_xlat:
@@ -252,24 +249,20 @@ def find_headers(repodir, excluded_dirs):
modes = {
'debug': {
'cxxflags': '-DDEBUG -DSANITIZE -DDEBUG_LSA_SANITIZER -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '',
'stack-usage-threshold': 1024*40,
'cxxflags': '-DDEBUG -DDEBUG_LSA_SANITIZER -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '-Wstack-usage=%s' % (1024*40),
},
'release': {
'cxxflags': '-O3 -ffunction-sections -fdata-sections ',
'cxx_ld_flags': '-Wl,--gc-sections',
'stack-usage-threshold': 1024*13,
'cxxflags': '',
'cxx_ld_flags': '-O3 -Wstack-usage=%s' % (1024*13),
},
'dev': {
'cxxflags': '-O1 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '',
'stack-usage-threshold': 1024*21,
'cxxflags': '-DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '-O1 -Wstack-usage=%s' % (1024*21),
},
'sanitize': {
'cxxflags': '-Os -DDEBUG -DSANITIZE -DDEBUG_LSA_SANITIZER -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '',
'stack-usage-threshold': 1024*50,
'cxxflags': '-DDEBUG -DDEBUG_LSA_SANITIZER -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '-Os -Wstack-usage=%s' % (1024*50),
}
}
@@ -297,7 +290,6 @@ scylla_tests = set([
'test/boost/checksum_utils_test',
'test/boost/chunked_vector_test',
'test/boost/clustering_ranges_walker_test',
'test/boost/column_mapping_test',
'test/boost/commitlog_test',
'test/boost/compound_test',
'test/boost/compress_test',
@@ -314,7 +306,6 @@ scylla_tests = set([
'test/boost/crc_test',
'test/boost/data_listeners_test',
'test/boost/database_test',
'test/boost/double_decker_test',
'test/boost/duration_test',
'test/boost/dynamic_bitset_test',
'test/boost/enum_option_test',
@@ -330,12 +321,9 @@ scylla_tests = set([
'test/boost/gossiping_property_file_snitch_test',
'test/boost/hash_test',
'test/boost/idl_test',
'test/boost/imr_test',
'test/boost/input_stream_test',
'test/boost/json_cql_query_test',
'test/boost/json_test',
'test/boost/keys_test',
'test/boost/large_paging_state_test',
'test/boost/like_matcher_test',
'test/boost/limiting_data_source_test',
'test/boost/linearizing_input_stream_test',
@@ -344,7 +332,6 @@ scylla_tests = set([
'test/boost/estimated_histogram_test',
'test/boost/logalloc_test',
'test/boost/managed_vector_test',
'test/boost/intrusive_array_test',
'test/boost/map_difference_test',
'test/boost/memtable_test',
'test/boost/meta_test',
@@ -366,7 +353,6 @@ scylla_tests = set([
'test/boost/range_test',
'test/boost/range_tombstone_list_test',
'test/boost/reusable_buffer_test',
'test/boost/restrictions_test',
'test/boost/role_manager_test',
'test/boost/row_cache_test',
'test/boost/schema_change_test',
@@ -385,10 +371,10 @@ scylla_tests = set([
'test/boost/sstable_resharding_test',
'test/boost/sstable_directory_test',
'test/boost/sstable_test',
'test/boost/sstable_move_test',
'test/boost/storage_proxy_test',
'test/boost/top_k_test',
'test/boost/transport_test',
'test/boost/truncation_migration_test',
'test/boost/types_test',
'test/boost/user_function_test',
'test/boost/user_types_test',
@@ -400,15 +386,13 @@ scylla_tests = set([
'test/boost/view_schema_ckey_test',
'test/boost/vint_serialization_test',
'test/boost/virtual_reader_test',
'test/boost/bptree_test',
'test/boost/double_decker_test',
'test/boost/stall_free_test',
'test/boost/imr_test',
'test/manual/ec2_snitch_test',
'test/manual/enormous_table_scan_test',
'test/manual/gce_snitch_test',
'test/manual/gossip',
'test/manual/hint_test',
'test/manual/imr_test',
'test/manual/json_test',
'test/manual/message',
'test/manual/partition_data_test',
'test/manual/row_locker_test',
@@ -420,7 +404,6 @@ scylla_tests = set([
'test/perf/perf_fast_forward',
'test/perf/perf_hash',
'test/perf/perf_mutation',
'test/perf/perf_collection',
'test/perf/perf_row_cache_update',
'test/perf/perf_simple_query',
'test/perf/perf_sstable',
@@ -428,8 +411,6 @@ scylla_tests = set([
'test/unit/lsa_sync_eviction_test',
'test/unit/row_cache_alloc_stress_test',
'test/unit/row_cache_stress_test',
'test/unit/bptree_stress_test',
'test/unit/bptree_compaction_test',
])
perf_tests = set([
@@ -441,18 +422,13 @@ perf_tests = set([
'test/perf/perf_big_decimal',
])
raft_tests = set([
'test/raft/replication_test',
'test/boost/raft_fsm_test',
])
apps = set([
'scylla',
'test/tools/cql_repl',
'tools/scylla-types',
])
tests = scylla_tests | perf_tests | raft_tests
tests = scylla_tests | perf_tests
other = set([
'iotune',
@@ -471,18 +447,18 @@ arg_parser.add_argument('--so', dest='so', action='store_true',
arg_parser.add_argument('--mode', action='append', choices=list(modes.keys()), dest='selected_modes')
arg_parser.add_argument('--with', dest='artifacts', action='append', choices=all_artifacts, default=[])
arg_parser.add_argument('--with-seastar', action='store', dest='seastar_path', default='seastar', help='Path to Seastar sources')
add_tristate(arg_parser, name='dist', dest='enable_dist',
help='scylla-tools-java, scylla-jmx and packages')
arg_parser.add_argument('--cflags', action='store', dest='user_cflags', default='',
help='Extra flags for the C++ compiler')
arg_parser.add_argument('--ldflags', action='store', dest='user_ldflags', default='',
help='Extra flags for the linker')
arg_parser.add_argument('--target', action='store', dest='target', default=default_target_arch(),
help='Target architecture (-march)')
arg_parser.add_argument('--compiler', action='store', dest='cxx', default='clang++',
arg_parser.add_argument('--compiler', action='store', dest='cxx', default='g++',
help='C++ compiler path')
arg_parser.add_argument('--c-compiler', action='store', dest='cc', default='clang',
arg_parser.add_argument('--c-compiler', action='store', dest='cc', default='gcc',
help='C compiler path')
arg_parser.add_argument('--with-osv', action='store', dest='with_osv', default='',
help='Shortcut for compile for OSv')
add_tristate(arg_parser, name='dpdk', dest='dpdk',
help='Use dpdk (from seastar dpdk sources) (default=True for release builds)')
arg_parser.add_argument('--dpdk-target', action='store', dest='dpdk_target', default='',
@@ -505,32 +481,21 @@ arg_parser.add_argument('--split-dwarf', dest='split_dwarf', action='store_true'
help='use of split dwarf (https://gcc.gnu.org/wiki/DebugFission) to speed up linking')
arg_parser.add_argument('--enable-alloc-failure-injector', dest='alloc_failure_injector', action='store_true', default=False,
help='enable allocation failure injection')
arg_parser.add_argument('--enable-seastar-debug-allocations', dest='seastar_debug_allocations', action='store_true', default=False,
help='enable seastar debug allocations')
arg_parser.add_argument('--with-antlr3', dest='antlr3_exec', action='store', default=None,
help='path to antlr3 executable')
arg_parser.add_argument('--with-ragel', dest='ragel_exec', action='store', default='ragel',
help='path to ragel executable')
arg_parser.add_argument('--build-raft', dest='build_raft', action='store_true', default=False,
help='build raft code')
add_tristate(arg_parser, name='stack-guards', dest='stack_guards', help='Use stack guards')
arg_parser.add_argument('--verbose', dest='verbose', action='store_true',
help='Make configure.py output more verbose (useful for debugging the build process itself)')
arg_parser.add_argument('--test-repeat', dest='test_repeat', action='store', type=str, default='1',
help='Set number of times to repeat each unittest.')
arg_parser.add_argument('--test-timeout', dest='test_timeout', action='store', type=str, default='7200')
args = arg_parser.parse_args()
if not args.build_raft:
all_artifacts.difference_update(raft_tests)
tests.difference_update(raft_tests)
defines = ['XXH_PRIVATE_API',
'SEASTAR_TESTING_MAIN',
]
extra_cxxflags = {}
cassandra_interface = Thrift(source='interface/cassandra.thrift', service='Cassandra')
scylla_core = (['database.cc',
'absl-flat_hash_map.cc',
'table.cc',
@@ -551,7 +516,6 @@ scylla_core = (['database.cc',
'frozen_mutation.cc',
'memtable.cc',
'schema_mutations.cc',
'utils/array-search.cc',
'utils/logalloc.cc',
'utils/large_bitset.cc',
'utils/buffer_input_stream.cc',
@@ -559,8 +523,6 @@ scylla_core = (['database.cc',
'utils/updateable_value.cc',
'utils/directories.cc',
'utils/generation-number.cc',
'utils/rjson.cc',
'utils/human_readable.cc',
'mutation_partition.cc',
'mutation_partition_view.cc',
'mutation_partition_serializer.cc',
@@ -568,6 +530,7 @@ scylla_core = (['database.cc',
'mutation_reader.cc',
'flat_mutation_reader.cc',
'mutation_query.cc',
'json.cc',
'keys.cc',
'counters.cc',
'compress.cc',
@@ -575,8 +538,7 @@ scylla_core = (['database.cc',
'sstables/mp_row_consumer.cc',
'sstables/sstables.cc',
'sstables/sstables_manager.cc',
'sstables/mx/writer.cc',
'sstables/kl/writer.cc',
'sstables/mc/writer.cc',
'sstables/sstable_version.cc',
'sstables/compress.cc',
'sstables/partition.cc',
@@ -590,10 +552,6 @@ scylla_core = (['database.cc',
'sstables/prepended_input_stream.cc',
'sstables/m_format_read_helpers.cc',
'sstables/sstable_directory.cc',
'sstables/random_access_reader.cc',
'sstables/metadata_collector.cc',
'sstables/writer.cc',
'transport/cql_protocol_extension.cc',
'transport/event.cc',
'transport/event_notifier.cc',
'transport/server.cc',
@@ -616,8 +574,6 @@ scylla_core = (['database.cc',
'cql3/sets.cc',
'cql3/tuples.cc',
'cql3/maps.cc',
'cql3/values.cc',
'cql3/expr/expression.cc',
'cql3/functions/user_function.cc',
'cql3/functions/functions.cc',
'cql3/functions/aggregate_fcts.cc',
@@ -665,7 +621,6 @@ scylla_core = (['database.cc',
'cql3/statements/alter_keyspace_statement.cc',
'cql3/statements/role-management-statements.cc',
'cql3/update_parameters.cc',
'cql3/util.cc',
'cql3/ut_name.cc',
'cql3/role_name.cc',
'thrift/handler.cc',
@@ -685,6 +640,7 @@ scylla_core = (['database.cc',
'service/paxos/prepare_response.cc',
'service/paxos/paxos_state.cc',
'service/paxos/prepare_summary.cc',
'cql3/operator.cc',
'cql3/relation.cc',
'cql3/column_identifier.cc',
'cql3/column_specification.cc',
@@ -718,7 +674,6 @@ scylla_core = (['database.cc',
'db/data_listeners.cc',
'db/hints/manager.cc',
'db/hints/resource_manager.cc',
'db/hints/host_filter.cc',
'db/config.cc',
'db/extensions.cc',
'db/heat_load_balance.cc',
@@ -729,7 +684,6 @@ scylla_core = (['database.cc',
'db/view/view_update_generator.cc',
'db/view/row_locking.cc',
'db/sstables-format-selector.cc',
'db/snapshot-ctl.cc',
'index/secondary_index_manager.cc',
'index/secondary_index.cc',
'utils/UUID_gen.cc',
@@ -797,7 +751,6 @@ scylla_core = (['database.cc',
'streaming/stream_manager.cc',
'streaming/stream_result_future.cc',
'streaming/stream_session_state.cc',
'streaming/stream_reason.cc',
'clocks-impl.cc',
'partition_slice_builder.cc',
'init.cc',
@@ -898,8 +851,8 @@ alternator = [
'alternator/expressions.cc',
Antlr3Grammar('alternator/expressions.g'),
'alternator/conditions.cc',
'alternator/rjson.cc',
'alternator/auth.cc',
'alternator/streams.cc',
]
redis = [
@@ -954,8 +907,6 @@ scylla_tests_generic_dependencies = [
'test/lib/log.cc',
'test/lib/reader_permit.cc',
'test/lib/test_utils.cc',
'test/lib/tmpdir.cc',
'test/lib/sstable_run_based_compaction_strategy_for_tests.cc',
]
scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependencies + [
@@ -968,17 +919,8 @@ scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependenci
'test/lib/random_schema.cc',
]
scylla_raft_dependencies = [
'raft/raft.cc',
'raft/server.cc',
'raft/fsm.cc',
'raft/progress.cc',
'raft/log.cc',
'utils/uuid.cc'
]
deps = {
'scylla': idls + ['main.cc', 'release.cc', 'utils/build_id.cc'] + scylla_core + api + alternator + redis,
'scylla': idls + ['main.cc', 'release.cc', 'build_id.cc'] + scylla_core + api + alternator + redis,
'test/tools/cql_repl': idls + ['test/tools/cql_repl.cc'] + scylla_core + scylla_tests_generic_dependencies,
#FIXME: we don't need all of scylla_core here, only the types module, need to modularize scylla_core.
'tools/scylla-types': idls + ['tools/scylla-types.cc'] + scylla_core,
@@ -1002,7 +944,6 @@ pure_boost_tests = set([
'test/boost/enum_option_test',
'test/boost/enum_set_test',
'test/boost/idl_test',
'test/boost/json_test',
'test/boost/keys_test',
'test/boost/like_matcher_test',
'test/boost/linearizing_input_stream_test',
@@ -1016,7 +957,7 @@ pure_boost_tests = set([
'test/boost/small_vector_test',
'test/boost/top_k_test',
'test/boost/vint_serialization_test',
'test/boost/bptree_test',
'test/manual/json_test',
'test/manual/streaming_histogram_test',
])
@@ -1030,13 +971,10 @@ tests_not_using_seastar_test_framework = set([
'test/perf/perf_cql_parser',
'test/perf/perf_hash',
'test/perf/perf_mutation',
'test/perf/perf_collection',
'test/perf/perf_row_cache_update',
'test/unit/lsa_async_eviction_test',
'test/unit/lsa_sync_eviction_test',
'test/unit/row_cache_alloc_stress_test',
'test/unit/bptree_stress_test',
'test/unit/bptree_compaction_test',
'test/manual/sstable_scan_footprint_test',
]) | pure_boost_tests
@@ -1080,7 +1018,7 @@ deps['test/boost/anchorless_list_test'] = ['test/boost/anchorless_list_test.cc']
deps['test/perf/perf_fast_forward'] += ['release.cc']
deps['test/perf/perf_simple_query'] += ['release.cc']
deps['test/boost/meta_test'] = ['test/boost/meta_test.cc']
deps['test/boost/imr_test'] = ['test/boost/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/manual/imr_test'] = ['test/manual/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/reusable_buffer_test'] = [
"test/boost/reusable_buffer_test.cc",
"test/lib/log.cc",
@@ -1097,12 +1035,8 @@ deps['test/boost/linearizing_input_stream_test'] = [
deps['test/boost/duration_test'] += ['test/lib/exception_utils.cc']
deps['test/boost/alternator_base64_test'] += ['alternator/base64.cc']
deps['test/raft/replication_test'] = ['test/raft/replication_test.cc'] + scylla_raft_dependencies
deps['test/boost/raft_fsm_test'] = ['test/boost/raft_fsm_test.cc', 'test/lib/log.cc'] + scylla_raft_dependencies
deps['utils/gz/gen_crc_combine_table'] = ['utils/gz/gen_crc_combine_table.cc']
warnings = [
'-Wall',
'-Werror',
@@ -1124,29 +1058,6 @@ warnings = [
'-Wno-ignored-attributes',
'-Wno-overloaded-virtual',
'-Wno-stringop-overflow',
'-Wno-unused-command-line-argument',
'-Wno-inconsistent-missing-override',
'-Wno-defaulted-function-deleted',
'-Wno-redeclared-class-member',
'-Wno-pessimizing-move',
'-Wno-redundant-move',
'-Wno-gnu-designator',
'-Wno-instantiation-after-specialization',
'-Wno-unused-private-field',
'-Wno-unsupported-friend',
'-Wno-unused-variable',
'-Wno-return-std-move',
'-Wno-delete-non-abstract-non-virtual-dtor',
'-Wno-unknown-attributes',
'-Wno-braced-scalar-init',
'-Wno-unused-value',
'-Wno-range-loop-construct',
'-Wno-unused-function',
'-Wno-implicit-int-float-conversion',
'-Wno-delete-abstract-non-virtual-dtor',
'-Wno-uninitialized-const-reference',
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728
'-Wno-psabi',
]
warnings = [w
@@ -1156,17 +1067,12 @@ warnings = [w
warnings = ' '.join(warnings + ['-Wno-error=deprecated-declarations'])
optimization_flags = [
'--param inline-unit-growth=300', # gcc
'-mllvm -inline-threshold=2500', # clang
'--param inline-unit-growth=300',
]
optimization_flags = [o
for o in optimization_flags
if flag_supported(flag=o, compiler=args.cxx)]
modes['release']['cxxflags'] += ' ' + ' '.join(optimization_flags)
if flag_supported(flag='-Wstack-usage=4096', compiler=args.cxx):
for mode in modes:
modes[mode]['cxxflags'] += f' -Wstack-usage={modes[mode]["stack-usage-threshold"]} -Wno-error=stack-usage='
modes['release']['cxx_ld_flags'] += ' ' + ' '.join(optimization_flags)
linker_flags = linker_flags(compiler=args.cxx)
@@ -1201,20 +1107,9 @@ pkgs.append('libsystemd')
compiler_test_src = '''
// clang pretends to be gcc (defined __GNUC__), so we
// must check it first
#ifdef __clang__
#if __clang_major__ < 10
#if __GNUC__ < 8
#error "MAJOR"
#endif
#elif defined(__GNUC__)
#if __GNUC__ < 10
#error "MAJOR"
#elif __GNUC__ == 10
#elif __GNUC__ == 8
#if __GNUC_MINOR__ < 1
#error "MINOR"
#elif __GNUC_MINOR__ == 1
@@ -1224,16 +1119,10 @@ compiler_test_src = '''
#endif
#endif
#else
#error "Unrecognized compiler"
#endif
int main() { return 0; }
'''
if not try_compile_and_link(compiler=args.cxx, source=compiler_test_src):
print('Wrong GCC version. Scylla needs GCC >= 10.1.1 to compile.')
print('Wrong GCC version. Scylla needs GCC >= 8.1.1 to compile.')
sys.exit(1)
if not try_compile(compiler=args.cxx, source='#include <boost/version.hpp>'):
@@ -1277,12 +1166,10 @@ if status != 0:
print('Version file generation failed')
sys.exit(1)
file = open(f'{outdir}/SCYLLA-VERSION-FILE', 'r')
file = open('build/SCYLLA-VERSION-FILE', 'r')
scylla_version = file.read().strip()
file = open(f'{outdir}/SCYLLA-RELEASE-FILE', 'r')
file = open('build/SCYLLA-RELEASE-FILE', 'r')
scylla_release = file.read().strip()
file = open(f'{outdir}/SCYLLA-PRODUCT-FILE', 'r')
scylla_product = file.read().strip()
extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""
@@ -1320,10 +1207,11 @@ forced_ldflags += f'--dynamic-linker={dynamic_linker}'
args.user_ldflags = forced_ldflags + ' ' + args.user_ldflags
args.user_cflags += f" -ffile-prefix-map={curdir}=."
args.user_cflags += ' -Wno-error=stack-usage='
args.user_cflags += f"-ffile-prefix-map={curdir}=."
seastar_cflags = args.user_cflags
if args.target != '':
seastar_cflags += ' -march=' + args.target
seastar_ldflags = args.user_ldflags
@@ -1332,13 +1220,6 @@ libdeflate_cflags = seastar_cflags
MODE_TO_CMAKE_BUILD_TYPE = {'release' : 'RelWithDebInfo', 'debug' : 'Debug', 'dev' : 'Dev', 'sanitize' : 'Sanitize' }
# cmake likes to separate things with semicolons
def semicolon_separated(*flags):
# original flags may be space separated, so convert to string still
# using spaces
f = ' '.join(flags)
return re.sub(' +', ';', f)
def configure_seastar(build_dir, mode):
seastar_build_dir = os.path.join(build_dir, mode, 'seastar')
@@ -1347,10 +1228,10 @@ def configure_seastar(build_dir, mode):
'-DCMAKE_C_COMPILER={}'.format(args.cc),
'-DCMAKE_CXX_COMPILER={}'.format(args.cxx),
'-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON',
'-DSeastar_CXX_FLAGS={}'.format((seastar_cflags).replace(' ', ';')),
'-DSeastar_LD_FLAGS={}'.format(semicolon_separated(seastar_ldflags, modes[mode]['cxx_ld_flags'])),
'-DSeastar_CXX_FLAGS={}'.format((seastar_cflags + ' ' + modes[mode]['cxx_ld_flags']).replace(' ', ';')),
'-DSeastar_LD_FLAGS={}'.format(seastar_ldflags),
'-DSeastar_CXX_DIALECT=gnu++20',
'-DSeastar_API_LEVEL=6',
'-DSeastar_API_LEVEL=4',
'-DSeastar_UNUSED_RESULT_ERROR=ON',
]
@@ -1360,15 +1241,13 @@ def configure_seastar(build_dir, mode):
dpdk = args.dpdk
if dpdk is None:
dpdk = platform.machine() == 'x86_64' and mode == 'release'
dpdk = mode == 'release'
if dpdk:
seastar_cmake_args += ['-DSeastar_DPDK=ON', '-DSeastar_DPDK_MACHINE=wsm']
if args.split_dwarf:
seastar_cmake_args += ['-DSeastar_SPLIT_DWARF=ON']
if args.alloc_failure_injector:
seastar_cmake_args += ['-DSeastar_ALLOC_FAILURE_INJECTION=ON']
if args.seastar_debug_allocations:
seastar_cmake_args += ['-DSeastar_DEBUG_ALLOCATIONS=ON']
seastar_cmd = ['cmake', '-G', 'Ninja', os.path.relpath(args.seastar_path, seastar_build_dir)] + seastar_cmake_args
cmake_dir = seastar_build_dir
@@ -1378,15 +1257,14 @@ def configure_seastar(build_dir, mode):
relative_seastar_build_dir = os.path.join('..', seastar_build_dir) # relative to seastar/
seastar_cmd = ['./cooking.sh', '-i', 'dpdk', '-d', relative_seastar_build_dir, '--'] + seastar_cmd[4:]
if args.verbose:
print(" \\\n ".join(seastar_cmd))
print(seastar_cmd)
os.makedirs(seastar_build_dir, exist_ok=True)
subprocess.check_call(seastar_cmd, shell=False, cwd=cmake_dir)
for mode in build_modes:
configure_seastar(outdir, mode)
configure_seastar('build', mode)
pc = {mode: f'{outdir}/{mode}/seastar/seastar.pc' for mode in build_modes}
pc = {mode: 'build/{}/seastar/seastar.pc'.format(mode) for mode in build_modes}
ninja = find_executable('ninja') or find_executable('ninja-build')
if not ninja:
print('Ninja executable (ninja or ninja-build) not found on PATH\n')
@@ -1441,6 +1319,7 @@ abseil_libs = ['absl/' + lib for lib in [
'base/libabsl_malloc_internal.a',
'base/libabsl_spinlock_wait.a',
'base/libabsl_base.a',
'base/libabsl_dynamic_annotations.a',
'base/libabsl_raw_logging_internal.a',
'base/libabsl_exponential_biased.a',
'base/libabsl_throw_delegate.a']]
@@ -1452,15 +1331,18 @@ libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-latomic', '-l
# Must link with static version of libzstd, since
# experimental APIs that we use are only present there.
maybe_static(True, '-lzstd'),
maybe_static(args.staticboost, '-lboost_date_time -lboost_regex -licuuc -licui18n'),
'-lxxhash'])
maybe_static(args.staticboost, '-lboost_date_time -lboost_regex -licuuc'), ])
pkgconfig_libs = [
'libxxhash',
]
args.user_cflags += ' ' + ' '.join([pkg_config(lib, '--cflags') for lib in pkgconfig_libs])
libs += ' ' + ' '.join([pkg_config(lib, '--libs') for lib in pkgconfig_libs])
if not args.staticboost:
args.user_cflags += ' -DBOOST_TEST_DYN_LINK'
if build_raft:
args.user_cflags += ' -DENABLE_SCYLLA_RAFT'
# thrift version detection, see #4538
proc_res = subprocess.run(["thrift", "-version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
proc_res_output = proc_res.stdout.decode("utf-8")
@@ -1485,9 +1367,13 @@ if args.staticthrift:
else:
thrift_libs = "-lthrift"
outdir = 'build'
buildfile = 'build.ninja'
os.makedirs(outdir, exist_ok=True)
do_sanitize = True
if args.static:
do_sanitize = False
if args.antlr3_exec:
antlr3_exec = args.antlr3_exec
@@ -1513,7 +1399,7 @@ with open(buildfile_tmp, 'w') as f:
configure_args = {configure_args}
builddir = {outdir}
cxx = {cxx}
cxxflags = --std=gnu++20 {user_cflags} {warnings} {defines}
cxxflags = {user_cflags} {warnings} {defines}
ldflags = {linker_flags} {user_ldflags}
ldflags_build = {linker_flags}
libs = {libs}
@@ -1543,7 +1429,7 @@ with open(buildfile_tmp, 'w') as f:
command = $in > $out
description = GEN $out
rule copy
command = cp --reflink=auto $in $out
command = cp $in $out
description = COPY $out
rule package
command = scripts/create-relocatable-package.py --mode $mode $out
@@ -1551,8 +1437,6 @@ with open(buildfile_tmp, 'w') as f:
command = reloc/build_rpm.sh --reloc-pkg $in --builddir $out
rule debbuild
command = reloc/build_deb.sh --reloc-pkg $in --builddir $out
rule unified
command = unified/build_unified.sh --mode $mode --unified-pkg $out
''').format(**globals()))
for mode in build_modes:
modeval = modes[mode]
@@ -1585,49 +1469,45 @@ with open(buildfile_tmp, 'w') as f:
rule thrift.{mode}
command = thrift -gen cpp:cob_style -out $builddir/{mode}/gen $in
description = THRIFT $in
restat = 1
rule antlr3.{mode}
# We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.
# Because we add such a variable to every function, and because `ExceptionBaseType` is not a global
# name, we also add a global typedef to avoid compilation errors.
command = sed -e '/^#if 0/,/^#endif/d' $in > $builddir/{mode}/gen/$in $
&& {antlr3_exec} $builddir/{mode}/gen/$in $
&& sed -i -e '/^.*On :.*$$/d' $builddir/{mode}/gen/${{stem}}Lexer.hpp $
&& sed -i -e '/^.*On :.*$$/d' $builddir/{mode}/gen/${{stem}}Lexer.cpp $
&& sed -i -e '/^.*On :.*$$/d' $builddir/{mode}/gen/${{stem}}Parser.hpp $
&& sed -i -e '/^.*On :.*$$/d' build/{mode}/gen/${{stem}}Lexer.hpp $
&& sed -i -e '/^.*On :.*$$/d' build/{mode}/gen/${{stem}}Lexer.cpp $
&& sed -i -e '/^.*On :.*$$/d' build/{mode}/gen/${{stem}}Parser.hpp $
&& sed -i -e 's/^\\( *\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$$/\\1const \\2/' $
-e '/^.*On :.*$$/d' $
-e '1i using ExceptionBaseType = int;' $
-e 's/^{{/{{ ExceptionBaseType\* ex = nullptr;/; $
s/ExceptionBaseType\* ex = new/ex = new/; $
s/exceptions::syntax_exception e/exceptions::syntax_exception\& e/' $
$builddir/{mode}/gen/${{stem}}Parser.cpp
build/{mode}/gen/${{stem}}Parser.cpp
description = ANTLR3 $in
rule checkhh.{mode}
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags --include $in -c -o $out $builddir/{mode}/gen/empty.cc
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags --include $in -c -o $out build/{mode}/gen/empty.cc
description = CHECKHH $in
depfile = $out.d
rule test.{mode}
command = ./test.py --mode={mode} --repeat={test_repeat} --timeout={test_timeout}
pool = console
command = ./test.py --mode={mode}
description = TEST {mode}
''').format(mode=mode, antlr3_exec=antlr3_exec, fmt_lib=fmt_lib, test_repeat=test_repeat, test_timeout=test_timeout, **modeval))
''').format(mode=mode, antlr3_exec=antlr3_exec, fmt_lib=fmt_lib, **modeval))
f.write(
'build {mode}-build: phony {artifacts}\n'.format(
'build {mode}: phony {artifacts}\n'.format(
mode=mode,
artifacts=str.join(' ', ('$builddir/' + mode + '/' + x for x in build_artifacts))
)
)
include_dist_target = f'dist-{mode}' if args.enable_dist is None or args.enable_dist else ''
f.write(f'build {mode}: phony {mode}-build {include_dist_target}\n')
compiles = {}
swaggers = set()
serializers = {}
thrifts = set()
ragels = {}
antlr3_grammars = set()
seastar_dep = '$builddir/{}/seastar/libseastar.a'.format(mode)
seastar_testing_dep = '$builddir/{}/seastar/libseastar_testing.a'.format(mode)
seastar_dep = 'build/{}/seastar/libseastar.a'.format(mode)
seastar_testing_dep = 'build/{}/seastar/libseastar_testing.a'.format(mode)
for binary in build_artifacts:
if binary in other:
continue
@@ -1715,7 +1595,7 @@ with open(buildfile_tmp, 'w') as f:
)
f.write(
'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/test/tools/cql_repl $builddir/{mode}/scylla\n'.format(
'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/test/tools/cql_repl\n'.format(
mode=mode,
test_executables=' '.join(['$builddir/{}/{}'.format(mode, binary) for binary in tests]),
)
@@ -1775,136 +1655,106 @@ with open(buildfile_tmp, 'w') as f:
if has_sanitize_address_use_after_scope:
flags += ' -fno-sanitize-address-use-after-scope'
f.write(' obj_cxxflags = %s\n' % flags)
f.write(f'build $builddir/{mode}/gen/empty.cc: gen\n')
f.write(f'build build/{mode}/gen/empty.cc: gen\n')
for hh in headers:
f.write('build $builddir/{mode}/{hh}.o: checkhh.{mode} {hh} | $builddir/{mode}/gen/empty.cc || {gen_headers_dep}\n'.format(
f.write('build $builddir/{mode}/{hh}.o: checkhh.{mode} {hh} | build/{mode}/gen/empty.cc || {gen_headers_dep}\n'.format(
mode=mode, hh=hh, gen_headers_dep=gen_headers_dep))
f.write('build $builddir/{mode}/seastar/libseastar.a: ninja | always\n'
f.write('build build/{mode}/seastar/libseastar.a: ninja | always\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = $builddir/{mode}/seastar\n'.format(**locals()))
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar\n'.format(**locals()))
f.write('build $builddir/{mode}/seastar/libseastar_testing.a: ninja | always\n'
f.write('build build/{mode}/seastar/libseastar_testing.a: ninja | always\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = $builddir/{mode}/seastar\n'.format(**locals()))
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar_testing\n'.format(**locals()))
f.write('build $builddir/{mode}/seastar/apps/iotune/iotune: ninja\n'
f.write('build build/{mode}/seastar/apps/iotune/iotune: ninja\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = $builddir/{mode}/seastar\n'.format(**locals()))
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = iotune\n'.format(**locals()))
f.write(textwrap.dedent('''\
build $builddir/{mode}/iotune: copy $builddir/{mode}/seastar/apps/iotune/iotune
build build/{mode}/iotune: copy build/{mode}/seastar/apps/iotune/iotune
''').format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz: package $builddir/{mode}/scylla $builddir/{mode}/iotune $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian | always\n'.format(**locals()))
f.write('build build/{mode}/scylla-package.tar.gz: package build/{mode}/scylla build/{mode}/iotune build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE build/debian/debian | always\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' mode = {mode}\n'.format(**locals()))
f.write(f'build $builddir/dist/{mode}/redhat: rpmbuild $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz\n')
f.write(f'build build/dist/{mode}/redhat: rpmbuild build/{mode}/scylla-package.tar.gz\n')
f.write(f' pool = submodule_pool\n')
f.write(f' mode = {mode}\n')
f.write(f'build $builddir/dist/{mode}/debian: debbuild $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz\n')
f.write(f' mode = {mode}\n')
f.write(f'build dist-server-{mode}: phony $builddir/dist/{mode}/redhat $builddir/dist/{mode}/debian\n')
f.write(f'build dist-jmx-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz dist-jmx-rpm dist-jmx-deb\n')
f.write(f'build dist-tools-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-tools-package.tar.gz dist-tools-rpm dist-tools-deb\n')
f.write(f'build dist-python3-{mode}: phony dist-python3-tar dist-python3-rpm dist-python3-deb compat-python3-rpm compat-python3-deb\n')
f.write(f'build dist-unified-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-unified-package-{scylla_version}.{scylla_release}.tar.gz\n')
f.write(f'build $builddir/{mode}/dist/tar/{scylla_product}-unified-package-{scylla_version}.{scylla_release}.tar.gz: unified $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz $builddir/{mode}/dist/tar/{scylla_product}-python3-package.tar.gz $builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz $builddir/{mode}/dist/tar/{scylla_product}-tools-package.tar.gz | always\n')
f.write(f'build build/dist/{mode}/debian: debbuild build/{mode}/scylla-package.tar.gz\n')
f.write(f' pool = submodule_pool\n')
f.write(f' mode = {mode}\n')
f.write(f'build dist-server-{mode}: phony build/dist/{mode}/redhat build/dist/{mode}/debian\n')
f.write('rule libdeflate.{mode}\n'.format(**locals()))
f.write(' command = make -C libdeflate BUILD_DIR=../$builddir/{mode}/libdeflate/ CFLAGS="{libdeflate_cflags}" CC={args.cc} ../$builddir/{mode}/libdeflate//libdeflate.a\n'.format(**locals()))
f.write('build $builddir/{mode}/libdeflate/libdeflate.a: libdeflate.{mode}\n'.format(**locals()))
f.write(' command = make -C libdeflate BUILD_DIR=../build/{mode}/libdeflate/ CFLAGS="{libdeflate_cflags}" CC={args.cc} ../build/{mode}/libdeflate//libdeflate.a\n'.format(**locals()))
f.write('build build/{mode}/libdeflate/libdeflate.a: libdeflate.{mode}\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
for lib in abseil_libs:
f.write('build $builddir/{mode}/abseil/{lib}: ninja\n'.format(**locals()))
f.write('build build/{mode}/abseil/{lib}: ninja\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = $builddir/{mode}/abseil\n'.format(**locals()))
f.write(' subdir = build/{mode}/abseil\n'.format(**locals()))
f.write(' target = {lib}\n'.format(**locals()))
checkheaders_mode = 'dev' if 'dev' in modes else modes[0]
f.write('build checkheaders: phony || {}\n'.format(' '.join(['$builddir/{}/{}.o'.format(checkheaders_mode, hh) for hh in headers])))
mode = 'dev' if 'dev' in modes else modes[0]
f.write('build checkheaders: phony || {}\n'.format(' '.join(['$builddir/{}/{}.o'.format(mode, hh) for hh in headers])))
f.write(
'build build: phony {}\n'.format(' '.join([f'{mode}-build' for mode in build_modes]))
'build test: phony {}\n'.format(' '.join(['{mode}-test'.format(mode=mode) for mode in modes]))
)
f.write(
'build test: phony {}\n'.format(' '.join(['{mode}-test'.format(mode=mode) for mode in build_modes]))
)
f.write(
'build check: phony {}\n'.format(' '.join(['{mode}-check'.format(mode=mode) for mode in build_modes]))
'build check: phony {}\n'.format(' '.join(['{mode}-check'.format(mode=mode) for mode in modes]))
)
f.write(textwrap.dedent(f'''\
build dist-unified-tar: phony {' '.join([f'$builddir/{mode}/dist/tar/{scylla_product}-unified-package-{scylla_version}.{scylla_release}.tar.gz' for mode in build_modes])}
build dist-unified: phony dist-unified-tar
build dist-server-deb: phony {' '.join(['$builddir/dist/{mode}/debian'.format(mode=mode) for mode in build_modes])}
build dist-server-rpm: phony {' '.join(['$builddir/dist/{mode}/redhat'.format(mode=mode) for mode in build_modes])}
build dist-server-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz'.format(mode=mode, scylla_product=scylla_product) for mode in build_modes])}
build dist-server: phony dist-server-tar dist-server-rpm dist-server-deb
build dist-server-deb: phony {' '.join(['build/dist/{mode}/debian'.format(mode=mode) for mode in build_modes])}
build dist-server-rpm: phony {' '.join(['build/dist/{mode}/redhat'.format(mode=mode) for mode in build_modes])}
build dist-server: phony dist-server-rpm dist-server-deb
rule build-submodule-reloc
command = cd $reloc_dir && ./reloc/build_reloc.sh --version $$(<../../build/SCYLLA-PRODUCT-FILE)-$$(<../../build/SCYLLA-VERSION-FILE)-$$(<../../build/SCYLLA-RELEASE-FILE) --nodeps $args
command = cd $reloc_dir && ./reloc/build_reloc.sh
rule build-submodule-rpm
command = cd $dir && ./reloc/build_rpm.sh --reloc-pkg $artifact
rule build-submodule-deb
command = cd $dir && ./reloc/build_deb.sh --reloc-pkg $artifact
build tools/jmx/build/{scylla_product}-jmx-package.tar.gz: build-submodule-reloc
reloc_dir = tools/jmx
build dist-jmx-rpm: build-submodule-rpm tools/jmx/build/{scylla_product}-jmx-package.tar.gz
dir = tools/jmx
artifact = $builddir/{scylla_product}-jmx-package.tar.gz
build dist-jmx-deb: build-submodule-deb tools/jmx/build/{scylla_product}-jmx-package.tar.gz
dir = tools/jmx
artifact = $builddir/{scylla_product}-jmx-package.tar.gz
build dist-jmx-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz'.format(mode=mode, scylla_product=scylla_product) for mode in build_modes])}
build dist-jmx: phony dist-jmx-tar dist-jmx-rpm dist-jmx-deb
build scylla-jmx/build/scylla-jmx-package.tar.gz: build-submodule-reloc
reloc_dir = scylla-jmx
build dist-jmx-rpm: build-submodule-rpm scylla-jmx/build/scylla-jmx-package.tar.gz
dir = scylla-jmx
artifact = build/scylla-jmx-package.tar.gz
build dist-jmx-deb: build-submodule-deb scylla-jmx/build/scylla-jmx-package.tar.gz
dir = scylla-jmx
artifact = build/scylla-jmx-package.tar.gz
build dist-jmx: phony dist-jmx-rpm dist-jmx-deb
build tools/java/build/{scylla_product}-tools-package.tar.gz: build-submodule-reloc
reloc_dir = tools/java
build dist-tools-rpm: build-submodule-rpm tools/java/build/{scylla_product}-tools-package.tar.gz
dir = tools/java
artifact = $builddir/{scylla_product}-tools-package.tar.gz
build dist-tools-deb: build-submodule-deb tools/java/build/{scylla_product}-tools-package.tar.gz
dir = tools/java
artifact = $builddir/{scylla_product}-tools-package.tar.gz
build dist-tools-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-tools-package.tar.gz'.format(mode=mode, scylla_product=scylla_product) for mode in build_modes])}
build dist-tools: phony dist-tools-tar dist-tools-rpm dist-tools-deb
build scylla-tools/build/scylla-tools-package.tar.gz: build-submodule-reloc
reloc_dir = scylla-tools
build dist-tools-rpm: build-submodule-rpm scylla-tools/build/scylla-tools-package.tar.gz
dir = scylla-tools
artifact = build/scylla-tools-package.tar.gz
build dist-tools-deb: build-submodule-deb scylla-tools/build/scylla-tools-package.tar.gz
dir = scylla-tools
artifact = build/scylla-tools-package.tar.gz
build dist-tools: phony dist-tools-rpm dist-tools-deb
rule compat-python3-reloc
command = mkdir -p $builddir/release && ln -f $dir/$artifact $builddir/release/
rule compat-python3-rpm
command = cd $dir && ./reloc/build_rpm.sh --reloc-pkg $artifact --builddir ../../build/redhat
rule compat-python3-deb
command = cd $dir && ./reloc/build_deb.sh --reloc-pkg $artifact --builddir ../../build/debian
build $builddir/release/{scylla_product}-python3-package.tar.gz: compat-python3-reloc tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build compat-python3-rpm: compat-python3-rpm tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build compat-python3-deb: compat-python3-deb tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/{scylla_product}-python3-package.tar.gz
rule build-python-reloc
command = ./reloc/python3/build_reloc.sh
rule build-python-rpm
command = ./reloc/python3/build_rpm.sh
rule build-python-deb
command = ./reloc/python3/build_deb.sh
build tools/python3/build/{scylla_product}-python3-package.tar.gz: build-submodule-reloc
reloc_dir = tools/python3
args = --packages "{python3_dependencies}"
build dist-python3-rpm: build-submodule-rpm tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build dist-python3-deb: build-submodule-deb tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build dist-python3-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-python3-package.tar.gz'.format(mode=mode, scylla_product=scylla_product) for mode in build_modes])}
build dist-python3: phony dist-python3-tar dist-python3-rpm dist-python3-deb $builddir/release/{scylla_product}-python3-package.tar.gz compat-python3-rpm compat-python3-deb
build dist-deb: phony dist-server-deb dist-python3-deb dist-jmx-deb dist-tools-deb
build dist-rpm: phony dist-server-rpm dist-python3-rpm dist-jmx-rpm dist-tools-rpm
build dist-tar: phony dist-unified-tar dist-server-tar dist-python3-tar dist-jmx-tar dist-tools-tar
build dist: phony dist-unified dist-server dist-python3 dist-jmx dist-tools
build build/release/scylla-python3-package.tar.gz: build-python-reloc
build dist-python-rpm: build-python-rpm build/release/scylla-python3-package.tar.gz
build dist-python-deb: build-python-deb build/release/scylla-python3-package.tar.gz
build dist-python: phony dist-python-rpm dist-python-deb
build dist-deb: phony dist-server-deb dist-python-deb dist-jmx-deb dist-tools-deb
build dist-rpm: phony dist-server-rpm dist-python-rpm dist-jmx-rpm dist-tools-rpm
build dist: phony dist-server dist-python dist-jmx dist-tools
'''))
f.write(textwrap.dedent(f'''\
@@ -1914,11 +1764,6 @@ with open(buildfile_tmp, 'w') as f:
'''))
for mode in build_modes:
f.write(textwrap.dedent(f'''\
build $builddir/{mode}/dist/tar/{scylla_product}-python3-package.tar.gz: copy tools/python3/build/{scylla_product}-python3-package.tar.gz
build $builddir/{mode}/dist/tar/{scylla_product}-tools-package.tar.gz: copy tools/java/build/{scylla_product}-tools-package.tar.gz
build $builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz: copy tools/jmx/build/{scylla_product}-jmx-package.tar.gz
build dist-{mode}: phony dist-server-{mode} dist-python3-{mode} dist-tools-{mode} dist-jmx-{mode} dist-unified-{mode}
build dist-check-{mode}: dist-check
mode = {mode}
'''))
@@ -1942,21 +1787,14 @@ with open(buildfile_tmp, 'w') as f:
build mode_list: mode_list
default {modes_list}
''').format(modes_list=' '.join(default_modes), **globals()))
unit_test_list = set(test for test in build_artifacts if test in set(tests))
f.write(textwrap.dedent('''\
rule unit_test_list
command = /usr/bin/env echo -e '{unit_test_list}'
description = List configured unit tests
build unit_test_list: unit_test_list
''').format(unit_test_list="\\n".join(unit_test_list)))
f.write(textwrap.dedent('''\
build always: phony
rule scylla_version_gen
command = ./SCYLLA-VERSION-GEN
build $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE: scylla_version_gen
build build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE: scylla_version_gen
rule debian_files_gen
command = ./dist/debian/debian_files_gen.py
build $builddir/debian/debian: debian_files_gen | always
''').format(**globals()))
build build/debian/debian: debian_files_gen | always
''').format(modes_list=' '.join(build_modes), **globals()))
os.rename(buildfile_tmp, buildfile)

View File

@@ -20,47 +20,44 @@
*/
#include "connection_notifier.hh"
#include "db/query_context.hh"
#include "cql3/constants.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <stdexcept>
sstring to_string(client_type ct) {
namespace db::system_keyspace {
extern const char *const CLIENTS;
}
static sstring to_string(client_type ct) {
switch (ct) {
case client_type::cql: return "cql";
case client_type::thrift: return "thrift";
case client_type::alternator: return "alternator";
default: throw std::runtime_error("Invalid client_type");
}
throw std::runtime_error("Invalid client_type");
}
static sstring to_string(client_connection_stage ccs) {
switch (ccs) {
case client_connection_stage::established: return connection_stage_literal<client_connection_stage::established>;
case client_connection_stage::authenticating: return connection_stage_literal<client_connection_stage::authenticating>;
case client_connection_stage::ready: return connection_stage_literal<client_connection_stage::ready>;
}
throw std::runtime_error("Invalid client_connection_stage");
}
future<> notify_new_client(client_data cd) {
// FIXME: consider prepared statement
const static sstring req
= format("INSERT INTO system.{} (address, port, client_type, connection_stage, shard_id, protocol_version, username) "
"VALUES (?, ?, ?, ?, ?, ?, ?);", db::system_keyspace::CLIENTS);
= format("INSERT INTO system.{} (address, port, client_type, shard_id, protocol_version, username) "
"VALUES (?, ?, ?, ?, ?, ?);", db::system_keyspace::CLIENTS);
return db::qctx->execute_cql(req,
std::move(cd.ip), cd.port, to_string(cd.ct), to_string(cd.connection_stage), cd.shard_id,
return db::execute_cql(req,
std::move(cd.ip), cd.port, to_string(cd.ct), cd.shard_id,
cd.protocol_version.has_value() ? data_value(*cd.protocol_version) : data_value::make_null(int32_type),
cd.username.value_or("anonymous")).discard_result();
}
future<> notify_disconnected_client(net::inet_address addr, int port, client_type ct) {
future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port) {
// FIXME: consider prepared statement
const static sstring req
= format("DELETE FROM system.{} where address=? AND port=? AND client_type=?;",
db::system_keyspace::CLIENTS);
return db::qctx->execute_cql(req, std::move(addr), port, to_string(ct)).discard_result();
return db::execute_cql(req, addr.addr(), port, to_string(ct)).discard_result();
}
future<> clear_clientlist() {

View File

@@ -20,65 +20,27 @@
*/
#pragma once
#include "db/query_context.hh"
#include <seastar/net/inet_address.hh>
#include "gms/inet_address.hh"
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include <optional>
namespace db::system_keyspace {
extern const char *const CLIENTS;
}
enum class client_type {
cql = 0,
thrift,
alternator,
};
sstring to_string(client_type ct);
enum class changed_column {
username = 0,
connection_stage,
driver_name,
driver_version,
hostname,
protocol_version,
};
template <changed_column column> constexpr const char* column_literal = "";
template <> inline constexpr const char* column_literal<changed_column::username> = "username";
template <> inline constexpr const char* column_literal<changed_column::connection_stage> = "connection_stage";
template <> inline constexpr const char* column_literal<changed_column::driver_name> = "driver_name";
template <> inline constexpr const char* column_literal<changed_column::driver_version> = "driver_version";
template <> inline constexpr const char* column_literal<changed_column::hostname> = "hostname";
template <> inline constexpr const char* column_literal<changed_column::protocol_version> = "protocol_version";
enum class client_connection_stage {
established = 0,
authenticating,
ready,
};
template <client_connection_stage ccs> constexpr const char* connection_stage_literal = "";
template <> inline constexpr const char* connection_stage_literal<client_connection_stage::established> = "ESTABLISHED";
template <> inline constexpr const char* connection_stage_literal<client_connection_stage::authenticating> = "AUTHENTICATING";
template <> inline constexpr const char* connection_stage_literal<client_connection_stage::ready> = "READY";
// Representation of a row in `system.clients'. std::optionals are for nullable cells.
struct client_data {
net::inet_address ip;
gms::inet_address ip;
int32_t port;
client_type ct;
client_connection_stage connection_stage = client_connection_stage::established;
int32_t shard_id; /// ID of server-side shard which is processing the connection.
// `optional' column means that it's nullable (possibly because it's
// unimplemented yet). If you want to fill ("implement") any of them,
// remember to update the query in `notify_new_client()'.
std::optional<sstring> connection_stage;
std::optional<sstring> driver_name;
std::optional<sstring> driver_version;
std::optional<sstring> hostname;
@@ -90,17 +52,6 @@ struct client_data {
};
future<> notify_new_client(client_data cd);
future<> notify_disconnected_client(net::inet_address addr, int port, client_type ct);
future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port);
future<> clear_clientlist();
template <changed_column column_enum_val>
struct notify_client_change {
template <typename T>
future<> operator()(net::inet_address addr, int port, client_type ct, T&& value) {
const static sstring req
= format("UPDATE system.{} SET {}=? WHERE address=? AND port=? AND client_type=?;",
db::system_keyspace::CLIENTS, column_literal<column_enum_val>);
return db::qctx->execute_cql(req, std::forward<T>(value), std::move(addr), port, to_string(ct)).discard_result();
}
};

View File

@@ -29,6 +29,15 @@ counter_id counter_id::local()
return counter_id(service::get_local_storage_service().get_local_id());
}
bool counter_id::less_compare_1_7_4::operator()(const counter_id& a, const counter_id& b) const
{
if (a._most_significant != b._most_significant) {
return a._most_significant < b._most_significant;
} else {
return a._least_significant < b._least_significant;
}
}
std::ostream& operator<<(std::ostream& os, const counter_id& id) {
return os << id.to_uuid();
}
@@ -59,6 +68,16 @@ void counter_cell_builder::do_sort_and_remove_duplicates()
_sorted = true;
}
std::vector<counter_shard> counter_cell_view::shards_compatible_with_1_7_4() const
{
auto sorted_shards = boost::copy_range<std::vector<counter_shard>>(shards());
counter_id::less_compare_1_7_4 cmp;
boost::range::sort(sorted_shards, [&] (auto& a, auto& b) {
return cmp(a.id(), b.id());
});
return sorted_shards;
}
static bool apply_in_place(const column_definition& cdef, atomic_cell_mutable_view dst, atomic_cell_mutable_view src)
{
auto dst_ccmv = counter_cell_mutable_view(dst);

View File

@@ -60,6 +60,11 @@ public:
bool operator!=(const counter_id& other) const {
return !(*this == other);
}
public:
// (Wrong) Counter ID ordering used by Scylla 1.7.4 and earlier.
struct less_compare_1_7_4 {
bool operator()(const counter_id& a, const counter_id& b) const;
};
public:
static counter_id local();
@@ -181,7 +186,7 @@ public:
int64_t logical_clock() const { return _logical_clock; }
counter_shard& update(int64_t value_delta, int64_t clock_increment) noexcept {
_value = uint64_t(_value) + uint64_t(value_delta); // signed int overflow is undefined hence the cast
_value += value_delta;
_logical_clock += clock_increment;
return *this;
}
@@ -277,14 +282,7 @@ public:
return ac;
}
class inserter_iterator {
public:
using iterator_category = std::output_iterator_tag;
using value_type = counter_shard;
using difference_type = std::ptrdiff_t;
using pointer = counter_shard*;
using reference = counter_shard&;
private:
class inserter_iterator : public std::iterator<std::output_iterator_tag, counter_shard> {
counter_cell_builder* _builder;
public:
explicit inserter_iterator(counter_cell_builder& b) : _builder(&b) { }
@@ -293,7 +291,7 @@ public:
return *this;
}
inserter_iterator& operator=(const counter_shard_view& csv) {
return this->operator=(counter_shard(csv));
return operator=(counter_shard(csv));
}
inserter_iterator& operator++() { return *this; }
inserter_iterator& operator++(int) { return *this; }
@@ -318,14 +316,7 @@ protected:
basic_atomic_cell_view<is_mutable> _cell;
linearized_value_view _value;
private:
class shard_iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = basic_counter_shard_view<is_mutable>;
using difference_type = std::ptrdiff_t;
using pointer = basic_counter_shard_view<is_mutable>*;
using reference = basic_counter_shard_view<is_mutable>&;
private:
class shard_iterator : public std::iterator<std::input_iterator_tag, basic_counter_shard_view<is_mutable>> {
pointer_type _current;
basic_counter_shard_view<is_mutable> _current_view;
public:
@@ -426,6 +417,9 @@ struct counter_cell_view : basic_counter_cell_view<mutable_view::no> {
});
}
// Returns counter shards in an order that is compatible with Scylla 1.7.4.
std::vector<counter_shard> shards_compatible_with_1_7_4() const;
// Reversibly applies two counter cells, at least one of them must be live.
static void apply(const column_definition& cdef, atomic_cell_or_collection& dst, atomic_cell_or_collection& src);

View File

@@ -93,7 +93,6 @@ options {
#include "cql3/ut_name.hh"
#include "cql3/functions/function_name.hh"
#include "cql3/functions/function_call.hh"
#include "cql3/expr/expression.hh"
#include <seastar/core/sstring.hh>
#include "CqlLexer.hpp"
@@ -1333,7 +1332,7 @@ setOrMapLiteral[shared_ptr<cql3::term::raw> t] returns [shared_ptr<cql3::term::r
{ $value = ::make_shared<cql3::maps::literal>(std::move(m)); }
| { s.push_back(t); }
( ',' tn=term { s.push_back(tn); } )*
{ $value = ::make_shared<cql3::sets::literal>(std::move(s)); }
{ $value = make_shared(cql3::sets::literal(std::move(s))); }
;
collectionLiteral returns [shared_ptr<cql3::term::raw> value]
@@ -1344,7 +1343,7 @@ collectionLiteral returns [shared_ptr<cql3::term::raw> value]
| '{' t=term v=setOrMapLiteral[t] { $value = v; } '}'
// Note that we have an ambiguity between maps and set for "{}". So we force it to a set literal,
// and deal with it later based on the type of the column (SetLiteral.java).
| '{' '}' { $value = ::make_shared<cql3::sets::literal>(std::vector<shared_ptr<cql3::term::raw>>()); }
| '{' '}' { $value = make_shared(cql3::sets::literal({})); }
;
usertypeLiteral returns [shared_ptr<cql3::user_types::literal> ut]
@@ -1475,13 +1474,13 @@ udtColumnOperation[operations_type& operations,
columnCondition[conditions_type& conditions]
// Note: we'll reject duplicates later
: key=cident
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, {}, op)); }
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, {}, *op)); }
| K_IN
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::in_condition({}, {}, values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::in_condition({}, marker, {})); }
)
| '[' element=term ']'
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, element, op)); }
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, element, *op)); }
| K_IN
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::in_condition(element, {}, values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::in_condition(element, marker, {})); }
@@ -1504,31 +1503,31 @@ propertyValue returns [sstring str]
| u=unreserved_keyword { $str = u; }
;
relationType returns [cql3::expr::oper_t op]
: '=' { $op = cql3::expr::oper_t::EQ; }
| '<' { $op = cql3::expr::oper_t::LT; }
| '<=' { $op = cql3::expr::oper_t::LTE; }
| '>' { $op = cql3::expr::oper_t::GT; }
| '>=' { $op = cql3::expr::oper_t::GTE; }
| '!=' { $op = cql3::expr::oper_t::NEQ; }
| K_LIKE { $op = cql3::expr::oper_t::LIKE; }
relationType returns [const cql3::operator_type* op = nullptr]
: '=' { $op = &cql3::operator_type::EQ; }
| '<' { $op = &cql3::operator_type::LT; }
| '<=' { $op = &cql3::operator_type::LTE; }
| '>' { $op = &cql3::operator_type::GT; }
| '>=' { $op = &cql3::operator_type::GTE; }
| '!=' { $op = &cql3::operator_type::NEQ; }
| K_LIKE { $op = &cql3::operator_type::LIKE; }
;
relation[std::vector<cql3::relation_ptr>& clauses]
@init{ cql3::expr::oper_t rt; }
: name=cident type=relationType t=term { $clauses.emplace_back(::make_shared<cql3::single_column_relation>(std::move(name), type, std::move(t))); }
@init{ const cql3::operator_type* rt = nullptr; }
: name=cident type=relationType t=term { $clauses.emplace_back(::make_shared<cql3::single_column_relation>(std::move(name), *type, std::move(t))); }
| K_TOKEN l=tupleOfIdentifiers type=relationType t=term
{ $clauses.emplace_back(::make_shared<cql3::token_relation>(std::move(l), type, std::move(t))); }
{ $clauses.emplace_back(::make_shared<cql3::token_relation>(std::move(l), *type, std::move(t))); }
| name=cident K_IS K_NOT K_NULL {
$clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), cql3::expr::oper_t::IS_NOT, cql3::constants::NULL_LITERAL)); }
$clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), cql3::operator_type::IS_NOT, cql3::constants::NULL_LITERAL)); }
| name=cident K_IN marker=inMarker
{ $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), cql3::expr::oper_t::IN, std::move(marker))); }
{ $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), cql3::operator_type::IN, std::move(marker))); }
| name=cident K_IN in_values=singleColumnInValues
{ $clauses.emplace_back(cql3::single_column_relation::create_in_relation(std::move(name), std::move(in_values))); }
| name=cident K_CONTAINS { rt = cql3::expr::oper_t::CONTAINS; } (K_KEY { rt = cql3::expr::oper_t::CONTAINS_KEY; })?
t=term { $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), rt, std::move(t))); }
| name=cident '[' key=term ']' type=relationType t=term { $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), std::move(key), type, std::move(t))); }
| name=cident K_CONTAINS { rt = &cql3::operator_type::CONTAINS; } (K_KEY { rt = &cql3::operator_type::CONTAINS_KEY; })?
t=term { $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), *rt, std::move(t))); }
| name=cident '[' key=term ']' type=relationType t=term { $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), std::move(key), *type, std::move(t))); }
| ids=tupleOfIdentifiers
( K_IN
( '(' ')'
@@ -1544,10 +1543,10 @@ relation[std::vector<cql3::relation_ptr>& clauses]
)
| type=relationType literal=tupleLiteral /* (a, b, c) > (1, 2, 3) or (a, b, c) > (?, ?, ?) */
{
$clauses.emplace_back(cql3::multi_column_relation::create_non_in_relation(ids, type, literal));
$clauses.emplace_back(cql3::multi_column_relation::create_non_in_relation(ids, *type, literal));
}
| type=relationType tupleMarker=markerForTuple /* (a, b, c) >= ? */
{ $clauses.emplace_back(cql3::multi_column_relation::create_non_in_relation(ids, type, tupleMarker)); }
{ $clauses.emplace_back(cql3::multi_column_relation::create_non_in_relation(ids, *type, tupleMarker)); }
)
| '(' relation[$clauses] ')'
;
@@ -1695,7 +1694,7 @@ username returns [sstring str]
// Basically the same as cident, but we need to exlude existing CQL3 types
// (which for some reason are not reserved otherwise)
non_type_ident returns [shared_ptr<cql3::column_identifier> id]
: t=IDENT { if (_reserved_type_names().contains($t.text)) { add_recognition_error("Invalid (reserved) user type name " + $t.text); } $id = ::make_shared<cql3::column_identifier>($t.text, false); }
: t=IDENT { if (_reserved_type_names().count($t.text)) { add_recognition_error("Invalid (reserved) user type name " + $t.text); } $id = ::make_shared<cql3::column_identifier>($t.text, false); }
| t=QUOTED_NAME { $id = ::make_shared<cql3::column_identifier>($t.text, true); }
| k=basic_unreserved_keyword { $id = ::make_shared<cql3::column_identifier>(k, false); }
| kk=K_KEY { $id = ::make_shared<cql3::column_identifier>($kk.text, false); }

View File

@@ -52,6 +52,11 @@ attributes::attributes(::shared_ptr<term>&& timestamp, ::shared_ptr<term>&& time
, _time_to_live{std::move(time_to_live)}
{ }
bool attributes::uses_function(const sstring& ks_name, const sstring& function_name) const {
return (_timestamp && _timestamp->uses_function(ks_name, function_name))
|| (_time_to_live && _time_to_live->uses_function(ks_name, function_name));
}
bool attributes::is_timestamp_set() const {
return bool(_timestamp);
}

View File

@@ -59,6 +59,8 @@ public:
private:
attributes(::shared_ptr<term>&& timestamp, ::shared_ptr<term>&& time_to_live);
public:
bool uses_function(const sstring& ks_name, const sstring& function_name) const;
bool is_timestamp_set() const;
bool is_time_to_live_set() const;

View File

@@ -48,14 +48,13 @@
#include "types/map.hh"
#include "types/list.hh"
#include "utils/like_matcher.hh"
#include "expr/expression.hh"
namespace {
void validate_operation_on_durations(const abstract_type& type, cql3::expr::oper_t op) {
void validate_operation_on_durations(const abstract_type& type, const cql3::operator_type& op) {
using cql3::statements::request_validations::check_false;
if (is_slice(op) && type.references_duration()) {
if (op.is_slice() && type.references_duration()) {
check_false(type.is_collection(), "Slice conditions are not supported on collections containing durations");
check_false(type.is_tuple(), "Slice conditions are not supported on tuples containing durations");
check_false(type.is_user_type(), "Slice conditions are not supported on UDTs containing durations");
@@ -65,7 +64,7 @@ void validate_operation_on_durations(const abstract_type& type, cql3::expr::oper
}
}
int is_satisfied_by(cql3::expr::oper_t op, const abstract_type& cell_type,
int is_satisfied_by(const cql3::operator_type &op, const abstract_type& cell_type,
const abstract_type& param_type, const data_value& cell_value, const bytes& param) {
int rc;
@@ -83,24 +82,21 @@ int is_satisfied_by(cql3::expr::oper_t op, const abstract_type& cell_type,
} else {
rc = cell_type.compare(cell_type.decompose(cell_value), param);
}
switch (op) {
using cql3::expr::oper_t;
case oper_t::EQ:
if (op == cql3::operator_type::EQ) {
return rc == 0;
case oper_t::NEQ:
} else if (op == cql3::operator_type::NEQ) {
return rc != 0;
case oper_t::GTE:
} else if (op == cql3::operator_type::GTE) {
return rc >= 0;
case oper_t::LTE:
} else if (op == cql3::operator_type::LTE) {
return rc <= 0;
case oper_t::GT:
} else if (op == cql3::operator_type::GT) {
return rc > 0;
case oper_t::LT:
} else if (op == cql3::operator_type::LT) {
return rc < 0;
default:
assert(false);
return false;
}
assert(false);
return false;
}
// Read the list index from key and check that list index is not
@@ -118,6 +114,24 @@ uint32_t read_and_check_list_index(const cql3::raw_value_view& key) {
namespace cql3 {
bool
column_condition::uses_function(const sstring& ks_name, const sstring& function_name) const {
if (bool(_collection_element) && _collection_element->uses_function(ks_name, function_name)) {
return true;
}
if (bool(_value) && _value->uses_function(ks_name, function_name)) {
return true;
}
if (!_in_values.empty()) {
for (auto&& value : _in_values) {
if (bool(value) && value->uses_function(ks_name, function_name)) {
return true;
}
}
}
return false;
}
void column_condition::collect_marker_specificaton(variable_specifications& bound_names) const {
if (_collection_element) {
_collection_element->collect_marker_specification(bound_names);
@@ -209,7 +223,7 @@ bool column_condition::applies_to(const data_value* cell_value, const query_opti
}
}
if (is_compare(_op)) {
if (_op.is_compare()) {
// <, >, >=, <=, !=
cql3::raw_value_view param = _value->bind_and_get(options);
@@ -217,23 +231,23 @@ bool column_condition::applies_to(const data_value* cell_value, const query_opti
throw exceptions::invalid_request_exception("Invalid 'unset' value in condition");
}
if (param.is_null()) {
if (_op == expr::oper_t::EQ) {
if (_op == operator_type::EQ) {
return cell_value == nullptr;
} else if (_op == expr::oper_t::NEQ) {
} else if (_op == operator_type::NEQ) {
return cell_value != nullptr;
} else {
throw exceptions::invalid_request_exception(format("Invalid comparison with null for operator \"{}\"", _op));
}
} else if (cell_value == nullptr) {
// The condition parameter is not null, so only NEQ can return true
return _op == expr::oper_t::NEQ;
return _op == operator_type::NEQ;
}
// type::validate() is called by bind_and_get(), so it's safe to pass to_bytes() result
// directly to compare.
return is_satisfied_by(_op, *cell_value->type(), *column.type, *cell_value, to_bytes(param));
}
if (_op == expr::oper_t::LIKE) {
if (_op == operator_type::LIKE) {
if (cell_value == nullptr) {
return false;
}
@@ -252,7 +266,7 @@ bool column_condition::applies_to(const data_value* cell_value, const query_opti
}
}
assert(_op == expr::oper_t::IN);
assert(_op == operator_type::IN);
std::vector<bytes_opt> in_values;
@@ -270,7 +284,7 @@ bool column_condition::applies_to(const data_value* cell_value, const query_opti
// If cell value is NULL, IN list must contain NULL or an empty set/list. Otherwise it must contain cell value.
if (cell_value) {
return std::any_of(in_values.begin(), in_values.end(), [this, cell_value] (const bytes_opt& value) {
return value.has_value() && is_satisfied_by(expr::oper_t::EQ, *cell_value->type(), *column.type, *cell_value, *value);
return value.has_value() && is_satisfied_by(operator_type::EQ, *cell_value->type(), *column.type, *cell_value, *value);
});
} else {
return std::any_of(in_values.begin(), in_values.end(), [] (const bytes_opt& value) { return !value.has_value() || value->empty(); });
@@ -311,13 +325,13 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
collection_element_term = _collection_element->prepare(db, keyspace, element_spec);
}
if (is_compare(_op)) {
if (_op.is_compare()) {
validate_operation_on_durations(*receiver.type, _op);
return column_condition::condition(receiver, collection_element_term,
_value->prepare(db, keyspace, value_spec), nullptr, _op);
}
if (_op == expr::oper_t::LIKE) {
if (_op == operator_type::LIKE) {
auto literal_term = dynamic_pointer_cast<constants::literal>(_value);
if (literal_term) {
// Pass matcher object
@@ -334,7 +348,7 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
}
}
if (_op != expr::oper_t::IN) {
if (_op != operator_type::IN) {
throw exceptions::invalid_request_exception(format("Unsupported operator type {} in a condition ", _op));
}

View File

@@ -43,7 +43,7 @@
#include "cql3/term.hh"
#include "cql3/abstract_marker.hh"
#include "cql3/expr/expression.hh"
#include "cql3/operator.hh"
#include "utils/like_matcher.hh"
namespace cql3 {
@@ -67,11 +67,11 @@ private:
// List of terminals for "a IN (value, value, ...)"
std::vector<::shared_ptr<term>> _in_values;
const std::unique_ptr<like_matcher> _matcher;
expr::oper_t _op;
const operator_type& _op;
public:
column_condition(const column_definition& column, ::shared_ptr<term> collection_element,
::shared_ptr<term> value, std::vector<::shared_ptr<term>> in_values,
std::unique_ptr<like_matcher> matcher, expr::oper_t op)
std::unique_ptr<like_matcher> matcher, const operator_type& op)
: column(column)
, _collection_element(std::move(collection_element))
, _value(std::move(value))
@@ -79,7 +79,7 @@ public:
, _matcher(std::move(matcher))
, _op(op)
{
if (op != expr::oper_t::IN) {
if (op != operator_type::IN) {
assert(_in_values.empty());
}
}
@@ -91,6 +91,8 @@ public:
*/
void collect_marker_specificaton(variable_specifications& bound_names) const;
bool uses_function(const sstring& ks_name, const sstring& function_name) const;
// Retrieve parameter marker values, if any, find the appropriate collection
// element if the cell is a collection and an element access is used in the expression,
// and evaluate the condition.
@@ -103,7 +105,7 @@ public:
* "IF col LIKE <pattern>"
*/
static lw_shared_ptr<column_condition> condition(const column_definition& def, ::shared_ptr<term> collection_element,
::shared_ptr<term> value, std::unique_ptr<like_matcher> matcher, expr::oper_t op) {
::shared_ptr<term> value, std::unique_ptr<like_matcher> matcher, const operator_type& op) {
return make_lw_shared<column_condition>(def, std::move(collection_element), std::move(value),
std::vector<::shared_ptr<term>>{}, std::move(matcher), op);
}
@@ -112,7 +114,7 @@ public:
static lw_shared_ptr<column_condition> in_condition(const column_definition& def, ::shared_ptr<term> collection_element,
::shared_ptr<term> in_marker, std::vector<::shared_ptr<term>> in_values) {
return make_lw_shared<column_condition>(def, std::move(collection_element), std::move(in_marker),
std::move(in_values), nullptr, expr::oper_t::IN);
std::move(in_values), nullptr, operator_type::IN);
}
class raw final {
@@ -123,13 +125,13 @@ public:
// Can be nullptr, used with the syntax "IF m[e] = ..." (in which case it's 'e')
::shared_ptr<term::raw> _collection_element;
expr::oper_t _op;
const operator_type& _op;
public:
raw(::shared_ptr<term::raw> value,
std::vector<::shared_ptr<term::raw>> in_values,
::shared_ptr<abstract_marker::in_raw> in_marker,
::shared_ptr<term::raw> collection_element,
expr::oper_t op)
const operator_type& op)
: _value(std::move(value))
, _in_values(std::move(in_values))
, _in_marker(std::move(in_marker))
@@ -145,7 +147,7 @@ public:
* "IF col LIKE 'foo%'"
*/
static lw_shared_ptr<raw> simple_condition(::shared_ptr<term::raw> value, ::shared_ptr<term::raw> collection_element,
expr::oper_t op) {
const operator_type& op) {
return make_lw_shared<raw>(std::move(value), std::vector<::shared_ptr<term::raw>>{},
::shared_ptr<abstract_marker::in_raw>{}, std::move(collection_element), op);
}
@@ -161,7 +163,7 @@ public:
static lw_shared_ptr<raw> in_condition(::shared_ptr<term::raw> collection_element,
::shared_ptr<abstract_marker::in_raw> in_marker, std::vector<::shared_ptr<term::raw>> in_values) {
return make_lw_shared<raw>(::shared_ptr<term::raw>{}, std::move(in_values), std::move(in_marker),
std::move(collection_element), expr::oper_t::IN);
std::move(collection_element), operator_type::IN);
}
lw_shared_ptr<column_condition> prepare(database& db, const sstring& keyspace, const column_definition& receiver) const;

View File

@@ -46,7 +46,6 @@
#include "cql3/operation.hh"
#include "cql3/values.hh"
#include "cql3/term.hh"
#include "mutation.hh"
#include <seastar/core/shared_ptr.hh>
namespace cql3 {
@@ -185,8 +184,7 @@ public:
}
return value;
} catch (const marshal_exception& e) {
throw exceptions::invalid_request_exception(
format("Exception while binding column {:s}: {:s}", _receiver->name->to_cql_string(), e.what()));
throw exceptions::invalid_request_exception(e.what());
}
}

View File

@@ -348,22 +348,22 @@ cql3_type::raw::user_type(ut_name name) {
shared_ptr<cql3_type::raw>
cql3_type::raw::map(shared_ptr<raw> t1, shared_ptr<raw> t2) {
return ::make_shared<raw_collection>(abstract_type::kind::map, std::move(t1), std::move(t2));
return make_shared(raw_collection(abstract_type::kind::map, std::move(t1), std::move(t2)));
}
shared_ptr<cql3_type::raw>
cql3_type::raw::list(shared_ptr<raw> t) {
return ::make_shared<raw_collection>(abstract_type::kind::list, nullptr, std::move(t));
return make_shared(raw_collection(abstract_type::kind::list, {}, std::move(t)));
}
shared_ptr<cql3_type::raw>
cql3_type::raw::set(shared_ptr<raw> t) {
return ::make_shared<raw_collection>(abstract_type::kind::set, nullptr, std::move(t));
return make_shared(raw_collection(abstract_type::kind::set, {}, std::move(t)));
}
shared_ptr<cql3_type::raw>
cql3_type::raw::tuple(std::vector<shared_ptr<raw>> ts) {
return ::make_shared<raw_tuple>(std::move(ts));
return make_shared(raw_tuple(std::move(ts)));
}
shared_ptr<cql3_type::raw>

View File

@@ -101,15 +101,13 @@ public:
virtual future<::shared_ptr<cql_transport::messages::result_message>>
execute(service::storage_proxy& proxy, service::query_state& state, const query_options& options) const = 0;
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const = 0;
virtual bool depends_on_keyspace(const sstring& ks_name) const = 0;
virtual bool depends_on_column_family(const sstring& cf_name) const = 0;
virtual shared_ptr<const metadata> get_result_metadata() const = 0;
virtual bool is_conditional() const {
return false;
}
};
class cql_statement_no_metadata : public cql_statement {

View File

@@ -1,924 +0,0 @@
/*
* Copyright (C) 2020 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "expression.hh"
#include <boost/algorithm/cxx11/all_of.hpp>
#include <boost/algorithm/cxx11/any_of.hpp>
#include <boost/range/adaptors.hpp>
#include <fmt/ostream.h>
#include <unordered_map>
#include "cql3/lists.hh"
#include "cql3/tuples.hh"
#include "index/secondary_index_manager.hh"
#include "types/list.hh"
#include "types/map.hh"
#include "types/set.hh"
#include "utils/like_matcher.hh"
namespace cql3 {
namespace expr {
using boost::adaptors::filtered;
using boost::adaptors::transformed;
namespace {
std::optional<atomic_cell_value_view> do_get_value(const schema& schema,
const column_definition& cdef,
const partition_key& key,
const clustering_key_prefix& ckey,
const row& cells,
gc_clock::time_point now) {
switch (cdef.kind) {
case column_kind::partition_key:
return atomic_cell_value_view(key.get_component(schema, cdef.component_index()));
case column_kind::clustering_key:
return atomic_cell_value_view(ckey.get_component(schema, cdef.component_index()));
default:
auto cell = cells.find_cell(cdef.id);
if (!cell) {
return std::nullopt;
}
assert(cdef.is_atomic());
auto c = cell->as_atomic_cell(cdef);
return c.is_dead(now) ? std::nullopt : std::optional<atomic_cell_value_view>(c.value());
}
}
using children_t = std::vector<expression>; // conjunction's children.
children_t explode_conjunction(expression e) {
return std::visit(overloaded_functor{
[] (const conjunction& c) { return std::move(c.children); },
[&] (const auto&) { return children_t{std::move(e)}; },
}, e);
}
using cql3::selection::selection;
/// Serialized values for all types of cells, plus selection (to find a column's index) and options (for
/// subscript term's value).
struct row_data_from_partition_slice {
const std::vector<bytes>& partition_key;
const std::vector<bytes>& clustering_key;
const std::vector<bytes_opt>& other_columns;
const selection& sel;
};
/// Data used to derive cell values from a mutation.
struct row_data_from_mutation {
// Underscores avoid name clashes.
const partition_key& partition_key_;
const clustering_key_prefix& clustering_key_;
const row& other_columns;
const schema& schema_;
gc_clock::time_point now;
};
/// Everything needed to compute column values during restriction evaluation.
struct column_value_eval_bag {
const query_options& options; // For evaluating subscript terms.
std::variant<row_data_from_partition_slice, row_data_from_mutation> row_data;
};
/// Returns col's value from queried data.
bytes_opt get_value_from_partition_slice(
const column_value& col, row_data_from_partition_slice data, const query_options& options) {
auto cdef = col.col;
if (col.sub) {
auto col_type = static_pointer_cast<const collection_type_impl>(cdef->type);
if (!col_type->is_map()) {
throw exceptions::invalid_request_exception(format("subscripting non-map column {}", cdef->name_as_text()));
}
const auto deserialized = cdef->type->deserialize(*data.other_columns[data.sel.index_of(*cdef)]);
const auto& data_map = value_cast<map_type_impl::native_type>(deserialized);
const auto key = col.sub->bind_and_get(options);
auto&& key_type = col_type->name_comparator();
const auto found = with_linearized(*key, [&] (bytes_view key_bv) {
using entry = std::pair<data_value, data_value>;
return std::find_if(data_map.cbegin(), data_map.cend(), [&] (const entry& element) {
return key_type->compare(element.first.serialize_nonnull(), key_bv) == 0;
});
});
return found == data_map.cend() ? bytes_opt() : bytes_opt(found->second.serialize_nonnull());
} else {
switch (cdef->kind) {
case column_kind::partition_key:
return data.partition_key[cdef->id];
case column_kind::clustering_key:
return data.clustering_key[cdef->id];
case column_kind::static_column:
case column_kind::regular_column:
return data.other_columns[data.sel.index_of(*cdef)];
default:
throw exceptions::unsupported_operation_exception("Unknown column kind");
}
}
}
/// Returns col's value from a mutation.
bytes_opt get_value_from_mutation(const column_value& col, row_data_from_mutation data) {
const auto v = do_get_value(
data.schema_, *col.col, data.partition_key_, data.clustering_key_, data.other_columns, data.now);
return v ? v->linearize() : bytes_opt();
}
/// Returns col's value from the fetched data.
bytes_opt get_value(const column_value& col, const column_value_eval_bag& bag) {
using std::placeholders::_1;
return std::visit(overloaded_functor{
std::bind(get_value_from_mutation, col, _1),
std::bind(get_value_from_partition_slice, col, _1, bag.options),
}, bag.row_data);
}
/// Type for comparing results of get_value().
const abstract_type* get_value_comparator(const column_definition* cdef) {
return cdef->type->is_reversed() ? cdef->type->underlying_type().get() : cdef->type.get();
}
/// Type for comparing results of get_value().
const abstract_type* get_value_comparator(const column_value& cv) {
return cv.sub ? static_pointer_cast<const collection_type_impl>(cv.col->type)->value_comparator().get()
: get_value_comparator(cv.col);
}
/// If t represents a tuple value, returns that value. Otherwise, null.
///
/// Useful for checking binary_operator::rhs, which packs multiple values into a single term when lhs is itself
/// a tuple. NOT useful for the IN operator, whose rhs is either a list or tuples::in_value.
::shared_ptr<tuples::value> get_tuple(term& t, const query_options& opts) {
return dynamic_pointer_cast<tuples::value>(t.bind(opts));
}
/// True iff lhs's value equals rhs.
bool equal(const bytes_opt& rhs, const column_value& lhs, const column_value_eval_bag& bag) {
if (!rhs) {
return false;
}
const auto value = get_value(lhs, bag);
if (!value) {
return false;
}
return get_value_comparator(lhs)->equal(*value, *rhs);
}
/// Convenience overload for term.
bool equal(term& rhs, const column_value& lhs, const column_value_eval_bag& bag) {
return equal(to_bytes_opt(rhs.bind_and_get(bag.options)), lhs, bag);
}
/// True iff columns' values equal t.
bool equal(term& t, const std::vector<column_value>& columns, const column_value_eval_bag& bag) {
const auto tup = get_tuple(t, bag.options);
if (!tup) {
throw exceptions::invalid_request_exception("multi-column equality has right-hand side that isn't a tuple");
}
const auto& rhs = tup->get_elements();
if (rhs.size() != columns.size()) {
throw exceptions::invalid_request_exception(
format("tuple equality size mismatch: {} elements on left-hand side, {} on right",
columns.size(), rhs.size()));
}
return boost::equal(rhs, columns, [&] (const bytes_opt& b, const column_value& lhs) {
return equal(b, lhs, bag);
});
}
/// True iff lhs is limited by rhs in the manner prescribed by op.
bool limits(bytes_view lhs, oper_t op, bytes_view rhs, const abstract_type& type) {
const auto cmp = type.compare(lhs, rhs);
switch (op) {
case oper_t::LT:
return cmp < 0;
case oper_t::LTE:
return cmp <= 0;
case oper_t::GT:
return cmp > 0;
case oper_t::GTE:
return cmp >= 0;
case oper_t::EQ:
return cmp == 0;
case oper_t::NEQ:
return cmp != 0;
default:
throw std::logic_error(format("limits() called on non-compare op {}", op));
}
}
/// True iff the column value is limited by rhs in the manner prescribed by op.
bool limits(const column_value& col, oper_t op, term& rhs, const column_value_eval_bag& bag) {
if (!is_slice(op)) { // For EQ or NEQ, use equal().
throw std::logic_error("limits() called on non-slice op");
}
auto lhs = get_value(col, bag);
if (!lhs) {
return false;
}
const auto b = to_bytes_opt(rhs.bind_and_get(bag.options));
return b ? limits(*lhs, op, *b, *get_value_comparator(col)) : false;
}
/// True iff the column values are limited by t in the manner prescribed by op.
bool limits(const std::vector<column_value>& columns, const oper_t op, term& t,
const column_value_eval_bag& bag) {
if (!is_slice(op)) { // For EQ or NEQ, use equal().
throw std::logic_error("limits() called on non-slice op");
}
const auto tup = get_tuple(t, bag.options);
if (!tup) {
throw exceptions::invalid_request_exception(
"multi-column comparison has right-hand side that isn't a tuple");
}
const auto& rhs = tup->get_elements();
if (rhs.size() != columns.size()) {
throw exceptions::invalid_request_exception(
format("tuple comparison size mismatch: {} elements on left-hand side, {} on right",
columns.size(), rhs.size()));
}
for (size_t i = 0; i < rhs.size(); ++i) {
const auto cmp = get_value_comparator(columns[i])->compare(
// CQL dictates that columns[i] is a clustering column and non-null.
*get_value(columns[i], bag),
*rhs[i]);
// If the components aren't equal, then we just learned the LHS/RHS order.
if (cmp < 0) {
if (op == oper_t::LT || op == oper_t::LTE) {
return true;
} else if (op == oper_t::GT || op == oper_t::GTE) {
return false;
} else {
throw std::logic_error("Unknown slice operator");
}
} else if (cmp > 0) {
if (op == oper_t::LT || op == oper_t::LTE) {
return false;
} else if (op == oper_t::GT || op == oper_t::GTE) {
return true;
} else {
throw std::logic_error("Unknown slice operator");
}
}
// Otherwise, we don't know the LHS/RHS order, so check the next component.
}
// Getting here means LHS == RHS.
return op == oper_t::LTE || op == oper_t::GTE;
}
/// True iff collection (list, set, or map) contains value.
bool contains(const data_value& collection, const raw_value_view& value) {
if (!value) {
return true; // Compatible with old code, which skips null terms in value comparisons.
}
auto col_type = static_pointer_cast<const collection_type_impl>(collection.type());
auto&& element_type = col_type->is_set() ? col_type->name_comparator() : col_type->value_comparator();
return with_linearized(*value, [&] (bytes_view val) {
auto exists_in = [&](auto&& range) {
auto found = std::find_if(range.begin(), range.end(), [&] (auto&& element) {
return element_type->compare(element.serialize_nonnull(), val) == 0;
});
return found != range.end();
};
if (col_type->is_list()) {
return exists_in(value_cast<list_type_impl::native_type>(collection));
} else if (col_type->is_set()) {
return exists_in(value_cast<set_type_impl::native_type>(collection));
} else if (col_type->is_map()) {
auto data_map = value_cast<map_type_impl::native_type>(collection);
using entry = std::pair<data_value, data_value>;
return exists_in(data_map | transformed([] (const entry& e) { return e.second; }));
} else {
throw std::logic_error("unsupported collection type in a CONTAINS expression");
}
});
}
/// True iff a column is a collection containing value.
bool contains(const column_value& col, const raw_value_view& value, const column_value_eval_bag& bag) {
if (col.sub) {
throw exceptions::unsupported_operation_exception("CONTAINS lhs is subscripted");
}
const auto collection = get_value(col, bag);
if (collection) {
return contains(col.col->type->deserialize(*collection), value);
} else {
return false;
}
}
/// True iff a column is a map containing \p key.
bool contains_key(const column_value& col, cql3::raw_value_view key, const column_value_eval_bag& bag) {
if (col.sub) {
throw exceptions::unsupported_operation_exception("CONTAINS KEY lhs is subscripted");
}
if (!key) {
return true; // Compatible with old code, which skips null terms in key comparisons.
}
auto type = col.col->type;
const auto collection = get_value(col, bag);
if (!collection) {
return false;
}
const auto data_map = value_cast<map_type_impl::native_type>(type->deserialize(*collection));
auto key_type = static_pointer_cast<const collection_type_impl>(type)->name_comparator();
auto found = with_linearized(*key, [&] (bytes_view k_bv) {
using entry = std::pair<data_value, data_value>;
return std::find_if(data_map.begin(), data_map.end(), [&] (const entry& element) {
return key_type->compare(element.first.serialize_nonnull(), k_bv) == 0;
});
});
return found != data_map.end();
}
/// Fetches the next cell value from iter and returns its (possibly null) value.
bytes_opt next_value(query::result_row_view::iterator_type& iter, const column_definition* cdef) {
if (cdef->type->is_multi_cell()) {
auto cell = iter.next_collection_cell();
if (cell) {
return cell->with_linearized([] (bytes_view data) {
return bytes(data.cbegin(), data.cend());
});
}
} else {
auto cell = iter.next_atomic_cell();
if (cell) {
return cell->value().with_linearized([] (bytes_view data) {
return bytes(data.cbegin(), data.cend());
});
}
}
return std::nullopt;
}
/// Returns values of non-primary-key columns from selection. The kth element of the result
/// corresponds to the kth column in selection.
std::vector<bytes_opt> get_non_pk_values(const selection& selection, const query::result_row_view& static_row,
const query::result_row_view* row) {
const auto& cols = selection.get_columns();
std::vector<bytes_opt> vals(cols.size());
auto static_row_iterator = static_row.iterator();
auto row_iterator = row ? std::optional<query::result_row_view::iterator_type>(row->iterator()) : std::nullopt;
for (size_t i = 0; i < cols.size(); ++i) {
switch (cols[i]->kind) {
case column_kind::static_column:
vals[i] = next_value(static_row_iterator, cols[i]);
break;
case column_kind::regular_column:
if (row) {
vals[i] = next_value(*row_iterator, cols[i]);
}
break;
default: // Skip.
break;
}
}
return vals;
}
/// True iff cv matches the CQL LIKE pattern.
bool like(const column_value& cv, const bytes_opt& pattern, const column_value_eval_bag& bag) {
if (!cv.col->type->is_string()) {
throw exceptions::invalid_request_exception(
format("LIKE is allowed only on string types, which {} is not", cv.col->name_as_text()));
}
auto value = get_value(cv, bag);
// TODO: reuse matchers.
return (pattern && value) ? like_matcher(*pattern)(*value) : false;
}
/// True iff the column value is in the set defined by rhs.
bool is_one_of(const column_value& col, term& rhs, const column_value_eval_bag& bag) {
// RHS is prepared differently for different CQL cases. Cast it dynamically to discern which case this is.
if (auto dv = dynamic_cast<lists::delayed_value*>(&rhs)) {
// This is `a IN (1,2,3)`. RHS elements are themselves terms.
return boost::algorithm::any_of(dv->get_elements(), [&] (const ::shared_ptr<term>& t) {
return equal(*t, col, bag);
});
} else if (auto mkr = dynamic_cast<lists::marker*>(&rhs)) {
// This is `a IN ?`. RHS elements are values representable as bytes_opt.
const auto values = static_pointer_cast<lists::value>(mkr->bind(bag.options));
return boost::algorithm::any_of(values->get_elements(), [&] (const bytes_opt& b) {
return equal(b, col, bag);
});
}
throw std::logic_error("unexpected term type in is_one_of(single column)");
}
/// True iff the tuple of column values is in the set defined by rhs.
bool is_one_of(const std::vector<column_value>& cvs, term& rhs, const column_value_eval_bag& bag) {
// RHS is prepared differently for different CQL cases. Cast it dynamically to discern which case this is.
if (auto dv = dynamic_cast<lists::delayed_value*>(&rhs)) {
// This is `(a,b) IN ((1,1),(2,2),(3,3))`. RHS elements are themselves terms.
return boost::algorithm::any_of(dv->get_elements(), [&] (const ::shared_ptr<term>& t) {
return equal(*t, cvs, bag);
});
} else if (auto mkr = dynamic_cast<tuples::in_marker*>(&rhs)) {
// This is `(a,b) IN ?`. RHS elements are themselves tuples, represented as vector<bytes_opt>.
const auto marker_value = static_pointer_cast<tuples::in_value>(mkr->bind(bag.options));
return boost::algorithm::any_of(marker_value->get_split_values(), [&] (const std::vector<bytes_opt>& el) {
return boost::equal(cvs, el, [&] (const column_value& c, const bytes_opt& b) {
return equal(b, c, bag);
});
});
}
throw std::logic_error("unexpected term type in is_one_of(multi-column)");
}
/// True iff op means bnd type of bound.
bool matches(oper_t op, statements::bound bnd) {
switch (op) {
case oper_t::GT:
case oper_t::GTE:
return is_start(bnd); // These set a lower bound.
case oper_t::LT:
case oper_t::LTE:
return is_end(bnd); // These set an upper bound.
case oper_t::EQ:
return true; // Bounds from both sides.
default:
return false;
}
}
const value_set empty_value_set = value_list{};
const value_set unbounded_value_set = nonwrapping_range<bytes>::make_open_ended_both_sides();
struct intersection_visitor {
const abstract_type* type;
value_set operator()(const value_list& a, const value_list& b) const {
value_list common;
common.reserve(std::max(a.size(), b.size()));
boost::set_intersection(a, b, back_inserter(common), type->as_less_comparator());
return std::move(common);
}
value_set operator()(const nonwrapping_range<bytes>& a, const value_list& b) const {
const auto common = b | filtered([&] (const bytes& el) { return a.contains(el, type->as_tri_comparator()); });
return value_list(common.begin(), common.end());
}
value_set operator()(const value_list& a, const nonwrapping_range<bytes>& b) const {
return (*this)(b, a);
}
value_set operator()(const nonwrapping_range<bytes>& a, const nonwrapping_range<bytes>& b) const {
const auto common_range = a.intersection(b, type->as_tri_comparator());
return common_range ? *common_range : empty_value_set;
}
};
value_set intersection(value_set a, value_set b, const abstract_type* type) {
return std::visit(intersection_visitor{type}, std::move(a), std::move(b));
}
bool is_satisfied_by(const binary_operator& opr, const column_value_eval_bag& bag) {
return std::visit(overloaded_functor{
[&] (const column_value& col) {
if (opr.op == oper_t::EQ) {
return equal(*opr.rhs, col, bag);
} else if (opr.op == oper_t::NEQ) {
return !equal(*opr.rhs, col, bag);
} else if (is_slice(opr.op)) {
return limits(col, opr.op, *opr.rhs, bag);
} else if (opr.op == oper_t::CONTAINS) {
return contains(col, opr.rhs->bind_and_get(bag.options), bag);
} else if (opr.op == oper_t::CONTAINS_KEY) {
return contains_key(col, opr.rhs->bind_and_get(bag.options), bag);
} else if (opr.op == oper_t::LIKE) {
return like(col, to_bytes_opt(opr.rhs->bind_and_get(bag.options)), bag);
} else if (opr.op == oper_t::IN) {
return is_one_of(col, *opr.rhs, bag);
} else {
throw exceptions::unsupported_operation_exception(format("Unhandled binary_operator: {}", opr));
}
},
[&] (const std::vector<column_value>& cvs) {
if (opr.op == oper_t::EQ) {
return equal(*opr.rhs, cvs, bag);
} else if (is_slice(opr.op)) {
return limits(cvs, opr.op, *opr.rhs, bag);
} else if (opr.op == oper_t::IN) {
return is_one_of(cvs, *opr.rhs, bag);
} else {
throw exceptions::unsupported_operation_exception(
format("Unhandled multi-column binary_operator: {}", opr));
}
},
[] (const token& tok) -> bool {
// The RHS value was already used to ensure we fetch only rows in the specified
// token range. It is impossible for any fetched row not to match now.
return true;
},
}, opr.lhs);
}
bool is_satisfied_by(const expression& restr, const column_value_eval_bag& bag) {
return std::visit(overloaded_functor{
[&] (bool v) { return v; },
[&] (const conjunction& conj) {
return boost::algorithm::all_of(conj.children, [&] (const expression& c) {
return is_satisfied_by(c, bag);
});
},
[&] (const binary_operator& opr) { return is_satisfied_by(opr, bag); },
}, restr);
}
/// If t is a tuple, binds and gets its k-th element. Otherwise, binds and gets t's whole value.
bytes_opt get_kth(size_t k, const query_options& options, const ::shared_ptr<term>& t) {
auto bound = t->bind(options);
if (auto tup = dynamic_pointer_cast<tuples::value>(bound)) {
return tup->get_elements()[k];
} else {
throw std::logic_error("non-tuple RHS for multi-column IN");
}
}
template<typename Range>
value_list to_sorted_vector(Range r, const serialized_compare& comparator) {
BOOST_CONCEPT_ASSERT((boost::ForwardRangeConcept<Range>));
value_list tmp(r.begin(), r.end()); // Need random-access range to sort (r is not necessarily random-access).
const auto unique = boost::unique(boost::sort(tmp, comparator));
return value_list(unique.begin(), unique.end());
}
const auto non_null = boost::adaptors::filtered([] (const bytes_opt& b) { return b.has_value(); });
const auto deref = boost::adaptors::transformed([] (const bytes_opt& b) { return b.value(); });
/// Returns possible values from t, which must be RHS of IN.
value_list get_IN_values(
const ::shared_ptr<term>& t, const query_options& options, const serialized_compare& comparator) {
// RHS is prepared differently for different CQL cases. Cast it dynamically to discern which case this is.
if (auto dv = dynamic_pointer_cast<lists::delayed_value>(t)) {
// Case `a IN (1,2,3)`.
const auto result_range = dv->get_elements()
| boost::adaptors::transformed([&] (const ::shared_ptr<term>& t) { return to_bytes_opt(t->bind_and_get(options)); })
| non_null | deref;
return to_sorted_vector(std::move(result_range), comparator);
} else if (auto mkr = dynamic_pointer_cast<lists::marker>(t)) {
// Case `a IN ?`. Collect all list-element values.
const auto val = static_pointer_cast<lists::value>(mkr->bind(options));
return to_sorted_vector(val->get_elements() | non_null | deref, comparator);
}
throw std::logic_error(format("get_IN_values(single column) on invalid term {}", *t));
}
/// Returns possible values for k-th column from t, which must be RHS of IN.
value_list get_IN_values(const ::shared_ptr<term>& t, size_t k, const query_options& options,
const serialized_compare& comparator) {
// RHS is prepared differently for different CQL cases. Cast it dynamically to discern which case this is.
if (auto dv = dynamic_pointer_cast<lists::delayed_value>(t)) {
// Case `(a,b) in ((1,1),(2,2),(3,3))`. Get kth value from each term element.
const auto result_range = dv->get_elements()
| boost::adaptors::transformed(std::bind_front(get_kth, k, options)) | non_null | deref;
return to_sorted_vector(std::move(result_range), comparator);
} else if (auto mkr = dynamic_pointer_cast<tuples::in_marker>(t)) {
// Case `(a,b) IN ?`. Get kth value from each vector<bytes> element.
const auto val = static_pointer_cast<tuples::in_value>(mkr->bind(options));
const auto split_values = val->get_split_values(); // Need lvalue from which to make std::view.
const auto result_range = split_values
| boost::adaptors::transformed([k] (const std::vector<bytes_opt>& v) { return v[k]; }) | non_null | deref;
return to_sorted_vector(std::move(result_range), comparator);
}
throw std::logic_error(format("get_IN_values(multi-column) on invalid term {}", *t));
}
static constexpr bool inclusive = true, exclusive = false;
/// A range of all X such that X op val.
nonwrapping_range<bytes> to_range(oper_t op, const bytes& val) {
switch (op) {
case oper_t::GT:
return nonwrapping_range<bytes>::make_starting_with(interval_bound(val, exclusive));
case oper_t::GTE:
return nonwrapping_range<bytes>::make_starting_with(interval_bound(val, inclusive));
case oper_t::LT:
return nonwrapping_range<bytes>::make_ending_with(interval_bound(val, exclusive));
case oper_t::LTE:
return nonwrapping_range<bytes>::make_ending_with(interval_bound(val, inclusive));
default:
throw std::logic_error(format("to_range: unknown comparison operator {}", op));
}
}
} // anonymous namespace
expression make_conjunction(expression a, expression b) {
auto children = explode_conjunction(std::move(a));
boost::copy(explode_conjunction(std::move(b)), back_inserter(children));
return conjunction{std::move(children)};
}
bool is_satisfied_by(
const expression& restr,
const std::vector<bytes>& partition_key, const std::vector<bytes>& clustering_key,
const query::result_row_view& static_row, const query::result_row_view* row,
const selection& selection, const query_options& options) {
const auto regulars = get_non_pk_values(selection, static_row, row);
return is_satisfied_by(
restr, {options, row_data_from_partition_slice{partition_key, clustering_key, regulars, selection}});
}
bool is_satisfied_by(
const expression& restr,
const schema& schema, const partition_key& key, const clustering_key_prefix& ckey, const row& cells,
const query_options& options, gc_clock::time_point now) {
return is_satisfied_by(restr, {options, row_data_from_mutation{key, ckey, cells, schema, now}});
}
std::vector<bytes_opt> first_multicolumn_bound(
const expression& restr, const query_options& options, statements::bound bnd) {
auto found = find_atom(restr, [bnd] (const binary_operator& oper) {
return matches(oper.op, bnd) && std::holds_alternative<std::vector<column_value>>(oper.lhs);
});
if (found) {
return static_pointer_cast<tuples::value>(found->rhs->bind(options))->get_elements();
} else {
return std::vector<bytes_opt>{};
}
}
value_set possible_lhs_values(const column_definition* cdef, const expression& expr, const query_options& options) {
const auto type = cdef ? get_value_comparator(cdef) : long_type.get();
return std::visit(overloaded_functor{
[] (bool b) {
return b ? unbounded_value_set : empty_value_set;
},
[&] (const conjunction& conj) {
return boost::accumulate(conj.children, unbounded_value_set,
[&] (const value_set& acc, const expression& child) {
return intersection(
std::move(acc), possible_lhs_values(cdef, child, options), type);
});
},
[&] (const binary_operator& oper) -> value_set {
return std::visit(overloaded_functor{
[&] (const column_value& col) -> value_set {
if (!cdef || cdef != col.col) {
return unbounded_value_set;
}
if (is_compare(oper.op)) {
const auto val = to_bytes_opt(oper.rhs->bind_and_get(options));
if (!val) {
return empty_value_set; // All NULL comparisons fail; no column values match.
}
return oper.op == oper_t::EQ ? value_set(value_list{*val})
: to_range(oper.op, *val);
} else if (oper.op == oper_t::IN) {
return get_IN_values(oper.rhs, options, type->as_less_comparator());
}
throw std::logic_error(format("possible_lhs_values: unhandled operator {}", oper));
},
[&] (const std::vector<column_value>& cvs) -> value_set {
if (!cdef) {
return unbounded_value_set;
}
const auto found = boost::find_if(
cvs, [&] (const column_value& c) { return c.col == cdef; });
if (found == cvs.end()) {
return unbounded_value_set;
}
const auto column_index_on_lhs = std::distance(cvs.begin(), found);
if (is_compare(oper.op)) {
// RHS must be a tuple due to upstream checks.
bytes_opt val = get_tuple(*oper.rhs, options)->get_elements()[column_index_on_lhs];
if (!val) {
return empty_value_set; // All NULL comparisons fail; no column values match.
}
if (oper.op == oper_t::EQ) {
return value_list{*val};
}
if (column_index_on_lhs > 0) {
// A multi-column comparison restricts only the first column, because
// comparison is lexicographical.
return unbounded_value_set;
}
return to_range(oper.op, *val);
} else if (oper.op == oper_t::IN) {
return get_IN_values(oper.rhs, column_index_on_lhs, options, type->as_less_comparator());
}
return unbounded_value_set;
},
[&] (token) -> value_set {
if (cdef) {
return unbounded_value_set;
}
const auto val = to_bytes_opt(oper.rhs->bind_and_get(options));
if (!val) {
return empty_value_set; // All NULL comparisons fail; no token values match.
}
if (oper.op == oper_t::EQ) {
return value_list{*val};
} else if (oper.op == oper_t::GT) {
return nonwrapping_range<bytes>::make_starting_with(interval_bound(*val, exclusive));
} else if (oper.op == oper_t::GTE) {
return nonwrapping_range<bytes>::make_starting_with(interval_bound(*val, inclusive));
}
static const bytes MININT = serialized(std::numeric_limits<int64_t>::min()),
MAXINT = serialized(std::numeric_limits<int64_t>::max());
// Undocumented feature: when the user types `token(...) < MININT`, we interpret
// that as MAXINT for some reason.
const auto adjusted_val = (*val == MININT) ? serialized(MAXINT) : *val;
if (oper.op == oper_t::LT) {
return nonwrapping_range<bytes>::make_ending_with(interval_bound(adjusted_val, exclusive));
} else if (oper.op == oper_t::LTE) {
return nonwrapping_range<bytes>::make_ending_with(interval_bound(adjusted_val, inclusive));
}
throw std::logic_error(format("get_token_interval invalid operator {}", oper.op));
},
}, oper.lhs);
},
}, expr);
}
nonwrapping_range<bytes> to_range(const value_set& s) {
return std::visit(overloaded_functor{
[] (const nonwrapping_range<bytes>& r) { return r; },
[] (const value_list& lst) {
if (lst.size() != 1) {
throw std::logic_error(format("to_range called on list of size {}", lst.size()));
}
return nonwrapping_range<bytes>::make_singular(lst[0]);
},
}, s);
}
bool is_supported_by(const expression& expr, const secondary_index::index& idx) {
using std::placeholders::_1;
return std::visit(overloaded_functor{
[&] (const conjunction& conj) {
return boost::algorithm::all_of(conj.children, std::bind(is_supported_by, _1, idx));
},
[&] (const binary_operator& oper) {
return std::visit(overloaded_functor{
[&] (const column_value& col) {
return idx.supports_expression(*col.col, oper.op);
},
[&] (const std::vector<column_value>& cvs) {
return boost::algorithm::any_of(cvs, [&] (const column_value& c) {
return idx.supports_expression(*c.col, oper.op);
});
},
[&] (const token&) { return false; },
}, oper.lhs);
},
[] (const auto& default_case) { return false; }
}, expr);
}
bool has_supporting_index(
const expression& expr,
const secondary_index::secondary_index_manager& index_manager,
allow_local_index allow_local) {
const auto indexes = index_manager.list_indexes();
const auto support = std::bind(is_supported_by, expr, std::placeholders::_1);
return allow_local ? boost::algorithm::any_of(indexes, support)
: boost::algorithm::any_of(
indexes | filtered([] (const secondary_index::index& i) { return !i.metadata().local(); }),
support);
}
std::ostream& operator<<(std::ostream& os, const column_value& cv) {
os << *cv.col;
if (cv.sub) {
os << '[' << *cv.sub << ']';
}
return os;
}
std::ostream& operator<<(std::ostream& os, const expression& expr) {
std::visit(overloaded_functor{
[&] (bool b) { os << (b ? "TRUE" : "FALSE"); },
[&] (const conjunction& conj) { fmt::print(os, "({})", fmt::join(conj.children, ") AND (")); },
[&] (const binary_operator& opr) {
std::visit(overloaded_functor{
[&] (const token& t) { os << "TOKEN"; },
[&] (const column_value& col) {
fmt::print(os, "({})", col);
},
[&] (const std::vector<column_value>& cvs) {
fmt::print(os, "(({}))", fmt::join(cvs, ","));
},
}, opr.lhs);
os << ' ' << opr.op << ' ' << *opr.rhs;
},
}, expr);
return os;
}
sstring to_string(const expression& expr) {
return fmt::format("{}", expr);
}
bool is_on_collection(const binary_operator& b) {
if (b.op == oper_t::CONTAINS || b.op == oper_t::CONTAINS_KEY) {
return true;
}
if (auto cvs = std::get_if<std::vector<column_value>>(&b.lhs)) {
return boost::algorithm::any_of(*cvs, [] (const column_value& v) { return v.sub; });
}
return false;
}
expression replace_column_def(const expression& expr, const column_definition* new_cdef) {
return std::visit(overloaded_functor{
[] (bool b){ return expression(b); },
[&] (const conjunction& conj) {
const auto applied = conj.children | transformed(
std::bind(replace_column_def, std::placeholders::_1, new_cdef));
return expression(conjunction{std::vector(applied.begin(), applied.end())});
},
[&] (const binary_operator& oper) {
return std::visit(overloaded_functor{
[&] (const column_value& col) {
return expression(binary_operator{column_value{new_cdef}, oper.op, oper.rhs});
},
[&] (const std::vector<column_value>& cvs) -> expression {
throw std::logic_error(format("replace_column_def invalid LHS: {}", to_string(oper)));
},
[&] (const token&) { return expr; },
}, oper.lhs);
},
}, expr);
}
std::ostream& operator<<(std::ostream& s, oper_t op) {
switch (op) {
case oper_t::EQ:
return s << "=";
case oper_t::NEQ:
return s << "!=";
case oper_t::LT:
return s << "<";
case oper_t::LTE:
return s << "<=";
case oper_t::GT:
return s << ">";
case oper_t::GTE:
return s << ">=";
case oper_t::IN:
return s << "IN";
case oper_t::CONTAINS:
return s << "CONTAINS";
case oper_t::CONTAINS_KEY:
return s << "CONTAINS KEY";
case oper_t::IS_NOT:
return s << "IS NOT";
case oper_t::LIKE:
return s << "LIKE";
}
__builtin_unreachable();
}
} // namespace expr
} // namespace cql3
template <>
struct fmt::formatter<cql3::expr::expression> {
constexpr auto parse(format_parse_context& ctx) {
return ctx.end();
}
template <typename FormatContext>
auto format(const cql3::expr::expression& expr, FormatContext& ctx) {
std::ostringstream os;
os << expr;
return format_to(ctx.out(), "{}", os.str());
}
};
template <>
struct fmt::formatter<cql3::expr::column_value> {
constexpr auto parse(format_parse_context& ctx) {
return ctx.end();
}
template <typename FormatContext>
auto format(const cql3::expr::column_value& col, FormatContext& ctx) {
std::ostringstream os;
os << col;
return format_to(ctx.out(), "{}", os.str());
}
};

Some files were not shown because too many files have changed in this diff Show More