Commit Graph

2592 Commits

Author SHA1 Message Date
Benny Halevy
761d62cd82 compare_atomic_cell_for_merge: compare value last for live cells
Currently, when two cells have the same write timestamp
and both are alive or expiring, we compare their value first,
before checking if either of them is expiring
and if both are expiring, comparing their expiration time
and ttl value to determine which of them will expire
later or was written later.

This was changed in CASSANDRA-14592
for consistency with the preference for dead cells over live cells,
as expiring cells will become tombstones at a future time
and then they'd win over live cells with the same timestamp,
hence they should win also before expiration.

In addition, comparing the cell value before expiration
can lead to unintuitive corner cases where rewriting
a cell using the same timestamp but different TTL
may cause scylla to return the cell with null value
if it expired in the meanwhile.

Also, when multiple columns are written using two upserts
using the same write timestamp but with different expiration,
selecting cells by their value may return a mixed result
where each cell is selected individually from either upsert,
by picking the cells with the largest values for each column,
while using the expiration time to break ties will lead
to more consistent results, where the set of cells from
only one of the upserts will be selected.
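
The resulting order of tie-breakers can be sketched roughly as follows. This is a toy Python model, not the C++ code in the tree: the `Cell` fields and the `compare_for_merge` name are hypothetical, dead cells are left out, and the rule that a smaller TTL wins on equal expiry (because it implies a later write) is an assumption read from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Toy live cell; field names are illustrative, not Scylla's."""
    timestamp: int
    value: bytes
    expiry: Optional[int] = None  # absolute expiration time, None if not expiring
    ttl: Optional[int] = None     # time to live in seconds, None if not expiring


def _cmp(a, b) -> int:
    """Three-way comparison returning -1, 0 or 1."""
    return (a > b) - (a < b)


def compare_for_merge(left: Cell, right: Cell) -> int:
    """Return >0 if left wins the merge, <0 if right wins, 0 if equal."""
    if left.timestamp != right.timestamp:
        return _cmp(left.timestamp, right.timestamp)
    left_expiring = left.expiry is not None
    right_expiring = right.expiry is not None
    # An expiring cell beats a non-expiring one with the same timestamp.
    if left_expiring != right_expiring:
        return 1 if left_expiring else -1
    if left_expiring:
        # The cell that expires later wins.
        if left.expiry != right.expiry:
            return _cmp(left.expiry, right.expiry)
        # Same expiry: the smaller TTL was written later (assumption).
        if left.ttl != right.ttl:
            return _cmp(right.ttl, left.ttl)
    # The value is only the last tie-breaker.
    return _cmp(left.value, right.value)
```

With this order, two upserts that share a timestamp but differ in expiration resolve the same way for every column, whatever the cell values are.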

Fixes scylladb/scylladb#14182

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-20 10:10:39 +03:00
Benny Halevy
ec034b92c0 mutation_test: test_cell_ordering: improve debuggability
Currently, it is hard to tell which of the many sub-cases
fail in this unit test, in case any of them fails.

This change uses logging in debug and trace level
to help with that by reproducing the error
with --logger-log-level testlog=trace
(The cases are deterministic so reproducing should not
be a problem)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-20 10:10:39 +03:00
Botond Dénes
bd7a3e5871 Merge 'Sanitize sstables-making utils in tests' from Pavel Emelyanov
There are tons of wrappers that help test cases make sstables for their needs, and a lot of code duplication in test cases that do parts of those helpers' work on their own. This set cleans up some of that.

Closes #14280

* github.com:scylladb/scylladb:
  test/utils: Generalize making memtable from vector<mutation>
  test/util: Generalize make_sstable_easy()-s
  test/sstable_mutation: Remove useless helper
  test/sstable_mutation: Make writer config in make_sstable_mutation_source()
  test/utils: De-duplicate make_sstable_containing-s
  test/sstable_compaction: Remove useless one-line local lambda
  test/sstable_compaction: Simplify sstable making
  test/sstables*: Make sstable from vector of mutations
  test/mutation_reader: Remove create_sstable() helper from test
2023-06-19 14:05:29 +03:00
Pavel Emelyanov
6bec03f96f test: Remove sstable_utils' storage_prefix() helper
It's excessive, test case that needs it can get storage prefix without
this fancy wrapper-helper

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14273
2023-06-19 13:51:04 +03:00
Pavel Emelyanov
1a332ef5e2 test: Check sstable bytes correctness on S3 too
Commit 4e205650 (test: Verify correctness of sstable::bytes_on_disk())
added a test to verify that sstable::bytes_on_disk() is equal to the
real size of real files. The same test case makes sense for S3-backed
sstables as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14272
2023-06-19 13:47:31 +03:00
Nadav Har'El
ac3d0d4460 Merge 'cql3: expr: support evaluate(column_mutation_attribute)' from Avi Kivity
In preparation for converting selectors to evaluate expressions,
add support for evaluating column_mutation_attribute (representing
the WRITETIME/TTL pseudo-functions).

A unit test is added.

Fixes #12906

Closes #14287

* github.com:scylladb/scylladb:
  test: expr: test evaluation of column_mutation_attribute
  test: lib: enhance make_evaluation_inputs() with support for ttls/timestamps
  cql3: expr: evaluate() column_mutation_attribute
2023-06-19 11:11:49 +03:00
Botond Dénes
562087beff Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai"
This reverts commit d1dc579062, reversing
changes made to 3a73048bc9.

Said commit caused regressions in dtests. We need to investigate and fix
those, but in the meanwhile let's revert this to reduce the disruption
to our workflows.

Refs: #14283
2023-06-19 08:49:27 +03:00
Avi Kivity
0f98e9f8c8 test: expr: test evaluation of column_mutation_attribute
There's no way to evaluate a column_mutation_attribute via CQL
yet (the only user uses old-style cql3::selection::selector), so
we only supply a unit test.
2023-06-18 22:47:46 +03:00
Nadav Har'El
97d444bbf7 Merge 'cql3/expression: implement evaluate(field_selection) ' from Jan Ciołek
Implement `expr::evaluate()` for `expr::field_selection`.

`field_selection` is used to represent access to a struct field.
For example, with a UDT value:
```
CREATE TYPE my_type (a int, b int);
```
The expression `my_type_value.a` would be represented as a `field_selection`, which selects the field `a`.

Evaluating such an expression consists of finding the right element's value in a serialized UDT value and returning it.
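
As an illustration of that lookup, here is a minimal Python sketch. It assumes a layout in which each field is stored as a 4-byte big-endian signed length followed by that many bytes, with a negative length marking a null field and missing trailing fields read as null; that layout, and the names `read_nth_field`, `select_field` and `serialize_fields`, are assumptions made for the example rather than the code added by this series.

```python
import struct
from typing import Optional, Sequence


def read_nth_field(serialized: bytes, n: int) -> Optional[bytes]:
    """Return the raw bytes of field n, or None if it is null or absent."""
    offset = 0
    for index in range(n + 1):
        if offset + 4 > len(serialized):
            return None  # value ends early: treat the field as null
        (length,) = struct.unpack_from(">i", serialized, offset)
        offset += 4
        if length < 0:
            if index == n:
                return None
            continue  # a null field carries no payload
        if index == n:
            return serialized[offset:offset + length]
        offset += length
    return None


def select_field(field_names: Sequence[str], serialized: bytes, name: str) -> Optional[bytes]:
    """Evaluate `value.name` by mapping the name to its position in the type."""
    return read_nth_field(serialized, field_names.index(name))


def serialize_fields(fields: Sequence[Optional[bytes]]) -> bytes:
    """Helper for the example: encode fields in the assumed layout."""
    out = bytearray()
    for data in fields:
        if data is None:
            out += struct.pack(">i", -1)
        else:
            out += struct.pack(">i", len(data)) + data
    return bytes(out)
```

For `my_type` above, `select_field(["a", "b"], value, "a")` would return the four bytes of the `int` stored in field `a`.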

Note that it's still not possible to use `field_selection` inside the `WHERE` clause. Enabling it would require changes to the grammar, as well as to query planning. The current `statement_restrictions` just reacts with `on_internal_error` when it encounters a `field_selection`.
Nonetheless it's a step towards relaxing the grammar, and now it's finally possible to evaluate all kinds of prepared expressions (#12906)

Fixes: https://github.com/scylladb/scylladb/issues/12906

Closes #14235

* github.com:scylladb/scylladb:
  boost/expr_test: test evaluate(field_selection)
  cql3/expr: fix printing of field_selection
  cql3/expression: implement evaluate(field_selection)
  types/user: modify idx_of_field to use bytes_view
  column_identifer: add column_identifier_raw::text()
  types: add read_nth_user_type_field()
  types: add read_nth_tuple_element()
2023-06-18 11:08:25 +03:00
Pavel Emelyanov
85310bc043 test/sstable_mutation: Remove useless helper
There are two make_sstable_mutation_source() helpers that call one
another and test cases only need one of them, so leave just one that's
in use.

Also don't pass env's tempdir to make_sstable() util call, it can get
env's tempdir on its own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:21:40 +03:00
Pavel Emelyanov
4a7be304ac test/sstable_mutation: Make writer config in make_sstable_mutation_source()
These local helpers accept a writer config which is made the same way by
all callers, so the helpers can make it on their own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:20:50 +03:00
Pavel Emelyanov
753b674c31 test/sstable_compaction: Remove useless one-line local lambda
The get_usable_sst() wrapper lambda is not needed, calling the
make_sstable_containing() is shorter

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:19:15 +03:00
Pavel Emelyanov
5b46993438 test/sstable_compaction: Simplify sstable making
There's a temporary memtable and on-stack lambda that makes the
mutation. Both are overkill, make_sstable_containing() can work on just
a plain on-stack-constructed mutation

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:18:13 +03:00
Pavel Emelyanov
ce29f41436 test/sstables*: Make sstable from vector of mutations
There are many cases that want to call make_sstable_containing() with
the vector of mutations at hand. For that they apply it to a temporary
memtable, but sstable-utils can work with the mutations vector as well

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:17:12 +03:00
Pavel Emelyanov
c2eb3e2c4c test/mutation_reader: Remove create_sstable() helper from test
It's a one-liner wrapper, caller can get the same result with existing
utils facilities

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:16:34 +03:00
Pavel Emelyanov
900c609269 Merge 'Initialize query_processor early, without messaging_service or gossiper' from Kamil Braun
In https://github.com/scylladb/scylladb/pull/14231 we split `storage_proxy` initialization into two phases: for local and remote parts. Here we do the same with `query_processor`. This allows performing queries for local tables early in the Scylla startup procedure, before we initialize services used for cluster communication such as `messaging_service` or `gossiper`.

Fixes: #14202

As a follow-up we will simplify `system_keyspace` initialization, making it available earlier as well.

Closes #14256

* github.com:scylladb/scylladb:
  main, cql_test_env: start `query_processor` early
  cql3: query_processor: split `remote` initialization step
  cql3: query_processor: move `migration_manager&`, `forwarder&`, `group0_client&` to a `remote` object
  cql3: query_processor: make `forwarder()` private
  cql3: query_processor: make `get_group0_client()` private
  cql3: strongly_consistent_modification_statement: fix indentation
  cql3: query_processor: make `get_migration_manager` private
  tracing: remove `qp.get_migration_manager()` calls
  table_helper: remove `qp.get_migration_manager()` calls
  thrift: handler: move implementation of `execute_schema_command` to `query_processor`
  data_dictionary: add `get_version`
  cql3: statements: schema_altering_statement: move `execute0` to `query_processor`
  cql3: statements: pass `migration_manager&` explicitly to `prepare_schema_mutations`
  main: add missing `supervisor::notify` message
2023-06-16 17:41:08 +03:00
Jan Ciolek
d6728a7eb5 boost/expr_test: test evaluate(field_selection)
Add a unit test which tests evaluating field selections.

Alas at the moment it's impossible to add a cql-pytest,
as the grammar and query planning don't handle field
selections inside the WHERE clause.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-16 01:21:02 +02:00
Nadav Har'El
e1513f1199 Merge 'cql3: prepare selectors' from Avi Kivity
CQL statements carry expressions in many contexts: the SELECT, WHERE, SET, and IF clauses, plus various attributes. Previously, each of these contexts had its own representation for an expression, and another one for the same expression but before preparation. We have been gradually moving towards a uniform representation of expressions.

This series tackles SELECT clause elements (selectors), in their unprepared phase. It's relatively simple since there are only five types of expression components (column references, writetime/ttl modifiers, function calls, casts, and field selections). Nevertheless, there isn't much commonality with previously converted expression elements so quite a lot of code is involved.

After the series, we are still left with a custom post-prepare representation of expressions. It's quite complicated since it deals with two passes, for aggregation, so it will be left for another series.

Closes #14219

* github.com:scylladb/scylladb:
  cql3: seletor: drop inheritance from assignment_testable
  cql3: selection: rely on prepared expressions
  cql3: selection: prepare selector expressions
  cql3: expr: match counter arguments to function parameters expecting bigint
  cql3: expr: avoid function constant-folding if a thread is needed
  cql3: add optional type annotation to assignment_testable
  cql3: expr: wire unresolved_identifier to test_assignment()
  cql3: expr: support preparing column_mutation_attribute
  cql3: expr: support preparing SQL-style casts
  cql3: expr: support preparing field_selection expressions
  cql3: expr: make the two styles of cast expressions explicit
  cql3: error injection functions: mark enabled_injections() as impure
  cql3: eliminate dynamic_cast<selector> from functions::get()
  cql3: test_assignment: pass optional schema everywhere
  cql3: expr: prepare_expr(): allow aggregate functions
  cql3: add checks for aggregation functions after prepare
  cql3: expr: add verify_no_aggregate_functions() helper
  test: add regression test for rejection of aggregates in the WHERE clause
  cql3: expr: extract column_mutation_attribute_type
  cql3: expr: add fmt formatter for column_mutation_attribute_kind
  cql3: statements: select_statement: reuse to_selectable() computation in SELECT JSON
2023-06-15 15:59:41 +03:00
Kefu Chai
2d265e860d replica,sstable: introduce invalid generation id
the invalid sstable id is the NULL of an sstable identifier. with
this concept, it would be a lot simpler to find/track the greatest
generation. the complexity is hidden in the generation_type, which
compares the a) integer-based identifiers b) uuid-based identifiers
c) invalid identifier in different ways.

so, in this change

* the default constructor of generation_type is
  now public.
* we don't check for empty generation anymore when loading
  SSTables or enumerating them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kefu Chai
939fa087cc sstables, replica: pass uuid_sstable_identifiers to generation generator
before this change, we assumed that generation is always integer based.
in order to enable the UUID-based generation identifier if the related
option is set, we should propagate this option down to the generation generator.

because we don't have access to the cluster features in some places where
a new generation is created, a new accessor exposing feature_service from
sstable manager is added.

Fixes #10459
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kefu Chai
15543464ce sstables, replica: support UUID in generation_type
this change generalizes the value of generation_type so it also
supports UUID-based identifiers.

* sstables/generation_type.h:
  - add formatter and parser for UUID. please note, Cassandra uses
    a different format for formatting the SSTable identifier. and
    this formatter suits our needs as it uses underscore "_" as the
    delimiter, as the file name of components uses dash "-" as the
    delimiter. instead of reinventing the formatting or just using
    another delimiter in the stringified UUID, we choose to use
    Cassandra's formatting.
  - add accessors for accessing the type and value of generation_type
  - add constructor for constructing generation_type with UUID and
    string.
  - use hash for placing sstables with uuid identifiers into shards
    for more uniform distribution of tables in shards.
* replica/table.cc:
  - only update the generator if the given generation contains an
    integer
* test/boost:
  - add a simple test to verify the generation_type is able to
    parse and format

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kamil Braun
59d4bb3787 tracing: remove qp.get_migration_manager() calls
Pass `migration_manager&` from top-level instead.
2023-06-15 09:48:54 +02:00
Avi Kivity
b7bbcdd178 cql3: expr: support preparing column_mutation_attribute
Fairly straightforward. A unit test is added.
2023-06-13 21:04:49 +03:00
Pavel Emelyanov
ce6a1ca13b Update seastar submodule
* seastar afe39231...99d28ff0 (16):
  > file/util: Include seastar.hh
  > http/exception: Use http::reply explicitly
  > http/client: Include lost condition-variable.hh
  > util: file: drop unnecessary include of reactor.hh
  > tests: perf: add a markdown printer
  > http/client: Introduce unexpected_status_error for client requests
  > sharded: avoid #include <seastar/core/reactor.hh> for run_in_background()
  > code: Use std::is_invocable_r_v instead of InvokeReturns
  > http/client: Add ability to change pool size on the fly
  > http/client: Add getters for active/idle connections counts
  > http/client: Count and limit the number of connections
  > http/client: Add connection->client RAII backref
  > build: use the user-specified compiler when building DPDK
  > build: use proper toolchain based on specified compiler
  > build: only pass CMAKE_C_COMPILER when building ingredients
  > build: use specified compiler when building liburing

Two changes are folded into the commit:

1. a missing seastar/core/coroutine.hh include in one .cc file that
   got it included indirectly before seastar dropped the reactor.hh
   include from file.hh

2. http client now returns unexpected_status_error instead of
   std::runtime_error, so the s3 test is updated accordingly

Closes #14168
2023-06-07 20:25:49 +03:00
Nadav Har'El
5984db047d Merge 'mv: forbid IS NOT NULL on columns outside the primary key' from Jan Ciołek
statement_restrictions: forbid IS NOT NULL on columns outside the primary key

IS NOT NULL is currently allowed only when creating materialized views.
It's used to convey that the view will not include any rows that would make the view's primary key columns NULL.

Generally materialized views allow placing restrictions on the primary key columns, but restrictions on the regular columns are forbidden. The exception was IS NOT NULL - it was allowed to write regular_col IS NOT NULL. The problem is that this restriction isn't respected, it's just silently ignored (see #10365).

Supporting IS NOT NULL on regular columns seems to be as hard as supporting any other restrictions on regular columns.
It would be a big effort, and there are some reasons why we don't support them.

For now let's forbid such restrictions, it's better to fail than be wrong silently.

Throwing a hard error would be a breaking change.
To avoid breaking existing code, the reaction to an invalid IS NOT NULL restriction is controlled by the `strict_is_not_null_in_views` flag.

This flag can have the following values:
* `true` - strict checking. Having an `IS NOT NULL` restriction on a column that doesn't belong to the view's primary key causes an error to be thrown.
* `warn` - allow invalid `IS NOT NULL` restrictions, but throw a warning. The invalid restrictions are silently ignored.
* `false` - allow invalid `IS NOT NULL` restrictions, without any warnings or errors. The invalid restrictions are silently ignored.

The default values for this flag are `warn` in `db::config` and `true` in scylla.yaml.

This way the existing clusters will have `warn` by default, so they'll get a warning if they try to create such an invalid view.

New clusters with fresh scylla.yaml will have the flag set to `true`, as scylla.yaml overwrites the default value in `db::config`.
New clusters will throw a hard error for invalid views, but in older existing clusters it will just be a warning.
This way we can maintain backwards compatibility, but still move forward by rejecting invalid queries on new clusters.
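
The three modes can be pictured with a small Python sketch. It is only a model of the behavior described above: `check_is_not_null`, `InvalidRequest` and the message texts are hypothetical names made up for the example, and the real check lives in the C++ `statement_restrictions` code.

```python
from typing import Iterable, List, Set, Tuple


class InvalidRequest(Exception):
    """Hypothetical error type standing in for a rejected statement."""


def check_is_not_null(restricted: Iterable[str], view_primary_key: Set[str],
                      mode: str) -> Tuple[List[str], List[str]]:
    """Return (columns whose IS NOT NULL restriction is kept, warnings).

    `mode` mirrors the flag values: "true", "warn" or "false".
    """
    if mode not in ("true", "warn", "false"):
        raise ValueError(f"unknown strict_is_not_null_in_views value: {mode!r}")
    kept: List[str] = []
    warnings: List[str] = []
    for column in restricted:
        if column in view_primary_key:
            kept.append(column)
        elif mode == "true":
            raise InvalidRequest(
                f"IS NOT NULL on {column}, which is not in the view's primary key")
        elif mode == "warn":
            warnings.append(f"ignoring IS NOT NULL on {column}")
        # mode == "false": the invalid restriction is dropped silently
    return kept, warnings
```

For a view keyed on `{"p", "c"}` with restrictions on `p` and a regular column `v`, `"warn"` keeps `p` and reports one warning, `"false"` keeps `p` with no warning, and `"true"` raises.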

Fixes: #10365

Closes #13013

* github.com:scylladb/scylladb:
  boost/restriction_test: test the strict_is_not_null_in_views flag
  docs/cql/mv: columns outside of view's primary key can't be restricted
  cql-pytest: enable test_is_not_null_forbidden_in_filter
  statement_restrictions: forbid IS NOT NULL on columns outside the primary key
  schema_altering_statement: return warnings from prepare_schema_mutations()
  db/config: add strict_is_not_null_in_views config option
  statement_restrictions: add get_not_null_columns()
  test: remove invalid IS NOT NULL restrictions from tests
2023-06-07 12:12:19 +03:00
Jan Ciolek
ec0cac8862 boost/restriction_test: test the strict_is_not_null_in_views flag
Add unit tests for the strict_is_not_null_in_views flag.
This flag controls the behavior in case of an invalid
IS NOT NULL restriction on a materialized view column.

Materialized views allow only restricting columns
that belong to the view's primary key, all other
restrictions should be rejected.

There was a bug where IS NOT NULL restrictions
weren't rejected, but simply ignored instead.

This flag controls what should happen when the user
runs a query with such an invalid IS NOT NULL restriction.

strict_is_not_null_in_views can have the following values:
* `true` - strict checking, invalid queries are rejected
* `warn` - the query is allowed, but a warning is printed
* `false` - the query is allowed, the invalid restrictions
            are silently ignored.

The tests are based on the ones for strict_allow_filtering,
which reside in the lines preceding the newly added tests.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-07 02:30:11 +02:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritors to the updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicable)

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairy -- it needs to use the task queues
list for IO class names and shares, but to detect whether it should, it checks
whether the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Avi Kivity
26c8470f65 treewide: use #include <seastar/...> for seastar headers
We treat Seastar as an external library, so fix the few places
that didn't do so to use angle brackets.

Closes #14037
2023-06-06 08:36:09 +03:00
Petr Gusev
0415ac3d5f test_secondary_index_collections: change insert/create index order
Secondary index creation is asynchronous, meaning it
takes time for existing data to be reflected within
the index. However, new data added after the
index is created should appear in it immediately.

The test consisted of two parts. The first created
a series of indexes for one table, added
test data to the table, and then ran a series of checks.
In the second part, several new indexes were added to
the same table, and checks were made to make sure that
already existing data would appear in them. This
last part was flaky.

The patch just moves the index creation statements
from the second part to the first.

Fixes: #14076

Closes #14090
2023-05-31 23:30:57 +03:00
Raphael S. Carvalho
23443e0574 compaction: Fix incremental compaction for sstable cleanup
After c7826aa910, sstable runs are cleaned up together.

The procedure which executes cleanup was holding references to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.
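
The idea of shrinking the input set as sstables are exhausted can be sketched as below. This is a toy Python model under simplifying assumptions: sstables are plain names, `cleanup_one` is a hypothetical callback that cleans a single sstable, and failures surface as `RuntimeError`; none of this mirrors the actual compaction manager interfaces.

```python
from typing import Callable, Iterable, List


def run_cleanup(inputs: Iterable[str], cleanup_one: Callable[[str], None],
                max_attempts: int = 2) -> None:
    """Clean up every input, retrying only what is still left after a failure."""
    remaining: List[str] = list(inputs)
    for _ in range(max_attempts):
        try:
            while remaining:
                cleanup_one(remaining[0])
                # Exhausted: drop the reference so the sstable can be released
                # now instead of being pinned until the whole job ends.
                del remaining[0]
            return
        except RuntimeError:
            # Retry the same job, but only with the sstables not yet done.
            pass
    raise RuntimeError(f"cleanup failed, {len(remaining)} sstables left")
```

Because finished inputs leave `remaining` immediately, a retry never revisits them and the job does not keep the whole original set alive.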

New unit test reproduces the failure, and passes with the fix.

Fixes #14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14038
2023-05-31 06:46:12 +03:00
Kefu Chai
82cac8e7cf treewide: s/std::source_location/seastar::compact::source_location/
CWG 2631 (https://cplusplus.github.io/CWG/issues/2631.html) reports
an issue on how the default argument is evaluated. this problem is
more obvious when it comes to how `std::source_location::current()`
is evaluated as a default argument. but not all compilers have the
same behavior, see https://godbolt.org/z/PK865KdG4.

notably, clang-15 evaluates the default argument at the callee
site. so we need to check the capability of compiler and fall back
to the one defined by util/source_location-compat.hh if the compiler
suffers from CWG 2631. and clang-16 implemented CWG2631 in
https://reviews.llvm.org/D136554. But unfortunately, this change
was not backported to clang-15.

before switching over to clang-16, to use std::source_location::current()
as the default parameter and get the behavior defined by CWG2631,
we have to use the compatibility layer provided by Seastar. otherwise
we always end up having the source_location at the callee side, which
is not interesting under most circumstances.

so in this change, all places using the idiom of passing
std::source_location::current() as the default parameter are changed
to use seastar::compat::source_location::current(). despite that
we have `#include "seastarx.h"` for opening the seastar namespace,
to disambiguate the "namespace compat" defined somewhere in scylladb,
the fully qualified name of
`seastar::compat::source_location::current()` is used.

see also 09a3c63345, where we used
std::source_location as an alias of std::experimental::source_location
if it was available. but this does not apply to the settings of our
current toolchain, where we have GCC-12 and Clang-15.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14086
2023-05-30 15:10:12 +03:00
Kefu Chai
74dd6dc185 Revert "test: string_format_test: don't compare std::string with sstring"
This reverts commit 3c54d5ec5e.

The reverted change fixed the FTBFS of the test in question with Clang 16,
which rightly stopped converting the LHS of `"hello" == sstring{"hello"}` to
the type acceptable by the member operator, even though we have a
constructor for this conversion, like

class sstring {
public:
  sstring(const char*);
  bool operator==(const sstring&) const;
  bool operator!=(const sstring&) const;
};

because we have an operator!=, as per the draft of C++ standard
https://eel.is/c++draft/over.match.oper#4 :

> A non-template function or function template F named operator==
> is a rewrite target with first operand o unless a search for the
> name operator!= in the scope S from the instantiation context of
> the operator expression finds a function or function template
> that would correspond ([basic.scope.scope]) to F if its name were
> operator==, where S is the scope of the class type of o if F is a
> class member, and the namespace scope of which F is a member
> otherwise.

in 397f4b51c3, the seastar submodule was
updated, and it now has a dedicated overload for the `const char*`
case. so the compiler is able to compile expressions like
`"hello" == sstring{"hello"}` in C++20 now.

so, in this change, the workaround is reverted.

Closes #14040
2023-05-29 23:03:24 +03:00
Kefu Chai
af65d5a1e8 test: sstable: use BOOST_REQUIRE_*() when appropriate
instead of using BOOST_REQUIRE(), use, for instance,
BOOST_REQUIRE_NE() and BOOST_REQUIRE_EQUAL() for better
error messages when the test fails, as Boost.Test would
print out the LHS and RHS of the comparison expression
if it fails.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14050
2023-05-27 11:10:47 +03:00
Botond Dénes
5a14c3311a Merge 'Break S3 upload 50Gb file limit' from Pavel Emelyanov
The current S3 uploading sink has an implicit limit on the final file size that comes from two places. First, the S3 protocol declares that uploading parts count from 1 to 10000 (inclusive). Second, the uploading sink sends out parts once they grow above the S3 minimal part size, which is 5MB. Since sstables put data in 128KB (or smaller) portions, parts are almost exactly 5MB in size, so the total uploading size cannot grow above ~50GB. That's too low.

To break the limit the new sink (called jumbo sink) uses the UploadPartCopy S3 call that helps splicing several objects into one right on the server. Jumbo sink starts uploading parts into an intermediate temporary object called a piece and named ${original_object}_${piece_number}. When the number of parts in current piece grows above the configured limit the piece is finalized and upload-copied into the object as its next part, then deleted. This happens in the background, meanwhile the new piece is created and subsequent data is put into it. When the sink is flushed the current piece is flushed as is and also squashed into the object.

The new jumbo sink is capable of uploading ~500TB of data, which looks enough.
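
The two figures follow from simple arithmetic, sketched here in Python. The constants come from the description above (10000 parts, 5 MiB minimal part size); the assumption that a piece may itself hold up to 10000 parts is what the ~500 figure implies, not a confirmed default of the sink, and other S3 limits on part or object size are not modeled. The function names are made up for the example.

```python
MIB = 1024 ** 2
MAX_PARTS = 10_000        # part numbers run from 1 to 10000 inclusive
MIN_PART_SIZE = 5 * MIB   # parts are sent once they reach the minimal size


def plain_sink_limit() -> int:
    """Largest object the plain sink can produce, in bytes."""
    return MAX_PARTS * MIN_PART_SIZE


def jumbo_sink_limit(parts_per_piece: int = MAX_PARTS) -> int:
    """Largest object the jumbo sink can produce, in bytes.

    Each piece collects up to `parts_per_piece` parts and then becomes
    a single part of the final object.
    """
    return MAX_PARTS * parts_per_piece * MIN_PART_SIZE


if __name__ == "__main__":
    print(f"plain sink: {plain_sink_limit() / 1e9:.1f} GB")
    print(f"jumbo sink: {jumbo_sink_limit() / 1e12:.1f} TB")
```

Lowering `parts_per_piece` shrinks the ceiling proportionally, so the configured limit trades maximum object size against the size of each intermediate piece.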

fixes: #13019

Closes #13577

* github.com:scylladb/scylladb:
  sstables: Switch data and index sink to use jumbo uploader
  s3/test: Tune-up multipart upload test alignment
  s3/test: Add jumbo upload test
  s3/client: Wait for background upload fiber on close-abort
  c3/client: Implement jumbo upload sink
  s3/client: Move memory buffers to upload_sink from base
  s3/client: Move last part upload out of finalize_upload()
  s3/client: Merge do_flush() with upload_part()
  s3/client: Rename upload_sink -> upload_sink_base
2023-05-25 11:44:06 +03:00
Tomasz Grabiec
51e3b9321b Merge ' mvcc: make schema upgrades gentle' from Michał Chojnowski
After a schema change, memtable and cache have to be upgraded to the new schema. Currently, they are upgraded (on the first access after a schema change) atomically, i.e. all rows of the entry are upgraded with one non-preemptible call. This is one of the last vestiges of the times when partitions were treated atomically, and it is a well known source of numerous large stalls.

This series makes schema upgrades gentle (preemptible). This is done by co-opting the existing MVCC machinery.
Before the series, all partition_versions in the partition_entry chain have the same schema, and an entry upgrade replaces the entire chain with a single squashed and upgraded version.
After the series, each partition_version has its own schema. A partition entry upgrade happens simply by adding an empty version with the new schema to the head of the chain. Row entries are upgraded to the current schema on-the-fly by the cursor during reads, and by the MVCC version merge ongoing in the background after the upgrade.

The series:
1. Does some code cleanup in the mutation_partition area.
2. Adds a schema field to partition_version and removes it from its containers (partition_snapshot, cache_entry, memtable_entry).
3. Adds upgrading variants of constructors and apply() for `row` and its wrappers.
4. Prepares partition_snapshot_row_cursor, mutation_partition_v2::apply_monotonically and partition_snapshot::merge_partition_versions for dealing with heterogeneous version chains.
5. Modifies partition_entry::upgrade to perform upgrades by extending the version chain with a new schema instead of squashing it to a single upgraded version.
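
A toy Python model may help picture the new scheme. Everything in it is a simplification made up for the example: a schema is just a tuple of column names, upgrading a row means dropping columns the target schema lacks, and `Version`, `PartitionEntry` and `merge_oldest` are hypothetical stand-ins for the real MVCC classes, which work incrementally and preemptibly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Schema = Tuple[str, ...]   # column names, a stand-in for a real schema
Row = Dict[str, object]


@dataclass
class Version:
    schema: Schema
    rows: Dict[str, Row] = field(default_factory=dict)


def upgrade_row(row: Row, to_schema: Schema) -> Row:
    """Keep only the columns that exist in the target schema."""
    return {col: val for col, val in row.items() if col in to_schema}


@dataclass
class PartitionEntry:
    versions: List[Version]   # newest first

    def upgrade(self, new_schema: Schema) -> None:
        # No rows are touched: an empty version with the new schema
        # simply becomes the head of the chain.
        self.versions.insert(0, Version(new_schema))

    def read(self, key: str) -> Optional[Row]:
        # Rows are upgraded on the fly to the head schema while merging.
        head_schema = self.versions[0].schema
        merged: Row = {}
        found = False
        for version in reversed(self.versions):   # oldest first, newer wins
            row = version.rows.get(key)
            if row is not None:
                merged.update(upgrade_row(row, head_schema))
                found = True
        return merged if found else None

    def merge_oldest(self) -> None:
        """One background step: fold the oldest version into its successor."""
        if len(self.versions) < 2:
            return
        old = self.versions.pop()
        target = self.versions[-1]
        for key, row in old.rows.items():
            newer = target.rows.get(key, {})
            target.rows[key] = {**upgrade_row(row, target.schema), **newer}
```

In this model `upgrade()` costs the same regardless of how many rows the entry holds, which is the property that removes the stall; the per-row work moves into reads and the background merge.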

Fixes #2577

Closes #13761

* github.com:scylladb/scylladb:
  test: mvcc_test: add a test for gentle schema upgrades
  partition_version: make partition_entry::upgrade() gentle
  partition_version: handle multi-schema snapshots in merge_partition_versions
  mutation_partition_v2: handle schema upgrades in apply_monotonically()
  partition_version: remove the unused "from" argument in partition_entry::upgrade()
  row_cache_test: prepare test_eviction_after_schema_change for gentle schema upgrades
  partition_version: handle multi-schema entries in partition_entry::squashed
  partition_snapshot_row_cursor: handle multi-schema snapshots
  partiton_version: prepare partition_snapshot::squashed() for multi-schema snapshots
  partition_version: prepare partition_snapshot::static_row() for multi-schema snapshots
  partition_version: add a logalloc::region argument to partition_entry::upgrade()
  memtable: propagate the region to memtable_entry::upgrade_schema()
  mutation_partition: add an upgrading variant of lazy_row::apply()
  mutation_partition: add an upgrading variant of rows_entry::rows_entry
  mutation_partition: switch an apply() call to apply_monotonically()
  mutation_partition: add an upgrading variant of rows_entry::apply_monotonically()
  mutation_fragment: add an upgrading variant of clustering_row::apply()
  mutation_partition: add an upgrading variant of row::row
  partition_version: remove _schema from partition_entry::operator<<
  partition_version: remove the schema argument from partition_entry::read()
  memtable: remove _schema from memtable_entry
  row_cache: remove _schema from cache_entry
  partition_version: remove the _schema field from partition_snapshot
  partition_version: add a _schema field to partition_version
  mutation_partition: change schema_ptr to schema& in mutation_partition::difference
  mutation_partition: change schema_ptr to schema& in mutation_partition constructor
  mutation_partition_v2: change schema_ptr to schema& in mutation_partition_v2 constructor
  mutation_partition: add upgrading variants of row::apply()
  partition_version: update the comment to apply_to_incomplete()
  mutation_partition_v2: clean up variants of apply()
  mutation_partition: remove apply_weak()
  mutation_partition_v2: remove a misleading comment in apply_monotonically()
  row_cache_test: add schema changes to test_concurrent_reads_and_eviction
  mutation_partition: fix mixed-schema apply()
2023-05-24 22:58:43 +02:00
Nadav Har'El
7cdee303cf Merge 'ks_prop_defs: disallow empty replication factor string in NTS' from Jan Ciołek
A CREATE KEYSPACE query which specifies an empty string ('') as the replication factor value is currently allowed:
```cql
CREATE KEYSPACE bad_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': ''};
```

This is wrong: an empty replication factor string is invalid.
It creates a keyspace without any replication, so the tables inside it aren't writable.

Trying to create a `SimpleStrategy` keyspace with such a replication factor throws an error; `NetworkTopologyStrategy` should do the same.

The problem was in `prepare_options`, which treated an empty replication factor string as no replication factor.
Changing it to `std::optional` fixes the problem.
Now `std::nullopt` means no replication factor, and `make_optional("")` means that there is a replication factor, but it's described by an empty string.
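
The distinction between an absent and an empty replication factor can be sketched as follows. This is a minimal illustration, not the actual `prepare_options` code: `find_replication_factor` and `parse_replication_factor` are hypothetical helpers, and the option map type is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper (not ScyllaDB's prepare_options()): looks up the
// "replication_factor" option and keeps "absent" distinct from "empty".
std::optional<std::string> find_replication_factor(
        const std::map<std::string, std::string>& options) {
    auto it = options.find("replication_factor");
    if (it == options.end()) {
        return std::nullopt;   // the option was not specified at all
    }
    return it->second;         // specified, possibly as an empty string
}

// Hypothetical validation step: an empty or non-numeric string is rejected
// instead of being silently treated as "no replication factor".
int parse_replication_factor(const std::string& rf) {
    if (rf.empty()) {
        throw std::invalid_argument("replication factor must not be empty");
    }
    std::size_t pos = 0;
    int value = std::stoi(rf, &pos);  // throws std::invalid_argument on non-numeric input
    if (pos != rf.size() || value < 0) {
        throw std::invalid_argument("replication factor must be a non-negative integer");
    }
    return value;
}
```

With a plain `std::string`, the "absent" and "empty" cases would both look like `""`; with `std::optional<std::string>`, the empty-string case reaches validation and can be rejected.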

Fixes: https://github.com/scylladb/scylladb/issues/13986

Closes #13988

* github.com:scylladb/scylladb:
  test/network_topology_strategy_test: Test NTS with replication_factor option in test_invalid_dcs
  ks_prop_defs: disallow empty replication factor string in NTS
2023-05-24 21:39:31 +03:00
Botond Dénes
2526b232f1 Merge 'Remove explicit default_priority_class() usage from sstable aux methods' from Pavel Emelyanov
There are a few places in sstables/ code that require the caller to specify a priority class to pass along to file stream options. All these callers use the default class, so it makes little sense to keep it. This change makes the sched classes unification mega patch a bit smaller.

ref: #13963

Closes #13996

* github.com:scylladb/scylladb:
  sstables: Remove default prio class from rewrite_statistics()
  sstables: Remove prio class from validate_checksums subs
  sstables: Remove always default io-prio from validate_checksums()
2023-05-24 09:23:24 +03:00
Botond Dénes
313ae4ddac Merge 'Generalize some file accessing helpers in test/' from Pavel Emelyanov
Several test cases use common operations on files like existence checking, content comparing, etc. with the help of home-brew local helpers. This set makes use of some existing seastar:: ones and generalizes others into test/lib/. The primary intent here is `57 insertions(+), 135 deletions(-)`

Closes #13936

* github.com:scylladb/scylladb:
  test: Generalize touch_file() into test_utils.*
  test/database: Generalize file/dir touch and exists checks
  test/sstables: Use seastar::file_exists() to check
  test/sstables: Remove sstdesc
  test/sstables: Use compare_files from utils/ in sstable_test
  test/sstables: Use compare_files() from utils/ in sstable_3_x_test
  test/util: Add compare_file() helpers
2023-05-24 08:43:41 +03:00
Avi Kivity
da5467c687 Merge 'Use implicit default prio class in tests' from Pavel Emelyanov
There are several places in tests that either use default_priority_class() explicitly, or use some specific prio class obtained from the priority manager. There's ongoing work to remove all priority classes; this set makes the final patch a bit smaller and easier to review. In particular, in many cases default_priority_class() is implicit and can be avoided by callers. Also, using any specific prio class in a test is excessive; it can go with the (implicit) default_priority_class.

ref: #13963

Closes #13991

* github.com:scylladb/scylladb:
  test, memtable: Use default prio class
  test, memtable: Add default value for make_flush_reader() last arg
  test, view_build: Use default prio class
  test, sstables: Use implicit default prio class in dma_write()
  test, sstables: Use default sstable::get_writer()'s prio class arg
2023-05-23 18:46:52 +03:00
Avi Kivity
3956e01640 Merge 'Clean index_reader API' from Pavel Emelyanov
The way index_reader maintains io_priority_class can be relaxed a bit. The main intent is to shorten the #13963 final patch a bit; as a side effect, index_reader gets its portion of API polishing.

ref: #13963

Closes #13992

* github.com:scylladb/scylladb:
  index_reader: Introduce and use default arguments to constructor
  index_reader: Use _pc field in get_file_input_stream_options() directly
  index_reader: Move index_reader::get_file_input_stream_options to private: block
2023-05-23 18:46:26 +03:00
Avi Kivity
1c0e8c25ca Merge 'multishard_mutation_query: make reader_context::lookup_readers() exception safe' from Botond Dénes
With regards to closing the looked-up querier if an exception is thrown. In particular, this requires closing the querier if a semaphore mismatch is detected. Move the table lookup above the line where the querier is looked up, to avoid having to handle the exception from it. As a consequence of closing the querier on the error path, the lookup lambda has to be made a coroutine. This is sad, but this is executed once per page, so its cost should be insignificant when spread over an entire page worth of work.

Also add a unit test checking that the mismatch is detected in the first place and that readers are closed.

Fixes: #13784

Closes #13790

* github.com:scylladb/scylladb:
  test/boost/database_test: add unit test for semaphore mismatch on range scans
  partition_slice_builder: add set_specific_ranges()
  multishard_mutation_query: make reader_context::lookup_readers() exception safe
  multishard_mutation_query: lookup_readers(): make inner lambda a coroutine
2023-05-23 14:05:10 +03:00
Pavel Emelyanov
7396d9d291 sstables: Remove always default io-prio from validate_checksums()
All calls to sstables::validate_checksums() happen with an explicitly
default priority class. Just hard-code it as such in the method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 13:54:31 +03:00
Pavel Emelyanov
2bb024c948 index_reader: Introduce and use default arguments to constructor
Most creators of index_reader construct it with the default prio class,
a null trace pointer and use_caching::yes. Assigning implicit defaults to
constructor arguments keeps the code shorter and easier to read.
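
The idea can be sketched roughly as below. This is a simplified illustration only: `reader_sketch`, `priority_class` and `trace_state` are hypothetical stand-ins, not the real index_reader signature or Seastar types.

```cpp
#include <string>

// Hypothetical stand-ins for the real types, for illustration only.
enum class use_caching { no, yes };
struct priority_class { std::string name = "default"; };
struct trace_state {};

class reader_sketch {
    priority_class _pc;
    const trace_state* _trace;
    use_caching _caching;
public:
    // The common case (default class, no tracing, caching enabled) is
    // expressed as default arguments, so most callers pass nothing.
    explicit reader_sketch(priority_class pc = {},
                           const trace_state* trace = nullptr,
                           use_caching caching = use_caching::yes)
        : _pc(std::move(pc)), _trace(trace), _caching(caching) {}

    const std::string& prio_name() const { return _pc.name; }
    bool tracing() const { return _trace != nullptr; }
    bool caching() const { return _caching == use_caching::yes; }
};
```

A caller that needs the common configuration writes `reader_sketch r;`, while an unusual caller still spells out only the arguments it needs, e.g. `reader_sketch r(priority_class{"compaction"});`.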

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 11:29:04 +03:00
Pavel Emelyanov
9bdc0d3f44 test: Generalize touch_file() into test_utils.*
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:40:55 +03:00
Pavel Emelyanov
730c0439e0 test/database: Generalize file/dir touch and exists checks
There are cases that implement the same set of lambda helpers. Keep them
common in this .cc file.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:40:55 +03:00
Pavel Emelyanov
54fb8a022e test/sstables: Use seastar::file_exists() to check
There's a rather boring test_sstable_exists() helper in the test that
can be replaced with a more standard seastar API call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:40:54 +03:00
Pavel Emelyanov
c06b5e2714 test/sstables: Remove sstdesc
The helper class is used to transfer directory name and generation int
value into the compare_sstables() helper. Remove both, the utils/ stuff
is useful enough not to use wrappers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:40:20 +03:00
Pavel Emelyanov
c3dbe37669 test/sstables: Use compare_files from utils/ in sstable_test
There's yet another implementation of read-the-whole-file and
check-file-contents-matches helpers in the test. Replace it with the
utils/ facility. Next patch will be able to wash more stuff out of
this test.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:39:32 +03:00
Pavel Emelyanov
6619e87b70 test/sstables: Use compare_files() from utils/ in sstable_3_x_test
There's a static helper under the same name that can be replaced with
utils/ one. The code here runs in async context to .get0() the result.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:39:31 +03:00
Pavel Emelyanov
f9ff5cdfdf test, memtable: Use default prio class
Similarly to the previous patch with view-building -- using the default class is
OK for a unit test.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:21:27 +03:00