directory_lister provides a simpler interface compared to lister.
After creating the directory_lister,
its async get() method should be called repeatedly,
returning a std::optional<directory_entry> each call,
until it returns a disengaged entry or an error.
This is especially suitable for coroutines
as demonstrated in the unit tests that were added.
For example:
```c++
auto dl = directory_lister(path);
while (auto de = co_await dl.get()) {
co_await process(*de);
}
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#9835
* github.com:scylladb/scylla:
sstable_directory: process_sstable_dir: use directory_lister
sstable_directory: process_sstable_dir: fixup indentation
sstable_directory: coroutinize process_sstable_dir
lister: add directory_lister
There is a bug in `expr::visit`. When trying to return a reference from a visitor it actually returns a reference to some temporary location.
So trying to do something like:
```c++
const expression e = new_bind_variable(123);
const bind_variable& ref = visit(overloaded_functor {
[](const bind_variable& bv) -> const bind_variable& { return bv; },
[](const auto&) -> const bind_variable& { throw std::runtime_error("Unreachable"); }
}, e);
std::cout << ref << std::endl;
```
Would actually print a random stack location instead of the value inside of `e`.
Additionally trying to return a non-const reference doesn't compile.
Current implementation of `expr::visit` is:
```c++
auto visit(invocable_on_expression auto&& visitor, const expression& e) {
return std::visit(visitor, e._v->v);
}
```
For reference, `std::visit` looks like this:
```c++
template<typename _Res, typename _Visitor, typename... _Variants>
constexpr _Res
visit(_Visitor&& __visitor, _Variants&&... __variants)
{
return std::__do_visit<_Res>(std::forward<_Visitor>(__visitor),
std::forward<_Variants>(__variants)...);
}
```
The problem is that `auto` can evaluate to `int` or `float`, but not to `int&`.
It has now been changed to `decltype(auto)`, which is able to express references.
I also added a missing `std::forward` on the visitor argument.
The new version looks like this:
```c++
template <invocable_on_expression Visitor>
decltype(auto) visit(Visitor&& visitor, const expression& e) {
return std::visit(std::forward<Visitor>(visitor), e._v->v);
}
```
I added some tests of `expr::visit` in `boost/expr_test`, but sadly they are not as throughout as they could be, Ideally I could return a refernce from `std::visit` and `expr::visit` and then check that they both point to the same address in memory.
I can't do this because it would require to access a private field of `expression`.
Some test pass before the fix, even though they shouldn't, but I'm not sure how to make them better without making field of expression public.
I played around with some code, it can be found here: https://github.com/cvybhu/attached-files/blob/main/visit/visit_playground.cppCloses#10073
* github.com:scylladb/scylla:
cql3: expr: Add a test to show that std::forward is needed in expr::visit
cql3: expr: add std::forward in expr::visit
cql3: expr: Add tests for expr::visit
cql3: expr: Fix expr::visit so that it works with references
Adds a test with a vistior that can only be used as a rvalue.
Without std::forward in expr::visit this test doesn't compile.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add tests for new expr::visit to ensure that it is working correctly.
expr::visit had a hidden bug where trying to return a reference
actually returned a reference to freed location on the stack,
so now there are tests to ensure that everything works.
Sadly the test `expr_visit_const_ref` also passes
before the fix, but at lest expr_visit_ref doesn't compile before the fix.
It would be better to test this by taking references returned
by std::visit and expr::visit and checking that they point
to the same address in memory, but I can't do this
because I would have to access private field of expression.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Currently the keyspace is only auto-created for create type statements.
However the keyspace is needed even without UDTs being involved: for
example if the table contains a collection type. So auto-create the
keyspace unconditionally before preparing the first statement.
Also add a test-case with a create table statement which requires the
keyspace to be present at prepare time.
"
This series implements support for the ME sstable format (introduced
in C* 3.11.11).
Tests: unit(dev)
"
* tag 'me-sstable-format-v5' of https://github.com/cmm/scylla:
sstables: validate originating host id
sstable: add is_uploaded() predicate
config: make the ME sstable format default
scylla-gdb.py: recognize ME sstables
sstables: store originating host id in stats metadata
system_keyspace: cache local host id before flushing
database_test: ensure host id continuity
sstables_manager: add get_local_host_id() method and support
sstables_manager: formalize inheritability
system_keyspace, main: load (or create) local host id earlier
sstable_3_x_test: test ME sstable format too
add "ME_SSTABLE" cluster feature
add "sstable_format" config
add support for the ME sstable format
scylla-sstable: add ability to dump optionals and utils::UUID
sstables: add ability to write and parse optionals
globalize sstables::write(..., utils::UUID)
The "populate_from_quarantine_works" test case creates sstables with
one db config, then reads them with another. Ensure that both configs
have the same host id so the sstables pass validation.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Since ME sstable format includes originating host id in stats
metadata, local host id needs to be made available for writing and
validation.
Both Scylla server (where local host id comes from the `system.local`
table) and unit tests (where it is fabricated) must be accomodated.
Regardless of how the host id is obtained, it is stored in the db
config instance and accessed through `sstables_manager`.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Initialize it to "md" until ME format support is
complete (i.e. storing originating host id in sstable stats metadata
is implemented), so at present there is no observable change by
default.
Also declare "enable_sstables_md_format" unused -- the idea, going
forward, being that only "sstable_format" controls the written sstable
file format and that no more per-format enablement config options
shall be added.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
This commit changes the behavior of `exception_container::accept`. Now,
instead of throwing an `utils::bad_exception_container_access` exception
when the container is empty, the provided visitor is invoked with that
exception instead. There are two reasons for this change:
- The exception_container is supposed to allow handling exceptions
without using the costly C++'s exception runtime. Although an empty
container is an edge case, I think it the new behavior is more aligned
with the class' purpose. The old behavior can be simulated by
providing a visitor which throws when called with bad access
exception.
- The new behavior fixes a bug in `result_try`/`result_futurize_try`.
Before the change, if the `try` block returned a failed result with an
empty exception container, a bad access exception would either be
thrown or returned as an exceptional future without being handled by
the `catch` clauses. Although nobody is supposed to return such
result<>s on purpose, a moved out result can be returned by accident
and it's important for the exception handling logic to be correct in
such a situation.
Tests: unit(dev)
Closes#10086
CDC registers to the table-creation hook (before_create_column_family)
to add a second table - the CDC log table - to the same keyspace.
The handler function (on_before_update_column_family() in cdc/log.cc)
wants to retrieve the keyspace's definition, but that does NOT WORK if
we create the keyspace and table in one operation (which is exactly what
we intend to do in Alternator to solve issue #9868) - because at the
time of the hook, the keyspace does not yet exist in the schema.
It turns out that on_before_update_column_family() does not REALLY need
the keyspace. It needed it to pass it on to make_create_table_mutations()
but that function doesn't use the keyspace parameter passed to it! All
it needs is the keyspace's name - which is in the schema anyway and
doesn't need to be looked up.
So in this patch we fix make_create_table_mutations() to not require the
unused keyspace parameter - and fix the CDC code not to look for the
keyspace that is no longer needed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220215162342.622509-1-nyh@scylladb.com>
Adds `utils::result_try` and `utils::result_futurize_try` - functions which allow to convert existing try..catch blocks into a version which handles C++ exceptions, failed results with exception containers and, depending on the function variant, exceptional futures using the same exception handling logic.
For example, you can convert the following try..catch block:
try {
return a_function_that_may_throw();
} catch (const my_exception& ex) {
return 123;
} catch (...) {
throw;
}
...to this:
return utils::result_try([&] {
return a_function_that_may_throw_or_return_a_failed_result();
}, utils::result_catch<my_exception>([&] (const Ex&) {
return 123;
}), utils::result_catch_dots([&] (auto&& handle) {
return handle.into_result();
});
Similarly, `utils::result_futurize_try` can be used to migrate `then_wrapped` or `f.handle_exception()` constructs.
As an example of the usability of the new constructs, two places in the current code which need to simultaneously handle exceptions and failed results are converted to use `result_try` and `result_futurize_try`.
Results of `perf_simple_query --smp 1 --operations-per-shard 1000000 --write`:
```
127041.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52422 insns/op)
126958.60 tps ( 67.2 allocs/op, 14.2 tasks/op, 52409 insns/op)
127088.37 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op)
127560.84 tps ( 67.2 allocs/op, 14.2 tasks/op, 52424 insns/op)
127826.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52406 insns/op)
126801.02 tps ( 67.2 allocs/op, 14.2 tasks/op, 52420 insns/op)
125371.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52425 insns/op)
126498.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52427 insns/op)
126359.41 tps ( 67.2 allocs/op, 14.2 tasks/op, 52423 insns/op)
126298.27 tps ( 67.2 allocs/op, 14.2 tasks/op, 52410 insns/op)
```
The number of tasks and allocations is unchanged. The number of instructions per operations seems similar, it may have increased slightly (by 10-20) but it's hard to tell for sure because of the noisiness of the results.
Tests: unit(dev)
Closes#10045
* github.com:scylladb/scylla:
transport: use result_try in process_request_one
storage_proxy: use result_futurize_try in mutate_end
storage_proxy: temporarily throw exception from result in mutate_end
utils: add result_try and result_futurize_try
After the mechanical change in fcb8d040e8
("treewide: use Software Package Data Exchange (SPDX) license identifiers"),
a few stray license blurbs or fragments thereof remain. In two cases
these were extra blurbs in code generators intended for the generated code,
in others they were just missed by the script.
Clean them up, adding an SPDX license identifier where needed.
Closes#10072
It now resembles the original parallel_for_each more, but uses a
coroutine instead of a custom `task` to collect not-ready futures.
Although the usage of a coroutine saves on allocations, the drawback is
that there is currently no way to co_await on a future and handle its
exception without throwing or without unconditionally allocating a
then_wrapped or handle_exception continuation - so it introduces a
rethrow.
Furthermore, now failed results and exceptions are treated as equals.
Previously, in case one parallel invocation returned failed result and
another returned an exception, the exception would always be returned.
Now, the failed result/exception of the invocation with the lowest index
is always preferred, regardless of the failure type.
The reimplementation manages to save about 350-400 instructions, one
task and one allocation in the perf_simple_query benchmark in write
mode.
Results from `perf_simple_query --smp 1 --operations-per-shard 1000000
--write` (before vs. after):
```
126872.54 tps ( 67.2 allocs/op, 14.2 tasks/op, 52404 insns/op)
126532.13 tps ( 67.2 allocs/op, 14.2 tasks/op, 52408 insns/op)
126864.99 tps ( 67.2 allocs/op, 14.2 tasks/op, 52428 insns/op)
127073.10 tps ( 67.2 allocs/op, 14.2 tasks/op, 52404 insns/op)
126895.85 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op)
127894.02 tps ( 66.2 allocs/op, 13.2 tasks/op, 52036 insns/op)
127671.51 tps ( 66.2 allocs/op, 13.2 tasks/op, 52042 insns/op)
127541.42 tps ( 66.2 allocs/op, 13.2 tasks/op, 52044 insns/op)
127409.10 tps ( 66.2 allocs/op, 13.2 tasks/op, 52052 insns/op)
127831.30 tps ( 66.2 allocs/op, 13.2 tasks/op, 52043 insns/op)
```
Test: unit(dev), unit(result_utils_test, debug)
Segregates result utilities into:
- result.hh - basic definitions related to results with exception
containers,
- result_combinators.hh - combinators for working with results in
conjunction with futures,
- result_loop.hh - loop-like combinators, currently has only
result_parallel_for_each.
The motivation for the split is:
1. In headers, usually only result.hh will be needed, so no need to
force most .cc files to compile definitions from other files,
2. Less files need to be recompiled when a combinator is added to
result_combinators or result_loop.
As a bonus, `result_with_exception` was moved from `utils::internal` to
just `utils`.
Adds result_try and result_futurize_try - functions which allow to
convert existing try..catch blocks into a version which handles C++
exceptions, failed results with exception containers and, depending on
the function variant, exceptional futures.
directory_lister provides a simpler interface
compared to lister.
After creating the directory_lister,
its async get() method should be called repeatedly,
returning a std::optional<directory_entry> each call,
until it returns a disengaged entry or an error.
This is especially suitable for coroutines
as demonstrated in the unit tests that were added.
For example:
auto dl = directory_lister(path);
while (auto de = co_await dl.get()) {
co_await process(*de);
}
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, most of the failures that occur during CQL reads or writes are reported using C++ exceptions. Although the seastar framework avoids most of the cost of unwinding by keeping exceptions in futures as `std::exception_ptr`s, the exceptions need to be inspected at various points for the purposes of accounting metrics or converting them to a CQL error response. Analyzing the value and type of an exception held by `std::exception_ptr`'s cannot be done without rethrowing the exception, and that can be very costly even if the exception is immediately caught. Because of that, exceptions are not a good fit for reporting failures which happen frequently during overload, especially if the CPU is the bottleneck.
This PR introduces facilities for reporting exceptions as values using the boost::outcome library. As a first step, the need to use exceptions for reporting timeouts was eliminated for regular and batch writes, and no exceptions are thrown between creation of a `mutation_write_timeout_exception` and its serialization as a CQL response in the `cql_server`.
The types and helpers introduced here can be reused in order to migrate more exceptions and exception paths in a similar fashion.
Results of `perf_simple_query --smp 1 --operations-per-shard 1000000`:
Master (00a9326ae7)
128789.53 tps ( 82.2 allocs/op, 12.2 tasks/op, 49245 insns/op)
This PR
127072.93 tps ( 82.2 allocs/op, 12.2 tasks/op, 49356 insns/op)
The new version seems to be slower by about 100 insns/op, fortunately not by much (about 0.2%).
Tests: unit(dev), unit(result_utils_test, debug)
Closes#10014
* github.com:scylladb/scylla:
cql_test_env: optimize handling result_message::exception
transport/server: handle exceptions from coordinator_result without throwing
transport/server: propagate coordinator_result to the error handling code
transport/server: unwrap the exception result_message in process_xyz_internal
query_processor: add exception-returning variants of execute_ methods
modification_statement: propagate failed result through result_message::exception
batch_statement: propagate failed result through result_message::exception
cql_statement: add `execute_without_checking_exception_message`
result_message: add result_message::exception
storage_proxy: change mutate_with_triggers to return future<result<>>
storage_proxy: add mutate_atomically_result
storage_proxy: return result<> from mutate_result
storage_proxy: return result<> from mutate_internal
storage_proxy: properly propagate future from mutate_begin to mutate_end
storage_proxy: handle exceptions as values in mutate_end
storage_proxy: let mutate_end take a future<result<>>
storage_proxy: resultify mutate_begin
storage_proxy: use result in the _ready future of write handlers
storage_proxy: introduce helpers for dealing with results
exceptions: add coordinator_exception_container and coordinator_result
utils: add result utils
utils: add exception_container
In order to propagate exceptions as values through the CQL layer with
minimal modifications to the interfaces, a new result_message type is
introduced: result_message::exception. Similarly to
result_message::bounce_to_shard, this is an internal type which is
supposed to be handled before being returned to the client.
Adds a number of utilities for working with boost::outcome::result
combined with exception_container. The utilities are meant to help with
migration of the existing code to use the boost::outcome::result:
- `exception_container_throw_policy` - a NoValuePolicy meant to be used
as a template parameter for the boost::outcome::result. It protects
the caller of `result::value()` and `result::error()` methods - if the
caller wishes to get a value but the result has an error
(exception_container in our case), the exception in the container will
be thrown instead. In case it's the other way around,
boost::outcome::bad_result_access is thrown.
- `result_parallel_for_each` - a version of `parallel_for_each` which is
aware of results and returns a failed result in case any of the
parallel invocations return a failed result.
- `result_into_future` - converts a result into a future. If the result
holds a value, converts it into make_ready_future; if it holds an
exception, the exception is returned as make_exception_future.
- `then_ok_result` takes a `future<T>` and converts it into
a `future<result<T>>`.
- `result_wrap` adapts a callable of type `T -> future<result<T>>` and
returns a callable of type `result<T> -> future<result<T>>`.
The function cql3::util::maybe_quote() is used throughout Scylla to
convert identifier names (column names, table names, etc.) into strings
that can be embedded in CQL commands. maybe_quote() sometimes needs to
quote these identifier names, but when the identifier name is lowercase,
and not a CQL keyword, it is not quoted.
Not quoting identifier names when not needed is nice and pretty, but has
a forward-compatibility problem: If some CQL command with an unquoted
identifier is saved somewhere, and new version of Scylla adss this
identifier as a new reserved keyword - the CQL command will break.
So this patch introduces a new function, cql3::util::quote(), which
unconditionally quotes the given identifier.
The new function is not yet used in Scylla, but we add a unit test
(based on the test of maybe_quote()) to confirm it behaves correctly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220118161217.231811-2-nyh@scylladb.com>
cql3::util::maybe_quote() is a utility function formatting an identifier
name (table name, column name, etc.) that needs to be embedded in a CQL
statement - and might require quoting if it contains non-alphanumeric
characters, uppercase characters, or a CQL keyword.
maybe_quote() made an effort to only quote the identifier name if neccessary,
e.g., a lowercase name usually does not need quoting. But lowercase names
that are CQL keywords - e.g., to or where - cannot be used as identifiers
without quoting. This can cause problems for code that wants to generate
CQL statements, such as the materialized-view problem in issue #9450 - where
a user had a column called "to" and wanted to create a materialized view
for it.
So in this patch we fix maybe_quote() to recognize invalid identifiers by
using the CQL parser, and quote them. This will quote reserved keywords,
but not so-called unreserved keywords, which *are* allowed as identifiers
and don't need quoting. This addition slows down maybe_quote(), but
maybe_quote() is anyway only used in heavy operations which need to
generate CQL.
This patch also adds two tests that reproduce the bug and verify its
fix:
1. Add to the low-level maybe_quote() test (a C++ unit test) also tests
that maybe_quote() quotes reserved keywords like "to", but doesn't
quote unreserved keywords like "int".
2. Add a test reproducing issue #9450 - creating a materialized view
whose key column is a keyword. This new test passes on Cassandra,
failed on Scylla before this patch, and passes after this patch.
It is worth noting that maybe_quote() now has a "forward compatiblity"
problem: If we save CQL statements generated by maybe_quote(), and a
future version introduces a new reserved keyword, the parser of the
future version may not be able to parse the saved CQL statement that
was generated with the old mayb_quote() and didn't quote what is now
a keyword. This problem can be solved in two ways:
1. Try hard not to introduced new reserved keywords. Instead, introduce
unreserved keywords. We've been doing this even before recognizing
this maybe_quote() future-compatibility problem.
2. In the next patch we will introduce quote() - which unconditionally
quotes identifier names, even if lowercase. These quoted names will
be uglier for lowercase names - but will be safe from future
introduction of new keywords. So we can consider switching some or
all uses of maybe_quote() to quote().
Fixes#9450
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220118161217.231811-1-nyh@scylladb.com>
Adds `exception_container` - a helper type used to hold exceptions as a
value, without involving the std::exception_ptr.
The motivation behind this type is that it allows inspecting exception's
type and value without having to rethrow that exception and catch it,
unlike std::exception_ptr. In our current codebase, some exception
handling paths need to rethrow the exception multiple times in order to
account it into metrics or encode it as an error response to the CQL
client. Some types of exceptions can be thrown very frequently in case
of overload (e.g. timeouts) and inspecting those exceptions with
rethrows can make the overload even worse. For those kinds of exceptions
it is important to handle them as cheaply as possible, and
exception_container used with conjunction with boost::outcome::result
can help achieve that.
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on column family interface to scanning every datadir
that the database knows of in search for "snapshots" folders.
Fixes#3463Closes#7122Closes#9884
Signed-off-by: Piotr Wojtczak <piotr.m.wojtczak@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Added test that checks if a SELECT COUNT(*) query was transformed and
processed in a parallel way. Checking is done by looking at the cql
statistics and comparing subsequent counts of parallelized aggregation
SELECT query executions.
Schema changes on top of Raft do not allow concurrent changes.
If two changes are attempted concurrently, one of them gets
`group0_concurrent_modification` exception.
Catch the exception in CQL DDL statement execution function and retry.
In addition, improve the description of CQL DDL statements
in group 0 history table.
Add a test which checks that group 0 history grows iff a schema change does
not throw `group0_concurrent_modification`. Also check that the retry
mechanism works as expected.
* kbr/ddl-retry-v1:
test: unit test for group 0 concurrent change protection and CQL DDL retries
cql3: statements: schema_altering_statement: automatically retry in presence of concurrent changes
Fast forwarding is delegated to the underlying reader and assumes the
it's supported. The only corner case requiring special handling that has
shown up in the tests is producing partition start mutation in the
forwarding case if there are no other fragments.
compacting state keeps track of uncompacted partition start, but doesn't
emit it by default. If end of stream is reached without producing a
mutation fragment, partition start is not emitted. This is invalid
behaviour in the forwarding case, so I've added a public method to
compacting state to force marking partition as non-empty. I don't like
this solution, as it feels like breaking an abstraction, but I didn't
come across a better idea.
Tests: unit(dev, debug, release)
Message-Id: <20220128131021.93743-1-mikolaj.sieluzycki@scylladb.com>
If memory reclamation is triggered inside _cache.emplace(), the _cache
btree can get corrupted. Reclaimers erase from it, and emplace()
assumes that the tree is not modified during its execution. It first
locates the target node and then does memory allocation.
Fix by running emplace() under allocating section, which disables
memory reclamation.
The bug manifests with assert failures, e.g:
./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed.
Fixes#9915
Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>
Check that group 0 history grows iff a schema change does not throw
`group0_concurrent_modification`. Check that the CQL DDL statement retry
mechanism works as expected.
We perform a bunch of schema changes with different values of
`migration_manager::_group0_history_gc_duration` and check if entries
are cleared according to this setting.
The compactor recently acquired the ability to consume a v2 stream. The
v2 spec requires that all streams end with a null tombstone.
`range_tombstone_assembler`, the component the compactor uses for
converting the v2 input into its v1 output enforces this with a check on
`consume_end_of_partition()`. Normally the producer of the stream the
compactor is consuming takes care of closing the active tombstone before
the stream ends. The compactor however (or its consumer) can decide to
end the consume early, e.g. to cut the current page. When this happens
the compactor must take care of closing the tombstone itself.
Furthermore it has to keep this tombstone around to re-open it on the
next page.
This patch implements this mechanism which was left out of 134601a15e.
It also adds a unit test which reproduces the problems caused by the
missing mechanism.
The compactor now tracks the last clustering position emitted. When the
page ends, this position will be used as the position of the closing
range tombstone change. This ensures the range tombstone only covers the
actually emitted range.
Fixes: #9907
Tests: unit(dev), dtest(paging_test.py, paging_additional_test.py)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220114053215.481860-1-bdenes@scylladb.com>
`announce` now takes a `group0_guard` by value. `group0_guard` can only
be obtained through `migration_manager::start_group0_operation` and
moved, it cannot be constructed outside `migration_manager`.
The guard will be a method of ensuring linearizability for group 0
operations.
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
Without this fix, generate_random_content could generate 0 entries
and the expected exception would never be injected.
With it, we generate at least 1 entry and the test passes
with the offending random-seed:
```
random-seed=1898914316
Generated 1 dir entries
Aborting lister after 1 dir entries
test/boost/lister_test.cc(96): info: check 'exception "expected_exception" raised as expected' has passed
```
Fixes#9953
Test: lister_test.test_lister_abort --random-seed=1898914316(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220123122921.14017-1-bhalevy@scylladb.com>
Take into account that get_reshaping_job selects only
buckets that have more than min_threashold sstables in them.
Therefore, with 256 disjoint sstables in different windows,
allow first or last windows to not be selected by get_reshaping_job
that will return at least disjoint_sstable_count - min_threshold + 1
sstables, and not more than disjoint_sstable_count.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220123090044.38449-2-bhalevy@scylladb.com>
This pull request fixes two preexisting issues related to snapshot_ctl::true_snapshots_size
https://github.com/scylladb/scylla/issues/9897https://github.com/scylladb/scylla/issues/9898
And adds a couple unit tests to tests the snapshot_ctl functionality.
Test: unit(dev), database_test.{test_snapshot_ctl_details,test_snapshot_ctl_true_snapshots_size}(debug)
Closes#9899
* github.com:scylladb/scylla:
table: get_snapshot_details: count allocated_size
snapshot_ctl: cleanup true_snapshots_size
snpashot_ctl: true_snapshots_size: do not map_reduce across all shards
snapshot_ctl uses map_reduce over all database shards,
each counting the size of the snapshots directory,
which is shared, not per-shard.
So the total live size returned by it is multiples by the number of shards.
Add a unit test to test that.
Fixes#9897
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Fixes#9922
storage proxy uses is_timeout_exception to traverse different code paths.
a6202ae079 broke this (because bit rot and
intermixing), by wrapping exception for information purposes.
This adds check of nested types in exception handling, as well as a test
for the routine itself.
Closes#9932
* github.com:scylladb/scylla:
database/storage_proxy: Use "is_timeout_exception" instead of catch match
utils::is_timeout_exception: Ensure we handle nested exception types
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Currently, when TWCS reshape finds a bucket containing more than 32
files, it will blindly resize that bucket to 32.
That's very bad because it doesn't take into consideration that
compaction efficiency depends on relative sizes of files being
compacted together, meaning that a huge file can be compacted with
a tiny one, producing lots of write amplification.
To solve this problem, STCS reshape logic will now be reused in
each time bucket. So only similar-sized files are compacted together
and the time bucket will be considered reshaped once its size tiers
are properly compacted, according to the reshape mode.
Fixes#9938.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220117205000.121614-1-raphaelsc@scylladb.com>
"
Said method should take care of checking that parsing stopped in a valid
state. This patch-set expands the existing but very lacking
implementation by improving the existing error message and adding an
additional check for prematurely exiting the parser in the middle of
parsing an index entry, something we've seen recently in #9446.
To help in debugging such issues, some additional information is added
to the trace messages.
The series also fixes a bug in the error handling code of the partition
index cache.
Refs: #9446
Tests: unit(dev)
"
* 'index-reader-better-verify-end-state/v2.1' of https://github.com/denesb/scylla:
sstables/index_reader: process_state(): add additional information to trace logging
sstables/index_reader: verify_end_state(): add check for premature EOS
sstables/index_reader: convert exception in verify_end_state() to malformed sstable exception
sstables/index_reader: add const sstable& to index_consume_entry_context
sstables/index_reader: remove unused members from index_consume_entry_context
It's currently used only by unit tests
and it is dangerous to use on a populated token_metadata
as update_normal_tokens assumes that the set of tokens
owned by the given endpoint is compelte, i.e. previous
tokens owned by the endpoint are no longer owned by it,
but the single-token update_normal_token interface
seems commulative (and has no documentation whatsoever).
It is better to remove this interface and calculate a
complete map of endpoint->tokens from the tests.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220117101242.122512-1-bhalevy@scylladb.com>