Commit Graph

74 Commits

Author SHA1 Message Date
Benny Halevy
4d646636f2 locator: effective_replication_map: get_natural_replicas: get is_vnode param
Some callers, like `construct_range_to_endpoint_map` for describe_ring,
or `get_secondary_ranges` for alternator ttl pass vnode tokens (the
vnodes' start token), and therefore can benefit from the fast lookup
path in `vnode_effective_replication_map::do_get_replicas`.
Otherwise the vnode token is binary-searched in sorted_tokens using
token_metadata::first_token().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-13 12:41:00 +03:00
Benny Halevy
6dbbb80aae locator: abstract_replication_strategy: implement local_replication_strategy
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.

However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.

Note that everywhere_replication_strategy is not abstracted in a similar
way, although it could, since the plan is to get rid of it
once all system keyspaces areconverted to local or tablets replication
(and propagated everywhere if needed using raft group0)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:05:11 +03:00
Benny Halevy
cbad497859 locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al
to static_effective_replication_map_ptr, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:53 +03:00
Benny Halevy
bd62421c05 keyspace: rename get_vnode_effective_replication_map
to get_static_effective_replication_map, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map (both are per-keyspace).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:40:43 +03:00
Benny Halevy
59375e4751 alternator: ttl: use naked e_r_m pointers
Prepare for following patch that will separate
the local effective replication map from
vnode_effective_replication_map.

The caller is responsible to keep the
effective_replication_map_ptr alive while
in use by low-level async functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:34:23 +03:00
Avi Kivity
ee138217ba alternator: simplify std::views::transform calls that extract a member from a class
Rather than calling std::views::transform with a lambda that extracts
a member from a class, call std::views::transform with a pointer-to-member
to do the same thing. This results in more concise code.

Closes scylladb/scylladb#25012
2025-07-22 12:39:01 +02:00
Nadav Har'El
d8fab2a01a alternator: clean up and simplify request_return_type
The previous patch introduced a function make_streamed_with_extra_array
which was a duplicate of the existing make_streamed. Reviewers
complained how baroque the new function is (just like the old function),
having to jump through hoops to return a copyable function working
on non-copyable objects, making strange-named copies and shared pointers
of everything.

We needed to return a copyable function (std::function) just because
Alternator used Seastar's json::json_return_type in the return type
from executor function (request_return_type). This json_return_type
contained either a sstring or an std::function, but neither was ever
really appropriate:

  1. We want to return noncopyable_function, not an std::function!
  2. We want to return an std::string (which rjson::print()) returns,
     not an sstring!

So in this patch we stop using seastar::json::json_return_type
entirely in Alternator.

Alternator's request_return_type is now an std::variant of *three* types:
  1. std::string for short responses,
  2. noncopyable_function for long streamed response
  3. api_error for errors.

The ugliest parts of make_streamed() where we made copies and shared
pointers to allow for a copyable function are all gone. Even nicer, a
lot of other ugly relics of using seastar::json_return_type are gone:

1. We no longer need obscure classes and functions like make_jsonable()
   and json_string() to convert strings to response bodies - an operation
   can simply return a string directly - usually returning
   rjson::print(value) or a fixed string like "" and it just works.

2. There is no more usage of seastar::json in Alternator (except one
   minor use of seastar::json::formatter::to_json in streams.cc that
   can be removed later). Alternator uses RapidJSON for its JSON
   needs, we don't need to use random pieces from a different JSON
   library.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-07-14 18:41:34 +03:00
Benny Halevy
3feb759943 everywhere: use utils::chunked_vector for list of mutations
Currently, we use std::vector<*mutation> to keep
a list of mutations for processing.
This can lead to large allocation, e.g. when the vector
size is a function of the number of tables.

Use a chunked vector instead to prevent oversized allocations.

`perf-simple-query --smp 1` results obtained for fixed 400MHz frequency
and PGO disabled:

Before (read path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...

89055.97 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39417 insns/op,   18003 cycles/op,        0 errors)
103372.72 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39380 insns/op,   17300 cycles/op,        0 errors)
98942.27 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39413 insns/op,   17336 cycles/op,        0 errors)
103752.93 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39407 insns/op,   17252 cycles/op,        0 errors)
102516.77 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39403 insns/op,   17288 cycles/op,        0 errors)
throughput:
	mean=   99528.13 standard-deviation=6155.71
	median= 102516.77 median-absolute-deviation=3844.59
	maximum=103752.93 minimum=89055.97
instructions_per_op:
	mean=   39403.99 standard-deviation=14.25
	median= 39406.75 median-absolute-deviation=9.30
	maximum=39416.63 minimum=39380.39
cpu_cycles_per_op:
	mean=   17435.81 standard-deviation=318.24
	median= 17300.40 median-absolute-deviation=147.59
	maximum=18002.53 minimum=17251.75
```

After (read path)
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
59755.04 tps ( 66.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39466 insns/op,   22834 cycles/op,        0 errors)
71854.16 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39417 insns/op,   17883 cycles/op,        0 errors)
82149.45 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39411 insns/op,   17409 cycles/op,        0 errors)
49640.04 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.3 tasks/op,   39474 insns/op,   19975 cycles/op,        0 errors)
54963.22 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.3 tasks/op,   39474 insns/op,   18235 cycles/op,        0 errors)
throughput:
	mean=   63672.38 standard-deviation=13195.12
	median= 59755.04 median-absolute-deviation=8709.16
	maximum=82149.45 minimum=49640.04
instructions_per_op:
	mean=   39448.38 standard-deviation=31.60
	median= 39466.17 median-absolute-deviation=25.75
	maximum=39474.12 minimum=39411.42
cpu_cycles_per_op:
	mean=   19267.01 standard-deviation=2217.03
	median= 18234.80 median-absolute-deviation=1384.25
	maximum=22834.26 minimum=17408.67
```

`perf-simple-query --smp 1 --write` results obtained for fixed 400MHz frequency
and PGO disabled:

Before (write path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
63736.96 tps ( 59.4 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   49667 insns/op,   19924 cycles/op,        0 errors)
64109.41 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   49992 insns/op,   20084 cycles/op,        0 errors)
56950.47 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50005 insns/op,   20501 cycles/op,        0 errors)
44858.42 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50014 insns/op,   21947 cycles/op,        0 errors)
28592.87 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50027 insns/op,   27659 cycles/op,        0 errors)
throughput:
	mean=   51649.63 standard-deviation=15059.74
	median= 56950.47 median-absolute-deviation=12087.33
	maximum=64109.41 minimum=28592.87
instructions_per_op:
	mean=   49941.18 standard-deviation=153.76
	median= 50005.24 median-absolute-deviation=73.01
	maximum=50027.07 minimum=49667.05
cpu_cycles_per_op:
	mean=   22023.01 standard-deviation=3249.92
	median= 20500.74 median-absolute-deviation=1938.76
	maximum=27658.75 minimum=19924.32
```

After (write path)
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
53395.93 tps ( 59.4 allocs/op,  16.5 logallocs/op,  14.3 tasks/op,   50326 insns/op,   21252 cycles/op,        0 errors)
46527.83 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50704 insns/op,   21555 cycles/op,        0 errors)
55846.30 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50731 insns/op,   21060 cycles/op,        0 errors)
55669.30 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50735 insns/op,   21521 cycles/op,        0 errors)
52130.17 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50757 insns/op,   21334 cycles/op,        0 errors)
throughput:
	mean=   52713.91 standard-deviation=3795.38
	median= 53395.93 median-absolute-deviation=2955.40
	maximum=55846.30 minimum=46527.83
instructions_per_op:
	mean=   50650.57 standard-deviation=182.46
	median= 50731.38 median-absolute-deviation=84.09
	maximum=50756.62 minimum=50325.87
cpu_cycles_per_op:
	mean=   21344.42 standard-deviation=202.86
	median= 21334.00 median-absolute-deviation=176.37
	maximum=21554.61 minimum=21060.24
```

Fixes #24815

Improvement for rare corner cases. No backport required

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#24919
2025-07-13 19:13:11 +03:00
Piotr Szymaniak
de96c28625 alternator: Add support for TTL when using tablets
Support for TTL-based data removal when using tablets.
The essence of this commit is a separate code path for finding token
ranges owned by the current shard for the cases when tablets are used
and not vnodes. At the same time, the vnodes-case is not touched not to
cause any regressions.

The TTL-caused data removal is normally performed by the primary
replica (both when using vnodes and tablets). For the tablets case,
the already-existing method tablet_map::get_primary_replica(tablet_id)
is used to know if a shard execuring the TTL-related data removal is
the primary replica for each tablet.

A new method tablet_map::get_secondary_replica(tablet_id) has been
added. It is needed by the data invalidation procedure to remove data
when the primary replica node is down - the data is then removed by the
secondary replica node. The mechanism is the same as in the vnodes case.

Since alternator now supports TTL, the test
`test_ttl_enable_error_with_tablets` has been removed.
Also, tests in the test_ttl.py have been made to run twice, once with
vnodes and once with tablets. When run with tablets, the due to lack of
support for LWT with tablets (#18068), tests use
'system:write_isolation' of 'unsafe_rmw'. This approach allows early
regression testing with tablets and is meant only as a tentative
solution.

Fixes scylladb/scylladb#16567

Closes scylladb/scylladb#23662
2025-06-05 17:39:29 +03:00
Amnon Heiman
a474e95ef0 alternator: label metrics with basic_level and alternator
The following metrics will be marked with basic_level label:
scylla_alternator_operation
scylla_alternator_op_latency_bucket
scylla_alternator_op_latency_count
scylla_alternator_op_latency_sum
scylla_alternator_total_operations
scylla_alternator_batch_item_count
scylla_alternator_op_latency
scylla_alternator_op_latency_summary
scylla_expiration_items_deleted

All alternator metrics are marked with __alternator label.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:38 +02:00
Kefu Chai
9fdbe0e74b tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22997
2025-02-25 10:32:32 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Gleb Natapov
38c13975ca alternator: move ttl to work with host ids instead of ips 2024-12-15 11:31:11 +02:00
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.

in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Avi Kivity
820509026f schema: replace boost ranges with std ranges
To reduce dependency load, use std ranges instead of boost ranges.

The std::ranges::{lower,upper}_bound don't support heterogeneous lookup,
but a more natural solution is to use a projection to search for the name,
so we use that and the custom comparator is removed.

Many callers are converted as well due to poor interoperability between
boost ranges and std ranges.
2024-10-15 16:42:54 +03:00
Nadav Har'El
00793059e1 alternator: fix alternator_enforce_authorization=false
When the configuration has alternator_enforce_authorization=false,
Alternator should not do authentication (check which user signed each
request) nor authorization (check if that user has permissions to do
each operation).

Our implementation forgot to disable the authorization checks when
it's configured to false. The (incorrect) assumption was that when
alternator_enforce_authorization is configured to false, the CQL
'authenticator' and 'authorizer' configuration is also disabled -
so the authorization checks will be no-ops. But we can't assume
that: Users are free to configure 'authenticator' and 'authorizer'
for use in CQL, and then set alternator_enforce_authorization=false
just for Alternator.

So this patch adds a new test for this case - when we have
authenticator=PasswordAuthenticator, authorizer=CassandraAuthorizer
but alternator_enforce_authorization=false, and fixes it to work
correctly.

The heart of the fix is trivial: the `verify_*_permission()` functions
just need to check the alternator_enforce_authorization and return
immediately when false. The bigger part of this change is to get the
alternator_enforce_authorization into the "executor" object and then
to pass it into the verify calls.

Although alternator_enforce_authorization is not YET live updatable,
this code is prepared for the future that it may become live
updatable, so the executor object saves not the boolean value of
this flag, but a live-updatable reference to it.

Fixes #20619

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-17 15:50:00 +03:00
Nadav Har'El
15f8046fcb alternator ttl: fix use-after-free
The Alternator TTL scanning code uses an object "scan_ranges_context"
to hold the scanning context. One of the members of this object is
a service::query_state, and that in turn holds a reference to a
service::client_state. The existing constructor created a temporary
client_state object and saved a reference to it - which can result
in use after free as the temporary object is freed as soon as the
constructor ends.

The fix is to save a client_state in the scan_ranges_context object,
instead of a temporary object.

Fixes #19988

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#20418
2024-09-03 22:15:18 +03:00
Nadav Har'El
dd030f8112 alternator: improve RBAC access denied error messages
This patch address two requests made by reviewers of the original "Add
CQL-based RBAC support to Alternator" series. Both requests were about
the error messages produced when access is denied:

1. The error message is improved to use more proper English, and also
   to include the name of the role which was denied access.

2. The permission-check and error-message-formatting code is
   de-duplicated, using a common function verify_permission().

   This de-duplication required moving the access-denied error path to
   throwing an exception instead of the previous exception-free
   implementation. However, it can be argued that this change is actually
   a good thing, because it makes the successful case, when access is
   allowed, faster.

   The de-duplicated code is shorter and simpler, and allowed changing
   the text of the error message in just one place.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#20326
2024-09-03 14:39:30 +03:00
Benny Halevy
686a8f2939 abstract_replication_strategy: make get_ranges async
To prevent stalls due to large number of tokens.
For example, large cluster with say 70 nodes can have
more than 16K tokens.

Fixes #19757

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:57:34 +03:00
Benny Halevy
824bdf99d2 alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder
Add static `make` methods to ranges_holder_{primary,secondary}
and use them to make the ranges objects and pass them
to `token_ranges_owned_by_this_shard`, rather than letting
token_ranges_owned_by_this_shard invoke the right constructor
of the ranges_holder class.

Prepare for making `make` async.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Benny Halevy
b2abbae24b alternator: ttl: can pass const gms::gossiper& to ranges_holder
There's no need to pass a mutable reference to
the gossiper.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Benny Halevy
333c0d7c88 alternator: ttl: ranges_holder_primary: unconstify _token_ranges member
To allow the class to be nothrow_move_constructable.
Prepare for returning it as a future value.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Benny Halevy
d385219a12 alternator: ttl: refactor token_ranges_owned_by_this_shard
Rather than holding a variant member (and defining
both ranges_holder_{primary,secondary} in both
specilizations of the class, just make the internal
ranges_holder class first-class citizens
and parameterize the `token_ranges_owned_by_this_shard`
template by this class type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Nadav Har'El
9417cf8bcf alternator: add RBAC enforcement to UpdateTimeToLive
This patch adds a requirement for the "ALTER" permission on a table to
run a UpdateTimeToLive on it. UpdateTimeToLive is similar in purpose to
UpdateTable, so it makes sense to use the same permission "ALTER" as we
do for UpdateTable.

A tests is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Tomasz Grabiec
dd4a086b87 selective_token_sharder: Avoid use of deprecated sharder API
I analyzed all the uses and all except the alternator/ttl.cc seem to
be interested in the result for the purpose of reading.

Alternator is not supported with tablets yet, so the use was annotated
with a relevant issue.
2024-05-16 00:28:47 +02:00
Nadav Har'El
28db187756 alternator, tablets: return error if enabling TTL with tablets
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.

So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.

This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.

Refs #16567
Refs #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17306
2024-02-15 10:47:06 +02:00
Patryk Wrobel
a3fb44cbca Rename keyspace::get_effective_replication_map()
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.

When tablets are enabled, then this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().

Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17314
2024-02-13 20:22:02 +02:00
Kefu Chai
a0e5c14c55 alternator: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16736
2024-01-12 10:53:32 +02:00
Benny Halevy
03fe674314 alternator: ttl: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Tomasz Grabiec
ab94e74774 alternator: Use table sharder
schema::get_sharder() does not return the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
2023-06-21 00:58:24 +02:00
Avi Kivity
42a1ced73b cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.

Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.

The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
2023-05-07 17:17:36 +03:00
Tomasz Grabiec
d3c9ad4ed6 locator: Rename effective_replication_map to vnode_effective_replication_map
In preparation for introducing a more abstract
effective_replication_map which can describe replication maps which
are not based on vnodes.
2023-04-24 10:49:36 +02:00
Nadav Har'El
c196bd78de alternator: isolate concurrent modification to tags
Alternator modifies tags in three operations - TagResource, UntagResource
and UpdateTimeToLive (the latter uses a tag to store the TTL configuration).

All three operations were implemented by three separate steps:

 1. Read the current tags.
 2. Modify the tags according to the desired operation.
 3. Write the modified tags back with update_tags().

This implementation was not safe for concurrent operations - some
modifications may be be lost. We fix this in this patch by using the new
modify_tags() function introduced in the previous patch, which performs
all three steps under one lock so the tag operations are serialized and
correctly isolated.

Fixes #6389

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-03-13 12:25:03 +02:00
Kefu Chai
df63e2ba27 types: move types.{cc,hh} into types
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.

the building systems are updated accordingly.

the source files referencing `types.hh` were updated using following
command:

```
find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```

the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instea, so it's more explicit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12926
2023-02-19 21:05:45 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Nadav Har'El
5eda8ce4fd alternator ttl: in scanning thread, don't retry the same page too many times
Since fixing issue #11737, when the expiration scanner times out reading
a page of data, it retries asking for the same page instead of giving up
on the scan and starting anew later. This retry was infinite - which can
cause problems if we have a bug in the code or several nodes down, which
can lead to getting hung in the same place in the scan for a very long
(potentially infinite) time without making any progress.

An example of such a bug was issue #12145, where we forgot to handle
shutdowns, so on shutdown of the cluster we just hung forever repeating
the same request that will never succeed. It's better in this case to
just give up on the current scan, and start it anew (from a random
position) later.

Refs #12145 (that issue was already fixed, by a different patch which
stops the iteration when shutting down - not waiting for an infinite
number of iterations and not even one more).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-11-30 18:42:37 +02:00
Nadav Har'El
d08eef5a30 alternator: fix hang during shutdown of expiration-scanning thread
The expiration-scanning thread is a long-running thread which can scan
data for hours, but checks for its abort-source before fetching each
page to allow for timely shutdown. Recently, we added the ability to
retry the page fetching in case of timeout, for forgot to check the
abort source in this new retry loop - which lead to an infinitely-long
shutdown in some tests while the retry loop retries forever.

In this patch we fix this bug by using sleep_abortable() instead of
sleep(). sleep_abortable() will throw an exception if the abort source
was triggered before or during the sleep - and this exception will
stop the scan immediately.

Fixes #12145

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-11-30 18:38:17 +02:00
Nadav Har'El
1e59c3f9ef alternator: if TTL scan times out, continue immediately
The Alternator TTL expiration scanner scans an entire table using many
small pages. If any of those pages time out for some reason (e.g., an
overload situation), we currently consider the entire scan to have failed
and wait for the next scan period (which by default is 24 hours) when
we start the scan from scratch (at a random position). There is a risk
that if these timeouts are common enough to occur once or more per
scan, the result is that we double or more the effective expiration lag.

A better solution, done in this patch, is to retry from the same position
if a single page timed out - immediately (or almost immediately, we add
a one-second sleep).

Fixes #11737

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12092
2022-11-28 11:30:00 +02:00
Avi Kivity
f0643d1713 alternator: ttl: do not copy mutation while constructing a vector
The vector(initializer_list<T>) constructor copies the T since
initializer_list is read-only. Move the mutation instead.

This happens to fix a use-after-return on clang 15 on aarch64.
I'm fairly sure that's a miscompile, but the fix is worthwhile
regardless.

Closes #11818
2022-10-21 10:04:00 +03:00
Piotr Sarna
481240b8b4 Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El
We had quite a few tests for Alternator TTL in test/alternator, but most
of them did not run as part of the usual Jenkins test suite, because
they were considered "very slow" (and require a special "--runveryslow"
flag to run).

In this series we enable six tests which run quickly enough to run by
default, without an additional flag. We also make them even quicker -
the six tests now take around 2.5 seconds.

I also noticed that we don't have a test for the Alternator TTL metrics
- and added one.

Fixes #11374.
Refs https://github.com/scylladb/scylla-monitoring/issues/1783

Closes #11384

* github.com:scylladb/scylladb:
  test/alternator: insert test names into Scylla logs
  rest api: add a new /system/log operation
  alternator ttl: log warning if scan took too long.
  alternator,ttl: allow sub-second TTL scanning period, for tests
  test/alternator: skip fewer Alternator TTL tests
  test/alternator: test Alternator TTL metrics
2022-09-22 09:47:50 +02:00
Nadav Har'El
33e6a88d9a alternator ttl: comment fixes
This patch fixes a few errors and out-of-date descriptions in comments
in alternator/ttl.cc. No functional changes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-15 00:03:43 +03:00
Nadav Har'El
b9792ffb06 alternator ttl: log warning if scan took too long.
Currently, we log at "info" level how much time remained at the end of
a full TTL scan until the next scanning period (we sleep for that time).
If the scan was slower than the period, we didn't print anything.

Let's print a warning in this case - it can be useful for debugging,
and also users should know when their desired scan period is not being
honored because the full scan is taking longer than the desired scan
period.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-12 10:32:56 +03:00
Nadav Har'El
e7e9adc519 alternator,ttl: allow sub-second TTL scanning period, for tests
Alternator has the "alternator_ttl_period_in_seconds" parameter for
controlling how often the expiration thread looks for expired items to
delete. It is usually a very large number of seconds, but for tests
to finish quickly, we set it to 1 second.
With 1 second expiration latency, test/alternator/test_ttl.py took 5
seconds to run.

In this patch, we change the parameter to allow a  floating-point number
of seconds instead of just an integer. Then, this allows us to halve the
TTL period used by tests to 0.5 seconds, and as a result, the run time of
test_ttl.py halves to 2.5 seconds. I think this is fast enough for now.

I verified that even if I change the period to 0.1, there is no noticable
slowdown to other Alternator tests, so 0.5 is definitely safe.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-12 10:32:56 +03:00
Botond Dénes
d1d53f1b84 query: add tombstone-limit to read-command
Propagate the tombstone-limit from coordinator to replicas, to make sure
all is using the same limit.
2022-08-10 06:01:47 +03:00
Kamil Braun
ab946e392f alternator: ttl: pass gossiper& to expiration_service
This allows us to remove the `gossiper()` getter from `storage_proxy`.
2022-08-04 12:12:43 +02:00
Michał Sala
041cb77ad0 alternator, db: move the tag code to db/tags
Tags are a useful mechanism that could be used outside of alternator
namespace. My motivation to move tags_extension and other utilities to
db/tags/ was that I wanted to use them to mark "synchronous mode" views.

I have extracted `get_tags_of_table`, `find_tag` and `update_tags`
method to db/tags/utils.cc and moved alternator/tags_extension.hh to
db/tags/.

The signature of `get_tags_of_table` was changed from `const
std::map<sstring, sstring>&` to `const std::map<sstring, sstring>*`
Original behavior of this function was to throw an
`alternator::api_error` exception. This was undesirable, as it
introduced a dependency on the alternator module. I chose to change it
to return a potentially null value, and added a wrapper function to the
alternator module - `get_tags_of_table_or_throw` to keep the previous
throwing behavior.
2022-07-25 09:53:33 +02:00
Piotr Dulikowski
e6beab3106 storage_proxy: add allow rate limit flag to mutate/mutate_result
Now, mutate/mutate_result accept a flag which decides whether the write
should be rate limited or not.

The new parameter is mandatory and all call sites were updated.
2022-06-22 20:16:49 +02:00
Nadav Har'El
8ecf1e306f alternator: allow DescribeTimeToLive even without TTL enabled
We still consider the TTL support in Alternator to be experimental, so we
don't want to allow a user to enable TTL on a table without turning on a
"--experimental-features" flag. However, there is no reason not to allow
the DescribeTimeToLive call when this experimental flag is off - this call
would simply reply with the truth - that the TTL feature is disabled for
the table!

This is important for client code (such as the Terraform module
described in issue #10660) which uses DescribeTimeToLive for
information, even when it never intends to actually enable TTL.

The patch is trivial - we simply remove the flag check in
DescribeTimeToLive, the code works just as before.

After this patch, the following test now works on Scylla without
experimental flags turned on:

    test/alternator/run test_ttl.py::test_describe_ttl_without_ttl

Refs #10660

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-05-26 10:55:36 +03:00