Some callers, like `construct_range_to_endpoint_map` for describe_ring,
or `get_secondary_ranges` for alternator ttl pass vnode tokens (the
vnodes' start token), and therefore can benefit from the fast lookup
path in `vnode_effective_replication_map::do_get_replicas`.
Otherwise the vnode token is binary-searched in sorted_tokens using
token_metadata::first_token().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.
However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.
Note that everywhere_replication_strategy is not abstracted in a similar
way, although it could, since the plan is to get rid of it
once all system keyspaces areconverted to local or tablets replication
(and propagated everywhere if needed using raft group0)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
to static_effective_replication_map_ptr, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
to get_static_effective_replication_map, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map (both are per-keyspace).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prepare for following patch that will separate
the local effective replication map from
vnode_effective_replication_map.
The caller is responsible to keep the
effective_replication_map_ptr alive while
in use by low-level async functions.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than calling std::views::transform with a lambda that extracts
a member from a class, call std::views::transform with a pointer-to-member
to do the same thing. This results in more concise code.
Closesscylladb/scylladb#25012
The previous patch introduced a function make_streamed_with_extra_array
which was a duplicate of the existing make_streamed. Reviewers
complained how baroque the new function is (just like the old function),
having to jump through hoops to return a copyable function working
on non-copyable objects, making strange-named copies and shared pointers
of everything.
We needed to return a copyable function (std::function) just because
Alternator used Seastar's json::json_return_type in the return type
from executor function (request_return_type). This json_return_type
contained either a sstring or an std::function, but neither was ever
really appropriate:
1. We want to return noncopyable_function, not an std::function!
2. We want to return an std::string (which rjson::print()) returns,
not an sstring!
So in this patch we stop using seastar::json::json_return_type
entirely in Alternator.
Alternator's request_return_type is now an std::variant of *three* types:
1. std::string for short responses,
2. noncopyable_function for long streamed response
3. api_error for errors.
The ugliest parts of make_streamed() where we made copies and shared
pointers to allow for a copyable function are all gone. Even nicer, a
lot of other ugly relics of using seastar::json_return_type are gone:
1. We no longer need obscure classes and functions like make_jsonable()
and json_string() to convert strings to response bodies - an operation
can simply return a string directly - usually returning
rjson::print(value) or a fixed string like "" and it just works.
2. There is no more usage of seastar::json in Alternator (except one
minor use of seastar::json::formatter::to_json in streams.cc that
can be removed later). Alternator uses RapidJSON for its JSON
needs, we don't need to use random pieces from a different JSON
library.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Support for TTL-based data removal when using tablets.
The essence of this commit is a separate code path for finding token
ranges owned by the current shard for the cases when tablets are used
and not vnodes. At the same time, the vnodes-case is not touched not to
cause any regressions.
The TTL-caused data removal is normally performed by the primary
replica (both when using vnodes and tablets). For the tablets case,
the already-existing method tablet_map::get_primary_replica(tablet_id)
is used to know if a shard execuring the TTL-related data removal is
the primary replica for each tablet.
A new method tablet_map::get_secondary_replica(tablet_id) has been
added. It is needed by the data invalidation procedure to remove data
when the primary replica node is down - the data is then removed by the
secondary replica node. The mechanism is the same as in the vnodes case.
Since alternator now supports TTL, the test
`test_ttl_enable_error_with_tablets` has been removed.
Also, tests in the test_ttl.py have been made to run twice, once with
vnodes and once with tablets. When run with tablets, the due to lack of
support for LWT with tablets (#18068), tests use
'system:write_isolation' of 'unsafe_rmw'. This approach allows early
regression testing with tablets and is meant only as a tentative
solution.
Fixesscylladb/scylladb#16567Closesscylladb/scylladb#23662
The following metrics will be marked with basic_level label:
scylla_alternator_operation
scylla_alternator_op_latency_bucket
scylla_alternator_op_latency_count
scylla_alternator_op_latency_sum
scylla_alternator_total_operations
scylla_alternator_batch_item_count
scylla_alternator_op_latency
scylla_alternator_op_latency_summary
scylla_expiration_items_deleted
All alternator metrics are marked with __alternator label.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This commit eliminates unused boost header includes from the tree.
Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22997
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.
in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
To reduce dependency load, use std ranges instead of boost ranges.
The std::ranges::{lower,upper}_bound don't support heterogeneous lookup,
but a more natural solution is to use a projection to search for the name,
so we use that and the custom comparator is removed.
Many callers are converted as well due to poor interoperability between
boost ranges and std ranges.
When the configuration has alternator_enforce_authorization=false,
Alternator should not do authentication (check which user signed each
request) nor authorization (check if that user has permissions to do
each operation).
Our implementation forgot to disable the authorization checks when
it's configured to false. The (incorrect) assumption was that when
alternator_enforce_authorization is configured to false, the CQL
'authenticator' and 'authorizer' configuration is also disabled -
so the authorization checks will be no-ops. But we can't assume
that: Users are free to configure 'authenticator' and 'authorizer'
for use in CQL, and then set alternator_enforce_authorization=false
just for Alternator.
So this patch adds a new test for this case - when we have
authenticator=PasswordAuthenticator, authorizer=CassandraAuthorizer
but alternator_enforce_authorization=false, and fixes it to work
correctly.
The heart of the fix is trivial: the `verify_*_permission()` functions
just need to check the alternator_enforce_authorization and return
immediately when false. The bigger part of this change is to get the
alternator_enforce_authorization into the "executor" object and then
to pass it into the verify calls.
Although alternator_enforce_authorization is not YET live updatable,
this code is prepared for the future that it may become live
updatable, so the executor object saves not the boolean value of
this flag, but a live-updatable reference to it.
Fixes#20619
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The Alternator TTL scanning code uses an object "scan_ranges_context"
to hold the scanning context. One of the members of this object is
a service::query_state, and that in turn holds a reference to a
service::client_state. The existing constructor created a temporary
client_state object and saved a reference to it - which can result
in use after free as the temporary object is freed as soon as the
constructor ends.
The fix is to save a client_state in the scan_ranges_context object,
instead of a temporary object.
Fixes#19988
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#20418
This patch address two requests made by reviewers of the original "Add
CQL-based RBAC support to Alternator" series. Both requests were about
the error messages produced when access is denied:
1. The error message is improved to use more proper English, and also
to include the name of the role which was denied access.
2. The permission-check and error-message-formatting code is
de-duplicated, using a common function verify_permission().
This de-duplication required moving the access-denied error path to
throwing an exception instead of the previous exception-free
implementation. However, it can be argued that this change is actually
a good thing, because it makes the successful case, when access is
allowed, faster.
The de-duplicated code is shorter and simpler, and allowed changing
the text of the error message in just one place.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#20326
To prevent stalls due to large number of tokens.
For example, large cluster with say 70 nodes can have
more than 16K tokens.
Fixes#19757
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add static `make` methods to ranges_holder_{primary,secondary}
and use them to make the ranges objects and pass them
to `token_ranges_owned_by_this_shard`, rather than letting
token_ranges_owned_by_this_shard invoke the right constructor
of the ranges_holder class.
Prepare for making `make` async.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than holding a variant member (and defining
both ranges_holder_{primary,secondary} in both
specilizations of the class, just make the internal
ranges_holder class first-class citizens
and parameterize the `token_ranges_owned_by_this_shard`
template by this class type.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This patch adds a requirement for the "ALTER" permission on a table to
run a UpdateTimeToLive on it. UpdateTimeToLive is similar in purpose to
UpdateTable, so it makes sense to use the same permission "ALTER" as we
do for UpdateTable.
A tests is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
I analyzed all the uses and all except the alternator/ttl.cc seem to
be interested in the result for the purpose of reading.
Alternator is not supported with tablets yet, so the use was annotated
with a relevant issue.
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.
So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.
This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.
Refs #16567
Refs #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17306
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.
When tablets are enabled, then this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().
Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17314
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
schema::get_sharder() does not return the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.
Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.
The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
Alternator modifies tags in three operations - TagResource, UntagResource
and UpdateTimeToLive (the latter uses a tag to store the TTL configuration).
All three operations were implemented by three separate steps:
1. Read the current tags.
2. Modify the tags according to the desired operation.
3. Write the modified tags back with update_tags().
This implementation was not safe for concurrent operations - some
modifications may be be lost. We fix this in this patch by using the new
modify_tags() function introduced in the previous patch, which performs
all three steps under one lock so the tag operations are serialized and
correctly isolated.
Fixes#6389
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.
the building systems are updated accordingly.
the source files referencing `types.hh` were updated using following
command:
```
find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```
the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instea, so it's more explicit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12926
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes#12788
Since fixing issue #11737, when the expiration scanner times out reading
a page of data, it retries asking for the same page instead of giving up
on the scan and starting anew later. This retry was infinite - which can
cause problems if we have a bug in the code or several nodes down, which
can lead to getting hung in the same place in the scan for a very long
(potentially infinite) time without making any progress.
An example of such a bug was issue #12145, where we forgot to handle
shutdowns, so on shutdown of the cluster we just hung forever repeating
the same request that will never succeed. It's better in this case to
just give up on the current scan, and start it anew (from a random
position) later.
Refs #12145 (that issue was already fixed, by a different patch which
stops the iteration when shutting down - not waiting for an infinite
number of iterations and not even one more).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The expiration-scanning thread is a long-running thread which can scan
data for hours, but checks for its abort-source before fetching each
page to allow for timely shutdown. Recently, we added the ability to
retry the page fetching in case of timeout, for forgot to check the
abort source in this new retry loop - which lead to an infinitely-long
shutdown in some tests while the retry loop retries forever.
In this patch we fix this bug by using sleep_abortable() instead of
sleep(). sleep_abortable() will throw an exception if the abort source
was triggered before or during the sleep - and this exception will
stop the scan immediately.
Fixes#12145
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The Alternator TTL expiration scanner scans an entire table using many
small pages. If any of those pages time out for some reason (e.g., an
overload situation), we currently consider the entire scan to have failed
and wait for the next scan period (which by default is 24 hours) when
we start the scan from scratch (at a random position). There is a risk
that if these timeouts are common enough to occur once or more per
scan, the result is that we double or more the effective expiration lag.
A better solution, done in this patch, is to retry from the same position
if a single page timed out - immediately (or almost immediately, we add
a one-second sleep).
Fixes#11737
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12092
The vector(initializer_list<T>) constructor copies the T since
initializer_list is read-only. Move the mutation instead.
This happens to fix a use-after-return on clang 15 on aarch64.
I'm fairly sure that's a miscompile, but the fix is worthwhile
regardless.
Closes#11818
We had quite a few tests for Alternator TTL in test/alternator, but most
of them did not run as part of the usual Jenkins test suite, because
they were considered "very slow" (and require a special "--runveryslow"
flag to run).
In this series we enable six tests which run quickly enough to run by
default, without an additional flag. We also make them even quicker -
the six tests now take around 2.5 seconds.
I also noticed that we don't have a test for the Alternator TTL metrics
- and added one.
Fixes#11374.
Refs https://github.com/scylladb/scylla-monitoring/issues/1783Closes#11384
* github.com:scylladb/scylladb:
test/alternator: insert test names into Scylla logs
rest api: add a new /system/log operation
alternator ttl: log warning if scan took too long.
alternator,ttl: allow sub-second TTL scanning period, for tests
test/alternator: skip fewer Alternator TTL tests
test/alternator: test Alternator TTL metrics
This patch fixes a few errors and out-of-date descriptions in comments
in alternator/ttl.cc. No functional changes.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Currently, we log at "info" level how much time remained at the end of
a full TTL scan until the next scanning period (we sleep for that time).
If the scan was slower than the period, we didn't print anything.
Let's print a warning in this case - it can be useful for debugging,
and also users should know when their desired scan period is not being
honored because the full scan is taking longer than the desired scan
period.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Alternator has the "alternator_ttl_period_in_seconds" parameter for
controlling how often the expiration thread looks for expired items to
delete. It is usually a very large number of seconds, but for tests
to finish quickly, we set it to 1 second.
With 1 second expiration latency, test/alternator/test_ttl.py took 5
seconds to run.
In this patch, we change the parameter to allow a floating-point number
of seconds instead of just an integer. Then, this allows us to halve the
TTL period used by tests to 0.5 seconds, and as a result, the run time of
test_ttl.py halves to 2.5 seconds. I think this is fast enough for now.
I verified that even if I change the period to 0.1, there is no noticable
slowdown to other Alternator tests, so 0.5 is definitely safe.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Tags are a useful mechanism that could be used outside of alternator
namespace. My motivation to move tags_extension and other utilities to
db/tags/ was that I wanted to use them to mark "synchronous mode" views.
I have extracted `get_tags_of_table`, `find_tag` and `update_tags`
method to db/tags/utils.cc and moved alternator/tags_extension.hh to
db/tags/.
The signature of `get_tags_of_table` was changed from `const
std::map<sstring, sstring>&` to `const std::map<sstring, sstring>*`
Original behavior of this function was to throw an
`alternator::api_error` exception. This was undesirable, as it
introduced a dependency on the alternator module. I chose to change it
to return a potentially null value, and added a wrapper function to the
alternator module - `get_tags_of_table_or_throw` to keep the previous
throwing behavior.
Now, mutate/mutate_result accept a flag which decides whether the write
should be rate limited or not.
The new parameter is mandatory and all call sites were updated.
We still consider the TTL support in Alternator to be experimental, so we
don't want to allow a user to enable TTL on a table without turning on a
"--experimental-features" flag. However, there is no reason not to allow
the DescribeTimeToLive call when this experimental flag is off - this call
would simply reply with the truth - that the TTL feature is disabled for
the table!
This is important for client code (such as the Terraform module
described in issue #10660) which uses DescribeTimeToLive for
information, even when it never intends to actually enable TTL.
The patch is trivial - we simply remove the flag check in
DescribeTimeToLive, the code works just as before.
After this patch, the following test now works on Scylla without
experimental flags turned on:
test/alternator/run test_ttl.py::test_describe_ttl_without_ttl
Refs #10660
Signed-off-by: Nadav Har'El <nyh@scylladb.com>