Commit Graph

748 Commits

Author SHA1 Message Date
Nadav Har'El
af39675c38 alternator: fix "/localnodes" to not return nodes still joining
Alternator's "/localnodes" HTTP request is supposed to return the list of
nodes in the local DC to which the user can send requests.

The existing implementation incorrectly used gossiper::is_alive() to check
for which nodes to return - but "alive" nodes include nodes which are still
joining the cluster and not really usable. These nodes can remain in the
JOINING state for a long time while they are copying data, and an attempt
to send requests to them will fail.

The fix for this bug is trivial: change the call to is_alive() to a call
to is_normal().

But the hard part of this test is the testing:

1. An existing multi-node test for "/localnodes" assummed that right after
   a new node was created, it appears on "/localnodes". But after this
   patch, it may take a bit more time for the bootstrapping to complete
   and the new node to appear in /localnodes - so I had to add a retry loop.

2. I added a test that reproduces the bug fixed here, and verifies its
   fix. The test is in the multi-node topology framework. It adds an
   injection which delays the bootstrap, which leaves a new node in JOINING
   state for a long time. The test then verifies that the new node is
   alive (as checked by the REST API), but is not returned by "/localnodes".

3. The new injection for delaying the bootstrap is unfortunately not
   very pretty - I had to do it in three places because we have several
   code paths of how bootstrap works without repair, with repair, without
   Raft and with Raft - and I wanted to delay all of them.

Fixes #19694.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19725

(cherry picked from commit bac7c33313)
(deleted test for cherry-pick)
2024-07-24 11:29:36 +03:00
Pavel Emelyanov
62a23fd86a config: Remove experimental TABLETS feature
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.

The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 83d491af02)

Closes scylladb/scylladb#19012
2024-06-03 12:16:41 +03:00
Pavel Emelyanov
b24fb8dc87 inet_address: Remove to_sstring() in favor of fmt::to_string
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is declared in .cc file, while
form formatter is in the header and is equipeed with constexprs so
that converting an address to string is done as much as possible
compile-time.

Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18712
2024-05-21 09:43:08 +03:00
Avi Kivity
52fe351c31 Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec
This is needed to avoid severe imbalance between shards which can
happen when some table grows and is split. The inter-node balance can
be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N
then there is not even a possibility of moving tablets around to fix the imbalance.
The only way to bring the system to balance is to move tablets within the nodes.

The system is not prepared for intra-node migration currently. Request coordination
is host-based, while for intra-node migration it should be (also) shard-based.
The solution employed here is to keep the coordination between nodes as-is,
and for intra-node migration storage_proxy-level coordinator is not aware of
the migration (no pending host). The replica-side request handler will be a
second-level coordinator which routes requests to shards, similar to how
the first-level coordinator routes them to hosts.

Tablet sharder is adjusted to handle intra-migration where a tablet
can have two replicas on the same host. For reads, sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.

The old shard_of() API is kept to represent shard for reads, and new
method is introduced to query the shards for writing:
shard_for_writes(). All writers should be switched to that API, which
is not done in this patch yet.

The request handler on replica side acts as a second-level
coordinator, using sharder to determine routing to shards. A given
sharder has a scope of a single topology version, a single
effective_replication_map_ptr, which should be kept alive during
writes.

perf-simple-query test results show no signs of regression:

Command: perf-simple-query -c1 -m1G --write --tablets --duration=10

Before:

> 83294.81 tps ( 59.5 allocs/op,  14.3 tasks/op,   53725 insns/op,        0 errors)
> 87756.72 tps ( 59.5 allocs/op,  14.3 tasks/op,   54049 insns/op,        0 errors)
> 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> 86211.38 tps ( 59.7 allocs/op,  14.3 tasks/op,   54219 insns/op,        0 errors)
> 86559.89 tps ( 59.6 allocs/op,  14.3 tasks/op,   54188 insns/op,        0 errors)
> 86609.39 tps ( 59.6 allocs/op,  14.3 tasks/op,   54117 insns/op,        0 errors)
> 87464.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   54039 insns/op,        0 errors)
> 86185.43 tps ( 59.6 allocs/op,  14.3 tasks/op,   54169 insns/op,        0 errors)
> 86254.71 tps ( 59.6 allocs/op,  14.3 tasks/op,   54139 insns/op,        0 errors)
> 83395.35 tps ( 60.2 allocs/op,  14.4 tasks/op,   54693 insns/op,        0 errors)
>
> median 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> median absolute deviation: 243.04
> maximum: 87756.72
> minimum: 83294.81
>

After:

> 85523.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   53872 insns/op,        0 errors)
> 89362.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54226 insns/op,        0 errors)
> 88167.55 tps ( 59.7 allocs/op,  14.3 tasks/op,   54400 insns/op,        0 errors)
> 87044.40 tps ( 59.7 allocs/op,  14.3 tasks/op,   54310 insns/op,        0 errors)
> 88344.50 tps ( 59.6 allocs/op,  14.3 tasks/op,   54289 insns/op,        0 errors)
> 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> 88725.46 tps ( 59.6 allocs/op,  14.3 tasks/op,   54230 insns/op,        0 errors)
> 88640.08 tps ( 59.6 allocs/op,  14.3 tasks/op,   54210 insns/op,        0 errors)
> 90306.31 tps ( 59.4 allocs/op,  14.3 tasks/op,   54043 insns/op,        0 errors)
> 87343.62 tps ( 59.8 allocs/op,  14.3 tasks/op,   54496 insns/op,        0 errors)
>
> median 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> median absolute deviation: 1007.41
> maximum: 90306.31
> minimum: 85523.06

Command (reads): perf-simple-query -c1 -m1G  --tablets --duration=10

Before:

> 95860.18 tps ( 63.1 allocs/op,  14.1 tasks/op,   42476 insns/op,        0 errors)
> 97537.69 tps ( 63.1 allocs/op,  14.1 tasks/op,   42454 insns/op,        0 errors)
> 97549.23 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97511.29 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97227.32 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 94031.94 tps ( 63.1 allocs/op,  14.1 tasks/op,   42441 insns/op,        0 errors)
> 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> 96401.70 tps ( 63.1 allocs/op,  14.1 tasks/op,   42473 insns/op,        0 errors)
> 96573.77 tps ( 63.1 allocs/op,  14.1 tasks/op,   42440 insns/op,        0 errors)
> 96340.54 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
>
> median 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> median absolute deviation: 571.20
> maximum: 97549.23
> minimum: 94031.94
>

After:

> 99794.67 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 101244.99 tps ( 63.1 allocs/op,  14.1 tasks/op,   42472 insns/op,        0 errors)
> 101128.37 tps ( 63.1 allocs/op,  14.1 tasks/op,   42485 insns/op,        0 errors)
> 101065.27 tps ( 63.1 allocs/op,  14.1 tasks/op,   42465 insns/op,        0 errors)
> 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> 101413.31 tps ( 63.1 allocs/op,  14.1 tasks/op,   42463 insns/op,        0 errors)
> 101464.92 tps ( 63.1 allocs/op,  14.1 tasks/op,   42466 insns/op,        0 errors)
> 101086.74 tps ( 63.1 allocs/op,  14.1 tasks/op,   42488 insns/op,        0 errors)
> 101559.09 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
> 100742.58 tps ( 63.1 allocs/op,  14.1 tasks/op,   42491 insns/op,        0 errors)
>
> median 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> median absolute deviation: 200.33
> maximum: 101559.09
> minimum: 99794.67
>

Fixes #16594

Closes scylladb/scylladb#18026

* github.com:scylladb/scylladb:
  Implement fast streaming for intra-node migration
  test: tablets_test: Test sharding during intra-node migration
  test: tablets_test: Check sharding also on the pending host
  test: py: tablets: Test writes concurrent with migration
  test: py: tablets: Test crash during intra-node migration
  api, storage_service: Introduce API to wait for topology to quiesce
  dht, replica: Remove deprecated sharder APIs
  test: Avoid using deprecated sharded API
  db: do_apply_many() avoid deprecated sharded API
  replica: mutation_dump: Avoid deprecated sharder API
  repair: Avoid deprecated sharder API
  table: Remove optimization which returns empty reader when key is not owned by the shard
  dht: is_single_shard: Avoid deprecated sharder API
  dht: split_range_to_single_shard: Work with static_sharder only
  dht: ring_position_range_sharder: Avoid deprecated sharder APIs
  dht: token: Avoid use of deprecated sharder API by switching to static_sharder
  selective_token_sharder: Avoid use of deprecated sharder API
  docs: Document tablet sharding vs tablet replica placement
  readers/multishard.cc: use shard_for_reads() instead of shard_of()
  multishard_mutation_query.cc: use shard_for_reads() instead of shard_of()
  storage_proxy: Extract common code to apply mutations on many shards according to sharder
  storage_proxy: Prepare per-partition rate-limiting for intra-node migration
  storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate()
  storage_proxy: Prepare mutate_hint() for intra-node tablet migration
  commitlog_replayer: Avoid deprecated sharder::shard_of()
  lwt: Avoid deprecated sharder::shard_of()
  compaction: Avoid deprecated sharder::shard_of()
  dht: Extract dht::static_sharder
  replica: Deprecate table::shard_of()
  locator: Deprecate effective_replication_map::shard_of()
  dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard
  tests: tablets: py: Add intra-node migration test
  tests: tablets: Test that drained nodes are not balanced internally
  tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load
  tests: tablets: Verify that disabling balancing results in no intra-node migrations
  tests: tablets: Check that nodes are internally balanced
  tests: tablets: Improve debuggability by showing which rows are missing
  tablets, storage_service: Support intra-node migration in move_tablet() API
  tablet_allocator: Generate intra-node migration plan
  tablet_allocator: Extract make_internode_plan()
  tablet_allocator: Maintain candidate list and shard tablet count for target nodes
  tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions
  tablets, streaming: Implement tablet streaming for intra-node migration
  dht, auto_refreshing_sharder: Allow overriding write selector
  multishard_writer: Handle intra-node migration
  storage_proxy: Handle intra-node tablet migration for writes
  tablets: Get rid of tablet_map::get_shard()
  tablets: Avoid tablet_map::get_shard in cleanup
  tablets: test: Use sharder instead of tablet_map::get_shard()
  tablets: tablet_sharder: Allow working with non-local host
  sharding: Prepare for intra-node-migration
  docs: Document sharder use for tablets
  tablets: Introduce tablet transition kind for intra-node migration
  tests: tablets: Fix use-after-move of skiplist in rebalance_tablets()
  sstables, gdb: Track readers in a linked list
  raft topology: Fix global token metadata barrier to not fence ahead of what is drained
2024-05-20 16:13:01 +03:00
Avi Kivity
c7d7ca2c23 feature: grandfather CDC
The CDC feature was made non-experimental in e9072542c1 (2020; 4.4)
and can now be assumed to be always present. We also remove the corresponding
schema_feature.
2024-05-17 20:41:20 +03:00
Tomasz Grabiec
dd4a086b87 selective_token_sharder: Avoid use of deprecated sharder API
I analyzed all the uses and all except the alternator/ttl.cc seem to
be interested in the result for the purpose of reading.

Alternator is not supported with tablets yet, so the use was annotated
with a relevant issue.
2024-05-16 00:28:47 +02:00
Kefu Chai
0b0e661a85 build: bring abseil submodule back
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora has different settings
than the ones if the tree is built with the sanitizer enabled. this
inconsistency leads to a crash.

to address this problem, we have to reinstate the abseil submodule, so
we can built it with the same compiler options with which we build the
tree.

in this change

* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
  lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
  flat_hash_map

This reverts commit 8635d24424.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18511
2024-05-05 23:31:09 +03:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ stadandard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so it's
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::format<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherit
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Nadav Har'El
c24bc3b57a alternator: do not use tablets on new Alternator tables
A few months ago, in merge d3c1be9107,
we decided that if Scylla has the experimental "tablets" feature enabled,
new Alternator tables should use this feature by default - exactly like
this is the default for new CQL tables.

Sadly, it was now decided to reverse this decision: We do not yet trust
enough LWT on tablets, and since Alternator often (if not always) relies
on LWT, we want Alternator tables to continue to use vnodes - not tablets.

The fix is trivial - just changing the default. No test needed to change
because anyway, all Alternator tests work correctly on Scylla with the
tablets experimental feature disabled. I added a new test to enshrine
the fact that Alternator does not use tablets.

An unfortunate result of this patch will be that Alternator tables
created on versions with this patch (e.g., Scylla 6.0) will not use
tablets and will continue to not use tablets even if Scylla is upgraded
(currently, the use of tablets is decided at table creation time, and
there is no way to "upgrade" a vnode-based table to be tablet based).

This patch should be reverted as soon as LWT support matures on tablets.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18157
2024-04-04 12:11:29 +03:00
Nadav Har'El
ba97fd98a3 alternator: reduce stall for Query and Scan with large pages
Before this patch, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this patch reported stalls of 14-26ms on my laptop.

The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use the result_set::visit_gently() method
instead of visit() that yields when needed.

This patch does not completely eliminate stalls in the test, but
on my laptop usually reduces them to around 5ms. It appears that
the remaining stalls some from other places not fixed in this PR,
such as perhaps query_page::handle_result(), and will need to be
fixed by additional patches.

The test included in this patch is useful for manually reproducing
the stall, but not useful as a regression test: It is slow (requiring
a couple of seconds to set up the large partition) and doesn't
check anything, and can't even report the stall without modifying the
test runner. So the test is skipped by default (using the "veryslow"
marker) and can be enabled and run manually by developers who want
to continue working on #17995.

Refs #17995.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 18:32:45 +02:00
Nadav Har'El
c146b1224c alternator: coroutinize do_query() function
This patch changes the do_query() function, used to implement Alternator's
Query and Scan operations, from using continuations to be a coroutine.
There are no functional changes in this patch, it's just the necessary
changes to convert the function to a coroutine.

The new code is easier to read and less indented, but more importantly,
will be easier to extend in the next patch to add additional awaits
in the middle of the function.

In additional to the obvious changes, I also had to rename one local
variable (as the same name was used in two scopes), and to convert
pass-by-rvalue-reference to pass-by-value (these parameters are *moved*
by the caller, and moreover the old code had to move them again to a
continuation, so there is no performance penalty in this change).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 15:08:08 +02:00
Amnon Heiman
8b43609920 alternator: Use summary for shard-level latencies.
Shard-level latencies generate a lot of metrics. This patch reduces the
the number of latencies reported by Alternator while keeping the same
functionality.

On the shard level, summaries will be reported instead of histograms.
On the instance level, an aggregated histogram will be reported.

Summaries, histograms, and counters are marked with skip_when_empty.

Fixes #12230

Closes scylladb/scylladb#17581
2024-03-11 11:12:08 +02:00
Marcin Maliszkiewicz
9cb1f111d5 alternator: add support for auth-v2
Alternator doesn't do any writes to auth
tables so it's simply change of keyspace
name.

Docs will be updated later, when auth-v2
is enabled as default.
2024-03-01 16:25:14 +01:00
Kefu Chai
57c408ab5d alternator: add fmt::formatter for alternator::parsed::path
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `alternator::parsed::path`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17458
2024-02-22 16:40:01 +02:00
Kefu Chai
3c84f08b93 alternator: add formatter for attribute_path_map_node<update_expression::action>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`attribute_path_map_node<update_expression::action>`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17270
2024-02-19 20:09:11 +02:00
Nadav Har'El
28db187756 alternator, tablets: return error if enabling TTL with tablets
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.

So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.

This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.

Refs #16567
Refs #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17306
2024-02-15 10:47:06 +02:00
Patryk Wrobel
a3fb44cbca Rename keyspace::get_effective_replication_map()
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.

When tablets are enabled, then this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().

Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17314
2024-02-13 20:22:02 +02:00
Nadav Har'El
dce47a81b0 alternator, tablets: return error if enabling Streams with tablets
Alternator Streams doesn't yet work on tables using tablets (this is
issue #16317). Before this patch, an attempt to enable it results in
an unsightly InternalServerError, which isn't terrible - but we can
do better.

So in this patch, we make the attempt to enable Streams and tablets
together into a clear error. The error message points to the open issue,
and also suggests how to create a table that uses vnodes, not tablets.

Unfortunately, there are slightly two different code paths and error
messages for two cases: One case is the creation of a new table (where
the validation happens before the keyspace is actually created), and
the other case is an attempt to enable streams on an existing table
with an existing keyspace (which already might or might not be using
tablets).

This patch also adds a test that verifies that trying to enable Streams
with tablets is an error - in both cases (table creation and update).
Obviously, this test - and the validation code - should be removed once
the issue is solved and Alternator Streams begins working with tablets.

Fixes #16497
Refs #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17311
2024-02-13 16:42:35 +02:00
Kefu Chai
d7a404e1ec alternator: add formatter for alternator::calculate_value_caller
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `alternator::calculate_value_caller`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17259
2024-02-11 11:49:46 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Botond Dénes
d3c1be9107 Merge 'alternator: enable tablets by default if experimental feature is enabled' from Nadav Har'El
This series does a similar change to Alternator as was done recently to CQL:

1. If the "tablets" experimental feature in enabled, new Alternator tables will use tablets automatically, without requiring an option on each new table. A default choice of initial_tablets is used. These choices can still be overridden per-table if the user wants to.
3. In particular, all test/alternator tests will also automatically run with tablets enabled
4. However, some tests will fail on tablets because they use features that haven't yet been implemented with tablets - namely Alternator Streams (Refs #16317) and Alternator TTL (Refs #16567). These tests will - until those features are implemented with tablets - continue to be run without tablets.
5. An option is added to the test/alternator/run to allow developers to manually run tests without tablets enabled, if they wish to (this option will be useful in the short term, and can be removed later).

Fixes #16355

Closes scylladb/scylladb#16900

* github.com:scylladb/scylladb:
  test/alternator: add "--vnodes" option to run script
  alternator: use tablets by default, if available
  test/alternator: run some tests without tablets
2024-01-29 09:22:13 +02:00
Nadav Har'El
08b26269d8 alternator: allow empty tag value
The existing code incorrectly forbid setting a tag on a table to an empty
string value, but this is allowed by DynamoDB and is useful, so we fix it
in this patch.

While at it, improve the error-checking code for tag parameters to
cleanly detect more cases (like missing or non-string keys or values).

The following patch is a test that fails before this patch (because
it fails to insert a tag with an empty value) and passes after it.

Fixes #16904.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 11:26:08 +02:00
Nadav Har'El
c496d60716 alternator: use tablets by default, if available
Before this patch, Alternator tables did not use tablets even if this
feature was available - tablets had to be manually enabled per table
by using a tag. But recently we changed CQL to enable tablets by default
on all keyspaces (when the experimental "tablets" option is turned on),
so this patch does the same for Alternator tables:

1. When the "tablets" experimental feature is on, new Alternator tables
   will use tablets instead of vnodes. They will use the default choice
   of initial_tablets.

2. The same tag that in the past could be used to enable tablets on a
   specific table, now can be used to disable tablets or change the
   default initial_tablets for a specific table at creation time.

Fixes #16355

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:53:23 +02:00
Kefu Chai
a0e5c14c55 alternator: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16736
2024-01-12 10:53:32 +02:00
Kefu Chai
64a227fba0 alternator/auth: remove unused #include
in `alternator/auth.cc`, none of the symbols in "query" namespace
provided by the removed headers is used is used, so there is no
need to include this header file.

the same applies to other removed header files.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16603
2024-01-02 17:50:59 +02:00
Pavel Emelyanov
ffdafe4024 keyspace_metadata: Add default value for new_keyspace's durable_writes
Almost all callers call new_keyspace with durable writes ON, so it's
worth having default value for it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 11:47:37 +03:00
Pavel Emelyanov
c43501d973 locator,schema: Move initial tablets from r.s. options to params
The option is kepd in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way the find appropriate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:07:10 +03:00
Pavel Emelyanov
a67c535539 keyspace_metadata: Carry optional<initial_tablets> on board
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.

This patch is _just_ extending the metadata creation API, in fact the
new field is unused (write-only) so all the places that need to provide
this data keep it disengaged and are explicitly marked with FIXME
comment. Next patches will fix that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:58:05 +03:00
Nadav Har'El
2c0b472f44 alternator: explicitly request synchronous updates for LSI
DynamoDB's *local* secondary index (LSI) allows strongly-consistent
reads from the materialized view, which must be able to read what was
previously written to the base. To support this, we need the view to
use the "synchronous_updates".

Previously, with vnodes, there was no need for using this option
explicitly, because an LSI has the same partition key as the base table
so the base and view replicas are the same, and the local writes are
done synchronously. But with tablets, this changes - there is no longer
a guarantee that the base and view tablets are located on the same node.
So to restore the strong consistency of LSIs when tablets are enabled,
this patch explicitly adds the "synchronous_updates" option to views
created by Alternator LSIs. We do *not* add this option for GSIs - those
do not support strongly-consistent reads.

This fix was tested by a test that will be introduced in the following
patches. The test showed that before this patch, it was possible that
reading with ConsistentRead=True from an LSI right after the base was
written would miss the new changes, but after this patch, it always
sees the new data in the LSI.

Fixes #16313.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 20:14:59 +02:00
Nadav Har'El
d11f5e9625 alternator: fix view creation when using tablets
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.

We have the same problem in Alternator when creating a view (GSI or LSI).
The Alternator code does not use prepare_new_view_announcement(), and
instead uses the lower-level function add_table_or_view_to_schema_mutation()
so it didn't get the call to the notifier, so we must add it here too.

Before this patch, creating an Alternator table with tablets (which has
become possible after the previous patch) fails with "Tablet map not found
for table <uuid>". With this patch, it works.

A test for materialized views in Alternator will come in a following
patch, and will test everything together - the CreateTable tag to use
tablets (from the previous patch), the LSI/GSI creation (fixed in this patch)
and the correct consistency of the LSI (fixed in the next patch).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 19:55:36 +02:00
Nadav Har'El
8e356d8c31 alternator: add experimental method to create a table with tablets
As explained in issue #16203, we cannot yet enable tablets on Alternator
keyspaces by default, because support for some of the features that
Alternator needs, such as CDC, is not yet available.
Nevertheless, to start testing Alternator integration with tablets,
we want to provide a way to enable tablets in Alternator for tests.

In this patch we add support for a tag, 'experimental:initial_tablets',
which if added on a table during creation, uses tablets for its keyspace.
The value of this tag is a numeric string, and it is exactly analogous
to the 'initial_tablets' property we have in CQL's NetworkTopologyStrategy.

We name this tag with the "experimental:" prefix to emphesize that it
is experimental, and the way to enable or disable tablets will probably
change later.

The new tag only has effect when added while *creating* a table.
Adding, deleting or changing it later on an existing table will have
no effect.

A later patch will have tests that use this tag to test Alternator with
tablets.

Refs #16203.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 19:55:30 +02:00
Botond Dénes
d2a88cd8de Merge 'Typos: fix typos in code' from Yaniv Kaul
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16289

* github.com:scylladb/scylladb:
  Update unified/build_unified.sh
  Update main.cc
  Update dist/common/scripts/scylla-housekeeping
  Typos: fix typos in code
2023-12-06 07:36:41 +02:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Benny Halevy
03fe674314 alternator: ttl: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Kefu Chai
efd65aebb2 build: cmake: add check-header target
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.

we generate a rule for each .hh files to create a corresponding
.cc and then compile it, in order to verify the self-containness of
that header. so the number of rule is quite large, to avoid the
unnecessary overhead. the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15913
2023-11-13 10:27:06 +02:00
Nadav Har'El
a3621dbd3e Merge 'Alternator: Support new ReturnValuesOnConditionCheckFailure feature' from Marcin Maliszkiewicz
alternator: add support for ReturnValuesOnConditionCheckFailure feature

As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/, DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem), ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the current value of the item - but only if a condition check failed.

Fixes https://github.com/scylladb/scylladb/issues/14481

Closes scylladb/scylladb#15125

* github.com:scylladb/scylladb:
  alternator: add support for ReturnValuesOnConditionCheckFailure feature
  alternator: add ability to send additional fields in api_error
2023-11-07 23:19:51 +02:00
Kamil Braun
ae58e39743 Merge 'reduce announcements of the automatic schema changes' from Patryk Jędrzejczak
There are some schema modifications performed automatically (during
bootstrap, upgrade etc.) by Scylla that are announced by multiple calls
to `migration_manager::announce` even though they are logically one
change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis:create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).

All these places contain a FIXME telling us to `announce` only once.
There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive --
  taking a `read_barrier` is necessary, and that requires contacting a
leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce`
  if `group0_concurrent_modification` occurs to enable support for
concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs
mentioned above would be harder, and fixing the FIXMEs later would also
be harder.

This PR fixes the first two FIXMEs and improves the situation with the
last one by reducing the number of the `announce` calls to two.
Unfortunately, reducing this number to one requires a big refactor. We
can do it as a follow-up to a new, more specific issue. Also, we leave a
new FIXME.

Fixing the first two FIXMEs required enabling the announcement of a
keyspace together with its tables. Until now, the code responsible for
preparing mutations for a new table could assume the existence of the
keyspace. This assumption wasn't necessary, but removing it required
some refactoring.

Fixes scylladb/scylladb#15437

Closes scylladb/scylladb#15897

* github.com:scylladb/scylladb:
  table_helper: announce twice in setup_keyspace
  table_helper: refactor setup_table
  redis: create_keyspace_if_not_exists_impl: fix indentation
  redis: announce once in create_keyspace_if_not_exists_impl
  db: system_distributed_keyspace: fix indentation
  db: system_distributed_keyspace: announce once in start
  tablet_allocator: update on_before_create_column_family
  migration_listener: add parameter to on_before_create_column_family
  alternator: executor: use new prepare_new_column_family_announcement
  alternator: executor: introduce create_keyspace_metadata
  migration_manager: add new prepare_new_column_family_announcement
2023-11-02 09:32:35 +01:00
Patryk Jędrzejczak
a2e48b1a5b alternator: executor: use new prepare_new_column_family_announcement
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
4ad2d895a3 alternator: executor: introduce create_keyspace_metadata
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
2023-10-31 12:08:03 +01:00
Kefu Chai
9dd5af7fef alternator: avoid using the deprecated API
this change silences following compiling warning due to using the
deprecated API by using the recommended API in place of the deprecated
one:

```
/home/kefu/dev/scylladb/alternator/server.cc:569:27: warning: 'set_tls_credentials' is deprecated: use listen(socket_address addr, server_credentials_ptr credentials) [-Wdeprecated-declarations]
            _https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
                          ^
/home/kefu/dev/scylladb/seastar/include/seastar/http/httpd.hh:186:7: note: 'set_tls_credentials' has been explicitly marked deprecated here
    [[deprecated("use listen(socket_address addr, server_credentials_ptr credentials)")]]
      ^
1 warning generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15884
2023-10-31 12:05:58 +03:00
Marcin Maliszkiewicz
3992d1c2ce alternator: add support for ReturnValuesOnConditionCheckFailure feature
As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/,
DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem),
ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the
current value of the item - but only if a condition check failed.

Fixes https://github.com/scylladb/scylladb/issues/14481
2023-10-30 15:33:56 +01:00
Marcin Maliszkiewicz
b4c77a373d alternator: add ability to send additional fields in api_error
While it may not be explicitly documented DynamoDB sometimes enchriches error
message by additional fields. For instance when ConditionalCheckFailedException
occurs while ReturnValuesOnConditionCheckFailure is set it will add Item object,
similarly for TransactionCanceledException it will add CancellationReasons object.
There may be more cases like this so generic json field is added to our error class.

The change will be used by future commit implementing ReturnValuesOnConditionCheckFailure
feature.
2023-10-30 15:13:06 +01:00
Avi Kivity
d450a145ce Revert "Merge 'reduce announcements of the automatic schema changes ' from Patryk Jędrzejczak"
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
2023-10-29 18:32:06 +02:00
Nadav Har'El
4b80130b0b Merge 'reduce announcements of the automatic schema changes ' from Patryk Jędrzejczak
There are some schema modifications performed automatically (during bootstrap, upgrade etc.) by Scylla that are announced by multiple calls to `migration_manager::announce` even though they are logically one change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis:create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).

All these places contain a FIXME telling us to `announce` only once. There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive -- taking a `read_barrier` is necessary, and that requires contacting a leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce` if `group0_concurrent_modification` occurs to enable support for concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs mentioned above would be harder, and fixing the FIXMEs later would also be harder.

This PR fixes the first two FIXMEs and improves the situation with the last one by reducing the number of the `announce` calls to two. Unfortunately, reducing this number to one requires a big refactor. We can do it as a follow-up to a new, more specific issue. Also, we leave a new FIXME.

Fixing the first two FIXMEs required enabling the announcement of a keyspace together with its tables. Until now, the code responsible for preparing mutations for a new table could assume the existence of the keyspace. This assumption wasn't necessary, but removing it required some refactoring.

Fixes #15437

Closes scylladb/scylladb#15594

* github.com:scylladb/scylladb:
  table_helper: announce twice in setup_keyspace
  table_helper: refactor setup_table
  redis: create_keyspace_if_not_exists_impl: fix indentation
  redis: announce once in create_keyspace_if_not_exists_impl
  db: system_distributed_keyspace: fix indentation
  db: system_distributed_keyspace: announce once in start
  tablet_allocator: update on_before_create_column_family
  migration_listener: add parameter to on_before_create_column_family
  alternator: executor: use new prepare_new_column_family_announcement
  alternator: executor: introduce create_keyspace_metadata
  migration_manager: add new prepare_new_column_family_announcement
2023-10-24 15:42:48 +03:00
Nikita Kurashkin
2a7932efa1 alternator: fix DeleteTable return values to match DynamoDB's
It seems that Scylla has more values returned by DeleteTable operation than DynamoDB.
In this patch I added a table status check when generating output.
If we delete the table, values KeySchema, AttributeDefinitions and CreationDateTime won't be returned.
The test has also been modified to check that these attributes are not returned.

Fixes scylladb#14132

Closes scylladb/scylladb#15707
2023-10-19 09:34:16 +03:00
Patryk Jędrzejczak
96d9e768c4 alternator: executor: use new prepare_new_column_family_announcement
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
fcd092473c alternator: executor: introduce create_keyspace_metadata
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
2023-10-16 14:59:53 +02:00
Benny Halevy
3208af1880 gossiper: add num_endpoints
Return the number of endpoints tracked by gossiper.
This is useful when the caller doesn't need
access to the endpoint states map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 08:30:40 +03:00