Commit Graph

41314 Commits

Author SHA1 Message Date
Petr Gusev
1d6caa42b9 join_cluster: move was_decommissioned check earlier
Before the patch, if a decommissioned node tries
to restart, it calls _group0->discover_group0 first
in join_cluster, which hangs since decommissioned
nodes are banned and other nodes don't respond
to their discovery requests.

We fix the problem by checking the was_decommissioned()
flag before calling discover_group0.

fixes scylladb/scylladb#17282

Closes scylladb/scylladb#17358
2024-02-18 22:07:28 +02:00
Kefu Chai
9d666f7d29 cmake: add -Wextra to compiling options
this matches what we have in configure.py

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17376
2024-02-18 19:21:54 +02:00
Kefu Chai
cb781c0ff7 gms: add formatter for gms::versioned_value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::versioned_value`. its
operator<< is preserved, as it's still being used by the homebrew
generic formatter for std::unordered_map<gms::application_state,
gms::versioned_value>, which is in turn used in gms/gossiper.cc.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17366
2024-02-18 19:21:54 +02:00
Avi Kivity
43f1c3df2e Merge 'repair: Update repair history for tablet repair' from Asias He
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.

Fixes: #17046
Tests: test_tablet_repair_history

Closes scylladb/scylladb#17047

* github.com:scylladb/scylladb:
  repair: Update repair history for tablet repair
  repair: Extract flush hints code
2024-02-18 19:21:54 +02:00
Kefu Chai
8fc4243cf6 configure.py: do not include cxx_ldflags in cxxflags
ldflags are passed to ld (the linker), while cxxflags are passed to the
C++ compiler. the compiler does not understand the ldflags. if we
pass ldflags to it, it complains if `-Wunused-command-line-argument` is
enabled.

in this change, we do not include the ldflags in cxxflags. this lets
us enable the `-Wunused-command-line-argument` warning option,
so we don't need to disable it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17328
2024-02-18 19:21:54 +02:00
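As an illustration of the separation described in 8fc4243cf6, here is a minimal Python sketch. The function `build_commands`, the file names and the flag values are hypothetical and are not taken from configure.py; the sketch only shows linker flags reaching the link step and never the compile step.

```python
def build_commands(cxxflags, ldflags, source="main.cc", obj="main.o", out="main"):
    """Return (compile_cmd, link_cmd) with linker flags kept out of the compile step."""
    # Compile step: compiler flags only, so the driver has no unused
    # linker arguments to complain about.
    compile_cmd = ["clang++", *cxxflags, "-c", source, "-o", obj]
    # Link step: linker flags belong here, where the driver forwards them to ld.
    link_cmd = ["clang++", obj, *ldflags, "-o", out]
    return compile_cmd, link_cmd


compile_cmd, link_cmd = build_commands(
    cxxflags=["-Wextra", "-Wunused-command-line-argument"],  # illustrative values
    ldflags=["-Wl,--build-id=sha1"],                          # illustrative value
)
```

Keeping the two lists apart until the command line is assembled is what allows the warning to stay enabled.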
Avi Kivity
d257cc5003 Merge 'scylla-nodetool: implement the repair command' from Botond Dénes
As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation.

Refs: #15588

Closes scylladb/scylladb#17368

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the repair command
  test/nodetool: utils: add check_nodetool_fails_with_error_contains()
  test/nodetool: util: replace flags with custom matcher
2024-02-18 19:21:54 +02:00
Petr Gusev
4ef5d92f50 gossiping_property_file_snitch_test: modernize + fix potential race
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.

We add a check for the specific type of exception that
can be thrown (bad_property_file_error).

We also fix the potential race - the test may write
to res from multiple cores with no locks.

Closes scylladb/scylladb#17371
2024-02-18 19:21:53 +02:00
Kefu Chai
4812a57f71 gms: add formatter for gms::gossip_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

- gms::gossip_digest
- gms::gossip_digest_ack
- gms::gossip_digest_syn

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17379
2024-02-18 19:21:53 +02:00
Patryk Wrobel
3842bf18a7 storage_service/range_to_endpoint_map: allow API to properly handle tablets
This API endpoint was failing when tablets were enabled
because of usage of get_vnode_effective_replication_map().
Moreover, it was providing an error message that was not
user-friendly.

This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.

The new logic is as follows:
 - when tablets are disabled then users may query endpoints
   for a keyspace or for a given table in a keyspace
 - when tablets are enabled then users have to provide
   table name, because effective replication map is per-table

If tablets are enabled for a given keyspace and the user does not
provide a table name, BAD_REQUEST is returned with a
meaningful error message.

Fixes: scylladb#17343

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17372
2024-02-18 19:21:53 +02:00
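The request-handling rules listed in 3842bf18a7 can be sketched in Python as follows. This is not the handler's code: `resolve_replication_scope`, `BadRequest` and the returned tuples are hypothetical names used only to show which combinations are accepted.

```python
from http import HTTPStatus


class BadRequest(Exception):
    """Hypothetical error type mapped to HTTP 400."""
    status = HTTPStatus.BAD_REQUEST


def resolve_replication_scope(keyspace, table=None, *, uses_tablets):
    """Decide which replication map a range_to_endpoint_map query would consult."""
    if uses_tablets:
        # With tablets the effective replication map is per-table,
        # so a keyspace-only query cannot be answered.
        if not table:
            raise BadRequest(
                f"keyspace '{keyspace}' uses tablets: a table name is required"
            )
        return ("per-table", keyspace, table)
    # With vnodes, both keyspace-only and keyspace+table queries are allowed.
    return ("per-keyspace", keyspace, table)
```

A keyspace-only query against a tablets keyspace raises `BadRequest`, while the same query against a vnodes keyspace succeeds.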
Kefu Chai
808f4d72fb storage_service: fix typos in comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17377
2024-02-18 19:21:53 +02:00
Botond Dénes
b11213e547 tools/scylla-nodetool: implement the upgradesstables command
Refs: #15588

Closes scylladb/scylladb#17370
2024-02-18 19:21:53 +02:00
Kefu Chai
af2553e8bc cdc: add formatter for cdc::image_mode and cdc::delta_mode
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cdc::image_mode and cdc::delta_mode, and drop their operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17381
2024-02-18 19:21:53 +02:00
Avi Kivity
9bb4482ad0 Merge 'cdc: metadata: allow sending writes to the previous generations' from Patryk Jędrzejczak
Before this PR, writes to the previous CDC generations would
always be rejected. After this PR, they will be accepted if the
write's timestamp is greater than `now - generation_leeway`.

This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.

The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry
with a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with an unknown perfect solution.

Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.

Apart from this change, this PR adds tests for it and updates
the documentation.

This PR is sufficient to enable writes to the previous generations
only in the gossiper-based topology. The Raft-based topology
needs some adjustments in loading and cleaning CDC generations.
These changes won't interfere with the changes introduced in this
PR, so they are left for a follow-up.

Fixes scylladb/scylladb#7251
Fixes scylladb/scylladb#15260

Closes scylladb/scylladb#17134

* github.com:scylladb/scylladb:
  docs: using-scylla: cdc: remove info about failing writes to old generations
  docs: dev: cdc: document writing to previous CDC generations
  test: add test_writes_to_previous_cdc_generations
  cdc: generation: allow increasing generation_leeway through error injection
  cdc: metadata: allow sending writes to the previous generations
2024-02-18 19:21:53 +02:00
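The acceptance rule stated in 9bb4482ad0 (a write to a previous generation is accepted if its timestamp is greater than `now - generation_leeway`) can be sketched as a single comparison. The function name and the 5-second leeway below are illustrative assumptions; the sketch ignores how the target generation is chosen.

```python
from datetime import datetime, timedelta, timezone


def accepts_write_to_previous_generation(write_ts, now, generation_leeway):
    """True if the write's timestamp is greater than now - generation_leeway."""
    return write_ts > now - generation_leeway


now = datetime(2024, 2, 18, 12, 0, 0, tzinfo=timezone.utc)
leeway = timedelta(seconds=5)  # illustrative value, not the real setting

recent_write = now - timedelta(seconds=2)   # inside the leeway window
stale_write = now - timedelta(seconds=10)   # outside the leeway window
```

Under these assumptions `recent_write` would be accepted and `stale_write` rejected, which is why the commit message calls the fix probabilistic: a sufficiently delayed write can still fall outside the window.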
Asias He
796044be1c repair: Update repair history for tablet repair
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.

Fixes: #17046
Tests: test_tablet_repair_history
2024-02-18 10:21:58 +08:00
Asias He
e43bc775d0 repair: Extract flush hints code
So it can be used by tablet repair as well.
2024-02-18 09:42:02 +08:00
Kefu Chai
50964c423e hints: host_filter: add formatter for hints::host_filter
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `hints::host_filter`. its
operator<< is preserved as it's still used by the homebrew generic
formatter for vector<>, which is in turn used by db/config.cc.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17347
2024-02-16 19:03:11 +03:00
Anna Stuchlik
e132ffdb60 doc: add missing redirections
This commit adds the missing redirections
to the pages whose source files were
previously stored in the install-scylla folder
and were moved to another location.

Closes scylladb/scylladb#17367
2024-02-16 14:09:26 +02:00
Kefu Chai
47fec0428a tools/scylla-nodetool: return 1 when view build does not succeed
this change introduces a new exception which carries the status code
so that an operation can return a non-zero exit code without printing
any errors. this mimics the behavior of "viewbuildstatus" command of
C* nodetool.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17359
2024-02-16 13:53:33 +02:00
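The pattern described in 47fec0428a, an exception that carries an exit status so an operation can fail without printing anything, is sketched below in Python for brevity (the tool itself is C++). All names here are hypothetical.

```python
class OperationFailedWithStatus(Exception):
    """Hypothetical exception carrying an exit code and no message to print."""

    def __init__(self, exit_status):
        super().__init__()
        self.exit_status = exit_status


def view_build_status(all_views_built):
    # Signal failure through the exit code only, without an error message.
    if not all_views_built:
        raise OperationFailedWithStatus(1)


def run(operation, *args):
    """Run an operation and translate the exception into an exit code."""
    try:
        operation(*args)
    except OperationFailedWithStatus as e:
        return e.exit_status  # silent: nothing is written to stderr
    return 0
```

A dedicated exception type lets the top-level dispatcher distinguish "exit non-zero quietly" from ordinary errors that should be reported.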
Botond Dénes
8d8ea12862 tools/scylla-nodetool: implement the repair command 2024-02-16 04:42:08 -05:00
Botond Dénes
48e8435466 test/nodetool: utils: add check_nodetool_fails_with_error_contains()
Checks that at least one error snippet is contained in the error output.
2024-02-16 04:40:31 -05:00
Botond Dénes
190c9a7239 test/nodetool: util: replace flags with custom matcher
_do_check_nodetool_fails_with() currently has a `match_all` flag to
control how the match is checked. Now we need yet another way to control
how matching is done. Instead of adding yet another flag (and who knows
how many more), just replace the flag and the errors input with a matcher
functor, which gets the stdout and stderr and is delegated to do any
checks it wants. This method will scale much better going forward.
2024-02-16 04:40:31 -05:00
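The matcher-functor idea from 190c9a7239, together with the "at least one snippet" check from 48e8435466, can be sketched as follows. The helper names and the `(returncode, stdout, stderr)` tuple are assumptions for illustration and do not reproduce the real test utilities.

```python
def contains_all(snippets):
    """Matcher that passes when every snippet appears in stderr."""
    def matcher(stdout, stderr):
        return all(s in stderr for s in snippets)
    return matcher


def contains_any(snippets):
    """Matcher that passes when at least one snippet appears in stderr."""
    def matcher(stdout, stderr):
        return any(s in stderr for s in snippets)
    return matcher


def check_fails_with(result, matcher):
    """result is a (returncode, stdout, stderr) tuple from a hypothetical tool run."""
    returncode, stdout, stderr = result
    assert returncode != 0, "expected the command to fail"
    assert matcher(stdout, stderr), f"unexpected error output: {stderr!r}"


failed = (1, "", "error processing arguments: unknown option --foo")
check_fails_with(failed, contains_any(["unknown option", "no such keyspace"]))
check_fails_with(failed, contains_all(["error processing arguments", "--foo"]))
```

New matching strategies are added by writing another small factory, leaving the signature of `check_fails_with` unchanged.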
Avi Kivity
eedb997568 Merge 'compaction: upgrade: handle keyspaces that use tablets' from Lakshmi Narayanan Sreethar
Tables in keyspaces governed by a replication strategy that uses tablets have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace.

Fixes #16848

Closes scylladb/scylladb#17335

* github.com:scylladb/scylladb:
  compaction: upgrade: handle keyspaces that use tablets
  replica/database: add an optional variant to get_keyspace_local_ranges
2024-02-15 21:31:54 +02:00
Kefu Chai
f0b3068bcf build: cmake: disable unused-parameter, missing-field-initializers and deprecated-copy
-Wunused-parameter, -Wmissing-field-initializers and -Wdeprecated-copy
warning options are enabled by -Wextra. the tree fails to build with
these options enabled. until we determine whether the warnings point to
genuine problems and address them, let's disable them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17352
2024-02-15 21:27:44 +02:00
Kamil Braun
50ebce8acc Merge 'Purge old ip on change' from Petr Gusev
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.

We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.

The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with a restart, and other nodes are not yet aware of `ip2`, so they keep gossiping `ip1`. After the restart, `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it a new node. Then the `on_join` event is called on the gossiper notification handlers; we receive this event in `raft_ip_address_updater` and revert the IP of node `A` back to `ip1`.

To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.

The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.

Fixes #16886
Fixes #16691
Fixes #17199

Closes scylladb/scylladb#17162

* github.com:scylladb/scylladb:
  test_change_ip: improve the test
  raft_ip_address_updater: remove stale IPs from gossiper
  raft_address_map: add my ip with the new generation
  system_keyspace::update_peer_info: check ep and host_id are not empty
  system_keyspace::update_peer_info: make host_id an explicit parameter
  system_keyspace::update_peer_info: remove any_set flag optimisation
  system_keyspace: remove duplicate ips for host_id
  system_keyspace: peers table: use coroutines
  storage_service::raft_ip_address_updater: log gossiper event name
  raft topology: ip change: purge old IP
  on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
2024-02-15 17:40:29 +01:00
Lakshmi Narayanan Sreethar
7a98877798 compaction: upgrade: handle keyspaces that use tablets
Tables in keyspaces governed by a replication strategy that uses tablets have
separate effective_replication_maps. Update the upgrade compaction task to
handle this when getting owned key ranges for a keyspace.

Fixes #16848

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-15 17:47:39 +05:30
Lakshmi Narayanan Sreethar
8925a2c3cb replica/database: add an optional variant to get_keyspace_local_ranges
Add a new method database::maybe_get_keyspace_local_ranges that
optionally returns the owned ranges for the given keyspace if it has an
effective_replication_map for the entire keyspace.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-15 17:44:47 +05:30
Botond Dénes
22a5112bf1 tools/scylla-sstable-scripts: add keys.lua and largest-key.lua
I wrote these scripts to identify sstables with too large keys for a
recent investigation. I think they could be useful in the future,
certainly as further examples on how to write lua scripts for
scylla-sstable script.

Closes scylladb/scylladb#17000
2024-02-15 13:39:41 +02:00
Avi Kivity
5df5714331 Merge 'api: storage_service/natural_endpoints: add tablets support' from Botond Dénes
This API endpoint currently returns status 500 when called for a table that uses tablets. This series adds tablet support. No change in usage semantics is required, since the endpoint already has a table parameter.
This endpoint is the backend of `nodetool getendpoints` which should now work, after this PR.

Fixes: #17313

Closes scylladb/scylladb#17316

* github.com:scylladb/scylladb:
  service/storage_service: get_natural_endpoints(): add tablets support
  replica/database: keyspace: add uses_tablets()
  service/storage_service: remove token overload of get_natural_endpoints()
2024-02-15 13:36:56 +02:00
Kefu Chai
caa20c491f storage_service: pass non-empty keyspace when performing cleanup_all
this change addresses the regression introduced by 5e0b3671, which
falls back to local cleanup in cleanup_all. but 5e0b3671 failed to
pass the keyspace to `shard_cleanup_keyspace_compaction_task_impl`
as its constructor parameter, which is why the test fails like
```
error executing POST request to http://localhost:10000/storage_service/cleanup_all with parameters {}: remote replied with status code 400 Bad Request:
Can't find a keyspace

```

where the string after "Can't find a keyspace" is empty.

in this change, the keyspace name of the keyspace to be cleaned is passed to
`shard_cleanup_keyspace_compaction_task_impl`.

we always enable the topology coordinator when performing testing,
that's why this issue does not pop up until the longevity test.

Fixes #17302
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17320
2024-02-15 13:17:45 +02:00
Botond Dénes
811e931b09 Merge 'tools/scylla-nodetool: implement compactionstats and viewbuildstatus' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17344

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement viewbuildstatus
  tools/scylla-nodetool: implement compactionstats
2024-02-15 12:44:05 +02:00
Petr Gusev
c4140678ba test_change_ip: improve the test
In this commit we refactor test_change_ip to improve
it in several ways:
  * We inject failure before old IP is removed and verify
    that after restart the node sees the proper peers - the
    new IP for node2 and old IP for node3, which is not restarted
    yet.
  * We introduce the lambda wait_proper_ips, which checks not only the
    system.peers table, but also gossiper and token_metadata.
  * We call this lambda for all nodes, not only the first node;
    this allows us to validate that the node that has changed its
    IP has its own proper IP in the data structures above.

Note that we need to inject an additional delay, ip-change-raft-sync-delay,
before the old IP is removed. Otherwise the problem stops reproducing - other
nodes remove the old IP before it's sent back to the just-restarted node.
2024-02-15 13:26:02 +04:00
Petr Gusev
a068dba8c9 raft_ip_address_updater: remove stale IPs from gossiper
In the scenario described in the previous commit the
on_endpoint_change could be called with our previous IP.
We can easily detect this case - after add_or_update_entry
the IP for a given id in address_map hasn't changed. We
remove such an IP from gossiper since it's not needed; this also
makes the test in the next commit more natural - all old
IPs are removed from all subsystems.
2024-02-15 13:25:56 +04:00
Petr Gusev
4b33ba2894 raft_address_map: add my ip with the new generation
The following scenario is possible: a node A changes its IP
from ip1 to ip2 with restart, other nodes are not yet aware of ip2
so they keep gossiping ip1, after restart A receives
ip1 in a gossip message and calls handle_major_state_change
since it considers it a new node. Then the on_join event is
called on the gossiper notification handlers; we receive
this event in raft_ip_address_updater and revert the IP
of node A back to ip1.

The essence of the problem is that we don't pass the proper
generation when we add ip2 as a local IP during initialization
when node A restarts, so the zero generation is used
in raft_address_map::add_or_update_entry and the gossiper
message overwrites ip2 with ip1.

In this commit we fix this problem by passing the new generation.
To do that we move the increment_and_get_generation call
from join_token_ring to scylla_main, so that we have a new generation
value before init_address_map is called.

Also we remove the load_initial_raft_address_map function from
raft_group0 since it's redundant. The comment above its call site
says that it's needed to not miss gossiper updates, but
the function storage_service::init_address_map where raft_address_map
is now initialized is called before gossiper is started. This
function does both - it loads the previously persisted host_id<->IP
mappings from system.local and subscribes to gossiper notifications,
so there is no room for races.

Note that this problem is less likely to reproduce with the
'raft topology: ip change: purge old IP' commit - other
nodes remove the old IP before it's sent back to the
just restarted node. This is also the reason why this
problem doesn't occur in gossiper mode.

fixes scylladb/scylladb#17199
2024-02-15 13:21:04 +04:00
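The generation problem described in 4b33ba2894 can be reproduced with a toy model. The class below is not the real raft_address_map; it assumes, for illustration only, that an entry is replaced only by an update with a strictly larger generation, and the generation numbers are made up.

```python
class AddressMap:
    """Toy host_id -> (ip, generation) map; not the real raft_address_map."""

    def __init__(self):
        self._entries = {}

    def add_or_update_entry(self, host_id, ip, generation=0):
        current = self._entries.get(host_id)
        # Assumption: an entry is replaced only by a strictly newer generation.
        if current is None or generation > current[1]:
            self._entries[host_id] = (ip, generation)

    def find(self, host_id):
        entry = self._entries.get(host_id)
        return entry[0] if entry else None


# Before the fix: the local IP is added with the zero generation,
# so a gossip message carrying the old IP and old generation wins.
buggy = AddressMap()
buggy.add_or_update_entry("A", "ip2")
buggy.add_or_update_entry("A", "ip1", generation=100)

# After the fix: the local IP is added with the new, larger generation.
fixed = AddressMap()
fixed.add_or_update_entry("A", "ip2", generation=101)
fixed.add_or_update_entry("A", "ip1", generation=100)
```

In this model `buggy` ends up mapping node A to ip1 and `fixed` keeps ip2, which mirrors why the new generation has to be available before the local entry is added.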
Petr Gusev
2bf75c1a4e system_keyspace::update_peer_info: check ep and host_id are not empty 2024-02-15 13:21:04 +04:00
Petr Gusev
86410d71d1 system_keyspace::update_peer_info: make host_id an explicit parameter
The host_id field should always be set, so it's more
appropriate to pass it as a separate parameter.

The function storage_service::get_peer_info_for_update
is updated. It shouldn't look for the host_id app
state in the passed map; instead, the callers should
get the host_id on their own.
2024-02-15 13:21:04 +04:00
Petr Gusev
e0072f7cb3 system_keyspace::update_peer_info: remove any_set flag optimisation
This optimization never worked -- there were four usages of
the update_peer_info function and in all of them some of
the peer_info fields were set or should be set:
* sync_raft_topology_nodes/process_normal_node: e.g. tokens is set
* sync_raft_topology_nodes/process_transition_node: host_id is set
* handle_state_normal: tokens is set
* storage_service::on_change: get_peer_info_for_update could potentially
return a peer_info with all fields set to empty, but this shouldn't
be possible, host_id should always be set.

Moreover, there is a bug here: we extract host_id from the
states_ parameter, which represent the gossiper application
states that have been changed. This parameter contains host_id
only if a node changes its IP address, in all other cases host_id
is unset. This means we could end up with a record with empty
host_id, if it wasn't previously set by some other means.

We are going to fix this bug in the next commit.
2024-02-15 13:21:04 +04:00
Petr Gusev
4a14988735 system_keyspace: remove duplicate ips for host_id
When a node changes IP we call sync_raft_topology_nodes
from raft_ip_address_updater::on_endpoint_change with
the old IP value in prev_ip parameter.
It's possible that the node crashes right after
we insert a new IP for the host_id, but before we
remove the old IP. In this commit we fix the
possible inconsistency by removing the system.peers
record with old timestamp. This is what the new
peers_table_read_fixup function is responsible for.

We call this function in all system_keyspace methods
that read the system.peers table. The function
loads the table in memory, decides if some rows
are stale by comparing their timestamps and
removes them.

The new function also removes the records with no
host_id, so we no longer need the get_host_id function.

We'll add a test for the problem this commit fixes
in the next commit.
2024-02-15 13:21:04 +04:00
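The read-path fixup described in 4a14988735 (keep the newest row per host_id, drop older duplicates and rows with no host_id) can be sketched like this. The row layout `(ip, host_id, timestamp)` and the function name are assumptions; the real peers_table_read_fixup operates on the system.peers table.

```python
def peers_read_fixup(rows):
    """Split rows of (ip, host_id, timestamp) into (kept, removed)."""
    newest = {}   # host_id -> newest row seen so far
    removed = []  # stale duplicates and rows without a host_id
    for row in rows:
        ip, host_id, ts = row
        if host_id is None:
            removed.append(row)
            continue
        best = newest.get(host_id)
        if best is None:
            newest[host_id] = row
        elif ts > best[2]:
            removed.append(best)   # the previously kept row is now stale
            newest[host_id] = row
        else:
            removed.append(row)
    return list(newest.values()), removed


rows = [
    ("10.0.0.1", "h1", 100),   # old IP of h1, left behind by a crash
    ("10.0.0.9", "h1", 200),   # new IP of h1
    ("10.0.0.2", "h2", 150),
    ("10.0.0.3", None, 50),    # record with no host_id
]
kept, removed = peers_read_fixup(rows)
```

With this input, the old h1 row and the row without a host_id land in `removed`, and one row per host remains in `kept`.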
Petr Gusev
fa8718085a system_keyspace: peers table: use coroutines
This is a refactoring commit with no observable
changes in behaviour.

We switch the functions to coroutines, which makes
them easier to work with in the
next commit. Also, we add more const-s
along the way.
2024-02-15 13:21:04 +04:00
Petr Gusev
00547d3f48 storage_service::raft_ip_address_updater: log gossiper event name
It's useful for debugging.
2024-02-15 13:20:54 +04:00
Petr Gusev
6955cfa419 raft topology: ip change: purge old IP
When a node changes IP address we need to
remove its old IP from system.peers and
gossiper.

We do this in sync_raft_topology_nodes when
the new IP is saved into system.peers to avoid
losing the mapping if the node crashes
between deleting and saving the new IP. In the
next commit we handle the possible duplicates
in this case by dropping them on the read path.

In subsequent commits, test_change_ip will be
adjusted to ensure that old IPs are removed.

fixes scylladb/scylladb#16886
fixes scylladb/scylladb#16691
2024-02-15 13:19:13 +04:00
Petr Gusev
a2c0384cd1 on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
We introduce the helper 'ensure_alive' which takes a
coroutine lambda and returns a wrapper which
ensures the proper lifetime for it.
It works by moving the input lambda onto the heap and
keeping the ptr alive until the resulting future
is resolved.

We also move the holder acquired from _async_gate
to the 'then' lambda closure, since now these closures
will be kept alive during the lambda coroutine execution.

We'll be adding more code to this lambda in subsequent
commits, and it's easier to work with coroutines.
2024-02-15 13:13:44 +04:00
Kefu Chai
f9d19a61ff tools/scylla-nodetool: implement viewbuildstatus
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-15 16:54:16 +08:00
Nadav Har'El
28db187756 alternator, tablets: return error if enabling TTL with tablets
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.

So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.

This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.

Refs #16567
Refs #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17306
2024-02-15 10:47:06 +02:00
Kefu Chai
4da9a62472 utils: managed_bytes: fix typo in comment
s/assigments/assignments/

this misspelling was identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17333
2024-02-15 10:37:25 +02:00
Kefu Chai
8e8b73fa82 dht: add formatter for partition_range_view and i_partition
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`partition_range_view` and `i_partition`, and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17331
2024-02-15 09:46:03 +02:00
Lakshmi Narayanan Sreethar
3b7b315f6a replica/database: quiesce compaction before closing system tables during shutdown
During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.

Fixes #15721

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17218
2024-02-15 09:44:16 +02:00
Nadav Har'El
b97ded5c4a test/topology: tests for setting tombstone_gc on materialized view
A user asked on the ScyllaDB forum several questions on whether
tombstone_gc works on materialized views. This patch includes two
tests that confirm the following:

1. The tombstone_gc may be set on a view - either during its creation
   with CREATE MATERIALIZED VIEW or later with ALTER MATERIALIZED VIEW.

2. The tombstone_gc setting is correctly shown - for both base tables
   and views - by the "DESC" statement.

3. The tombstone_gc setting is NOT inherited from a base table to a new
   view - if you want this option on a view, you need to set it
   separately.

Unfortunately, this test could not be a single-node cql-pytest because
we forbid tombstone_gc=repair when RF=1, and since recently, we forbid
setting RF>1 on a single-node setup. So the new tests are written in
the test/topology framework - which may run multiple tests against
a single three-node cluster.

To write tests over a shared cluster, we need functions which create
temporary keyspaces, tables and views, which are deleted automatically
as soon as a test ends. The test/topology framework was lacking such
functions, so this test includes them - currently inside the test
file, but if other people find them useful they can be moved to a more
central location.

The new functions, new_test_keyspace(), new_test_table() and
new_materialized_view() are inspired by the identically-named
functions in test/cql-pytest/util.py, but the implementation is
different: Importantly, the new functions here are *async*
context managers, used via "async with", to fit with the rest
of the asynchronous code used in the topology test framework.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17345
2024-02-15 09:43:30 +02:00
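The async context managers described in b97ded5c4a might look roughly like the sketch below. The helper names follow the cql-pytest helpers the message refers to, but the signatures, the `execute` callable and the naming scheme are assumptions for illustration; the demo uses a stand-in that only records statements instead of a real CQL session.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager

_counter = itertools.count()


def _unique_name(prefix):
    return f"{prefix}_{next(_counter)}"


@asynccontextmanager
async def new_test_keyspace(execute, opts):
    """Create a uniquely named keyspace and drop it when the block exits."""
    name = _unique_name("test_ks")
    await execute(f"CREATE KEYSPACE {name} {opts}")
    try:
        yield name
    finally:
        await execute(f"DROP KEYSPACE {name}")


@asynccontextmanager
async def new_test_table(execute, keyspace, schema):
    """Create a uniquely named table in the keyspace and drop it afterwards."""
    name = f"{keyspace}.{_unique_name('test_table')}"
    await execute(f"CREATE TABLE {name} ({schema})")
    try:
        yield name
    finally:
        await execute(f"DROP TABLE {name}")


async def demo():
    statements = []

    async def execute(stmt):  # stand-in for a real CQL session
        statements.append(stmt)

    opts = ("WITH replication = "
            "{'class': 'NetworkTopologyStrategy', 'replication_factor': 3}")
    async with new_test_keyspace(execute, opts) as ks:
        async with new_test_table(execute, ks, "p int PRIMARY KEY") as table:
            await execute(f"INSERT INTO {table} (p) VALUES (1)")
    return statements


statements = asyncio.run(demo())
```

The `try`/`finally` around each `yield` is what guarantees cleanup on a shared cluster even when the test body raises.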
Kefu Chai
bcb144ada3 configure.py: disable stack-use-after-scope check only when ASan is enabled
`-fno-sanitize-address-use-after-scope` is used to disable the check for
stack-use-after-scope bugs, but this check is only performed when ASan
is enabled. if we pass this option when ASan is not enabled, we'd get the
following warning, so let's apply it only when ASan is enabled.

```
clang-16: error: argument unused during compilation:
'-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17329
2024-02-15 09:28:29 +02:00
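The conditional described in bcb144ada3 amounts to tying the option to the sanitizer setting. The sketch below is not configure.py code; `sanitizer_flags` and its parameter are hypothetical.

```python
def sanitizer_flags(asan_enabled):
    """Return sanitizer-related compiler flags for a hypothetical build mode."""
    flags = []
    if asan_enabled:
        flags.append("-fsanitize=address")
        # Only meaningful together with ASan; without it clang reports
        # the option as an unused command-line argument.
        flags.append("-fno-sanitize-address-use-after-scope")
    return flags
```

For a build mode without ASan the function returns an empty list, so the unused-argument diagnostic has nothing to trigger on.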
Botond Dénes
ca13ff10ea service/storage_service: get_natural_endpoints(): add tablets support
Also add a unit test for this API endpoint, testing it with both tablets
and vnodes.
2024-02-15 02:07:18 -05:00
Botond Dénes
7f17d3bb0e replica/database: keyspace: add uses_tablets()
Mirroring table::uses_tablets(), provides a convenient and -- more
importantly -- easily discoverable way to determine whether the keyspace
uses tablets or not.
This information is of course already available via the abstract
replication strategy, but as seen in a few examples, this is not easily
discoverable and sometimes people resorted to enumerating the keyspace's
tables to be able to invoke table::uses_tablets().
2024-02-15 01:51:26 -05:00