Commit Graph

41479 Commits

Author SHA1 Message Date
Benny Halevy
08b0426318 scripts/open-coredump.sh: calculate MAIN_BRANCH before cloning repo
We need MAIN_BRANCH calculated earlier so we can use it
to checkout the right branch when cloning the src repo
(either `master` or `enterprise`, based on the detected `PRODUCT`)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17647
2024-03-06 09:46:30 +02:00
Avi Kivity
c32a4c8d5c build: docker: clean up after docker build
The `buildah commit` command doesn't remove the working container. These
accumulate in ~/.local/container/storage until something bad happens.

Fix by adding the `--rm` flag to remove the container and volume.

Closes scylladb/scylladb#17546
2024-03-06 09:41:36 +02:00
Kamil Braun
0a7854ea4d Merge 'test: test_topology_ops: fix flakiness and reenable bg writes' from Patryk Jędrzejczak
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.

Fixes scylladb/scylladb#15924
Fixes scylladb/scylladb#15962

Closes scylladb/scylladb#17585

* github.com:scylladb/scylladb:
  test: test_topology_ops: reenable background writes without tablets
  test: test_topology_ops: run with and without tablets
  test: topology: decrease the server's request timeouts
2024-03-04 20:57:24 +01:00
Patryk Jędrzejczak
f1d9248df9 test: wait for CDC generations publishing before checking CDC-topology consistency
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.

The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.

For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.

To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.

Fixes scylladb/scylladb#17587
Fixes scylladb/scylladb#17600
Fixes scylladb/scylladb#17621

Closes scylladb/scylladb#17622
2024-03-04 19:28:51 +02:00
Kamil Braun
ec1f574b3a test/pylib: util: silence exception from refresh_nodes
Driver's `refresh_nodes` function may throw an exception if we call it
in the middle of driver reconnecting. Silence it.

Fixes scylladb/scylladb#17616

Closes scylladb/scylladb#17620
2024-03-04 17:50:16 +02:00
Avi Kivity
e3de30f943 tools: toolchain: update frozen toolchain for python driver 3.26.7
Fixes scylladb/scylladb#16709
Fixes scylladb/scylladb#17353

Closes scylladb/scylladb#17604
2024-03-03 16:36:14 +02:00
Gleb Natapov
94cd235888 topology_coordinator: drop group0 guard while changing raft configuration
Changing config under the guard can cause a deadlock.

The guard holds _read_apply_mutex. The same lock is held by the group0
apply() function. It means that no entry can be applied while the guard
is held and raft apply fiber may be even sleeping waiting for this lock
to be release. Configuration change OTOH waits for the config change
command to be committed before returning, but the way raft is implement
is that commit notifications are triggered from apply fiber which may
be stuck. Deadlock.

Drop and re-take guard around configuration changes.

Fixes scylladb/scylladb#17186
2024-03-01 11:20:15 +01:00
Patryk Jędrzejczak
e7d4e080e9 test: test_topology_ops: reenable background writes without tablets
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.

We also start background writes a bit later after adding all nodes.
Without this change and with tablets, the test fails with:
```
>       await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E       cassandra.protocol.ConfigurationException: <Error from server: code=2300
        [Query invalid because of configuration issue] message="Datacenter
        datacenter1 doesn't have enough nodes for replication_factor=3">
```

The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.

Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave FIXME.

Fixes scylladb/scylladb#15962
2024-02-29 18:37:41 +01:00
Patryk Jędrzejczak
90317c5ceb test: test_topology_ops: run with and without tablets
`test_topology_ops` is a valuable test that has uncovered many bugs.
It's worth running it with and without tablets.
2024-02-29 18:37:41 +01:00
Patryk Jędrzejczak
9dfb26428b test: topology: decrease the server's request timeouts
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.

We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.

Fixes scylladb/scylladb#15924
2024-02-29 18:37:38 +01:00
Gleb Natapov
4ef57096bc topology coordinator: fix use after free after streaming failure
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.

Fixes #17577

Message-ID: <Zd9keSwiIC4v_EiF@scylladb.com>
2024-02-29 18:27:12 +02:00
Kamil Braun
57b14580f0 Merge 'move migration_request handling to shard0' from Gleb
The RPC is used by group0 now which is available only on shard0

Fixes scylladb/scylladb#17565

* 'gleb/migration-request-shard0' of github.com:scylladb/scylla-dev:
  raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
  migration_manager: fix indentation after the previous patch.
  messaging_service: process migration_request rpc on shard 0
2024-02-29 15:13:16 +01:00
Gleb Natapov
9847e272f9 raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
group0 operations a valid on shard 0 only. Assert that.
2024-02-29 12:39:48 +02:00
Gleb Natapov
77907b97f1 migration_manager: fix indentation after the previous patch. 2024-02-29 12:39:48 +02:00
Gleb Natapov
4a3c79625f messaging_service: process migration_request rpc on shard 0
Commit 0c376043eb added access to group0
semaphore which can be done on shard0 only. Unlike all other group0 rpcs
(that already always forwarded to shard0) migration_request does not
since it is an rpc that what reused from non raft days. The patch adds
the missing jump to shard0 before executing the rpc.
2024-02-29 12:39:48 +02:00
Petr Gusev
6afa80a443 sync_raft_topology_nodes: do no emit REMOVED_NODE on IP change
Calling notify_left for old ip on topology change in raft mode
was a regression. In gossiper mode it didn't occur. In gossiper
mode the function handle_state_normal was responsible for spotting
IP addresses that weren't managing any parts of the data, and
it would then initiate their removal by calling remove_endpoint.
This removal process did not include calling notify_left.
Actually, notify_left was only supposed to be called (via excise) by
a 'real' removal procedures - removenode and decommission.

The redundant notify_left caused troubles in scylla python driver.
The driver could receive REMOVED_NODE and NEW_NODE notifications
in the same time and their handling routines could race with each other.

In this commit we fix the problem by not calling notify_left if
the remove_ip lambda was called from the ip change code path.
Also, we add a test which verifies that the driver log doesn't
mention the REMOVED_NODE notification.

Fixes scylladb/scylladb#17444

Closes scylladb/scylladb#17561
2024-02-29 10:18:20 +01:00
Avi Kivity
616eec2214 Merge ' test/topology_custom: test_read_repair.py: reduce run-time ' from Botond Dénes
This test needed a lot of data to ensure multiple pages when doing the read repair. This change two key configuration items, allowing for a drastic reduction of the data size and consequently a large reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,  reducing this to a too small value would start killing internal queries. Now, after said commit, this is no longer a concern, as this limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.

The latter configuration is a new one, added by the first patches of this series. It allows configuring the page-size in bytes, after which pages are cut. Previously this was a hard-coded constant: 1MB. This forced any tests which wanted to check paging, with pages cut on size, to work with large datasets. This was especially pronounced in the tests fixed in this PR, because this test works with tombstones which are tiny and a lot of them were needed to trigger paging based on the size.

With this two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8

The runtime of the test consequently drops from 62 seconds to 13.5 seconds (dev mode, on my build machine).

Fixes: https://github.com/scylladb/scylladb/issues/15425
Fixes: https://github.com/scylladb/scylladb/issues/16899

Closes scylladb/scylladb#17529

* github.com:scylladb/scylladb:
  test/topology_custom: test_read_repair.py: reduce run-time
  replica/database: get_query_max_result_size(): use query_page_size_in_bytes
  replica/database: use include page-size in max-result-size
  query-request: max_result_size: add without_page_limit()
  db/config: introduce query_page_size_in_bytes
2024-02-27 18:54:38 +02:00
Nadav Har'El
fc861742d7 cql: avoid undefined behavior in totimestamp() of extreme dates
This patch fixes a UBSAN-reported integer overflow during one of our
existing tests,

   test_native_functions.py::test_mintimeuuid_extreme_from_totimestamp

when attempting to convert an extreme "date" value, millions of years
in the past, into a "timestamp" value. When UBSAN crashing is enabled,
this test crashes before this patch, and succeeds after this patch.

The "date" CQL type is 32-bit count of *days* from the epoch, which can
span 2^31 days (5 million years) before or after the epoch. Meanwhile,
the "timestamp" type measures the number of milliseconds from the same
epoch, in 64 bits. Luckily (or intentionally), every "date", however
extreme, can be converted into a "timestamp": This is because 2^31 days
is 1.85e17 milliseconds, well below timestamp's limit of 2^63 milliseconds
(9.2e18).

But it turns out that our conversion function, date_to_time_point(),
used some boost::gregorian library code, which carried out these
calculations in **microsecond** resolution. The extra conversion to
microseconds wasn't just wasteful, it also caused an integer overflow
in the extreme case: 2^31 days is 1.85e20 microseconds, which does NOT
fit in a 64-bit integer. UBSAN notices this overflow, and complains
(plus, the conversion is incorrect).

The fix is to do the trivial conversion on our own (a day is, by
convention, exactly 86400 seconds - no fancy library is needed),
without the grace of Boost. The result is simpler, faster, correct
for the Pliocene-age dates, and fixes the UBSAN crash in the test.

Fixes #17516

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17527
2024-02-27 17:04:18 +02:00
Gleb Natapov
0c376043eb migration_manager: take group0 lock during raft snapshot taking
Group0 state machine access atomicity is guaranteed by a mutex in group0
client. A code that reads or writes the state needs to hold the log. To
transfer schema part of the snapshot we used existing "migration
request" verb which did not follow the rule. Fix the code to take group0
lock before accessing schema in case the verb is called as part of
group0 snapshot transfer.

Fixes scylladb/scylladb#16821
2024-02-27 11:15:17 +01:00
Botond Dénes
5dc145a93f test/topology_custom: test_read_repair.py: reduce run-time
This test needed a lot of data to ensure multiple pages when doing the
read repair. This change two key configuration items, allowing
for a drastic reduction of the data size and consequently a large
reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,
  reducing this to a too small value would start killing internal
  queries. Now, after said commit, this is no longer a concern, as this
  limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.

With this two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8

The runtime of the test consequently drops from 62 seconds to 13.5
seconds (dev mode, on my build machine).
2024-02-27 02:27:55 -05:00
Botond Dénes
7f3ca3a3d8 replica/database: get_query_max_result_size(): use query_page_size_in_bytes
As the page size for user queries, instead of the hard-coded constant
used before. For system queries, we keep using the previous constant.
2024-02-27 02:27:55 -05:00
Botond Dénes
8213e66815 replica/database: use include page-size in max-result-size
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
  which now has a much simpler common return path and has to drop the
  page size on one rare return path.

This is a purely mechanical change, no behaviour is changed.
2024-02-27 02:27:55 -05:00
Botond Dénes
97615e0d9a query-request: max_result_size: add without_page_limit()
Returns an instance with the page_limit reset to 0. This converts a
max_results_size which is usable only with the
"page_size_and_safety_limit" feature, to one which can be used before
this feature.
To be used in the next patch.
2024-02-27 02:14:46 -05:00
Botond Dénes
5e37c1465f db/config: introduce query_page_size_in_bytes
Regulates the page size in bytes via config, instead of the currently
used hard-coded constant. Allows tests to configure lower limits so they
can work with smaller data-sets when testing paging related
functionality.
Not wired yet.
2024-02-27 02:14:45 -05:00
Kamil Braun
fd32e2ee10 Merge 'misc_services: fix data race from bad usage of get_next_version' from Piotr Dulikowski
The function `gms::version_generator::get_next_version()` can only be called from shard 0 as it uses a global, unsynchronized counter to issue versions. Notably, the function is used as a default argument for the constructor of `gms::versioned_value` which is used from shorthand constructors such as `versioned_value::cache_hitrates`, `versioned_value::schema` etc.

The `cache_hitrate_calculator` service runs a periodic job which updates the `CACHE_HITRATES` application state in the local gossiper state. Each time the job is scheduled, it runs on the next shard (it goes through shards in a round-robin fashion). The job uses the `versioned_value::cache_hitrates` shorthand to create a `versioned_value`, therefore risking a data race if it is not currently executing on shard 0.

The PR fixes the race by moving the call to `versioned_value::cache_hitrates` to shard 0. Additionally, in order to help detect similar issues in the future, a check is introduced to `get_next_version` which aborts the process if the function was called on other shard than 0.

There is a possibility that it is a fix for #17493. Because `get_next_version` uses a simple incrementation to advance the global counter, a data race can occur if two shards call it concurrently and it may result in shard 0 returning the same or smaller value when called two times in a row. The following sequence of events is suspected to occur on node A:

1. Shard 1 calls `get_next_version()`, loads version `v - 1` from the global counter and stores in a register; the thread then is preempted,
2. Shard 0 executes `add_local_application_state()` which internally calls `get_next_version()`, loads `v - 1` then stores `v` and uses version `v` to update the application state,
3. Shard 0 executes `add_local_application_state()` again, increments version to `v + 1` and uses it to update the application state,
4. Gossip message handler runs, exchanging application states with node B. It sends its application state to B. Note that the max version of any of the local application states is `v + 1`,
5. Shard 1 resumes and stores version `v` in the global counter,
6. Shard 0 executes `add_local_application_state()` and updates the application state - again - with version `v + 1`.
7. After that, node B will never learn about the application state introduced in point 6. as gossip exchange only sends endpoint states with version larger than the previous observed max version, which was `v + 1` in point 4.

Note that the above scenario was _not_ reproduced. However, I managed to observe a race condition by:

1. modifying Scylla to run update of `CACHE_HITRATES` much more frequently than usual,
2. putting an assertion in `add_local_application_state` which fails if the version returned by `get_next_version` was not larger than the previous returned value,
3. running a test which performs schema changes in a loop.

The assertion from the second point was triggered. While it's hard to tell how likely it is to occur without making updates of cache hitrates more frequent - not to mention the full theorized scenario - for now this is the best lead that we have, and the data race being fixed here is a real bug anyway.

Refs: #17493

Closes scylladb/scylladb#17499

* github.com:scylladb/scylladb:
  version_generator: check that get_next_version is called on shard 0
  misc_services: fix data race from bad usage of get_next_version
2024-02-25 19:35:34 +01:00
Gleb Natapov
59df47920b topology coordinator: fix use after free in rollback_to_normal state
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.

Fixes scylladb/scylladb#17402
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6993/

Message-ID: <ZdtJNJM056r4EZzz@scylladb.com>
2024-02-25 16:34:19 +02:00
Raphael S. Carvalho
f07c233ad5 Fix potential data resurrection when another compaction type does cleanup work
Since commit f1bbf70, many compaction types can do cleanup work, but turns out
we forgot to invalidate cache on their completion.

So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.

Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.

Scylla 5.4 is affected.

Fixes #17501.
Fixes #17452.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17502
2024-02-25 13:08:04 +02:00
Nadav Har'El
b4cef638ef Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* canonical_mutation
* atomic_cell_view
* atomic_cell
* atomic_cell_or_collection::printer

Refs #13245

Closes scylladb/scylladb#17506

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for canonical_mutation
  mutation: add fmt::formatter for atomic_cell_view and atomic_cell
  mutation: add fmt::formatter for atomic_cell_or_collection::printer
2024-02-25 09:48:56 +02:00
Kefu Chai
84ba624415 mutation: add fmt::formatter for canonical_mutation
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for canonical_mutation

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:48:13 +08:00
Kefu Chai
3625796222 mutation: add fmt::formatter for atomic_cell_view and atomic_cell
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* atomic_cell_view
* atomic_cell

and drop their operator<<:s.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:19:11 +08:00
Kefu Chai
b4fa32ec17 mutation: add fmt::formatter for atomic_cell_or_collection::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`atomic_cell_or_collection::printer`, and drop its operator<<.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:18:41 +08:00
Piotr Dulikowski
54546e1530 version_generator: check that get_next_version is called on shard 0
The get_next_version function can only be safely called from shard 0,
but this constraint is not enforced in any way. As evidenced in the
previous commit, it is easy to accidentally call it from a non-zero
shard.

Introduce a runtime check to get_next_version which calls
on_fatal_internal_error if it detects that the function was called form
the wrong shard. This will let us detect cross-shard use issues in
runtime.
2024-02-23 13:49:49 +01:00
Piotr Dulikowski
21d5d4e15c misc_services: fix data race from bad usage of get_next_version
The function `gms::version_generator::get_next_version()` can only be
called from shard 0 as it uses a global, unsynchronized counter to
issue versions. Notably, the function is used as a default argument for
the constructor of `gms::versioned_value` which is used from shorthand
constructors such as `versioned_value::cache_hitrates`,
`versioned_value::schema` etc.

The `cache_hitrate_calculator` service runs a periodic job which
updates the `CACHE_HITRATES` application state in the local gossiper
state. Each time the job is scheduled, it runs on the next shard (it
goes through shards in a round-robin fashion). The job uses the
`versioned_value::cache_hitrates` shorthand to create a
`versioned_value`, therefore risking a data race if it is not currently
executing on shard 0.

Fix the race by constructing the versioned value on shard 0.
2024-02-23 12:54:32 +01:00
Kefu Chai
496cf9a1d8 interval: add fmt::formatters for managed_bytes and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* wrapping_interval
* interval

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17488
2024-02-23 10:26:30 +02:00
Nadav Har'El
0aaa6b1a08 fmt: add formatter for mutation_fragment_v2::kind
Unfortunately, fmt v10 dropped support for operator<< formatters,
forcing us to replace the huge number of operator<< implementations
in our code by uglier and templated fmt::formatter implementations
to get Scylla to compile on modern distros (such as Fedora 39) :-(

Kefu has already started doing this migration, here is my small
contribution - the formatter for mutation_fragment_v2::kind.
This patch is need to compile, for example,
build/dev/mutation/mutation_fragment_stream_validator.o.

I can't remove the old operator<< because it's still used by
the implementation of other operator<< functions. We can remove
all of them when we're done with this coversion. In the meantime,
I replaced the original implementation of operator<< by a trivial
implementation just passing the work to the new fmt::print support.

Refs #13245

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17432
2024-02-23 10:25:39 +02:00
Botond Dénes
c1267900c6 Merge 'sstables: add fmt::formatter for sstable types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* bound_kind_m
* sstable_state
* indexable_element
* deletion_time

drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17490

* github.com:scylladb/scylladb:
  sstables: add fmt::formatter for deletion_time
  sstable: add fmt::formatter for indexable_element
  sstables: add fmt::foramtter for sstable_state
  sstables: add fmt::formatter for sstables::bound_kind_m
2024-02-23 10:09:26 +02:00
Botond Dénes
89efa89dd7 Merge 'test: add fmt::formatters' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for some types used in testing.

Refs #13245

Closes scylladb/scylladb#17485

* github.com:scylladb/scylladb:
  test/unit: add fmt::formatter for tree_test_key_base
  test: add printer for type for BOOST_REQUIRE_EQUAL
  test: add fmt::formatters
  test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
2024-02-23 09:32:39 +02:00
Botond Dénes
1f363a876e Merge 'utils: add fmt::formatter for occupancy_stats, managed_bytes and friends ' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* managed_bytes
* managed_bytes_view
* managed_bytes_opt
* occupancy_stats

and drop their operator<<:s

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17462

* github.com:scylladb/scylladb:
  utils/managed_bytes: add fmt::formatters for managed_bytes and friends
  utils/logalloc: add fmt::formatter for occupancy_stats
2024-02-23 09:31:22 +02:00
Botond Dénes
d314ad2725 Merge 'sstables: close index_reader in has_partition_key' from Aleksandra Martyniuk
If index_reader isn't closed before it is destroyed, then ongoing
sstables reads won't be awaited and assertion will be triggered.

Close index_reader in has_partition_key before destroying it.

Fixes: #17232.

Closes scylladb/scylladb#17355

* github.com:scylladb/scylladb:
  test: add test to check if reader is closed
  sstables: close index_reader in has_partition_key
2024-02-23 09:27:55 +02:00
Kefu Chai
010fb5f323 tools/scylla-nodetool: make keyspace argument optional for "ring"
the "keyspace" argument of the "ring" command is optional. but before
this change, we considered it a mandatory option. it was wrong.

so, in this change, we make it optional, and print out the warning
message if the keyspace is not specified.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17472
2024-02-23 09:25:29 +02:00
Kefu Chai
6800810dba interval, multishard_mutation_query: fix typos in comments
these misspellings were identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17491
2024-02-23 09:06:24 +02:00
Botond Dénes
a08d9ba2a4 Merge 'tools/scylla-nodetool: fixes to address test failures with dtest' from Kefu Chai
* tighten the param check for toppartitions
* add an extra empty line inbetween reports

Closes scylladb/scylladb#17486

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add an extra empty line inbetween reports
  tools/scylla-nodetool: tighten the param check for toppartitions
2024-02-23 09:05:30 +02:00
Botond Dénes
959d33ba39 Merge 'repair: streaming: handle no_such_column_family from remote node' from Aleksandra Martyniuk
RPC calls lose information about the type of returned exception.
Thus, if a table is dropped on receiver node, but it still exists
on a sender node and sender node streams the table's data, then
the whole operation fails.

To prevent that, add a method which synchronizes schema and then
checks, if the exception was caused by table drop. If so,
the exception is swallowed.

Use the method in streaming and repair to continue them when
the table is dropped in the meantime.

Fixes: #17028.
Fixes: #15370.
Fixes: #15598.

Closes scylladb/scylladb#17231

* github.com:scylladb/scylladb:
  repair: handle no_such_column_family from remote node gracefully
  test: test drop table on receiver side during streaming
  streaming: fix indentation
  streaming: handle no_such_column_family from remote node gracefully
  repair: add methods to skip dropped table
2024-02-23 08:25:45 +02:00
Kefu Chai
3574c22d73 test/nodetool/utils: print out unmatched output on test failure
would be more helpful if the matched could print out the unmatched
output on test failure. so, in this change, both stdout and stderr
are printed if they fail to match with the expected error.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17489
2024-02-23 08:20:30 +02:00
Botond Dénes
234aa99aaa Merge 'tools/scylla-nodetool: extract and use {yaml,json}_writers' from Kefu Chai
simpler this way.

Closes scylladb/scylladb#17437

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: use {yaml,json}_writers in compactionhistory_operation
  tools/scylla-nodetool: add {json,yaml}_writer
2024-02-23 08:13:07 +02:00
Kefu Chai
3a3f0d392f gms/versioned_value: impl operator<<(.., const gms::versioned_value) using fmt
less repeatings this way. this is also a follow-up change of
cb781c0ff7.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17390
2024-02-23 08:11:03 +02:00
Kefu Chai
62abf89312 sstables: add fmt::formatter for deletion_time
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::deletion_time`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:56:32 +08:00
Kefu Chai
a5a757387a sstable: add fmt::formatter for indexable_element
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::indexable_element`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:56:28 +08:00
Kefu Chai
5754b9eb08 sstables: add fmt::foramtter for sstable_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::sstable_state`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:55:49 +08:00
Kefu Chai
9a32029a8f sstables: add fmt::formatter for sstables::bound_kind_m
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::bound_kind_m`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:55:22 +08:00