We need MAIN_BRANCH calculated earlier so we can use it
to checkout the right branch when cloning the src repo
(either `master` or `enterprise`, based on the detected `PRODUCT`)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#17647
The `buildah commit` command doesn't remove the working container. These
accumulate in ~/.local/container/storage until something bad happens.
Fix by adding the `--rm` flag to remove the container and volume.
Closesscylladb/scylladb#17546
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.
Fixesscylladb/scylladb#15924Fixesscylladb/scylladb#15962Closesscylladb/scylladb#17585
* github.com:scylladb/scylladb:
test: test_topology_ops: reenable background writes without tablets
test: test_topology_ops: run with and without tablets
test: topology: decrease the server's request timeouts
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.
The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.
For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.
To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.
Fixesscylladb/scylladb#17587Fixesscylladb/scylladb#17600Fixesscylladb/scylladb#17621Closesscylladb/scylladb#17622
Changing config under the guard can cause a deadlock.
The guard holds _read_apply_mutex. The same lock is held by the group0
apply() function. It means that no entry can be applied while the guard
is held and raft apply fiber may be even sleeping waiting for this lock
to be release. Configuration change OTOH waits for the config change
command to be committed before returning, but the way raft is implement
is that commit notifications are triggered from apply fiber which may
be stuck. Deadlock.
Drop and re-take guard around configuration changes.
Fixesscylladb/scylladb#17186
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.
We also start background writes a bit later after adding all nodes.
Without this change and with tablets, the test fails with:
```
> await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E cassandra.protocol.ConfigurationException: <Error from server: code=2300
[Query invalid because of configuration issue] message="Datacenter
datacenter1 doesn't have enough nodes for replication_factor=3">
```
The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.
Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave FIXME.
Fixesscylladb/scylladb#15962
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.
We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.
Fixesscylladb/scylladb#15924
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.
Fixes#17577
Message-ID: <Zd9keSwiIC4v_EiF@scylladb.com>
The RPC is used by group0 now which is available only on shard0
Fixesscylladb/scylladb#17565
* 'gleb/migration-request-shard0' of github.com:scylladb/scylla-dev:
raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
migration_manager: fix indentation after the previous patch.
messaging_service: process migration_request rpc on shard 0
Commit 0c376043eb added access to group0
semaphore which can be done on shard0 only. Unlike all other group0 rpcs
(that already always forwarded to shard0) migration_request does not
since it is an rpc that what reused from non raft days. The patch adds
the missing jump to shard0 before executing the rpc.
Calling notify_left for old ip on topology change in raft mode
was a regression. In gossiper mode it didn't occur. In gossiper
mode the function handle_state_normal was responsible for spotting
IP addresses that weren't managing any parts of the data, and
it would then initiate their removal by calling remove_endpoint.
This removal process did not include calling notify_left.
Actually, notify_left was only supposed to be called (via excise) by
a 'real' removal procedures - removenode and decommission.
The redundant notify_left caused troubles in scylla python driver.
The driver could receive REMOVED_NODE and NEW_NODE notifications
in the same time and their handling routines could race with each other.
In this commit we fix the problem by not calling notify_left if
the remove_ip lambda was called from the ip change code path.
Also, we add a test which verifies that the driver log doesn't
mention the REMOVED_NODE notification.
Fixesscylladb/scylladb#17444Closesscylladb/scylladb#17561
This test needed a lot of data to ensure multiple pages when doing the read repair. This change two key configuration items, allowing for a drastic reduction of the data size and consequently a large reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa, reducing this to a too small value would start killing internal queries. Now, after said commit, this is no longer a concern, as this limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.
The latter configuration is a new one, added by the first patches of this series. It allows configuring the page-size in bytes, after which pages are cut. Previously this was a hard-coded constant: 1MB. This forced any tests which wanted to check paging, with pages cut on size, to work with large datasets. This was especially pronounced in the tests fixed in this PR, because this test works with tombstones which are tiny and a lot of them were needed to trigger paging based on the size.
With this two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8
The runtime of the test consequently drops from 62 seconds to 13.5 seconds (dev mode, on my build machine).
Fixes: https://github.com/scylladb/scylladb/issues/15425
Fixes: https://github.com/scylladb/scylladb/issues/16899Closesscylladb/scylladb#17529
* github.com:scylladb/scylladb:
test/topology_custom: test_read_repair.py: reduce run-time
replica/database: get_query_max_result_size(): use query_page_size_in_bytes
replica/database: use include page-size in max-result-size
query-request: max_result_size: add without_page_limit()
db/config: introduce query_page_size_in_bytes
This patch fixes a UBSAN-reported integer overflow during one of our
existing tests,
test_native_functions.py::test_mintimeuuid_extreme_from_totimestamp
when attempting to convert an extreme "date" value, millions of years
in the past, into a "timestamp" value. When UBSAN crashing is enabled,
this test crashes before this patch, and succeeds after this patch.
The "date" CQL type is 32-bit count of *days* from the epoch, which can
span 2^31 days (5 million years) before or after the epoch. Meanwhile,
the "timestamp" type measures the number of milliseconds from the same
epoch, in 64 bits. Luckily (or intentionally), every "date", however
extreme, can be converted into a "timestamp": This is because 2^31 days
is 1.85e17 milliseconds, well below timestamp's limit of 2^63 milliseconds
(9.2e18).
But it turns out that our conversion function, date_to_time_point(),
used some boost::gregorian library code, which carried out these
calculations in **microsecond** resolution. The extra conversion to
microseconds wasn't just wasteful, it also caused an integer overflow
in the extreme case: 2^31 days is 1.85e20 microseconds, which does NOT
fit in a 64-bit integer. UBSAN notices this overflow, and complains
(plus, the conversion is incorrect).
The fix is to do the trivial conversion on our own (a day is, by
convention, exactly 86400 seconds - no fancy library is needed),
without the grace of Boost. The result is simpler, faster, correct
for the Pliocene-age dates, and fixes the UBSAN crash in the test.
Fixes#17516
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17527
Group0 state machine access atomicity is guaranteed by a mutex in group0
client. A code that reads or writes the state needs to hold the log. To
transfer schema part of the snapshot we used existing "migration
request" verb which did not follow the rule. Fix the code to take group0
lock before accessing schema in case the verb is called as part of
group0 snapshot transfer.
Fixesscylladb/scylladb#16821
This test needed a lot of data to ensure multiple pages when doing the
read repair. This change two key configuration items, allowing
for a drastic reduction of the data size and consequently a large
reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,
reducing this to a too small value would start killing internal
queries. Now, after said commit, this is no longer a concern, as this
limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.
With this two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8
The runtime of the test consequently drops from 62 seconds to 13.5
seconds (dev mode, on my build machine).
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
which now has a much simpler common return path and has to drop the
page size on one rare return path.
This is a purely mechanical change, no behaviour is changed.
Returns an instance with the page_limit reset to 0. This converts a
max_results_size which is usable only with the
"page_size_and_safety_limit" feature, to one which can be used before
this feature.
To be used in the next patch.
Regulates the page size in bytes via config, instead of the currently
used hard-coded constant. Allows tests to configure lower limits so they
can work with smaller data-sets when testing paging related
functionality.
Not wired yet.
The function `gms::version_generator::get_next_version()` can only be called from shard 0 as it uses a global, unsynchronized counter to issue versions. Notably, the function is used as a default argument for the constructor of `gms::versioned_value` which is used from shorthand constructors such as `versioned_value::cache_hitrates`, `versioned_value::schema` etc.
The `cache_hitrate_calculator` service runs a periodic job which updates the `CACHE_HITRATES` application state in the local gossiper state. Each time the job is scheduled, it runs on the next shard (it goes through shards in a round-robin fashion). The job uses the `versioned_value::cache_hitrates` shorthand to create a `versioned_value`, therefore risking a data race if it is not currently executing on shard 0.
The PR fixes the race by moving the call to `versioned_value::cache_hitrates` to shard 0. Additionally, in order to help detect similar issues in the future, a check is introduced to `get_next_version` which aborts the process if the function was called on other shard than 0.
There is a possibility that it is a fix for #17493. Because `get_next_version` uses a simple incrementation to advance the global counter, a data race can occur if two shards call it concurrently and it may result in shard 0 returning the same or smaller value when called two times in a row. The following sequence of events is suspected to occur on node A:
1. Shard 1 calls `get_next_version()`, loads version `v - 1` from the global counter and stores in a register; the thread then is preempted,
2. Shard 0 executes `add_local_application_state()` which internally calls `get_next_version()`, loads `v - 1` then stores `v` and uses version `v` to update the application state,
3. Shard 0 executes `add_local_application_state()` again, increments version to `v + 1` and uses it to update the application state,
4. Gossip message handler runs, exchanging application states with node B. It sends its application state to B. Note that the max version of any of the local application states is `v + 1`,
5. Shard 1 resumes and stores version `v` in the global counter,
6. Shard 0 executes `add_local_application_state()` and updates the application state - again - with version `v + 1`.
7. After that, node B will never learn about the application state introduced in point 6. as gossip exchange only sends endpoint states with version larger than the previous observed max version, which was `v + 1` in point 4.
Note that the above scenario was _not_ reproduced. However, I managed to observe a race condition by:
1. modifying Scylla to run update of `CACHE_HITRATES` much more frequently than usual,
2. putting an assertion in `add_local_application_state` which fails if the version returned by `get_next_version` was not larger than the previous returned value,
3. running a test which performs schema changes in a loop.
The assertion from the second point was triggered. While it's hard to tell how likely it is to occur without making updates of cache hitrates more frequent - not to mention the full theorized scenario - for now this is the best lead that we have, and the data race being fixed here is a real bug anyway.
Refs: #17493Closesscylladb/scylladb#17499
* github.com:scylladb/scylladb:
version_generator: check that get_next_version is called on shard 0
misc_services: fix data race from bad usage of get_next_version
Since commit f1bbf70, many compaction types can do cleanup work, but turns out
we forgot to invalidate cache on their completion.
So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.
Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.
Scylla 5.4 is affected.
Fixes#17501.
Fixes#17452.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#17502
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* canonical_mutation
* atomic_cell_view
* atomic_cell
* atomic_cell_or_collection::printer
Refs #13245Closesscylladb/scylladb#17506
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for canonical_mutation
mutation: add fmt::formatter for atomic_cell_view and atomic_cell
mutation: add fmt::formatter for atomic_cell_or_collection::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for canonical_mutation
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* atomic_cell_view
* atomic_cell
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`atomic_cell_or_collection::printer`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The get_next_version function can only be safely called from shard 0,
but this constraint is not enforced in any way. As evidenced in the
previous commit, it is easy to accidentally call it from a non-zero
shard.
Introduce a runtime check to get_next_version which calls
on_fatal_internal_error if it detects that the function was called form
the wrong shard. This will let us detect cross-shard use issues in
runtime.
The function `gms::version_generator::get_next_version()` can only be
called from shard 0 as it uses a global, unsynchronized counter to
issue versions. Notably, the function is used as a default argument for
the constructor of `gms::versioned_value` which is used from shorthand
constructors such as `versioned_value::cache_hitrates`,
`versioned_value::schema` etc.
The `cache_hitrate_calculator` service runs a periodic job which
updates the `CACHE_HITRATES` application state in the local gossiper
state. Each time the job is scheduled, it runs on the next shard (it
goes through shards in a round-robin fashion). The job uses the
`versioned_value::cache_hitrates` shorthand to create a
`versioned_value`, therefore risking a data race if it is not currently
executing on shard 0.
Fix the race by constructing the versioned value on shard 0.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* wrapping_interval
* interval
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17488
Unfortunately, fmt v10 dropped support for operator<< formatters,
forcing us to replace the huge number of operator<< implementations
in our code by uglier and templated fmt::formatter implementations
to get Scylla to compile on modern distros (such as Fedora 39) :-(
Kefu has already started doing this migration, here is my small
contribution - the formatter for mutation_fragment_v2::kind.
This patch is need to compile, for example,
build/dev/mutation/mutation_fragment_stream_validator.o.
I can't remove the old operator<< because it's still used by
the implementation of other operator<< functions. We can remove
all of them when we're done with this coversion. In the meantime,
I replaced the original implementation of operator<< by a trivial
implementation just passing the work to the new fmt::print support.
Refs #13245
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17432
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* bound_kind_m
* sstable_state
* indexable_element
* deletion_time
drop their operator<<:s
Refs #13245Closesscylladb/scylladb#17490
* github.com:scylladb/scylladb:
sstables: add fmt::formatter for deletion_time
sstable: add fmt::formatter for indexable_element
sstables: add fmt::foramtter for sstable_state
sstables: add fmt::formatter for sstables::bound_kind_m
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for some types used in testing.
Refs #13245Closesscylladb/scylladb#17485
* github.com:scylladb/scylladb:
test/unit: add fmt::formatter for tree_test_key_base
test: add printer for type for BOOST_REQUIRE_EQUAL
test: add fmt::formatters
test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* managed_bytes
* managed_bytes_view
* managed_bytes_opt
* occupancy_stats
and drop their operator<<:s
Refs https://github.com/scylladb/scylladb/issues/13245Closesscylladb/scylladb#17462
* github.com:scylladb/scylladb:
utils/managed_bytes: add fmt::formatters for managed_bytes and friends
utils/logalloc: add fmt::formatter for occupancy_stats
If index_reader isn't closed before it is destroyed, then ongoing
sstables reads won't be awaited and assertion will be triggered.
Close index_reader in has_partition_key before destroying it.
Fixes: #17232.
Closesscylladb/scylladb#17355
* github.com:scylladb/scylladb:
test: add test to check if reader is closed
sstables: close index_reader in has_partition_key
the "keyspace" argument of the "ring" command is optional. but before
this change, we considered it a mandatory option. it was wrong.
so, in this change, we make it optional, and print out the warning
message if the keyspace is not specified.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17472
* tighten the param check for toppartitions
* add an extra empty line inbetween reports
Closesscylladb/scylladb#17486
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add an extra empty line inbetween reports
tools/scylla-nodetool: tighten the param check for toppartitions
RPC calls lose information about the type of returned exception.
Thus, if a table is dropped on receiver node, but it still exists
on a sender node and sender node streams the table's data, then
the whole operation fails.
To prevent that, add a method which synchronizes schema and then
checks, if the exception was caused by table drop. If so,
the exception is swallowed.
Use the method in streaming and repair to continue them when
the table is dropped in the meantime.
Fixes: #17028.
Fixes: #15370.
Fixes: #15598.
Closesscylladb/scylladb#17231
* github.com:scylladb/scylladb:
repair: handle no_such_column_family from remote node gracefully
test: test drop table on receiver side during streaming
streaming: fix indentation
streaming: handle no_such_column_family from remote node gracefully
repair: add methods to skip dropped table
would be more helpful if the matched could print out the unmatched
output on test failure. so, in this change, both stdout and stderr
are printed if they fail to match with the expected error.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17489
simpler this way.
Closesscylladb/scylladb#17437
* github.com:scylladb/scylladb:
tools/scylla-nodetool: use {yaml,json}_writers in compactionhistory_operation
tools/scylla-nodetool: add {json,yaml}_writer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::deletion_time`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::indexable_element`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::sstable_state`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::bound_kind_m`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>