Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.
In my test, to load 10M entries, it took around 60 seconds.
It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.
Fixes#17993Closesscylladb/scylladb#17994
* github.com:scylladb/scylladb:
repair: Load repair history in background
repair: Abort load_history process in shutdown
We currently support the sync-label only in OSS. Since Scylla-enterprise
get all the commits from OSS repo, the sync-label is running and failing
during checkout (since it's a private repo and should have different
configuration)
For now, let's limit the workflows for oss repo
Closesscylladb/scylladb#18142
Added support to track and limit the memory usage by sstable components. A reclaimable component of an SSTable is one from which memory can be reclaimed. SSTables and their managers now track such reclaimable memory and limit the component memory usage accordingly. A new configuration variable defines the memory reclaim threshold. If the total memory of the reclaimable components exceeds this limit, memory will be reclaimed to keep the usage under the limit. This PR considers only the bloom filters as reclaimable and adds support to track and limit them as required.
The feature can be manually verified by doing the following :
1. run a single-node single-shard 1GB cluster
2. create a table with bloom-filter-false-positive-chance of 0.001 (to intentionally cause large bloom filter)
3. populate with tiny partitions
4. watch the bloom filter metrics get capped at 100MB
The default value of the `components_memory_reclaim_threshold` config variable which controls the reclamation process is `.1`. This can also be reduced further during manual tests to easily hit the threshold and verify the feature.
Fixes#17747Closesscylladb/scylladb#17771
* github.com:scylladb/scylladb:
test_bloom_filter.py: disable reclaiming memory from components
sstable_datafile_test: add tests to verify auto reclamation of components
test/lib: allow overriding available memory via test_env_config
sstables_manager: support reclaiming memory from components
sstables_manager: store available memory size
sstables_manager: add variable to track component memory usage
db/config: add a new variable to limit memory used by table components
sstable_datafile_test: add testcase to verify reclamation from sstables
sstables: support reclaiming memory from components
This patch makes the get_description.py script easier to use by the
documentation automation:
1. The script is now a library.
2. You can choose the output of the script, currently supported pipee
and yml.
You can still call the from the command line, like before, but you can
also calls it from another python script.
For example the folowing python script would generate the documentation
for the metrics description of the ./alternator/ttl.cc file.
```
import get_description
metrics = get_description.get_metrics_from_file("./alternator/ttl.cc", "scylla", get_description.get_metrics_information("metrics-config.yml"))
get_description.write_metrics_to_file("out.yaml", metrics, "yml")
```
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closesscylladb/scylladb#18136
Coredumps coming from CI are produced by a commit, which is not
available in the scylla.git repository, as CI runs on a merge commit
between the main branch (master or enterprise) and the tested PR branch.
Currently the script will attempt to checkout this commit and will fail
as the commit hash is unrecognized.
To work around this, add a --ci flag, which when used, will force the
main branch to be checked out, instead of the commit hash.
Closesscylladb/scylladb#18023
This reverts commit 97b203b1af.
since Seastar provides the formatter, it's not necessary to vendor it in
scylladb anymore.
Refs #13245Closesscylladb/scylladb#18114
storage_group_id_for_token() was only needed from within
tablet_storage_group_manager, so we can kill
table::storage_group_id_for_token().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18134
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.
In my test, to load 10M entries, it took around 60 seconds.
It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.
Fixes#17993
Disabled reclaiming memory from sstable components in the testcase as it
interferes with the false positive calculation.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Reclaim memory from the SSTable that has the most reclaimable memory if
the total reclaimable memory has crossed the threshold. Only the bloom
filter memory is considered reclaimable for now.
Fixes#17747
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The available memory size is required to calculate the reclaim memory
threshold, so store that within the sstables manager.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
sstables_manager::_total_reclaimable_memory variable tracks the total
memory that is reclaimable from all the SSTables managed by it.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
A new configuration variable, components_memory_reclaim_threshold, has
been added to configure the maximum allowed percentage of available
memory for all SSTable components in a shard. If the total memory usage
exceeds this threshold, it will be reclaimed from the components to
bring it back under the limit. Currently, only the memory used by the
bloom filters will be restricted.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added support to track total memory from components that are reclaimable
and to reclaim memory from them if and when required. Right now only the
bloom filters are considered as reclaimable components but this can be
extended to any component in the future.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
if a pull request's cover letter is empty, `pr.body` is None. in that
case we should not try to pass it to `re.findall()` as the "string"
parameter. otherwise, we'd get
```
TypeError: expected string or bytes-like object, got 'NoneType'
```
so, in this change, we just return an empty list if the PR in question
has an empty cover letter.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18125
The cluster manager library doesn't set the asan/ubsan options
to abort on error and create core dumps; this makes debugging much
harder.
Fix by preparing the environment correctly.
Fixesscylladb/scylladb#17510Closesscylladb/scylladb#17511
what we need is but a script, so instead of checkout the whole repo,
with all history for all tags and branches, let's just checkout
a single file. faster this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18126
when adding a label to a PR request we keep getting the following error
message:
```
Traceback (most recent call last):
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 93, in <module>
main()
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 89, in main
sync_labels(repo, args.number, args.label, args.action, args.is_issue)
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 74, in sync_labels
target.add_to_labels(label)
File "/usr/lib/python3/dist-packages/github/Issue.py", line 321, in add_to_labels
headers, data = self._requester.requestJsonAndCheck(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
return self.__check(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 403 {"message": "Resource not accessible by integration", "documentation_url": "https://docs.github.com/rest/issues/labels#add-labels-to-an-issue"}
```
Based on
https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token.
The maximum access for pull requests from public forked repositories is
set to `read`
Switching to `pull_request_target` to solve it
Fixes: https://github.com/scylladb/scylladb/issues/18102Closesscylladb/scylladb#18052
The read_field is std::optional<View>. The raw_value::make_value()
accepts managed_bytes_opt which is std::optional<manager_bytes>.
Finally, there's std::optional<T>::optional(std::optional<U>&&)
move constructor (and its copy-constructor peer).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18128
The schema_builder::build() method creates a copy of raw schema
internaly in a hope that builder will be updated and be asked to build
the resulting schema again (e.g. alternator uses this).
However, there are places that build schema using temporary object once
in a `return schema_builder().with_...().build()` manner. For those
invocations copying raw schema is just waste of cycles.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18094
currently, our homebrew formatter formats `std::map` like
```
{{k1, v1}, {k2, v2}}
```
while {fmt} formats a map like:
```
{k1: v1, k2: v2}
```
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
```
{"max_threshold": "1"}
```
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.
Closesscylladb/scylladb#18058
* github.com:scylladb/scylladb:
test/cql-pytest: match error message formated using {fmt}
test/cql-pytest: extract scylla_error() for not allowed options test
before this change, `reclaim_timer::report()` calls
```c++
fmt::format(", at {}", current_backtrace())
```
which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.
in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.
Fixes#18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18100
Currently, the error message on a failed RAPIDJSON_ASSERT() is this:
rjson::error (JSON error: condition not met: false)
This is printed e.g. when the code processing a json expects an object
but the JSON has a different type. Or if a JSON object is missing an
expected member. This message however is completely inadequate for
determinig what went wrong. Change this to include a task-local
backtrace, like a real assert failure would. The new error looks like
this:
rjson::error (JSON assertion failed on condition '{}' at: libseastar.so+0x56dede 0x2bde95e 0x2cc18f3 0x2cf092d 0x2d2316b libseastar.so+0x46b623)
Closesscylladb/scylladb#18101
It's more applicable in this case.
Also, built tablets mutations are casted to canonical_mutations, but
when emplaced compiler can pick-up canonical_mutation(const mutation&)
constructor and the cast is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18090
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp by adding `ring_delay_ms` to it.
In this test, nodes are learning about new generations (introduced by upgrade procedure and then by node bootstrap) concurrently with doing writes that should go to these generations.
Because of `ring_delay_ms = 0', the generation could have been committed when it should have already been in use.
This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```
Creating writes during such a generation can result in assigning them a wrong generation or a failure. Failure may occur if it hits short time window when `generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but`_cdc_metadata.insert(...)` has not yet been executed. With a nonzero ring_delay_ms it's not a problem, because during this time window, the generation should not be in use.
Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```
Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes. Wait for the last generation to be in use and sleep one second to make sure there are writes to the CDC table in this generation.
Fixesscylladb/scylladb#17977
Reapply b4144d14c6.
Closesscylladb/scylladb#17998
* github.com:scylladb/scylladb:
test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero
Reapply "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
Setting data accessor implicitly depends on node joining the cluster
with raft leader elected as only then service level mutation is put
into scylla_local table. Calling it after join_cluster avoids starting
new cluster with older version only to immediately migrate it to the
latest one in the background.
Closesscylladb/scylladb#18040
* github.com:scylladb/scylladb:
main: reload service levels data accessor after join_cluster
service: qos: create separate function for reloading data accessor
currently, our homebrew formatter formats `std::map` like
{{k1, v1}, {k2, v2}}
while {fmt} formats a map like:
{k1: v1, k2: v2}
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
{"max_threshold": "1"}
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
currently, our homebrew formatter formats `std::map` like
{{k1, v1}, {k2, v2}}
while {fmt} formats a map like:
{k1: v1, k2: v2}
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
{"max_threshold": "1"}
as we are switching to the formatters provided by {fmt}, would be
better to support its convention directly.
so, in this change, to prepare the change, before migrating to
{fmt}, let's refactor the test to support both formats by
extracting a helper to format the error message, so that we can
change it to emit both formats.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp
by adding `ring_delay_ms` to it.
In this test, nodes are learning about new generations (introduced by upgrade
procedure and then by node bootstrap) concurrently with doing writes that
should go to these generations.
Because of `ring_delay_ms = 0', the generation could have been committed when
it should have already been in use.
This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```
Creating writes during such a generation can result in assigning them a wrong
generation or a failure. Failure may occur if it hits short time window when
`generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but`_cdc_metadata.insert(...)` has not yet
been executed. With a nonzero ring_delay_ms it's not a problem, because during
this time window, the generation should not be in use.
Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```
Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes.
Wait for the last generation to be in use and sleep one second to make sure
there are writes to the CDC table in this generation.
Fixes#17977
this series includes test related changes to enable us to drop `FMT_DEPRECATED_OSTREAM` deprecated in {fmt} v10.
Refs #13245Closesscylladb/scylladb#18054
* github.com:scylladb/scylladb:
test: unit: add fmt::formatter for test_data in tests
test/lib: do not print with fmt::to_string()
test/boost: print runtime_error using e.what()
* 'gleb/raft_snapshot_rpc-v3' of github.com:scylladb/scylla-dev:
raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC
Use correct limit for raft commands throughout the code.
When repairing multiple keyspaces, bail out on the first failed keyspace repair, instead of continuing and reporting all failures at the end. This is what Origin does as well.
To be able to test this, a bit of refactoring was needed, to be able to assert that `scylla-nodetool` doesn't make repair requests, beyond the expected ones.
Refs: https://github.com/scylladb/scylla-cluster-tests/issues/7226Closesscylladb/scylladb#17678
* github.com:scylladb/scylladb:
tools/scylla-nodetool: repair: abort on first failed repair
test/nodetool: nodetool(): add check_return_code param
test/nodetool: nodetool(): return res object instead of just stdout
test/nodetool: count unexpected requests
This series provides a reallocate_tablets function, that's initially called by allocate_tablets_for_new_table.
The new allocation implementation is independent of vnodes/token ownership.
Rather than using the natural_endpoints_tracker, it implements its own tracking
based on dc/rack load (== number of replicas in rack), with the additional benefit
that tablet allocation will balance the allocation across racks, using a heap structure,
similar to the one we use to balance tablet allocation across shards in each node.
reallocate_tablets may also be called with an optional parameter pointing the the current tablet_map.
In this case the function either allocates more tablet replicas in datacenters for which the replication factor was increased,
or it will deallocate tablet replicas from datacenters for which replication factor was decreased.
The NetworkTopologyStrategy_tablets_test unit test was extended to cover replication factor changes.
Closesscylladb/scylladb#17846
* github.com:scylladb/scylladb:
network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
network_topology_strategy: reallocate_tablets: support deallocation via rf change
network_topology_startegy_test: tablets_test: randomize cases
network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
network_topology_strategy_test: endpoints_check: strictly check rf for tablets
network_topology_strategy_test: full_ring_check for tablets: drop unused options param
GCC-14 rightly points out that the constructor of `atomic_cell_view`
is marked private, and cannot be called from its formatter:
```
/usr/bin/g++-14 -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/var/ssd/scylladb -I/var/ssd/scylladb/build/gen -I/var/ssd/scylladb/seastar/include -I/var/ssd/scylladb/build/seastar/gen/include -I/var/ssd/scylladb/build/seastar/gen/src -g -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unused-parameter -ffile-prefix-map=/var/ssd/scylladb=. -march=westmere -Wstack-usage=40960 -U_FORTIFY_SOURCE -Wno-maybe-uninitialized -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -MF mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o.d -o mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -c /var/ssd/scylladb/mutation/atomic_cell.cc
In file included from /var/ssd/scylladb/mutation/atomic_cell.cc:9:
/var/ssd/scylladb/mutation/atomic_cell.hh: In member function ‘auto fmt::v10::formatter<atomic_cell>::format(const atomic_cell&, fmt::v10::format_context&) const’:
/var/ssd/scylladb/mutation/atomic_cell.hh:413:67: error: ‘atomic_cell_view::atomic_cell_view(basic_atomic_cell_view<is_mutable>) [with mutable_view is_mutable = mutable_view::yes]’ is private within this context
413 | return fmt::format_to(ctx.out(), "{}", atomic_cell_view(ac));
| ^
/var/ssd/scylladb/mutation/atomic_cell.hh:275:5: note: declared private here
275 | atomic_cell_view(basic_atomic_cell_view<is_mutable> view)
| ^~~~~~~~~~~~~~~~
```
so, in this change, we make the formatter a friend of
`atomic_cell_view`.
since the operator<< was dropped, there is no need to keep its friend
declaration around, so it is dropped in this change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18081
our homebrew formatter for std::vector<string> formats like
```
{hello, world}
```
while {fmt}'s formatter for sequence-like container formats like
```
["hello", "world"]
```
since we are moving to {fmt} formatters. and in this context,
quoting the verbatim text makes more sense to user. let's
support the format used by {fmt} as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18057
according to {fmt}'s document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
```
// the range will contain "f} continued". The formatter should parse
// specifiers until '}' or the end of the range. In this example the
// formatter should parse the 'f' specifier and return an iterator
// pointing to '}'.
```
so we should check for _both_ '}' and end of the range. when building
scylla with {fmt} 10.2.1, it fails to build code like
```c++
fmt::format_to(out, "{}", fmt_hex(frag))
```
as {fmt}'s compile-time checker fails to parse this format string
along with given argument, as at compile time,
```c++
throw format_error("invalid group_size")
```
is executed.
so, in this change, we check both '}' and the end of range.
the change which introduced this formatter was
2f9dfba800
Refs 2f9dfba800
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18080
correctness when constructing range_streamer depends on compiler
evaluation order of params.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18079
Raft uses schema commitlog, so all its limits should be derived from
this commitlog segment size, but many places used regular commitlog size
to calculate the limits and did not do what they really suppose to be
doing.