scylladb/cmake/limit_jobs.cmake
Kefu Chai 6e1fb2c74e build: limit ThinLTO link parallelism to prevent OOM in release builds
When building Scylla with ThinLTO enabled (default with Clang), the linker
spawns threads equal to the number of CPU cores during linking. This high
parallelism can cause out-of-memory (OOM) issues in CI environments,
potentially freezing the build host or triggering the OOM killer.

In this change:

1. Rename `LINK_MEM_PER_JOB` to `Scylla_RAM_PER_LINK_JOB` and make it
   user-configurable
2. Add `Scylla_PARALLEL_LINK_JOBS` option to directly control concurrent
   link jobs (useful for hosts with large RAM)
3. Increase the default value of `Scylla_RAM_PER_LINK_JOB` to 16 GiB
   when LTO is enabled
4. Default to 2 parallel link jobs when LTO is enabled if the calculated
   number is less than 2, for a faster build.

Notes:
- Host memory is shared across job pools, so pool separation alone doesn't help
- Ninja lacks per-job memory quota support
- Only affects link parallelism in LTO-enabled builds

See
https://clang.llvm.org/docs/ThinLTO.html#controlling-backend-parallelism

Fixes scylladb/scylladb#22275

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22383
2025-02-12 10:24:13 +02:00
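
Both options named in the commit message are ordinary cache variables, so they can be preset before limit_jobs.cmake runs. The following is a minimal sketch, assuming a hypothetical preload file called link-jobs.cmake that is passed to CMake's `-C` (initial cache) option. The file name and the values 4 and 8192 are illustrative and do not come from the commit.

```cmake
# link-jobs.cmake (hypothetical file name), loaded with:
#   cmake -C link-jobs.cmake <source-dir>

# Pin the number of concurrent link jobs directly. limit_jobs.cmake only
# derives a value when Scylla_PARALLEL_LINK_JOBS is not already defined,
# so this skips the memory-based calculation.
set(Scylla_PARALLEL_LINK_JOBS 4 CACHE STRING
  "Maximum number of concurrent link jobs")

# Alternative (assumed value): keep the derived job count, but change the
# memory budget per link job, in MiB.
# set(Scylla_RAM_PER_LINK_JOB 8192 CACHE STRING
#   "Maximum amount of memory used by each link job (in MiB)")
```

Passing `-DScylla_PARALLEL_LINK_JOBS=4` on the cmake command line should have the same effect. The preload file is just a way to keep a per-host setting in one place.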


if(NOT DEFINED Scylla_PARALLEL_LINK_JOBS)
  if(NOT DEFINED Scylla_RAM_PER_LINK_JOB)
    # preserve user-provided value
    set(_default_ram_value 4096)
    if(Scylla_ENABLE_LTO)
      # When ThinLTO optimization is enabled, the linker uses all available CPU threads.
      # To prevent excessive memory usage, we limit parallel link jobs based on available RAM,
      # as each link job requires significant memory during optimization.
      set(_default_ram_value 16384)
    endif()
    set(Scylla_RAM_PER_LINK_JOB ${_default_ram_value} CACHE STRING
      "Maximum amount of memory used by each link job (in MiB)")
  endif()
  # Despite the variable name, this queries the physical memory currently
  # available on the host (in MiB), not the total installed memory.
  cmake_host_system_information(
    RESULT _total_mem_mb
    QUERY AVAILABLE_PHYSICAL_MEMORY)
  math(EXPR _link_pool_depth "${_total_mem_mb} / ${Scylla_RAM_PER_LINK_JOB}")
  # Use at least 2 parallel link jobs to optimize build throughput. The main executable
  # requires LTO (slower link phase) while tests are linked without LTO (faster link phase).
  # This allows simultaneous linking of LTO and non-LTO targets, enabling better CPU
  # utilization by overlapping the slower LTO link with faster test links.
  if(_link_pool_depth LESS 2)
    set(_link_pool_depth 2)
  endif()
  set(Scylla_PARALLEL_LINK_JOBS "${_link_pool_depth}" CACHE STRING
    "Maximum number of concurrent link jobs")
endif()
set_property(
  GLOBAL
  APPEND
  PROPERTY JOB_POOLS
  link_pool=${Scylla_PARALLEL_LINK_JOBS}
  submodule_pool=1)
set(CMAKE_JOB_POOL_LINK link_pool)
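
To make the pool-depth arithmetic above concrete, here is a small standalone sketch that can be run with `cmake -P`. The function `estimate_link_jobs` and the memory figures are hypothetical and are not part of limit_jobs.cmake. The sketch only mirrors the integer division and the floor of 2 used in the file.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper: reproduces the pool-depth calculation for a given
# amount of available memory and a per-job memory budget, both in MiB.
function(estimate_link_jobs mem_mib ram_per_job_mib out_var)
  # math(EXPR) performs integer division, as in limit_jobs.cmake.
  math(EXPR depth "${mem_mib} / ${ram_per_job_mib}")
  # Apply the same minimum of 2 concurrent link jobs.
  if(depth LESS 2)
    set(depth 2)
  endif()
  set(${out_var} ${depth} PARENT_SCOPE)
endfunction()

# Assumed example hosts with 64 GiB and 16 GiB of available memory.
estimate_link_jobs(65536 16384 lto_large_host)   # 65536 / 16384 = 4
estimate_link_jobs(16384 16384 lto_small_host)   # 16384 / 16384 = 1, raised to 2
estimate_link_jobs(65536 4096 non_lto_host)      # 65536 / 4096 = 16

message(STATUS "LTO, 64 GiB available:     ${lto_large_host} link jobs")
message(STATUS "LTO, 16 GiB available:     ${lto_small_host} link jobs")
message(STATUS "non-LTO, 64 GiB available: ${non_lto_host} link jobs")
```

With the 16384 MiB default used for LTO builds, a host needs at least 32 GiB of available memory before the computed depth exceeds the floor of 2. Below that, the floor alone determines the pool size.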