We would like to deploy Scylla in constrained environments where
internet access is not permitted. In those environments it is not
possible to acquire the dependencies of Scylla from external repos, so
the packages have to be shipped together with their dependencies.
In older distributions, like CentOS7, there isn't a python3 interpreter
available. And while we can package one from EPEL, this tends to break in
practice when installing the software in older patchlevels (for
instance, installing into RHEL7.3 when the latest is RHEL7.5).
The reason for that, as we saw in practice, is that EPEL may
not respect RHEL patchlevels and may ship a python interpreter that
depends on newer versions of some system libraries.
virtualenv can be used to create isolated python environments, but it is
not designed for full isolation and I hit at least two roadblocks in
practice:
1) It doesn't copy the files, linking some instead. There is an
--always-copy option but it is broken (for years) in some
distributions.
2) Even when the above works, it still doesn't copy some files, relying
on the system files instead (one sad example was the subprocess
module that was just kept in the system and not moved to the
virtualenv)
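The first roadblock can be checked mechanically. What follows is only a
sketch, not part of this patch: links_escaping is a hypothetical helper
that walks an environment directory and reports every symlink whose
target resolves outside of it, which is exactly the kind of entry that
breaks once the directory is moved to another machine.

```python
import os


def links_escaping(root):
    """Return (link, target) pairs for symlinks under root that resolve outside root."""
    root = os.path.realpath(root)
    escaping = []
    # os.walk does not follow symlinks by default, so linked directories
    # show up in dirnames and are inspected here instead of descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if os.path.commonpath([root, target]) != root:
                escaping.append((path, target))
    return escaping
```

An empty result means the tree holds no links that depend on files
outside of it. It does not catch the second roadblock, where a file is
simply absent from the environment.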
This patch solves that problem by creating a python3 environment in a
directory with the modules that Scylla uses, and nothing else. It is
essentially doing what virtualenv should do but doesn't. Once this
environment is assembled, the binaries are then made relocatable the same
way the Scylla binary is.
One difference (for now) between the Scylla binary relocation process
and ours is that we steer away from LD_LIBRARY_PATH: the environment
variable is inherited by any child process stemming from the caller,
which means that we are unable to use the subprocess module to call
system binaries like mkfs (which our scripts do a lot). Instead, we rely
on RUNPATH to tell the binary where to search for its libraries.
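The inheritance problem is easy to reproduce. The sketch below is an
illustration only and not part of this patch: grandchild_sees is a
hypothetical helper that starts a child interpreter with one extra
environment variable set, lets that child spawn a grandchild without
touching the environment, and returns the value the grandchild observes.

```python
import os
import subprocess
import sys

# Code run by the grandchild: print the variable named in argv[1].
GRANDCHILD = "import os, sys; print(os.environ.get(sys.argv[1], ''))"
# Code run by the child: spawn the grandchild, leaving the environment alone.
CHILD = (
    "import subprocess, sys; "
    "subprocess.check_call([sys.executable, '-c', {!r}, sys.argv[1]])".format(GRANDCHILD)
)


def grandchild_sees(var, value):
    """Set var=value for a child process and report what its own child sees."""
    env = dict(os.environ)
    env[var] = value
    out = subprocess.check_output([sys.executable, "-c", CHILD, var],
                                  env=env, universal_newlines=True)
    return out.strip()
```

Calling grandchild_sees("LD_LIBRARY_PATH", "/some/relocatable/lib64")
returns the same path (the path here is a made-up example), which is why
a system binary such as mkfs launched through subprocess would end up
searching the relocatable libraries. RUNPATH is stored in the binary
itself, so it does not leak to children.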
In terms of the python interpreter, PYTHONPATH does not need to be set
for this to work, as the python interpreter will add the lib directory
next to its own location to its module search path (sys.path). To
confirm this, we executed the following code:
bin/python3 -c "import sys; print('\n'.join(sys.path))"
with the interpreter unpacked to both /home/centos/glaubertmp/test/ and
/tmp. It yields respectively:
/home/centos/glaubertmp/test/lib64/python36.zip
/home/centos/glaubertmp/test/lib64/python3.6
/home/centos/glaubertmp/test/lib64/python3.6/lib-dynload
/home/centos/glaubertmp/test/lib64/python3.6/site-packages
and
/tmp/python/lib64/python36.zip
/tmp/python/lib64/python3.6
/tmp/python/lib64/python3.6/lib-dynload
/tmp/python/lib64/python3.6/site-packages
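The same check can be scripted. This is a sketch under assumptions, not
something the patch ships: interpreter_sys_path and outside_prefix are
hypothetical helpers that run a given interpreter, collect its sys.path,
and list any entry that does not live under the directory the archive
was unpacked to.

```python
import subprocess
import sys


def interpreter_sys_path(python):
    """Run the given interpreter and return its non-empty sys.path entries."""
    out = subprocess.check_output(
        [python, "-c", "import sys; print('\\n'.join(sys.path))"],
        universal_newlines=True)
    return [p for p in out.splitlines() if p]


def outside_prefix(paths, prefix):
    """Return the entries of paths that are not located under prefix."""
    prefix = prefix.rstrip("/") + "/"
    return [p for p in paths if not p.startswith(prefix)]
```

For the relocated interpreter, outside_prefix(interpreter_sys_path(
"/tmp/python/bin/python3"), "/tmp/python") should be an empty list if
every search path entry stays inside the unpacked directory.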
This was tested by moving the .tar.gz generated on my Fedora28 laptop to
a CentOS machine without python3 installed. I could then invoke
./scylla_python_env/python3 and use the interpreter to call 'ls' through
the subprocess module.
I have also tested that we can successfully import all the modules we listed
for installation and that we can read a sample yaml file (since PyYAML depends
on the system's libyaml, we know that this works)
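The import test can be repeated with a few lines run by the relocated
interpreter. This is a minimal sketch; failed_imports is a hypothetical
helper and the module list passed to it is up to the caller (PyYAML, for
example, is imported as yaml).

```python
import importlib


def failed_imports(module_names):
    """Try to import each module; return {name: error message} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = str(exc)
    return failures
```

An empty dictionary means every listed module, including the shared
libraries its extension modules link against, could be loaded.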
Time to build:
real 0m15.935s
user 0m15.198s
sys 0m0.382s
Final archive size (uncompressed): 81MB
Final archive size (compressed)  : 25MB
Signed-off-by: Glauber Costa <glauber@scylladb.com>
--
v3:
- rewrite in python3
- do not use temporary directories, add directly to the archive. Only the python binary
has to be materialized
- Use --cacheonly for repoquery, and also repoquery --list in a second step to grab the file list
v2:
- do not use yum, resolve dependencies from installed packages instead
- move to scripts as Avi wants this not only for old offline CentOS
225 lines
9.1 KiB
Python
Executable File
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Copyright (C) 2019 ScyllaDB
#

#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
#

import argparse
import io
import os
import pathlib
import shutil
import subprocess
import sys
import tarfile
from tempfile import mkstemp

def should_copy(f):
    '''Given a file, returns whether or not we are interested in copying this file.
    We want the actual python interpreter, and the files in /lib(64) and /usr/lib(64)
    All the stuff in /var and other paths is not useful for the relocatable package.
    The locale files take a lot of space and we won't use them, so we get rid of them as well.
    The build_id files will be symlinks to binaries and shared libraries, that we don't want to keep.
    '''

    if f == "":  # package with no files
        return False

    if f.startswith("/usr/bin/python3."):
        return f[-1] != "m"  # python ships with two binaries, one of them with a specialized malloc (python 3.xm). No need.

    if f.startswith("/lib64/ld-linux"):  # the interpreter is copied by the binary fixup process
        return False

    parts = list(pathlib.PurePath(f).parts)
    el = parts.pop(0)
    if el != "/":
        raise RuntimeError("unexpected path: not absolute! {}".format(f))

    if len(parts) > 0 and parts[0] == "usr":
        parts.pop(0)

    if not parts:
        return False

    if parts[0] != "lib" and parts[0] != "lib64":
        return False
    parts.pop(0)

    if len(parts) > 0 and (parts[0] == "locale" or parts[0] == ".build-id"):
        return False
    return True

def fix_binary(ar, path, libpath):
    '''Makes one binary or shared library relocatable. To do that, we need to set RUNPATH to libpath
    (for instance $ORIGIN/../lib64) so we get libraries from the relocatable directory and not from
    the system during runtime. Returns the path of the patched temporary copy.
    '''
    # it's a pity patchelf has to patch an actual file, so we work on a temporary copy.
    fd, patched_elf = mkstemp()
    os.close(fd)  # only the path is needed; do not leak the open descriptor
    shutil.copy2(path, patched_elf)

    subprocess.check_call(['patchelf',
                           '--set-rpath',
                           libpath,
                           patched_elf])
    return patched_elf

def fix_python_binary(ar, binpath):
    '''Makes the python binary relocatable. To do that, we need to set RUNPATH to $ORIGIN/../lib64 so we get libraries
    from the relocatable directory and not from the system during runtime. We also want to copy the interpreter used so
    we can launch with it later.
    '''
    pyname = os.path.basename(binpath)
    patched_binary = fix_binary(ar, binpath, '$ORIGIN/../lib64/')
    interpreter = subprocess.check_output(['patchelf',
                                           '--print-interpreter',
                                           patched_binary], universal_newlines=True).splitlines()[0]
    ar.add(os.path.realpath(interpreter), arcname=os.path.join("libexec", "ld.so"))
    ar.add(patched_binary, arcname=os.path.join("libexec", pyname + ".bin"))

def fix_dynload(ar, binpath, targetpath):
    patched_binary = fix_binary(ar, binpath, '$ORIGIN/../../')
    ar.add(patched_binary, arcname=targetpath, recursive=False)

def gen_python_thunk(ar, pybin):
    thunk = b'''\
#!/bin/bash
x="$(readlink -f "$0")"
b="$(basename "$x")"
d="$(dirname "$x")/.."
ldso="$d/libexec/ld.so"
realexe="$d/libexec/$b.bin"
exec -a "$0" "$ldso" "$realexe" "$@"
'''

    ti = tarfile.TarInfo(name=os.path.join("bin", pybin))
    ti.size = len(thunk)
    ti.mode = 0o755
    ar.addfile(ti, fileobj=io.BytesIO(thunk))

    ti = tarfile.TarInfo(name=os.path.join("bin", "python3"))
    ti.type = tarfile.SYMTYPE
    ti.linkname = pybin
    ar.addfile(ti)

def copy_file_to_python_env(ar, f):
    if f.startswith("/usr/bin/python"):
        gen_python_thunk(ar, os.path.basename(f))
        fix_python_binary(ar, f)
    else:
        libfile = f
        # python tends to install in both /usr/lib and /usr/lib64, which doesn't mean it is
        # a package for the wrong arch. So we need to handle both /lib and /lib64. Copying files
        # blindly from /lib could be a problem, but we filtered out all the i686 packages during
        # the dependency generation.
        if libfile.startswith("/usr/"):
            libfile = libfile.replace("/usr/", "/", 1)
        if libfile.startswith("/lib/"):
            libfile = libfile.replace("/lib/", "lib64/", 1)
        elif libfile.startswith("/lib64/"):
            libfile = libfile.replace("/lib64/", "lib64/", 1)
        else:
            raise RuntimeError("unexpected path: don't know what to do with {}".format(f))

        # copy file instead of link unless we link to the current directory.
        # links to the current directory are usually safe, but because we are manipulating
        # the directory structure, very likely links that traverse paths will break.
        if os.path.islink(f) and os.readlink(f) != os.path.basename(os.readlink(f)):
            ar.add(os.path.realpath(f), arcname=libfile)
        elif os.path.dirname(f).endswith("lib-dynload"):
            fix_dynload(ar, f, libfile)
        else:
            # in case this is a directory that is listed, we don't want to include everything that is in that directory
            # for instance, the python3 package will own site-packages, but other packages that we are not packaging could have
            # filled it with stuff.
            ar.add(f, arcname=libfile, recursive=False)

def filter_basic_packages(package):
    '''Returns true if this package should be filtered out. We filter out packages that are too basic like the Fedora repos,
    or contain no files'''
    # The packages below are way too basic and are listed just because repoquery will, correctly, list
    # everything. We make our lives easier by filtering them out.
    too_basic_packages = ["filesystem",
                          "tzdata",
                          "chkconfig",
                          "basesystem",
                          "coreutils",
                          "fedora-release",
                          "fedora-repos",
                          "fedora-gpg-keys",
                          "glibc-minimal-langpack",
                          "glibc-all-langpacks"]
    return any(package.startswith(x) for x in too_basic_packages)


def dependencies(package_list):
    '''Generates a list of RPM dependencies for the python interpreter and its modules'''
    output = subprocess.check_output(['repoquery',
                                      # Some architectures like x86_64 also carry packages for
                                      # their 32-bit versions. In these cases, we won't want
                                      # to mix them since we will only install lib64/
                                      '--archlist=noarch,{machine}'.format(machine=os.uname().machine),
                                      # Work from the local cache only, without refreshing
                                      # repository metadata. Guarantees consistent builds
                                      '--cacheonly',
                                      '--installed',
                                      '--resolve',
                                      '--requires',
                                      '--recursive'] + package_list,
                                     universal_newlines=True).splitlines()

    output = [x for x in output if not filter_basic_packages(x)]
    return output + package_list

def generate_file_list(executables):
    '''Given the RPM files that we want to scan in this run, returns a list of all files in those packages that are of interest to us'''

    exclusions = []
    for exe in executables:
        exclusions += subprocess.check_output(['rpm', '-qd', exe], universal_newlines=True).splitlines()

    # we don't want to use --list the first time: for one, we want to be able to filter out some packages with files we don't want to copy.
    # Second, repoquery --list does not include the actual package files when used with --resolve and --recursive (only its dependencies').
    # So we need a separate step in which all packages are added together.
    candidates = subprocess.check_output(['repoquery',
                                          '--installed',
                                          '--cacheonly',
                                          '--list'] + executables, universal_newlines=True).splitlines()

    return [x for x in set(candidates) - set(exclusions) if should_copy(x)]

ap = argparse.ArgumentParser(description='Create a relocatable python3 interpreter.')
ap.add_argument('--output', required=True,
                help='Destination file (tar format)')
ap.add_argument('modules', nargs='*', help='list of python modules to add, separated by spaces')

args = ap.parse_args()
packages = ["python3"] + args.modules

file_list = generate_file_list(dependencies(packages))

# the context manager closes the archive even if copying a file fails
with tarfile.open(args.output, mode='w|gz') as ar:
    for f in file_list:
        copy_file_to_python_env(ar, f)