We would like to deploy Scylla in constrained environments where
internet access is not permitted. In those environments it is not
possible to acquire Scylla's dependencies from external repos, so the
packages have to be shipped together with their dependencies.
In older distributions, like CentOS7, there is no python3 interpreter
available. While we could package one from EPEL, this tends to break in
practice when installing the software on older patchlevels (for
instance, installing into RHEL7.3 when the latest is RHEL7.5).
The reason, as we saw in practice, is that EPEL may not respect RHEL
patchlevels, so its python interpreter can depend on newer versions of
some system libraries.
virtualenv can be used to create isolated python environments, but it is
not designed for full isolation and I hit at least two roadblocks in
practice:
1) It doesn't copy all the files, linking some of them instead. There is
   an --always-copy option, but it has been broken for years in some
   distributions.
2) Even when the above works, it still doesn't copy some files, relying
   on the system files instead (one sad example was the subprocess
   module, which was kept in the system and not moved to the
   virtualenv).
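Roadblock 2 can be detected mechanically. A module that was left behind in the system reports a `__file__` outside the environment directory, and resolving symlinks also catches files that were linked rather than copied (roadblock 1). The sketch below is a hypothetical helper and is not part of this patch. `env_root` stands for wherever the environment lives.

```python
import os
import subprocess  # the module virtualenv left behind in our case
import sys


def is_inside(path, root):
    """True if `path` is located under directory `root`, after resolving symlinks."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


def modules_outside_env(modules, env_root):
    """Return the names of modules whose source file lives outside `env_root`."""
    outside = []
    for mod in modules:
        source = getattr(mod, "__file__", None)
        # Built-in modules (e.g. sys) have no __file__: they are compiled into
        # the interpreter binary itself, so there is no file to copy.
        if source is not None and not is_inside(source, env_root):
            outside.append(mod.__name__)
    return outside


if __name__ == "__main__":
    # Inside a virtualenv, sys.prefix is the virtualenv directory, so this lists
    # any module that is still being loaded from the system installation.
    print(modules_outside_env([os, subprocess], sys.prefix))
```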
This patch solves that problem by creating a python3 environment in a
directory containing the modules that Scylla uses and nothing else. It is
essentially doing what virtualenv should do but doesn't. Once this
environment is assembled, the binaries are made relocatable the same
way the Scylla binary is.
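One way to assemble the environment without a staging directory is to stream each file straight into the tarball under its relocated name. Per the v3 notes, this is the approach the patch takes, and only the python binary has to be materialized first, because it is rewritten before being added. This is a sketch under stated assumptions, not the script in this patch. It assumes a flat list of host paths and a hypothetical `strip_root` (e.g. `/usr`). That prefix is removed from each path, so `/usr/lib64/python3.6/os.py` would be stored as `python/lib64/python3.6/os.py`.

```python
import os
import tarfile


def add_files_to_archive(archive_path, files, strip_root, prefix="python"):
    """Write `files` into a gzip tarball, renaming each one relative to `strip_root`.

    Returns the archive member names, in the order they were added.
    """
    names = []
    # dereference=True stores the contents of symlink targets instead of the
    # links themselves, so the archive does not point back at host files.
    with tarfile.open(archive_path, "w:gz", dereference=True) as tar:
        for path in files:
            arcname = os.path.join(prefix, os.path.relpath(path, strip_root))
            tar.add(path, arcname=arcname, recursive=False)
            names.append(arcname)
    return names
```

For example, `add_files_to_archive("python-env.tar.gz", ["/usr/lib64/python3.6/os.py"], "/usr")` would produce a single member named `python/lib64/python3.6/os.py`. The archive file name here is hypothetical.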
One difference (for now) between the Scylla binary relocation process
and ours is that we steer away from LD_LIBRARY_PATH. That environment
variable is inherited by any child process stemming from the caller.
As a result, system binaries like mkfs, which our scripts call a lot
through the subprocess module, would pick up our bundled libraries
instead of the system ones. Instead, we rely on RUNPATH to tell the
binary where to search for its libraries.
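Here is a minimal sketch of the RUNPATH approach. It assumes the patchelf tool does the rewrite, although the text above does not say which tool sets RUNPATH. It also assumes a hypothetical layout in which the bundled libraries sit in `../lib64` relative to the binary. The dynamic loader expands `$ORIGIN` to the directory holding the binary, which is what makes the path relocatable. Because the path lives in the ELF file rather than in the environment, children such as mkfs never see it.

```python
import re
import subprocess


def set_runpath_cmd(binary, libdir_rel="../lib64"):
    """Build a patchelf command that points `binary` at its bundled libraries.

    patchelf writes DT_RUNPATH by default (DT_RPATH only with --force-rpath).
    `libdir_rel` is a hypothetical layout, relative to the binary's directory.
    The list is meant for subprocess without a shell, so $ORIGIN needs no quoting.
    """
    return ["patchelf", "--set-rpath", "$ORIGIN/" + libdir_rel, binary]


# Matches the dynamic-section lines printed by `readelf -d`, e.g.
#  0x000000000000001d (RUNPATH)  Library runpath: [$ORIGIN/../lib64]
_PATH_ENTRY = re.compile(r"\((RUNPATH|RPATH)\)\s+Library r(?:un)?path: \[([^\]]*)\]")


def dynamic_search_paths(readelf_output):
    """Return a dict such as {"RUNPATH": "$ORIGIN/../lib64"} from `readelf -d` output."""
    return {m.group(1): m.group(2) for m in _PATH_ENTRY.finditer(readelf_output)}


def runpath_of(binary):
    """Run readelf on `binary` and return its RUNPATH, or None if it has none."""
    out = subprocess.run(["readelf", "-d", binary], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return dynamic_search_paths(out).get("RUNPATH")
```

There is one caveat with this design. The loader consults DT_RUNPATH only for the object's own immediate dependencies. A bundled shared library that itself needs other bundled libraries may therefore need its own RUNPATH as well.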
As for the python interpreter, PYTHONPATH does not need to be set for
this to work. The interpreter derives its sys.path from its own location,
so it always includes the environment's lib directory. To confirm this,
we executed the following code:
bin/python3 -c "import sys; print('\n'.join(sys.path))"
with the interpreter unpacked to both /home/centos/glaubertmp/test/ and
/tmp. It yields respectively:
/home/centos/glaubertmp/test/lib64/python36.zip
/home/centos/glaubertmp/test/lib64/python3.6
/home/centos/glaubertmp/test/lib64/python3.6/lib-dynload
/home/centos/glaubertmp/test/lib64/python3.6/site-packages
and
/tmp/python/lib64/python36.zip
/tmp/python/lib64/python3.6
/tmp/python/lib64/python3.6/lib-dynload
/tmp/python/lib64/python3.6/site-packages
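The manual check above could be turned into an automatic one. The idea is to run the relocated interpreter and flag any absolute sys.path entry that does not live under its prefix. This is a hypothetical helper, and it assumes that sys.prefix resolves to the unpacked directory (e.g. `/tmp/python` in the second listing).

```python
import os
import sys


def foreign_path_entries(paths, root):
    """Return the absolute entries of `paths` that are not located under `root`.

    Empty and relative entries (such as '' for the current directory) are
    skipped, since they do not point at a fixed location on the host.
    """
    root = os.path.normpath(root)
    return [
        p for p in paths
        if os.path.isabs(p) and os.path.commonpath([os.path.normpath(p), root]) != root
    ]


if __name__ == "__main__":
    # Run with the relocated interpreter: an empty list means no search path
    # leaks in from the host installation.
    print(foreign_path_entries(sys.path, sys.prefix))
```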
This was tested by moving the .tar.gz generated on my Fedora 28 laptop to
a CentOS machine without python3 installed. I could then invoke
./scylla_python_env/python3 and use the interpreter to call 'ls' through
the subprocess module.
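That smoke test can be scripted along these lines. This is a sketch that is meant to be run with the relocated interpreter. It sticks to the subprocess API available in Python 3.6, because the `capture_output` and `text` arguments only arrived in 3.7.

```python
import subprocess


def run_system_binary(argv):
    """Run a host binary such as ls and return its standard output as text.

    Raises subprocess.CalledProcessError on a non-zero exit status, which is
    how a child that fails to load its libraries would usually surface.
    """
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 spelling of text=True
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_system_binary(["ls", "-d", "/"]), end="")
```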
I have also tested that we can successfully import all the modules we listed
for installation and that we can read a sample yaml file. Since PyYAML depends
on the system's libyaml, this confirms that shared libraries are loaded correctly.
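The import and YAML checks could be automated as follows. `missing_modules` takes a list of module names; the actual list lives in the patch and is not reproduced here. `load_yaml_with_libyaml` is a hypothetical helper that forces the C-backed loader. PyYAML silently falls back to its pure-Python parser when the libyaml binding cannot be loaded, so requesting `CSafeLoader` explicitly makes that case fail loudly.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def load_yaml_with_libyaml(text):
    """Parse `text` with PyYAML's libyaml-backed loader.

    Raises AttributeError if PyYAML was installed without its libyaml binding,
    instead of quietly using the pure-Python parser.
    """
    import yaml  # imported lazily so the rest of this module works without PyYAML
    return yaml.load(text, Loader=yaml.CSafeLoader)
```

With the relocated interpreter, `missing_modules(["yaml", "subprocess"])` should return an empty list, and `load_yaml_with_libyaml("a: 1")` should return `{'a': 1}`.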
Time to build:
real 0m15.935s
user 0m15.198s
sys 0m0.382s
Final archive size (uncompressed): 81MB
Final archive size (compressed):   25MB
Signed-off-by: Glauber Costa <glauber@scylladb.com>
--
v3:
- rewrite in python3
- do not use temporary directories, add directly to the archive. Only the python binary
  has to be materialized
- Use --cacheonly for repoquery, and also repoquery --list in a second step to grab the file list
v2:
- do not use yum, resolve dependencies from installed packages instead
- move to scripts, as Avi wants this for more than just old offline CentOS