Commit Graph

3 Commits

Author SHA1 Message Date
Glauber Costa
8ba6b569b1 relocatable python: make sure all shared objects are relocated
The interpreter as it is right now has a bug: I incorrectly assumed that
all the shared libraries that python dynamically links would be in
lib-dynload. That is not true, and at least some of them are in
site-packages.

With that, we were loading system libraries for some shared objects.
The approach taken to fix this is to just check if we're seeing a shared
library and relocate everything we see: we will end up relocating the
ones in lib64 too, but that not only should be okay, it is probably even
more fool-proof.

While doing that I noticed that I had forgotten to incorporate one of
previous feedback from Avi (that we're leaving temporary files behind).
So I'm fixing that as well.

[avi: update toolchain]
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190208115501.7234-1-glauber@scylladb.com>
2019-02-08 18:42:24 +02:00
Glauber Costa
fb742473e2 replace /usr/local as a source of packages in the python relocatable interpreter
I was playing with the python3 interpreter trying to get pip to work,
just to see how far we can go. We don't really need pip, but I figured
it would be a good stress test to make sure that the process is working
and robust.

And it didn't really work, because although pip will correctly install
things into $relocatable_root/local/lib, sys.path will still refer to a
hardcoded /usr/local. While this should not affect Scylla, since we
expect to have all our modules in out path anyway -- and that path is
searched before /usr/local, it is still dangerous to make an absolute
reference like this.

Unfortunately, /usr/local/ it is included unconditionally by site.py,
which is executed when the interpreter is started and there is no
environment variable I found to change that (the help string refers to
PYTHONNOUSERSITE, but I found no mention of that in site.py whatsoever)

There is a way to tell site.py not to bother to add user sites, by
passing the -s flag, which this patch does.

Aside from doing that, we also enhance PYTHONPATH to include a reference
to ./local/{lib,lib64}/python<version>/site-packages.

After applying this patch, I was able to build an interpreter containing
only python3-pip and python3-setuptools, and build the relocatable
environment from there.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190206052104.25927-1-glauber@scylladb.com>
2019-02-08 18:41:52 +02:00
Glauber Costa
afed2cddae Create a relocatable python3 interpreter
We would like to deploy Scylla in constrained environments where
internet access is not permitted. In those environments it is not
possible to acquire the dependencies of Scylla from external repos and
the packages have to be sent alongside with its dependencies.

In older distributions, like CentOS7 there isn't a python3 interpreter
available. And while we can package one from EPEL this tends to break in
practice when installing the software in older patchlevels (for
instance, installing into RHEL7.3 when the latest is RHEL7.5).

The reason for that, as we saw in practice, is that EPEL may
not respect RHEL patchlevels and have the python interpreter depending
on newer versions of some system libraries.

virtualenv can be used to create isolated python enviornments, but it is
not designed for full isolation and I hit at least two roadblocks in
practice:

1) It doesn't copy the files, linking some instead. There is an
  --always-copy option but it is broken (for years) in some
  distributions.
2) Even when the above works, it still doesn't copy some files, relying
   on the system files instead (one sad example was the subprocess
   module that was just kept in the system and not moved to the
   virtualenv)

This patch solves that problem by creating a python3 environment in a
directory with the modules that Scylla uses, and no other else. It is
essentially doing what vitualenv should do but doesn't. Once this
environment is assembled the binaries are then made relocatable the same
way the Scylla binary is.

One difference (for now) between the Scylla binary relocation process
and ours is that we steer away from LD_LIBRARY_PATH: the environment
variable is inherited by any child process steming from the caller,
which means that we are unable to use the subprocess module to call
system binaries like mkfs (which our scripts do a lot). Instead, we rely
on RUNPATH to tell the binary where to search for its libraries.

In terms of the python interpreter, PYTHONPATH does not need to be set
for this to work as the python interpreter will include the lib
directory in its PYTHONPATH. To confirm this, we executed the following
code:

    bin/python3 -c "import sys; print('\n'.join(sys.path))"

with the interpreter unpacked to  both /home/centos/glaubertmp/test/ and
/tmp. It yields respectively:

    /home/centos/glaubertmp/test/lib64/python36.zip
    /home/centos/glaubertmp/test/lib64/python3.6
    /home/centos/glaubertmp/test/lib64/python3.6/lib-dynload
    /home/centos/glaubertmp/test/lib64/python3.6/site-packages

and

    /tmp/python/lib64/python36.zip
    /tmp/python/lib64/python3.6
    /tmp/python/lib64/python3.6/lib-dynload
    /tmp/python/lib64/python3.6/site-packages

This was tested by moving the .tar.gz generated on my Fedora28 laptop to
a CentOS machine without python3 installed. I could then invoke
./scylla_python_env/python3 and use the interpreter to call 'ls' through
the subprocess module.

I have also tested that we can successfully import all the modules we listed
for installation and that we can read a sample yaml file (since PyYAML depends
on the system's libyaml, we know that this works)

Time to build:
real	0m15.935s
user	0m15.198s
sys	0m0.382s

Final archive size (uncompressed): 81MB
Final archive sie (compressed)   : 25MB

Signed-off-by: Glauber Costa <glauber@scylladb.com>
--
v3:
- rewrite in python3
- do not use temporary directories, add directly to the archive. Only the python binary
  have to be materialized
- Use --cacheonly for repoquery, and also repoquery --list in a second step to grab the file list
v2:
- do not use yum, resolve dependencies from installed packages instead
- move to scripts as Avi wants this not only for old offline CentOS
2019-02-04 18:02:40 -05:00