PE101-02: Repositories, Sharing, and Conda
As we’ve mentioned, there is an awful lot of Python code out in the world completely free for us to use. There are packages as broadly useful as “SciPy” (numerical methods for the sciences) and as narrowly interesting as “bosch-thermostat-client”, which sets values in Bosch thermostats. With so many packages available, if there is something you need to do in a Jupyter notebook or in a Python program, there is a good chance someone else has done at least part of it and made it available as a package.
Publicly available packages have to be kept somewhere to be useful. If programmers can’t find them, then they might as well not be made public. Fortunately, the Python world has a central repository, the Python Package Index (PyPI), at www.pypi.org. The repository (usually shortened to just “repo”, both vowels are long) is searchable.
One thing to notice in the repository is that packages there have version numbers. It’s also pretty common for one package to require another - scikit-learn requires scipy, for instance. Sometimes the requirements also have version numbers. There can be cases where Package A requires Package B, and specifically Package B has to have a version number greater than or equal to 11. When you have a lot of packages to import it can get cumbersome to check all the dependencies and make sure you have a combination that satisfies all the constraints.
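To make the “greater than or equal to 11” idea concrete, here is a minimal sketch of a version check. It assumes plain dotted numeric versions such as “11.2.0”; the function names are made up for this example, and real version strings can be messier (pre-releases, letters), which is why tools like pip rely on a dedicated version-handling library rather than something this simple.

```python
def parse_version(text):
    """Turn a dotted version string such as '11.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if the installed version is >= the required minimum."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '11' and '11.0' compare equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Hypothetical case from the text: Package A needs Package B at version 11 or later.
print(satisfies_minimum("11.2.0", "11"))  # True
print(satisfies_minimum("9.8.1", "11"))   # False

# Comparing the raw strings gives the wrong answer, because '9' sorts after '1'.
print("9.8.1" >= "11")                    # True, which is not what we want
```

The point of converting to numbers first is the last line: version strings compared as text do not sort the way version numbers do.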
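In fact, it’s more than just cumbersome. Once the number of packages and constraints grows, checking by hand stops being realistic and automatic checking is the only practical solution. A piece of software that does this kind of constraint checking is called a SAT solver (SAT is short for “satisfiability”).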
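To get a feel for the problem, here is a toy sketch that simply tries every combination of versions and keeps the ones that satisfy all the rules. The package names, version numbers, and rules are all invented for illustration, and this brute-force search is not a SAT solver; it only shows what “a combination that satisfies all the constraints” means.

```python
from itertools import product

# Hypothetical catalogue: package name -> available versions (major numbers only).
AVAILABLE = {
    "package_a": [1, 2],
    "package_b": [10, 11, 12],
    "package_c": [3, 4],
}

# Hypothetical rules. Each one takes a candidate selection (name -> version)
# and returns True when that selection is acceptable.
CONSTRAINTS = [
    # Package A version 2 needs Package B >= 11.
    lambda pick: pick["package_a"] < 2 or pick["package_b"] >= 11,
    # Package C version 4 only works with Package B <= 11.
    lambda pick: pick["package_c"] < 4 or pick["package_b"] <= 11,
    # We want the newest Package A and the newest Package C.
    lambda pick: pick["package_a"] == 2 and pick["package_c"] == 4,
]


def find_compatible(available, constraints):
    """Try every combination of versions; return those that satisfy all rules."""
    names = sorted(available)
    solutions = []
    for versions in product(*(available[name] for name in names)):
        pick = dict(zip(names, versions))
        if all(rule(pick) for rule in constraints):
            solutions.append(pick)
    return solutions


solutions = find_compatible(AVAILABLE, CONSTRAINTS)
print(solutions)  # [{'package_a': 2, 'package_b': 11, 'package_c': 4}]
```

With three packages there are only 2 x 3 x 2 = 12 combinations to try. The count multiplies with every package added, so trying them all quickly becomes hopeless, and real tools use much smarter search.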
Conda - hard problems made solvable
Fortunately, there is Conda, which is both a software tool and a repository of its own. The Conda developers keep a subset of the roughly half a million packages that are available, and they ensure that their repository reflects a combination of versions that should work together. They do the hard work, we take advantage of it. They stay in business by selling their tools to commercial users but, being a research organization, we’re not required to pay.
Conda also has another useful trick: it can take advantage of Python’s virtual environments to let you load outside packages into a completely private space. This way, when you download and install the “instantnobelprize” package (I made that up), it’s only written to your own directories. Other users, and the system as a whole, are protected from whatever it might contain.
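If you are ever unsure which environment your code is actually running in, Python can tell you. The sketch below uses only the standard library; the function name is made up for this example, and the remark about CONDA_DEFAULT_ENV assumes the environment was activated in the usual way with conda activate.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter that is running this code."""
    return {
        # Full path of the Python executable in use.
        "executable": sys.executable,
        # Root directory of the current environment.
        "prefix": sys.prefix,
        # True inside a venv-style virtual environment. Conda environments
        # typically carry their own interpreter, so this is usually False there.
        "in_venv": sys.prefix != sys.base_prefix,
        # Normally set when a conda environment is activated; None otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this in a notebook cell is a quick way to confirm that the kernel you picked really points at your private environment: the “executable” and “prefix” entries should sit under your own directories.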
Setting up Conda and using it with Jupyter notebooks takes a little bit of work and has to be done from the command line, but it is usually worth it. If you haven’t used the command line yet, take a look at the training units in SF100 on the Linux command line and scripting.
What follows is taken directly from the CLASSE wiki entry for JupyterHub with just a few modifications.
Python Environments
A Python environment is a local repository of packages, private to one user, plus a copy of the Python interpreter itself. Having a private environment is how we can load a specific version of a package even when the server has a different one. It even lets us install a specific version of Python without affecting anyone else.
When you launch a new notebook, you are presented with a dropdown to select your desired Python kernel. The default Python 3 kernel is a conda environment maintained by CLASSE-IT in /nfs/opt/anaconda3/envs/python3
Adding New Environments
In addition, you can install your own Python environments and have them added as an option when creating new notebooks.
Create your own Python environment using your desired Python installation. Please see the LinuxSoftwareDevelopment wiki page for a list of centrally maintained Python environments, and further down that same page for tips on creating your own conda installation.
Activate the new environment. If using conda, this would look something like:
source /path/to/conda/install/bin/activate
conda activate my-python-env
With the environment active, install anything you like in it, but you MUST at least install ipykernel. For example:
pip install ipykernel
Add the virtual environment as a Jupyter kernel using:
python -m ipykernel install --user --name=my-python-env --display-name "My Python Env"
This adds the kernel to ~/.local/share/jupyter/kernels/ and now it can be used by Jupyter. When you create a new notebook now, “My Python Env” will be one of your choices.
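If you want to check what ended up in that directory, each kernel lives in its own subdirectory containing a kernel.json file. Here is a small sketch that reads those files and reports the display names. The function name is made up for this example, and it assumes the per-user location mentioned above; the jupyter kernelspec list command gives a similar listing from the command line.

```python
import json
from pathlib import Path


def list_user_kernels(kernels_dir):
    """Return {kernel name: display name} for each kernel under kernels_dir."""
    found = {}
    root = Path(kernels_dir).expanduser()
    if not root.is_dir():
        # Nothing installed yet (or a different location): report no kernels.
        return found
    for spec_file in sorted(root.glob("*/kernel.json")):
        with spec_file.open() as handle:
            spec = json.load(handle)
        # The subdirectory name normally matches the value given to --name.
        name = spec_file.parent.name
        found[name] = spec.get("display_name", name)
    return found


print(list_user_kernels("~/.local/share/jupyter/kernels"))
```

After following the steps above, the output should include an entry for my-python-env with the display name “My Python Env”.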