PE101-01: Using Python Packages and Modules

By itself, Python provides everything you need to write programs. These programs won’t have a fancy user interface and they may not run very fast, but they’ll work. If that’s all Python offered, it might have become a popular language but it wouldn’t have taken over most of the world the way it has. No, what Python has going for it is a simple way to take commonly used chunks of code, wrap them up neatly into sharable bundles, and distribute those bundles far and wide. Python calls these bundles packages.

In this training unit, PE101-01, we’re going to look at some of the packages that come with Python. These are packages that you can count on being available anywhere you can run Python. In the next unit, PE101-02, we’ll look at how to find and use packages hosted in repositories available to anyone but not necessarily already installed where you’re running your programs.

Python is, by itself, a rather simple language. The PE100 series of units has introduced you to almost all of the language. The language is kept small by moving the “nice to have, but not really necessary” parts into their own independent packages. Let’s start with an example:

pi equals 3.141592653589793
There are 2652 possible outcomes when drawing two cards from a deck
The natural logarithm of 7.994 is 2.0786912602891316
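Three lines of output like the ones above could come from a short script such as the following. This is a minimal sketch, not necessarily the original code: it assumes the card count comes from math.perm() and the logarithm from math.log(), and the variable names are ours.

```python
import math

# A constant that lives in the math namespace
print("pi equals", math.pi)

# Ordered ways to draw 2 cards from a 52-card deck: 52 * 51
outcomes = math.perm(52, 2)
print("There are", outcomes, "possible outcomes when drawing two cards from a deck")

# math.log() with one argument is the natural (base e) logarithm
natural_log = math.log(7.994)
print("The natural logarithm of 7.994 is", natural_log)
```

Note that math.perm() needs Python 3.8 or later.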

Packages

There are oodles of mathematical functions already implemented for you in the math package. For the current list, see “math - Mathematical functions” in the Python documentation.

Taking a look at the code above, the first thing we notice is the line import math. This tells the Python interpreter to find the package named “math” and to open it up and make its contents available to this session. The things in the package we can get to will all be named by the word “math”, a period, and then the name of the actual part of the package to use. We would say the package “math” is imported into the “math namespace”. This is the default behavior, but we can change that. Indeed:

0.5728159131285796

By using the as keyword in our import statement, we’re telling Python to load the “random” package but let us refer to everything as though its name was “rand”. In a little more detail, we’re creating a namespace “rand” instead of just letting Python automatically create a namespace with the same name as the package and load everything into that space.
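A minimal sketch of an import with the as keyword, assuming the number shown above came from a call to random(), which returns a float of at least 0.0 and less than 1.0:

```python
import random as rand

# Everything in the "random" package is now reached through the name "rand"
value = rand.random()
print(value)
```

The number printed will differ on every run. After this import, the name “random” itself is not defined in the session; only “rand” is.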

To see a current list of the packages that come with a standard Python installation, take a look at “The Python Standard Library” page in the Python documentation. Its first few sections list the “built-in” capabilities, which are what you can use without importing anything. The rest of the page lists the available packages. Click on any of them for details.
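You can also ask Python itself for a rough version of the same information. The sketch below is one way to do it, under the assumption that you are running Python 3.10 or later, where sys.stdlib_module_names exists; on older versions the second list simply comes back empty.

```python
import builtins
import sys

# Names you can use without importing anything (print, len, range, ...)
builtin_names = [name for name in dir(builtins) if not name.startswith("_")]

# Names of the modules shipped with this Python (available in 3.10 and later)
stdlib_names = sorted(getattr(sys, "stdlib_module_names", ()))

print(len(builtin_names), "built-in names, for example:", builtin_names[:5])
print(len(stdlib_names), "standard library modules, for example:", stdlib_names[:5])
```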

Modules

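So far we’ve seen functions and constants placed into packages and directly accessible with just the package name. If you have a large package, or a package that has lots of custom changes to manage, it can be helpful to break things up into modules. Think of a module as a “sub-package”. (The Python documentation is stricter about these words: there, a module is a single file of Python code and a package is a collection of modules. We use the terms loosely in this unit.) Package and module names are separated by periods. Let’s take a look…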

all is good.

We imported the “path” module from the “os” package and loaded it into a namespace called “op”. Then we were able to use that namespace to get to the exists() function. We checked to see if the “/usr/bin” directory exists. That is, as you might suspect, a critically important directory.
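The check described above can be sketched as follows. The helper name describe() and the wording of the failure message are ours for illustration; only the import, the exists() call, and the “all is good.” message come from the text.

```python
import os.path as op

def describe(path):
    """Report whether a file or directory exists at the given path."""
    if op.exists(path):
        return "all is good."
    return "uh oh, " + path + " is missing!"

print(describe("/usr/bin"))
```

On a Linux server this should print the same message as above. On a system without a /usr/bin directory it prints the failure message instead.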

Coming up next: Packages from the outside world

As we keep saying, one of the biggest (if not the biggest) strengths of Python is the half million packages that people have written and made publicly available. In the next unit, PE101-02: Repositories, Sharing, and Conda, we’ll take a look at how to find those packages, copy them to CHESS servers, and use them in your own notebooks.


PE101-02: Repositories, Sharing, and Conda

As we’ve mentioned, there is an awful lot of Python code out in the world completely free for us to use. There are packages as broadly useful as “SciPy” (numerical methods for the sciences) and as narrowly interesting as “bosch-thermostat-client” for setting values in Bosch thermostats. With so many packages available, if there is something you need to do in a Jupyter notebook or in a Python program, there is a good chance someone else has done at least part of it and made it available as a package.

Publicly-available packages have to be kept somewhere to be useful. If programmers can’t find them, then they might as well not be made public. Fortunately, the Python world has a central repository - www.pypi.org. The repository (usually shortened to just “repo”, both vowels are long) is searchable.

One thing to notice in the repository is that packages there have version numbers. It’s also pretty common for one package to require another - scikit-learn requires scipy, for instance. Sometimes the requirements also have version numbers. There can be cases where Package A requires Package B, and specifically Package B has to have a version number greater than or equal to 11. When you have a lot of packages to import it can get cumbersome to check all the dependencies and make sure you have a combination that satisfies all the constraints.
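To make a single constraint concrete, here is a minimal sketch of a “greater than or equal to” version check. It assumes plain dotted numeric versions such as “11.2.0”; real version strings can be messier, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn '11.2.0' into (11, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def satisfies_minimum(installed, minimum):
    """True when the installed version is at least the minimum version."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '11' and '11.0' compare as equal
    width = max(len(have), len(need))
    have = have + (0,) * (width - len(have))
    need = need + (0,) * (width - len(need))
    return have >= need

# Hypothetical case: Package A requires Package B at version 11 or later
print(satisfies_minimum("11.2.0", "11"))  # True
print(satisfies_minimum("9.8.1", "11"))   # False
```

Comparing tuples of integers rather than raw strings matters: as strings, “9.8.1” would sort after “11.2.0”.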

In fact, it’s more than just cumbersome. Automatic checking is the only practical solution once the problem grows beyond a handful of packages, and a piece of software that does this is called a SAT (Boolean satisfiability) solver.
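To get a feel for why this needs automation, the sketch below tries every combination of versions for three made-up packages and keeps the ones that satisfy all the constraints. It is a brute-force search, not a SAT solver, and every package name, version number, and constraint in it is hypothetical.

```python
from itertools import product

# Hypothetical catalogue: each package and the versions available for it
available = {
    "A": [1, 2],
    "B": [10, 11, 12],
    "C": [3, 4],
}

# Hypothetical constraints; each one inspects a candidate combination
constraints = [
    lambda pick: pick["A"] < 2 or pick["B"] >= 11,  # A version 2 needs B >= 11
    lambda pick: pick["C"] < 4 or pick["B"] <= 11,  # C version 4 needs B <= 11
    lambda pick: pick["A"] == 2,                    # we specifically want A version 2
]

def find_combinations(available, constraints):
    """Try every combination of versions and keep those passing every rule."""
    names = sorted(available)
    solutions = []
    for versions in product(*(available[name] for name in names)):
        pick = dict(zip(names, versions))
        if all(rule(pick) for rule in constraints):
            solutions.append(pick)
    return solutions

solutions = find_combinations(available, constraints)
for pick in solutions:
    print(pick)
```

Three packages with a few versions each give only 12 combinations to try. The count multiplies with every package added, which is why real tools use far smarter search than this.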

Conda - hard problems made solvable

Fortunately, there is Conda, a software tool and a repository of its own. The conda developers keep a subset of the half million packages that are available and they ensure that their repository reflects a combination of versions that should work together. They do the hard work, we take advantage of it. They stay in business by selling their tools to commercial users but, being a research organization, we’re not required to pay.

Conda also has another useful trick: it can take advantage of Python’s virtual environments to let you load outside packages into a completely private space. This way, when you download and install the “instantnobelprize” package (I made that up), it’s only written to your own directories. Other users, and the system as a whole, are protected from whatever it might contain.

Setting up Conda and using it with Jupyter notebooks takes a little bit of work and has to be done from the command line, but it’s often worth it. If you haven’t used the command line yet, take a look at the training units in SF100 on the Linux command line and scripting.

What follows is taken directly from the CLASSE wiki entry for JupyterHub with just a few modifications.

Python Environments

A Python environment is a local repository of packages, private to one user, plus a copy of the Python interpreter itself. Having a private environment is how we can load a specific version of a package even when the server has a different one. It even lets us install a specific version of Python without affecting anyone else.
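To see which interpreter and environment a notebook or script is actually running in, something like the following sketch can help. It assumes that conda sets the CONDA_DEFAULT_ENV environment variable when an environment is activated; the function name is ours for illustration.

```python
import os
import sys

def describe_environment():
    """Collect a few facts about the Python that is running this code."""
    return {
        "executable": sys.executable,                      # path to the interpreter
        "prefix": sys.prefix,                              # where its packages live
        "in_venv": sys.prefix != sys.base_prefix,          # True inside a venv-style environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if no conda environment is active
    }

for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this in two different kernels should show two different interpreter paths.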

When you launch a new notebook, you are presented with a dropdown to select your desired Python kernel. The default Python 3 kernel is a conda environment maintained by CLASSE-IT in /nfs/opt/anaconda3/envs/python3.

Adding New Environments

In addition, you can install your own Python environments and have them added as an option when creating new notebooks.

Create your own Python environment using your desired Python installation. Please see LinuxSoftwareDevelopment for a list of centrally maintained Python environments, and further down that page for tips on creating your own conda installation.

Install anything you like in the environment, but you MUST at least install ipykernel. For example:

pip install ipykernel

Activate the new environment. If using conda, this would look something like:

source /path/to/conda/install/bin/activate
conda activate my-python-env

Add the environment as a Jupyter kernel using:

python -m ipykernel install --user --name=my-python-env --display-name "My Python Env"

This adds the kernel to ~/.local/share/jupyter/kernels/ and now it can be used by Jupyter. When you create a new notebook now, “My Python Env” will be one of your choices.
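If you want to confirm what has been registered, the kernels directory can be read directly. The sketch below assumes each kernel lives in its own subdirectory containing a kernel.json file with a “display_name” entry; the function name is hypothetical.

```python
import json
from pathlib import Path

def list_user_kernels(kernels_dir=Path.home() / ".local/share/jupyter/kernels"):
    """Map each kernel's directory name to the display name shown in Jupyter."""
    kernels = {}
    if not kernels_dir.is_dir():
        return kernels  # nothing has been registered yet
    for spec in sorted(kernels_dir.glob("*/kernel.json")):
        with spec.open() as handle:
            kernels[spec.parent.name] = json.load(handle).get("display_name", "")
    return kernels

for name, display_name in list_user_kernels().items():
    print(f"{name}: {display_name}")
```

After the steps above, the output should include a line for my-python-env with the display name “My Python Env”.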
