Python
Python is a programming language that continues to grow in popularity for scientific computing. It is very fast to write code in, but the resulting software is much slower than C or Fortran; one should be wary of doing too much compute-intensive work in Python.
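As a rough illustration of the speed difference, the sketch below times an explicit Python loop against the built-in sum, which is implemented in C. The function name loop_sum and the problem size are chosen here for illustration only; actual timings depend on the machine and the Python version.

```python
import timeit


def loop_sum(values):
    """Add up the values with an explicit, interpreted Python loop."""
    total = 0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    data = list(range(1_000_000))
    # Time ten repetitions of each approach on the same data.
    t_loop = timeit.timeit(lambda: loop_sum(data), number=10)
    t_builtin = timeit.timeit(lambda: sum(data), number=10)
    print(f"explicit loop: {t_loop:.3f} s, built-in sum: {t_builtin:.3f} s")
```

Both approaches give the same answer; only the time spent in the interpreter differs.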
Python on Niagara
We currently have two families of Python installed on Niagara.
- Regular Python
- Intel Python (a variant of anaconda)
Here we describe the differences between these packages.
Note that it is highly recommended that you use the NiaEnv/2019b stack by loading the corresponding module, i.e.:
module load NiaEnv/2019b
If you do not, you are using the 2018a stack whose python setup is less optimal.
Regular Python
Several Python versions have been installed from source as modules and are optimized for Niagara. We call these 'regular' python versions because they are not dependent on other distribution mechanisms like (ana)conda. Such distributions do not play well with the rest of the software stack, so the 'regular' python modules should be your first choice.
In the Niagara Software Stack version 2019b, i.e., NiaEnv/2019b, the specific versions are 2.7.15, 3.6.8, 3.7.9, 3.8.5, 3.9.8, and 3.11.5, so you can load python 2 or python 3 using one of
module load python/2.7.15
module load python/3.6.8
module load python/3.7.9
module load python/3.8.5
module load python/3.9.8
module load python/3.11.5
These installations come with the following optimized python packages preinstalled:
Package | python/2.7.15 | python/3.6.8 | python/3.7.9 | python/3.8.5 | python/3.9.8 | python/3.11.5 |
---|---|---|---|---|---|---|
cffi | 1.12.2 | 1.12.2 | 1.14.2 | 1.14.2 | 1.15.0 | 1.15.1 |
cython | 0.29.6 | 0.29.6 | 0.29.21 | 0.29.21 | 0.29.24 | 3.0.2 |
daal | 2019.0.0 | 2019.0.0 | 2020.0.133 | 2020.0.133 | 2021.4.0 | 2023.2.1 |
dask | x | x | 2.25.0 | 2.26.0 | 2021.11.1 | 2023.9.1 |
hypothesis | x | x | x | 5.35.3 | 6.24.3 | 6.84.3 |
ipp | 2019.0 | 2019.0 | 2019.4.243 | 2019.4.243 | 2021.4.0 | 2021.9.0 |
ipython | 5.8.0 | 7.4.0 | 7.18.1 | 7.18.1 | 7.29.0 | 8.15.0 |
jinja2 | 2.10 | 2.10 | x | x | x | 3.1.2 |
line-profiler | 2.1.2 | 2.1.2 | 3.0.2 | 3.0.2 | 3.3.1 | 4.1.1 |
matplotlib | 2.2.4 | 3.0.3 | 3.3.1 | 3.3.2 | 3.4.3 | 3.7.3 |
memory-profiler | 0.55.0 | 0.55.0 | 0.57.0 | 0.57.0 | 0.58.0 | 0.61.0 |
mkl | 2019.0 | 2019.0 | x | x | 2021.4.0 | 2023.2.0 |
numba | 0.43.1 | 0.43.1 | 0.51.1 | 0.51.2 | 0.54.1 | 0.58.0 |
numexpr | 2.6.9 | 2.6.9 | 2.7.1 | 2.7.1 | 2.7.3 | 2.8.6 |
numpy | 1.15.1 | 1.15.1 | 1.19.1 | 1.19.2 | 1.20.4 | 1.25.2 |
pandas | 0.24.2 | 0.24.2 | 1.1.1 | 1.1.2 | 1.3.4 | 2.1.0 |
plotnine | x | x | x | x | x | 0.12.3 |
pybind11 | x | x | 2.5.0 | 2.5.0 | 2.8.1 | 2.11.1 |
pytest | x | x | 6.0.1 | 6.0.2 | 6.2.5 | 7.4.2 |
pythran | x | x | x | x | 0.10.0 | 0.14.0 |
PyYAML | 5.1 | 5.1 | 5.3.1 | 5.3.1 | 6.0 | 6.0.1 |
requests | 2.21.0 | 2.21.0 | 2.24.0 | 2.24.0 | 2.26.0 | 2.31.0 |
scalene | x | x | x | x | x | 1.5.30 |
scikit-learn | 0.19.2 | 0.19.2 | 0.23.2 | 0.23.2 | 1.0.1 | 1.3.0 |
scipy | 1.1.0 | 1.1.0 | 1.5.2 | 1.5.2 | 1.7.2 | 1.11.2 |
snakemake | x | x | x | x | x | 7.32.3 |
virtualenv | 16.4.3 | 16.4.3 | 20.0.31 | 20.0.31 | 20.10.0 | 20.24.5 |
xgboost | 0.82 | 0.82 | 1.2.0 | 1.2.0 | 1.5.0 | 2.0.0 |
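To check which versions of these packages the currently loaded python module actually provides, a short script along the following lines could be used. This is a minimal sketch that assumes Python 3.8 or newer (where importlib.metadata is part of the standard library); the function name installed_versions and the list of packages queried are illustrative.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["numpy", "scipy", "pandas", "matplotlib", "numba"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:12s} {version or 'not installed'}")
```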
In the previous NiaEnv/2018a stack, the regular python versions did not have these packages, and users needed to install them in their own home directory. This was wasteful in terms of storage and has occasionally led to quota issues, so we highly recommend using the NiaEnv/2019b stack, which has been the default since September 1, 2020.
Additional packages in these modules should be installed in virtual environments.
Intel Python
The Intel Python modules are based on the Anaconda package, a python distribution that aims to simplify package management. Intel has modified the package, and optimized the libraries to use the MKL libraries, which should make them faster than the Anaconda modules for some calculations. These modifications have also been incorporated in the intel-PACKAGES included in the regular python modules discussed above, but with Intel Python, you also get the conda command. You can load the python 2 version or the python 3 version of intel python with
module load intelpython2
module load intelpython3
Packages in this module can be installed in so-called conda environments (see below), although virtualenv also works.
A word of caution:
Conda environments are very wasteful when it comes to the number of files that they store in your home directory, and there is a good chance you will hit your quota of 250,000 files with only a few conda environments. In addition, because conda is a package manager in its own right, it does not always work well in combination with the rest of the software stack.
Furthermore, the intelpython packages are based on old versions of conda that now have trouble installing packages.
For those reasons, we strongly discourage the use of conda and conda-derived python distributions on Niagara.
Miniconda and Anaconda
If you are looking for anaconda or miniconda, you should find that intelpython is a good substitute. In the NiaEnv/2019b stack, we no longer provide anaconda modules, but we do have aliases conda2 and conda3 for intelpython2 and intelpython3.
We advise against installing your own anaconda or miniconda in your home directory. Doing so would install many more files in your $HOME directory than using the intelpython modules with conda environments, and this might cause trouble with the quota on the number of files. It is better to start from one of the regular python modules and create a virtualenv in which you can install your own packages.
Installing your own Python Modules
If you need to install your own Python modules, either in regular python or with conda, you should set up a virtual or conda environment. Visit the Installing your own Python Modules page for instructions on how to set this up.
We would urge you to remove any conda or virtual environments that you are not using, to help reduce the number of files on the $HOME file system.
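To see which environments contribute most to your file count, something like the following sketch could help. It assumes that your environments live in ~/.virtualenvs and ~/.conda/envs, the locations used on this page; the function names count_files and report are illustrative, and the numbers are only an approximation of what the quota system counts.

```python
import os
from pathlib import Path


def count_files(root):
    """Count the non-directory entries (files and symlinks to files) below root."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def report(env_parents=("~/.virtualenvs", "~/.conda/envs")):
    """Print the file count of every environment directory that exists."""
    for parent in env_parents:
        parent = Path(parent).expanduser()
        if not parent.is_dir():
            continue
        for env in sorted(p for p in parent.iterdir() if p.is_dir()):
            print(f"{count_files(env):8d}  {env}")


if __name__ == "__main__":
    report()
```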
There are many optional and conflicting packages for Python that users could potentially want (see e.g. http://pypi.python.org/pypi). Therefore, users need to install these additional packages locally in their home directories. In fact, there is no choice, as users do not have permissions to install packages system-wide.
Python provides a number of ways to install packages, the most common of which are the pip and conda commands. By default, these commands would install in the same directory as the one in which the python executable lives, but python provides a number of ways for users to install libraries in their home directories instead.
One way to do this with pip is to use the --user option, but you shouldn't. That approach is now mostly superseded by virtual environments, and we do not recommend using the --user option as it can interfere with other Python environments.
Virtual environments are a standard in Python to create isolated Python environments. This is useful when certain modules or certain versions of modules are not available in the default python environment.
Virtual environments can be used either with the regular python modules or the intelpython/anaconda modules.
Note that the use of conda is highly discouraged on Niagara.
Using Virtualenv in Regular Python
Creation
In the terminal, first load a python module, e.g.
module load NiaEnv/2019b python/3.11.5
or (on e.g. Teach)
module load TeachEnv/2022a python/3.11.5
Then create a directory for the virtual environments. One can put a virtual environment anywhere, but this directory structure is recommended:
mkdir ~/.virtualenvs
Now we create our first virtualenv, called myenv (choose any name you like):
virtualenv --system-site-packages ~/.virtualenvs/myenv
The "--system-site-packages" flag will use the system-installed versions of packages rather than installing them anew (the list of these packages can be found on the Python wiki page). This will result in fewer files created in your virtual environment. After that you can activate that virtual environment:
source ~/.virtualenvs/myenv/bin/activate
As you are in the virtualenv now, you can just type
pip install <required module>
to install any module into your virtual environment.
To go back to the normal python installation simply type
deactivate
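If you are unsure whether a virtual environment is currently active, you can ask the Python interpreter itself. The sketch below relies on the fact that, inside a virtual environment, sys.prefix differs from sys.base_prefix (older virtualenv releases set sys.real_prefix instead); the function name in_virtualenv is chosen here for illustration.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # Legacy virtualenv releases (before version 20) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and recent virtualenv releases change sys.prefix but keep sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    print("virtual environment active:", in_virtualenv())
```

Running this before and after the deactivate command should show the interpreter path switching between the environment and the python module.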
Command line and job usage
You need to activate the appropriate environment every time you log in, and at the start of all your job scripts. However, the installation of packages only needs to be done once. In the NiaEnv/2019b stack, it is *not* necessary to load the python module before activating the environment, while in the NiaEnv/2018a stack, you need to load the python module before activating the environment.
Usage of your virtual environment by others
Sharing a virtual environment with another user is easy. As long as the directory containing the virtual environment is readable by that other user (which on Niagara is the default when that user is in the same group as the directory), then they simply have to source the activate file in the bin directory of that environment, e.g.
source /home/g/group/user/.virtualenvs/myenv/bin/activate
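Before pointing a colleague to your environment, you may want to check that its permissions allow group access. The following is a minimal sketch that only inspects the group permission bits of a single path; the function name group_can_use is illustrative, and the check does not verify group membership or the permissions of the parent directories, which must also allow group access.

```python
import os
import stat


def group_can_use(path):
    """True if the group bits allow reading (and, for directories, entering) path."""
    mode = os.stat(path).st_mode
    readable = bool(mode & stat.S_IRGRP)
    if stat.S_ISDIR(mode):
        # A directory also needs the execute bit to be traversed.
        return readable and bool(mode & stat.S_IXGRP)
    return readable


if __name__ == "__main__":
    env = os.path.expanduser("~/.virtualenvs/myenv")
    for path in (env, os.path.join(env, "bin", "activate")):
        if os.path.exists(path):
            print(f"{path}: group access {'ok' if group_can_use(path) else 'missing'}")
        else:
            print(f"{path}: does not exist")
```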
Usage in the Jupyter Hub
You can use your virtual environment in Niagara's Jupyter Hub, but there are two additional steps required to get the Jupyter Hub to know about your environment and to make it one of its possible "kernels" for new notebooks.
After having activated your environment, execute the following command
venv2jup
which is nearly equivalent to the following two commands
pip install ipykernel
python -m ipykernel install --name NAME --user
The first installs the packages needed to interface with jupyter as a kernel; the second puts an entry in the .share/jupyter directory, in which the jupyterhub looks for possible kernels. The advantage of the venv2jup command is that in addition to these two commands, it also corrects some paths in case modules are loaded and checks if all is set up properly. This procedure works for NiaEnv/2020a and NiaEnv/2019b, but may fail for NiaEnv/2018a.
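To see which kernels have been registered for your account, you can inspect the kernel.json files that the installation step writes. The sketch below assumes the usual per-user location on Linux, $HOME/.local/share/jupyter/kernels; that path and the function name list_user_kernels are assumptions made for this example, and the location can differ if XDG_DATA_HOME or JUPYTER_DATA_DIR is set.

```python
import json
from pathlib import Path


def list_user_kernels(kernels_dir):
    """Map kernel name -> display name for every kernel.json below kernels_dir."""
    kernels_dir = Path(kernels_dir)
    kernels = {}
    if not kernels_dir.is_dir():
        return kernels
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        with spec_file.open() as f:
            spec = json.load(f)
        name = spec_file.parent.name
        kernels[name] = spec.get("display_name", name)
    return kernels


if __name__ == "__main__":
    # Assumed default location of per-user kernels on Linux.
    default_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    for name, display in list_user_kernels(default_dir).items():
        print(f"{name}: {display}")
```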
For conda environments that were installed in .conda/envs, the jupyter notebook should pick them up automatically.
Using Virtual Environments in Intelpython/Anaconda
Caveat: Although using conda is possible on Niagara, it is strongly recommended not to do so, as it causes several difficulties.
Creation
One can use the same kind of virtual environments for the intelpython and conda modules as for regular modules. However, environments are built into Anaconda, see [1]. These "conda environments" are not the same as regular virtual environments, as they can contain general packages, such as compilers. The latter feature means that conda environments are much more flexible, but also that they do not cooperate well with other software modules on Niagara, and will create tens to hundreds of thousands of files that can easily cause issues with your file quota on $HOME. Therefore, you should always use regular virtual environments and pip on Niagara and not conda, unless you have a good reason not to.
First, you just need to load a conda-like module, e.g.
module load NiaEnv/2019b intelpython3
Then, you create a virtual environment
conda create -n myPythonEnv python=3.6
(conda puts the environment in the directory $HOME/.conda/envs/myPythonEnv)
Next, you activate your conda environment:
source activate myPythonEnv
At this point you are in your own environment and can just do the installation of any package that you need, e.g.
pip install myFAVpackage
or
conda install myFAVpackage
To go back to the normal python installation, type
source deactivate
Command line and job usage
You need to load the intelpython/anaconda module and activate the appropriate environment every time you log in, and at the start of all your job scripts. However, the installation of packages only needs to be done once.
Usage in the Jupyter Hub
You can use conda environments in Niagara's Jupyter Hub. If they were installed in .conda/envs, the jupyter notebook should pick them up automatically.
Cleaning up conda
Once the installation of a package finishes, please clean the cache:
conda clean -y --all
rm -rf $HOME/.conda/pkgs/*
If you do not need a conda environment anymore, make sure to remove it:
conda remove --name myPythonEnv --all
To verify that the environment was removed, run:
conda info --envs
Installing the Scientific Python Suite
For many scientific codes the packages numpy, scipy, matplotlib, pandas and ipython are used. Versions of these are already in the python modules (except for the regular python modules in the NiaEnv/2018a stack).
However, if you need different versions, you could create your virtual environment without --system-site-packages. In that case, for regular python modules, please install the versions of packages with an intel- prefix, if they exist, so that you will get the most optimized version of each package.
Running serial Python jobs
As with all serial jobs, if your Python computations do not use multiple cores, you should bundle them up so that all 40 cores of a node are performing work. Examples of this can be found on this page.
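One way to bundle independent serial computations from within Python itself is a process pool from the standard library. This is a sketch of that approach, not necessarily the method shown on the linked page; the function work and its inputs are placeholders for your own computation.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def available_cores():
    """Number of cores this process may use (respects the scheduler's CPU binding on Linux)."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # sched_getaffinity is not available on all platforms
        return os.cpu_count() or 1


def work(n):
    """Placeholder for one independent serial computation."""
    return sum(i * i for i in range(n))


def main():
    tasks = [200_000 + i for i in range(80)]  # hypothetical inputs, more tasks than cores
    with ProcessPoolExecutor(max_workers=available_cores()) as pool:
        results = list(pool.map(work, tasks))
    print(f"ran {len(results)} tasks on {available_cores()} cores")


if __name__ == "__main__":
    main()
```

Each task runs in its own process, so on a dedicated Niagara node up to 40 tasks execute at the same time.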
Using a Jupyter Notebook
Jupyter Hub
You may develop your Python scripts in a Jupyter Notebook on Niagara. A node has been set aside as a Jupyter Hub. See the Jupyter Hub page for details on how to access that node, and develop your code.
The Jupyter Hub is a shared resource, much like the login nodes. You should not use it for extensive computations. For that you'll need to run Jupyter on a compute node.
Running Jupyter on a Niagara Compute Node
If you need more memory or more cores for your notebook calculation, you should request a node through the scheduler and run Jupyter on it yourself.
1. To be able to run Jupyter on a compute node, you must first (a) install it inside a virtual environment, (b) enable a way for jupyter to seemingly write to a specific directory on $HOME, and (c) create a little helper script called notebook.sh that will be used to start the jupyter server in step 2. These are the commands that you should use for the installation (which you should do only once, on a login node):
(a) Create virtual env
$ module load NiaEnv/2019b python/3.8.5
$ virtualenv --system-site-packages $HOME/.virtualenvs/jupyter
$ source $HOME/.virtualenvs/jupyter/bin/activate
$ pip install jupyter jupyterlab
$ deactivate
You can choose another directory than $HOME/.virtualenvs/jupyter for where to create the virtual environment, but you need to be consistent and use the same directory everywhere.
(b) Make a writable 'runtime' directory for Jupyter.
$ mkdir -p $HOME/.local/share/jupyter/runtime
$ mv -f $HOME/.local/share/jupyter/runtime $SCRATCH/jupyter_runtime || mkdir $SCRATCH/jupyter_runtime
$ ln -sT $SCRATCH/jupyter_runtime $HOME/.local/share/jupyter/runtime
(c) Create a launch script.
$ cat > $HOME/.virtualenvs/jupyter/bin/notebook.sh <<EOF
#!/bin/bash
source \$HOME/.virtualenvs/jupyter/bin/activate
export XDG_DATA_HOME=\$SCRATCH/.share
export XDG_CACHE_HOME=\$SCRATCH/.cache
export XDG_CONFIG_HOME=\$SCRATCH/.config
export XDG_RUNTIME_DIR=\$SCRATCH/.runtime
export JUPYTER_CONFIG_DIR=\$SCRATCH/.config/.jupyter
jupyter \${1:-notebook} --ip \$(hostname -f) --no-browser --notebook-dir=\$PWD
EOF
$ chmod +x $HOME/.virtualenvs/jupyter/bin/notebook.sh
2. To run the jupyter server on a compute node, start an interactive session with the salloc command (debugjob would also work) and then launch the server:
$ salloc --time=2:00:00 -N 1 -n 40   # get one dedicated node for two hours with 40 cores.
$ cd $SCRATCH   # $HOME is read-only, so move to $SCRATCH
$ $HOME/.virtualenvs/jupyter/bin/notebook.sh   # add the argument "lab" to start with the jupyter lab
3. Make sure you note down (a) the name of the compute node that you got allocated (the salloc command will let you know; the names start with "nia" followed by a 4-digit number), and (b) the last URL that notebook.sh tells you to use to connect.
4. To connect to this jupyter server running on a compute node, which is not accessible from the internet, you must reconnect to niagara, in a different terminal on your own computer, with a port-forwarding tunnel to the compute node on which jupyter is running:
$ ssh -L 8888:niaXXXX:8888 USERNAME@niagara.scinet.utoronto.ca -N
where niaXXXX is the name of the compute node (point (a) above), and USERNAME should be your Digital Research Alliance of Canada username. This command should just "hang" there; it only serves to forward port number 8888 to port 8888 on the compute node.
Finally, point your browser to the URL that the notebook.sh command printed out (point (b) above), i.e., the one with 127.0.0.1 in it.
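If the browser cannot connect, it can help to check from your own computer whether anything is listening on the forwarded local port. The following is a minimal sketch using only the standard library; the function name port_is_open is illustrative, and port 8888 is the one used in the ssh command above. An open port only shows that the tunnel is listening locally, not that the jupyter server at the other end is running.

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("local port 8888 accepts connections; the tunnel appears to be up")
    else:
        print("nothing is listening on local port 8888; is the ssh tunnel running?")
```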
Producing Matplotlib Figures on Niagara Compute Nodes and in Job Scripts
The conventional way of producing figures from python using matplotlib, i.e.,
import matplotlib.pyplot as plt
plt.plot(.....)
plt.savefig(...)
will not work on the Niagara compute nodes. The reason is that pyplot will try to open the figure in a window on the screen, but the compute nodes do not have screens or window managers. There is an easy workaround, however, that sets up a different 'backend' to matplotlib, one that does not try to open a window, as follows:
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
plt.plot(.....)
plt.savefig(...)
It is essential that the mpl.use('Agg') command precedes the importing of pyplot.
Using mpi4py
Several of the Python installations contain mpi4py preinstalled. However, using mpi4py requires loading an MPI module. There are several combinations of compiler/MPI/python modules which can be used.
Using Regular Python
The Python in the regular python module (compiled from source) does not come with mpi4py. You will need to install mpi4py in your own storage space, preferably in a virtual environment.
$ module load NiaEnv/2019b gcc/8.3.0 intelmpi/2019u5 python/3.6.8
$ virtualenv --system-site-packages ~/.virtualenvs/mpi4pyenv
$ source ~/.virtualenvs/mpi4pyenv/bin/activate
(mpi4pyenv)$ pip install mpi4py
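To check that the installation works, a small test program such as the following sketch can be used. The file name hello_mpi.py and the helper function describe are chosen here for illustration; inside a job, the script would typically be started with the MPI launcher of the loaded MPI module, for example mpirun -np 4 python hello_mpi.py.

```python
import socket
import sys


def describe(rank, size, host):
    """Format one line identifying an MPI process."""
    return f"rank {rank} of {size} on {host}"


def main():
    try:
        from mpi4py import MPI  # requires an MPI module and mpi4py in the active environment
    except ImportError:
        print("mpi4py could not be imported; is the virtual environment active?", file=sys.stderr)
        return
    comm = MPI.COMM_WORLD
    print(describe(comm.Get_rank(), comm.Get_size(), socket.gethostname()))


if __name__ == "__main__":
    main()
```

Each MPI process should print one line with its own rank.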
Using intelpython
Using either the NiaEnv/2019b or NiaEnv/2018a stack (the most recent software stack is always recommended), the intelpython modules all have mpi4py, and should all work if an MPI module is also loaded. An example of this, using the NiaEnv/2019b stack, might be
$ module load NiaEnv/2019b
$ module load intel/2019u4 intelmpi/2019u4
$ module load intelpython3/2019u4
Other combinations of compilers (intel/gcc) or MPI module (intelmpi/openmpi) will also work with intelpython.
Using Anaconda
Under the NiaEnv/2018a stack, anaconda is available as a module. This module does not come with mpi4py, but mpi4py can be installed using the usual steps:
$ module load gcc/7.3.0 openmpi/3.1.1
$ module load anaconda3/2018.12
$ conda create -n myenv
$ source activate myenv
(myenv) $ conda install mpi4py
Error messages
When using openmpi with mpi4py, you may get an error of this type:
pml_ucx.c:285 Error: UCP worker does not support MPI_THREAD_MULTIPLE
Add the following lines to your Python script, BEFORE you import the mpi4py package:
import mpi4py.rc
mpi4py.rc.threads = False
Alternatively, you can edit the __init__.py file in your virtualenv's mpi4py directory (venv/lib/python3.8/site-packages/mpi4py for example), and change the 'thread_level' to 'funneled':
thread_level = 'funneled'
This should change the level of mpi4py's thread support.
SciNet's Python Classes
There is a dizzying amount of documentation available for programming in Python on the Python.org webpage. That being said, each fall, SciNet runs two 4-week classes on using Python for research:
- SCMP142: Introduction to Programming with Python. This class is intended for those with little-to-no programming experience who wish to learn how to program.
- SCMP112: Introduction to Scientific Computing with Python. This class focusses on using Python to perform research computing.
An excellent set of material for teaching scientists to program in Python is also available at the Software Carpentry homepage.