Python
Python is a programming language that continues to grow in popularity for scientific computing. It is very fast to write code in, but the resulting software is much slower than C or Fortran; one should be wary of doing too much compute-intensive work in Python.
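As a rough illustration of the speed difference, the sketch below times an explicit Python loop against the built-in sum, which is implemented in C. The function name loop_sum and the array size are made up for this example, and the timings will vary from machine to machine.

```python
import timeit


def loop_sum(values):
    """Add up the values with an explicit, interpreted Python loop."""
    total = 0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    data = list(range(1000000))

    # Time ten repetitions of each approach on the same data.
    t_loop = timeit.timeit(lambda: loop_sum(data), number=10)
    t_builtin = timeit.timeit(lambda: sum(data), number=10)

    print("explicit loop: {:.3f} s".format(t_loop))
    print("built-in sum:  {:.3f} s".format(t_builtin))
```

Both approaches compute the same result; only the time spent in the interpreter differs.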
Python on Niagara
We currently have two families of Python installed on Niagara.
- Regular Python
- Intel Python (a variant of anaconda)
Here we describe the differences between these packages.
Regular Python
Python versions 2.7 and 3.6 have been installed from source and are optimized for Niagara. We call these 'regular' python versions because they are not dependent on other distribution mechanisms like (ana)conda. Such distributions do not play well with the rest of the software stack, so the 'regular' python modules should be your first choice.
In the Niagara Software Stack version 2019b, i.e., NiaEnv/2019b, the specific versions are 2.7.15 and 3.6.8, so you can load python 2 or python 3 using
module load python/2.7.15
module load python/3.6.8
Both these installations come with the following optimized python packages preinstalled:
virtualenv intel-numpy intel-scipy intel-scikit-learn ipp daal jinja2 cython matplotlib ipython numba numexpr pandas line_profiler memory_profiler funcsigs pycosat pyeditline pyOpenSSL PySocks PyYAML requests xgboost
In this list, an intel-PACKAGE package provides an Intel-optimized version of PACKAGE, often using Intel's high performance Math Kernel Library. You use these packages in python the same way you would non-optimized versions, i.e., import PACKAGE.
In the previous NiaEnv/2018a stack, the regular python versions did not have these packages, and users needed to install them in their own home directory. This was wasteful in terms of storage and has occasionally led to quota issues, so we highly recommend using the NiaEnv/2019b packages.
Additional packages in these modules should be installed in virtual environments.
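To check from within Python which of these packages the loaded interpreter can actually import, something like the following sketch may help. The helper name find_missing and the selection of packages are choices made for this example. Note that the import name can differ from the package name.

```python
import importlib.util

# Import names for a few of the packages listed above. The import name is
# not always the package name (scikit-learn -> sklearn, PyYAML -> yaml).
IMPORT_NAMES = ["numpy", "scipy", "sklearn", "matplotlib", "pandas", "numba", "yaml"]


def find_missing(import_names):
    """Return the top-level names the current interpreter cannot import."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(IMPORT_NAMES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

This only looks the modules up and does not import them, so it runs quickly even for large packages.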
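As a minimal sketch of creating such an environment, the example below uses the venv module from the Python 3 standard library rather than the virtualenv package listed above. This is a substitution made for the example, and it does not apply to the python 2 module. The function name create_env is hypothetical. The system_site_packages option is chosen here on the assumption that you want the preinstalled optimized packages to remain importable inside the environment.

```python
import os
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment that still sees the module's packages."""
    builder = venv.EnvBuilder(
        system_site_packages=True,   # keep preinstalled packages importable
        with_pip=with_pip,           # put pip inside the environment
        symlinks=(os.name != "nt"),  # link to the interpreter instead of copying
    )
    builder.create(str(env_dir))
    return Path(env_dir)
```

Calling create_env with a directory of your choice creates the environment there; on Linux it can then be activated in the shell with the bin/activate script inside that directory.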
Intel Python
The Intel Python modules are based on the Anaconda package, a python distribution that aims to simplify package management. Intel has modified the package, and optimized the libraries to use the MKL libraries, which should make them faster than the Anaconda modules for some calculations. These modifications have also been incorporated in the intel-PACKAGES included in the regular python modules discussed above, but with Intel Python, you also get the conda command. You can load the python 2 version or the python 3 version of intel python with
module load intelpython2
module load intelpython3
Packages in this module can be installed in so-called conda environments (see below), although virtualenv also works.
A word of caution: conda environments are very wasteful when it comes to the number of files that they store in your home directory, and there is a good chance you will hit your quota of 250,000 files with only a few conda environments. Moreover, because conda is a package manager in its own right, it does not always work well in combination with the rest of the software stack.
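To see how many files each environment contributes, a short script along these lines can help. It is only a sketch: the function names are made up for this example, the directory to inspect is passed on the command line, and it counts non-directory entries only, which may differ from how the quota system counts.

```python
import os
import sys


def count_files(root):
    """Count non-directory entries below root (symlinked dirs are not followed)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def count_by_subdirectory(parent):
    """Map each immediate subdirectory of parent to its file count."""
    counts = {}
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            counts[entry.name] = count_files(entry.path)
    return counts


if __name__ == "__main__":
    # Directory that holds your environments; defaults to the current one.
    parent = sys.argv[1] if len(sys.argv) > 1 else "."
    counts = count_by_subdirectory(parent)
    # Print the largest contributors first.
    for name, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        print("{:>8d}  {}".format(n, name))
```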
Miniconda and Anaconda
If you are looking for anaconda or miniconda, you should find that intelpython is a good substitute. In the NiaEnv/2019b stack, we no longer provide anaconda modules, but we do have aliases conda2 and conda3 for intelpython2 and intelpython3.
We advise against installing your own anaconda or miniconda in your home directory. Instead, start from one of the intelpython modules and use conda environments, or, even better, start from a regular python module and create a virtualenv in which you can install your own packages. Installing your own anaconda or miniconda would cause many more files to be installed in your $HOME directory, and this might cause trouble with the quota on the number of files.
Installing your own Python Modules
If you need to install your own Python modules, either in regular python or with conda, you should set up a virtual or conda environment. Visit the Installing your own Python Modules page for instructions on how to set this up.
We urge you to remove any conda or virtual environments that you are not using, to help reduce the number of files on the $HOME file system.
Running serial Python jobs
As with all serial jobs, if your Python computations do not use multiple cores, you should bundle them up so that the 40 cores of a node are all performing work. Examples of this can be found on this page.
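One possible way to bundle serial runs from within Python itself is sketched below. This is an illustration under assumptions, not necessarily the approach used in the examples referred to above: the function names and the toy workload are hypothetical, and each run is started as a separate process so that it can occupy its own core.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one serial command; return its exit code and captured stdout."""
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.returncode, result.stdout.strip()


def run_bundle(commands, max_workers=40):
    """Run independent serial commands, at most max_workers at a time.

    The threads only wait on child processes, so the real work happens in
    the children. Results are returned in the order of `commands`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Hypothetical workload: eight independent runs that square their input.
    commands = [[sys.executable, "-c", "print({} ** 2)".format(n)]
                for n in range(8)]
    results = run_bundle(commands, max_workers=40)
    failed = [i for i, (code, _out) in enumerate(results) if code != 0]
    print("{} runs succeeded, {} failed".format(len(results) - len(failed),
                                                len(failed)))
```

In practice each command would be your own serial script with its own input, and max_workers would match the number of cores on the node.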
Using a Jupyter Notebook
You may develop your Python scripts in a Jupyter Notebook on Niagara. A node has been set aside as a Jupyter Hub. See this page for details on how to access that node, and develop your code.
Producing Matplotlib Figures on Niagara Compute Nodes and in Job Scripts
The conventional way of producing figures from python using matplotlib, i.e.,
import matplotlib.pyplot as plt
plt.plot(.....)
plt.savefig(...)
will not work on the Niagara compute nodes. The reason is that pyplot will try to open the figure in a window on the screen, but the compute nodes do not have screens or window managers. There is an easy workaround, however, that sets up a different 'backend' to matplotlib, one that does not try to open a window, as follows:
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
plt.plot(.....)
plt.savefig(...)
It is essential that the mpl.use('Agg') command precedes the importing of pyplot.
SciNet's Python Classes
There is a dizzying amount of documentation available for programming in Python on the Python.org webpage. That being said, each fall, SciNet runs two 4-week classes on using Python for research:
- SCMP142: Introduction to Programming with Python. This class is intended for those with little-to-no programming experience who wish to learn how to program.
- SCMP112: Introduction to Scientific Computing with Python. This class focusses on using Python to perform research computing.
An excellent set of material for teaching scientists to program in Python is also available at the Software Carpentry homepage.