Co-array Fortran on Niagara
Versions 12 and higher of the Intel Fortran compiler, and version 5.1 and up of the GNU Fortran compiler, support almost all of Co-array Fortran, and are installed on Niagara.
This page will briefly sketch how to compile and run Co-array Fortran programs using these compilers.
Example
Here is an example of a Co-array Fortran program:
program Hello_World
  integer :: i      ! Local variable
  integer :: num[*] ! Scalar coarray
  if (this_image() == 1) then
    write(*,'(a)') 'Enter a number: '
    read(*,'(i80)') num
    ! Distribute information to other images
    do i = 2, num_images()
      num[i] = num
    end do
  end if
  sync all ! Barrier to make sure the data has arrived
  ! I/O from all images
  write(*,'(a,i0,a,i0)') 'Hello ', num, ' from image ', this_image()
end program Hello_World
(Adapted from [1]).
Compiling, linking and running Co-array Fortran programs differs depending on whether you will run the program only on a single node (with 40 cores) or on several nodes, and on which compiler you are using, Intel or GNU.
Intel compiler instructions for Coarray Fortran
Loading necessary modules
First, you need to load the module for version 12 or greater of the Intel compilers, as well as Intel MPI.
module load NiaEnv/2019b intel/2019u4 intelmpi/2019u4
There are two modes in which the Intel compiler supports Co-array Fortran:
1. Single node usage
2. Multiple node usage
The way you compile and run for these two cases is different.
Note: For multiple node usage, it makes sense that you need to load the Intel MPI module, since Intel's implementation of Co-array Fortran uses MPI. However, the Intel MPI module is needed even for single-node usage, simply in order to link successfully.
Single node usage
Compilation
ifort -O3 -xHost -coarray=shared -c [sourcefile] -o [objectfile]
Linking
ifort -coarray=shared [objectfile] -o [executable]
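For example, assuming the Hello_World program above is saved as hello_world.f90 (a hypothetical file name), the single-node build could look like this:
# compile, then link, with the shared-memory coarray flag
ifort -O3 -xHost -coarray=shared -c hello_world.f90 -o hello_world.o
ifort -coarray=shared hello_world.o -o hello_world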
Running
To run this co-array program on one node with 80 images (an image is the co-array analogue of what OpenMP calls a thread and MPI calls a process), you simply put
./[executable]
in your job submission script. The reason that this gives 80 images is that HyperThreading is enabled on Niagara nodes, which makes it appear to the system as if there are 80 computing units on a node, even though physically there are only 40.
To control the number of images, you can change the FOR_COARRAY_NUM_IMAGES environment variable:
export FOR_COARRAY_NUM_IMAGES=2
./[executable]
This can be useful for testing.
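For instance, to test the example with only 4 images (the executable name hello_world is assumed from the earlier sketch):
export FOR_COARRAY_NUM_IMAGES=4
./hello_world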
An example submission script would look as follows:
#!/bin/bash
# SLURM submission script for SciNet Niagara (Intel Coarray Fortran)
#
#SBATCH --nodes=1
#SBATCH --time=1:00:00
#SBATCH --cpus-per-task=40
#SBATCH --job-name test

# DIRECTORY TO RUN - $SLURM_SUBMIT_DIR is directory job was submitted from
cd $SLURM_SUBMIT_DIR

# LOAD MODULES THAT THE APPLICATION WAS COMPILED WITH
module load NiaEnv/2019b intel/2019u4 intelmpi/2019u4

# RUN THE APPLICATION WITH 80 IMAGES
export FOR_COARRAY_NUM_IMAGES=80
./[executable]
Multiple node usage
For the newer Intel compilers, please read over the following link: [2]
module load NiaEnv/2019b intel/2019u4 intelmpi/2019u4
Compilation
ifort -O3 -xHost -coarray=distributed -c [sourcefile] -o [objectfile]
Linking
ifort -coarray=distributed [objectfile] -o [executable]
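As a sketch, the hypothetical hello_world.f90 from the single-node example would be rebuilt for multi-node runs by switching only the coarray flag:
# compile, then link, with the distributed-memory coarray flag
ifort -O3 -xHost -coarray=distributed -c hello_world.f90 -o hello_world.o
ifort -coarray=distributed hello_world.o -o hello_world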
Running
Because distributed Co-array Fortran is based on MPI, we need to launch the MPI processes on different nodes. The defaults will work on Niagara; however, the number of images will be equal to the number of nodes times the number of tasks per node (for example, the script below uses 4 nodes with 40 tasks per node, giving 160 images).
An example submission script would look as follows:
#!/bin/bash
#
#SBATCH --nodes=4
#SBATCH --time=1:00:00
#SBATCH --ntasks-per-node=40
#SBATCH --job-name test

# DIRECTORY TO RUN - $SLURM_SUBMIT_DIR is directory job was submitted from
cd $SLURM_SUBMIT_DIR

# LOAD MODULES THAT THE APPLICATION WAS COMPILED WITH
module load NiaEnv/2019b intel/2019u4 intelmpi/2019u4

# EXECUTION: FOR_COARRAY_NUM_IMAGES = nodes * ntasks-per-node
export FOR_COARRAY_NUM_IMAGES=$SLURM_NTASKS
./[executable]
You can provide a configuration file using the ifort '-coarray-config-file=file.cfg' option, which allows you to supply your own MPI parameters, including the number of tasks per host and the total number of tasks, i.e. images.
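As a rough sketch (the file name and executable name below are assumptions, not Niagara-specific settings), the option is given when linking:
ifort -coarray=distributed -coarray-config-file=./cafconfig.txt hello_world.o -o hello_world
Here cafconfig.txt would hold options for the underlying MPI launcher, such as the total number of images (-n) and the number of images per host (-perhost); see the Intel documentation linked above [2] for the exact format.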