OpenFOAM on BGQ

Using OpenFOAM on BG/Q

Several OpenFOAM versions are installed on BG/Q. To list them, type "module avail" in a terminal:

OpenFOAM/2.3.1(default)
OpenFOAM/2.4.0
OpenFOAM/3.0.1
OpenFOAM/5.0

and

FEN/OpenFOAM/2.2.0
FEN/OpenFOAM/2.3.0
FEN/OpenFOAM/2.4.0
FEN/OpenFOAM/3.0.1
FEN/OpenFOAM/5.0

Modules whose names start with FEN are the installations that can be used on the Front-End Nodes (FEN). To run serial tasks such as blockMesh, decomposePar or reconstructParMesh, use the FEN/OpenFOAM/* modules. Keep in mind that the FEN are not dedicated: each Front-End Node is shared among all connected users and has only 32 GB of memory. If you try to decompose a case with 100 million cells, you will use up the memory of the whole FEN machine and make it unavailable for everyone.

To submit a job, use a batch script on the FEN and load the modules you need inside that script. This is the only way of using the compute nodes on BG/Q. Sample batch scripts are given in the sections below. You can use them as templates and modify them according to your needs.

Running Serial OpenFOAM Tasks

As noted in the previous section, serial tasks must be run with one of the FEN-based modules. The most common serial tasks are:

blockMesh: Creates the block-structured computational volume, which consists of hex elements.
decomposePar: Partitions the grid of a serial case so that it can be run in parallel.
reconstructPar: Reconstructs a parallel case (results).
reconstructParMesh: Reconstructs a parallel case (mesh).

These binaries are not available on the compute nodes, so these tools can only be used on the FEN.
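
The serial preparation steps on a FEN can be sketched as a small shell function. This is only an illustration: prepare_case is a hypothetical helper name, the case path is a placeholder, and the sketch assumes that a FEN/OpenFOAM module has already been loaded so that blockMesh and decomposePar are on the PATH.

```shell
# Sketch: run the serial pre-processing steps for one case on a FEN.
# prepare_case is a hypothetical helper, not part of OpenFOAM.
prepare_case() {
    case_dir=$1
    # Refuse to continue if the FEN tools are not on the PATH.
    for tool in blockMesh decomposePar; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found: load a FEN/OpenFOAM module first" >&2
            return 1
        fi
    done
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$case_dir" || exit 1
        blockMesh && decomposePar -force
    )
}
```

It would be called as, for example, prepare_case "$SCRATCH/myCase", where myCase is a placeholder for your own case directory.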

Parallelizing OpenFOAM Cases

To run OpenFOAM in parallel, the problem must be decomposed into a number of subdomains that matches the number of processors to be used. OpenFOAM provides the decomposePar utility for this operation. It is controlled by an OpenFOAM dictionary called decomposeParDict in the system directory of your case folder; this file is the input for the command "decomposePar -force". Below is an example file for decomposing an OpenFOAM case to run on 4 cores.

system/decomposeParDict

/*--------------------------------*- C++ -*----------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.4.0                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 4;

method          simple;

simpleCoeffs
{
    n               ( 2 2 1 );
    delta           0.001;
}

// ************************************************************************* //

Another decomposition method is hierarchical. As with simple, you have to define a coefficients block, here called hierarchicalCoeffs. The only difference between simple and hierarchical is that the hierarchical method lets you define the order of the decomposition (xyz or zyx). OpenFOAM supports more sophisticated decomposition methods, but since decomposition is a serial task that must be performed on the FEN, these two methods are suggested.
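
As an illustration of the hierarchical method, the sketch below writes a minimal system/decomposeParDict from the shell. The helper name write_hierarchical_dict and its argument order are hypothetical, the delta value is simply copied from the simple example above, and the function overwrites any existing decomposeParDict in the case.

```shell
# Sketch: write system/decomposeParDict using the hierarchical method.
# write_hierarchical_dict is a hypothetical helper; arguments are the case
# directory, the subdomain counts in x, y, z, and the split order.
write_hierarchical_dict() {
    case_dir=$1; nx=$2; ny=$3; nz=$4; order=${5:-xyz}
    mkdir -p "$case_dir/system" || return 1
    cat > "$case_dir/system/decomposeParDict" <<EOF
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}

numberOfSubdomains $((nx * ny * nz));

method          hierarchical;

hierarchicalCoeffs
{
    n               ( $nx $ny $nz );
    delta           0.001;
    order           $order;
}
EOF
}
```

For example, write_hierarchical_dict myCase 16 8 8 zyx would produce a dictionary for 1024 subdomains that is split first in z, then y, then x (myCase is a placeholder).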

The crucial entry in decomposeParDict is numberOfSubdomains. It must match the number of cores (MPI ranks) you intend to use. For example, to run a case on 64 nodes using all 16 cores of each node, numberOfSubdomains should be 64 x 16 = 1024. The product of the n values must also equal this number; otherwise OpenFOAM will complain about the mismatch.
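
The consistency rule can be checked before running decomposePar with a few lines of shell arithmetic. The following is a minimal sketch; check_decomposition is a hypothetical helper, not an OpenFOAM tool, and it takes the node count (bg_size), the ranks per node and the three n values as arguments.

```shell
# Sketch: check that n = (nx ny nz) matches the number of MPI ranks.
# check_decomposition is a hypothetical helper, not an OpenFOAM tool.
check_decomposition() {
    nodes=$1; ranks_per_node=$2; nx=$3; ny=$4; nz=$5
    ranks=$((nodes * ranks_per_node))
    product=$((nx * ny * nz))
    if [ "$product" -eq "$ranks" ]; then
        echo "OK: numberOfSubdomains $ranks;"
    else
        echo "Mismatch: n gives $product subdomains, job provides $ranks ranks" >&2
        return 1
    fi
}
```

With 64 nodes and 16 ranks per node, n = (16 8 8) passes the check, while n = (2 2 1) from the 4-core example is rejected.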

Running Parallel Meshing

The meshing tool that comes with the OpenFOAM package is called snappyHexMesh. It reads its input from the "system/snappyHexMeshDict" file and writes its output to the "constant/polyMesh" folder if used with the -overwrite flag; otherwise it writes to separate time folders (1/, 2/). snappyHexMesh operates on the output of blockMesh: it refines specified regions, snaps out solid regions from the volume and adds boundary layers if enabled.

Before running mesh generation, run "decomposePar -force" so that the case is decomposed and ready for parallel execution. The script below can then be submitted to run parallel mesh generation on BG/Q:

#!/bin/sh
# @ job_name           = motorBike_mesh
# @ job_type           = bluegene
# @ comment            = "BGQ Job By Size"
# @ error              = $(jobid).err
# @ output             = $(jobid).out
# @ bg_size            = 64
# @ wall_clock_limit   = 06:00:00
# @ bg_connectivity    = Torus
# @ queue 

# Load modules
module purge
module load binutils/2.23 bgqgcc/4.8.1 mpich2/gcc-4.8.1 OpenFOAM/5.0
source $FOAM_DOT_FILE

# NOTE: when using --env-all there is a limit of 8192 characters that can be passed to runjob
# so removing LS_COLORS should free up enough space
export -n LS_COLORS

# Disabling the pt2pt small message optimizations - Solves hanging problems
export PAMID_SHORT=0

# Sets the cutoff point for switching from eager to rendezvous protocol at 50MB
export PAMID_EAGER=50M

# Do not optimise collective comm. - Solves termination with signal 36 issue
export PAMID_COLLECTIVES=0

# Do not generate core dump files
export BG_COREDUMPDISABLED=1

# Run mesh generation
runjob --np 1024 --ranks-per-node=16 --env-all : $FOAM_APPBIN/snappyHexMesh -overwrite -parallel

Reducing Number of Files in Parallel Runs

OpenFOAM creates one directory per processor in parallel simulations. This significantly increases the number of files per run and usually causes problems for parallel file systems. One way to deal with this is the "collated" option introduced in OpenFOAM 5.0. With this option, OpenFOAM creates a single folder called "processors" and saves the results there. Instead of n x m files you end up with only m files, where n is the number of processors and m is the number of files per processor directory.
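
To see the effect on a real case, the files under the processor directories can simply be counted. The sketch below uses a hypothetical helper, count_run_files, which counts regular files under processor0, processor1, ... (uncollated layout) as well as under processors (collated layout).

```shell
# Sketch: count the files written by a parallel run in one case directory.
# count_run_files is a hypothetical helper; it counts regular files under
# processor0, processor1, ... (uncollated) and processors (collated).
count_run_files() {
    case_dir=$1
    (
        cd "$case_dir" || exit 1
        find . -path './processor*' -type f | wc -l | tr -d ' '
    )
}
```

Running it on the same case before and after switching to the collated handler gives a direct comparison of n x m against m.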

To use the collated option, one must override the global file handler value by adding the following block to the "case/system/controlDict":

optimisationSwitches
{
    //- Parallel IO file handler
    //  uncollated (default), collated or masterUncollated
    fileHandler collated;

    //- collated: thread buffer size for queued file writes.
    //  If set to 0 or not sufficient for the file size threading is not used.
    //  Default: 2e9
    // maxThreadFileBufferSize 0;
    maxThreadFileBufferSize 2e9;
}

Also, one should define the following environment variable in their job scripts:

export FOAM_FILEHANDLER=collated

This is a workaround for an issue in OpenFOAM's workflow. If this environment variable is not set, OpenFOAM reads the main globalDict file with the default file handler and fails because of the inconsistency between the case settings and the main settings. This behaviour is resolved in OpenFOAM version 6, but the release currently installed here needs the workaround.
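
Since the controlDict entry and the environment variable have to agree, a job script can guard against setting one without the other. The following is a minimal sketch with a hypothetical helper, use_collated_if_requested; it only recognises an entry written on a single line and does not understand /* ... */ block comments.

```shell
# Sketch: enable the collated handler in a job script only when the case
# asks for it. use_collated_if_requested is a hypothetical helper.
use_collated_if_requested() {
    control_dict=$1/system/controlDict
    # Look for an uncommented "fileHandler collated;" entry.
    if grep -Eq '^[[:space:]]*fileHandler[[:space:]]+collated[[:space:]]*;' "$control_dict"; then
        export FOAM_FILEHANDLER=collated
    else
        echo "fileHandler collated not set in $control_dict" >&2
        return 1
    fi
}
```

In a job script it would be called with the case directory before the runjob line, for example use_collated_if_requested "$PWD" || exit 1.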

According to our initial, not very extensive experiments, cases run with the "collated" file handler require more memory, which is expected given how it is implemented. Users must therefore be careful with their cells-per-core and ranks-per-node settings. In the very first test we conducted, with about 1M cells per core, it looked like each rank had to be placed on its own node (leaving the remaining 15 cores idle). This is certainly not desirable in terms of performance and further tests are needed, but it can be taken as a starting point.

Loadleveler Submission Script for Solvers

The following is a sample script for running the OpenFOAM tutorial case on BG/Q:

#!/bin/sh
# @ job_name           = bgqopenfoam
# @ job_type           = bluegene
# @ comment            = "BGQ Job By Size"
# @ error              = $(job_name).$(Host).$(jobid).err
# @ output             = $(job_name).$(Host).$(jobid).out
# @ bg_size            = 64
# @ wall_clock_limit   = 06:00:00
# @ bg_connectivity    = Torus
# @ queue 

#------------------ Solver on BGQ --------------------
# Load BGQ OpenFOAM modules
module purge
module load binutils/2.23 bgqgcc/4.8.1 mpich2/gcc-4.8.1 OpenFOAM/5.0
source $FOAM_DOT_FILE

# NOTE: when using --env-all there is a limit of 8192 characters that can be passed to runjob
# so removing LS_COLORS should free up enough space
export -n LS_COLORS

# Some solvers, simpleFoam in particular, will hang on startup when using the default
# network parameters.  Disabling the pt2pt small message optimizations seems to allow them to run.
export PAMID_SHORT=0
export PAMID_EAGER=50M

# Do not optimise collective comm.
export PAMID_COLLECTIVES=0

# Do not generate core dump files
export BG_COREDUMPDISABLED=1

# Run solver
runjob --np 1024 --ranks-per-node=16 --env-all : $FOAM_APPBIN/icoFoam -parallel

Typical OpenFOAM Applications on BG/Q

A list of examples will be shared here. These sample cases are derived from applications that have been run on BG/Q but were changed for confidentiality reasons. They can guide new users with their specific use cases. Most of the information here is OpenFOAM-specific, not BG/Q-specific.

Wind Flow Around Buildings

This is a tutorial case that can be found in $FOAM_TUTORIALS/incompressible/simpleFoam/windAroundBuildings

Rotational Flows in OpenFOAM

Information will be added soon!

LES Models in OpenFOAM

Information will be added soon!

Multiphase Flows in OpenFOAM

Information will be added soon!

Post-Processing

Visualisations can be done on the Niagara Cluster!

https://docs.scinet.utoronto.ca/index.php/Visualization

General Tips and Tricks

Run serial tasks on the FEN using the FEN/OpenFOAM/* modules.
Check the quality of your mesh with the checkMesh tool. Note that a serial checkMesh run on a parallel case only reports on "case/constant/polyMesh", not on "case/processor*/constant/polyMesh".
Perform test runs on the debug nodes before you submit large jobs. Request a debug session with "debugjob -i" and use runjob.
Always work with binary files. This can be set in "case/system/controlDict".
You can convert cases from ascii to binary using the foamFormatConvert command.
Keep your simulations under $SCRATCH.
If you write your own code, keep it under $HOME, preferably in a directory such as "$HOME/OpenFOAM/username-X.Y/src".
If you write your own code, do not forget to compile it to $FOAM_USER_APPBIN or $FOAM_USER_LIBBIN. You might need to compile shared objects on debug nodes as well.
OpenFOAM is a pure MPI code; there is no multithreading in OpenFOAM.
Every node on BG/Q has 16 GB of memory and 16 compute cores. Some OpenFOAM tools, especially snappyHexMesh, are very memory-hungry and can need up to 4 GB of memory per 1M cells. If you run out of memory, use 8 ranks per node, but do so with care so that resources are not wasted. Solvers usually require about 1 GB of memory per 1M cells, which allows all 16 compute cores on a node to be fully used.
Try the collated option with version 5.0. It significantly reduces the number of files, but the master processor gets overloaded.
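
The memory rule of thumb above can be turned into a rough estimate of how many ranks fit on a node. This is a sketch under simple assumptions: suggest_ranks_per_node is a hypothetical helper, memory is assumed to scale linearly with the cell count, and operating system and per-process overheads are ignored, so the result is optimistic and should be confirmed on the debug nodes.

```shell
# Sketch: pick the largest ranks-per-node value that fits in node memory.
# suggest_ranks_per_node is a hypothetical helper. Arguments: total cell
# count, number of MPI ranks, and estimated MB of memory per million cells.
suggest_ranks_per_node() {
    total_cells=$1; ranks=$2; mb_per_mcells=$3
    node_mb=16384                      # 16 GB per BG/Q node
    cells_per_rank=$((total_cells / ranks))
    mb_per_rank=$((cells_per_rank * mb_per_mcells / 1000000))
    for rpn in 16 8 4 2 1; do
        if [ $((rpn * mb_per_rank)) -le "$node_mb" ]; then
            echo "$rpn"
            return 0
        fi
    done
    echo "one rank needs ${mb_per_rank} MB, more than a node has" >&2
    return 1
}
```

For a 512M-cell case on 1024 ranks, the snappyHexMesh figure of 4 GB per 1M cells (4096 MB) suggests 8 ranks per node, while the solver figure of 1 GB per 1M cells (1024 MB) allows all 16.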
