<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://docs.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Fertinaz</id>
	<title>SciNet Users Documentation - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://docs.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Fertinaz"/>
	<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php/Special:Contributions/Fertinaz"/>
	<updated>2026-05-05T11:48:45Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=OpenFOAM_on_BGQ&amp;diff=1686</id>
		<title>OpenFOAM on BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=OpenFOAM_on_BGQ&amp;diff=1686"/>
		<updated>2018-10-31T16:26:37Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Using OpenFOAM on BG/Q ==&lt;br /&gt;
There are various OpenFOAM versions installed on BGQ. You can see the list by typing &amp;quot;module avail&amp;quot; in the terminal:&lt;br /&gt;
* OpenFOAM/2.3.1(default)&lt;br /&gt;
* OpenFOAM/2.4.0&lt;br /&gt;
* OpenFOAM/3.0.1&lt;br /&gt;
* OpenFOAM/5.0&lt;br /&gt;
and&lt;br /&gt;
* FEN/OpenFOAM/2.2.0&lt;br /&gt;
* FEN/OpenFOAM/2.3.0&lt;br /&gt;
* FEN/OpenFOAM/2.4.0&lt;br /&gt;
* FEN/OpenFOAM/3.0.1&lt;br /&gt;
* FEN/OpenFOAM/5.0 &lt;br /&gt;
&lt;br /&gt;
The modules starting with FEN refer to installations that can be used on the Front-End-Nodes. Therefore, if you want to run serial tasks such as blockMesh, decomposePar or reconstructParMesh, please use the FEN/OpenFOAM/* modules. Do not forget that the FEN is not a dedicated resource: each Front-End-Node is shared among connected users and has only 32GB of memory. So if you try to decompose a case with 100 million cells, you will occupy the whole FEN machine, run out of memory and make it unavailable for everyone.&lt;br /&gt;
&lt;br /&gt;
When you want to submit a job, do so from the FEN using a batch script that loads the modules you need. This is the only way to use the compute nodes on BGQ. A sample batch script is given below; you can use it as a template and modify it according to your needs.&lt;br /&gt;
&lt;br /&gt;
== Running Serial OpenFOAM Tasks ==&lt;br /&gt;
&lt;br /&gt;
As noted in the previous section, if you want to run serial tasks you need to use one of the FEN-based modules. The most common serial tasks are:&lt;br /&gt;
* blockMesh: Creates the block-structured computational volume consisting of hex elements.&lt;br /&gt;
* decomposePar: Decomposes a serial case into subdomains (grid partitioning).&lt;br /&gt;
* reconstructPar: Reconstructs a parallel case (results). &lt;br /&gt;
* reconstructParMesh: Reconstructs a parallel case (mesh). &lt;br /&gt;
&lt;br /&gt;
These binaries are not available on the compute nodes, so these tools can only be used on the FEN.&lt;br /&gt;
&lt;br /&gt;
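For illustration, a minimal sketch of such a serial pre-processing session on the FEN is shown below (the case path $SCRATCH/myCase is a hypothetical example; sourcing $FOAM_DOT_FILE is shown commented out and is only needed if the module requires it, as the BGQ modules do):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# On a Front-End-Node, not inside a batch job&lt;br /&gt;
module load FEN/OpenFOAM/5.0&lt;br /&gt;
# source $FOAM_DOT_FILE        # only if the module requires it&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/myCase             # hypothetical case directory&lt;br /&gt;
blockMesh                      # build the background hex mesh&lt;br /&gt;
decomposePar -force            # split the case into processor* directories&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;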
== Parallelizing OpenFOAM Cases ==&lt;br /&gt;
&lt;br /&gt;
In order to run OpenFOAM in parallel, the problem needs to be decomposed into a number of subdomains that matches the number of processors that will be used. OpenFOAM has a  '''[http://www.openfoam.org/docs/user/running-applications-parallel.php decomposePar]''' utility that performs this operation. This is controlled by creating an OpenFOAM dictionary called decomposeParDict in the system directory of your case folder. decomposeParDict is the input file for the command &amp;quot;decomposePar -force&amp;quot;. Below is an example file for decomposing an OpenFOAM case to run on 4 cores.&lt;br /&gt;
&lt;br /&gt;
'''system/decomposeParDict'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/*--------------------------------*- C++ -*----------------------------------*\&lt;br /&gt;
| =========                 |                                                 |&lt;br /&gt;
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |&lt;br /&gt;
|  \\    /   O peration     | Version:  2.4.0                                 |&lt;br /&gt;
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |&lt;br /&gt;
|    \\/     M anipulation  |                                                 |&lt;br /&gt;
\*---------------------------------------------------------------------------*/&lt;br /&gt;
FoamFile&lt;br /&gt;
{&lt;br /&gt;
    version     2.0;&lt;br /&gt;
    format      ascii;&lt;br /&gt;
    class       dictionary;&lt;br /&gt;
    location    &amp;quot;system&amp;quot;;&lt;br /&gt;
    object      decomposeParDict;&lt;br /&gt;
}&lt;br /&gt;
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //&lt;br /&gt;
&lt;br /&gt;
numberOfSubdomains 4;&lt;br /&gt;
&lt;br /&gt;
method          simple;&lt;br /&gt;
&lt;br /&gt;
simpleCoeffs&lt;br /&gt;
{&lt;br /&gt;
    n               ( 2 2 1 );&lt;br /&gt;
    delta           0.001;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// ************************************************************************* //&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another option for decomposition is hierarchical. If you use this method then, similar to simple, you have to define hierarchicalCoeffs. The only difference between simple and hierarchical is that with the hierarchical method you can define the order of the decomposition operation (xyz or zyx). There are more sophisticated decomposition methods supported by OpenFOAM, but since decomposition is a serial task that needs to be performed on the FEN, these two methods are suggested.&lt;br /&gt;
&lt;br /&gt;
The crucial part of the decomposeParDict is the numberOfSubdomains defined in the file. The intended number of cores should match this value: if one wants to run a case on 64 nodes using all cores, then numberOfSubdomains should be 1024. Also, the product of the n values should be equal to this number for consistency; otherwise OpenFOAM will complain about the mismatch.&lt;br /&gt;
&lt;br /&gt;
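For illustration, a sketch of the corresponding decomposeParDict entries for a 1024-way hierarchical decomposition is given below (the 16 x 8 x 8 split is only one possible factorisation; choose one that suits your geometry):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
numberOfSubdomains 1024;&lt;br /&gt;
&lt;br /&gt;
method          hierarchical;&lt;br /&gt;
&lt;br /&gt;
hierarchicalCoeffs&lt;br /&gt;
{&lt;br /&gt;
    n               ( 16 8 8 );   // 16 x 8 x 8 = 1024&lt;br /&gt;
    delta           0.001;&lt;br /&gt;
    order           xyz;          // or zyx&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;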
== Running Parallel Meshing ==&lt;br /&gt;
The built-in meshing tool that comes with the OpenFOAM package is called snappyHexMesh. This tool reads its inputs from the &amp;quot;system/snappyHexMeshDict&amp;quot; file and writes its outputs to the &amp;quot;constant/polyMesh&amp;quot; folder (if used with the -overwrite flag; otherwise it writes to separate time folders 1/, 2/). snappyHexMesh operates on the output of blockMesh: it refines specified regions, snaps out solid areas from the volume and adds boundary layers if enabled.&lt;br /&gt;
&lt;br /&gt;
Before running mesh generation one needs to run &amp;quot;decomposePar -force&amp;quot;, so that the case is decomposed and ready for parallel execution. One can submit the script below to run parallel mesh generation on BG/Q:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = motorBike_mesh&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(jobid).err&lt;br /&gt;
# @ output             = $(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 06:00:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load binutils/2.23 bgqgcc/4.8.1 mpich2/gcc-4.8.1 OpenFOAM/5.0&lt;br /&gt;
source $FOAM_DOT_FILE&lt;br /&gt;
&lt;br /&gt;
# NOTE: when using --env-all there is a limit of 8192 characters that can be passed to runjob&lt;br /&gt;
# so removing LS_COLORS should free up enough space&lt;br /&gt;
export -n LS_COLORS&lt;br /&gt;
&lt;br /&gt;
# Disabling the pt2pt small message optimizations - Solves hanging problems&lt;br /&gt;
export PAMID_SHORT=0&lt;br /&gt;
&lt;br /&gt;
# Sets the cutoff point for switching from eager to rendezvous protocol at 50MB&lt;br /&gt;
export PAMID_EAGER=50M&lt;br /&gt;
&lt;br /&gt;
# Do not optimise collective comm. - Solves termination with signal 36 issue&lt;br /&gt;
export PAMID_COLLECTIVES=0&lt;br /&gt;
&lt;br /&gt;
# Do not generate core dump files&lt;br /&gt;
export BG_COREDUMPDISABLED=1&lt;br /&gt;
&lt;br /&gt;
# Run mesh generation&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --env-all : $FOAM_APPBIN/snappyHexMesh -overwrite -parallel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Reducing Number of Files in Parallel Runs ==&lt;br /&gt;
OpenFOAM creates one directory per processor in parallel simulations. This approach significantly increases the number of files per run and usually causes problems for parallel file systems. One way to deal with this issue is the &amp;quot;collated&amp;quot; option introduced in OF-5.0. When this option is used, OF creates only one folder, called &amp;quot;processors&amp;quot;, and results are saved in that directory. Therefore, instead of having (n x m) files, you end up with only m files, where n is the number of processors and m is the number of files per processor directory.&lt;br /&gt;
&lt;br /&gt;
To use the collated option, one must override the global file handler value by adding the following block to the &amp;quot;case/system/controlDict&amp;quot;:&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
optimisationSwitches&lt;br /&gt;
{&lt;br /&gt;
    //- Parallel IO file handler&lt;br /&gt;
    //  uncollated (default), collated or masterUncollated&lt;br /&gt;
    fileHandler collated;&lt;br /&gt;
&lt;br /&gt;
    //- collated: thread buffer size for queued file writes.&lt;br /&gt;
    //  If set to 0 or not sufficient for the file size threading is not used.&lt;br /&gt;
    //  Default: 2e9&lt;br /&gt;
    // maxThreadFileBufferSize 0;&lt;br /&gt;
    maxThreadFileBufferSize 2e9;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also, one should define the following environment variable in the job script: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export FOAM_FILEHANDLER=collated&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is a work-around for an issue in OpenFOAM's work-flow. If that environment variable is not set, OF reads the main globalDict file with the default file handler and fails due to the inconsistency between the case and global settings. This behaviour is resolved in OF version 6; however, the release we currently have needs this work-around.&lt;br /&gt;
&lt;br /&gt;
According to our (initial and not so extensive) experiments, cases simulated with the &amp;quot;collated&amp;quot; file handler require more memory, which is expected given how it is implemented. Therefore, users must be careful with their cells-per-core and ranks-per-node settings. In the very first test we conducted, it looked like each rank needed to be placed on its own node (which leaves the remaining 15 cores idle) when ~ 1M cells per core was used. This is definitely not the desired scenario in terms of performance, so further tests are needed; however, it can be considered a starting point.&lt;br /&gt;
&lt;br /&gt;
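If memory becomes the limiting factor, one option (a sketch rather than a tuned recommendation) is to halve the number of ranks per node while keeping the total rank count, which means doubling bg_size in the batch script accordingly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# 1024 ranks spread over 128 nodes (bg_size = 128) instead of 64&lt;br /&gt;
runjob --np 1024 --ranks-per-node=8 --env-all : $FOAM_APPBIN/snappyHexMesh -overwrite -parallel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;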
== Loadleveler Submission Script for Solvers ==&lt;br /&gt;
&lt;br /&gt;
The following is a sample script for running the OpenFOAM tutorial case on BG/Q:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgqopenfoam&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 06:00:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
#------------------ Solver on BGQ --------------------&lt;br /&gt;
# Load BGQ OpenFOAM modules&lt;br /&gt;
module purge&lt;br /&gt;
module load binutils/2.23 bgqgcc/4.8.1 mpich2/gcc-4.8.1 OpenFOAM/5.0&lt;br /&gt;
source $FOAM_DOT_FILE&lt;br /&gt;
&lt;br /&gt;
# NOTE: when using --env-all there is a limit of 8192 characters that can be passed to runjob&lt;br /&gt;
# so removing LS_COLORS should free up enough space&lt;br /&gt;
export -n LS_COLORS&lt;br /&gt;
&lt;br /&gt;
# Some solvers, simpleFOAM particularly, will hang on startup when using the default&lt;br /&gt;
# network parameters.  Disabling the pt2pt small message optimizations seems to allow it to run.&lt;br /&gt;
export PAMID_SHORT=0&lt;br /&gt;
export PAMID_EAGER=50M&lt;br /&gt;
&lt;br /&gt;
# Do not optimise collective comm.&lt;br /&gt;
export PAMID_COLLECTIVES=0&lt;br /&gt;
&lt;br /&gt;
# Do not generate core dump files&lt;br /&gt;
export BG_COREDUMPDISABLED=1&lt;br /&gt;
&lt;br /&gt;
# Run solver&lt;br /&gt;
runjob --np 1024 --env-all  : $FOAM_APPBIN/icoFoam -parallel&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Typical OpenFOAM Applications on BG/Q ==&lt;br /&gt;
A list of examples will be shared here. These sample cases are derived from applications that were run on BG/Q but have been altered for confidentiality reasons. They can guide new users towards their specific use cases. Most of the information here is OpenFOAM-specific, not BG/Q-specific.&lt;br /&gt;
&lt;br /&gt;
=== Wind Flow Around Buildings ===&lt;br /&gt;
This is a tutorial case that can be found in $FOAM_TUTORIALS/incompressible/simpleFoam/windAroundBuildings&lt;br /&gt;
&lt;br /&gt;
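A typical way to try this case (a sketch; $FOAM_TUTORIALS is set by the OpenFOAM environment, so source $FOAM_DOT_FILE first if your module requires it) is to copy it to $SCRATCH on the FEN:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load FEN/OpenFOAM/5.0&lt;br /&gt;
cp -r $FOAM_TUTORIALS/incompressible/simpleFoam/windAroundBuildings $SCRATCH/&lt;br /&gt;
cd $SCRATCH/windAroundBuildings&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;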
=== Rotational Flows in OpenFOAM ===&lt;br /&gt;
Information will be added soon!&lt;br /&gt;
&lt;br /&gt;
=== LES Models in OpenFOAM ===&lt;br /&gt;
Information will be added soon!&lt;br /&gt;
&lt;br /&gt;
=== Multiphase Flows in OpenFOAM ===&lt;br /&gt;
Information will be added soon!&lt;br /&gt;
&lt;br /&gt;
== Post-Processing ==&lt;br /&gt;
&lt;br /&gt;
Visualisations can be done on the Niagara Cluster!&lt;br /&gt;
&lt;br /&gt;
https://docs.scinet.utoronto.ca/index.php/Visualization&lt;br /&gt;
&lt;br /&gt;
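For a conventionally decomposed (uncollated) case, results are typically reconstructed before visualisation. A minimal sketch on the FEN, with $SCRATCH/myCase as a hypothetical case directory:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load FEN/OpenFOAM/5.0&lt;br /&gt;
cd $SCRATCH/myCase&lt;br /&gt;
reconstructPar        # merge processor*/ results back into top-level time directories&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;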
== General Tips and Tricks ==&lt;br /&gt;
&lt;br /&gt;
* Run serial tasks on FEN using FEN/OpenFOAM/* modules&lt;br /&gt;
* Check the quality of your mesh using the checkMesh tool. Be careful: if you run a serial checkMesh on a decomposed case, it will only return results for &amp;quot;case/constant/polyMesh&amp;quot;, not for &amp;quot;case/processor*/constant/polyMesh&amp;quot;.&lt;br /&gt;
* Perform test runs on the debug nodes before you submit large jobs. Request a debug session with &amp;quot;debugjob -i&amp;quot; and use runjob.&lt;br /&gt;
* Always work with binary files. This can be set in the &amp;quot;case/system/controlDict&amp;quot;.&lt;br /&gt;
* You can convert cases from ASCII to binary using the foamFormatConvert command.&lt;br /&gt;
* Keep your simulations under $SCRATCH.&lt;br /&gt;
* If you write your own code, keep it under $HOME. Preferably create a directory &amp;quot;$HOME/OpenFOAM/username-X.Y/src&amp;quot; and work there.&lt;br /&gt;
* If you write your own code, do not forget to compile it into $FOAM_USER_APPBIN or $FOAM_USER_LIBBIN. You might need to compile shared objects on the debug nodes as well.&lt;br /&gt;
* OpenFOAM is a pure MPI code, there is no multithreading in OpenFOAM.&lt;br /&gt;
* Every node on BG/Q has 16 GB of memory and 16 compute cores. Some OpenFOAM tools, especially snappyHexMesh, are very memory consuming, using up to 4GB of memory per 1M cells. Use 8 ranks per node if you run out of memory, but be careful with that and do not waste resources. Solvers usually require 1GB of memory per 1M cells, which allows users to fully utilize all 16 compute cores on a node.&lt;br /&gt;
* Try the collated option with version 5.0. It significantly reduces the number of files; however, the master processor gets overloaded.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=HybridX_on_P7&amp;diff=1664</id>
		<title>HybridX on P7</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=HybridX_on_P7&amp;diff=1664"/>
		<updated>2018-10-23T15:55:51Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can take two different approaches when using HybridX on the P7 cluster.&lt;br /&gt;
# Using OpenMPI&lt;br /&gt;
# Using PE - Parallel environment from IBM&lt;br /&gt;
&lt;br /&gt;
It has been observed that OpenMPI runs only on a single node due to an InfiniBand issue. If you plan to run single-node jobs, you can follow the instructions below:&lt;br /&gt;
&lt;br /&gt;
== How to compile HybridX using OpenMPI ==&lt;br /&gt;
&lt;br /&gt;
The following script assumes the HybridX code is located under &amp;quot;$HOME/HybridCode&amp;quot;. It compiles the package using GCC-4.8 and OpenMPI-1.6.5 and installs it into the &amp;quot;build-p7/install&amp;quot; directory inside the HybridX tree. See the script below and modify it if you want to make changes:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load gcc/4.8.1 cmake/2.8.8 openmpi/1.6.5-gcc&lt;br /&gt;
&lt;br /&gt;
# Package details&lt;br /&gt;
base=$HOME/HybridCode&lt;br /&gt;
pkg=HybridX&lt;br /&gt;
&lt;br /&gt;
cd $base/$pkg&lt;br /&gt;
&lt;br /&gt;
# Variables for installation&lt;br /&gt;
src=$base/$pkg&lt;br /&gt;
bld=$base/$pkg/build-p7&lt;br /&gt;
&lt;br /&gt;
# Start from scratch each time this script is executed&lt;br /&gt;
rm -rf $bld&lt;br /&gt;
mkdir -p $bld&lt;br /&gt;
cd $bld&lt;br /&gt;
&lt;br /&gt;
# Run cmake&lt;br /&gt;
cmake $src&lt;br /&gt;
# cmake -DBOOST_ROOT=${SCINET_BOOST_DIR} $src&lt;br /&gt;
&lt;br /&gt;
# Compile and install&lt;br /&gt;
gmake&lt;br /&gt;
gmake install&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When the compilation completes successfully, you should be able to find the HybridX executable under &amp;quot;$HOME/HybridCode/HybridX/build-p7/install/bin&amp;quot;. Using this executable, you can run HybridX simulations. Please see the following job script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
##===================================&lt;br /&gt;
## P7 Load Leveler Submission Script&lt;br /&gt;
##===================================&lt;br /&gt;
##&lt;br /&gt;
## Don't change these parameters unless you really know what you are doing&lt;br /&gt;
##&lt;br /&gt;
##@ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \&lt;br /&gt;
##                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no&lt;br /&gt;
##&lt;br /&gt;
##===================================&lt;br /&gt;
## Avoid core dumps&lt;br /&gt;
## @ core_limit   = 0&lt;br /&gt;
##===================================&lt;br /&gt;
## Job specific&lt;br /&gt;
##===================================&lt;br /&gt;
#&lt;br /&gt;
# @ job_name = hybridx-isotropic&lt;br /&gt;
# @ job_type = parallel&lt;br /&gt;
# @ class = verylong&lt;br /&gt;
# @ output = $(jobid).out&lt;br /&gt;
# @ error = $(jobid).err&lt;br /&gt;
# @ wall_clock_limit = 01:00:00&lt;br /&gt;
# @ node = 1&lt;br /&gt;
# @ tasks_per_node = 128&lt;br /&gt;
# @ queue&lt;br /&gt;
#&lt;br /&gt;
#===================================&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load gcc/4.8.1&lt;br /&gt;
module load openmpi/1.6.5-gcc&lt;br /&gt;
&lt;br /&gt;
# HybridX folders&lt;br /&gt;
export hybrid_root=$HOME/HybridCode/HybridX/build-p7/install&lt;br /&gt;
export hybrid_bin=${hybrid_root}/bin&lt;br /&gt;
export hybrid_run=$HOME/HybridCode/run&lt;br /&gt;
&lt;br /&gt;
# Go to case folder&lt;br /&gt;
cd $hybrid_run/isotropic-p7&lt;br /&gt;
&lt;br /&gt;
mpirun -np 128 ${hybrid_bin}/Hybrid -i isotropic.input 2&amp;gt;&amp;amp;1 | tee log.hybridx.isotropic&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
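Once the script above is saved (say as hybridx.ll, a hypothetical file name), it can be submitted and monitored with the usual LoadLeveler commands:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
llsubmit hybridx.ll     # submit the job script&lt;br /&gt;
llq -u $USER            # list your queued and running jobs&lt;br /&gt;
llcancel &amp;lt;job_id&amp;gt;      # cancel a job if needed&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;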
== How to compile HybridX using PE ==&lt;br /&gt;
&lt;br /&gt;
This is very similar to using OpenMPI except for a few things. Make the following modifications to the compilation script given above:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load gcc/4.8.1 cmake/2.8.8 pe/1.2.0.9&lt;br /&gt;
&lt;br /&gt;
# Keep different env binaries in different directories&lt;br /&gt;
bld=$base/$pkg/build-p7-pe&lt;br /&gt;
&lt;br /&gt;
# For parallel environment on P7&lt;br /&gt;
export CXXFLAGS=&amp;quot;-cpp&amp;quot;&lt;br /&gt;
export LDFLAGS=&amp;quot;-Wl,--allow-multiple-definition&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The rest of the compilation should be the same as before. If it compiles successfully, you can use the following job script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
##===================================&lt;br /&gt;
## P7 Load Leveler Submission Script&lt;br /&gt;
##===================================&lt;br /&gt;
##&lt;br /&gt;
## Don't change these parameters unless you really know what you are doing&lt;br /&gt;
##&lt;br /&gt;
##@ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \&lt;br /&gt;
##                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no&lt;br /&gt;
##&lt;br /&gt;
##===================================&lt;br /&gt;
## Avoid core dumps&lt;br /&gt;
## @ core_limit   = 0&lt;br /&gt;
##===================================&lt;br /&gt;
## Job specific&lt;br /&gt;
##===================================&lt;br /&gt;
#&lt;br /&gt;
# @ job_name = hybridx-isotropic&lt;br /&gt;
# @ job_type = parallel&lt;br /&gt;
# @ class = verylong&lt;br /&gt;
# @ output = $(jobid).out&lt;br /&gt;
# @ error = $(jobid).err&lt;br /&gt;
# @ wall_clock_limit = 01:00:00&lt;br /&gt;
# @ node = 4&lt;br /&gt;
# @ tasks_per_node = 32&lt;br /&gt;
# @ queue&lt;br /&gt;
#&lt;br /&gt;
#===================================&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load gcc/4.8.1 pe/1.2.0.9&lt;br /&gt;
&lt;br /&gt;
# HybridX folders&lt;br /&gt;
export hybrid_root=$HOME/HybridCode/HybridX/build-p7-pe/install&lt;br /&gt;
export hybrid_bin=${hybrid_root}/bin&lt;br /&gt;
export hybrid_run=$HOME/HybridCode/run&lt;br /&gt;
&lt;br /&gt;
# Go to case folder&lt;br /&gt;
cd $hybrid_run/isotropic-p7&lt;br /&gt;
&lt;br /&gt;
mpiexec -n 128 ${hybrid_bin}/Hybrid -i isotropic.input 2&amp;gt;&amp;amp;1 | tee log.hybridx.isotropic&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notes:&lt;br /&gt;
# P7 processors support 4 threads per core, so you can increase the number of tasks accordingly.&lt;br /&gt;
# The scheduler is the same as on Blue Gene, LoadLeveler, so the same commands apply, such as &amp;lt;span style=&amp;quot;font-family:Courier;&amp;quot;&amp;gt;llq, llsubmit, llcancel&amp;lt;/span&amp;gt; etc.&lt;br /&gt;
# LoadLeveler writes results to an output file specified in the job details, so you don't need the tee command given in the example above.&lt;br /&gt;
# The P7 cluster shares the same file system with Blue Gene, so be careful with that.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=HybridX_on_P7&amp;diff=1648</id>
		<title>HybridX on P7</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=HybridX_on_P7&amp;diff=1648"/>
		<updated>2018-10-19T15:24:37Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can have two different approaches when using HybridX on P7 cluster.&lt;br /&gt;
# Using OpenMPI&lt;br /&gt;
# Using PE - Parallel environment from IBM&lt;br /&gt;
&lt;br /&gt;
It has been observed that OpenMPI runs only on a single node due to an InfiniBand issue. If you plan to use 1 node jobs you can follow the instructions below:&lt;br /&gt;
&lt;br /&gt;
== How to compile HybridX using OpenMPI ==&lt;br /&gt;
&lt;br /&gt;
Following script assumes HybridX code is located under &amp;quot;/$HOME/HybridCode&amp;quot;. It then compiles the package using GCC-4.8 and OpenMPI-1.6.5 and install it to the &amp;quot;build-p7/install&amp;quot; directory inside the HybridX. See script below and modify if you want make changes:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load gcc/4.8.1 cmake/2.8.8 openmpi/1.6.5-gcc&lt;br /&gt;
&lt;br /&gt;
# Package details&lt;br /&gt;
base=/$HOME/HybridCode&lt;br /&gt;
pkg=HybridX&lt;br /&gt;
&lt;br /&gt;
cd $base/$pkg&lt;br /&gt;
&lt;br /&gt;
# Variables for installation&lt;br /&gt;
src=$base/$pkg&lt;br /&gt;
bld=$base/$pkg/build-p7&lt;br /&gt;
&lt;br /&gt;
# Start from scratch each time when this script is executed&lt;br /&gt;
rm -rf $bld&lt;br /&gt;
mkdir -p $bld&lt;br /&gt;
cd $bld&lt;br /&gt;
&lt;br /&gt;
# Run cmake&lt;br /&gt;
cmake $src&lt;br /&gt;
# cmake -DBOOST_ROOT=${SCINET_BOOST_DIR} $src&lt;br /&gt;
&lt;br /&gt;
# Compile and install&lt;br /&gt;
gmake&lt;br /&gt;
gmake install&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When the compilation is completed successfully, you should be able to find HybridX executable under the &amp;quot;$HOME/HybridCode/HybridX/build-p7/install/bin&amp;quot;. Using the binary executable, you can run HybridX simulations. Please see following job script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
##===================================&lt;br /&gt;
## P7 Load Leveler Submission Script&lt;br /&gt;
##===================================&lt;br /&gt;
##&lt;br /&gt;
## Don't change these parameters unless you really know what you are doing&lt;br /&gt;
##&lt;br /&gt;
##@ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \&lt;br /&gt;
##                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no&lt;br /&gt;
##&lt;br /&gt;
##===================================&lt;br /&gt;
## Avoid core dumps&lt;br /&gt;
## @ core_limit   = 0&lt;br /&gt;
##===================================&lt;br /&gt;
## Job specific&lt;br /&gt;
##===================================&lt;br /&gt;
#&lt;br /&gt;
# @ job_name = hybridx-isotropic&lt;br /&gt;
# @ job_type = parallel&lt;br /&gt;
# @ class = verylong&lt;br /&gt;
# @ output = $(jobid).out&lt;br /&gt;
# @ error = $(jobid).err&lt;br /&gt;
# @ wall_clock_limit = 01:00:00&lt;br /&gt;
# @ node = 1&lt;br /&gt;
# @ tasks_per_node = 128&lt;br /&gt;
# @ queue&lt;br /&gt;
#&lt;br /&gt;
#===================================&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load gcc/4.8.1&lt;br /&gt;
module load openmpi/1.6.5-gcc&lt;br /&gt;
&lt;br /&gt;
# HybridX folders&lt;br /&gt;
export hybrid_root=$HOME/HybridCode/HybridX/build-p7/install&lt;br /&gt;
export hybrid_bin=${hybrid_root}/bin&lt;br /&gt;
export hybrid_run=$HOME/HybridCode/run&lt;br /&gt;
&lt;br /&gt;
# Go to case folder&lt;br /&gt;
cd $hybrid_run/isotropic-p7&lt;br /&gt;
&lt;br /&gt;
mpirun -np 128 ${hybrid_bin}/Hybrid -i isotropic.input 2&amp;gt;&amp;amp;1 | tee log.hybridx.isotropic&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to compile HybridX using PE ==&lt;br /&gt;
&lt;br /&gt;
This is very similar to using OpenMPI except a few things. Make the following modifications to your compilation script given above:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load gcc/4.8.1 cmake/2.8.8 pe/1.2.0.9&lt;br /&gt;
&lt;br /&gt;
bld=$base/$pkg/build-p7-pe&lt;br /&gt;
&lt;br /&gt;
# For parallel environment on P7&lt;br /&gt;
export CXXFLAGS=&amp;quot;-cpp&amp;quot;&lt;br /&gt;
export LDFLAGS=&amp;quot;-Wl,--allow-multiple-definition&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The rest of the compilation is the same as before. If the compilation succeeds, you can use the following job script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
##===================================&lt;br /&gt;
## P7 Load Leveler Submission Script&lt;br /&gt;
##===================================&lt;br /&gt;
##&lt;br /&gt;
## Don't change these parameters unless you really know what you are doing&lt;br /&gt;
##&lt;br /&gt;
##@ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \&lt;br /&gt;
##                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no&lt;br /&gt;
##&lt;br /&gt;
##===================================&lt;br /&gt;
## Avoid core dumps&lt;br /&gt;
## @ core_limit   = 0&lt;br /&gt;
##===================================&lt;br /&gt;
## Job specific&lt;br /&gt;
##===================================&lt;br /&gt;
#&lt;br /&gt;
# @ job_name = hybridx-isotropic&lt;br /&gt;
# @ job_type = parallel&lt;br /&gt;
# @ class = verylong&lt;br /&gt;
# @ output = $(jobid).out&lt;br /&gt;
# @ error = $(jobid).err&lt;br /&gt;
# @ wall_clock_limit = 01:00:00&lt;br /&gt;
# @ node = 4&lt;br /&gt;
# @ tasks_per_node = 32&lt;br /&gt;
# @ queue&lt;br /&gt;
#&lt;br /&gt;
#===================================&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load gcc/4.8.1 pe/1.2.0.9&lt;br /&gt;
&lt;br /&gt;
# HybridX folders&lt;br /&gt;
export hybrid_root=$HOME/HybridCode/HybridX/build-p7-pe/install&lt;br /&gt;
export hybrid_bin=${hybrid_root}/bin&lt;br /&gt;
export hybrid_run=$HOME/HybridCode/run&lt;br /&gt;
&lt;br /&gt;
# Go to case folder&lt;br /&gt;
cd $hybrid_run/isotropic-p7&lt;br /&gt;
&lt;br /&gt;
mpiexec -n 128 ${hybrid_bin}/Hybrid -i isotropic.input 2&amp;gt;&amp;amp;1 | tee log.hybridx.isotropic&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notes:&lt;br /&gt;
# P7 processors support 4 threads per core, so you can increase the number of tasks accordingly.&lt;br /&gt;
# The scheduler is LoadLeveler, the same as on the Blue Gene, so the same commands (llq, llsubmit, etc.) apply.&lt;br /&gt;
# LoadLeveler writes output to the file specified in the job directives, so the tee command in the example above is not strictly needed.&lt;br /&gt;
# The P7 cluster shares its file system with the Blue Gene, so be careful with that.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=HybridX_on_P7&amp;diff=1647</id>
		<title>HybridX on P7</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=HybridX_on_P7&amp;diff=1647"/>
		<updated>2018-10-19T14:27:04Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;There are two different approaches to using HybridX on the P7 cluster.&lt;br /&gt;
# Using OpenMPI&lt;br /&gt;
# Using PE - Parallel environment from IBM&lt;br /&gt;
&lt;br /&gt;
It appears that OpenMPI can run only on a single node. If you plan to run single-node jobs, you can follow the instructions below:&lt;br /&gt;
&lt;br /&gt;
== How to compile HybridX using OpenMPI ==&lt;br /&gt;
&lt;br /&gt;
The following script assumes the HybridX code is located under &amp;quot;$HOME/HybridCode&amp;quot;. It then compiles the package using GCC 4.8 and OpenMPI 1.6.5 and installs it into the &amp;quot;build-p7/install&amp;quot; directory inside the HybridX tree. See the script below and modify it as needed:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load gcc/4.8.1 cmake/2.8.8 openmpi/1.6.5-gcc&lt;br /&gt;
&lt;br /&gt;
# Package details&lt;br /&gt;
base=$HOME/HybridCode&lt;br /&gt;
pkg=HybridX&lt;br /&gt;
&lt;br /&gt;
cd $base/$pkg&lt;br /&gt;
&lt;br /&gt;
# Variables for installation&lt;br /&gt;
src=$base/$pkg&lt;br /&gt;
bld=$base/$pkg/build-p7&lt;br /&gt;
&lt;br /&gt;
# Start from scratch each time when this script is executed&lt;br /&gt;
rm -rf $bld&lt;br /&gt;
mkdir -p $bld&lt;br /&gt;
cd $bld&lt;br /&gt;
&lt;br /&gt;
# Run cmake&lt;br /&gt;
cmake $src&lt;br /&gt;
# cmake -DBOOST_ROOT=${SCINET_BOOST_DIR} $src&lt;br /&gt;
&lt;br /&gt;
# Compile and install&lt;br /&gt;
gmake&lt;br /&gt;
gmake install&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When the compilation completes successfully, you should be able to find the HybridX executable under &amp;quot;$HOME/HybridCode/HybridX/build-p7/install/bin&amp;quot;. You can then use this binary to run HybridX simulations. See the following job script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
##===================================&lt;br /&gt;
## P7 Load Leveler Submission Script&lt;br /&gt;
##===================================&lt;br /&gt;
##&lt;br /&gt;
## Don't change these parameters unless you really know what you are doing&lt;br /&gt;
##&lt;br /&gt;
##@ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \&lt;br /&gt;
##                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no&lt;br /&gt;
##&lt;br /&gt;
##===================================&lt;br /&gt;
## Avoid core dumps&lt;br /&gt;
## @ core_limit   = 0&lt;br /&gt;
##===================================&lt;br /&gt;
## Job specific&lt;br /&gt;
##===================================&lt;br /&gt;
#&lt;br /&gt;
# @ job_name = hybridx-isotropic&lt;br /&gt;
# @ job_type = parallel&lt;br /&gt;
# @ class = verylong&lt;br /&gt;
# @ output = $(jobid).out&lt;br /&gt;
# @ error = $(jobid).err&lt;br /&gt;
# @ wall_clock_limit = 01:00:00&lt;br /&gt;
# @ node = 1&lt;br /&gt;
# @ tasks_per_node = 128&lt;br /&gt;
# @ queue&lt;br /&gt;
#&lt;br /&gt;
#===================================&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load gcc/4.8.1&lt;br /&gt;
module load openmpi/1.6.5-gcc&lt;br /&gt;
&lt;br /&gt;
# HybridX folders&lt;br /&gt;
export hybrid_root=$HOME/HybridCode/HybridX/build-p7/install&lt;br /&gt;
export hybrid_bin=${hybrid_root}/bin&lt;br /&gt;
export hybrid_run=$HOME/HybridCode/run&lt;br /&gt;
&lt;br /&gt;
# Go to case folder&lt;br /&gt;
cd $hybrid_run/isotropic-p7&lt;br /&gt;
&lt;br /&gt;
mpirun -np 128 ${hybrid_bin}/Hybrid -i isotropic.input 2&amp;gt;&amp;amp;1 | tee log.hybridx.isotropic&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to compile HybridX using PE ==&lt;br /&gt;
&lt;br /&gt;
This is very similar to using OpenMPI, except for a few details. Make the following modifications to the compilation script given above:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load gcc/4.8.1 cmake/2.8.8 pe/1.2.0.9&lt;br /&gt;
&lt;br /&gt;
bld=$base/$pkg/build-p7-pe&lt;br /&gt;
&lt;br /&gt;
# For parallel environment on P7&lt;br /&gt;
export CXXFLAGS=&amp;quot;-cpp&amp;quot;&lt;br /&gt;
export LDFLAGS=&amp;quot;-Wl,--allow-multiple-definition&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The rest of the compilation is the same as before. If the compilation succeeds, you can use the following job script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
##===================================&lt;br /&gt;
## P7 Load Leveler Submission Script&lt;br /&gt;
##===================================&lt;br /&gt;
##&lt;br /&gt;
## Don't change these parameters unless you really know what you are doing&lt;br /&gt;
##&lt;br /&gt;
##@ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \&lt;br /&gt;
##                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no&lt;br /&gt;
##&lt;br /&gt;
##===================================&lt;br /&gt;
## Avoid core dumps&lt;br /&gt;
## @ core_limit   = 0&lt;br /&gt;
##===================================&lt;br /&gt;
## Job specific&lt;br /&gt;
##===================================&lt;br /&gt;
#&lt;br /&gt;
# @ job_name = hybridx-isotropic&lt;br /&gt;
# @ job_type = parallel&lt;br /&gt;
# @ class = verylong&lt;br /&gt;
# @ output = $(jobid).out&lt;br /&gt;
# @ error = $(jobid).err&lt;br /&gt;
# @ wall_clock_limit = 01:00:00&lt;br /&gt;
# @ node = 4&lt;br /&gt;
# @ tasks_per_node = 32&lt;br /&gt;
# @ queue&lt;br /&gt;
#&lt;br /&gt;
#===================================&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load gcc/4.8.1 pe/1.2.0.9&lt;br /&gt;
&lt;br /&gt;
# HybridX folders&lt;br /&gt;
export hybrid_root=$HOME/HybridCode/HybridX/build-p7-pe/install&lt;br /&gt;
export hybrid_bin=${hybrid_root}/bin&lt;br /&gt;
export hybrid_run=$HOME/HybridCode/run&lt;br /&gt;
&lt;br /&gt;
# Go to case folder&lt;br /&gt;
cd $hybrid_run/isotropic-p7&lt;br /&gt;
&lt;br /&gt;
mpiexec -n 128 ${hybrid_bin}/Hybrid -i isotropic.input 2&amp;gt;&amp;amp;1 | tee log.hybridx.isotropic&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notes:&lt;br /&gt;
# P7 processors support 4 threads per core, so you can increase the number of tasks accordingly.&lt;br /&gt;
# The scheduler is LoadLeveler, the same as on the Blue Gene, so the same commands (llq, llsubmit, etc.) apply.&lt;br /&gt;
# LoadLeveler writes output to the file specified in the job directives, so the tee command in the example above is not strictly needed.&lt;br /&gt;
# The P7 cluster shares its file system with the Blue Gene, so be careful with that.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=HybridX_on_P7&amp;diff=1646</id>
		<title>HybridX on P7</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=HybridX_on_P7&amp;diff=1646"/>
		<updated>2018-10-18T20:38:47Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Created page with &amp;quot;== How to compile HybridX ==  Following script assumes HybridX code is located under &amp;quot;/$HOME/HybridCode&amp;quot;. It then compiles the package using GCC-4.8 and OpenMPI-1.6.5 and inst...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== How to compile HybridX ==&lt;br /&gt;
&lt;br /&gt;
The following script assumes the HybridX code is located under &amp;quot;$HOME/HybridCode&amp;quot;. It then compiles the package using GCC 4.8 and OpenMPI 1.6.5 and installs it into the &amp;quot;build-p7/install&amp;quot; directory inside the HybridX tree. See the script below and modify it as needed:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load gcc/4.8.1 cmake/2.8.8 openmpi/1.6.5-gcc&lt;br /&gt;
&lt;br /&gt;
# Package details&lt;br /&gt;
base=$HOME/HybridCode&lt;br /&gt;
pkg=HybridX&lt;br /&gt;
&lt;br /&gt;
cd $base/$pkg&lt;br /&gt;
&lt;br /&gt;
# Variables for installation&lt;br /&gt;
src=$base/$pkg&lt;br /&gt;
bld=$base/$pkg/build-p7&lt;br /&gt;
&lt;br /&gt;
# Start from scratch each time when this script is executed&lt;br /&gt;
rm -rf $bld&lt;br /&gt;
mkdir -p $bld&lt;br /&gt;
cd $bld&lt;br /&gt;
&lt;br /&gt;
# Run cmake&lt;br /&gt;
cmake $src&lt;br /&gt;
# cmake -DBOOST_ROOT=${SCINET_BOOST_DIR} $src&lt;br /&gt;
&lt;br /&gt;
# Compile and install&lt;br /&gt;
gmake&lt;br /&gt;
gmake install&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When the compilation completes successfully, you should be able to find the HybridX executable under &amp;quot;$HOME/HybridCode/HybridX/build-p7/install/bin&amp;quot;. You can then use this binary to run HybridX simulations. See the following job script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
##===================================&lt;br /&gt;
## P7 Load Leveler Submission Script&lt;br /&gt;
##===================================&lt;br /&gt;
##&lt;br /&gt;
## Don't change these parameters unless you really know what you are doing&lt;br /&gt;
##&lt;br /&gt;
##@ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \&lt;br /&gt;
##                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no&lt;br /&gt;
##&lt;br /&gt;
##===================================&lt;br /&gt;
## Avoid core dumps&lt;br /&gt;
## @ core_limit   = 0&lt;br /&gt;
##===================================&lt;br /&gt;
## Job specific&lt;br /&gt;
##===================================&lt;br /&gt;
#&lt;br /&gt;
# @ job_name = hybridx-isotropic&lt;br /&gt;
# @ job_type = parallel&lt;br /&gt;
# @ class = verylong&lt;br /&gt;
# @ output = $(jobid).out&lt;br /&gt;
# @ error = $(jobid).err&lt;br /&gt;
# @ wall_clock_limit = 01:00:00&lt;br /&gt;
# @ node = 4&lt;br /&gt;
# @ tasks_per_node = 128&lt;br /&gt;
# @ queue&lt;br /&gt;
#&lt;br /&gt;
#===================================&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load gcc/4.8.1&lt;br /&gt;
module load openmpi/1.6.5-gcc&lt;br /&gt;
&lt;br /&gt;
# HybridX folders&lt;br /&gt;
export hybrid_root=$HOME/HybridCode/HybridX/build-p7/install&lt;br /&gt;
export hybrid_bin=${hybrid_root}/bin&lt;br /&gt;
export hybrid_run=$HOME/HybridCode/run&lt;br /&gt;
&lt;br /&gt;
# Go to case folder&lt;br /&gt;
cd $hybrid_run/isotropic-p7&lt;br /&gt;
&lt;br /&gt;
mpirun -np 128 ${hybrid_bin}/Hybrid -i isotropic.input 2&amp;gt;&amp;amp;1 | tee log.hybridx.isotropic&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notes:&lt;br /&gt;
# P7 processors support 4 threads per core, so you can increase the number of tasks accordingly.&lt;br /&gt;
# The scheduler is LoadLeveler, the same as on the Blue Gene, so the same commands (llq, llsubmit, etc.) apply.&lt;br /&gt;
# LoadLeveler writes output to the file specified in the job directives, so the tee command in the example above is not strictly needed.&lt;br /&gt;
# The P7 cluster shares its file system with the Blue Gene, so be careful with that.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=P7&amp;diff=1645</id>
		<title>P7</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=P7&amp;diff=1645"/>
		<updated>2018-10-18T20:23:11Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:IBM755.jpg|center|300px|thumb]]&lt;br /&gt;
|name=P7 Cluster (P7)&lt;br /&gt;
|installed=May 2011, March 2013&lt;br /&gt;
|operatingsystem= Linux (RHEL 6.3)&lt;br /&gt;
|loginnode= p701 (from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt;)&lt;br /&gt;
|nnodes=8 (256 cores)&lt;br /&gt;
|rampernode=128 GB&lt;br /&gt;
|corespernode=32 (128 Threads)&lt;br /&gt;
|interconnect=Infiniband (2 DDR/node )&lt;br /&gt;
|vendorcompilers=xlc/xlf&lt;br /&gt;
|queuetype=LoadLeveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The P7 Cluster consists of 8 IBM Power 755 servers, each with 4x 8-core 3.3 GHz Power7 CPUs and 128 GB of RAM. Similar to the Power 6, the Power 7 utilizes Simultaneous Multi Threading (SMT), but extends the design from 2 threads per core to 4.  This allows the 32 physical cores to support up to 128 threads, which in many cases can lead to significant speedups.&lt;br /&gt;
&lt;br /&gt;
== Login ==&lt;br /&gt;
&lt;br /&gt;
First log in via ssh with your SciNet account to '''&amp;lt;tt&amp;gt;bgqdev.scinet.utoronto.ca&amp;lt;/tt&amp;gt;''', and from there you can proceed to '''&amp;lt;tt&amp;gt;p7n01-ib0&amp;lt;/tt&amp;gt;''', which&lt;br /&gt;
is currently the gateway/devel node for this cluster.  To avoid module confusion, it is recommended that you modify your .bashrc to distinguish between the P7 and other systems that share the same file system, by including something like&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
case $(hostname -s) in&lt;br /&gt;
    p7*)&lt;br /&gt;
      MACHINE=p7&lt;br /&gt;
      # commands for p7&lt;br /&gt;
    ;;&lt;br /&gt;
    bgq*)  &lt;br /&gt;
      MACHINE=bgq&lt;br /&gt;
      # commands for bgq&lt;br /&gt;
    ;;&lt;br /&gt;
    sgc*) &lt;br /&gt;
      MACHINE=sgc&lt;br /&gt;
      # commands for sgc&lt;br /&gt;
    ;;&lt;br /&gt;
    *)    &lt;br /&gt;
      MACHINE=unknown&lt;br /&gt;
    ;;&lt;br /&gt;
esac&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Compiler/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
From '''&amp;lt;tt&amp;gt;p7n01-ib0&amp;lt;/tt&amp;gt;''' you can compile, do short tests, and submit your jobs to the queue.&lt;br /&gt;
&lt;br /&gt;
=== Software ===&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
gcc/g++/gfortran version 4.4.4 ships with RHEL 6.3 and is available by default. GCC 4.6.1 is available as a separate module. However, it is recommended to use the IBM compilers (see below).&lt;br /&gt;
&lt;br /&gt;
==== IBM Compilers ====&lt;br /&gt;
To use the IBM Power-specific compilers xlc/xlc++/xlf, you need to load the following modules:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp xlf&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to use &amp;quot;-q64&amp;quot; when using the IBM compilers.&lt;br /&gt;
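&lt;br /&gt;
For example, a minimal serial build with the IBM compiler might look like the following (the source file name hello.c is just a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
$ xlc -q64 -O3 -o hello hello.c&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;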
&lt;br /&gt;
==== MPI ====&lt;br /&gt;
&lt;br /&gt;
IBM's POE is available and will work with both the IBM and GNU compilers.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load pe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The MPI wrappers for C, C++ and Fortran 77/90 are mpicc, mpicxx, and mpif77/mpif90, respectively (mpcc, mpCC and mpfort should also work).&lt;br /&gt;
&lt;br /&gt;
Note: To use the full C++ bindings of MPI (those in the MPI namespace) in C++ code, you need to add &amp;lt;tt&amp;gt;-cpp&amp;lt;/tt&amp;gt; to the compilation command, and you need to add &amp;lt;tt&amp;gt;-Wl,--allow-multiple-definition&amp;lt;/tt&amp;gt; to the link command if you are linking several object files that use the MPI C++ bindings.&lt;br /&gt;
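&lt;br /&gt;
As a rough sketch of that note (the source file names are placeholders), the compile and link steps would be:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpicxx -cpp -q64 -c solver.cpp&lt;br /&gt;
$ mpicxx -cpp -q64 -c io.cpp&lt;br /&gt;
$ mpicxx -q64 -Wl,--allow-multiple-definition -o solver solver.o io.o&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;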
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module openmpi/1.5.3-gcc-v4.4.4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module openmpi/1.5.3-ibm-11.1+13.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
==== Spark Standalone ====&lt;br /&gt;
To run Spark, you first need to load JRE 1.7.0 via the JDK module:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
p7n01-$ module load jdk/JRE1.7.0 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Then load Spark as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
p7n01-$ module load spark/1.4.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== Spark SQL ====&lt;br /&gt;
The current build of spark/1.5.0 supports Spark SQL &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
p7n01-$ module load jdk/JRE1.7.0 &lt;br /&gt;
p7n01-$ module load spark/1.5.0&lt;br /&gt;
p7n01-$ module load hadoop/2.3.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Sample Spark script ==&lt;br /&gt;
We recommend reading the following blog post by Jonathan Dursi to build your first Spark script:&lt;br /&gt;
http://www.dursi.ca/spark-in-hpc-clusters/ &lt;br /&gt;
&lt;br /&gt;
Prior to submitting sparkscript.py, change the import line to&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from pyspark.context import SparkContext&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or, instead of submitting sparkscript.py, you can also try:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
spark-submit --master $sparkmaster --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/target/spark-examples_2.10-1.4.1.jar 256&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Submit a Job ==&lt;br /&gt;
&lt;br /&gt;
The current scheduler is IBM's LoadLeveler. Be sure to&lt;br /&gt;
include the @environment flags shown in the sample script below, as they are required to get full performance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
##===================================&lt;br /&gt;
## P7 Load Leveler Submission Script&lt;br /&gt;
##===================================&lt;br /&gt;
##&lt;br /&gt;
## Don't change these parameters unless you really know what you are doing&lt;br /&gt;
##&lt;br /&gt;
##@ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \&lt;br /&gt;
##                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no&lt;br /&gt;
##&lt;br /&gt;
##===================================&lt;br /&gt;
## Avoid core dumps&lt;br /&gt;
## @ core_limit   = 0&lt;br /&gt;
##===================================&lt;br /&gt;
## Job specific&lt;br /&gt;
##===================================&lt;br /&gt;
#&lt;br /&gt;
# @ job_name = myjob&lt;br /&gt;
# @ job_type = parallel&lt;br /&gt;
# @ class = verylong&lt;br /&gt;
# @ output = $(jobid).out&lt;br /&gt;
# @ error = $(jobid).err&lt;br /&gt;
# @ wall_clock_limit = 2:00:00&lt;br /&gt;
# @ node = 2&lt;br /&gt;
# @ tasks_per_node = 128&lt;br /&gt;
# @ queue&lt;br /&gt;
#&lt;br /&gt;
#===================================&lt;br /&gt;
&lt;br /&gt;
#./my_script&lt;br /&gt;
./my_code &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myjob.ll &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To show running jobs use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Split a Spark job ==&lt;br /&gt;
&lt;br /&gt;
For example, to split a job into 256 tasks among 2 workers, you must select 3 nodes (one master and 2 workers) and add the following job specifications:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@node = 3&lt;br /&gt;
#@preferences = Machine == { &amp;quot;AvailableNode1&amp;quot; &amp;quot;AvailableNode2&amp;quot; &amp;quot;AvailableNode3&amp;quot;}&lt;br /&gt;
#@tasks_per_node = 128&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitor your (Spark) job from localhost ==&lt;br /&gt;
&lt;br /&gt;
Spark creates a web UI on each master and slave that you can access from your local web browser. You can notably &amp;quot;check your cluster UI to ensure that workers are registered and have sufficient resources&amp;quot;. To do so, log onto the P7 (again), forwarding the port of your cluster UI to a local port (e.g., 9999):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -L 9999:masternode:4040 userid@login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then go to your web browser at http://localhost:9999&lt;br /&gt;
&lt;br /&gt;
== Specific Software Examples ==&lt;br /&gt;
&lt;br /&gt;
=== HybridX ===&lt;br /&gt;
[[HybridX_on_P7|This page]] contains information about HybridX usage on the P7 cluster.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=1582</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=1582"/>
		<updated>2018-10-02T14:04:34Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* 5D Torus Network */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4 rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], currently at the 120th place (Nov 2015) -- 499th as of Aug 2018.&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy-efficient 3rd generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6 GHz PowerPC-based CPU (PowerPC A2) with 16 GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores) with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full Redhat Linux OS that manages the compute nodes and mounts the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks) totalling 65,536 cores and 64 TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbours in the ±A, ±B, ±C, ±D, and ±E directions. As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
On a 5D torus topology, each node has 10 point-to-point direct links. There is an additional 11th link to the I/O nodes.&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''' which one can login to directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[SSH keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines - including most of the compilers - you will have to use the &amp;lt;tt&amp;gt;module&amp;lt;/tt&amp;gt; command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package.  &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI versions, libraries, etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of the library files and include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against the library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[module-basename]_INC}&amp;lt;/tt&amp;gt; to the compile command and &amp;lt;tt&amp;gt;-L${SCINET_[module-basename]_LIB}&amp;lt;/tt&amp;gt; to the link command, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.&lt;br /&gt;
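&lt;br /&gt;
For instance, for a hypothetical module with short name &amp;quot;fftw&amp;quot; (the module, library and source names here are placeholders, not a specific installed package), a compile-and-link line following this convention would look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpixlc -O3 -I${SCINET_FFTW_INC} -L${SCINET_FFTW_LIB} -o mycode.exe mycode.c -lfftw3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;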
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. The compilers by default produce&lt;br /&gt;
static binaries, however with BGQ it is possible to now use dynamic libraries as well.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers respectively.  &lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
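&lt;br /&gt;
Put together, a typical MPI compile line would look something like the following (the source and executable names are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load mpich2&lt;br /&gt;
$ mpixlc -O3 -qarch=qp -qtune=qp -o mycode.exe mycode.c&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;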
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also bgq native development nodes named '''bgqdev-ion[01-24]''' which one can log in to directly, i.e. via ssh, from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes, with the exception that they run a full RedHat Linux and have an infiniband interface providing direct network access.  Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, and thus cross-compilation is not required, which can make building some software easier.&lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes, but they cannot be run locally, as mpich2 is set up for the BGQ network and will fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from that of the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block, however this results in shared resources (network and I/O); such jobs are referred to as sub-block jobs and are described in more detail below.&lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which for those familiar with MPI is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, case is a job script submitted and run using loadleveler. Inside the job script, the block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), then from within that job script you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure mpi jobs, it is advisable always to give the number of ranks per node, because the default value of 1 may leave 15 cores on the node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is a verbosity level from 1 to 7, which can be helpful in debugging an application.&lt;br /&gt;
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). Using fewer ranks per node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
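&lt;br /&gt;
For example, on a 64-node block, a hybrid job could use 8 ranks per node with 8 OpenMP threads each (8 x 8 = 64 hardware threads per node), giving --np 8 x 64 = 512 (the executable name is a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 512 --ranks-per-node=8 --envs OMP_NUM_THREADS=8 --cwd=$PWD : $PWD/mycode.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;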
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64-node block is reserved for development and interactive testing for 16 hours, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs in that time interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30 minute maximum wall_clock_limit. The purpose of this reservation is to ensure short testing jobs are run quickly without being held up by longer production-type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few Blue Gene specific keywords.  The keyword &amp;quot;bg_size&amp;quot; is given in number of nodes, not cores, so bg_size=64 corresponds to 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024 and 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP threads per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
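&lt;br /&gt;
As a worked check against the sample script below: with bg_size = 64, ranks-per-node = 16 and OMP_NUM_THREADS = 1, np = 1024 satisfies np &amp;amp;le; 16 * 64 = 1024, ranks-per-node = 16 &amp;amp;le; np, and 16 * 1 = 16 &amp;amp;le; 64, so all three constraints hold.&lt;br /&gt;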
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps ( Job dependency) ===&lt;br /&gt;
LoadLeveler has many advanced features to control job submission and execution. One of these features is called steps. It allows a series of jobs, each called a step, to be submitted using one script with dependencies defined between them, so that the jobs run sequentially, each waiting for the previous step to finish before starting. The following example uses the same LoadLeveler script as previously shown, but the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each job is finished before starting the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step1                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step2                                                                                                                                                                                                                        &lt;br /&gt;
# @ dependency = step1 == 0                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step3                                                                                                                                                                                                                        &lt;br /&gt;
# @ dependency = step2 == 0                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.&lt;br /&gt;
Users, however, only have access to the BGQ through loadleveler, which is appropriate for batch jobs,&lt;br /&gt;
whereas an interactive session is typically beneficial when debugging and developing.  As such, a&lt;br /&gt;
script has been written to allow a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, set all the correct environment variables, and then launch a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30 minute session on 64 nodes and, when run on&lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended as it automatically attaches to all the processes of a job, instead of attaching a gdb tool by hand (as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt; and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.   Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter setting sets the number of mpi processes to run.  Most configure scripts expect only one mpi process, thus, &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A debugjob session started with &amp;lt;tt&amp;gt;-i&amp;lt;/tt&amp;gt; implicitly calls runjob with 1 MPI task when an executable is run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
BGQ allows multiple applications to share the same block, which is referred to as sub-block jobs; however, this needs to be done from within the same loadleveler submission script using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which to start each job and a 5D torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the job being run.&lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1,2,4,8,16 and 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4, which returns the appropriate $SHAPE argument and an array of 16 starting corners in ${CORNER[n]}.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Using subblocks script to set $SHAPE and array of ${CORNERS[n]}&lt;br /&gt;
# with size of subblocks in nodes (ie similiar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case 16 sub-blocks of 4 cnodes each (64 total ie bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that subjobs are not the ideal way to run on the BlueGene/Qs. One needs to consider that these sub-blocks all have to share the same I/O nodes, so for I/O intensive jobs this will be an inefficient setup.  Also consider that if you need to run such small jobs that you have to run in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input or output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| first of (20 TB ; 1 million files)&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
The BGQ GPFS file system,  except for HPSS, is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor is the other file system mounted on the BGQ.  &lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh) to use a non-default encryption. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85 MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back.&lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgq.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information about the home and scratch file systems in a number of ways: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
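&lt;br /&gt;
For example, to list the usage of all the members of your group and include the delta information, you could combine the -a and -de options shown above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;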
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://docs.scinet.utoronto.ca/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform&lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://openfoam.org OpenFOAM]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1, 5.0&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0, openfoam/3.0.1, openfoam/5.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[[OpenFOAM_on_BGQ|A detailed explanation]] of OpenFOAM usage on the BG/Q cluster.&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the Python mmap API, you must use it in PRIVATE mode, as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
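&lt;br /&gt;
As a minimal sketch (the script name &amp;lt;tt&amp;gt;my_mpi4py_script.py&amp;lt;/tt&amp;gt; is hypothetical), an mpi4py-based script can be launched on several ranks with the same kind of runjob invocation as shown above, just with more MPI tasks:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
# also load xlf/14.1 essl/5.1 if the script uses numpy/scipy&lt;br /&gt;
runjob --np 64 --ranks-per-node=16 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 $SCRATCH/my_mpi4py_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;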
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, fourth node board, first node, and third core (the numbering starts at zero).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--!&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=1581</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=1581"/>
		<updated>2018-10-02T14:03:07Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* 5D Torus Network */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4-rack system entered the [http://top500.org/ top 500] as the fastest Canadian supercomputer, at 120th place (Nov 2015); it ranked 499th as of Aug 2018.&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy-efficient 3rd-generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6GHz PowerPC-based CPU (PowerPC A2) with 16GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full RedHat Linux OS, manage the compute nodes, and mount the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks) totalling 65,536 cores and 64TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbours in the ±A, ±B, ±C, ±D, and ±E directions. Therefore each node has 10 nearest neighbours. As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''' which one can login to directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[SSH keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux, which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines, including most of the compilers, you will have to use the `modules' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI libraries, etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
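&lt;br /&gt;
For example, a minimal sketch of the relevant lines in a &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; (the module selection here is only an illustration):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# modules needed in every session&lt;br /&gt;
module load mpich2 essl&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;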
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of the library files and include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against the library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[module-basename]_INC}&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L${SCINET_[module-basename]_LIB}&amp;lt;/tt&amp;gt;, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.&lt;br /&gt;
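&lt;br /&gt;
For example, a sketch of a compile-and-link line for a C code using the GSL library (assuming the short module name is GSL, so that the variables are &amp;lt;tt&amp;gt;$SCINET_GSL_INC&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$SCINET_GSL_LIB&amp;lt;/tt&amp;gt;) could look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load mpich2 gsl&lt;br /&gt;
$ mpixlc -O3 -qarch=qp -qtune=qp -I${SCINET_GSL_INC} mycode.c -L${SCINET_GSL_LIB} -lgsl -lgslcblas -o mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;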
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. By default the compilers produce&lt;br /&gt;
static binaries; however, on the BGQ it is now possible to use dynamic libraries as well.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers, respectively.&lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
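&lt;br /&gt;
For example, a FORTRAN MPI code (here hypothetically called mycode.f90) would be cross-compiled as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load mpich2&lt;br /&gt;
$ mpixlf90 -O3 -qarch=qp -qtune=qp -o mycode mycode.f90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;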
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also bgq-native development nodes named '''bgqdev-ion[01-24]''' which one can log in to directly, i.e. via ssh, from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes, with the exception that they run a full RedHat Linux and have an infiniband interface providing direct network access.    Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, so cross-compilation is not required, which can make building some software easier.&lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes; however, they cannot be run locally, as mpich2 is set up for the BGQ network and will therefore fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from that of the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, which are further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size to give the job fully dedicated resources.  Smaller jobs can be run within the same block, but this results in shared resources (network and I/O); such jobs are referred to as sub-block jobs and are described in more detail below.&lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which for those familiar with MPI is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, case is a job script submitted and run using loadleveler. Inside the job script, the block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), then from within that job script you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure MPI jobs, it is advisable to always give the number of ranks per node, because the default value of 1 would leave 15 cores on each node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # ranges from 1 to 7; higher levels can be helpful when debugging an application.&lt;br /&gt;
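&lt;br /&gt;
For example, rerunning the earlier job with extra diagnostic output (the verbosity level 4 is an arbitrary choice within the 1-7 range):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --verbose 4 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;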
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). Using fewer ranks-per-node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
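&lt;br /&gt;
As a worked example (a sketch only, with a hypothetical executable), a hybrid job on a bg_size=64 block could use 4 ranks per node with 16 OpenMP threads each, keeping 4 x 16 = 64 hardware threads per node busy, with a total of 4 x 64 = 256 MPI processes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 256 --ranks-per-node=4 --envs OMP_NUM_THREADS=16 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;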
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64-node block is reserved for development and interactive testing for 16 hours a day, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs during that time interval in order to keep the machine usage as high as possible. This block is accessed using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30-minute maximum wall_clock_limit. The purpose of this reservation is to ensure that short testing jobs run quickly without being held up by longer production-type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few Blue Gene-specific keywords.  The keyword &amp;quot;bg_size&amp;quot; is specified in number of nodes, not cores, so bg_size=64 corresponds to 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024 and 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP thread per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps ( Job dependency) ===&lt;br /&gt;
LoadLeveler has many advanced features to control job submission and execution. One of these features is called steps: it allows a series of jobs to be submitted using one script, with dependencies defined between them, so that the jobs run sequentially, each step waiting for the previous one to finish before it starts. The following example uses the same LoadLeveler script as previously shown, but the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each job is finished to start the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step1&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step2&lt;br /&gt;
# @ dependency = step1 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step3&lt;br /&gt;
# @ dependency = step2 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.&lt;br /&gt;
Users, however, only have access to the BGQ through loadleveler, which is appropriate for batch jobs,&lt;br /&gt;
whereas an interactive session is typically beneficial when debugging and developing.  A script has&lt;br /&gt;
therefore been written to allow a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, set all the correct environment variables, and then launch a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30-minute session on 64 nodes and, when run on&lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended, as it automatically attaches to all the processes of a job (instead of attaching a gdb tool by hand, as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt; and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.   Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter setting sets the number of mpi processes to run.  Most configure scripts expect only one mpi process, thus, &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
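&lt;br /&gt;
A minimal sketch of such a session (the package directory and configure options are only illustrative):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
[user@bgqdev-fen1]$ export RUNJOB_NP=1&lt;br /&gt;
[user@bgqdev-fen1]$ cd $SCRATCH/mypackage&lt;br /&gt;
[user@bgqdev-fen1]$ ./configure CC=mpixlc FC=mpixlf90&lt;br /&gt;
[user@bgqdev-fen1]$ make&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;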
&lt;br /&gt;
&lt;br /&gt;
In a debugjob session started with the &amp;lt;tt&amp;gt;-i&amp;lt;/tt&amp;gt; flag, running an executable implicitly calls runjob with 1 MPI task:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
BGQ allows multiple applications to share the same block, which is referred to as sub-block jobs; however, this needs to be done from within the same loadleveler submission script, using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which to start each job, as well as a 5D torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the job being run.&lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16, or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4, which will return the appropriate $SHAPE argument and an array of 16 starting corners.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Using subblocks script to set $SHAPE and array of ${CORNERS[n]}&lt;br /&gt;
# with size of subblocks in nodes (ie similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case 16 sub-blocks of 4 cnodes each (64 total ie bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. These sub-blocks all have to share the same I/O nodes, so for I/O-intensive jobs this will be an inefficient setup.  Also consider that if your jobs are so small that they have to be run in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input or output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB or 1 million files (whichever is reached first)&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
Except for HPSS, the BGQ GPFS file system is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor is the regular SciNet file system mounted on the BGQ.&lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh) to use a non-default encryption. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back.&lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgqdev.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information about the home and scratch file systems in a number of ways: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
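&lt;br /&gt;
For example, to list the usage of all the members of your group and include the delta information, you could combine the -a and -de options shown above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;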
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://docs.scinet.utoronto.ca/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform &lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://openfoam.org OpenFOAM]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1, 5.0&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0, openfoam/3.0.1, openfoam/5.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[[OpenFOAM_on_BGQ|A detailed explanation]] of OpenFOAM usage on the BG/Q cluster.&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the mmap python API, you must use it in PRIVATE mode as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, fourth node board, first node, and third core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=1580</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=1580"/>
		<updated>2018-10-02T14:01:06Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* SOSCIP &amp;amp; LKSAVI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4-rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], ranked 120th in Nov 2015 and 499th as of Aug 2018.&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy-efficient 3rd-generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6 GHz PowerPC-based CPU (PowerPC A2) with 16 GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full Red Hat Linux OS that manages the compute nodes and mounts the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks) totalling 65,536 cores and 64 TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions.  As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''', which one can log in to directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[SSH keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines (including most of the compilers), you will have to use the `module' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI versions, libraries, etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of the library files and include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library files, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against the library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[short-module-name]_INC}&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L${SCINET_[short-module-name]_LIB}&amp;lt;/tt&amp;gt;, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.  &lt;br /&gt;
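&lt;br /&gt;
For example, a minimal sketch of compiling against the GSL library (the source file name and the exact variable names are illustrative; check the actual names with &amp;lt;tt&amp;gt;env | grep SCINET&amp;lt;/tt&amp;gt; after loading the module) could look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp gsl&lt;br /&gt;
$ bgxlc -O3 -I${SCINET_GSL_INC} mycode.c -L${SCINET_GSL_LIB} -lgsl -lgslcblas -o mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;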
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. The compilers by default produce&lt;br /&gt;
static binaries; however, on the BGQ it is now also possible to use dynamic libraries.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers, respectively.  &lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
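&lt;br /&gt;
For instance, a typical cross-compilation of an MPI code (the file names here are placeholders) might look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load mpich2&lt;br /&gt;
$ mpixlc -O3 -qarch=qp -qtune=qp -o mycode mycode.c&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;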
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also BGQ-native development nodes named '''bgqdev-ion[01-24]''' which one can log in to directly (i.e., via ssh) from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes, with the exception that they run a full Red Hat Linux and have an infiniband interface providing direct network access.    Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, and thus cross-compilation is not required, which can make building some software easier.    &lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes; however, they cannot be run locally, as mpich2 is set up for the BGQ network and will thus fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from that of the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, which are further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block; however, this results in shared resources (network and I/O). Such jobs are referred to as sub-block jobs and are described in more detail below.  &lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''' which for those familiar with MPI is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, case is a job script submitted and run using loadleveler. Inside the job script, this block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), from within that job script, you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure mpi jobs, it is advisable always to give the number of ranks per node, because the default value of 1 may leave 15 cores on the node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is a verbosity level from 1 to 7, which can be helpful in debugging an application.&lt;br /&gt;
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). Using fewer ranks per node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
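&lt;br /&gt;
As a concrete illustration (a sketch only; adjust to your own code's requirements), a hybrid job on a 64-node block could use 8 ranks per node with 8 OpenMP threads each, i.e. 8 x 64 = 512 MPI processes in total and 8 x 8 = 64 hardware threads per node:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 512 --ranks-per-node=8 --envs OMP_NUM_THREADS=8 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;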
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64-node block is reserved for development and interactive testing for 16 hours, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs during that time interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30-minute maximum wall_clock_limit. The purpose of this reservation is to ensure short testing jobs are run quickly without being held up by longer production-type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few Blue Gene-specific keywords.  The keyword &amp;quot;bg_size&amp;quot; specifies the number of nodes, not cores, so bg_size=64 corresponds to 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024 and 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP threads per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
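&lt;br /&gt;
For example, the sample script below uses bg_size = 64 with ranks-per-node = 16 and OMP_NUM_THREADS = 1, so np = 16 x 64 = 1024 MPI processes, and 16 x 1 = 16 of the 64 available hardware threads per node are used.&lt;br /&gt;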
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps ( Job dependency) ===&lt;br /&gt;
LoadLeveler has a lot of advanced features to control job submission and execution. One of these features is called steps: a series of jobs can be submitted using one script, with dependencies defined between them, so that the jobs run sequentially and each job (step) waits for the previous one to finish before it starts. The following example uses the same LoadLeveler script as previously shown, but the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each job is finished to start the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step1                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step2                                                                                                                                                                                                                        &lt;br /&gt;
# @ dependency = step1 == 0                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step3                                                                                                                                                                                                                        &lt;br /&gt;
# @ dependency = step2 == 0                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.&lt;br /&gt;
Users, however, only have access to the BGQ through loadleveler, which is appropriate for batch jobs,&lt;br /&gt;
whereas an interactive session is typically beneficial when debugging and developing.   As such, a&lt;br /&gt;
script has been written to allow a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, set all the correct environment variables, and then launch a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30-minute session on 64 nodes and, when run on&lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended as it automatically attaches to all the processes of a parallel job, instead of attaching a gdb tool by hand (as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt; and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.   Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter setting sets the number of mpi processes to run.  Most configure scripts expect only one mpi process, thus, &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
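&lt;br /&gt;
As an illustrative sketch (the package directory and configure options are hypothetical), building such a package inside a debugjob session could then look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
[user@bgqdev-fen1]$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
[user@bgqdev-fen1]$ export RUNJOB_NP=1&lt;br /&gt;
[user@bgqdev-fen1]$ cd mypackage&lt;br /&gt;
[user@bgqdev-fen1]$ ./configure CC=mpixlc FC=mpixlf90&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;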
&lt;br /&gt;
&lt;br /&gt;
A debugjob session started with &amp;lt;tt&amp;gt;-i&amp;lt;/tt&amp;gt; runs executables by implicitly calling runjob with 1 mpi task:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
BGQ allows multiple applications to share the same block, which is referred to as sub-block jobs; however, this needs to be done from within the same loadleveler submission script using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which to start each job, and a 5D torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the job being run.  &lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16 or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4, which returns the appropriate $SHAPE argument and an array of 16 starting corners in ${CORNER[n]}. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Using subblocks script to set $SHAPE and array of ${CORNERS[n]}&lt;br /&gt;
# with the size of the sub-blocks in nodes (i.e. similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case 16 sub-blocks of 4 cnodes each (64 total, i.e. bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. One needs to consider that these sub-blocks all have to share the same I/O nodes, so for I/O-intensive jobs this will be an inefficient setup.  Also consider that if your jobs are so small that you have to run them in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input or output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).  &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB or 1 million files, whichever is reached first&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
The BGQ GPFS file system, except for HPSS, is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor are the other systems' file systems mounted on the BGQ.  &lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh) to use a non-default encryption cipher. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back. &lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgq.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information in a number of ways on the home and scratch file systems. For instance, it can report how much disk space is being used by yourself and your group (with the -a option), show how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?] [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
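For example, based on the flags listed above, to see the usage of all members of your group along with the recent changes, one could run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;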
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://docs.scinet.utoronto.ca/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform &lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included: numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://openfoam.org OpenFOAM]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1, 5.0&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icoFoam, etc.&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0, openfoam/3.0.1, openfoam/5.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[[OpenFOAM_on_BGQ|A detailed explanation]] of OpenFOAM usage on the BG/Q cluster.&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the mmap Python API, you must use it in PRIVATE mode, as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
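&lt;br /&gt;
As an illustration only, here is a minimal mpi4py sketch (the file name hello_mpi.py is a made-up placeholder; launch it with runjob exactly as in the example above, with as many ranks as you need):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# hello_mpi.py -- minimal mpi4py check, prints one line per MPI rank&lt;br /&gt;
from mpi4py import MPI&lt;br /&gt;
&lt;br /&gt;
comm = MPI.COMM_WORLD&lt;br /&gt;
print(&amp;quot;Hello from rank %d of %d&amp;quot; % (comm.Get_rank(), comm.Get_size()))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;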
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes, you can use bg_console or the web-based navigator from the service node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, fourth node board, first node, and third core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&amp;diff=1579</id>
		<title>Niagara Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&amp;diff=1579"/>
		<updated>2018-10-01T15:35:00Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Example submission script (OpenMP) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Niagara&lt;br /&gt;
|installed=Jan 2018&lt;br /&gt;
|operatingsystem= CentOS 7.4 &lt;br /&gt;
|loginnode= niagara.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1500 nodes (60,000 cores)&lt;br /&gt;
|rampernode=188 GiB / 202 GB  &lt;br /&gt;
|corespernode=40 (80 hyperthreads)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
The Niagara cluster is a large cluster of 1500 Lenovo SD350 servers each with 40 Intel &amp;quot;Skylake&amp;quot; cores at 2.4 GHz. &lt;br /&gt;
The peak performance of the cluster is 3.02 PFlops delivered / 4.6 PFlops theoretical.  It is the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018]. &lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 188 GiB / 202 GB RAM per node (at least 4 GiB/core for user jobs).  Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing.  The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 12 or 24 hours (for default or RAC accounts, respectively) and favours large jobs.&lt;br /&gt;
&lt;br /&gt;
* See the [https://support.scinet.utoronto.ca/education/go.php/370/content.php/cid/1383/  &amp;quot;Intro to Niagara&amp;quot;] recording&lt;br /&gt;
&lt;br /&gt;
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.computecanada.ca/wiki/Niagara on this page].&lt;br /&gt;
&lt;br /&gt;
= Getting started on Niagara =&lt;br /&gt;
&lt;br /&gt;
Those of you new to SciNet and belonging to a group whose primary PI does not have an allocation, as granted in the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC], must first follow the old route of [https://www.scinethpc.ca/getting-a-scinet-account/ requesting a SciNet Consortium Account on the CCDB site] to gain access to Niagara.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [[FAQ]] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Niagara runs CentOS 7, which is a type of Linux.  You will need to be familiar with Linux systems to function on Niagara.  If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and CC (Compute Canada) compute systems, access to Niagara is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. Connecting with [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY PuTTY] on Windows or Connecting with [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Niagara login nodes with your CC credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
 $ ssh -Y MYCCUSERNAME@niagara.computecanada.ca&lt;br /&gt;
&lt;br /&gt;
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; is needed to open windows from the Niagara command-line onto your local X server.&lt;br /&gt;
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Your various directories ==&lt;br /&gt;
&lt;br /&gt;
By virtue of your access to Niagara you are granted storage space on the system.  There are several directories available to you, each indicated by an associated environment variable.&lt;br /&gt;
&lt;br /&gt;
=== home and scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home and scratch directory on the system, whose locations are of the form&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/g/groupname/myccusername&lt;br /&gt;
 $SCRATCH=/scratch/g/groupname/myccusername&lt;br /&gt;
&lt;br /&gt;
where groupname is the name of your PI's group, and myccusername is your CC username.  For example:&lt;br /&gt;
&lt;br /&gt;
  nia-login07:~$ pwd&lt;br /&gt;
  /home/s/scinet/rzon&lt;br /&gt;
  nia-login07:~$ cd $SCRATCH&lt;br /&gt;
  nia-login07:rzon$ pwd&lt;br /&gt;
  /scratch/s/scinet/rzon&lt;br /&gt;
&lt;br /&gt;
NOTE: home is read-only on compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== project and archive ===&lt;br /&gt;
&lt;br /&gt;
Users from groups with [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project and/or archive directory.&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/g/groupname/myccusername&lt;br /&gt;
 $ARCHIVE=/archive/g/groupname/myccusername&lt;br /&gt;
&lt;br /&gt;
NOTE: Currently archive space is available only via [[HPSS]].&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''&lt;br /&gt;
&lt;br /&gt;
When writing your scripts, use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) instead of the actual paths!  The paths may change in the future.&lt;br /&gt;
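&lt;br /&gt;
For example, a small sketch (the directory names my_run and results are made-up placeholders):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# good: portable, keeps working if the file system layout changes&lt;br /&gt;
cd $SCRATCH/my_run&lt;br /&gt;
cp results.tar.gz $PROJECT/results/&lt;br /&gt;
&lt;br /&gt;
# avoid: hard-coded absolute paths such as /scratch/g/groupname/myccusername/my_run&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;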
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them.  This table summarizes the various file systems.  See the [[Data_Management | Data Management]] page for more details.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50-500TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|-&lt;br /&gt;
| $BBUFFER&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 10 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| very short&lt;br /&gt;
| no&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Moving data to Niagara ===&lt;br /&gt;
&lt;br /&gt;
If you need to move data to Niagara for analysis, or when you need to move data off of Niagara, use the following guidelines:&lt;br /&gt;
* If your data is less than 10GB, move the data using the login nodes.&lt;br /&gt;
* If your data is greater than 10GB, move the data using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca (a transfer sketch follows below).&lt;br /&gt;
&lt;br /&gt;
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.&lt;br /&gt;
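&lt;br /&gt;
For illustration only, a sketch of a large transfer through a datamover node with rsync, assuming you can reach the datamover nodes over ssh with your CC credentials (the local directory my_large_dataset and the target path are made-up placeholders):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# run this from your own machine; replace MYCCUSERNAME and the paths with your own&lt;br /&gt;
rsync -avP my_large_dataset/ MYCCUSERNAME@nia-datamover1.scinet.utoronto.ca:/scratch/g/groupname/myccusername/my_large_dataset/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;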
&lt;br /&gt;
= Loading software modules =&lt;br /&gt;
&lt;br /&gt;
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]].  This section focuses on the former.&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): list available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list loadable software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.&lt;br /&gt;
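&lt;br /&gt;
As an illustration (assuming a gsl module exists in the stack you have loaded; my_code.c is a made-up file name, and SCINET_GSL_ROOT follows the naming pattern described above):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load intel/2018.2 gsl&lt;br /&gt;
# the gsl module defines SCINET_GSL_ROOT, which points at its installation directory&lt;br /&gt;
icc -O3 -xHost -I${SCINET_GSL_ROOT}/include my_code.c -L${SCINET_GSL_ROOT}/lib -lgsl -lgslcblas -o my_code&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;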
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
== Software stacks: NiaEnv and CCEnv ==&lt;br /&gt;
&lt;br /&gt;
On Niagara, there are two available software stacks:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;code&amp;gt;module load NiaEnv&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The same  [https://docs.computecanada.ca/wiki/Modules software stack available on Compute Canada's General Purpose clusters] [https://docs.computecanada.ca/wiki/Graham Graham] and [https://docs.computecanada.ca/wiki/Cedar Cedar], compiled (for now) for a previous generation of CPUs:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;code&amp;gt;module load CCEnv&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Or, if you want the same default modules loaded as on Cedar and Graham, then do&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&lt;br /&gt;
&amp;lt;code&amp;gt;module load CCEnv&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&lt;br /&gt;
&amp;lt;code&amp;gt;module load StdEnv&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Tips for loading software ==&lt;br /&gt;
&lt;br /&gt;
* We advise '''''against''''' loading modules in your .bashrc.  This can lead to very confusing behaviour under certain circumstances.  Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].&lt;br /&gt;
* Instead, load modules by hand when needed, or by sourcing a separate script.&lt;br /&gt;
* Load run-specific modules inside your job submission script.&lt;br /&gt;
* Short names give default versions; e.g. &amp;lt;code&amp;gt;intel&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;intel/2018.2&amp;lt;/code&amp;gt;. It is usually better to be explicit about the versions, for future reproducibility.&lt;br /&gt;
* Modules often require other modules to be loaded first.  Solve these dependencies by using [[Using_modules#Module_spider | &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
= Available compilers and interpreters =&lt;br /&gt;
&lt;br /&gt;
* For most compiled software, one should use the Intel compilers (&amp;lt;tt&amp;gt;icc&amp;lt;/tt&amp;gt; for C, &amp;lt;tt&amp;gt;icpc&amp;lt;/tt&amp;gt; for C++, and &amp;lt;tt&amp;gt;ifort&amp;lt;/tt&amp;gt; for Fortran). Loading an &amp;lt;tt&amp;gt;intel&amp;lt;/tt&amp;gt; module makes these available. &lt;br /&gt;
* The GNU compiler suite (&amp;lt;tt&amp;gt;gcc, g++, gfortran&amp;lt;/tt&amp;gt;) is also available, if you load one of the &amp;lt;tt&amp;gt;gcc&amp;lt;/tt&amp;gt; modules.&lt;br /&gt;
* Open source interpreted, interactive software is also available:&lt;br /&gt;
** [[Python]]&lt;br /&gt;
** [[R]]&lt;br /&gt;
** Julia&lt;br /&gt;
** Octave&lt;br /&gt;
  &lt;br /&gt;
Please visit the [[Python]] or [[R]] page for details on using these tools.  For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].&lt;br /&gt;
&lt;br /&gt;
= Using Commercial Software =&lt;br /&gt;
&lt;br /&gt;
May I use commercial software on Niagara?&lt;br /&gt;
* Possibly, but you have to bring your own license for it.  You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].&lt;br /&gt;
* SciNet and Compute Canada have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.&lt;br /&gt;
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.&lt;br /&gt;
* That means no [[MATLAB]], Gaussian, IDL, etc.&lt;br /&gt;
* Open source alternatives like Octave, [[Python]], and [[R]] are available.&lt;br /&gt;
* We are happy to help you to install commercial software for which you have a license.&lt;br /&gt;
* In some cases, if you have a license, you can use software in the Compute Canada stack.&lt;br /&gt;
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].&lt;br /&gt;
&lt;br /&gt;
= Compiling on Niagara: Example =&lt;br /&gt;
&lt;br /&gt;
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module list&lt;br /&gt;
Currently Loaded Modules:&lt;br /&gt;
  1) NiaEnv/2018a (S)&lt;br /&gt;
  Where:&lt;br /&gt;
   S:  Module is Sticky, requires --force to unload or purge&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ module load intel/2018.2&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ ls&lt;br /&gt;
appl.c module.c&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c&lt;br /&gt;
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c&lt;br /&gt;
nia-login07:~$ icc  -o appl module.o appl.o -mkl&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ ./appl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note:&lt;br /&gt;
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of generating code for more generic x86_64 CPUs).&lt;br /&gt;
* Linking with the Intel Math Kernel Library (MKL) is easy when using the Intel compilers; it just requires the -mkl flag.&lt;br /&gt;
* If compiling with gcc, the optimization flags would be -O3 -march=native (see the sketch below). To link with the MKL, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].&lt;br /&gt;
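&lt;br /&gt;
A sketch of the same build with the GNU compilers (the MKL link line is omitted on purpose; take it from the MKL link line advisor):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module load gcc&lt;br /&gt;
nia-login07:~$ gcc -c -O3 -march=native -o appl.o appl.c&lt;br /&gt;
nia-login07:~$ gcc -c -O3 -march=native -o module.o module.c&lt;br /&gt;
# for the link step, add the MKL libraries suggested by the MKL link line advisor&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;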
&lt;br /&gt;
= Testing =&lt;br /&gt;
&lt;br /&gt;
You really should test your code before you submit it to the cluster to know if your code is correct and what kind of resources you need.&lt;br /&gt;
* Small test jobs can be run on the login nodes.  Rule of thumb: tests should run no more than a couple of minutes, taking at most about 1-2GB of memory, and use no more than a couple of cores.&lt;br /&gt;
* You can run the ddt debugger on the login nodes after &amp;lt;code&amp;gt;module load ddt&amp;lt;/code&amp;gt;.&lt;br /&gt;
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:&lt;br /&gt;
 nia-login07:~$ debugjob N&lt;br /&gt;
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 30 minutes.  Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command.  Note, however, that this may take some time to start, since it will be part of the regular queue, and will be run when the scheduler decides.&lt;br /&gt;
 nia-login07:~$ salloc --nodes N --time=M:00:00&lt;br /&gt;
where N is again the number of nodes, and M is the number of hours you wish the job to run.&lt;br /&gt;
If you need to use graphics while testing your code through salloc, e.g. when using a debugger such as DDT or DDD, you have several options; please visit the [[Testing_With_Graphics | Testing with graphics]] page.&lt;br /&gt;
&lt;br /&gt;
= Submitting jobs =&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- == Progressive approach to run jobs on niagara == --&amp;gt;&lt;br /&gt;
&amp;lt;!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster.  Your jobs will run on some of Niagara's 1500 compute nodes.  When and where your job runs is determined by the scheduler.&lt;br /&gt;
&lt;br /&gt;
Niagara uses SLURM as its job scheduler.  More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 nia-login07:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called the `default' allocation).&lt;br /&gt;
&lt;br /&gt;
Keep in mind:&lt;br /&gt;
* Scheduling is by node, so in multiples of 40 cores.&lt;br /&gt;
* If your group has an allocation, your job's maximum walltime is 24 hours.  If your group is without an allocation, your job's maximum walltime is 12 hours.&lt;br /&gt;
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler will deduce from the specifications of the number of tasks and the number of cpus-per-node what resources should be allocated.  On Niagara things are a bit different.&lt;br /&gt;
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.&lt;br /&gt;
** No other users are running anything on them.&lt;br /&gt;
** You can [[SSH]] into them to see how things are going.&lt;br /&gt;
* Whatever you request from the scheduler, it will always be translated into a multiple of nodes allocated to your job.&lt;br /&gt;
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on the node.&lt;br /&gt;
* If you run serial jobs you must still use all 40 cores on the node.  Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.&lt;br /&gt;
* Since there are 40 cores per node, your job should use N x 40 cores. If you do not, we will contact you to help you optimize your workflow.  Or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued.  It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases.  You specify the partition with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; parameter to &amp;lt;tt&amp;gt;sbatch&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;salloc&amp;lt;/tt&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;tt&amp;gt;compute&amp;lt;/tt&amp;gt; partition, which is the most common case. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs without allocation (&amp;quot;default&amp;quot;)||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 12 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue.  The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== File Input/Output Tips ==&lt;br /&gt;
&lt;br /&gt;
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly.  Refer to the [[Data_Management | Data Management]] page for details about the file systems.&lt;br /&gt;
* Your files can be seen on all Niagara login and compute nodes.&lt;br /&gt;
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.&lt;br /&gt;
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.&lt;br /&gt;
* Accessing data sets which consist of many small files leads to poor performance on GPFS.&lt;br /&gt;
* Avoid reading and writing lots of small amounts of data to disk.  Many small files on the system waste space and are slower to access, read and write.  If you must write many small files, use [[User_Ramdisk | ramdisk]].&lt;br /&gt;
* Write data out in a binary format. This is faster and takes less space (see the sketch after this list).&lt;br /&gt;
* The [[Burst Buffer]] is another option for I/O heavy-jobs and for speeding up [[Checkpoints|checkpoints]].&lt;br /&gt;
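&lt;br /&gt;
For instance, a small sketch of binary versus text output in Python (assuming NumPy is available in the python module you load; the file names are placeholders):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
data = np.random.rand(1000, 1000)&lt;br /&gt;
np.save(&amp;quot;data.npy&amp;quot;, data)        # binary: compact and fast to read back&lt;br /&gt;
# np.savetxt(&amp;quot;data.txt&amp;quot;, data)   # text: much larger and slower; avoid for big arrays&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;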
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;#!/bin/bash &lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks=80&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load intel/2018.2&lt;br /&gt;
module load openmpi/3.1.0&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Submit this script with the command:&lt;br /&gt;
&lt;br /&gt;
    nia-login07:~$ sbatch mpi_job.sh&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;sbatch reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes (each of which will have 40 cores) on which to run a total of 80 tasks, for 1 hour.&amp;lt;br&amp;gt;(Instead of specifying &amp;lt;tt&amp;gt;--ntasks=80&amp;lt;/tt&amp;gt;, you can also ask for &amp;lt;tt&amp;gt;--ntasks-per-node=40&amp;lt;/tt&amp;gt;, which amounts to the same.)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the mpirun flag &amp;quot;--ppn&amp;quot; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it has found such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform mpirun or srun on how many processes to run).&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, just change --ntasks=80 to --ntasks=160, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI). A sketch of the changed lines follows after this list.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
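&lt;br /&gt;
A sketch of only the lines that change for the hyperthreading case described in the last bullet (everything else in the script stays the same):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=160&lt;br /&gt;
&lt;br /&gt;
mpirun --bind-to none ./mpi_example&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;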
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load intel/2018.2&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;.&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Submit this script with the command:&lt;br /&gt;
&lt;br /&gt;
    nia-login07:~$ sbatch openmp_job.sh&lt;br /&gt;
&lt;br /&gt;
* First line indicates that this is a bash script.&lt;br /&gt;
* Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&lt;br /&gt;
* sbatch reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&lt;br /&gt;
* In this case, SLURM looks for one node with 40 cores, on which a single task will run, for 1 hour.&lt;br /&gt;
* Once it has found such a node, it runs the script:&lt;br /&gt;
** Changes to the submission directory;&lt;br /&gt;
** Loads modules;&lt;br /&gt;
** Sets an environment variable;&lt;br /&gt;
** Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&lt;br /&gt;
* To use hyperthreading, just change &amp;lt;code&amp;gt;--cpus-per-task=40&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--cpus-per-task=80&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Monitoring queued jobs ==&lt;br /&gt;
&lt;br /&gt;
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; (a caching version of squeue) to show the job queue (&amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; for just your jobs);&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; to get information on a specific job&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;(alternatively, &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt;, which is more verbose).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; to get an estimate for when a job will run; these tend not to be very accurate predictions.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel -i JOBID&amp;lt;/code&amp;gt; to cancel the job.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; to get information on your recent jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].  The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.&lt;br /&gt;
&lt;br /&gt;
= Visualization =&lt;br /&gt;
Information about how to use visualization tools on Niagara is available on the [[Visualization]] page.&lt;br /&gt;
&lt;br /&gt;
= Support =&lt;br /&gt;
&lt;br /&gt;
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]&lt;br /&gt;
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&amp;diff=1578</id>
		<title>Niagara Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&amp;diff=1578"/>
		<updated>2018-10-01T15:34:41Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Example submission script (MPI) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Niagara&lt;br /&gt;
|installed=Jan 2018&lt;br /&gt;
|operatingsystem= CentOS 7.4 &lt;br /&gt;
|loginnode= niagara.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1500 nodes (60,000 cores)&lt;br /&gt;
|rampernode=188 GiB / 202 GB  &lt;br /&gt;
|corespernode=40 (80 hyperthreads)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
The Niagara cluster is a large cluster of 1500 Lenovo SD350 servers each with 40 Intel &amp;quot;Skylake&amp;quot; cores at 2.4 GHz. &lt;br /&gt;
The peak performance of the cluster is 3.02 PFlops delivered / 4.6 PFlops theoretical.  It is the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018]. &lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 188 GiB / 202 GB RAM per node (at least 4 GiB/core for user jobs).  Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing.  The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 12 or 24 hours (for default or RAC accounts, respectively) and favours large jobs.&lt;br /&gt;
&lt;br /&gt;
* See the [https://support.scinet.utoronto.ca/education/go.php/370/content.php/cid/1383/  &amp;quot;Intro to Niagara&amp;quot;] recording&lt;br /&gt;
&lt;br /&gt;
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.computecanada.ca/wiki/Niagara on this page].&lt;br /&gt;
&lt;br /&gt;
= Getting started on Niagara =&lt;br /&gt;
&lt;br /&gt;
Those of you new to SciNet and belonging to a group whose primary PI does not have an allocation, as granted in the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC], must first follow the old route of [https://www.scinethpc.ca/getting-a-scinet-account/ requesting a SciNet Consortium Account on the CCDB site] to gain access to Niagara.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [[FAQ]] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Niagara runs CentOS 7, which is a type of Linux.  You will need to be familiar with Linux systems to function on Niagara.  If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and CC (Compute Canada) compute systems, access to Niagara is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. Connecting with [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY PuTTY] on Windows or Connecting with [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Niagara login nodes with your CC credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
 $ ssh -Y MYCCUSERNAME@niagara.computecanada.ca&lt;br /&gt;
&lt;br /&gt;
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; is needed to open windows from the Niagara command-line onto your local X server.&lt;br /&gt;
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Your various directories ==&lt;br /&gt;
&lt;br /&gt;
By virtue of your access to Niagara you are granted storage space on the system.  There are several directories available to you, each indicated by an associated environment variable.&lt;br /&gt;
&lt;br /&gt;
=== home and scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home and scratch directory on the system, whose locations are of the form&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/g/groupname/myccusername&lt;br /&gt;
 $SCRATCH=/scratch/g/groupname/myccusername&lt;br /&gt;
&lt;br /&gt;
where groupname is the name of your PI's group, and myccusername is your CC username.  For example:&lt;br /&gt;
&lt;br /&gt;
  nia-login07:~$ pwd&lt;br /&gt;
  /home/s/scinet/rzon&lt;br /&gt;
  nia-login07:~$ cd $SCRATCH&lt;br /&gt;
  nia-login07:rzon$ pwd&lt;br /&gt;
  /scratch/s/scinet/rzon&lt;br /&gt;
&lt;br /&gt;
NOTE: home is read-only on compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== project and archive ===&lt;br /&gt;
&lt;br /&gt;
Users from groups with [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project and/or archive directory.&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/g/groupname/myccusername&lt;br /&gt;
 $ARCHIVE=/archive/g/groupname/myccusername&lt;br /&gt;
&lt;br /&gt;
NOTE: Currently archive space is available only via [[HPSS]].&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''&lt;br /&gt;
&lt;br /&gt;
When writing your scripts, use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) instead of the actual paths!  The paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them.  This table summarizes the various file systems.  See the [[Data_Management | Data Management]] page for more details.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50-500TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|-&lt;br /&gt;
| $BBUFFER&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 10 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| very short&lt;br /&gt;
| no&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Moving data to Niagara ===&lt;br /&gt;
&lt;br /&gt;
If you need to move data to Niagara for analysis, or when you need to move data off of Niagara, use the following guidelines:&lt;br /&gt;
* If your data is less than 10GB, move the data using the login nodes.&lt;br /&gt;
* If your data is greater than 10GB, move the data using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.&lt;br /&gt;
&lt;br /&gt;
= Loading software modules =&lt;br /&gt;
&lt;br /&gt;
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]].  This section focuses on the former.&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): list available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list loadable software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.&lt;br /&gt;
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
== Software stacks: NiaEnv and CCEnv ==&lt;br /&gt;
&lt;br /&gt;
On Niagara, there are two available software stacks:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;code&amp;gt;module load NiaEnv&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The same  [https://docs.computecanada.ca/wiki/Modules software stack available on Compute Canada's General Purpose clusters] [https://docs.computecanada.ca/wiki/Graham Graham] and [https://docs.computecanada.ca/wiki/Cedar Cedar], compiled (for now) for a previous generation of CPUs:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;code&amp;gt;module load CCEnv&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Or, if you want the same default modules loaded as on Cedar and Graham, then do&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&lt;br /&gt;
&amp;lt;code&amp;gt;module load CCEnv&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&lt;br /&gt;
&amp;lt;code&amp;gt;module load StdEnv&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Tips for loading software ==&lt;br /&gt;
&lt;br /&gt;
* We advise '''''against''''' loading modules in your .bashrc.  This can lead to very confusing behaviour under certain circumstances.  Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].&lt;br /&gt;
* Instead, load modules by hand when needed, or by sourcing a separate script.&lt;br /&gt;
* Load run-specific modules inside your job submission script.&lt;br /&gt;
* Short names give default versions; e.g. &amp;lt;code&amp;gt;intel&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;intel/2018.2&amp;lt;/code&amp;gt;. It is usually better to be explicit about the versions, for future reproducibility.&lt;br /&gt;
* Modules often require other modules to be loaded first.  Solve these dependencies by using [[Using_modules#Module_spider | &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
= Available compilers and interpreters =&lt;br /&gt;
&lt;br /&gt;
* For most compiled software, one should use the Intel compilers (&amp;lt;tt&amp;gt;icc&amp;lt;/tt&amp;gt; for C, &amp;lt;tt&amp;gt;icpc&amp;lt;/tt&amp;gt; for C++, and &amp;lt;tt&amp;gt;ifort&amp;lt;/tt&amp;gt; for Fortran). Loading an &amp;lt;tt&amp;gt;intel&amp;lt;/tt&amp;gt; module makes these available. &lt;br /&gt;
* The GNU compiler suite (&amp;lt;tt&amp;gt;gcc, g++, gfortran&amp;lt;/tt&amp;gt;) is also available, if you load one of the &amp;lt;tt&amp;gt;gcc&amp;lt;/tt&amp;gt; modules.&lt;br /&gt;
* Open source interpreted, interactive software is also available:&lt;br /&gt;
** [[Python]]&lt;br /&gt;
** [[R]]&lt;br /&gt;
** Julia&lt;br /&gt;
** Octave&lt;br /&gt;
  &lt;br /&gt;
Please visit the [[Python]] or [[R]] page for details on using these tools.  For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].&lt;br /&gt;
&lt;br /&gt;
= Using Commercial Software =&lt;br /&gt;
&lt;br /&gt;
May I use commercial software on Niagara?&lt;br /&gt;
* Possibly, but you have to bring your own license for it.  You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].&lt;br /&gt;
* SciNet and Compute Canada have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.&lt;br /&gt;
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.&lt;br /&gt;
* That means no [[MATLAB]], Gaussian, IDL, etc.&lt;br /&gt;
* Open source alternatives like Octave, [[Python]], and [[R]] are available.&lt;br /&gt;
* We are happy to help you to install commercial software for which you have a license.&lt;br /&gt;
* In some cases, if you have a license, you can use software in the Compute Canada stack.&lt;br /&gt;
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].&lt;br /&gt;
&lt;br /&gt;
= Compiling on Niagara: Example =&lt;br /&gt;
&lt;br /&gt;
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module list&lt;br /&gt;
Currently Loaded Modules:&lt;br /&gt;
  1) NiaEnv/2018a (S)&lt;br /&gt;
  Where:&lt;br /&gt;
   S:  Module is Sticky, requires --force to unload or purge&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ module load intel/2018.2&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ ls&lt;br /&gt;
appl.c module.c&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c&lt;br /&gt;
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c&lt;br /&gt;
nia-login07:~$ icc  -o appl module.o appl.o -mkl&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ ./appl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note:&lt;br /&gt;
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of for more generic x86_64 CPUs).&lt;br /&gt;
* Linking with the Intel Math Kernel Library (MKL) is easy when using the Intel compiler; it just requires the -mkl flag.&lt;br /&gt;
* If compiling with gcc, the optimization flags would be -O3 -march=native. For the way to link with the MKL, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].&lt;br /&gt;
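&lt;br /&gt;
To make the last note concrete, a minimal sketch of the same build with gcc (the gcc version is only an example; the MKL link step is omitted on purpose, since its flags should be taken from the link line advisor):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module load gcc/7.3.0&lt;br /&gt;
nia-login07:~$ gcc -c -O3 -march=native -o appl.o appl.c&lt;br /&gt;
nia-login07:~$ gcc -c -O3 -march=native -o module.o module.c&lt;br /&gt;
# link step: generate the MKL flags for gcc with the MKL link line advisor&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;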
&lt;br /&gt;
= Testing =&lt;br /&gt;
&lt;br /&gt;
You really should test your code before you submit it to the cluster to know if your code is correct and what kind of resources you need.&lt;br /&gt;
* Small test jobs can be run on the login nodes.  Rule of thumb: tests should run no more than a couple of minutes, taking at most about 1-2GB of memory, and use no more than a couple of cores.&lt;br /&gt;
* You can run the ddt debugger on the login nodes after &amp;lt;code&amp;gt;module load ddt&amp;lt;/code&amp;gt;.&lt;br /&gt;
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 nia-login07:~$ debugjob N&lt;br /&gt;
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 30 minutes.  Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command.  Note, however, that this may take some time to start, since it will be part of the regular queue, and will be run when the scheduler decides.&lt;br /&gt;
 nia-login07:~$ salloc --nodes N --time=M:00:00&lt;br /&gt;
where N is again the number of nodes, and M is the number of hours you wish the job to run.&lt;br /&gt;
If you need to use graphics while testing your code through salloc, e.g. when using a debugger such as DDT or DDD, please visit the [[Testing_With_Graphics | Testing with graphics]] page for the available options.&lt;br /&gt;
&lt;br /&gt;
= Submitting jobs =&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- == Progressive approach to run jobs on niagara == --&amp;gt;&lt;br /&gt;
&amp;lt;!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster.  Your jobs will run on some of Niagara's 1500 compute nodes.  When and where your job runs is determined by the scheduler.&lt;br /&gt;
&lt;br /&gt;
Niagara uses SLURM as its job scheduler.  More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 nia-login07:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called `default' allocation).&lt;br /&gt;
&lt;br /&gt;
Keep in mind:&lt;br /&gt;
* Scheduling is by node, so in multiples of 40 cores.&lt;br /&gt;
* If your group has an allocation, your job's maximum walltime is 24 hours.  If your group is without an allocation, your job's maximum walltime is 12 hours.&lt;br /&gt;
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler will deduce from the specifications of the number of tasks and the number of cpus per task what resources should be allocated.  On Niagara things are a bit different.&lt;br /&gt;
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.&lt;br /&gt;
** No other users are running anything on them.&lt;br /&gt;
** You can [[SSH]] into them to see how things are going.&lt;br /&gt;
* Whatever your requests to the scheduler, it will always be translated into a multiple of nodes allocated to your job.&lt;br /&gt;
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on the node.&lt;br /&gt;
* If you run serial jobs you must still use all 40 cores on the node.  Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.&lt;br /&gt;
* Since there are 40 cores per node, your job should use N x 40 cores. If you do not, we will contact you to help you optimize your workflow.  Or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
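&lt;br /&gt;
To make the last two points concrete, here is a minimal sketch of one way to keep all 40 cores busy with serial work inside a single job (the serial_code binary and its input/output names are placeholders; the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page has more robust approaches):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=40&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# start 40 independent serial runs, one per core, then wait for all of them to finish&lt;br /&gt;
for i in $(seq 1 40); do&lt;br /&gt;
   ./serial_code input.$i &amp;gt; output.$i &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;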
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued.  It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases.  You specify the partition with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; parameter to &amp;lt;tt&amp;gt;sbatch&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;salloc&amp;lt;/tt&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;tt&amp;gt;compute&amp;lt;/tt&amp;gt; partition, which is the most common case. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs without allocation (&amp;quot;default&amp;quot;)||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 12 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue.  The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== File Input/Output Tips ==&lt;br /&gt;
&lt;br /&gt;
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly.  Refer to the [[Data_Management | Data Management]] page for details about the file systems.&lt;br /&gt;
* Your files can be seen on all Niagara login and compute nodes.&lt;br /&gt;
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.&lt;br /&gt;
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.&lt;br /&gt;
* Accessing data sets which consist of many, small files leads to poor performance on GPFS.&lt;br /&gt;
* Avoid reading and writing lots of small amounts of data to disk.  Many small files on the system waste space and are slower to access, read and write.  If you must write many small files, use [[User_Ramdisk | ramdisk]].&lt;br /&gt;
* Write data out in a binary format. This is faster and takes less space.&lt;br /&gt;
* The [[Burst Buffer]] is another option for I/O heavy-jobs and for speeding up [[Checkpoints|checkpoints]].&lt;br /&gt;
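&lt;br /&gt;
For example, a minimal sketch of how a job could keep its many small files on the ramdisk and save only a single bundled file to $SCRATCH (this assumes the ramdisk is reachable at /dev/shm, as is typical on Linux, and myprog is a placeholder; see the [[User_Ramdisk | ramdisk]] page for the authoritative instructions):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# work in the node-local ramdisk, then bundle the small files into one file on $SCRATCH&lt;br /&gt;
mkdir -p /dev/shm/$USER/run&lt;br /&gt;
cd /dev/shm/$USER/run&lt;br /&gt;
$SLURM_SUBMIT_DIR/myprog      # writes its many small files here&lt;br /&gt;
tar cf $SCRATCH/run_output.tar .&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;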
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;#!/bin/bash &lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks=80&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load intel/2018.2&lt;br /&gt;
module load openmpi/3.1.0&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Submit this script with the command:&lt;br /&gt;
&lt;br /&gt;
    nia-login07:~$ sbatch mpi_job.sh&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;sbatch reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes (each of which will have 40 cores) on which to run a total of 80 tasks, for 1 hour.&amp;lt;br&amp;gt;(Instead of specifying &amp;lt;tt&amp;gt;--ntasks=80&amp;lt;/tt&amp;gt;, you can also ask for &amp;lt;tt&amp;gt;--ntasks-per-node=40&amp;lt;/tt&amp;gt;, which amounts to the same.)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the mpirun flag &amp;quot;--ppn&amp;quot; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it has found such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform mpirun or srun on how many processes to run).&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, just change --ntasks=80 to --ntasks=160, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
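&lt;br /&gt;
For example, the hyperthreaded OpenMPI variant of the script above would differ only in these lines (a sketch):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=160&lt;br /&gt;
# (the remaining #SBATCH lines and module loads are unchanged)&lt;br /&gt;
mpirun --bind-to none ./mpi_example&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;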
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load intel/2018.2&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;.&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Submit this script with the command:&lt;br /&gt;
&lt;br /&gt;
    nia-login07:~$ sbatch openmp_job.sh&lt;br /&gt;
&lt;br /&gt;
* First line indicates that this is a bash script.&lt;br /&gt;
* Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&lt;br /&gt;
* sbatch reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;) .&lt;br /&gt;
* In this case, SLURM looks for one node on which to run a single task using 40 cores, for 1 hour.&lt;br /&gt;
* Once it has found such a node, it runs the script:&lt;br /&gt;
** Changes to the submission directory;&lt;br /&gt;
** Loads modules;&lt;br /&gt;
** Sets an environment variable;&lt;br /&gt;
** Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&lt;br /&gt;
* To use hyperthreading, just change &amp;lt;code&amp;gt;--cpus-per-task=40&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--cpus-per-task=80&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Monitoring queued jobs ==&lt;br /&gt;
&lt;br /&gt;
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; (a caching version of squeue) to show the job queue (&amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; for just your jobs);&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; to get information on a specific job&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;(alternatively, &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt;, which is more verbose).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; to get an estimate for when a job will run; these tend not to be very accurate predictions.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel -i JOBID&amp;lt;/code&amp;gt; to cancel the job.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; to get information on your recent jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].  The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.&lt;br /&gt;
&lt;br /&gt;
= Visualization =&lt;br /&gt;
Information about how to use visualization tools on Niagara is available on the [[Visualization]] page.&lt;br /&gt;
&lt;br /&gt;
= Support =&lt;br /&gt;
&lt;br /&gt;
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]&lt;br /&gt;
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=MemP&amp;diff=1558</id>
		<title>MemP</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=MemP&amp;diff=1558"/>
		<updated>2018-09-25T19:52:29Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;memP is a lightweight, parallel heap profiling library developed at Lawrence Livermore National Laboratory (LLNL). It is primarily designed to identify the heap allocation that causes an MPI task to reach its memory-in-use high water mark (HWM).&lt;br /&gt;
&lt;br /&gt;
== memP Reports ==&lt;br /&gt;
&lt;br /&gt;
'''Summary Report:''' Generated from within MPI_Finalize, this report describes the memory HWM of each task over the run of the application. This can be used to determine which task allocates the most memory and how this compares to the memory of other tasks.&lt;br /&gt;
&lt;br /&gt;
'''Task Report:''' Based on specific criteria, a report can be generated for each task that provides a snapshot of the heap memory currently in use, including the amount allocated at specific call sites.&lt;br /&gt;
&lt;br /&gt;
==Using memP==&lt;br /&gt;
&lt;br /&gt;
Load the memP module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load memP&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Compile with the recommended BG/Q flags and link your application with the required libraries:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-Wl,-zmuldefs ${SCINET_LIB_MEMP}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mpixlc -g -Wl,-zmuldefs -o myprog myprog.c -L/usr/local/tools/memP/lib -lmemP&lt;br /&gt;
mpixlf77 -g -Wl,-zmuldefs -o myprog myprog.f -L/usr/local/tools/memP/lib -lmemP &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then run your MPI application as usual; you will see the memP header and trailer it sends to stdout, as well as the output file generated at the end of the run.&lt;br /&gt;
&lt;br /&gt;
==Output Options==&lt;br /&gt;
&lt;br /&gt;
See http://memp.sourceforge.net/ for full details.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=HPCTW&amp;diff=1557</id>
		<title>HPCTW</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=HPCTW&amp;diff=1557"/>
		<updated>2018-09-25T19:47:52Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Created page with &amp;quot;HPCTW is a set of libraries that may be linked to in order to gather MPI usage and hardware performance counter information for IBM BG/Q. There are three libraries to choose f...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;HPCTW is a set of libraries that may be linked to in order to gather MPI usage and hardware performance counter information for IBM BG/Q. There are three libraries to choose from depending on the statistics you want to gather.&lt;br /&gt;
&lt;br /&gt;
= Usage =&lt;br /&gt;
&lt;br /&gt;
Load the HPCTW module &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load hpctw&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This module sets three environment variables that can then be used in the final link line of your program's compilation,&lt;br /&gt;
depending on the statistics you want to gather.&lt;br /&gt;
&lt;br /&gt;
For MPI only, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-L ${SCINET_HPCTW_MPI} &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For MPI with hardware counters use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-L ${SCINET_HPCTW_MPIHPM} &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For MPI/OpenMP with hardware counters use &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-L ${SCINET_HPCTW_MPIHPM_SMP}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
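For illustration, a hypothetical compile-and-link line for the MPI-only case (any additional library flags beyond the -L path are omitted here; consult the HPCTW manual linked below):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mpixlc -g -o myprog myprog.c -L ${SCINET_HPCTW_MPI}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;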
Now run the program in the normal way and a series of text file outputs will be generated in the working directory.  For analysis and other options see the HPCTW Document below provided by the author. &lt;br /&gt;
&lt;br /&gt;
= Docs =&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/images/9/99/Hpct-bgq_0.pdf HPCTW Manual]&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&amp;diff=766</id>
		<title>Niagara Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&amp;diff=766"/>
		<updated>2018-07-17T14:20:48Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Loading Software Modules */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Niagara&lt;br /&gt;
|installed=Jan 2018&lt;br /&gt;
|operatingsystem= CentOS 7.4 &lt;br /&gt;
|loginnode= niagara.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1500 nodes (60,000 cores)&lt;br /&gt;
|rampernode=188 GiB / 202 GB  &lt;br /&gt;
|corespernode=40 (80 hyperthreads)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
The Niagara cluster is a large cluster of 1500 Lenovo SD350 servers each with 40 Intel &amp;quot;Skylake&amp;quot; cores at 2.4 GHz. &lt;br /&gt;
The peak performance of the cluster is 3.02 PFlops delivered / 4.6 PFlops theoretical.  It is the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018]. &lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 188 GiB / 202 GB RAM per node (at least 4 GiB/core for user jobs).  Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing.  The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 12 or 24 hours and favours large jobs.&lt;br /&gt;
&lt;br /&gt;
* See the [https://support.scinet.utoronto.ca/education/go.php/370/content.php/cid/1383/  &amp;quot;Intro to Niagara&amp;quot;] recording&lt;br /&gt;
&lt;br /&gt;
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.computecanada.ca/wiki/Niagara on this page].&lt;br /&gt;
&lt;br /&gt;
= Using Niagara: Logging in =&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet and belong to a group whose primary PI does not have a RAC allocation, you will first need to gain access to Niagara by following the old route of [https://www.scinethpc.ca/getting-a-scinet-account/ requesting a SciNet Consortium Account on the CCDB site].&lt;br /&gt;
&lt;br /&gt;
Otherwise, as with all SciNet and CC (Compute Canada) compute systems, access to Niagara is done via ssh (secure shell) only.&lt;br /&gt;
Just open a terminal window (e.g. with [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), then ssh into the Niagara login nodes with your CC credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
 $ ssh -Y MYCCUSERNAME@niagara.computecanada.ca&lt;br /&gt;
&lt;br /&gt;
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; is needed to open windows from the Niagara command-line onto your local X server.&lt;br /&gt;
* To run on Niagara's compute nodes, you must submit a batch job.&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure first to check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
= Locating your directories =&lt;br /&gt;
&lt;br /&gt;
== home and scratch ==&lt;br /&gt;
&lt;br /&gt;
You have a home and scratch directory on the system, whose locations will be given in the form&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;$HOME=/home/g/groupname/myccusername&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;$SCRATCH=/scratch/g/groupname/myccusername&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  nia-login07:~$ pwd&lt;br /&gt;
  /home/s/scinet/rzon&lt;br /&gt;
  nia-login07:~$ cd $SCRATCH&lt;br /&gt;
  nia-login07:rzon$ pwd&lt;br /&gt;
  /scratch/s/scinet/rzon&lt;br /&gt;
&lt;br /&gt;
NOTE: home is read-only on compute nodes.&lt;br /&gt;
&lt;br /&gt;
== project and archive==&lt;br /&gt;
&lt;br /&gt;
Users from groups with RAC storage allocation will also have a project and/or archive directory.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;$PROJECT=/project/g/groupname/myccusername&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;$ARCHIVE=/archive/g/groupname/myccusername&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Currently archive space is available only via [[HPSS|HPSS]]&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''&lt;br /&gt;
&lt;br /&gt;
Use the environment variables (HOME, SCRATCH, PROJECT, ARCHIVE) instead of the actual paths!  The paths may change in the future.&lt;br /&gt;
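&lt;br /&gt;
For example, a small job script fragment that survives such path changes (the directory and file names are placeholders):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# refer to your directories through the environment variables, never through hard-coded paths&lt;br /&gt;
mkdir -p $SCRATCH/myrun&lt;br /&gt;
cd $SCRATCH/myrun&lt;br /&gt;
cp $PROJECT/input/config.dat .&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;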
&lt;br /&gt;
= Data Management =&lt;br /&gt;
== Purpose of each file system ==&lt;br /&gt;
=== /home ===&lt;br /&gt;
/home is intended primarily for individual user files, common software or small datasets used by others in the same group, provided it does not exceed individual quotas. Otherwise you may consider /scratch or /project. /home is read-only on the compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== /scratch ===&lt;br /&gt;
/scratch is to be used primarily for temporary or transient files, for all the results of your computations and simulations, or any material that can be easily recreated or reacquired. You may use scratch as well for any intermediate step in your workflow, provided it does not induce too much IO or too many small files on this disk-based storage pool, otherwise you should consider burst buffer (/bb). Once you have your final results, those that you want to keep for the long term, you may migrate them to /project or /archive. /scratch is purged on a regular basis and has no backups.&lt;br /&gt;
&lt;br /&gt;
=== /project ===&lt;br /&gt;
/project is intended for common group software, large static datasets, or any material very costly to be reacquired or re-generated by the group. &amp;lt;font color=red&amp;gt;Material on /project is expected to be relatively immutable over time.&amp;lt;/font&amp;gt; Temporary or transient files should be kept on scratch, not project. High data turnover induces the consumption of a lot of tapes on the TSM backup system, long after this material has been deleted, due to backup retention policies and the extra versions kept of the same file. Users abusing the project file system and using it as scratch will be flagged and contacted. Note that on niagara /project is only available to groups with RAC allocation.&lt;br /&gt;
&lt;br /&gt;
=== /bb (burst buffer) ===&lt;br /&gt;
/bb is basically a very fast, very high performance alternative to /scratch, made of solid-state drives (SSD). You may request this resource instead, if you anticipate a lot of IO/IOPs (too much for scratch) or when you notice your job is not performing well running on scratch or project because of IO bottlenecks. Keep in mind, we can only offer 232TB for all niagara users at any given time. Once you get your results you may bundle/tarball them and move to scratch, project or archive. /bb is purged very frequently.&lt;br /&gt;
&lt;br /&gt;
=== /archive ===&lt;br /&gt;
/archive is a nearline storage pool, if you want to temporarily offload semi-active material from any of the above file systems. In practice users will offload/recall material as part of their regular workflow, or when they hit their quotas on scratch or project. That material can remain on HPSS for a few months to a few years. Note that on niagara /archive is only available to groups with RAC allocation.&lt;br /&gt;
&lt;br /&gt;
==Performance==&lt;br /&gt;
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.'''  For instance, you will find that reading data in from one 16MB file is enormously faster than from 400 40KB files. Such small files are also quite wasteful of space, as the blocksize for the scratch and project filesystems is 16MB. This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only does this make the code considerably less parallel, but chances are the file system will time out while waiting for your other processes, leading your program to crash mysteriously.&lt;br /&gt;
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously by different processes, or using a dedicated process for I/O to which all other processes send their data, and which subsequently writes this data to a single file.&lt;br /&gt;
&lt;br /&gt;
== Moving data ==&lt;br /&gt;
&lt;br /&gt;
=== using rsync/scp ===&lt;br /&gt;
'''''Move amounts less than 10GB through the login nodes.'''''&lt;br /&gt;
&lt;br /&gt;
* Only the Niagara login nodes are visible from outside SciNet.&lt;br /&gt;
* Use scp or rsync to niagara.scinet.utoronto.ca or niagara.computecanada.ca (no difference).&lt;br /&gt;
* This will time out for amounts larger than about 10GB.&lt;br /&gt;
&lt;br /&gt;
'''''Move amounts larger than 10GB through the datamover nodes.'''''&lt;br /&gt;
&lt;br /&gt;
* From a Niagara login node, ssh to &amp;lt;code&amp;gt;nia-datamover1&amp;lt;/code&amp;gt; or  &amp;lt;code&amp;gt;nia-datamover2&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Transfers must originate from this datamover.&lt;br /&gt;
* The other side (e.g. your machine) must be reachable from the outside.&lt;br /&gt;
* If you do this often, consider using [https://docs.computecanada.ca/wiki/Globus Globus], a web-based tool for data transfer.&lt;br /&gt;
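&lt;br /&gt;
To make the two cases above concrete, a minimal sketch (hostnames, usernames and paths are placeholders):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# small transfer (&amp;lt;10GB): go through a login node&lt;br /&gt;
rsync -av mydata/ MYCCUSERNAME@niagara.scinet.utoronto.ca:/scratch/g/groupname/myccusername/mydata/&lt;br /&gt;
&lt;br /&gt;
# large transfer (&amp;gt;10GB): log in to Niagara, hop to a datamover, and transfer from there&lt;br /&gt;
ssh MYCCUSERNAME@niagara.scinet.utoronto.ca&lt;br /&gt;
ssh nia-datamover1&lt;br /&gt;
rsync -av myhost.example.com:/path/to/mydata/ $SCRATCH/mydata/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;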
&lt;br /&gt;
'''''Moving data to HPSS/Archive/Nearline using the scheduler.'''''&lt;br /&gt;
&lt;br /&gt;
* [[HPSS|HPSS]] is a tape-based storage solution, and is SciNet's nearline a.k.a. archive facility.&lt;br /&gt;
* Storage space on HPSS is allocated through the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC allocation].&lt;br /&gt;
&lt;br /&gt;
=== using Globus ===&lt;br /&gt;
Please check the comprehensive documentation [https://docs.computecanada.ca/wiki/Globus here].&lt;br /&gt;
&lt;br /&gt;
The Niagara endpoint is &amp;quot;computecanada#niagara&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Storage and quotas ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB per user (dynamic per group)&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;6&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|up to 4 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50TB&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|up to 11 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|125TB&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|up to 28 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|250TB&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|up to 60 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|400TB&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|above 60 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|500TB&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|-&lt;br /&gt;
| $BBUFFER&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 10 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| very short&lt;br /&gt;
| no&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[https://docs.scinet.utoronto.ca/images/9/9a/Inode_vs._Space_quota_-_v2x.pdf Inode vs. Space quota (PROJECT and SCRATCH)]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[https://docs.scinet.utoronto.ca/images/0/0e/Scratch-quota.pdf dynamic quota per group (SCRATCH)]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Compute nodes do not have local storage.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Archive space is on [[HPSS|HPSS]].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Backup means a recent snapshot, not an archive of every version that ever existed.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;$BBUFFER&amp;lt;/code&amp;gt; stands for the [[Burst Buffer]], a faster parallel storage tier for temporary data.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==File/Ownership Management (ACL)==&lt;br /&gt;
* By default, at SciNet, users within the same group already have read permission to each other's files (not write)&lt;br /&gt;
* You may use access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories. You may also let users in other groups or whole other groups access (read, execute) your files using this same mechanism. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===Using  setfacl/getfacl===&lt;br /&gt;
* To allow [supervisor] to manage files in /project/g/group/[owner] using '''setfacl''' and '''getfacl''' commands, follow the 3-steps below as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/g/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)&lt;br /&gt;
&lt;br /&gt;
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/g/group/[owner]&lt;br /&gt;
   (but will also inherit [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/g/group/[owner]&lt;br /&gt;
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/getfacl /project/g/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/setfacl -b /project/g/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
PS: on the datamovers getfacl, setfacl and chacl will be on your path&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using [http://linux.die.net/man/1/setfacl &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;] or [http://linux.die.net/man/1/getfacl &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
===Using mmputacl/mmgetacl===&lt;br /&gt;
* You may use gpfs' native '''mmputacl''' and '''mmgetacl''' commands. The advantages are that you can set &amp;quot;control&amp;quot; permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACL] are supported. You will need first to create a /tmp/supervisor.acl file with the following contents:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:[owner]:rwxc&lt;br /&gt;
user:[supervisor]:rwxc&lt;br /&gt;
group:[othergroup]:r-xc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then issue the following 2 commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ mmputacl -i /tmp/supervisor.acl /project/g/group/[owner]&lt;br /&gt;
2) $ mmputacl -d -i /tmp/supervisor.acl /project/g/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default as well as &lt;br /&gt;
   [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ mmgetacl /project/g/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ mmdelacl -d /project/g/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
   $ mmeditacl /project/g/group/[owner]&lt;br /&gt;
   (to create or change a GPFS access control list)&lt;br /&gt;
   (for this command to work set the EDITOR environment variable: export EDITOR=/usr/bin/vi)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTES:&lt;br /&gt;
* There is no option to recursively add or remove ACL attributes using a gpfs built-in command to existing files. You'll need to use the -i option as above for each file or directory individually. [[Recursive_ACL_script | Here is a sample bash script you may use for that purpose]]&lt;br /&gt;
&lt;br /&gt;
* mmputacl will not overwrite the original linux group permissions for a directory when copied to another directory already with ACLs, hence the &amp;quot;#effective:r-x&amp;quot; note you may see from time to time with mmgetacl. If you want to give rwx permissions to everyone in your group you should simply rely on the plain unix 'chmod g+rwx' command. You may do that before or after copying the original material to another folder with the ACLs.&lt;br /&gt;
&lt;br /&gt;
* In the case of PROJECT, your group's supervisor will need to set proper ACL to the /project/G/GROUP level in order to let users from other groups access your files.&lt;br /&gt;
&lt;br /&gt;
* ACL's won't let you give away permissions to files or directories that do not belong to you.&lt;br /&gt;
&lt;br /&gt;
* We highly recommend that you never give write permission to other users on the top level of your home directory (/home/G/GROUP/[owner]), since that would seriously compromise your privacy, in addition to disabling ssh key authentication, among other things. If necessary, make specific sub-directories under your home directory so that other users can manipulate/access files from those.&lt;br /&gt;
&lt;br /&gt;
For more information on using [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmputacl&amp;lt;/tt&amp;gt;] or [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmgetacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
===Recursive ACL script ===&lt;br /&gt;
You may use/adapt '''[[Recursive_ACL_script| this sample bash script]]''' to recursively add or remove ACL attributes using gpfs built-in commands&lt;br /&gt;
&lt;br /&gt;
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)&lt;br /&gt;
&lt;br /&gt;
==Scratch Disk Purging Policy==&lt;br /&gt;
&lt;br /&gt;
In order to ensure that there is always significant space available for running jobs, '''we automatically delete files in /scratch that have not been accessed or modified for more than 2 months by the actual deletion day on the 15th of each month'''. Note that we recently changed the cut-off reference to ''MostRecentOf(atime,ctime)''. This policy is subject to revision depending on its effectiveness. More details about the purging process and how users can check if their files will be deleted follow. If you have files scheduled for deletion you should move them to more permanent locations such as your departmental server or your /project space or into HPSS (for PIs who have either been allocated storage space by the RAC on project or HPSS).&lt;br /&gt;
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. You also get a notification on the shell every time you log in to Niagara. Furthermore, at/or about the '''12th''' of each month a 2nd scan produces a more current assessment and another email notification is sent. This way users can double check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion then they will be listed in a file in /scratch/t/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check if they have files scheduled for deletion they can issue the following command on a system which mounts /scratch (e.g. a scinet login node): '''ls -1 /scratch/t/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc, has 9,560 files scheduled for deletion and they take up 1.0TB of space:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 [xxyz@nia-login03 ~]$ ls -1 /scratch/t/todelete/current |grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 17 11:46 3110001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/t/todelete/current/3110001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
&lt;br /&gt;
Similarly, you can also check all other users in your group by using the ls command with grep on your group. For example: '''ls -1 /scratch/t/todelete/current |grep abc'''. That will list all other users in the same group that xxyz is part of who have files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. A way for you to get immediate feedback is to use the ''''ls -lu'''' command on the file to verify the atime and ''''ls -lc'''' for the ctime. If the file atime/ctime has been updated in the meantime, come the purging date on the 15th it will no longer be deleted.&lt;br /&gt;
&lt;br /&gt;
==How much Disk Space Do I have left?==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/niagara/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes and datamovers, provides information in a number of ways on the home, scratch, project and archive file systems. For instance, it can report how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?] [-a] [-u &amp;lt;user&amp;gt;]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Did you know that you can check which of your directories have more than 1000 files with the &amp;lt;tt&amp;gt;'''/scinet/niagara/bin/topUserDirOver1000list'''&amp;lt;/tt&amp;gt; command and which have more than 1GB of material with the &amp;lt;tt&amp;gt;'''/scinet/niagara/bin/topUserDirOver1GBlist'''&amp;lt;/tt&amp;gt; command?&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
* information on usage and quota is only updated every 3 hours!&lt;br /&gt;
&lt;br /&gt;
== I/O Tips ==&lt;br /&gt;
&lt;br /&gt;
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.&lt;br /&gt;
* Your files can be seen on all Niagara login and compute nodes.&lt;br /&gt;
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.&lt;br /&gt;
* But accessing data sets which consist of many, small files leads to poor performance.&lt;br /&gt;
* Avoid reading and writing lots of small amounts of data to disk.&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Many small files on the system would waste space and would be slower to access, read and write.&lt;br /&gt;
* Write data out in binary. Faster and takes less space.&lt;br /&gt;
* The [[Burst Buffer]] is better for i/o heavy jobs and to speed up checkpoints.&lt;br /&gt;
&lt;br /&gt;
= Loading Software Modules =&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [https://docs.computecanada.ca/wiki/Utiliser_des_modules/en using module commands]. These modules set environment variables (PATH, etc.). This allows multiple, conflicting versions of a given package to be available. &amp;lt;tt&amp;gt;module spider&amp;lt;/tt&amp;gt; shows the available software.&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module spider&lt;br /&gt;
---------------------------------------------------&lt;br /&gt;
The following is a list of the modules currently av&lt;br /&gt;
---------------------------------------------------&lt;br /&gt;
  CCEnv: CCEnv&lt;br /&gt;
&lt;br /&gt;
  NiaEnv: NiaEnv/2018a&lt;br /&gt;
&lt;br /&gt;
  anaconda2: anaconda2/5.1.0&lt;br /&gt;
&lt;br /&gt;
  anaconda3: anaconda3/5.1.0&lt;br /&gt;
&lt;br /&gt;
  autotools: autotools/2017&lt;br /&gt;
    autoconf, automake, and libtool &lt;br /&gt;
&lt;br /&gt;
  boost: boost/1.66.0&lt;br /&gt;
&lt;br /&gt;
  cfitsio: cfitsio/3.430&lt;br /&gt;
&lt;br /&gt;
  cmake: cmake/3.10.2 cmake/3.10.3&lt;br /&gt;
&lt;br /&gt;
  ...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;module load &amp;amp;lt;module-name&amp;amp;gt;&amp;lt;/code&amp;gt;: use particular software&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: remove currently loaded modules&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;amp;lt;module-name&amp;amp;gt;&amp;lt;/code&amp;gt;): list available software packages&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list loadable software packages&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On Niagara, there are really two software stacks:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;A [https://docs.computecanada.ca/wiki/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;module load NiaEnv&amp;lt;/source&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The same  [https://docs.computecanada.ca/wiki/Modules software stack available on Compute Canada's General Purpose clusters] [https://docs.computecanada.ca/wiki/Graham Graham] and [https://docs.computecanada.ca/wiki/Cedar Cedar], compiled (for now) for a previous generation of CPUs:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;module load CCEnv&amp;lt;/source&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;If you want the same default modules loaded as on Cedar and Graham, then afterwards also &amp;lt;code&amp;gt;module load StdEnv&amp;lt;/code&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note: the &amp;lt;code&amp;gt;*Env&amp;lt;/code&amp;gt; modules are '''''sticky'''''; remove them by &amp;lt;code&amp;gt;--force&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for loading software ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;We advise '''''against''''' loading modules in your .bashrc.&amp;lt;br&amp;gt; This could lead to very confusing behaviour under certain circumstances.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The default .bashrc and .bash_profile files on Niagara can be found [[bashrc guidelines|here]].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Instead, load modules by hand when needed, or by sourcing a separate script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Load run-specific modules inside your job submission script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Short names give default versions; e.g. &amp;lt;code&amp;gt;intel&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;intel/2018.2&amp;lt;/code&amp;gt;. It is usually better to be explicit about the versions, for future reproducibility.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Handy abbreviations:&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;sh&amp;quot;&amp;gt; &lt;br /&gt;
  ml → module list&lt;br /&gt;
  ml NAME → module load NAME  # if NAME is an existing module&lt;br /&gt;
  ml X → module X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Modules sometimes require other modules to be loaded first.&amp;lt;br /&amp;gt;&lt;br /&gt;
Solve these dependencies by using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Module spider ==&lt;br /&gt;
&lt;br /&gt;
Oddly named, the module subcommand spider is the search-and-advice facility for modules.&lt;br /&gt;
&lt;br /&gt;
Suppose one wanted to load the openmpi module. Upon trying to load the module, one may get the following message:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module load openmpi&lt;br /&gt;
Lmod has detected the error:  These module(s) exist but cannot be loaded as requested: &amp;quot;openmpi&amp;quot;&lt;br /&gt;
   Try: &amp;quot;module spider openmpi&amp;quot; to see how to load the module(s).&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
So while that fails, following the advice that the command outputs, the next command would be:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module spider openmpi&lt;br /&gt;
------------------------------------------------------------------------------------------------------&lt;br /&gt;
  openmpi:&lt;br /&gt;
------------------------------------------------------------------------------------------------------&lt;br /&gt;
     Versions:&lt;br /&gt;
        openmpi/2.1.3&lt;br /&gt;
        openmpi/3.0.1&lt;br /&gt;
        openmpi/3.1.0&lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------------------------------------&lt;br /&gt;
  For detailed information about a specific &amp;quot;openmpi&amp;quot; module (including how to load the modules) use&lt;br /&gt;
  the module's full name.&lt;br /&gt;
  For example:&lt;br /&gt;
&lt;br /&gt;
     $ module spider openmpi/3.1.0&lt;br /&gt;
------------------------------------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
So this gives just more detailed suggestions on using the spider command. Following the advice again, one would type:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module spider openmpi/3.1.0&lt;br /&gt;
------------------------------------------------------------------------------------------------------&lt;br /&gt;
  openmpi: openmpi/3.1.0&lt;br /&gt;
------------------------------------------------------------------------------------------------------&lt;br /&gt;
    You will need to load all module(s) on any one of the lines below before the &amp;quot;openmpi/3.1.0&amp;quot;&lt;br /&gt;
    module is available to load.&lt;br /&gt;
&lt;br /&gt;
      NiaEnv/2018a  gcc/7.3.0&lt;br /&gt;
      NiaEnv/2018a  intel/2018.2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
These are concrete instructions on how to load this particular openmpi module. Following these leads to a successful loading of the module.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module load NiaEnv/2018a  intel/2018.2   # note: NiaEnv is usually already loaded&lt;br /&gt;
nia-login07:~$ module load openmpi/3.1.0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module list&lt;br /&gt;
Currently Loaded Modules:&lt;br /&gt;
  1) NiaEnv/2018a (S)   2) intel/2018.2   3) openmpi/3.1.0&lt;br /&gt;
&lt;br /&gt;
  Where:&lt;br /&gt;
   S:  Module is Sticky, requires --force to unload or purge&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Python Virtual Environments ==&lt;br /&gt;
&lt;br /&gt;
Virtual environments (virtualenvs for short) are the standard Python mechanism for creating isolated Python environments. This is useful when certain packages, or certain versions of packages, are not available in the default python environment. &lt;br /&gt;
&lt;br /&gt;
VirtualEnv can be used either with the default python modules or the anaconda ones. Please check the link below for more information:&lt;br /&gt;
https://docs.scinet.utoronto.ca/index.php/PythonVirtualEnv&lt;br /&gt;
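&lt;br /&gt;
As a minimal sketch (the module name, environment path and package name below are placeholders; check &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; and the page linked above for what is actually available), creating and using a virtualenv typically looks like this:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module load python                  # pick a python or anaconda module listed by module avail&lt;br /&gt;
nia-login07:~$ virtualenv ~/myenv                  # create the isolated environment&lt;br /&gt;
nia-login07:~$ source ~/myenv/bin/activate         # activate it&lt;br /&gt;
(myenv) nia-login07:~$ pip install somepackage     # somepackage is a placeholder&lt;br /&gt;
(myenv) nia-login07:~$ deactivate                  # leave the environment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;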
&lt;br /&gt;
= Running Commercial Software =&lt;br /&gt;
&lt;br /&gt;
* Possibly, but you have to bring your own license for it.&lt;br /&gt;
* SciNet and Compute Canada serve an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.&lt;br /&gt;
* Thus, the only commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.&lt;br /&gt;
* That means no Matlab, Gaussian, IDL, etc.&lt;br /&gt;
* Open-source alternatives such as Octave, Python, and R are available.&lt;br /&gt;
* We are happy to help you to install commercial software for which you have a license.&lt;br /&gt;
* In some cases, if you have a license, you can use software in the Compute Canada stack.&lt;br /&gt;
&lt;br /&gt;
= Compiling on Niagara: Example =&lt;br /&gt;
&lt;br /&gt;
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the GNU Scientific Library (GSL). This is an example of how this would be done:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module list&lt;br /&gt;
Currently Loaded Modules:&lt;br /&gt;
  1) NiaEnv/2018a (S)&lt;br /&gt;
  Where:&lt;br /&gt;
   S:  Module is Sticky, requires --force to unload or purge&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ module load intel/2018.2 gsl/2.4&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ ls&lt;br /&gt;
appl.c module.c&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c&lt;br /&gt;
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c&lt;br /&gt;
nia-login07:~$ icc  -o appl module.o appl.o -lgsl -mkl&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ ./appl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note:&lt;br /&gt;
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of targeting more generic x86_64 CPUs).&lt;br /&gt;
* The GSL requires a CBLAS implementation, one of which is contained in the Intel Math Kernel Library (MKL). Linking with this library is easy when using the Intel compiler; it just requires the -mkl flag.&lt;br /&gt;
* If compiling with gcc, the corresponding optimization flags would be -O3 -march=native. To find out how to link with the MKL, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor]; a gcc-based sketch of the same build follows below.&lt;br /&gt;
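&lt;br /&gt;
A hedged sketch of the same build with gcc, linking the GSL's own CBLAS instead of MKL (the module versions are examples only; check &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; for what is installed):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module load gcc/7.3.0 gsl/2.4       # example versions&lt;br /&gt;
nia-login07:~$ gcc -c -O3 -march=native -o appl.o appl.c&lt;br /&gt;
nia-login07:~$ gcc -c -O3 -march=native -o module.o module.c&lt;br /&gt;
nia-login07:~$ gcc -o appl module.o appl.o -lgsl -lgslcblas -lm   # link GSL's bundled CBLAS instead of MKL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;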
&lt;br /&gt;
= Testing =&lt;br /&gt;
&lt;br /&gt;
You should test your code before you submit it to the cluster, both to check that it is correct and to find out what resources it needs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Small test jobs can be run on the login nodes.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Rule of thumb: a couple of minutes, at most about 1-2 GB of memory, and a couple of cores.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;You can run the ddt debugger on the login nodes after &amp;lt;code&amp;gt;module load ddt&amp;lt;/code&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For short tests that do not fit on a login node, or for which you need a dedicated node, request an&amp;lt;br /&amp;gt;&lt;br /&gt;
interactive debug job with the salloc command:&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 nia-login07:~$ salloc -pdebug --nodes N --time=1:00:00&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;where N  is the number of nodes. The duration of your interactive debug session can be at most one hour, can use at most 4 nodes, and each user can only have one such session at a time.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Alternatively, on Niagara, you can use the command&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 nia-login07:~$ debugjob N&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 30 minutes.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue.  Note, however, that it may take some time to start, since it will be part of the regular queue and will be run when the scheduler decides.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 nia-login07:~$ salloc --nodes N --time=M:00:00&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;where N is again the number of nodes, and M is the number of hours you wish the job to run.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Testing with Graphics: X-forwarding ==&lt;br /&gt;
If you need to use graphics while testing your code, e.g. when using a debugger such as DDT or DDD, you have the following options:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt; You can use the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command which automatically provides X-forwarding support.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ ssh  niagara.scinet.utoronto.ca -X&lt;br /&gt;
&lt;br /&gt;
USER@nia-login07:~$ debugjob&lt;br /&gt;
debugjob: Requesting 1 nodes for 60 minutes&lt;br /&gt;
xalloc: Granted job allocation 189857&lt;br /&gt;
xalloc: Waiting for resource configuration&lt;br /&gt;
xalloc: Nodes nia0030 are ready for job&lt;br /&gt;
&lt;br /&gt;
[USER@nia1265 ~]$&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt; If &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; is not suitable for your case due to the limitations either on time or resources (see above [[#Testing]]), then you have to follow these steps:&lt;br /&gt;
&lt;br /&gt;
You will need two terminals in order to achieve this:&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In the 1st terminal&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt; ssh to &amp;lt;code&amp;gt;niagara.scinet.utoronto.ca&amp;lt;/code&amp;gt; and issue your &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command&lt;br /&gt;
&amp;lt;li&amp;gt; wait until your resources are allocated and you are assigned the nodes&lt;br /&gt;
&amp;lt;li&amp;gt; take note of the node you are logged in to, i.e. the head node, let's say &amp;lt;code&amp;gt;niaWXYZ&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ ssh  niagara.scinet.utoronto.ca&lt;br /&gt;
USER@nia-login07:~$ salloc --nodes 5 --time=2:00:00&lt;br /&gt;
&lt;br /&gt;
salloc: Granted job allocation 141862&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes nia1265 are ready for job&lt;br /&gt;
&lt;br /&gt;
[USER@nia1265 ~]$&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt; In the 2nd terminal:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt; ssh into &amp;lt;code&amp;gt;niagara.scinet.utoronto.ca&amp;lt;/code&amp;gt; now using the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; flag in the ssh command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt; after that, &amp;lt;code&amp;gt;ssh -X niaWXYZ&amp;lt;/code&amp;gt;, i.e. ssh with the '-X' flag into the head node of the job&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt; on &amp;lt;code&amp;gt;niaWXYZ&amp;lt;/code&amp;gt; you should be able to use graphics, which will be redirected by X-forwarding to your local terminal&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
ssh niagara.scinet.utoronto.ca -X&lt;br /&gt;
USER@nia-login07:~$ ssh -X nia1265&lt;br /&gt;
[USER@nia1265 ~]$ xclock   ## just an example to test the graphics, a clock should pop up, close it to exit&lt;br /&gt;
[USER@nia1265 ~]$ module load ddt  ## load corresponding modules, eg. for DDT&lt;br /&gt;
[USER@nia1265 ~]$ ddt  ## launch DDT, the GUI should appear in your screen&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Observations:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt; If you are using ssh from a Windows machine, you need to have an X-server; a good option is MobaXterm, which already includes an X-server.&lt;br /&gt;
&amp;lt;li&amp;gt; If you are on Mac OS, substitute -X with -Y&lt;br /&gt;
&amp;lt;li&amp;gt; Instead of using two terminals, you could just use &amp;lt;code&amp;gt;screen&amp;lt;/code&amp;gt; to request the resources and then detach the session and ssh into the head node directly.&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Submitting jobs =&lt;br /&gt;
&lt;br /&gt;
Niagara uses SLURM as its job scheduler.&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 nia-login07:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under their group's RRG allocation, or, if the group has none, under a RAS allocation (previously called `default' allocation).&lt;br /&gt;
&lt;br /&gt;
Keep in mind:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Scheduling is by node, so in multiples of 40 cores.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For users with an allocation, the maximum walltime is 24 hours.  For those without an allocation, the maximum walltime is 12 hours.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Jobs must write to your scratch or project directory (home is read-only on compute nodes).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Compute nodes have no internet access.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Download data you need beforehand on a login node.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads  ==&lt;br /&gt;
&lt;br /&gt;
SLURM, the job scheduler used on Niagara, has a somewhat different way of referring to things like MPI processes and threads. The SLURM nomenclature is reflected in the names of the scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!term &lt;br /&gt;
!meaning &lt;br /&gt;
!SLURM term&lt;br /&gt;
!related scheduler options &lt;br /&gt;
|-&lt;br /&gt;
|job&lt;br /&gt;
|scheduled piece of work for which specific resources were requested.&lt;br /&gt;
|job&lt;br /&gt;
|&amp;lt;tt&amp;gt;sbatch, salloc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|node&lt;br /&gt;
|basic computing component with several cores (40 for Niagara) that share memory  &lt;br /&gt;
|node&lt;br /&gt;
|&amp;lt;tt&amp;gt;--nodes -N&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|mpi process&lt;br /&gt;
|one of a group of running programs using Message Passing Interface for parallel computing&lt;br /&gt;
|task&lt;br /&gt;
|&amp;lt;tt&amp;gt;--ntasks -n --ntasks-per-node&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|core ''or'' physical cpu&lt;br /&gt;
|A fully functional independent physical execution unit.&lt;br /&gt;
| -   &lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
|logical cpu&lt;br /&gt;
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.&lt;br /&gt;
|cpu&lt;br /&gt;
|&amp;lt;tt&amp;gt;--ncpus-per-task&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|thread&lt;br /&gt;
|one of possibly multiple simultaneous execution paths within a program, which can share memory.&lt;br /&gt;
| -&lt;br /&gt;
| &amp;lt;tt&amp;gt;--ncpus-per-task&amp;lt;/tt&amp;gt; '''and''' &amp;lt;tt&amp;gt;OMP_NUM_THREADS&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|hyperthread&lt;br /&gt;
|a thread run in a collection of threads that is larger than the number of physical cores.&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;On many systems that use SLURM, the scheduler will deduce what resources should be allocated from the specification of the number of tasks and the number of cpus per task.  On Niagara, this is a bit different.&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The nodes that your jobs run on are exclusively yours.&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;No other users are running anything on them.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;You can ssh into them to see how things are going.&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Whatever your requests to the scheduler, it will always be translated into a multiple of nodes allocated to your job.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will contact you to help you optimize your workflow.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Hyperthreading: Logical CPUs vs. cores ==&lt;br /&gt;
&lt;br /&gt;
Hyperthreading, a technology that leverages more of the physical hardware by pretending there are twice as many logical cores as real ones, is enabled on Niagara.&lt;br /&gt;
So the OS and scheduler see 80 logical cpus.&lt;br /&gt;
&lt;br /&gt;
Using 80 logical cpus vs. 40 real cores typically gives about a 5-10% speedup (Your Mileage May Vary).&lt;br /&gt;
&lt;br /&gt;
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Ask for a certain number of nodes N for your jobs.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;You know that you get 40xN cores, so you will use (at least) a total of 40xN mpi processes or threads. (mpirun, srun, and the OS will automatically spread these over the real cores)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;But you should also test if running 80xN mpi processes or threads gives you any speedup.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Regardless, your usage will be counted as 40xNx(walltime in years).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued.  It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases.  You specify the partition with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; parameter to &amp;lt;tt&amp;gt;sbatch&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;salloc&amp;lt;/tt&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;tt&amp;gt;compute&amp;lt;/tt&amp;gt; partition, which is the most common case. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Running jobs&lt;br /&gt;
!Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs without allocation (&amp;quot;default&amp;quot;)||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 12 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Within these limits, jobs will still have to wait in the queue.  The waiting time depends on many factors such as the allocation amount, how much allocation was used in the recent past, the number of nodes and the walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== SLURM Accounts ==&lt;br /&gt;
&lt;br /&gt;
To be able to prioritise jobs based on groups and allocations, the SLURM scheduler uses the concept of ''accounts''.  Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with &amp;lt;tt&amp;gt;rrg-&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;rpp-&amp;lt;/tt&amp;gt;.  SLURM assigns a 'fairshare' priority to these accounts based on the size of the award in core-years.  Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with &amp;lt;tt&amp;gt;def-&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Niagara, most users will only ever use one account, and those users do not need to specify the account to SLURM.  However, users that are part of collaborations may be able to use multiple accounts, i.e., that of their sponsor and that of their collaborator, but this means that they need to select the right account when running jobs. &lt;br /&gt;
&lt;br /&gt;
To select the account, just add &lt;br /&gt;
&lt;br /&gt;
    #SBATCH -A [account]&lt;br /&gt;
&lt;br /&gt;
to the job scripts, or use the &amp;lt;tt&amp;gt;-A [account]&amp;lt;/tt&amp;gt; to &amp;lt;tt&amp;gt;salloc&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
To see which accounts you have access to, or what their names are, use the command&lt;br /&gt;
&lt;br /&gt;
    sshare -U&lt;br /&gt;
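&lt;br /&gt;
For illustration, a sketch of a job script header that selects an account explicitly (the account name &amp;lt;tt&amp;gt;rrg-somegroup&amp;lt;/tt&amp;gt; is made up; use one of the names reported by &amp;lt;tt&amp;gt;sshare -U&amp;lt;/tt&amp;gt;):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -A rrg-somegroup        # made-up account name, replace with your own&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
&lt;br /&gt;
./my_application                # placeholder for the actual program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;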
&lt;br /&gt;
== Passing Variables to Job's submission scripts ==&lt;br /&gt;
It is possible to pass values through environment variables into your SLURM submission scripts.&lt;br /&gt;
To do so with variables that are already defined in your shell, just add the following directive to the submission script,&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --export=ALL&lt;br /&gt;
&lt;br /&gt;
and you will have access to any predefined environment variable.&lt;br /&gt;
&lt;br /&gt;
A better way is to specify explicitly which variables you want to pass to the submission script,&lt;br /&gt;
&lt;br /&gt;
 sbatch --export=i=15,j='test' jobscript.sbatch&lt;br /&gt;
&lt;br /&gt;
You can even set the job name and output files using environment variables, eg.&lt;br /&gt;
&lt;br /&gt;
 i=&amp;quot;simulation&amp;quot;&lt;br /&gt;
 j=14&lt;br /&gt;
 sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch&lt;br /&gt;
&lt;br /&gt;
(The latter only works on the command line; you cannot use environment variables in &amp;lt;tt&amp;gt;#SBATCH&amp;lt;/tt&amp;gt; lines in the job script.)&lt;br /&gt;
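&lt;br /&gt;
For illustration, a minimal sketch of a &amp;lt;tt&amp;gt;jobscript.sbatch&amp;lt;/tt&amp;gt; that uses the variables &amp;lt;tt&amp;gt;i&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;j&amp;lt;/tt&amp;gt; passed in as above:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=0:15:00&lt;br /&gt;
&lt;br /&gt;
echo i was set to: $i     # i and j arrive as environment variables&lt;br /&gt;
echo j was set to: $j&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;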
&lt;br /&gt;
'''Command line arguments:'''&lt;br /&gt;
&lt;br /&gt;
Command line arguments can also be used, in the same way as command line arguments for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. In fact, SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:&lt;br /&gt;
&lt;br /&gt;
 sbatch  -p debug  jobscript.sbatch  FirstArgument SecondArgument ...&lt;br /&gt;
&lt;br /&gt;
In this example, &amp;lt;tt&amp;gt;-p debug&amp;lt;/tt&amp;gt; is interpreted by SLURM, while in your submission script you can access &amp;lt;tt&amp;gt;FirstArgument&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;SecondArgument&amp;lt;/tt&amp;gt;, etc., by referring to &amp;lt;code&amp;gt;$1, $2, ...&amp;lt;/code&amp;gt;.&lt;br /&gt;
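&lt;br /&gt;
For example, inside the submission script one could simply write:&lt;br /&gt;
&lt;br /&gt;
 echo First argument:  $1    # FirstArgument in the example above&lt;br /&gt;
 echo Second argument: $2    # SecondArgument in the example above&lt;br /&gt;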
&lt;br /&gt;
== Email Notification ==&lt;br /&gt;
Email notification works, but you need to add the email address and type of notification you may want to receive in your submission script, eg.&lt;br /&gt;
&lt;br /&gt;
    #SBATCH --mail-user=YOUR.email.ADDRESS&lt;br /&gt;
    #SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
The sbatch man page (type &amp;lt;tt&amp;gt;man sbatch&amp;lt;/tt&amp;gt; on Niagara) explains all possible mail-types.&lt;br /&gt;
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;#!/bin/bash &lt;br /&gt;
#SBATCH --nodes=8&lt;br /&gt;
#SBATCH --ntasks=320&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load intel/2018.2&lt;br /&gt;
module load openmpi/3.1.0&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Submit this script with the command:&lt;br /&gt;
&lt;br /&gt;
    nia-login07:~$ sbatch mpi_job.sh&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;First line indicates that this is a bash script.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;sbatch reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Note that the mpirun flag &amp;quot;--ppn&amp;quot; (processors per node) is ignored.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Once it has found such nodes, it runs the script:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, just change --ntasks=320 to --ntasks=640, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI); a sketch of this variant follows below the list.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
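&lt;br /&gt;
A sketch of that hyperthreaded variant (only the changed lines are shown; the rest of the script stays the same as above):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=8&lt;br /&gt;
#SBATCH --ntasks=640                    # 80 logical cpus per node instead of 40&lt;br /&gt;
&lt;br /&gt;
mpirun --bind-to none ./mpi_example     # --bind-to none is needed for OpenMPI only&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;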
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load intel/2018.2&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;.&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Submit this script with the command:&lt;br /&gt;
&lt;br /&gt;
    nia-login07:~$ sbatch openmp_job.sh&lt;br /&gt;
&lt;br /&gt;
* First line indicates that this is a bash script.&lt;br /&gt;
* Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&lt;br /&gt;
* sbatch reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;) .&lt;br /&gt;
* In this case, SLURM looks for one node on which to run one task using 40 cores, for 1 hour.&lt;br /&gt;
* Once it has found such a node, it runs the script:&lt;br /&gt;
** Changes to the submission directory;&lt;br /&gt;
** Loads modules;&lt;br /&gt;
** Sets an environment variable;&lt;br /&gt;
** Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&lt;br /&gt;
* To use hyperthreading, just change &amp;lt;code&amp;gt;--cpus-per-task=40&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--cpus-per-task=80&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Monitoring queued jobs =&lt;br /&gt;
&lt;br /&gt;
Once the job is in the queue, there are some commands you can use to monitor its progress (a few example invocations are sketched after the list below).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; (a caching version of squeue) to show the job queue (&amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; for just your jobs);&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;qsum&amp;lt;/code&amp;gt; shows a summary of the queue by user.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; to get information on a specific job&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;(alternatively, &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt;, which is more verbose).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; to get an estimate for when a job will run; these tend not to be very accurate predictions.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel -i JOBID&amp;lt;/code&amp;gt; to cancel the job.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sinfo -pcompute&amp;lt;/code&amp;gt; to look at available nodes.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; to get information on your recent jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;More utilities like those that were available on the GPC are under development.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
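&lt;br /&gt;
For illustration, a few typical invocations (the job id 123456 is a placeholder):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ squeue -u $USER                 # your jobs in the queue&lt;br /&gt;
nia-login07:~$ scontrol show job 123456        # verbose information on one job&lt;br /&gt;
nia-login07:~$ sacct -j 123456 --format=JobID,JobName,State,Elapsed   # accounting info after the job ran&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;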
&lt;br /&gt;
= Visualization =&lt;br /&gt;
Information about how to use visualization tools on Niagara is available on [[Visualization]] page.&lt;br /&gt;
&lt;br /&gt;
= Further information =&lt;br /&gt;
&lt;br /&gt;
'''Useful sites'''&lt;br /&gt;
&lt;br /&gt;
* SciNet: https://www.scinet.utoronto.ca&lt;br /&gt;
* Niagara: https://docs.computecanada.ca/wiki/niagara&lt;br /&gt;
* System Status: https://docs.scinet.utoronto.ca/index.php/Main_Page&lt;br /&gt;
* Training: https://support.scinet.utoronto.ca/education&lt;br /&gt;
&lt;br /&gt;
'''Support'''&lt;br /&gt;
&lt;br /&gt;
* support@scinet.utoronto.ca&lt;br /&gt;
* niagara@computecanada.ca&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp8Lecture8.pdf&amp;diff=468</id>
		<title>File:Rcp8Lecture8.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp8Lecture8.pdf&amp;diff=468"/>
		<updated>2018-05-25T20:48:04Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp7Lecture7.pdf&amp;diff=467</id>
		<title>File:Rcp7Lecture7.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp7Lecture7.pdf&amp;diff=467"/>
		<updated>2018-05-25T20:47:52Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp6Lecture6.pdf&amp;diff=466</id>
		<title>File:Rcp6Lecture6.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp6Lecture6.pdf&amp;diff=466"/>
		<updated>2018-05-25T20:46:55Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp5Lecture5.pdf&amp;diff=465</id>
		<title>File:Rcp5Lecture5.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp5Lecture5.pdf&amp;diff=465"/>
		<updated>2018-05-25T20:46:31Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp2Lecture4.pdf&amp;diff=464</id>
		<title>File:Rcp2Lecture4.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp2Lecture4.pdf&amp;diff=464"/>
		<updated>2018-05-25T20:46:04Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp3Lecture3.pdf&amp;diff=463</id>
		<title>File:Rcp3Lecture3.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp3Lecture3.pdf&amp;diff=463"/>
		<updated>2018-05-25T20:45:47Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp2Lecture2.pdf&amp;diff=462</id>
		<title>File:Rcp2Lecture2.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp2Lecture2.pdf&amp;diff=462"/>
		<updated>2018-05-25T20:45:29Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp1Lecture1.pdf&amp;diff=461</id>
		<title>File:Rcp1Lecture1.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp1Lecture1.pdf&amp;diff=461"/>
		<updated>2018-05-25T20:45:11Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp8FirstFrame.png&amp;diff=460</id>
		<title>File:Rcp8FirstFrame.png</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp8FirstFrame.png&amp;diff=460"/>
		<updated>2018-05-25T20:41:36Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp7FirstFrame.png&amp;diff=459</id>
		<title>File:Rcp7FirstFrame.png</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp7FirstFrame.png&amp;diff=459"/>
		<updated>2018-05-25T20:41:25Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp5FirstFrame.png&amp;diff=458</id>
		<title>File:Rcp5FirstFrame.png</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp5FirstFrame.png&amp;diff=458"/>
		<updated>2018-05-25T20:41:12Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp4FirstFrame.png&amp;diff=457</id>
		<title>File:Rcp4FirstFrame.png</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp4FirstFrame.png&amp;diff=457"/>
		<updated>2018-05-25T20:40:58Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp3aFirstFrame.png&amp;diff=456</id>
		<title>File:Rcp3aFirstFrame.png</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp3aFirstFrame.png&amp;diff=456"/>
		<updated>2018-05-25T20:40:46Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp2FirstFrame.png&amp;diff=455</id>
		<title>File:Rcp2FirstFrame.png</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp2FirstFrame.png&amp;diff=455"/>
		<updated>2018-05-25T20:40:33Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Rcp1FirstFrame.png&amp;diff=454</id>
		<title>File:Rcp1FirstFrame.png</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Rcp1FirstFrame.png&amp;diff=454"/>
		<updated>2018-05-25T20:40:05Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Python&amp;diff=452</id>
		<title>Python</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Python&amp;diff=452"/>
		<updated>2018-05-25T20:32:54Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Created page with &amp;quot;[http://www.python.org/ Python] is programing language that continues to grow in popularity for scientific computing.   It is very fast to write code in, but the software that...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.python.org/ Python] is a programming language that continues to grow in popularity for scientific computing.   It is very fast to write code in, but the software that results is much, much slower than C or Fortran; one should be wary of doing too much compute-intensive work in Python. &lt;br /&gt;
&lt;br /&gt;
There is a dizzying amount of documentation available for programming in Python on the [http://python.org/ Python.org webpage]; SciNet has given a mini-course of 8 lectures on [[Research Computing with Python]] in the Fall of 2013.&lt;br /&gt;
An excellent set of material for teaching scientists to program in Python is also available at the [http://software-carpentry.org/4_0/python/ Software Carpentry homepage].&lt;br /&gt;
&lt;br /&gt;
__FORCETOC__ &lt;br /&gt;
&lt;br /&gt;
== Python on the GPC ==&lt;br /&gt;
&lt;br /&gt;
We currently have several versions of python installed, compiled against fast intel math libraries.  To load the python modules, type the following commands:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Version&lt;br /&gt;
! Command&lt;br /&gt;
|-&lt;br /&gt;
|2.7.2&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load gcc intel python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|2.7.3&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load gcc intel/13.1.1 python/2.7.3&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|2.7.5&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load gcc intel/13.1.1 python/2.7.5&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|2.7.8&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load intel/15.0.2 python/2.7.8&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|2.7.11&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load anaconda2/4.0.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|2.7.13&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load anaconda2/4.3.1&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|3.3.4&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load gcc intel/14.0.1 python/3.3.4&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|3.5.1&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load anaconda3/4.0.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|3.6.1&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load anaconda3/4.4.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Modules installed system-wide ==&lt;br /&gt;
&lt;br /&gt;
Many optional packages are available for Python which greatly extend the language by adding important new functionality.  Those packages which are likely to be important to all of our users &amp;amp;mdash; e.g., [http://numpy.scipy.org/ NumPy], [http://www.scipy.org/ SciPy], and [http://matplotlib.sourceforge.net/ Matplotlib] &amp;amp;mdash; are installed system-wide.&lt;br /&gt;
&lt;br /&gt;
Below is a list of the packages currently installed system-wide.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!{{Hl2}}| Module  &lt;br /&gt;
!{{Hl2}}| python/2.7.2 &lt;br /&gt;
!{{Hl2}}| python/2.7.3 &lt;br /&gt;
!{{Hl2}}| python/2.7.5 &lt;br /&gt;
!{{Hl2}}| python/2.7.8&lt;br /&gt;
!{{Hl2}}| python/3.3.4&lt;br /&gt;
!{{Hl2}}| Comments&lt;br /&gt;
|-  &lt;br /&gt;
|[http://www.scipy.org/ SciPy]&lt;br /&gt;
| 0.10.0&lt;br /&gt;
| 0.11.0&lt;br /&gt;
| 0.14.0&lt;br /&gt;
| 0.14.0&lt;br /&gt;
| 0.14.0&lt;br /&gt;
| An Open-source software for mathematics, science, and engineering.  Version in Python 2.7.x is linked against very fast MKL numerical libraries. &lt;br /&gt;
|-&lt;br /&gt;
|[http://numpy.scipy.org/ NumPy]&lt;br /&gt;
| 1.6.1&lt;br /&gt;
| 1.7.0&lt;br /&gt;
| 1.7.0&lt;br /&gt;
| 1.9.1&lt;br /&gt;
| 1.8.1&lt;br /&gt;
| NumPy is the fundamental package needed for scientific computing with Python. Contains fast arrays, tools for integrating C/C++ and Fortran code, linear algebra solvers, etc.  SciPy is built on top of NumPy.&lt;br /&gt;
|-&lt;br /&gt;
| [http://mpi4py.scipy.org/ mpi4py]&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| A pythonic interface to mpi.   Available with openmpi; must load an openmpi module for this to work. (There is an issue with openmpi 1.4.x + infiniband, however it does appear to work fine with IntelMPI)&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.scipy.org/SciPyPackages/NumExpr Numexpr]&lt;br /&gt;
| 2.0&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.2.1&lt;br /&gt;
| 2.4&lt;br /&gt;
| 2.4_rc2&lt;br /&gt;
| Fast, memory-efficient elementwise operations on Numpy arrays.&lt;br /&gt;
|-&lt;br /&gt;
| [http://dirac.cnrs-orleans.fr/plone/software/scientificpython/ ScientificPython]&lt;br /&gt;
| 2.8 &lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| A collection of scientific python utilities.   Does not include MPI support.  No longer supported.&lt;br /&gt;
|-&lt;br /&gt;
| [http://yt.enzotools.org/ yt]&lt;br /&gt;
| 2.2&lt;br /&gt;
| 2.5.3&lt;br /&gt;
| 2.5.5&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| A collection of python tools for analyzing astrophysical simulation output.&lt;br /&gt;
|-&lt;br /&gt;
| [http://ipython.scipy.org/moin/ iPython]&lt;br /&gt;
| 0.11 &lt;br /&gt;
| 0.13.1&lt;br /&gt;
| 1.0.0&lt;br /&gt;
| 2.3.0&lt;br /&gt;
| 1.2.1&lt;br /&gt;
| An enhanced interactive python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://matplotlib.sourceforge.net/ Matplotlib], pylab&lt;br /&gt;
| 1.1.0&lt;br /&gt;
| 1.2.0&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| 1.4.2&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Matlab-like plotting for python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.pytables.org/moin PyTables]&lt;br /&gt;
| 2.3.1 &lt;br /&gt;
| 2.4.0&lt;br /&gt;
| 3.0.0&lt;br /&gt;
| 3.1.1&lt;br /&gt;
| 3.1.1&lt;br /&gt;
| Fast and efficient access to HDF5 files (and HDF5-format NetCDF4 files.)   Requires the &amp;lt;tt&amp;gt;hdf5/184-p1-v18-serial-gcc&amp;lt;/tt&amp;gt; module to be loaded. &lt;br /&gt;
|-&lt;br /&gt;
| [http://code.google.com/p/netcdf4-python/ NetCDF4-python]&lt;br /&gt;
| 0.9.8&lt;br /&gt;
| 1.0.4&lt;br /&gt;
| 1.1.1&lt;br /&gt;
| -&lt;br /&gt;
| 1.1.0&lt;br /&gt;
| Python interface to NetCDF4 files.   Requires the &amp;lt;tt&amp;gt;netcdf/4.0.1_hdf5_v18-serial.shared-nofortran&amp;lt;/tt&amp;gt; module to be loaded. &lt;br /&gt;
|-&lt;br /&gt;
| [http://www.pyngl.ucar.edu/Nio.shtml pyNIO]&lt;br /&gt;
| 1.4.1&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| Yet another Python interface to NetCDF4 files; again, requires the &amp;lt;tt&amp;gt;netcdf/4.0.1_hdf5_v18-serial.shared-nofortran&amp;lt;/tt&amp;gt; module.  No longer supported.&lt;br /&gt;
|-&lt;br /&gt;
| [http://alfven.org/wp/hdf5-for-python/ h5py]&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.1.3&lt;br /&gt;
| 2.2.0&lt;br /&gt;
| 2.3.1&lt;br /&gt;
| 2.3.0&lt;br /&gt;
| Yet another Python interface to HDF5 files; again, requires an HDF5 module to be loaded.&lt;br /&gt;
|-&lt;br /&gt;
| [http://pysvn.tigris.org/ PySVN]&lt;br /&gt;
| 1.7.1&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| Python interface to the svn version control system. &lt;br /&gt;
|-&lt;br /&gt;
| [http://mercurial.selenic.com/ Mercurial]&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.6.2&lt;br /&gt;
| 2.7.1&lt;br /&gt;
| 3.2&lt;br /&gt;
| -&lt;br /&gt;
| A distributed version-control system written in Python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://cython.org/ Cython]&lt;br /&gt;
| 0.15.1&lt;br /&gt;
| 0.18&lt;br /&gt;
| 0.19.1&lt;br /&gt;
| 0.21.1&lt;br /&gt;
| 0.20.1&lt;br /&gt;
| Cython is a compiler which compiles Python-like code files to C code and allows them to be easily called from Python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://code.google.com/p/python-nose/ nose]&lt;br /&gt;
| 1.1.2&lt;br /&gt;
| 1.2.1&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| 1.3.4&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| A unit-testing framework for python.&lt;br /&gt;
|- &lt;br /&gt;
| [http://pypi.python.org/pypi/setuptools setuptools]&lt;br /&gt;
| 0.6c11&lt;br /&gt;
| 0.6c11&lt;br /&gt;
| 1.1&lt;br /&gt;
| 7.0&lt;br /&gt;
| 5.1&lt;br /&gt;
| Enables easy installation of new python modules&lt;br /&gt;
|-&lt;br /&gt;
| [http://pandas.pydata.org/ pandas]&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.15.0&lt;br /&gt;
| 0.14.1&lt;br /&gt;
| high-performance, easy-to-use data structures and data analysis tools.&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.astropy.org astropy]&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 0.3&lt;br /&gt;
| 0.4.2&lt;br /&gt;
| 0.3.2&lt;br /&gt;
| astronomical routines&lt;br /&gt;
|-&lt;br /&gt;
| [http://briansimulator.org/ brian]&lt;br /&gt;
| 1.4.1&lt;br /&gt;
| 1.4.1&lt;br /&gt;
| 1.4.1&lt;br /&gt;
| 1.4.1&lt;br /&gt;
| -&lt;br /&gt;
| spiking neural network simulator&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Producing Matplotlib Figures on GPC Compute Nodes and in Job Scripts ==&lt;br /&gt;
&lt;br /&gt;
The conventional way of producing figures from python using matplotlib, i.e., &lt;br /&gt;
&lt;br /&gt;
    import matplotlib.pyplot as plt&lt;br /&gt;
    plt.plot(.....)&lt;br /&gt;
    plt.savefig(...)&lt;br /&gt;
&lt;br /&gt;
will not work on the GPC compute nodes. The reason is that pyplot will try to open the figure in a window on the screen, but the compute nodes do not have screens or window managers.  There is an easy workaround, however, that sets up a different 'backend' to matplotlib, one that does not try to open a window, as follows:&lt;br /&gt;
 &lt;br /&gt;
    import matplotlib as mpl&lt;br /&gt;
    mpl.use('Agg')&lt;br /&gt;
    import matplotlib.pyplot as plt&lt;br /&gt;
    plt.plot(.....)&lt;br /&gt;
    plt.savefig(...)&lt;br /&gt;
&lt;br /&gt;
It is essential that the &amp;lt;tt&amp;gt;mpl.use('Agg')&amp;lt;/tt&amp;gt; command precedes the importing of pyplot. &lt;br /&gt;
&lt;br /&gt;
== Installing your own Python Modules ==&lt;br /&gt;
&lt;br /&gt;
Python provides an easy way for users to install the libraries they need in their home directories rather than having them installed system-wide. There are so many optional  packages for Python people could potentially want (see e.g. http://pypi.python.org/pypi), that we recommend users install these additional packages locally in their home directories.  This is almost certainly the easiest way to deal with the wide range of packages, ensure they're up to date, and ensure that users' package choices don't conflict. &lt;br /&gt;
&lt;br /&gt;
To install your own Python modules, follow the instructions below.   Where the instructions say &amp;lt;tt&amp;gt;python2.X&amp;lt;/tt&amp;gt;, type &amp;lt;tt&amp;gt;python2.6&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;python2.7&amp;lt;/tt&amp;gt; depending on the version of python you are using.&lt;br /&gt;
&lt;br /&gt;
* First, create a directory in your home directory, &amp;lt;tt&amp;gt;${HOME}/lib/python2.X/site-packages&amp;lt;/tt&amp;gt;, where the packages will go.&lt;br /&gt;
* Next, in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;, *after* you &amp;lt;tt&amp;gt;module load python&amp;lt;/tt&amp;gt; and in the &amp;quot;GPC&amp;quot; section, add the following line:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
export PYTHONPATH=${PYTHONPATH}:${HOME}/lib/python2.X/site-packages/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Re-load the modified .bashrc by typing &amp;lt;tt&amp;gt;source ~/.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Now, if it's a standard python package and the instructions say that you can use easy_install to install it,&lt;br /&gt;
** install it with the following command, where &amp;lt;tt&amp;gt;packagename&amp;lt;/tt&amp;gt; is the name of the package you are installing: &lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
easy_install --prefix=${HOME} -O1 [packagename]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
** Continue doing this until all of the packages you need to install are successfully installed.&lt;br /&gt;
** If, upon importing the new python package, you get error messages like &amp;lt;tt&amp;gt;undefined symbol: __stack_chk_guard&amp;lt;/tt&amp;gt;, you may need to use the following command instead:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
LDFLAGS=-fstack-protector easy_install --prefix=${HOME} -O1 [packagename]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* If easy_install isn't an option for your package, and the installation instructions instead talk about downloading a file and using &amp;lt;tt&amp;gt;python setup.py install&amp;lt;/tt&amp;gt; then instead:&lt;br /&gt;
** Download the relevant files&lt;br /&gt;
** You will probably have to uncompress and untar them: &amp;lt;tt&amp;gt;tar -xzvf packagename.tgz&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;tar -xjvf packagename.bz2&amp;lt;/tt&amp;gt;.&lt;br /&gt;
** cd into the newly created directory, and run &lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
python setup.py install --prefix=${HOME}&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* Now, the install process may have added some .egg files or directories to your path.  For each .egg directory, add that to your python path as well in your .bashrc, in the same place as you had updated PYTHONPATH before: eg,&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
export PYTHONPATH=${PYTHONPATH}:${HOME}/lib/python2.X/site-packages:${HOME}/lib/python2.X/site-packages/packagename1-x.y.z-yy2.X.egg:${HOME}/lib/python2.X/site-packages/packagename2-a.b.c-py2.X.egg&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You should now be done!   Now, re-source your .bashrc and test your new python modules.&lt;br /&gt;
&lt;br /&gt;
* In order to keep your .bashrc relatively uncluttered, and to avoid potential conflicts among software modules, we recommend that users create their own  modules (for the &amp;quot;module&amp;quot; system, not specifically python modules).  &lt;br /&gt;
&lt;br /&gt;
[[Brian|Here]] is an example module for the [[Brian]] package, including instructions for the installation of the python [[Brian]] package itself.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=451</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=451"/>
		<updated>2018-05-25T20:31:35Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Bridge to HPSS */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4 rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], currently at the 120th place (Nov 2015).&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy-efficient 3rd-generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6GHz PowerPC-based CPU (PowerPC A2) with 16GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full Red Hat Linux OS and manage the compute nodes and mount the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks) totalling 65,536 cores and 64TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions.  As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''' which one can login to from the regular '''login.scinet.utoronto.ca''' login nodes or directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[SSH keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using LoadLeveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines - including most of the compilers - you will have to use the `modules' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI versions, libraries etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM XL C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of library files and include files for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against such a library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[module-basename]_INC}&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L${SCINET_[module-basename]_LIB}&amp;lt;/tt&amp;gt; to your compile and link commands, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.  &lt;br /&gt;
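For example (a hypothetical sketch, assuming the gsl module is loaded and follows this convention), compiling and linking against the GSL library with the bgxlc compiler described below might look like&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bgxlc -I${SCINET_GSL_INC} mycode.c -L${SCINET_GSL_LIB} -lgsl -lgslcblas -o mycode.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;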
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. By default the compilers produce&lt;br /&gt;
static binaries; however, on the BGQ it is now possible to use dynamic libraries as well.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers, respectively.  &lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
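For instance, a minimal compile-and-link sketch for an MPI code with these flags (assuming the mpich2 module is loaded) could be&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mpixlc -O3 -qarch=qp -qtune=qp -o mycode.exe mycode.c&lt;br /&gt;
mpixlf90 -O3 -qarch=qp -qtune=qp -o mycode.exe mycode.f90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;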
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the BGQ can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also BGQ-native development nodes named '''bgqdev-ion[01-24]''', which one can log in to directly, i.e. by ssh, from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes, with the exception that they run a full Red Hat Linux and have an InfiniBand interface providing direct network access.    Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, and thus cross-compilation is not required, which can make building some software easier.    &lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes; however, they cannot be run locally, as mpich2 is set up for the BGQ network and will therefore fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from that of the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, which are further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane), the smallest block size is 64 nodes (1024 cores). Normally the block size matches the job size to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block, but this results in shared resources (network and I/O); such jobs are referred to as sub-block jobs and are described in more detail below.  &lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which for those familiar with MPI is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, case is a job script submitted and run using loadleveler. Inside the job script, this block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), from within that job script, you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure mpi jobs, it is advisable always to give the number of ranks per node, because the default value of 1 may leave 15 cores on the node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is a number from 1 to 7; this can be helpful when debugging an application.&lt;br /&gt;
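For example (a sketch based on the runjob line shown earlier):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --verbose 4 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;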
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multithreading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). Using fewer ranks per node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
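As an illustrative sketch (not from an actual run), a hybrid MPI/OpenMP job on a 64-node block using 8 ranks per node and 8 OpenMP threads per rank would then use --np 8x64=512:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 512 --ranks-per-node=8 --envs OMP_NUM_THREADS=8 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;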
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64-node block is reserved for development and interactive testing for 16 hours, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs during that interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30-minute maximum wall_clock_limit. The purpose of this reservation is to ensure short testing jobs run quickly without being held up by longer production-type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few Blue Gene-specific keywords.  The keyword &amp;quot;bg_size&amp;quot; is given in number of nodes, not cores, so bg_size=64 would be 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024, or 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP threads per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
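For instance, with bg_size=64 and ranks-per-node=16 (as in the sample script below), np can be at most 64x16=1024, and OMP_NUM_THREADS can be at most 64/16=4.&lt;br /&gt;
&lt;br /&gt;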
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps ( Job dependency) ===&lt;br /&gt;
LoadLeveler has many advanced features to control job submission and execution. One of these features is called steps, which allows a series of jobs to be submitted using one script with dependencies defined between them: each job, called a step, waits for the previous step to finish before it starts. The following example uses the same LoadLeveler script as previously shown; however, the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each job is finished to start the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step1                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step2                                                                                                                                                                                                                        &lt;br /&gt;
# @ dependency = step1 == 0                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step3                                                                                                                                                                                                                        &lt;br /&gt;
# @ dependency = step2 == 0                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.  &lt;br /&gt;
Users, however, only have access to the BGQ through loadleveler, which is appropriate for batch jobs, &lt;br /&gt;
whereas an interactive session is typically beneficial when debugging and developing.   As such, a &lt;br /&gt;
script has been written to allow a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, sets all the correct environment variables, and then launches a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30-minute session on 64 nodes and, when run on &lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended, as it automatically attaches to all the processes of a job (instead of attaching a gdb tool to each by hand, as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt; and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.   Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter setting sets the number of mpi processes to run.  Most configure scripts expect only one mpi process, thus, &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A debugjob session started with &amp;lt;tt&amp;gt;debugjob -i&amp;lt;/tt&amp;gt; runs in implicit mode, in which running an executable implicitly calls runjob with 1 mpi task:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
The BGQ allows multiple applications to share the same block; these are referred to as sub-block jobs. However, this needs to be done from within the same loadleveler submission script using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which to start each job and a 5D torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the job you are trying to run.  &lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16, or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4, which will return the appropriate $SHAPE argument and an array of 16 starting corners in ${CORNER[n]}. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Use the subblocks script to set $SHAPE and the array ${CORNER[n]},&lt;br /&gt;
# given the size of the sub-blocks in nodes (i.e. similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case 16 sub-blocks of 4 compute nodes each (64 total, i.e. bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. These sub-blocks all have to share the same I/O nodes, so for I/O-intensive jobs this will be an inefficient setup.  Also consider that if your jobs are so small that you have to run them in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input or output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).  &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| first of (20 TB ; 1 million files)&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
Apart from HPSS, the BGQ GPFS file system is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor is the other systems' file system mounted on the BGQ.  &lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh) to use a non-default encryption cipher. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back. &lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgqdev.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information in a number of ways about the home and scratch file systems: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?] [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
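For example, to list the usage of all members of your group, including delta information, one might run (a sketch based on the options above):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;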
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://docs.scinet.utoronto.ca/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform &lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://openfoam.org OpenFOAM]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1, 5.0&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0, openfoam/3.0.1, openfoam/5.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[https://docs.scinet.utoronto.ca/index.php/OpenFOAM_on_BGQ A detailed explanation of OpenFOAM usage on BG/Q cluster]&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the Python mmap API, you must use it in PRIVATE mode as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
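As a sketch, a parallel Python run using mpi4py (assuming a hypothetical script $SCRATCH/my_mpi4py_script.py) could be launched along the same lines as above, just with more MPI tasks:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3    # plus xlf/14.1 essl/5.1 if you need numpy/scipy&lt;br /&gt;
runjob --np 64 --ranks-per-node=16 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 $SCRATCH/my_mpi4py_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;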
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, 3rd block, 1st node, and second core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=MemP&amp;diff=450</id>
		<title>MemP</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=MemP&amp;diff=450"/>
		<updated>2018-05-25T20:30:00Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Created page with &amp;quot;memP is a Lawrence Livermore National Labs (LLNL) developed, light weight, parallel heap profiling library. Its primarily designed to identify the heap allocation that causes...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;memP is a lightweight parallel heap-profiling library developed at Lawrence Livermore National Laboratory (LLNL). It is primarily designed to identify the heap allocation that causes an MPI task to reach its memory-in-use high-water mark (HWM).&lt;br /&gt;
&lt;br /&gt;
== memP Reports ==&lt;br /&gt;
&lt;br /&gt;
'''Summary Report:''' Generated from within MPI_Finalize, this report describes the memory HWM of each task over the run of the application. This can be used to determine which task allocates the most memory and how this compares to the memory of other tasks.&lt;br /&gt;
&lt;br /&gt;
'''Task Report:''' Based on specific criteria, a report can be generated for each task that provides a snapshot of the heap memory currently in use, including the amount allocated at specific call sites.&lt;br /&gt;
&lt;br /&gt;
==Using memP==&lt;br /&gt;
&lt;br /&gt;
Load the memP module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load memP&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Compile with the recommended BG/Q flags and link your application with the required libraries:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-Wl,-zmuldefs ${SCINET_LIB_MEMP}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mpixlc -g -Wl,-zmuldefs -o myprog myprog.c -L/usr/local/tools/memP/lib -lmemP&lt;br /&gt;
mpixlf77 -g -Wl,-zmuldefs -o myprog myprog.f -L/usr/local/tools/memP/lib -lmemP &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then run your MPI application as usual; you will see a memP header and trailer sent to stdout, as well as the output file generated at the end of the run.&lt;br /&gt;
&lt;br /&gt;
==Output Options==&lt;br /&gt;
&lt;br /&gt;
See http://memp.sourceforge.net/ for full details.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=447</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=447"/>
		<updated>2018-05-25T20:22:51Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Login/Devel Node */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4-rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], currently in 120th place (Nov 2015).&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy-efficient 3rd-generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node with a 16-core 1.6 GHz PowerPC-based CPU (PowerPC A2) and 16 GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full Red Hat Linux OS; these nodes manage the compute nodes and mount the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks) totalling 65,536 cores and 64 TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions.  As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''', which one can log in to from the regular '''login.scinet.utoronto.ca''' login nodes or directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[SSH keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines, including most of the compilers, you will have to use the `modules' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI versions, libraries, etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM XL C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of library files and include files for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against such a library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[module-basename]_INC}&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L${SCINET_[module-basename]_LIB}&amp;lt;/tt&amp;gt; to your compile and link commands, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. The compilers by default produce&lt;br /&gt;
static binaries, however with BGQ it is possible to now use dynamic libraries as well.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers respectively.  &lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
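&lt;br /&gt;
For example, a minimal cross-compile of an MPI code with these flags might look as follows (a sketch; mycode.c, mycode.f90 and the executable names are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpixlc   -O3 -qarch=qp -qtune=qp -o mycode_c.exe mycode.c&lt;br /&gt;
$ mpixlf90 -O3 -qarch=qp -qtune=qp -o mycode_f.exe mycode.f90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;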
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also bgq native development nodes named '''bgqdev-ion[01-24]''' which one can login to directly, i.e. ssh, from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes with the exception that they run a full RedHat Linux and have an infiniband interface providing direct network access.  Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, and thus cross compilation is not required, which can make building some software easier.&lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes, however they cannot be run locally, as the mpich2 library is set up for the BGQ network and will therefore fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, which are further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block, but this results in shared resources (network and I/O); such runs are referred to as sub-block jobs and are described in more detail below.  &lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which for those familiar with MPI is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, way is a job script submitted and run through loadleveler. Inside the job script, this block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), from within that job script you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure mpi jobs, it is advisable always to give the number of ranks per node, because the default value of 1 may leave 15 cores on the node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is a number from 1 to 7; higher verbosity levels can be helpful when debugging an application.&lt;br /&gt;
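&lt;br /&gt;
For instance, to rerun the earlier example with extra diagnostic output (verbosity level 4 is an arbitrary choice):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --verbose 4 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;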
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). With fewer ranks per node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
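&lt;br /&gt;
For example, a hybrid MPI/OpenMP run on a 64-node block might use 16 ranks per node with 4 OpenMP threads each, so that all 64 hardware threads of a node are kept busy (a sketch; tune the rank/thread split to your application):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=4 --cwd=$SCRATCH/ : $HOME/mycode.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;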
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64 node block is reserved for development and interactive testing for 16 hours a day, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs during that interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30 minute maximum wall_clock_limit. The purpose of this reservation is to ensure that short testing jobs run quickly without being held up by longer production type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few blue gene specific commands.  The keyword &amp;quot;bg_size&amp;quot; is specified in number of nodes, not cores, so bg_size=64 corresponds to 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024 and 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP threads per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
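&lt;br /&gt;
For example, with bg_size=64 and ranks-per-node=32, a hybrid run could use OMP_NUM_THREADS=2 (since 32 x 2 = 64) and at most np = 32 x 64 = 2048 MPI processes. The sample loadleveler script below uses bg_size=64 with 16 ranks per node and a single thread per rank, i.e. np = 1024.&lt;br /&gt;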
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps (Job dependency) ===&lt;br /&gt;
LoadLeveler has a lot of advanced features to control job submission and execution. One of these features is called steps. It allows a series of jobs, each called a step, to be submitted using one script with dependencies defined between them, so that the jobs run sequentially and each step waits for the previous one to finish before it starts. The following example uses the same LoadLeveler script as previously shown, however the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each job is finished to start the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step1&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step2&lt;br /&gt;
# @ dependency = step1 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step3&lt;br /&gt;
# @ dependency = step2 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
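&lt;br /&gt;
The whole multi-step script is submitted once with &amp;lt;tt&amp;gt;llsubmit&amp;lt;/tt&amp;gt;; loadleveler then queues all three steps and starts each one only after the previous step has finished with exit code 0, as required by the &amp;lt;tt&amp;gt;dependency&amp;lt;/tt&amp;gt; directives.&lt;br /&gt;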
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.&lt;br /&gt;
Users, however, only have access to the BGQ through loadleveler, which is appropriate for batch jobs,&lt;br /&gt;
whereas an interactive session is typically beneficial when debugging and developing.  As such, a&lt;br /&gt;
script has been written to allow a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, set all the correct environment variables and then launch a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30 minute session on 64 nodes and, when run on&lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended as it automatically attaches to all the processes of a parallel job, instead of attaching a gdb tool to each process by hand (as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt; and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.  Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter setting sets the number of mpi processes to run.  Most configure scripts expect only one mpi process, thus, &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
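&lt;br /&gt;
For example, a configure-based build could then be run inside the debugjob session roughly as follows (a sketch; the compiler variables are an assumption and may need adjusting for a given package):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
$ ./configure CC=mpixlc FC=mpixlf90&lt;br /&gt;
$ make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;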
&lt;br /&gt;
&lt;br /&gt;
A debugjob session started with the &amp;lt;tt&amp;gt;-i&amp;lt;/tt&amp;gt; flag runs in implicit mode, in which running an executable implicitly calls runjob with 1 mpi task:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
BGQ allows multiple applications to share the same block, which is referred to as sub-block jobs, however this needs to be done from within the same loadleveler submission script using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which to start each job and a 5D Torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the jobs being run.  &lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16, or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4, which sets the appropriate $SHAPE argument and an array of 16 starting corners in ${CORNER[n]}. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Using the subblocks script to set $SHAPE and the array ${CORNER[n]}&lt;br /&gt;
# with the size of the sub-blocks in nodes (i.e. similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case 16 sub-blocks of 4 cnodes each (64 total ie bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. One needs to consider that these sub-blocks all have to share the same I/O nodes, so for I/O intensive jobs this will be an inefficient setup.  Also consider that if your jobs are so small that you have to run them in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input or output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).  &lt;br /&gt;
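&lt;br /&gt;
For example, to confirm the paths and move to your scratch space:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ echo $HOME $SCRATCH&lt;br /&gt;
$ cd $SCRATCH&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;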
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| first of (20 TB ; 1 million files)&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
Except for HPSS, the BGQ GPFS file system is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor are the other SciNet file systems mounted on the BGQ.  &lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh), to use a non-default encryption. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back. &lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgqdev.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information in a number of ways on the home and scratch file systems. For instance, it can report how much disk space is being used by yourself and your group (with the -a option), show how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
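&lt;br /&gt;
For example, to list the usage of all members of your group together with the delta information (combining the flags from the help text above):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;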
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://support.scinet.utoronto.ca/wiki/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform &lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://openfoam.org OpenFOAM]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1, 5.0&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0, openfoam/3.0.1, openfoam/5.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[https://docs.scinet.utoronto.ca/index.php/OpenFOAM_on_BGQ A detailed explanation of OpenFOAM usage on BG/Q cluster]&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the mmap python API, you must use it in PRIVATE mode, as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
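&lt;br /&gt;
For example, a hypothetical mpi4py script (here called hello_mpi.py, a placeholder) would be launched in the same way as the example above, just with more ranks:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
runjob --np 64 --ranks-per-node=16 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 $SCRATCH/hello_mpi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;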
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, fourth node board, first node, and third core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=443</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=443"/>
		<updated>2018-05-25T18:23:43Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Software modules installed on the BGQ */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4 rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], currently at the 120th place (Nov 2015).&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy efficient 3rd generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6GHz PowerPC based CPU (PowerPC A2) with 16GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full RedHat Linux OS, manage the compute nodes and mount the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks) totalling 65,536 cores and 64TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions.  As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''' which one can login to from the regular '''login.scinet.utoronto.ca''' login nodes or directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[Ssh keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines, including most of the compilers, you will have to use the `modules' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc) to include the default version of that package.  &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI versions, libraries etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM XL C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of library files and include files for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against such a library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[module-basename]_INC}&amp;lt;/tt&amp;gt; to the compile flags and &amp;lt;tt&amp;gt;-L${SCINET_[module-basename]_LIB}&amp;lt;/tt&amp;gt; to the link flags, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. The compilers by default produce&lt;br /&gt;
static binaries, however with BGQ it is possible to now use dynamic libraries as well.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers respectively.  &lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also bgq native development nodes named '''bgqdev-ion[01-24]''' which one can login to directly, i.e. ssh, from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes with the exception that they run a full RedHat Linux and have an infiniband interface providing direct network access.  Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, and thus cross compilation is not required, which can make building some software easier.&lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes, however they cannot be run locally, as the mpich2 library is set up for the BGQ network and will therefore fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, which are further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block, but this results in shared resources (network and I/O); such runs are referred to as sub-block jobs and are described in more detail below.  &lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which for those familiar with MPI is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, case is a job script submitted and run through loadleveler. Inside the job script, the block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), then from within that job script you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure mpi jobs, it is advisable always to give the number of ranks per node, because the default value of 1 may leave 15 cores on the node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is a number from 1 to 7; this can be helpful in debugging an application.&lt;br /&gt;
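&lt;br /&gt;
For instance, the earlier runjob invocation could be repeated with extra launcher output (the verbosity level 4 used here is just an arbitrary choice within the 1-7 range):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --verbose 4 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;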
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi-Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). When using fewer ranks per node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
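&lt;br /&gt;
As an illustration, on a bg_size=64 block the following two invocations use the same 64 nodes: the first is a pure MPI run with 64 ranks per node, the second a hybrid run with 16 ranks per node and 4 OpenMP threads per rank (the executable name &amp;lt;tt&amp;gt;mycode.exe&amp;lt;/tt&amp;gt; is a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# pure MPI: 64 nodes x 64 ranks-per-node = 4096 processes&lt;br /&gt;
runjob --np 4096 --ranks-per-node=64 --cwd=$SCRATCH/ : $HOME/mycode.exe&lt;br /&gt;
&lt;br /&gt;
# hybrid MPI/OpenMP: 64 nodes x 16 ranks-per-node = 1024 processes, 4 threads each (16*4=64)&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=4 --cwd=$SCRATCH/ : $HOME/mycode.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;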
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64 node block is reserved for development and interactive testing for 16 hours a day, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs during that time interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30 minute maximum wall_clock_limit. The purpose of this reservation is to ensure that short testing jobs run quickly without being held up by longer production-type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few Blue Gene-specific keywords.  The keyword &amp;quot;bg_size&amp;quot; is specified in number of nodes, not cores, so bg_size=64 corresponds to 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024 and 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP threads per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps ( Job dependency) ===&lt;br /&gt;
LoadLeveler has many advanced features for controlling job submission and execution. One of these features is called steps. Steps allow a series of jobs to be submitted in a single script, with dependencies defined between them, so that the jobs run sequentially and each job (step) only starts once the previous one has finished. The following example uses the same LoadLeveler script as previously shown, but the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each job is finished before starting the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step1                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step2                                                                                                                                                                                                                        &lt;br /&gt;
# @ dependency = step1 == 0                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step3                                                                                                                                                                                                                        &lt;br /&gt;
# @ dependency = step2 == 0                                                                                                                                                                                                                        &lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group's stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.  &lt;br /&gt;
Users, however, only have access to the BGQ through loadleveler, which is appropriate for batch jobs, &lt;br /&gt;
whereas an interactive session is typically beneficial when debugging and developing.   For this purpose, a &lt;br /&gt;
script has been written that provides a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, sets all the correct environment variables, and then launches a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30 minute session on 64 nodes and, when run on &lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended, as it automatically attaches to all the processes of a parallel job (instead of attaching a gdb tool to each process by hand, as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt; and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.   Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter setting sets the number of mpi processes to run.  Most configure scripts expect only one mpi process, thus, &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
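&lt;br /&gt;
A typical configure-and-build session inside a debugjob might then look roughly as follows; this is only a sketch, and the configure options shown are placeholders that will differ for your package:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
$ module load mpich2&lt;br /&gt;
$ ./configure CC=mpixlc FC=mpixlf90     # placeholder configure line&lt;br /&gt;
$ make&lt;br /&gt;
$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;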
&lt;br /&gt;
&lt;br /&gt;
A debugjob session started in implicit mode (with the -i flag) calls runjob with 1 MPI task whenever an executable is run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
The BGQ allows multiple applications to share the same block; these are referred to as sub-block jobs. This needs to be done from within the same loadleveler submission script, using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which to start each job, and a 5D torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the jobs being run.  &lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16 or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4, which sets the appropriate $SHAPE argument and an array of 16 starting corners in $CORNER. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Using the subblocks script to set $SHAPE and the array ${CORNER[n]},&lt;br /&gt;
# given the size of the sub-blocks in nodes (i.e. similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case 16 sub-blocks of 4 cnodes each (64 total ie bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. One needs to consider that these sub-blocks all have to share the same I/O nodes, so for I/O intensive jobs this will be an inefficient setup.  Also consider that if your jobs are so small that you have to run them in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input and output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).  &lt;br /&gt;
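&lt;br /&gt;
For example, you can check where these variables point, and move to your scratch space, with the following (GROUP and USER stand for your own group and user name, as in the paths above):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ echo $HOME $SCRATCH&lt;br /&gt;
/home/G/GROUP/USER /scratch/G/GROUP/USER&lt;br /&gt;
$ cd $SCRATCH&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;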
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| first of (20 TB ; 1 million files)&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
The BGQ GPFS file system,  except for HPSS, is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor are those systems' file systems mounted on the BGQ.  &lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh), to use a non-default encryption. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back. &lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgq.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information in a number of ways on the home and scratch file systems. For instance, it can report how much disk space is being used by yourself and your group (with the -a option), or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or it can generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
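For example, combining the -a and -de options shown above lists the usage of all members of your group together with the recent changes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;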
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://support.scinet.utoronto.ca/wiki/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform &lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://openfoam.org OpenFOAM]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1, 5.0&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0, openfoam/3.0.1, openfoam/5.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[https://docs.scinet.utoronto.ca/index.php/OpenFOAM_on_BGQ A detailed explanation of OpenFOAM usage on BG/Q cluster]&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the mmap Python API, you must use it in PRIVATE mode, as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, 3rd block, 1st node, and second core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=442</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=442"/>
		<updated>2018-05-25T18:19:27Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Software modules installed on the BGQ */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4 rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], currently at the 120th place (Nov 2015).&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy efficient 3rd generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6GHz PowerPC-based CPU (PowerPC A2) with 16GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full RedHat Linux OS, which manage the compute nodes and mount the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks) totalling 65,536 cores and 64TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions.  As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''' which one can login to from the regular '''login.scinet.utoronto.ca''' login nodes or directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[Ssh keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux, which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines - including most of the compilers - you will have to use the `module' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI implementations, libraries, etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM XL C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of the library files and include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against such a library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[module-basename]_INC}&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L${SCINET_[module-basename]_LIB}&amp;lt;/tt&amp;gt; to your compile and link commands, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.  &lt;br /&gt;
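&lt;br /&gt;
As a hypothetical illustration with the gsl module (the short module name GSL in the variable names below is only an assumption; check the actual names with &amp;lt;tt&amp;gt;env | grep SCINET&amp;lt;/tt&amp;gt; after loading the module):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gsl mpich2&lt;br /&gt;
# -I/-L point at the include and library directories set by the gsl module&lt;br /&gt;
mpixlc -O3 -qarch=qp -qtune=qp code.c -I${SCINET_GSL_INC} -L${SCINET_GSL_LIB} -lgsl -lgslcblas -o code&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;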
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. By default the compilers produce&lt;br /&gt;
static binaries; however, on the BGQ it is now also possible to use dynamic libraries.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers, respectively.  &lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also bgq native development nodes named '''bgqdev-ion[01-24]''' which one can log into directly, i.e. via ssh, from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes, with the exception that they run a full RedHat Linux and have an infiniband interface providing direct network access.    Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, so cross-compilation is not required, which can make building some software easier.    &lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes, but they cannot be run locally, as mpich2 is set up for the BGQ network and will thus fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from that of the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, which are further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size, to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block, but this results in shared resources (network and I/O); these are referred to as sub-block jobs and are described in more detail below.  &lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which for those familiar with MPI is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, case is a job script submitted and run through loadleveler. Inside the job script, the block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), then from within that job script you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure mpi jobs, it is advisable always to give the number of ranks per node, because the default value of 1 may leave 15 cores on the node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is a number from 1 to 7; this can be helpful in debugging an application.&lt;br /&gt;
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi-Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). When using fewer ranks per node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64 node block is reserved for development and interactive testing for 16 hours a day, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs during that time interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30 minute maximum wall_clock_limit. The purpose of this reservation is to ensure that short testing jobs run quickly without being held up by longer production-type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few Blue Gene-specific keywords.  The keyword &amp;quot;bg_size&amp;quot; is given in number of nodes, not cores, so bg_size=64 corresponds to 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024 or 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP threads per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
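&lt;br /&gt;
If your code is hybrid MPI/OpenMP, only the runjob line changes. A sketch for the same 64 nodes, using 8 MPI ranks per node with 8 OpenMP threads each (so that ranks-per-node x OMP_NUM_THREADS = 64), would be:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# 64 nodes x 8 ranks-per-node = 512 MPI processes, 8 threads per rank&lt;br /&gt;
runjob --np 512 --ranks-per-node=8 --envs OMP_NUM_THREADS=8 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;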
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps ( Job dependency) ===&lt;br /&gt;
LoadLeveler has many advanced features to control job submission and execution. One of these features is called steps. It allows a series of jobs, each called a step, to be submitted in a single script with dependencies defined between them, so that the steps run sequentially and each step waits for the previous one to finish before it starts. The following example uses the same LoadLeveler script as previously shown, but the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each step is finished before starting the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step1&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step2&lt;br /&gt;
# @ dependency         = step1 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step3&lt;br /&gt;
# @ dependency         = step2 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.&lt;br /&gt;
Users normally only have access to the BGQ through loadleveler, which is appropriate for batch jobs,&lt;br /&gt;
but an interactive session is typically beneficial when debugging and developing.  For this purpose, a&lt;br /&gt;
script has been written to allow a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, sets all the correct environment variables, and then launches a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30 minute session on 64 nodes and, when run on&lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended, as it automatically attaches to all the processes of a parallel run, instead of requiring you to attach a gdb instance to each process by hand (as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt; and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.   Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter sets the number of MPI processes to run.  Most configure scripts expect only one MPI process, thus &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
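&lt;br /&gt;
A minimal sketch of a configure-and-build session under these settings (the --host triplet and the compiler wrappers are only illustrative; adapt them to the package you are building):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
$ module load mpich2&lt;br /&gt;
$ ./configure CC=mpicc FC=mpif90 --host=powerpc64-bgq-linux&lt;br /&gt;
$ make&lt;br /&gt;
$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;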
&lt;br /&gt;
&lt;br /&gt;
A debugjob session started in implicit mode (debugjob -i) will, when you run an executable, implicitly call runjob with 1 MPI task:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
BGQ allows multiple applications to share the same block; these are referred to as sub-block jobs. This needs to be done from within the same loadleveler submission script, using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which each job starts and a 5D torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the jobs being run.&lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16, or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4, which returns the appropriate $SHAPE argument and an array of 16 starting corners in $CORNER.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Using subblocks script to set $SHAPE and array of ${CORNER[n]}&lt;br /&gt;
# with size of subblocks in nodes (i.e. similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case 16 sub-blocks of 4 cnodes each (64 total, i.e. bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. These sub-blocks all have to share the same I/O nodes, so for I/O-intensive jobs this will be an inefficient setup.  Also consider that if your jobs are small enough to require sub-blocks, it may be more efficient to run them on other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input or output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).&lt;br /&gt;
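&lt;br /&gt;
To see where your own directories are, you can simply echo these variables on the front-end node (output shown schematically, using the G/GROUP/USER placeholders above):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ echo $HOME&lt;br /&gt;
/home/G/GROUP/USER&lt;br /&gt;
$ echo $SCRATCH&lt;br /&gt;
/scratch/G/GROUP/USER&lt;br /&gt;
$ cd $SCRATCH&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;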
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB or 1 million files, whichever is reached first&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
Except for HPSS, the BGQ GPFS file system is '''not''' shared with the other SciNet systems (GPC, TCS, P7, ARC), nor is the regular SciNet file system mounted on the BGQ.&lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh) to use a non-default encryption cipher. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying between the BGQ file system and the regular SciNet GPFS file system.&lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgqdev.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
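&lt;br /&gt;
To copy a whole results directory rather than a single file, the same approach works with the recursive flag (the paths here are only illustrative):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -r -c arcfour $SCRATCH/myrun login.scinet.utoronto.ca:/scratch/G/GROUP/USER/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;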
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information in a number of ways on the home and scratch file systems. For instance, it can report how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or it can generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
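&lt;br /&gt;
For instance, to list the usage of all members of your group together with the recent changes, you would combine the flags documented above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;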
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://support.scinet.utoronto.ca/wiki/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform&lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://openfoam.org OpenFOAM]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[https://docs.scinet.utoronto.ca/index.php/OpenFOAM_on_BGQ A detailed explanation of OpenFOAM usage on BG/Q cluster]&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided in the runjob command (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
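&lt;br /&gt;
As a minimal sketch of what /PATHOFYOURSCRIPT.py could contain, here is a short NumPy test (the file name and array size are purely illustrative):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# test_numpy.py -- minimal check that python and numpy work on a compute node&lt;br /&gt;
from __future__ import print_function&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
a = np.arange(16, dtype=np.float64)&lt;br /&gt;
print('sum =', a.sum())&lt;br /&gt;
print('dot =', np.dot(a, a))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;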
&lt;br /&gt;
If you want to use the mmap python API, you must use it in PRIVATE mode, as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, 3rd block, 1st node, and second core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--!&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=441</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=441"/>
		<updated>2018-05-25T18:18:29Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Software modules installed on the BGQ */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4 rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], currently at the 120th place (Nov 2015).&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy-efficient 3rd-generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6GHz PowerPC-based CPU (PowerPC A2) with 16GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full Red Hat Linux OS and that manage the compute nodes and mount the file system.  SciNet's BGQ consists of 8 midplanes (four racks), totalling 65,536 cores and 64 TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions.  As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''' which one can log in to from the regular '''login.scinet.utoronto.ca''' login nodes or directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[Ssh keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines - including most of the compilers - you will have to use the `module' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI implementations, libraries, etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of the library files and include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against such a library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[short-module-name]_INC}&amp;lt;/tt&amp;gt; to the compile flags and &amp;lt;tt&amp;gt;-L${SCINET_[short-module-name]_LIB}&amp;lt;/tt&amp;gt; to the link flags, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.&lt;br /&gt;
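&lt;br /&gt;
For example, assuming the hdf5 module exposes its paths under the short module name HDF5 (check the exact variable names with 'env | grep SCINET' after loading the module), compiling and linking against it could look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
module load hdf5/189-v18-mpich2-xlc&lt;br /&gt;
mpixlc -O3 -qarch=qp -qtune=qp -I${SCINET_HDF5_INC} mycode.c -L${SCINET_HDF5_LIB} -lhdf5 -o mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;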
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
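&lt;br /&gt;
A minimal sketch of the first approach (the modules listed here are just an example):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# near the end of ~/.bashrc&lt;br /&gt;
module load xlf&lt;br /&gt;
module load vacpp&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;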
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. By default the compilers produce&lt;br /&gt;
static binaries, but on the BGQ it is now also possible to use dynamic libraries.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers, respectively.&lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
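&lt;br /&gt;
Putting this together, a sketch of compiling a simple MPI code with these flags (the source file names are only illustrative):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
mpixlc   -O3 -qarch=qp -qtune=qp -o mycode_c mycode.c&lt;br /&gt;
mpixlf90 -O3 -qarch=qp -qtune=qp -o mycode_f mycode.f90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;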
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also BGQ-native development nodes named '''bgqdev-ion[01-24]''' which one can log in to directly, i.e. via ssh, from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes, except that they run a full Red Hat Linux and have an Infiniband interface providing direct network access.    Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, so cross-compilation is not required, which can make building some software easier.&lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes, but they cannot be run locally, as mpich2 is set up for the BGQ network and will therefore fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from that of the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, which are further constrained by the fact that each block requires a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block, but this results in shared resources (network and I/O); such jobs are referred to as sub-block jobs and are described in more detail below.&lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which for those familiar with MPI is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, way is a job script submitted and run through loadleveler. Inside the job script, the block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), then from within that job script you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure MPI jobs it is advisable to always specify the number of ranks per node, because the default value of 1 may leave 15 of the 16 cores on each node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is a verbosity level from 1 to 7, which can be helpful when debugging an application.&lt;br /&gt;
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). With fewer ranks per node, the remaining hardware threads can be used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64 node block is reserved for development and interactive testing for 16 hours, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs in that time interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30 minute maximum wall_clock_limit. The purpose of this reservation is to ensure that short testing jobs run quickly without being held up by longer production-type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few Blue Gene-specific keywords.  The keyword &amp;quot;bg_size&amp;quot; is given in number of nodes, not cores, so bg_size=64 corresponds to 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024 or 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP threads per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps ( Job dependency) ===&lt;br /&gt;
LoadLeveler has many advanced features to control job submission and execution. One of these features is called steps. It allows a series of jobs, each called a step, to be submitted in a single script with dependencies defined between them, so that the steps run sequentially and each step waits for the previous one to finish before it starts. The following example uses the same LoadLeveler script as previously shown, but the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each step is finished before starting the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step1&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step2&lt;br /&gt;
# @ dependency         = step1 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step3&lt;br /&gt;
# @ dependency         = step2 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group's stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.&lt;br /&gt;
Users normally only have access to the BGQ through loadleveler, which is appropriate for batch jobs,&lt;br /&gt;
but an interactive session is typically beneficial when debugging and developing.  For this purpose, a&lt;br /&gt;
script has been written that allows a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, sets all the correct environment variables, and then launches a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30 minute session on 64 nodes and, when run on&lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended, as it automatically attaches to all the processes of a parallel job, instead of having to attach a gdb tool to each process by hand (as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;, and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.   Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter setting sets the number of mpi processes to run.  Most configure scripts expect only one mpi process, thus, &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
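&lt;br /&gt;
As a rough sketch (the package name, directory, and configure options here are hypothetical, not a tested recipe), building such a package inside a debugjob session could look like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob&lt;br /&gt;
$ module load mpich2&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
$ cd $SCRATCH/mylibrary-1.0        # hypothetical package directory&lt;br /&gt;
$ ./configure CC=mpicc FC=mpif90 --prefix=$HOME/sw/mylibrary&lt;br /&gt;
$ make&lt;br /&gt;
$ make install&lt;br /&gt;
$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;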
&lt;br /&gt;
&lt;br /&gt;
A debugjob session started with the &amp;lt;tt&amp;gt;-i&amp;lt;/tt&amp;gt; flag runs in implicit mode, in which running an executable implicitly calls runjob with 1 mpi task:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
BGQ allows multiple applications to share the same block; these are referred to as sub-block jobs. This needs to be done from within the same loadleveler submission script, using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which each job starts and a 5D torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner depends on the specific block details provided by loadleveler and on the shape and size of the job being run.  &lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16 or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4, which sets the appropriate $SHAPE argument and an array of 16 starting corners, $CORNER. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Use the subblocks script to set $SHAPE and the array ${CORNER[n]},&lt;br /&gt;
# given the size of the sub-blocks in nodes (i.e. similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case: 16 sub-blocks of 4 cnodes each (64 in total, i.e. bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. These sub-blocks all have to share the same I/O nodes, so for I/O intensive jobs this will be an inefficient setup.  Also consider that if your jobs are so small that you have to run them in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input or output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).  &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| first of (20 TB ; 1 million files)&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
With the exception of HPSS, the BGQ GPFS file system is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor are the other SciNet file systems mounted on the BGQ.  &lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh), to use a non-default encryption. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back. &lt;br /&gt;
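&lt;br /&gt;
For example, to copy an entire results directory (the directory name and destination path below are just placeholders), you can add the recursive flag:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -r -c arcfour myresults/ bgqdev.scinet.utoronto.ca:/scratch/G/GROUP/USER/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;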
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgq.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information on the home and scratch file systems in a number of ways: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?] [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
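For example, to list the usage of all members of your group, including how it has changed recently (assuming the options shown above):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;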
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://support.scinet.utoronto.ca/wiki/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform &lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://openfoam.org/ OpenFOAM]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[https://docs.scinet.utoronto.ca/index.php/OpenFOAM_on_BGQ A detailed explanation of OpenFOAM usage on BG/Q cluster]&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the mmap python API, you must use it in PRIVATE mode, as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, fourth node board, first node, and third core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=440</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=440"/>
		<updated>2018-05-25T18:17:27Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* OpenFOAM on BGQ */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4 rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], currently at the 120th place (Nov 2015).&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy efficient 3rd generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6GHz PowerPC based CPU (PowerPC A2) with 16GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus highspeed interconnect. Each rack has 16 I/O nodes that run a full Redhat Linux OS that manages the compute nodes and mounts the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks), totalling 65,536 cores and 64TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions.  As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''' which one can login to from the regular '''login.scinet.utoronto.ca''' login nodes or directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[Ssh keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines - including most of the compilers - you will have to use the `modules' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI versions, libraries etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of the library files and include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against the library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[module-basename]_INC}&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L${SCINET_[module-basename]_LIB}&amp;lt;/tt&amp;gt;, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.  &lt;br /&gt;
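&lt;br /&gt;
As a minimal sketch (assuming, hypothetically, that the hdf5 module's short name is HDF5; check the actual variable names with &amp;lt;tt&amp;gt;env | grep SCINET&amp;lt;/tt&amp;gt; after loading the module), these variables could be used on a compile line or in a Makefile as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load hdf5/189-v18-mpich2-xlc&lt;br /&gt;
# include and library search paths come from the module's environment variables&lt;br /&gt;
# (exact SCINET_* variable names are assumed here)&lt;br /&gt;
CFLAGS=&amp;quot;-I${SCINET_HDF5_INC}&amp;quot;&lt;br /&gt;
LDFLAGS=&amp;quot;-L${SCINET_HDF5_LIB} -lhdf5&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;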
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. The compilers by default produce&lt;br /&gt;
static binaries; however, with BGQ it is now possible to use dynamic libraries as well.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers respectively.  &lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
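&lt;br /&gt;
For example, a minimal sketch of compiling a C and a Fortran MPI code with these flags (the source and executable names are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
# mycode.c / mycode.f90 are placeholder source files&lt;br /&gt;
mpixlc   -O3 -qarch=qp -qtune=qp mycode.c   -o mycode.exe&lt;br /&gt;
mpixlf90 -O3 -qarch=qp -qtune=qp mycode.f90 -o mycode_f.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;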
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also bgq native development nodes named '''bgqdev-ion[01-24]''', which one can log in to directly, i.e. via ssh, from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes, with the exception that they run a full RedHat Linux and have an infiniband interface providing direct network access.    Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, so cross compilation is not required, which can make building some software easier.    &lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes, but they cannot be run locally, as the mpich2 library is set up for the BGQ network and will thus fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from that of the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, and these are further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size, to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block, but this results in shared resources (network and I/O); such jobs are referred to as sub-block jobs and are described in more detail below.  &lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which, for those familiar with MPI, is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, way is a job script submitted and run through loadleveler. Inside the job script, this block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), then from within that job script you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure mpi jobs, it is advisable always to give the number of ranks per node, because the default value of 1 may leave 15 cores on the node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64. &lt;br /&gt;
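&lt;br /&gt;
For example (a sketch using the same placeholder executable), to use all 64 hardware threads of each node on the same 64-node block (see also the section on setting ranks-per-node below):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 4096 --ranks-per-node=64 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;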
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is from 1-7 which can be helpful in debugging an application.&lt;br /&gt;
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank only has 256MB of memory)&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP mode (or MPI/pthreads). Using fewer ranks-per-node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of mpi processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of mpi processes by two.&lt;br /&gt;
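&lt;br /&gt;
For example, a sketch of a hybrid run on a bg_size=64 block using 16 ranks per node and 4 OpenMP threads per rank, so that 16 x 4 = 64 hardware threads per node are kept busy (the executable and its flags are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# np = 16 ranks-per-node x 64 nodes = 1024&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=4 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;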
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64 node block is reserved for development and interactive testing for 16 hours, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs in that time interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30 minute maximum wall_clock_limit. The purpose of this reservation is to ensure short testing jobs are run quickly without being held up by longer production type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives, for example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few blue gene specific commands.  The command &amp;quot;bg_size&amp;quot; is in number of nodes, not cores, so a bg_size=64 would be 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024 and 2048.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
&lt;br /&gt;
ranks-per-node &amp;amp;le; np&lt;br /&gt;
&lt;br /&gt;
(ranks-per-node * OMP_NUM_THREADS ) &amp;amp;le; 64 &lt;br /&gt;
&lt;br /&gt;
np : number of MPI processes&lt;br /&gt;
&lt;br /&gt;
ranks-per-node : number of MPI processes per node = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS : number of OpenMP thread per MPI process (for hybrid codes) = 1 , 2 , 4 , 8 , 16 , 32 , 64&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps (Job dependency) ===&lt;br /&gt;
LoadLeveler has many advanced features to control job submission and execution. One of these features is called steps. It allows a series of jobs, each called a step, to be submitted with a single script, with dependencies defined between them, so that each step waits for the previous one to finish before the next one starts. The following example uses the same LoadLeveler script as shown previously, but the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each job is finished before starting the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step1&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step2&lt;br /&gt;
# @ dependency         = step1 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name          = step3&lt;br /&gt;
# @ dependency         = step2 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
to cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to look at details of the bluegene resources use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.  &lt;br /&gt;
Users normally only have access to the BGQ through loadleveler, which is appropriate for batch jobs; &lt;br /&gt;
however, an interactive session is typically beneficial when debugging and developing.   For that purpose, a &lt;br /&gt;
script has been written that allows a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, sets all the correct environment variables, and then launches a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30-minute session on 64 nodes and, when run on &lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended as it automatically attaches to all the processes of a job (instead of attaching a gdb tool to each process by hand, as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt; and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
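&lt;br /&gt;
As a rough sketch (the source and executable names below are purely illustrative, and an MPI C code is assumed), a DDT debugging session could be started as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load mpich2 ddt/4.1&lt;br /&gt;
$ mpixlc -g -O0 -o mycode mycode.c    # compile with debugging symbols&lt;br /&gt;
$ ddt &amp;amp;                               # then follow the GUI to set up and launch the run&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;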
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.   Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter setting sets the number of mpi processes to run.  Most configure scripts expect only one mpi process, thus, &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
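&lt;br /&gt;
As a minimal sketch (the library name and source directory are hypothetical), a configure run inside a debugjob session might then look like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
$ cd $SCRATCH/mylib-1.0                # hypothetical source directory&lt;br /&gt;
$ ./configure CC=mpicc FC=mpif90       # configure test programs now run on the BGQ automatically&lt;br /&gt;
$ make&lt;br /&gt;
$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;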
&lt;br /&gt;
&lt;br /&gt;
A debugjob session started with the &amp;lt;tt&amp;gt;-i&amp;lt;/tt&amp;gt; flag runs in implicit mode, in which running an executable implicitly calls runjob with 1 MPI task:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
BGQ allows multiple applications to share the same block; this is referred to as sub-block jobs. However, this needs to be done from within the same loadleveler submission script, using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which to start each job and a 5D Torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the job being run.  &lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16 or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a size of 4; it sets the appropriate $SHAPE variable and an array of 16 starting corners ${CORNER[n]}. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Using the subblocks script to set $SHAPE and an array ${CORNER[n]},&lt;br /&gt;
# given the size of the sub-blocks in nodes (i.e. similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case 16 sub-blocks of 4 cnodes each (64 total ie bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. One needs to consider that these sub-blocks all have to share the same I/O nodes, so for I/O-intensive jobs this will be an inefficient setup.  Also consider that if your jobs are so small that you have to run them in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input and output data for jobs; data on /scratch is not backed up. The path to your home directory is stored in the environment variable $HOME and will look like /home/G/GROUP/USER.  The path to your scratch directory is stored in the environment variable $SCRATCH and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).  &lt;br /&gt;
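&lt;br /&gt;
For example, you can check where these directories are for your account as shown below; the group and user names in the output are placeholders only:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ echo $HOME&lt;br /&gt;
/home/g/groupname/username&lt;br /&gt;
$ echo $SCRATCH&lt;br /&gt;
/scratch/g/groupname/username&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;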
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB or 1 million files, whichever is reached first&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
The BGQ GPFS file system, except for HPSS, is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor are the file systems of those systems mounted on the BGQ.  &lt;br /&gt;
Use scp to copy files from one file system to the other; e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh), to use a non-default encryption. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back. &lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgqdev.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information in a number of ways about the home and scratch file systems: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?] [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
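&lt;br /&gt;
For example, to list the usage of everyone in your group together with recent changes, one might run the following (output omitted here):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;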
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://support.scinet.utoronto.ca/wiki/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast fourier transform &lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[BGQ_OpenFOAM | OpenFOAM]]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
&lt;br /&gt;
[https://docs.scinet.utoronto.ca/index.php/OpenFOAM_on_BGQ A detailed explanation of OpenFOAM usage on BG/Q cluster]&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the mmap Python API, you must use it in PRIVATE mode, as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, fourth node board (block), first node, and third core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=OpenFOAM_on_BGQ&amp;diff=439</id>
		<title>OpenFOAM on BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=OpenFOAM_on_BGQ&amp;diff=439"/>
		<updated>2018-05-25T18:13:31Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Created page with &amp;quot;== Using OpenFOAM on BG/Q == There are various OpenFOAM versions installed on BGQ. You can see the list by typing &amp;quot;module avail&amp;quot; on the terminal: * OpenFOAM/2.3.1(default) * O...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Using OpenFOAM on BG/Q ==&lt;br /&gt;
There are various OpenFOAM versions installed on BGQ. You can see the list by typing &amp;quot;module avail&amp;quot; on the terminal:&lt;br /&gt;
* OpenFOAM/2.3.1(default)&lt;br /&gt;
* OpenFOAM/2.4.0&lt;br /&gt;
* OpenFOAM/3.0.1&lt;br /&gt;
* OpenFOAM/5.0&lt;br /&gt;
and&lt;br /&gt;
* FEN/OpenFOAM/2.2.0&lt;br /&gt;
* FEN/OpenFOAM/2.3.0&lt;br /&gt;
* FEN/OpenFOAM/2.4.0&lt;br /&gt;
* FEN/OpenFOAM/3.0.1&lt;br /&gt;
* FEN/OpenFOAM/5.0 &lt;br /&gt;
&lt;br /&gt;
The modules starting with FEN refer to installations that can be used on the Front-End Nodes. Therefore, if you want to run serial tasks such as blockMesh, decomposePar or reconstructParMesh, please use the FEN/OpenFOAM/* modules. Do not forget that the FEN is not a dedicated resource: each Front-End Node is shared among the connected users and only has 32GB of memory. So if you try to decompose a case with 100 million cells, you will occupy the whole FEN machine, run out of memory, and make it unavailable for everyone.&lt;br /&gt;
&lt;br /&gt;
When you want to submit a job, you should do that on the FEN using a batch script, loading the modules you need inside the batch script. This is the only way of using the compute nodes on BGQ. There is a sample batch script below. You can use it as a template and modify it according to your needs.&lt;br /&gt;
&lt;br /&gt;
== Running Serial OpenFOAM Tasks ==&lt;br /&gt;
&lt;br /&gt;
As noted in the previous section, if you want to run serial tasks you need to use one of the FEN-based modules. The most common serial tasks are:&lt;br /&gt;
* blockMesh: Creates the block-structured computational volume consisting of hex elements.&lt;br /&gt;
* decomposePar: Parallelises a serial case. Grid partitioning.&lt;br /&gt;
* reconstructPar: Reconstructs a parallel case (results). &lt;br /&gt;
* reconstructParMesh: Reconstructs a parallel case (mesh). &lt;br /&gt;
&lt;br /&gt;
These binaries are not available on the compute nodes, so these tools can only be used on the FEN in any case.&lt;br /&gt;
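&lt;br /&gt;
As a minimal sketch (assuming the OpenFOAM 5.0 FEN module and a case directory that already contains the required dictionaries; the directory name is hypothetical), a typical serial pre-processing session on the FEN could look like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load FEN/OpenFOAM/5.0&lt;br /&gt;
$ source $FOAM_DOT_FILE               # set up the OpenFOAM environment&lt;br /&gt;
$ cd $SCRATCH/mycase                  # hypothetical case directory&lt;br /&gt;
$ blockMesh                           # create the background hex mesh&lt;br /&gt;
$ decomposePar -force                 # partition the case for the parallel run&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;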
&lt;br /&gt;
== Parallelizing OpenFOAM Cases ==&lt;br /&gt;
&lt;br /&gt;
In order to run OpenFOAM in parallel, the problem needs to be decomposed into a number of subdomains that matches the number of processors that will be used. OpenFOAM has a  '''[http://www.openfoam.org/docs/user/running-applications-parallel.php decomposePar]''' utility that performs this operation. This is controlled by creating an OpenFOAM dictionary called decomposeParDict in the system directory of your case folder. decomposeParDict is the input file for the command &amp;quot;decomposePar -force&amp;quot;. Below is an example file for decomposing an OpenFOAM case to run on 4 cores.&lt;br /&gt;
&lt;br /&gt;
'''system/decomposeParDict'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/*--------------------------------*- C++ -*----------------------------------*\&lt;br /&gt;
| =========                 |                                                 |&lt;br /&gt;
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |&lt;br /&gt;
|  \\    /   O peration     | Version:  2.4.0                                 |&lt;br /&gt;
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |&lt;br /&gt;
|    \\/     M anipulation  |                                                 |&lt;br /&gt;
\*---------------------------------------------------------------------------*/&lt;br /&gt;
FoamFile&lt;br /&gt;
{&lt;br /&gt;
    version     2.0;&lt;br /&gt;
    format      ascii;&lt;br /&gt;
    class       dictionary;&lt;br /&gt;
    location    &amp;quot;system&amp;quot;;&lt;br /&gt;
    object      decomposeParDict;&lt;br /&gt;
}&lt;br /&gt;
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //&lt;br /&gt;
&lt;br /&gt;
numberOfSubdomains 4;&lt;br /&gt;
&lt;br /&gt;
method          simple;&lt;br /&gt;
&lt;br /&gt;
simpleCoeffs&lt;br /&gt;
{&lt;br /&gt;
    n               ( 2 2 1 );&lt;br /&gt;
    delta           0.001;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// ************************************************************************* //&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another option for decomposition is hierarchical. If you use this method then, similar to simple, you have to define hierarchicalCoeffs. The only difference between simple and hierarchical is that with the hierarchical method you can define the order of the decomposition operation (e.g. xyz or zyx). There are more complicated decomposition methods supported by OpenFOAM, but since this is a serial task that needs to be performed on the FEN, these two methods are suggested.&lt;br /&gt;
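&lt;br /&gt;
For illustration, a hierarchical decomposition for 4 cores (keeping the same 2 2 1 split as in the example above; the order keyword is what distinguishes it from simple) might be specified as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
numberOfSubdomains 4;&lt;br /&gt;
&lt;br /&gt;
method          hierarchical;&lt;br /&gt;
&lt;br /&gt;
hierarchicalCoeffs&lt;br /&gt;
{&lt;br /&gt;
    n               ( 2 2 1 );&lt;br /&gt;
    delta           0.001;&lt;br /&gt;
    order           xyz;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;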
&lt;br /&gt;
The crucial part of the decomposeParDict is the numberOfSubdomains defined in the file. The intended number of cores should match this value: if one wants to run a case on 64 nodes using all cores, then numberOfSubdomains should be 1024. Also, the product of the n values should be equal to this number for consistency; otherwise OpenFOAM will complain about the mismatch.&lt;br /&gt;
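&lt;br /&gt;
For example, for a 64-node job using all 16 cores per node (1024 subdomains), the relevant entries could look like the sketch below; the particular 8x8x16 split is just one consistent choice:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
numberOfSubdomains 1024;&lt;br /&gt;
&lt;br /&gt;
method          simple;&lt;br /&gt;
&lt;br /&gt;
simpleCoeffs&lt;br /&gt;
{&lt;br /&gt;
    n               ( 8 8 16 );   // 8 x 8 x 16 = 1024&lt;br /&gt;
    delta           0.001;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;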
&lt;br /&gt;
== Running Parallel Meshing ==&lt;br /&gt;
The built-in meshing tool that comes with the OpenFOAM package is called snappyHexMesh. This tool reads its inputs from the &amp;quot;system/snappyHexMeshDict&amp;quot; file and writes its outputs to the &amp;quot;constant/polyMesh&amp;quot; folder (if used with the -overwrite flag; otherwise it writes to separate time folders 1/, 2/, ...). snappyHexMesh operates on the output of blockMesh: it refines specified regions, snaps out solid areas from the volume and adds boundary layers if enabled. &lt;br /&gt;
&lt;br /&gt;
Before running mesh generation one needs to run &amp;quot;decomposePar -force&amp;quot;, so that the case is decomposed and parallel executions can be run on it. One can submit the script below to run parallel mesh generation on BG/Q:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = motorBike_mesh&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(jobid).err&lt;br /&gt;
# @ output             = $(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 06:00:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module purge&lt;br /&gt;
module load binutils/2.23 bgqgcc/4.8.1 mpich2/gcc-4.8.1 OpenFOAM/5.0&lt;br /&gt;
source $FOAM_DOT_FILE&lt;br /&gt;
&lt;br /&gt;
# NOTE: when using --env-all there is a limit of 8192 characters that can be passed to runjob&lt;br /&gt;
# so removing LS_COLORS should free up enough space&lt;br /&gt;
export -n LS_COLORS&lt;br /&gt;
&lt;br /&gt;
# Disabling the pt2pt small message optimizations - Solves hanging problems&lt;br /&gt;
export PAMID_SHORT=0&lt;br /&gt;
&lt;br /&gt;
# Sets the cutoff point for switching from eager to rendezvous protocol at 50MB&lt;br /&gt;
export PAMID_EAGER=50M&lt;br /&gt;
&lt;br /&gt;
# Do not optimise collective comm. - Solves termination with signal 36 issue&lt;br /&gt;
export PAMID_COLLECTIVES=0&lt;br /&gt;
&lt;br /&gt;
# Do not generate core dump files&lt;br /&gt;
export BG_COREDUMPDISABLED=1&lt;br /&gt;
&lt;br /&gt;
# Run mesh generation&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --env-all : $FOAM_APPBIN/snappyHexMesh -overwrite -parallel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Loadleveler Submission Script for Solvers ==&lt;br /&gt;
&lt;br /&gt;
The following is a sample script for running the OpenFOAM tutorial case on BG/Q:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgqopenfoam&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 06:00:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
#------------------ Solver on BGQ --------------------&lt;br /&gt;
# Load BGQ OpenFOAM modules&lt;br /&gt;
module purge&lt;br /&gt;
module load binutils/2.23 bgqgcc/4.8.1 mpich2/gcc-4.8.1 OpenFOAM/5.0&lt;br /&gt;
source $FOAM_DOT_FILE&lt;br /&gt;
&lt;br /&gt;
# NOTE: when using --env-all there is a limit of 8192 characters that can be passed to runjob&lt;br /&gt;
# so removing LS_COLORS should free up enough space&lt;br /&gt;
export -n LS_COLORS&lt;br /&gt;
&lt;br /&gt;
# Some solvers, simpleFOAM particularly, will hang on startup when using the default&lt;br /&gt;
# network parameters.  Disabling the pt2pt small message optimizations seems to allow it to run.&lt;br /&gt;
export PAMID_SHORT=0&lt;br /&gt;
export PAMID_EAGER=50M&lt;br /&gt;
&lt;br /&gt;
# Do not optimise collective comm.&lt;br /&gt;
export PAMID_COLLECTIVES=0&lt;br /&gt;
&lt;br /&gt;
# Do not generate core dump files&lt;br /&gt;
export BG_COREDUMPDISABLED=1&lt;br /&gt;
&lt;br /&gt;
# Run solver&lt;br /&gt;
runjob --np 1024 --env-all  : $FOAM_APPBIN/icoFoam -parallel&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Typical OpenFOAM Applications on BG/Q ==&lt;br /&gt;
A list of examples will be shared here. These sample cases are derived from applications that are run on BG/Q, but they have been changed for confidentiality reasons. They can guide new users in their specific use cases. Most of the information here is OpenFOAM-specific, not BG/Q-specific.&lt;br /&gt;
&lt;br /&gt;
=== Wind Flow Around Buildings ===&lt;br /&gt;
This is a tutorial case that can be found in $FOAM_TUTORIALS/incompressible/simpleFoam/windAroundBuildings&lt;br /&gt;
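&lt;br /&gt;
As a sketch (the module version is chosen for illustration), one way to start from this tutorial is to copy it to your scratch space on the FEN:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load FEN/OpenFOAM/5.0&lt;br /&gt;
$ source $FOAM_DOT_FILE&lt;br /&gt;
$ cp -r $FOAM_TUTORIALS/incompressible/simpleFoam/windAroundBuildings $SCRATCH/&lt;br /&gt;
$ cd $SCRATCH/windAroundBuildings&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;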
&lt;br /&gt;
=== Rotational Flows in OpenFOAM ===&lt;br /&gt;
Information will be added soon!&lt;br /&gt;
&lt;br /&gt;
=== LES Models in OpenFOAM ===&lt;br /&gt;
Information will be added soon!&lt;br /&gt;
&lt;br /&gt;
=== Multiphase Flows in OpenFOAM ===&lt;br /&gt;
Information will be added soon!&lt;br /&gt;
&lt;br /&gt;
== Post-Processing == &lt;br /&gt;
&lt;br /&gt;
Visualisations can be done on the Niagara Cluster!&lt;br /&gt;
&lt;br /&gt;
https://docs.scinet.utoronto.ca/index.php/Visualization&lt;br /&gt;
&lt;br /&gt;
== General Tips and Tricks ==&lt;br /&gt;
&lt;br /&gt;
* Run serial tasks on FEN using FEN/OpenFOAM/* modules&lt;br /&gt;
* Make a quality check of your mesh using the checkMesh tool. Be aware that if you run a serial checkMesh on a parallel (decomposed) case, it will only return results from &amp;quot;case/constant/polyMesh&amp;quot;, not from &amp;quot;case/processor*/constant/polyMesh&amp;quot;&lt;br /&gt;
* Perform test runs using the debug nodes before you submit large jobs. Request a debug session with &amp;quot;debugjob -i&amp;quot; and use runjob.&lt;br /&gt;
* Always work with binary files. This can be set in the &amp;quot;case/system/controlDict&amp;quot;.&lt;br /&gt;
* You can convert cases from ASCII to binary using the foamFormatConvert command.&lt;br /&gt;
* Keep your simulations under $SCRATCH.&lt;br /&gt;
* If you write your own code, keep it under $HOME. Preferably create a directory &amp;quot;$HOME/OpenFOAM/username-X.Y/src&amp;quot; and work there.&lt;br /&gt;
* If you write your own code, do not forget to compile it into $FOAM_USER_APPBIN or $FOAM_USER_LIBBIN. You might need to compile shared objects on the debug nodes as well.&lt;br /&gt;
* OpenFOAM is a pure MPI code; there is no multithreading in OpenFOAM.&lt;br /&gt;
* Each node on BG/Q has 16 GB of memory and 16 compute cores. Some OpenFOAM functions, especially snappyHexMesh, are very memory-hungry, requiring up to 4GB of memory per 1M cells. Use 8 ranks per node if you run out of memory, but be careful not to waste resources unnecessarily. Solvers usually require about 1GB of memory per 1M cells, which allows users to fully utilize all 16 compute cores on a node.&lt;br /&gt;
* Try the collated file-handling option with version 5.0. It significantly reduces the number of files; however, the master processor gets more heavily loaded.&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=438</id>
		<title>BGQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=BGQ&amp;diff=438"/>
		<updated>2018-05-25T18:10:25Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: /* Documentation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Blue_Gene_Cabinet.jpeg|center|300px|thumb]]&lt;br /&gt;
|name=Blue Gene/Q (BGQ)&lt;br /&gt;
|installed=Aug 2012, Nov 2014&lt;br /&gt;
|operatingsystem= RH6.3, CNK (Linux) &lt;br /&gt;
|loginnode= bgqdev-fen1&lt;br /&gt;
|nnodes=  4096 nodes (65,536 cores)&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=16 (64 threads)&lt;br /&gt;
|interconnect=5D Torus (jobs), QDR Infiniband (I/O) &lt;br /&gt;
|vendorcompilers= bgxlc, bgxlf&lt;br /&gt;
|queuetype=Loadleveler&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
The current BGQ system status can be found on the wiki's [[Main Page]].&lt;br /&gt;
&lt;br /&gt;
==SOSCIP &amp;amp; LKSAVI==&lt;br /&gt;
&lt;br /&gt;
The BGQ is a Southern Ontario Smart Computing&lt;br /&gt;
Innovation Platform ([http://soscip.org/ SOSCIP]) BlueGene/Q supercomputer located at the&lt;br /&gt;
University of Toronto's SciNet HPC facility. The SOSCIP &lt;br /&gt;
multi-university/industry consortium is funded by the Ontario Government &lt;br /&gt;
and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
A half-rack of BlueGene/Q (8,192 cores) was purchased by the [http://likashingvirology.med.ualberta.ca/ Li Ka Shing Institute of Virology] at the University of Alberta in late fall 2014 and integrated into the existing BGQ system.&lt;br /&gt;
&lt;br /&gt;
The combined 4 rack system is the fastest Canadian supercomputer on the [http://top500.org/ top 500], currently at the 120th place (Nov 2015).&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:bgq-support@scinet.utoronto.ca &amp;lt;bgq-support@scinet.utoronto.ca&amp;gt;] for BGQ-specific inquiries.&lt;br /&gt;
&lt;br /&gt;
==Specifications==&lt;br /&gt;
&lt;br /&gt;
BGQ is an extremely dense and energy-efficient 3rd-generation Blue Gene IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6GHz PowerPC-based CPU (PowerPC A2) with 16GB of RAM.  The nodes are bundled in groups of 32 into a node board (512 cores), and 16 boards make up a midplane (8192 cores), with 2 midplanes per rack, or 16,384 cores and 16 TB of RAM per rack. The compute nodes run a very lightweight Linux-based operating system called CNK ('''C'''ompute '''N'''ode '''K'''ernel).  The compute nodes are all connected together using a custom 5D torus high-speed interconnect. Each rack has 16 I/O nodes that run a full RedHat Linux OS and that manage the compute nodes and mount the filesystem.  SciNet's BGQ consists of 8 midplanes (four racks) totalling 65,536 cores and 64TB of RAM.&lt;br /&gt;
&lt;br /&gt;
[[Image:BlueGeneQHardware2.png‎ |center]]&lt;br /&gt;
&lt;br /&gt;
=== 5D Torus Network ===&lt;br /&gt;
&lt;br /&gt;
The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions.  As such there are only a few optimum block sizes that will use the network efficiently.&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; cellpadding=&amp;quot;2&amp;quot;&lt;br /&gt;
| '''Node Boards '''&lt;br /&gt;
| '''Compute Nodes'''&lt;br /&gt;
| '''Cores'''&lt;br /&gt;
| '''Torus Dimensions'''&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 32&lt;br /&gt;
| 512&lt;br /&gt;
| 2x2x2x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 2 (adjacent pairs)&lt;br /&gt;
| 64&lt;br /&gt;
| 1024&lt;br /&gt;
| 2x2x4x2x2&lt;br /&gt;
|-&lt;br /&gt;
| 4 (quadrants)&lt;br /&gt;
| 128&lt;br /&gt;
| 2048&lt;br /&gt;
| 2x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 8 (halves)&lt;br /&gt;
| 256&lt;br /&gt;
| 4096&lt;br /&gt;
| 4x2x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 16 (midplane)&lt;br /&gt;
| 512&lt;br /&gt;
| 8192&lt;br /&gt;
| 4x4x4x4x2&lt;br /&gt;
|-&lt;br /&gt;
| 32 (1 rack)&lt;br /&gt;
| 1024&lt;br /&gt;
| 16384&lt;br /&gt;
| 4x4x4x8x2 &lt;br /&gt;
|-&lt;br /&gt;
| 64 (2 racks)&lt;br /&gt;
| 2048&lt;br /&gt;
| 32768&lt;br /&gt;
| 4x4x8x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 96 (3 racks)&lt;br /&gt;
| 3072&lt;br /&gt;
| 49152&lt;br /&gt;
| 4x4x12x8x2&lt;br /&gt;
|-&lt;br /&gt;
| 128 (4 racks)&lt;br /&gt;
| 4096&lt;br /&gt;
| 65536&lt;br /&gt;
| 8x4x8x8x2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
The development node is '''bgqdev-fen1''', which one can log in to from the regular '''login.scinet.utoronto.ca''' login nodes or directly from outside using '''bgqdev.scinet.utoronto.ca''', e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l USERNAME bgqdev.scinet.utoronto.ca -X&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where USERNAME is your username on the BGQ and the &amp;lt;tt&amp;gt;-X&amp;lt;/tt&amp;gt; flag is optional, needed only if you will use X graphics.&amp;lt;br/&amp;gt;&lt;br /&gt;
Note: To learn how to setup ssh keys for logging in please see [[Ssh keys]].&lt;br /&gt;
&lt;br /&gt;
This development node is a Power7 machine running Linux which serves as the compilation and submission host for the BGQ.  Programs are cross-compiled for the BGQ on this node and then submitted to the queue using loadleveler.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines - including most of the compilers - you will have to use the &amp;quot;module&amp;quot; command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI versions, libraries etc.&lt;br /&gt;
&lt;br /&gt;
A list of the installed software can be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the IBM C/C++ compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload vacpp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands can go in your .bashrc files to make sure you are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of the library files and include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 $SCINET_[short-module-name]_BASE&lt;br /&gt;
 $SCINET_[short-module-name]_LIB&lt;br /&gt;
 $SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries, and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So to compile and link against the library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[module-basename]_INC}&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L${SCINET_[module-basename]_LIB}&amp;lt;/tt&amp;gt;, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.  &lt;br /&gt;
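&lt;br /&gt;
As a hypothetical example (the short module name HDF5 in the variable names is an illustration only, and the MPI compiler wrapper and recommended flags are described in the Compilers section below), compiling and linking a code against the mpich2 HDF5 library might look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load mpich2 hdf5/189-v18-mpich2-xlc&lt;br /&gt;
$ mpixlc -O3 -qarch=qp -qtune=qp mycode.c -o mycode.exe -I${SCINET_HDF5_INC} -L${SCINET_HDF5_LIB} -lhdf5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;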
&lt;br /&gt;
Note that a &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; command ''only'' sets the environment variables in your current shell (and any subprocesses that the shell launches).   It does ''not'' affect other shell environments.&lt;br /&gt;
&lt;br /&gt;
If you always require the same modules, it is easiest to load those modules in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; and simply load them as you need them (and have the required &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; commands in your job submission scripts).&lt;br /&gt;
&lt;br /&gt;
=== Compilers ===&lt;br /&gt;
&lt;br /&gt;
The BGQ uses IBM XL compilers to cross-compile code for the BGQ.  Compilers are available for FORTRAN, C, and C++.  They are accessible by default, or by loading the '''xlf''' and '''vacpp''' modules. By default the compilers produce&lt;br /&gt;
static binaries; however, on the BGQ it is now possible to use dynamic libraries as well.  The compilers follow the XL conventions with the prefix '''bg''',&lt;br /&gt;
so '''bgxlc''' and '''bgxlf90''' are the C and FORTRAN compilers, respectively.  &lt;br /&gt;
&lt;br /&gt;
Most users, however, will use the MPI variants, i.e. '''mpixlf90''' and '''mpixlc''', which are available by loading&lt;br /&gt;
the '''mpich2''' module. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load mpich2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is recommended to use at least the following flags when compiling and linking&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-O3 -qarch=qp -qtune=qp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
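&lt;br /&gt;
Putting this together, a minimal compilation of an MPI Fortran code could look like the following sketch (the file names are hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load mpich2&lt;br /&gt;
$ mpixlf90 -O3 -qarch=qp -qtune=qp -o mycode.exe mycode.f90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;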
&lt;br /&gt;
If you want to build a package for which the configure script tries to run small test jobs, the cross-compiling nature of the bgq can get in the way.  In that case, you should use the interactive [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] environment as described below.&lt;br /&gt;
&lt;br /&gt;
== ION/Devel Nodes ==&lt;br /&gt;
&lt;br /&gt;
There are also bgq-native development nodes named '''bgqdev-ion[01-24]''' which one can log in to directly (i.e. via ssh) from '''bgqdev-fen1'''.  These nodes are extra I/O nodes that are essentially the same as the BGQ compute nodes, with the exception that they run a full RedHat Linux and have an infiniband interface providing direct network access.    Unlike the regular development node, '''bgqdev-fen1''', which is Power7, these nodes have the same BGQ A2 processor, and thus cross-compilation is not required, which can make building some software easier.    &lt;br /&gt;
&lt;br /&gt;
'''NOTE''': BGQ MPI jobs can be compiled on these nodes; however, they cannot be run locally, as mpich2 is set up for the BGQ network and will thus fail on these nodes.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
As the BlueGene/Q architecture is different from that of the development nodes, you cannot run applications intended/compiled for the BGQ on the devel nodes. The only way to run (or even test) your program is to submit a job to the BGQ.  Jobs are submitted as scripts through loadleveler. That script must then use '''runjob''' to start the job, which is in many ways similar to mpirun or mpiexec.  As shown above in the network topology overview, there are only a few optimum job size configurations, which are further constrained by each block requiring a minimum of one I/O node.  In SciNet's configuration (with 8 I/O nodes per midplane) this makes 64 nodes (1024 cores) the smallest block size. Normally the block size matches the job size, to offer fully dedicated resources to the job.  Smaller jobs can be run within the same block; however, this results in shared resources (network and I/O). Such jobs are referred to as sub-block jobs and are described in more detail below.  &lt;br /&gt;
&lt;br /&gt;
=== runjob ===&lt;br /&gt;
&lt;br /&gt;
All BGQ runs are launched using '''runjob''', which, for those familiar with MPI, is analogous to mpirun/mpiexec.  Jobs run on a block, which is a predefined group of nodes that have already been configured and booted.  There are two ways to get a block. One way is to use a 30-minute 'debugjob' session (more about that below). The other, more common, case is a job script submitted and run using loadleveler. Inside the job script, this block is set for you, and you do not have to specify the block name.  For example, if your loadleveler job script requests 64 nodes, each with 16 cores (for a total of 1024 cores), then from within that job script you can run a job with 16 processes per node and 1024 total processes with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;tt&amp;gt;--np 1024&amp;lt;/tt&amp;gt; sets the total number of mpi tasks, while &amp;lt;tt&amp;gt;--ranks-per-node=16&amp;lt;/tt&amp;gt; specifies that 16 processes should run on each node.&lt;br /&gt;
For pure MPI jobs, it is advisable always to give the number of ranks per node, because the default value of 1 may leave 15 cores on each node idle. The argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- (Note: If this were not a loadleveler job, and the block ID was R00-M0-N03-64, the command would be &amp;quot;&amp;lt;tt&amp;gt;runjob --block R00-M0-N03-64 --np 1024 --ranks-per-node=16 --cwd=$PWD : $PWD/code -f file.in&amp;lt;/tt&amp;gt;&amp;quot;) --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
runjob flags are shown with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a particularly useful one is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--verbose #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where # is from 1-7 which can be helpful in debugging an application.&lt;br /&gt;
&lt;br /&gt;
=== How to set ranks-per-node ===&lt;br /&gt;
&lt;br /&gt;
There are 16 cores per node, but the argument to ranks-per-node may be 1, 2, 4, 8, 16, 32, or 64.  While it may seem natural to set ranks-per-node to 16, this is not generally recommended.  On the BGQ, one can efficiently run more than 1 process per core, because each core has four &amp;quot;hardware threads&amp;quot; (similar to HyperThreading on the GPC and Simultaneous Multi-Threading on the TCS and P7), which can keep the different parts of each core busy at the same time. One would therefore ideally use 64 ranks per node.  There are two main reasons why one might not set ranks-per-node to 64:&lt;br /&gt;
# The memory requirements do not allow 64 ranks (each rank then only has 256MB of memory).&lt;br /&gt;
# The application is more efficient in a hybrid MPI/OpenMP (or MPI/pthreads) mode. Using fewer ranks per node, the hardware threads are used as OpenMP threads within each process.&lt;br /&gt;
Because threads can share memory, the memory requirements of hybrid runs are typically smaller than those of pure MPI runs.&lt;br /&gt;
&lt;br /&gt;
Note that the total number of MPI processes in a runjob (i.e., the --np argument) should be the ranks-per-node times the number of nodes (set by bg_size in the loadleveler script). So, for the same number of nodes, if you change ranks-per-node by a factor of two, you should also multiply the total number of MPI processes by two.&lt;br /&gt;
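&lt;br /&gt;
For example, on a 64-node block (bg_size=64), a hybrid run using 4 MPI ranks per node with 16 OpenMP threads each (4 x 16 = 64 hardware threads per node, and 64 x 4 = 256 ranks in total) could be launched with a line like the following sketch; the executable name is hypothetical:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
runjob --np 256 --ranks-per-node=4 --envs OMP_NUM_THREADS=16 --cwd=$SCRATCH/ : $HOME/hybrid_code.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;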
&lt;br /&gt;
=== Queue Limits ===&lt;br /&gt;
&lt;br /&gt;
The maximum wall_clock_limit is 24 hours.  Official SOSCIP project jobs are prioritized over all other jobs using a fairshare algorithm with a 14 day rolling window.&lt;br /&gt;
&lt;br /&gt;
A 64 node block is reserved for development and interactive testing for 16 hours a day, from 8AM to midnight, every day including weekends. While you can still reserve an interactive block from midnight to 8AM, priority is given to batch jobs during that time interval in order to keep the machine usage as high as possible. This block is accessed by using the [[BGQ#Interactive_Use_.2F_Debugging | &amp;lt;tt&amp;gt;'''debugjob'''&amp;lt;/tt&amp;gt;]] command, which has a 30 minute maximum wall_clock_limit. The purpose of this reservation is to ensure that short testing jobs run quickly without being held up by longer production-type jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- We need to recover this functionality again. At the moment it doesn't work&lt;br /&gt;
=== BACKFILL scheduling ===&lt;br /&gt;
To optimize the cluster usage, we encourage users to submit jobs according to the available resources on BGQ. The command &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;llAvailableResources&amp;lt;/span&amp;gt; gives for example :&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
On the Devel system : only a debugjob can start immediately&lt;br /&gt;
&lt;br /&gt;
On the Prod. system : a job will start immediately if you use 512 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 256 nodes requesting a walltime T &amp;lt;= 21 hours and 11 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 128 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min &lt;br /&gt;
On the Prod. system : a job will start immediately if you use 64 nodes requesting a walltime T &amp;lt;= 24 hours and 0 min&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Batch Jobs ===&lt;br /&gt;
&lt;br /&gt;
Job submission is done through loadleveler with a few Blue Gene specific keywords.  The keyword &amp;quot;bg_size&amp;quot; is specified in number of nodes, not cores, so bg_size=64 corresponds to 64x16=1024 cores.&lt;br /&gt;
&lt;br /&gt;
The parameter &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;bg_size&amp;lt;/span&amp;gt; can only be equal to 64, 128, 256, 512, 1024, or 2048.&lt;br /&gt;
&lt;br /&gt;
The runjob arguments must satisfy the following constraints:&lt;br /&gt;
* &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;np&amp;lt;/span&amp;gt; &amp;amp;le; ranks-per-node * bg_size&lt;br /&gt;
* ranks-per-node &amp;amp;le; np&lt;br /&gt;
* (ranks-per-node * OMP_NUM_THREADS) &amp;amp;le; 64&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
* np : number of MPI processes&lt;br /&gt;
* ranks-per-node : number of MPI processes per node = 1, 2, 4, 8, 16, 32, or 64&lt;br /&gt;
* OMP_NUM_THREADS : number of OpenMP threads per MPI process (for hybrid codes) = 1, 2, 4, 8, 16, 32, or 64&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue &lt;br /&gt;
&lt;br /&gt;
# Launch all BGQ jobs using runjob&lt;br /&gt;
runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit to the queue use &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llsubmit myscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Steps ( Job dependency) ===&lt;br /&gt;
LoadLeveler has many advanced features to control job submission and execution. One of these features is called steps. It allows a series of jobs to be submitted using one script, with dependencies defined between the jobs, so that the jobs run sequentially: each job, called a step, waits for the previous step to finish before it starts. The following example uses the same LoadLeveler script as previously shown, but the #@ step_name and #@ dependency directives are used to rerun the same case three times in a row, waiting until each step is finished before starting the next.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step1&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the first step :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step1&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step2&lt;br /&gt;
# @ dependency = step1 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the second step if the first one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step2&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @ job_name           = bgsample&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job By Size&amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ step_name = step3&lt;br /&gt;
# @ dependency = step2 == 0&lt;br /&gt;
# @ queue&lt;br /&gt;
# Launch the third step if the second one has returned 0 (done successfully) :&lt;br /&gt;
if [ $LOADL_STEP_NAME = &amp;quot;step3&amp;quot; ]; then&lt;br /&gt;
    runjob --np 1024 --ranks-per-node=16 --envs OMP_NUM_THREADS=1 --cwd=$SCRATCH/ : $HOME/mycode.exe myflags&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Jobs ===&lt;br /&gt;
&lt;br /&gt;
To see running jobs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llq -b&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To cancel a job, use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llcancel JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To look at details of the Blue Gene resources, use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstatus -M all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note: the loadleveler script commands  are not run on a bgq compute node but on the front-end node. Only programs started with runjob run on the bgq compute nodes. You should therefore keep scripting in the submission script to a bare minimum.'''&lt;br /&gt;
&lt;br /&gt;
=== Monitoring Stats ===&lt;br /&gt;
&lt;br /&gt;
Use llbgstats to monitor your own stats and/or your group's stats. PIs can also print their (current) monthly report.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
llbgstats -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive Use / Debugging ===&lt;br /&gt;
&lt;br /&gt;
As BGQ codes are cross-compiled, they cannot be run directly on the front-end nodes.&lt;br /&gt;
Users, however, only have access to the BGQ through loadleveler, which is appropriate for batch jobs,&lt;br /&gt;
whereas an interactive session is typically beneficial when debugging and developing.  A script&lt;br /&gt;
has therefore been written to allow a session in which runjob can be run interactively.  The script&lt;br /&gt;
uses loadleveler to set up a block, set all the correct environment variables, and then launch a spawned shell on&lt;br /&gt;
the front-end node. The '''debugjob''' session currently allows a 30 minute session on 64 nodes and, when run on&lt;br /&gt;
'''&amp;lt;tt&amp;gt;bgqdev&amp;lt;/tt&amp;gt;''', runs in a dedicated reservation as described previously in the [[BGQ#Queue_Limits | queue limits]] section.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@bgqdev-fen1]$ debugjob&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ runjob --np 64 --ranks-per-node=16 --cwd=$PWD : $PWD/my_code -f myflags&lt;br /&gt;
&lt;br /&gt;
[user@bgqdev-fen1]$ exit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For debugging, gdb and Allinea DDT are available. The latter is recommended, as it automatically attaches to all the processes of a parallel run (instead of attaching a gdb tool by hand, as explained in the BGQ Application Development guide, link below). Simply compile with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;, load the &amp;lt;tt&amp;gt;ddt/4.1&amp;lt;/tt&amp;gt; module, type &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;, and follow the graphical user interface.  The DDT user guide can be found below.&lt;br /&gt;
&lt;br /&gt;
Note: when running a job under ddt, you'll need to add &amp;quot;&amp;lt;tt&amp;gt;--ranks-per-node=X&amp;lt;/tt&amp;gt;&amp;quot; to the &amp;quot;runjob arguments&amp;quot; field.&lt;br /&gt;
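&lt;br /&gt;
A minimal sketch of this workflow (compile with debugging symbols, load the module, start ddt) is shown below; the source and executable names are placeholders, and mpicc is one of the MPI wrappers listed in the software table further down:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load ddt/4.1&lt;br /&gt;
mpicc -g -O0 -o mycode mycode.c    # compile with debugging symbols&lt;br /&gt;
ddt                                # follow the GUI; put --ranks-per-node=X in the runjob arguments field&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;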
&lt;br /&gt;
Apart from debugging, this environment is also useful for building libraries and applications that need to run small tests as part of their 'configure' step.  Within the debugjob session, applications compiled with the bgxl compilers or the mpcc/mpCC/mpfort wrappers will automatically run on the BGQ, skipping the need for the runjob command, provided you set the following environment variables: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export BG_PGM_LAUNCHER=yes&lt;br /&gt;
$ export RUNJOB_NP=1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The latter sets the number of MPI processes to run.  Most configure scripts expect only one MPI process, so &amp;lt;tt&amp;gt;RUNJOB_NP=1&amp;lt;/tt&amp;gt; is appropriate.&lt;br /&gt;
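&lt;br /&gt;
For example, a configure step run inside a debugjob session might look like the following minimal sketch (the package directory and configure options are purely illustrative and will depend on the software being built):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Illustrative only: build a hypothetical package whose configure step runs small test executables&lt;br /&gt;
export BG_PGM_LAUNCHER=yes      # test executables are launched on the BGQ automatically&lt;br /&gt;
export RUNJOB_NP=1              # configure tests expect a single MPI process&lt;br /&gt;
cd $SCRATCH/mypackage           # hypothetical source directory&lt;br /&gt;
./configure CC=mpicc CXX=mpicxx FC=mpif90&lt;br /&gt;
make -j 4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;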
&lt;br /&gt;
&lt;br /&gt;
When started with the &amp;lt;tt&amp;gt;-i&amp;lt;/tt&amp;gt; flag, a debugjob session runs in implicit mode: running an executable implicitly calls runjob with 1 MPI task:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
debugjob -i&lt;br /&gt;
**********************************************************&lt;br /&gt;
 Interactive BGQ runjob shell using bgq-fen1-ib0.10295.0 and           &lt;br /&gt;
 LL14040718574824 for 30 minutes with 64 NODES (1024 cores). &lt;br /&gt;
 IMPLICIT MODE: running an executable implicitly calls runjob&lt;br /&gt;
                with 1 mpi task&lt;br /&gt;
 Exit shell when finished.                                &lt;br /&gt;
**********************************************************&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sub-block jobs ===&lt;br /&gt;
&lt;br /&gt;
BGQ allows multiple applications to share the same block, which is referred to as running sub-block jobs; however, this needs to be done from within the same loadleveler submission script using multiple calls to runjob.  To run a sub-block job, you need to specify a &amp;quot;--corner&amp;quot; within the block at which to start each job and a 5D torus AxBxCxDxE &amp;quot;--shape&amp;quot;.  The starting corner will depend on the specific block details provided by loadleveler and on the shape and size of the jobs being run.  &lt;br /&gt;
&lt;br /&gt;
Figuring out what the corners and shapes should be is very tricky (especially since it depends on the block you get allocated).  For that reason, we've created a script called &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; that determines the corners and shape of the sub-blocks.  It only handles the (presumably common) case in which you want to subdivide the block into n equally sized sub-blocks, where n may be 1, 2, 4, 8, 16, or 32.&lt;br /&gt;
&lt;br /&gt;
Here is an example script calling &amp;lt;tt&amp;gt;subblocks&amp;lt;/tt&amp;gt; with a sub-block size of 4 nodes, which sets the appropriate $SHAPE argument and an array of 16 starting corners in $CORNER. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# @ job_name           = bgsubblock&lt;br /&gt;
# @ job_type           = bluegene&lt;br /&gt;
# @ comment            = &amp;quot;BGQ Job SUBBLOCK &amp;quot;&lt;br /&gt;
# @ error              = $(job_name).$(Host).$(jobid).err&lt;br /&gt;
# @ output             = $(job_name).$(Host).$(jobid).out&lt;br /&gt;
# @ bg_size            = 64&lt;br /&gt;
# @ wall_clock_limit   = 30:00&lt;br /&gt;
# @ bg_connectivity    = Torus&lt;br /&gt;
# @ queue&lt;br /&gt;
&lt;br /&gt;
# Use the subblocks script to set $SHAPE and the array ${CORNER[n]},&lt;br /&gt;
# given the size of the sub-blocks in nodes (i.e., similar to bg_size)&lt;br /&gt;
&lt;br /&gt;
# In this case: 16 sub-blocks of 4 compute nodes each (64 nodes total, i.e., bg_size)&lt;br /&gt;
source subblocks 4&lt;br /&gt;
&lt;br /&gt;
# 16 jobs of 4 each&lt;br /&gt;
for (( i=0; i &amp;lt;  16 ; i++)); do&lt;br /&gt;
   runjob --corner ${CORNER[$i]} --shape $SHAPE --np 64 --ranks-per-node=16 :  your_code_here &amp;gt; $i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Remember that sub-block jobs are not the ideal way to run on the BlueGene/Q. These sub-blocks all have to share the same I/O nodes, so for I/O intensive jobs this will be an inefficient setup.  Also, if your jobs are so small that you have to run them in sub-blocks, it may be more efficient to use other clusters such as the GPC.&lt;br /&gt;
&lt;br /&gt;
If you run into any issues with this technique, please contact bgq-support for help.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The BGQ has its own dedicated 500TB file system based on GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for input or output data for jobs; data on /scratch is not backed up. The path to your home directory is in the environment variable $HOME, and will look like /home/G/GROUP/USER.  The path to your scratch directory is in the environment variable $SCRATCH, and will look like /scratch/G/GROUP/USER (following the conventions of the rest of the SciNet systems).  &lt;br /&gt;
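&lt;br /&gt;
For example, from a job script or a login session you can refer to these locations through the environment variables rather than the full paths (a trivial sketch; the actual paths printed depend on your group and user name):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd $SCRATCH                # change to your scratch directory&lt;br /&gt;
echo $HOME $SCRATCH        # print both paths&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;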
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! | file system &lt;br /&gt;
! | purpose &lt;br /&gt;
! | user quota &lt;br /&gt;
! | backed up&lt;br /&gt;
! | purged&lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
|-&lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB or 1 million files (whichever is reached first)&lt;br /&gt;
| no&lt;br /&gt;
| not currently&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Transferring files===&lt;br /&gt;
With the exception of HPSS, the BGQ GPFS file system is '''not''' shared with the other SciNet systems (gpc, tcs, p7, arc), nor are the file systems of those systems mounted on the BGQ.  &lt;br /&gt;
Use scp to copy files from one file system to the other, e.g., from bgqdev-fen1, you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour login.scinet.utoronto.ca:code.tgz .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or from a login node you could do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  $ scp -c arcfour code.tgz bgqdev.scinet.utoronto.ca:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The flag &amp;lt;tt&amp;gt;-c arcfour&amp;lt;/tt&amp;gt; is optional. It tells scp (or really, ssh) to use a non-default encryption method. The one chosen here, arcfour, has been found to speed up the transfer by a factor of two (you may expect around 85MB/s).  This encryption method is only recommended for copying from the BGQ file system to the regular SciNet GPFS file system or back. &lt;br /&gt;
 &lt;br /&gt;
Note that although these transfers are within the same data center, you have to use the full names of the systems, login.scinet.utoronto.ca and bgq.scinet.utoronto.ca, respectively, and that you will be asked for your password.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the bgqdev nodes, provides information in a number of ways on the home and scratch file systems: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;, with the -de option), or plots of your usage over time (with the -plot option). Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
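For example, to list the usage of all members of your group together with the delta information (using the options listed in the help above):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
diskUsage -a -de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;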
Note that the information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Bridge to HPSS===&lt;br /&gt;
&lt;br /&gt;
BGQ users may transfer material to/from HPSS via the GPC archive queue. On the HPSS gateway node (gpc-archive01), the BGQ GPFS file systems are mounted under a single mounting point /bgq (/bgq/scratch and /bgq/home). For detailed information on the use of HPSS [https://support.scinet.utoronto.ca/wiki/index.php/HPSS please read the HPSS wiki section.]&lt;br /&gt;
&lt;br /&gt;
== Software modules installed on the BGQ ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! |Software  &lt;br /&gt;
! | Version&lt;br /&gt;
! | Comments&lt;br /&gt;
! | Command/Library&lt;br /&gt;
! | Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers &amp;amp; Development Tools'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM fortran compiler&lt;br /&gt;
|14.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlf,bgxlf_r,bgxlf90,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|xlf&lt;br /&gt;
|-&lt;br /&gt;
|IBM c/c++ compilers&lt;br /&gt;
|12.1&lt;br /&gt;
|These are cross compilers&lt;br /&gt;
|&amp;lt;tt&amp;gt;bgxlc,bgxlC,bgxlc_r,bgxlC_r,...&amp;lt;/tt&amp;gt;&lt;br /&gt;
|vacpp&lt;br /&gt;
|-&lt;br /&gt;
|MPICH2 MPI library&lt;br /&gt;
|1.4.1&lt;br /&gt;
|There are 4 versions (see BGQ Applications Development document).&lt;br /&gt;
|&amp;lt;tt&amp;gt;mpicc,mpicxx,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
|mpich2&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.6, 4.8.1&lt;br /&gt;
| GNU Compiler Collection for BGQ&amp;lt;br&amp;gt;(4.8.1 requires binutils/2.23 to be loaded)&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-gcc, powerpc64-bgq-linux-g++, powerpc64-bgq-linux-gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgqgcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Clang Compiler&lt;br /&gt;
| r217688-20140912, r263698-20160317&lt;br /&gt;
| Clang cross-compilers for bgq&lt;br /&gt;
| &amp;lt;tt&amp;gt;powerpc64-bgq-linux-clang, powerpc64-bgq-linux-clang++&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;bgclang&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Binutils&lt;br /&gt;
| 2.21.1, 2.23&lt;br /&gt;
| Cross-compilation utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;addr2line, ar, ld, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;binutils&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| CMake	&lt;br /&gt;
| 2.8.8, 2.8.12.1&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Git&lt;br /&gt;
| 1.9.5&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git, gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/gdb/ gdb]&lt;br /&gt;
| 7.2&lt;br /&gt;
| GNU Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.gnu.org/software/ddd/ ddd]&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| GNU Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.allinea.com/products/ddt/ DDT]&lt;br /&gt;
| 4.1, 4.2, 5.0.1&lt;br /&gt;
| Allinea's Distributed Debugging Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddt&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[HPCTW]]&lt;br /&gt;
| 1.0&lt;br /&gt;
| BGQ MPI and Hardware Counters&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpihpm.a, libmpihpm_smp.a, libmpitrace.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hptibm&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[MemP]]&lt;br /&gt;
| 1.0.3&lt;br /&gt;
| BGQ Memory Stats&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmemP.a &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;memP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools/libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.9-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/189-v18-serial-xlc*&amp;lt;br/&amp;gt;hdf5/189-v18-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.12-v18&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5/1812-v18-serial-gcc&amp;lt;br/&amp;gt;hdf5/1812-v18-mpich2-gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.2.1.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf/4.2.1.1-serial-xlc*&amp;lt;br/&amp;gt;netcdf/4.2.1.1-mpich2-xlc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Parallel NetCDF&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Parallel scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| ESSL&lt;br /&gt;
| 5.1&lt;br /&gt;
| IBM Engineering and Scientific Subroutine Library (manual below)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libesslbg,libesslsmpbg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;essl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| WSMP&lt;br /&gt;
| 15.06.01&lt;br /&gt;
| Watson Sparse Matrix Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpwsmpBGQ.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;WSMP&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 2.1.5, 3.3.2, 3.1.2-esslwrapper&lt;br /&gt;
| Fast Fourier transform&lt;br /&gt;
| &amp;lt;tt&amp;gt;libsfftw,libdfftw,libfftw3, libfftw3f&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw/2.1.5, fftw/3.3.2, fftw/3.1.2-esslwrapper&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAPACK + ScaLAPACK&lt;br /&gt;
| 3.4.2 + 2.0.2&lt;br /&gt;
| Linear algebra routines. A subset of Lapack may be found in ESSL as well.&lt;br /&gt;
| &amp;lt;tt&amp;gt;liblapack, libscalapack&amp;lt;/tt&amp;gt;&lt;br /&gt;
| lapack&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.15&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.47.0, 1.54, 1.57&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| bzip2 + szip + zlib&lt;br /&gt;
| 1.0.6 + 2.1 + 1.2.7&lt;br /&gt;
| compression libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libbz2,libz,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;compression&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| METIS&lt;br /&gt;
| 5.0.2&lt;br /&gt;
| Serial Graph Partitioning and Fill-reducing Matrix Ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;metis&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| ParMETIS&lt;br /&gt;
| 4.0.2&lt;br /&gt;
| Parallel graph partitioning and fill-reducing matrix ordering&lt;br /&gt;
| &amp;lt;tt&amp;gt;libparmetis&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parmetis&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| OpenSSL&lt;br /&gt;
| 1.0.2 &lt;br /&gt;
| General-purpose cryptography library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcrypto, libssl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openssl&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
| FILTLAN&lt;br /&gt;
| 1.0&lt;br /&gt;
| The Filtered Lanczos Package &lt;br /&gt;
| &amp;lt;tt&amp;gt;libdfiltlan,libdmatkit,libsfiltlan,libsmatkit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FILTLAN&amp;lt;/tt&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Scripting/interpreted languages'''''&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.6.6&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-2.6/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 2.7.3&lt;br /&gt;
| Python programming language. Modules included : numpy-1.8.0, pyFFTW-0.9.2, astropy-0.3, scipy-0.13.3, mpi4py-1.3.1, h5py-2.2.1&lt;br /&gt;
| &amp;lt;tt&amp;gt;/scinet/bgq/tools/Python/python2.7.3-20131205/bin/python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Python]]&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;/bgsys/tools/Python-3.2/bin/python3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.abinit.org/ ABINIT]&lt;br /&gt;
| 7.10.4&lt;br /&gt;
| An atomic-scale simulation software suite&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;abinit&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.berkeleygw.org/ BerkeleyGW library]&lt;br /&gt;
| 1.0.4-2.0.0436&lt;br /&gt;
| Computes quasiparticle properties and the optical responses of a large variety of materials&lt;br /&gt;
| &amp;lt;tt&amp;gt;libBGW_wfn.a, wfn_rho_vxc_io_m.mod&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;BGW-paratec&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.cp2k.org/ CP2K]&lt;br /&gt;
| 2.3, 2.4, 2.5.1, 2.6.1&lt;br /&gt;
| DFT molecular dynamics, MPI &lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k.psmp&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cp2k&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.cpmd.org/ CPMD]&lt;br /&gt;
| 3.15.3, 3.17.1&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gnuplot&lt;br /&gt;
| 4.6.1&lt;br /&gt;
| interactive plotting program to be run on front-end nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| LAMMPS&lt;br /&gt;
| Nov 2012/7Dec15/7Dec15-mpi&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;lmp_bgq&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;lammps&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NAMD&lt;br /&gt;
| 2.9&lt;br /&gt;
| Molecular Dynamics &lt;br /&gt;
| &amp;lt;tt&amp;gt;namd2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;namd/2.9-smp&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.quantum-espresso.org/index.php Quantum Espresso]&lt;br /&gt;
| 5.0.3/5.2.1&lt;br /&gt;
| Molecular Structure / Quantum Chemistry &lt;br /&gt;
| &amp;lt;tt&amp;gt;qe_pw.x, etc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;espresso&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[BGQ_OpenFOAM | OpenFOAM]]&lt;br /&gt;
| 2.2.0, 2.3.0, 2.4.0, 3.0.1&lt;br /&gt;
| Computational Fluid Dynamics&lt;br /&gt;
| &amp;lt;tt&amp;gt;icofoam,etc. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openfoam/2.2.0, openfoam/2.3.0, openfoam/2.4.0&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Beta Tests'''''&lt;br /&gt;
|-&lt;br /&gt;
| WATSON API&lt;br /&gt;
| beta&lt;br /&gt;
| Natural Language Processing&lt;br /&gt;
| &amp;lt;tt&amp;gt;watson_beta&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;FEN/WATSON&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== OpenFOAM on BGQ ===&lt;br /&gt;
[https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ_OpenFOAM How to use OpenFOAM on BGQ]&lt;br /&gt;
&lt;br /&gt;
== Python on BlueGene ==&lt;br /&gt;
Python 2.7.3 has been installed on BlueGene. To use &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Numpy&amp;lt;/span&amp;gt; and &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;Scipy&amp;lt;/span&amp;gt;, the module &amp;lt;span style=&amp;quot;color: red;font-weight: bold;&amp;quot;&amp;gt;essl/5.1&amp;lt;/span&amp;gt; has to be loaded.&lt;br /&gt;
The full python path has to be provided (otherwise the default version is used).&lt;br /&gt;
&lt;br /&gt;
To use python on BlueGene (from within a job script or a debugjob session):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load python/2.7.3&lt;br /&gt;
##Only if you need numpy/scipy :&lt;br /&gt;
module load xlf/14.1 essl/5.1&lt;br /&gt;
runjob --np 1 --ranks-per-node=1 --envs HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH PYTHONPATH=/scinet/bgq/tools/Python/python2.7.3-20131205/lib/python2.7/site-packages/ : /scinet/bgq/tools/Python/python2.7.3-20131205/bin/python2.7 /PATHOFYOURSCRIPT.py &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to use the mmap python API, you must use it in PRIVATE mode, as shown in the example below:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import mmap&lt;br /&gt;
mm=mmap.mmap(-1,256,mmap.MAP_PRIVATE)&lt;br /&gt;
mm.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use the mpi4py and h5py modules.&lt;br /&gt;
&lt;br /&gt;
Also, please read the Cython documentation.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
#BGQ Day: Introduction to Using the BG/Q [[Media:BgqintroUpdatedMarch2015.pdf|Slides (updated in 2015) ]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 (direct link)]&lt;br /&gt;
#BGQ Day: BG/Q Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.html Video recording] [http://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 (direct link)]&lt;br /&gt;
# [http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/Documentation/Documention_node.html Julich BGQ Documentation]&lt;br /&gt;
# [https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q Argonne Mira BGQ Wiki]&lt;br /&gt;
# [https://computing.llnl.gov/tutorials/bgq/ LLNL Sequoia BGQ Info]&lt;br /&gt;
# [https://www.alcf.anl.gov/presentations Argonne MiraCon Presentations]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_SysAdmin.pdf|BGQ System Administration Guide]]&lt;br /&gt;
# IBM Red Books [[Media:BGQ_Red_AppDev.pdf|BGQ Application Development]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqccompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqclangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL C/C++ for Blue Gene/Q: [[Media:bgqcproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfgetstart.pdf|Getting started]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqfcompiler.pdf|Compiler reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:bgqflangref.pdf|Language reference]]&lt;br /&gt;
# IBM XL Fortran for Blue Gene/Q: [[Media:Bgqfproguide.pdf|Optimization and Programming Guide]]&lt;br /&gt;
# [[Media:essl51.pdf|IBM ESSL (Engineering and Scientific Subroutine Library) 5.1 for Linux on Power]]&lt;br /&gt;
# [http://content.allinea.com/downloads/userguide.pdf Allinea DDT 4.1 User Guide]&lt;br /&gt;
# [https://www.ibm.com/support/knowledgecenter/en/SSFJTW_5.1.0/loadl.v5r1_welcome.html IBM LoadLeveler 5.1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  PUT IN TRAC !!!&lt;br /&gt;
&lt;br /&gt;
=== *Manual Block Creation* ===&lt;br /&gt;
&lt;br /&gt;
To reconfigure the BGQ nodes you can use the bg_console or the web based navigator from the service node &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bg_console&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the&lt;br /&gt;
following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gen_small_block &amp;lt;blockid&amp;gt; &amp;lt;midplane&amp;gt; &amp;lt;cnodes&amp;gt; &amp;lt;nodeboard&amp;gt; &lt;br /&gt;
gen_small_block  R00-M0-N03-32 R00-M0 32 N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The block then needs to be booted using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
allocate R00-M0-N03-32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If those resources are already booted into another block, that block must be freed before the new block can be &lt;br /&gt;
allocated.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
free R00-M0-N03&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many other functions in bg_console:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
help all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The BGQ default nomenclature for hardware is as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
(R)ack - (M)idplane - (N)ode board or block - (J)node - (C)ore&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So R00-M01-N03-J00-C02 would correspond to the first rack, second midplane, fourth node board, first node, and third core.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfproguide.pdf&amp;diff=437</id>
		<title>File:Bgqfproguide.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfproguide.pdf&amp;diff=437"/>
		<updated>2018-05-25T18:08:46Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Fertinaz uploaded a new version of File:Bgqfproguide.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfproguide.pdf&amp;diff=436</id>
		<title>File:Bgqfproguide.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfproguide.pdf&amp;diff=436"/>
		<updated>2018-05-25T18:06:38Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Fertinaz uploaded a new version of File:Bgqfproguide.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfproguide.pdf&amp;diff=435</id>
		<title>File:Bgqfproguide.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfproguide.pdf&amp;diff=435"/>
		<updated>2018-05-25T18:05:44Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Bgqflangref.pdf&amp;diff=434</id>
		<title>File:Bgqflangref.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Bgqflangref.pdf&amp;diff=434"/>
		<updated>2018-05-25T18:04:57Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Fertinaz uploaded a new version of File:Bgqflangref.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Bgqflangref.pdf&amp;diff=433</id>
		<title>File:Bgqflangref.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Bgqflangref.pdf&amp;diff=433"/>
		<updated>2018-05-25T18:04:17Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfcompiler.pdf&amp;diff=432</id>
		<title>File:Bgqfcompiler.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfcompiler.pdf&amp;diff=432"/>
		<updated>2018-05-25T18:02:55Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfgetstart.pdf&amp;diff=431</id>
		<title>File:Bgqfgetstart.pdf</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=File:Bgqfgetstart.pdf&amp;diff=431"/>
		<updated>2018-05-25T18:02:05Z</updated>

		<summary type="html">&lt;p&gt;Fertinaz: Fertinaz uploaded a new version of File:Bgqfgetstart.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Fertinaz</name></author>
	</entry>
</feed>