P7

P7 Cluster (P7)
Installed	May 2011, March 2013
Operating System	Linux (RHEL 6.3)
Number of Nodes	8 (256 cores)
Interconnect	InfiniBand (2 DDR/node)
RAM/Node	128 GB
Cores/Node	32 (128 Threads)
Login/Devel Node	p701 (from login.scinet)
Vendor Compilers	xlc/xlf
Queue Submission	LoadLeveler

Specifications

The P7 Cluster consists of 8 IBM Power 755 servers, each with four 8-core 3.3 GHz POWER7 CPUs and 128 GB of RAM. Like the POWER6, the POWER7 supports Simultaneous Multi-Threading (SMT), but it extends the design from 2 hardware threads per core to 4. This lets the 32 physical cores of a node run up to 128 threads, which in many cases gives a significant speedup over running one thread per core.
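The per-node and cluster-wide counts in the table follow directly from this layout. The short sketch below just spells out the arithmetic, assuming every node runs in SMT4 mode; the function names are illustrative only. On a node itself, `nproc` reports the number of logical CPUs the operating system sees, which should be 128 with SMT4 enabled.

```shell
#!/bin/bash
# Arithmetic behind the table: 4 POWER7 chips per node, 8 cores per chip,
# and (assumed) SMT4, i.e. 4 hardware threads per core.
p7_threads_per_node() {
  local sockets=4 cores_per_socket=8 smt=4
  echo $(( sockets * cores_per_socket * smt ))
}

# 8 nodes of 4 chips x 8 cores each.
p7_total_cores() {
  local nodes=8 sockets=4 cores_per_socket=8
  echo $(( nodes * sockets * cores_per_socket ))
}

echo "threads per node: $(p7_threads_per_node)"
echo "cores in cluster: $(p7_total_cores)"
```

The 128 threads per node is why the sample job script further down uses 128 tasks per node.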

Login

First log in via ssh with your SciNet account at bgqdev.scinet.utoronto.ca; from there, proceed to p7n01-ib0, which is currently the gateway/development node for this cluster. Because the P7 shares its file system (and therefore your .bashrc) with other systems, it is recommended that your .bashrc distinguish between the P7 and those systems, so that each machine loads only its own modules. For example:

case $(hostname -s) in
    p7*)
      MACHINE=p7
      # commands for p7, e.g. module load vacpp xlf pe
    ;;
    bgq*)
      MACHINE=bgq
      # commands for bgq
    ;;
    sgc*)
      MACHINE=sgc
      # commands for sgc
    ;;
    *)
      MACHINE=unknown
    ;;
esac
export MACHINE
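The same hostname test can also be factored into a function, which makes the mapping easy to check interactively. This is a sketch: `machine_for_host` is a hypothetical name, and it uses bash's own `$HOSTNAME` with any domain part stripped in place of `hostname -s`. The modules loaded in the P7 branch are the ones described in the sections below.

```shell
#!/bin/bash
# Hypothetical helper: map a short hostname to a machine label.
machine_for_host() {
  case "$1" in
    p7*)  echo p7 ;;
    bgq*) echo bgq ;;
    sgc*) echo sgc ;;
    *)    echo unknown ;;
  esac
}

# ${HOSTNAME%%.*} is the current hostname without its domain part.
MACHINE=$(machine_for_host "${HOSTNAME%%.*}")
export MACHINE

# Load the P7 compilers and MPI only when actually running on a P7 node.
if [ "$MACHINE" = p7 ]; then
  module load vacpp xlf pe
fi
```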

Compiler/Devel Node

From p7n01-ib0 you can compile, do short tests, and submit your jobs to the queue.

Software

GNU Compilers

GCC 4.4.4 (gcc/g++/gfortran), the RHEL 6.3 system compiler, is available by default, and GCC 4.6.1 is available as a separate module. However, the IBM compilers (see below) are recommended.

IBM Compilers

To use the IBM POWER-specific compilers xlc/xlc++/xlf, load the following modules:

$ module load vacpp xlf

NOTE: Be sure to compile with "-q64" (64-bit mode) when using the IBM compilers.
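A small sketch that prints a 64-bit compile command for a given source file, so "-q64" is never forgotten. `xl_compile_cmd` is a hypothetical helper. The compiler names (xlc, xlc++, xlf, xlf90) come with the vacpp and xlf modules, while -O3 and the POWER7 tuning flags -qarch=pwr7 -qtune=pwr7 are illustrative choices that assume a compiler version with POWER7 support.

```shell
#!/bin/bash
# Hypothetical helper (not a SciNet tool): choose the IBM XL compiler from the
# file extension and always add -q64. -O3 and -qarch/-qtune=pwr7 are
# illustrative optimization choices, not site policy.
xl_compile_cmd() {
  local src=$1 out=${2:-a.out} cc
  case "$src" in
    *.c)              cc=xlc ;;
    *.cpp|*.cxx|*.cc) cc=xlc++ ;;
    *.f)              cc=xlf ;;
    *.f90)            cc=xlf90 ;;
    *) echo "unsupported source file: $src" >&2; return 1 ;;
  esac
  echo "$cc -q64 -O3 -qarch=pwr7 -qtune=pwr7 -o $out $src"
}
```

On p7n01-ib0, after "module load vacpp xlf", the printed command can be run with eval "$(xl_compile_cmd hello.c hello)".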

MPI

IBM's POE (Parallel Operating Environment) is available and works with both the IBM and GNU compilers. Load it with

$ module load pe

The MPI wrappers for C, C++, and Fortran 77/90 are mpicc, mpicxx, and mpif77/mpif90, respectively (mpcc, mpCC, and mpfort should also work).

Note: To use the full C++ bindings of MPI (those in the MPI namespace) in C++ code, add -cpp to the compilation command. If you link several object files that use the MPI C++ bindings, also add -Wl,--allow-multiple-definition to the link command.
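A sketch of the compile and link steps for a C++ code split across several files that use the MPI C++ bindings, applying the two flags from the note above. It assumes the pe, vacpp and xlf modules are loaded, so that -q64 reaches an IBM compiler. `build_mpi_cxx` and `run` are hypothetical helpers; setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/bash
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Usage: build_mpi_cxx EXECUTABLE SOURCE...
build_mpi_cxx() {
  local exe=$1; shift
  local objs=() src
  for src in "$@"; do
    # -cpp at compile time, for code using the MPI namespace bindings.
    run mpicxx -q64 -cpp -c "$src" -o "${src%.*}.o" || return 1
    objs+=("${src%.*}.o")
  done
  # Several objects use the C++ bindings, so allow duplicate definitions at link.
  run mpicxx -q64 -Wl,--allow-multiple-definition -o "$exe" "${objs[@]}"
}
```

For example, build_mpi_cxx myapp main.cpp solver.cpp (the file names are hypothetical) compiles both files and links them into myapp.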

Spark Standalone

Before running Spark, load the Java runtime (JRE 1.7.0) from the jdk module:

p7n01-$ module load jdk/JRE1.7.0

Then load Spark:

p7n01-$ module load spark/1.4.1

Spark SQL

The spark/1.5.0 build supports Spark SQL. Load it together with the JDK and Hadoop modules:

p7n01-$ module load jdk/JRE1.7.0
p7n01-$ module load spark/1.5.0
p7n01-$ module load hadoop/2.3.0
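The two module combinations above can be wrapped in a helper so that interactive sessions and job scripts load a consistent set. `spark_modules` is a hypothetical name; it only encodes the version combinations listed on this page, in the order the page loads them.

```shell
#!/bin/bash
# Hypothetical helper: print the modules for a given Spark version on the P7.
# JDK first, then Spark; Hadoop only for the 1.5.0 build with Spark SQL.
spark_modules() {
  case "$1" in
    1.4.1) echo "jdk/JRE1.7.0 spark/1.4.1" ;;
    1.5.0) echo "jdk/JRE1.7.0 spark/1.5.0 hadoop/2.3.0" ;;
    *) echo "no known module set for Spark $1" >&2; return 1 ;;
  esac
}
```

Assuming your module command accepts several modules in one call (as Environment Modules does), module load $(spark_modules 1.5.0) loads all three in order.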

Sample Spark script

We recommend reading the following blog post by Jonathan Dursi before building your first Spark script: http://www.dursi.ca/spark-in-hpc-clusters/

Before submitting sparkscript.py, change its import line to

from pyspark.context import SparkContext

Alternatively, instead of submitting sparkscript.py, you can run the SparkPi example that ships with Spark:

spark-submit --master $sparkmaster --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/target/spark-examples_2.10-1.4.1.jar 256
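The command above expects $sparkmaster to hold the master URL. One way to set it inside a LoadLeveler job is sketched below, assuming that the master was started on the first node listed in $LOADL_HOSTFILE (the file in which LoadLeveler lists the hosts assigned to the job's tasks) and that it listens on Spark's default standalone port, 7077. `spark_master_url` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: build the Spark standalone master URL from the first
# host in a LoadLeveler host file (default: $LOADL_HOSTFILE).
# Assumes the master runs on that host and uses port 7077 unless told otherwise.
spark_master_url() {
  local hostfile=${1:-$LOADL_HOSTFILE} port=${2:-7077} master
  [ -r "$hostfile" ] || { echo "cannot read host file: $hostfile" >&2; return 1; }
  master=$(head -n 1 "$hostfile")
  echo "spark://${master}:${port}"
}
```

Inside the job script, sparkmaster=$(spark_master_url) would then be set before calling spark-submit.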

Submit a Job

The scheduler is IBM's LoadLeveler. Be sure to keep the @ environment settings shown in the sample script below: they are specific to this cluster and are needed to get full performance.

#!/bin/bash
##===================================
## P7 Load Leveler Submission Script
##===================================
##
## Don't change these parameters unless you really know what you are doing
##
# @ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \
#                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no
##
##===================================
## Avoid core dumps
# @ core_limit   = 0
##===================================
## Job specific
##===================================
#
# @ job_name = myjob
# @ job_type = parallel
# @ class = verylong
# @ output = $(jobid).out
# @ error = $(jobid).err
# @ wall_clock_limit = 2:00:00
# @ node = 2
# @ tasks_per_node = 128
# @ queue
#
#===================================

#./my_script
./my_code

Save the script as, e.g., myjob.ll and submit it with

llsubmit myjob.ll
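If you submit many similar jobs, the command file can be generated from a few parameters. This is a sketch: `make_ll_script` is a hypothetical helper that copies the settings of the sample script (class verylong, 128 tasks per node). Its @ environment and @ core_limit lines are written as active "# @" keyword statements, because LoadLeveler treats a line starting with "##@" as an ordinary comment.

```shell
#!/bin/bash
# Hypothetical helper: print a LoadLeveler command file.
# Usage: make_ll_script JOB_NAME NODES WALL_CLOCK_LIMIT EXECUTABLE
make_ll_script() {
  local name=$1 nodes=$2 walltime=$3 exe=$4
  # Unquoted heredoc: $name etc. expand; \$(jobid) stays literal for LoadLeveler.
  cat <<EOF
#!/bin/bash
# @ environment = MP_INFOLEVEL=0; MP_USE_BULK_XFER=yes; MP_BULK_MIN_MSG_SIZE=64K; \\
#                MP_EAGER_LIMIT=64K; MP_DEBUG_ENABLE_AFFINITY=no
# @ core_limit = 0
# @ job_name = $name
# @ job_type = parallel
# @ class = verylong
# @ output = \$(jobid).out
# @ error = \$(jobid).err
# @ wall_clock_limit = $walltime
# @ node = $nodes
# @ tasks_per_node = 128
# @ queue
$exe
EOF
}
```

For example: make_ll_script myjob 2 2:00:00 ./my_code > myjob.ll && llsubmit myjob.ll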

To show queued and running jobs, use

llq

To cancel a job, use

llcancel JOBID

Split a Spark job

For example, to split a job into 256 tasks among 2 workers, request 3 nodes (one master and two workers) and add the following job specifications, replacing the placeholder names with available nodes:

# @ node = 3
# @ preferences = Machine == { "AvailableNode1" "AvailableNode2" "AvailableNode3" }
# @ tasks_per_node = 128
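Inside such a job, the scripts that start Spark need to know which node acts as master and which as workers. A minimal sketch, assuming the node order in $LOADL_HOSTFILE (which repeats a host once per task, so 128 times per node here) and taking the first node as the master: `job_nodes` and `spark_worker_nodes` are hypothetical helpers.

```shell
#!/bin/bash
# Hypothetical helper: distinct nodes of the job, in host-file order.
job_nodes() {
  local hostfile=${1:-$LOADL_HOSTFILE}
  awk '!seen[$0]++' "$hostfile"   # keep only the first line for each host
}

# Hypothetical helper: every node except the first (assumed master).
spark_worker_nodes() {
  job_nodes "$@" | tail -n +2
}
```

With 3 nodes this yields one master (job_nodes | head -n 1) and the two worker nodes.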

Monitor your (Spark) job from localhost

Spark starts a web UI on the master and on each worker, which you can open in a web browser on your own machine, for instance to "check your cluster UI to ensure that workers are registered and have sufficient resources". To do so, open a second SSH session to SciNet that forwards the UI port on the master node to a local port (e.g., 9999). Port 4040, used below, serves the UI of the running application; the standalone master's cluster UI listens on port 8080 by default.

ssh -L 9999:masternode:4040 userid@login.scinet.utoronto.ca

Then open http://localhost:9999 in your web browser.
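If you open such tunnels often, a small helper can assemble the command. The sketch below only prints it. `spark_ui_tunnel` is a hypothetical name, and its defaults (remote port 4040, local port 9999) copy the example above. The -N option tells ssh not to run a remote command, so the session simply holds the tunnel open.

```shell
#!/bin/bash
# Hypothetical helper: print an ssh port-forwarding command for a Spark web UI.
# Usage: spark_ui_tunnel NODE [SCINET_USER] [REMOTE_PORT] [LOCAL_PORT]
spark_ui_tunnel() {
  local node=$1 user=${2:-$USER} remote_port=${3:-4040} local_port=${4:-9999}
  echo "ssh -N -L ${local_port}:${node}:${remote_port} ${user}@login.scinet.utoronto.ca"
}
```

Run it on your own machine, e.g. eval "$(spark_ui_tunnel masternode userid)", then browse to http://localhost:9999.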
