Performance And Debugging Tools: Niagara
Memory Profiling
Valgrind
Valgrind (http://valgrind.org/) is a suite of tools for debugging and profiling programs. It is especially useful for finding memory problems such as memory leaks and segfaults. To use it on the GPC you must first load the valgrind module.
module load valgrind
For serial programs, valgrind can simply be run as follows, with no need to recompile your binary:
valgrind --tool=memcheck ./a.out
There are many useful flags, such as --leak-check=yes and --show-reachable=yes; the full list can be found by running valgrind --help or consulting the man pages.
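As an illustration of combining these flags, the following is a minimal sketch of a wrapper that runs a serial binary under memcheck with leak reporting turned on and writes the report to a per-process log file. The function name run_memcheck, the DRY_RUN switch, and the log file name are assumptions made for this example, not part of valgrind or the site setup.

```shell
#!/bin/sh
# Sketch only: run_memcheck and DRY_RUN are hypothetical names for this example.

run_memcheck() (
    # Prepend the valgrind invocation to the program and its arguments.
    # --leak-check=yes and --show-reachable=yes are the flags mentioned above;
    # --log-file sends the report to a file, with %p replaced by the process ID.
    set -- valgrind --tool=memcheck --leak-check=yes --show-reachable=yes \
        --log-file=memcheck.%p.log "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
)

# Show the command that would be run for ./a.out without executing it.
DRY_RUN=1 run_memcheck ./a.out
```

Calling run_memcheck ./a.out without DRY_RUN set runs the program under valgrind, assuming the valgrind module has been loaded.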
Valgrind can also be used in parallel with MPI (http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap) in a similar fashion; however, a wrapper library needs to be preloaded first:
LD_PRELOAD=${SCINET_VALGRIND_ROOT}/lib/valgrind/libmpiwrap-amd64-linux.so mpirun -np 2 valgrind --tool=memcheck ./a.out
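When several MPI processes run under memcheck at once, their reports can interleave on the terminal. One way to keep them apart is to give each process its own log file. The sketch below wraps the command above and adds valgrind's --log-file option with the %p (process ID) placeholder. The function name run_mpi_memcheck, the DRY_RUN switch, and the log file name are assumptions for this example; the wrapper library path is the one shown above.

```shell
#!/bin/sh
# Sketch only: run_mpi_memcheck and DRY_RUN are hypothetical names.
# SCINET_VALGRIND_ROOT is expected to be set by "module load valgrind".

run_mpi_memcheck() (
    nprocs="$1"
    shift
    wrapper="${SCINET_VALGRIND_ROOT}/lib/valgrind/libmpiwrap-amd64-linux.so"
    # Preload the MPI wrapper library, then start each rank under memcheck.
    # %p in --log-file expands to the process ID, giving one log per process.
    set -- env LD_PRELOAD="$wrapper" mpirun -np "$nprocs" \
        valgrind --tool=memcheck --log-file=memcheck.%p.log "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    elif [ ! -f "$wrapper" ]; then
        echo "MPI wrapper library not found: $wrapper" >&2
        return 1
    else
        "$@"
    fi
)

# Show the command that would be run for a 2-process job.
DRY_RUN=1 run_mpi_memcheck 2 ./a.out
```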
Besides being extremely good at finding memory problems, valgrind comes with a tool called cachegrind, which can find cache-use problems in your code; its use is described in our Intro To Performance.
Debugging
gdb
GDB is a solid source-language-level serial debugger. This SNUG TechTalk from Nov 2010 introduces debugging with gdb. gdb can debug code compiled with either the Intel or the GCC compilers. Use module load gdb to ensure you are using the most recent version of gdb.
gdb -tui launches a text-based interface that works somewhat like a GUI for debugging code, which can be handy.
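For a quick look at where a program crashes, gdb can also be run non-interactively. The sketch below uses gdb's -batch, -ex, and --args options to run a program and print a backtrace when it stops. The function name backtrace_on_crash and the DRY_RUN switch are assumptions for this example, and the program should be compiled with -g so the backtrace includes source line information.

```shell
#!/bin/sh
# Sketch only: backtrace_on_crash and DRY_RUN are hypothetical names.

backtrace_on_crash() (
    # -batch   : exit once the commands given with -ex have been processed
    # -ex run  : start the program
    # -ex bt   : print a backtrace after the program stops (e.g. on a segfault)
    # --args   : treat everything that follows as the program and its arguments
    set -- gdb -batch -ex run -ex bt --args "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
)

# Show the command that would be run for ./a.out with one argument.
DRY_RUN=1 backtrace_on_crash ./a.out input.dat
```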
ddd
Data Display Debugger (ddd) is a graphical wrapper to the gdb or idb debuggers. It is quite handy for debugging scientific programs as it has fairly sophisticated data-plotting features where array variables can be plotted dynamically as the program progresses. As with idb, however, using the graphical interface over a network connection can be quite laggy and slow. Use module load ddd to load ddd into your environment.
ddt
ddt is Allinea's graphical parallel debugger, in the ddt module. Highly recommended! Use module load ddt to load ddt into your environment.
Performance Profiling
gprof
gprof is a very useful tool for finding out where a program is spending its time; its use is described in our Intro To Performance.
Open|SpeedShop
OpenSpeedshop is a tool for performing sampling and MPI tracing of a program on the GPC. Its use is outlined in our Intro to Performance. It is currently compiled for use only with openmpi and gcc-4.4; to use it,
module load gcc openmpi openspeedshop
To run a sampling experiment, run your program through openss, then use openss again to view the results:
$ openss -f "./a.out" pcsamp
[program runs as usual, outputs extra performance data at the end]
$ openss -f [experimentname].openss
It can also trace MPI calls and give detailed statistics about the time spent in MPI routines by process number; to test this on a run done on one of the devel nodes,
$ module load gcc openmpi/1.3.2-gcc-v4.4.0-ofed openspeedshop
$ openss -f "mpirun -np 6 ./a.out" mpit
[program runs as usual, outputs extra performance data at the end]
$ openss -f [experimentname].openss
This can also be used to trace batch jobs: ensure that module load gcc openmpi/1.3.2-gcc-v4.4.0-ofed openspeedshop is in your .bashrc, and launch the program with mpirun as shown above.
Scalasca
Scalasca is a sophisticated tool for analyzing performance and finding common performance problems. We describe it in our Intro to Performance.