Chapter 3: Analyzing Loops
This chapter is organized as follows:
Basic Concepts
LoopTool's main features include the ability to:
LoopReport is the command-line version of LoopTool. For more information about LoopReport, see SunSoft WorkShop: Command Line Options.
Using LoopTool is like using gprof. The three major steps are: compile, run, and analyze.
Note - The following examples use the Fortran MP (f77 and f90) compiler. The options shown (such as -xparallel, -Zlp) also work for MP C.
% setenv PARALLEL `/usr/sbin/psrinfo | wc -l`
Note - If you have installed LoopTool in a nondefault directory, substitute that path for the one shown here.
% setenv XUSERFILESEARCHPATH \
/opt/SUNWspro/lib/sunpro_defaults/looptool.res
% setenv LD_LIBRARY_PATH /usr/dt/lib:$LD_LIBRARY_PATH
% setenv LD_LIBRARY_PATH \
/opt/SUNWspro/Motif_Solaris24/dt/lib:$LD_LIBRARY_PATH
% f77 -xO4 -xparallel -Zlp source_file
For additional information, see "Loading a Timing File".
Note - All examples apply to Fortran 77, Fortran 90 and C programs.
There are a number of other useful options for looking at and parallelizing loops. Some of these options are shown in Table 3-1 below.
For more information, see "Other Compilation Options".
Run the Program
After compiling with -Zlp, run the instrumented executable. This creates the loop timing file, program.looptimes. LoopTool processes two files: the instrumented executable and the loop timing file.
Starting LoopTool
You can start LoopTool by giving it the name of a program (that is, an executable) to load:
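% looptool program &
You can also start LoopTool without naming a program and open the executable later (see "Opening Files"):
% looptool &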
LoopReport is usually started like this:
% loopreport program &
Loading a Timing File
LoopTool reads the timing file associated with your program. The timing file contains information about loops. Typically, this file has a name of the format program.looptimes and is in the same directory as your program.
% looptool program &
If you name a timing file on the command line, then LoopTool and LoopReport use it.
% looptool program program.looptimes &
To have the tools look for the timing file in a different directory, use the -p option:
% looptool -p timing_file_directory program &
If the environment variable LVPATH is set, the tools check that directory for a timing file.
% setenv LVPATH timing_file_directory
% looptool program &
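The ways of pointing the tools at a timing file described above can be summarized in a small helper. The following is a minimal sketch in POSIX sh (not the csh used elsewhere in this chapter). The function name find_looptimes and the lookup order it uses (explicit file, then a -p style directory, then LVPATH, then the program's own directory) are assumptions made for illustration; the actual precedence used by LoopTool and LoopReport is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the timing file that would most
# plausibly be picked up for a program. The search order is an assumption.
# usage: find_looptimes program [timing_file] [timing_file_directory]
find_looptimes() {
    prog=$1
    explicit=${2:-}
    dir=${3:-}
    name=$(basename "$prog").looptimes

    # 1. A timing file named explicitly on the command line.
    if [ -n "$explicit" ] && [ -f "$explicit" ]; then
        printf '%s\n' "$explicit"
        return 0
    fi
    # 2. A directory given in the style of "looptool -p dir program".
    if [ -n "$dir" ] && [ -f "$dir/$name" ]; then
        printf '%s\n' "$dir/$name"
        return 0
    fi
    # 3. The directory named by the LVPATH environment variable.
    if [ -n "${LVPATH:-}" ] && [ -f "$LVPATH/$name" ]; then
        printf '%s\n' "$LVPATH/$name"
        return 0
    fi
    # 4. The directory that holds the program itself.
    if [ -f "$(dirname "$prog")/$name" ]; then
        printf '%s\n' "$(dirname "$prog")/$name"
        return 0
    fi
    # No timing file found: the program probably has not been run yet.
    return 1
}
```

A nonzero exit status from the helper is a useful reminder that the instrumented executable must be run once before LoopTool has anything to display.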
Using LoopTool
The Main Window
The main window displays the runtimes of your program's loops in a bar chart arranged in the order that the source files were presented to the compiler.
Figure 3-1 LoopTool Main Window
Opening Files
Choose Open from the File menu in the main window to open executable and timing files.
For more information about opening files, see the LoopTool section of the WorkShop Online Help.
Figure 3-2 LoopReport
Printing the LoopTool Graph
The Options pop-up window lets you choose an editor for editing source code. The available editors are vi, gnuemacs, and xemacs. See "Getting Hints and Editing Source Code" for more on editing source code.
For more information about choosing an editor, see the WorkShop Online Help.
Note - vi and xemacs are installed with LoopTool into your install directory (usually /opt/SUNWspro/bin) if they're not already on your system. You must provide gnuemacs yourself. In all cases, the editor you want must be in a directory that's in your search path in order for LoopTool to find it. For example, your PATH environment variable should include /usr/ucb if that's where vi is located on your system.
Getting Hints and Editing Source Code
Clicking a loop in the main window (Figure 3-1) does two things:
Figure 3-3 shows the editor and hint windows:
Warning - If you edit your source code, line numbers shown by LoopTool may become inconsistent with the source. You must save and recompile the edited source and then run LoopTool with the new executable, producing new loop information, for the line numbers to remain consistent.
Either -xO3 or -xO4 can be used with -xparallel. If you don't specify -xO3 or -xO4 but you do use -xparallel, then -xO3 is added. Table 3-2 summarizes how switches are added.
Other compilation options include -xexplicitpar and -xloopinfo.
      subroutine adj(a,b,c,x,n)
      real*8 a(n), b(n), c(-n:0), x
      integer n
c$par DOALL
      do 19 i = 1, n*n
         do 29 k = i, n*n
            a(i) = a(i) + x*b(k)*c(i-k)
 29      continue
 19   continue
      return
      end
When you use -Zlp by itself, -xdepend and -xO3 are added. The switch -xdepend instructs the compiler to perform the data dependency analysis that it needs to do to identify loops. The switch -xparallel includes -xdepend, but -xdepend does not imply (or trigger) -xparallel.
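The way these switches pull each other in can be modeled in a few lines. The sketch below is written in POSIX sh and encodes only the rules stated in this section. The function name implied_flags is hypothetical, and keeping an explicit -xO4 unchanged when -Zlp is present is an assumption; this is a model of the description, not of the compiler driver itself.

```shell
#!/bin/sh
# Illustrative model only: expand a list of compiler switches using the
# rules given in the text. implied_flags is a hypothetical name.
implied_flags() {
    flags=" $* "
    # has: succeed if the switch is already in the list.
    has() { case $flags in *" $1 "*) return 0 ;; *) return 1 ;; esac; }
    # add: append the switch unless it is already present.
    add() { has "$1" || flags="$flags$1 "; }

    if has -Zlp || has -xparallel; then
        # Both -Zlp and -xparallel bring in dependency analysis.
        add -xdepend
        # -xO3 is added only when no -xO3 or -xO4 was requested.
        if ! has -xO3 && ! has -xO4; then
            add -xO3
        fi
    fi
    # -xdepend on its own adds nothing: it does not imply -xparallel.

    # Drop the padding spaces before printing.
    set -- $flags
    printf '%s\n' "$*"
}

implied_flags -Zlp                   # prints: -Zlp -xdepend -xO3
implied_flags -xO4 -xparallel -Zlp   # prints: -xO4 -xparallel -Zlp -xdepend
```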
% f77 -xO3 -parallel -xloopinfo -Zlp gamteb.F 2> gamteb.loopinfo
For some useful explanations and tips, read the sections in the Sun WorkShop Fortran: User's Guide that address parallelization.
Note - The hints are heuristics gathered by the compiler during the optimization pass. They should be understood in that context; they are not absolute facts about the code generated for a given loop. However, the hints are often very useful indications of how you can transform your code so that the compiler can perform more aggressive optimizations, including parallelizing loops.
Table 3-3 lists the hints about optimizations applied to loops.
c$par DOALL
      do 19 i = 1, n*n
         do 29 k = i, n*n
            a(i) = a(i) + x*b(k)*c(i-k)
            call foo()
 29      continue
 19   continue
      do 99 i = 1, n
         do 99 j = 1, m
            a(i, j+1) = a(i, j) + a(i, j-1)
 99   continue
Some particular optimizations that can cause confusion are described in the following sections.
If the compiler hints seem particularly opaque, consider compiling with -xO3.
In particular, "phantom" loops (loops that the compiler claims exist, but that you know do not exist in your source code) could well be a symptom of inlining.
LoopTool attempts to provide hints that make as much sense as possible, but given the nature of the problem of associating optimized code with source code, the hints may be misleading. For more information on what optimizations do for your code, refer to compiler books such as Compilers: Principles, Techniques and Tools by Aho, Sethi and Ullman.
However, the outer loop is assigned only the runtime of its child, the parallel loop, which will be the runtime of the longest parallel instantiation of the inner loop. This double timing leads to the anomaly of the outer loop apparently consuming less time than the inner loop.
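A small worked example makes the anomaly concrete. The sketch below (POSIX sh) uses made-up per-thread runtimes for one parallelized inner loop. It assumes that the inner loop is charged the sum of all its parallel instantiations, while the outer loop, as described above, is charged only the longest one. The numbers and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical runtimes, in seconds, of the inner loop on four threads.
times="10 12 9 11"

inner_total=0   # assumed: inner loop is charged the sum over all threads
outer_time=0    # per the text: outer loop gets the longest instantiation
for t in $times; do
    inner_total=$((inner_total + t))
    if [ "$t" -gt "$outer_time" ]; then
        outer_time=$t
    fi
done

echo "inner loop: $inner_total s, outer loop: $outer_time s"
# prints: inner loop: 42 s, outer loop: 12 s
```

Under these assumptions the inner loop shows 42 seconds while the loop that encloses it shows only 12, which is the apparent inversion described above.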