Analyzing Loops

3

The Fortran MP and MP C compilers automatically parallelize loops for which they determine that it is safe and profitable to do so. LoopTool is a performance analysis tool that reads loop timing files created by these compilers. LoopTool has a graphical user interface (GUI); LoopReport (which is discussed in Sun WorkShop: Command Line Utilities) is the command-line version of LoopTool.

This chapter is organized as follows:

Basic Concepts	page 23
Setting Up Your Environment	page 24
Creating a Loop Timing File	page 25
Starting LoopTool	page 26
Using LoopTool	page 27
Other Compilation Options	page 32
Compiler Hints	page 34
Compiler Optimizations and How They Affect Loops	page 38

Basic Concepts

LoopTool's main features include the ability to:

Time all loops, whether serial or parallel
Produce a table of loop timings
Collect hints from the compiler during compilation. These hints can help you parallelize loops that were not parallelized. Hints are described further in "Compiler Hints" on page 34.

LoopTool displays a graph of loop runtimes and shows which loops were parallelized. You can go directly from the graphical display of loops to the source code for any loop you want, so you can edit your source code while in LoopTool.

LoopReport is the command-line version of LoopTool. For more information about LoopReport, see Sun WorkShop: Command Line Utilities.

Using LoopTool is like using gprof. The three major steps are: compile, run, and analyze.

Note - The following examples use the Fortran MP (f77 and f90) compiler. The options shown (such as -xparallel, -Zlp) also work for MP C.

Setting Up Your Environment

1. Before compiling, set the environment variable PARALLEL to the number of processors on your machine.

The following command makes use of psrinfo, a system utility. Note the backquotes:


% setenv PARALLEL `/usr/sbin/psrinfo | wc -l`

2. Before starting LoopTool, make sure the environment variable XUSERFILESEARCHPATH is set:

% setenv XUSERFILESEARCHPATH /opt/SUNWspro/lib/sunpro_defaults/looptool.res

Note - If you have installed LoopTool in a nondefault directory, substitute that path for the one shown here.

3. Set LD_LIBRARY_PATH.

If you are running Solaris 2.5:


% setenv LD_LIBRARY_PATH /usr/dt/lib:$LD_LIBRARY_PATH

If you are running Solaris 2.3 or 2.4:


% setenv LD_LIBRARY_PATH /opt/SUNWspro/Motif_Solaris24/dt/lib:$LD_LIBRARY_PATH

You may want to put these commands in a shell startup file (such as .cshrc or .profile).

Creating a Loop Timing File

To compile for automatic parallelization, typical compilation switches are -xparallel and -xO4. To compile for LoopTool, add -Zlp, as shown in the following example:


% f77 -xO4 -xparallel -Zlp source_file

Note - All examples apply to Fortran 77, Fortran 90 and C programs.

For additional information, see "Loading a Timing File" on page 26.

There are a number of other useful options for looking at and parallelizing loops. Some of these options are shown in Table 3-1 below.

Table 3-1 Some Useful Compiler Options
Option	Effect
`-o` `program`	Renames the executable to program
`-xexplicitpar`	Parallelizes loops marked with `DOALL` pragma
`-xloopinfo`	Prints hints to `stderr` for redirection to files

For more information, see "Other Compilation Options" on page 32.

Running the Program

After compiling with -Zlp, run the instrumented executable. This creates the loop timing file, program.looptimes. LoopTool processes two files: the instrumented executable and the loop timing file.

Starting LoopTool

You can start LoopTool by giving it the name of a program (that is, an executable) to load:


% looptool program &

You can also start LoopTool with no files specified. In this case, LoopTool's file chooser comes up automatically so you can select a file to examine:

% looptool &

LoopReport is usually started like this:

% loopreport program &

Loading a Timing File

LoopTool reads the timing file associated with your program. The timing file contains information about loops. Typically, this file has a name of the format program.looptimes and is in the same directory as your program.

By default, LoopTool looks in the executable's directory for a timing file. Therefore, if the timing file is there (the usual case), you don't need to specify where to look for it:

% looptool program &

If you name a timing file on the command line, then LoopTool and LoopReport use it.

% looptool program program.looptimes &

If you use the command line option -p, LoopTool and LoopReport check for a timing file in the directory indicated by -p:

% looptool -p timing_file_directory program &

If the environment variable LVPATH is set, the tools check that directory for a timing file.

% setenv LVPATH timing_file_directory
% looptool program &

Using LoopTool

The Main Window

The main window displays the runtimes of your program's loops in a bar chart arranged in the order that the source files were presented to the compiler.

Figure 3-1 shows the components of the main window.


Figure 3-1 LoopTool Main Window

Opening Files

Choose Open from the File menu in the main window to open executable and timing files.

There are two ways to specify the files you want to open:

Type in the names of the files to open.
Bring up a file chooser.

Once you've typed in the executable's path, you don't need to type in the timing file, unless it's in a different directory or has a non-default name (or both).

For more information about opening files, see the LoopTool section of the WorkShop Online Help.

Creating a Report on All Loops

Choose Create Report from the File menu in the main window to open a window with detailed information on all the loops in your program (see Figure 3-2). The Help button in the report window links to the WorkShop Online Help section containing compiler hints.


Figure 3-2 LoopReport

Printing the LoopTool Graph

1. Choose Print Graph from the File menu in the main window to open the Print pop-up window.

2. Choose whether to print the graph or put it in a file.

3. Enter the name of the printer or filename where you want to send the graph.

For more information about printing, see the WorkShop Online Help.

Choosing an Editor

Choose Options from the File menu in the main window to open the Options pop-up window.

The Options pop-up window lets you choose an editor for editing source code. The editors are vi, gnuemacs, and xemacs. See "Getting Hints and Editing Source Code" on page 30 for more on editing source code.

Note - vi and xemacs are installed with LoopTool into your install directory (usually /opt/SUNWspro/bin) if they're not already on your system. You must provide gnuemacs yourself. In all cases, the editor you want must be in a directory that's in your search path in order for LoopTool to find it. For example, your PATH environment variable should include /usr/ucb if that's where vi is located on your system.

For more information about choosing an editor, see the WorkShop Online Help.

Getting Hints and Editing Source Code

Clicking a loop in the main window (Figure 3-1) does two things:

It brings up a window in which you can edit your source code (Figure 3-3). The available editors are vi, xemacs, and gnuemacs. See "Choosing an Editor" on page 30 for more information on choosing an editor.

For information on vi, see the vi(1) manual page. xemacs and gnuemacs have online help (click the Help button).

The WorkShop vi editor has a special menu, Version, that allows you to make use of the SCCS (Source Code Control System) utility for sharing files. See the LoopTool online help, as well as the sccs(1) manual page, for more information.

It brings up a separate window that displays one or more hints about the loop you've selected. The Help button in this window displays the WorkShop online help compiler hints section. See also "Compiler Hints" on page 34, which explains the hints in detail.

Figure 3-3 shows the editor and hint windows:

Figure 3-3 The Editor and Hints Windows

Warning - If you edit your source code, line numbers shown by LoopTool may become inconsistent with the source. You must save and recompile the edited source and then run LoopTool with the new executable, producing new loop information, for the line numbers to remain consistent.

Getting Help and Sending Comments

Choose from the Help menu (shown in Figure 3-1) to:

See general help about starting and using LoopTool (Help Contents)
Send comments about LoopTool (Send Comments)
Get last-minute information (Release Notes)
Invoke On Item Help (On Item)
Access video demos of LoopTool and WorkShop features (Demos)
Access WorkShop HTML documentation (WorkShop Manuals)

Other Compilation Options

Many combinations of compile switches work for LoopTool.

Either -xO3 or -xO4 can be used with -xparallel. If you don't specify -xO3 or -xO4 but you do use -xparallel, then -xO3 is added. Table 3-2 summarizes how switches are added.

Table 3-2 Promotion of Compiler Switches
You type:	Bumped Up To:
-xparallel	-xparallel -xO3
-xparallel -Zlp	-xparallel -xO3 -Zlp
-xexplicitpar	-xexplicitpar -xO3
-xexplicitpar -Zlp	-xexplicitpar -xO3 -Zlp
-Zlp	-xdepend -xO3 -Zlp

Other compilation options include -xexplicitpar and -xloopinfo.

The Fortran MP compiler switch -xexplicitpar is used with the pragma DOALL. If you insert DOALL before a loop in your source code, you are explicitly marking that loop for parallelization. The compiler will parallelize this loop when you compile with -xexplicitpar.

The following code fragment shows how to mark a loop explicitly for parallelization.

      subroutine adj(a,b,c,x,n)
      real*8 a(n), b(n), c(-n:0), x
      integer n
c$par DOALL
      do 19 i = 1, n*n
         do 29 k = i, n*n
            a(i) = a(i) + x*b(k)*c(i-k)
 29      continue
 19   continue
      return
      end

When you use -Zlp by itself, -xdepend and -xO3 are added. The switch -xdepend instructs the compiler to perform the data dependency analysis that it needs to do to identify loops. The switch -xparallel includes -xdepend, but -xdepend does not imply (or trigger) -xparallel.

The -xloopinfo option prints hints about loops to stderr (the UNIX standard error file, on file descriptor 2) when you compile your program. The hints include the routine names, the line number for the start of the loop, whether the loop was parallelized, and the reason it was not parallelized, if applicable.

The following example redirects hints about loops in the source file gamteb.F to the file gamteb.loopinfo:

% f77 -xO3 -parallel -xloopinfo -Zlp gamteb.F 2> gamteb.loopinfo

The main difference between -Zlp and -xloopinfo is that, in addition to providing compiler hints about loops, -Zlp also instruments your program so that timing statistics are recorded at runtime. For this reason, LoopTool and LoopReport can analyze only programs that have been compiled with -Zlp.

Compiler Hints

LoopTool and LoopReport present somewhat cryptic hints about the optimizations applied to a particular loop, and in particular, about why a particular loop may not have been parallelized. Some of the hints may seem to mean essentially the same thing.

Note - The hints are heuristics gathered by the compiler during the optimization pass. They should be understood in that context; they are not absolute facts about the code generated for a given loop. However, the hints are often very useful indications of how you can transform your code so that the compiler can perform more aggressive optimizations, including parallelizing loops.

For some useful explanations and tips, read the sections in the Sun WorkShop Fortran: User's Guide that address parallelization.

Table 3-3 lists the hints about optimizations applied to loops.

Table 3-3 LoopTool Hints
Hint #	Hint Definition
0	No hint available
1	Loop contains procedure call
2	Compiler generated two versions of this loop
3	Loop contains data dependency
4	Loop was significantly transformed during optimization
5	Loop may or may not hold enough work to be profitably parallelized
6	Loop was marked by user-inserted pragma, `DOALL`
7	Loop contains multiple exits
8	Loop contains I/O, or other function calls, that are not MT safe
9	Loop contains backward flow of control
10	Loop may have been distributed
11	Two or more loops may have been fused
12	Two or more loops may have been interchanged

0. No hint available

None of the other hints applied to this loop. This hint does not mean that none of the other hints might apply; it means that the compiler did not infer any of those hints.

1. Loop contains procedure call

The loop could not be parallelized because it contains a procedure call that is not MT safe. If such a loop were parallelized, multiple copies of the loop might make the call simultaneously, trample on each other's use of any variables local to that procedure, or trample on return values, and generally invalidate the procedure's purpose. If you are certain that the procedure calls in this loop are MT safe, you can direct the compiler to parallelize the loop regardless by inserting the DOALL pragma before the loop. For example, if foo is an MT-safe procedure, you can force the loop that calls it to be parallelized by inserting c$par DOALL:


c$par DOALL
      do 19 i = 1, n*n
         do 29 k = i, n*n
            a(i) = a(i) + x*b(k)*c(i-k)
            call foo()
 29      continue
 19   continue

The compiler interprets the DOALL pragmas only when you compile with -parallel or -explicitpar; if you compile with -autopar, then the compiler ignores the DOALL pragmas.

2. Compiler generated two versions of this loop

The compiler couldn't tell at compile time if the loop contained enough work to be profitable to parallelize. The compiler generated two versions of the loop, a serial version and a parallel version, and a runtime check that will choose at runtime which version to execute. The runtime check determines the amount of work that the loop has to do by checking the loop iteration values.

3. Loop contains data dependency

A variable inside the loop is affected by the value of a variable in a previous iteration of the loop. For example:


      do 99 i = 1, n
         do 99 j = 1, m
            a(i,j+1) = a(i,j) + a(i,j-1)
 99   continue

This is a contrived example, since for such a simple loop the optimizer would simply swap the inner and outer loops, so that the inner loop could be parallelized. But this example demonstrates the concept of data dependency, often referred to as "data-carried dependency."

The compiler will often be able to tell you the names of the variables that cause the data-carried dependency. If you rearrange your program to remove (or minimize) such dependencies, then the compiler will be able to perform more aggressive optimizations.

4. Loop was significantly transformed during optimization

The compiler performed some optimizations on this loop that might make it almost impossible to associate the generated code with the source code. For this reason, line numbers may be incorrect. Examples of optimizations that can radically alter a loop are loop distribution, loop fusion, and loop interchange (see Hint 10, Hint 11, and Hint 12).

5. Loop may or may not hold enough work to be profitably parallelized

The compiler was not able to determine at compile time whether this loop held enough work to warrant parallelizing. Often loops that are labeled with this hint may also be labeled "parallelized," meaning that the compiler generated two versions of the loop (see Hint 2), and that it will be decided at runtime whether the parallel version or the serial version should be used.

Since all the compiler hints, including the flag that indicates whether or not a loop is parallelized, are generated at compile time, there's no way to be certain that a loop labeled "parallelized" actually executes in parallel. To determine whether a loop executes in parallel, you need to perform additional runtime tracing, such as can be accomplished with the Thread Analyzer. You can compile your programs with both -Zlp (for LoopTool) and -Ztha (for Thread Analyzer) and compare the analyses from both tools to get as much information as possible about your program's runtime behavior.

6. Loop was marked by user-inserted pragma, DOALL

This loop was parallelized because the compiler was instructed to do so by the DOALL pragma. This hint is a useful reminder to help you easily identify those loops that you explicitly wanted to parallelize.

The DOALL pragmas are interpreted by the compiler only when you compile with -parallel or -explicitpar; if you compile with -autopar, then the compiler will ignore the DOALL pragmas.

7. Loop contains multiple exits

The loop contains a GOTO or some other branch out of the loop other than the natural loop end point. For this reason, it is not safe to parallelize the loop, since the compiler has no way of predicting the loop's runtime behavior.

8. Loop contains I/O, or other function calls, that are not MT safe

This hint is similar to Hint 1; the difference is that this hint often focuses on I/O that is not MT safe, whereas Hint 1 can refer to any sort of MT-unsafe function call.

9. Loop contains backward flow of control

The loop contains a GOTO or other control flow up and out of the body of the loop. That is, some statement inside the loop appears to the compiler to jump back to some previously executed portion of code. As with the case of a loop that contains multiple exits, this loop is not safe to parallelize.

If you can reduce or minimize backward flows of control, the compiler will be able to perform more aggressive optimizations.

10. Loop may have been distributed

The contents of the loop may have been distributed over several loops. That is, the compiler may have been able to split the body of the loop into separate loops so that some of them could be parallelized. However, since this rewriting takes place in the language of the internal representation of the optimizer, it's very difficult to associate the original source code with the rewritten version. For this reason, hints about a distributed loop may refer to line numbers that don't correspond to line numbers in your source code.

11. Two or more loops may have been fused

Two consecutive loops were combined into one, so the resulting larger loop contains enough work to be profitably parallelized. Again, in this case, source line numbers for the loop may be misleading.

12. Two or more loops may have been interchanged

The loop indices of an inner and an outer loop have been swapped, to move data dependencies as far away from the inner loop as possible, and to enable this nested loop to be parallelized. In the case of deeply nested loops, the interchange may have occurred with more than two loops.

Compiler Optimizations and How They Affect Loops

As you might infer from the descriptions of the compiler hints, associating optimized code with source code can be tricky. Clearly, you would prefer to see information from the compiler presented to you in a way that relates as directly as possible to your source code. Unfortunately, the compiler optimizer "reads" your program in terms of its internal language, and although it tries to relate that to your source code, it is not always successful.

Some particular optimizations that can cause confusion are described in the following sections.