Sun ANSI C Compiler-Specific Information
If you use a Bourne shell, type:
$ TMPDIR=dir; export TMPDIR

If you use a C shell, type:
% setenv TMPDIR dir
Global Behavior: Value versus unsigned Preserving
A program that depends on unsigned preserving arithmetic conversions behaves differently under the value preserving rules used by ANSI C. This is considered to be the most serious change made by ANSI C.
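For example, a comparison like the following (a minimal sketch, not taken from the manual) gives different results under the two sets of rules:

#include <stdio.h>

int main(void)
{
    unsigned char uc = 'A';   /* 65 */
    int i = -1;

    /* Value preserving (ANSI C): uc promotes to int, the comparison is
       signed, and 65 > -1 is true.
       Unsigned preserving (older rules): uc promotes to unsigned int,
       i is converted to a very large unsigned value, and the
       comparison is false. */
    if (uc > i)
        printf("value preserving rules\n");
    else
        printf("unsigned preserving rules\n");
    return 0;
}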
Keywords
asm Keyword
The _asm keyword is a synonym for the asm keyword. asm is available under all compilation modes, although a warning is issued when it is used under the -Xc mode.
asm("string"): |
where string is a valid assembly language statement.
main()
{
    int i;                    /* i = 10 */
    asm("mov 10,%l0");
    asm("st %l0,[%fp-8]");
    printf("i = %d\n", i);
}

% cc foo.c
% a.out
i = 10
%
asm statements must appear within function bodies.
_Restrict Keyword
For a compiler to effectively perform parallel execution of a loop, it needs to determine if certain lvalues designate distinct regions of storage. Aliases are lvalues whose regions of storage are not distinct. Determining if two pointers to objects are aliases is a difficult and time-consuming process because it could require analysis of the entire program.
void vsq(int n, double * a, double * b)
{
    int i;
    for (i=0; i<n; i++)
        b[i] = a[i] * a[i];
}
The compiler can parallelize the execution of the different iterations of the loops if it knows that pointers a and b access different objects. If there is an overlap in objects accessed through pointers a and b then it would be unsafe for the compiler to execute the loops in parallel. At compile time, the compiler does not know if the objects accessed by a and b overlap by simply analyzing the function vsq(); the compiler may need to analyze the whole program to get this information.
void vsq(int n, double * _Restrict a, double * _Restrict b)
Pointers a and b are declared as restricted pointers, so the compiler knows that the regions of storage pointed to by a and b are distinct. With this alias information, the compiler is able to parallelize the loop.
#ifdef __RESTRICT
#define restrict _Restrict
#else
#define restrict
#endif

void vsq(int n, double * restrict a, double * restrict b)
{
    int i;
    for (i=0; i<n; i++)
        b[i] = a[i] * a[i];
}
It is recommended that you use restrict through a macro:

#define restrict _Restrict

as in vsq(), because this way there will be minimal changes should restrict become a keyword in the ANSI C Standard. The Sun ANSI C compiler uses _Restrict as the keyword because it is in the implementor's name space, so there is no conflict with identifiers in the user's name space.
If a function list is specified with the -xrestrict option, pointer parameters in the specified functions are treated as restricted; otherwise, all pointer parameters in the entire C file are treated as restricted. For example, -xrestrict=vsq would qualify the pointers a and b of the function vsq() shown above with the keyword _Restrict.
It is critical that _Restrict be used correctly. If pointers qualified as restricted pointers point to objects which are not distinct, loops may be incorrectly parallelized, resulting in undefined behavior. For example, assume that pointers a and b of function vsq() point to objects which overlap, such that b[i] and a[i+1] are the same object. If a and b are not declared as restricted pointers, the loops will be executed serially. If a and b are incorrectly qualified as restricted pointers, the compiler may parallelize the execution of the loops; this is not safe, because b[i+1] should only be computed after b[i] has been computed.
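As a hedged illustration, a call such as the following (the array name arr is hypothetical) sets up exactly that overlap, so the _Restrict-qualified version of vsq() must not be used for it:

extern void vsq(int n, double *a, double *b);

void bad_call(void)
{
    double arr[101];              /* initialization omitted */

    /* a = arr and b = arr + 1, so b[i] and a[i+1] are the same element:
       iteration i+1 depends on iteration i, and parallel execution of
       the loop would be unsafe. */
    vsq(100, arr, arr + 1);
}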
long long Data Type
The Sun ANSI C compiler includes the data types long long and unsigned long long, which are similar to the data type long. long long can store 64 bits of information, while long can store 32 bits. long long is not available in -Xc mode.
Printing long long Data Types
To print or scan long long data types, prefix the conversion specifier with the letters "ll." For example, to print llvar, a variable of long long data type, in signed decimal format, use:
printf("%lld\n", llvar); |
Usual Arithmetic Conversions
Some binary operators convert the types of their operands to yield a common type, which is also the type of the result. These are called the usual arithmetic conversions.

The suffix of an integer constant determines its type:

| Suffix | Type |
|---|---|
| u or U | unsigned |
| l or L | long |
| ll or LL | long long¹ |
| lu, LU, Lu, lU, ul, uL, Ul, or UL | unsigned long |
| llu, LLU, LLu, llU, ull, ULL, uLL, or Ull | unsigned long long¹ |

¹ long long and unsigned long long are not available in -Xc mode.
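For example (a short sketch; the variable names are illustrative), each constant below gets its type from its suffix:

unsigned int       u   = 42u;
long               lng = 42L;
long long          llg = 42LL;    /* not available in -Xc mode */
unsigned long      ulg = 42UL;
unsigned long long ull = 42ULL;   /* not available in -Xc mode */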
The value of a multiple-character character constant such as '123' is implementation-defined. It may be stored with the bytes

0   '3'   '2'   '1'

that is, as 0x333231, or with the bytes

0   '1'   '2'   '3'

that is, as 0x313233.
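One hedged way to see which representation a particular compiler and mode produce is simply to print the constant (most compilers warn about multiple-character constants, but the program is valid):

#include <stdio.h>

int main(void)
{
    /* prints 333231 or 313233, depending on the representation in use */
    printf("%x\n", '123');
    return 0;
}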
Include Files
To include any of the standard header files supplied with the C compilation system, use this format:
#include <stdio.h>
The angle brackets (<>) cause the preprocessor to search for the header file in the standard place for header files on your system, usually the /usr/include directory.
#include "header.h" |
The quotation marks (" ") cause the preprocessor to search for header.h first in the directory of the file containing the #include line.
Suppose, for example, that the file mycode.c contains these two lines:

#include <stdio.h>
#include "header.h"

Suppose further that header.h is stored in the directory ../defs. The command:
% cc -I../defs mycode.c
directs the preprocessor to search for header.h first in the directory containing mycode.c, then in the directory ../defs, and finally in the standard place. It also directs the preprocessor to search for stdio.h first in ../defs, then in the standard place. The difference is that the current directory is searched only for header files whose names you have enclosed in quotation marks. To name the resulting executable prog as well, use:

% cc -o prog -I../defs mycode.c
Nonstandard Floating Point
By default, IEEE 754 floating-point arithmetic is "nonstop," and underflows are "gradual." Following is a summary; see the Numerical Computation Guide for details.

Consider, for example, a division in which x is zero and y is positive:

z = y / x;
By default, z is set to the value +Inf, and execution continues. With the -fnonstd option, however, this code causes an exit, such as a core dump.
Now consider gradual underflow, using a loop that repeatedly divides x by 10:

x = 10;
for (...) {
    x = x / 10;
}
The first time through the loop, x is set to 1; the second time through, to 0.1; the third time through, to 0.01; and so on. Eventually, x reaches the lower limit of the machine's capacity to represent its value. What happens the next time the loop runs?
Say x is now:

1.234567e-38

The next time the loop runs, the number is modified by "stealing" from the mantissa and "giving" to the exponent:

1.23456e-39

and, subsequently,

1.2345e-40
and so on. This is known as "gradual underflow," which is the default behavior. In nonstandard behavior, none of this "stealing" takes place; typically, x is simply set to zero.
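The effect can be observed directly; the following rough sketch (the loop bound is arbitrary) counts how many divisions it takes before x finally reaches zero:

#include <stdio.h>

int main(void)
{
    double x = 10.0;
    int i;

    /* With default IEEE 754 arithmetic, x passes through gradually
       smaller denormal values before reaching zero.  Compiled with
       -fnonstd, x is typically flushed to zero much sooner. */
    for (i = 0; i < 400 && x != 0.0; i++)
        x = x / 10.0;

    printf("x reached zero after %d divisions\n", i);
    return 0;
}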
Preprocessing Directives
This section describes assertions, pragmas, and predefined names.
Assertions
A line of the form:
#assert predicate (token-sequence)

associates the token-sequence with the predicate. A line of the form:

#assert predicate

asserts that predicate exists, but does not associate any token sequence with it.
The compiler provides the following predefinitions by default (not in -Xc mode):
#assert system (unix)
#assert machine (sparc)     (SPARC)
#assert machine (i386)      (Intel)
#assert machine (ppc)       (PowerPC)
#assert cpu (sparc)         (SPARC)
#assert cpu (i386)          (Intel)
#assert cpu (ppc)           (PowerPC)
lint provides the following predefinition predicate by default (not in -Xc mode):
#assert lint (on)
Any assertion may be removed by using #unassert, which uses the same syntax as #assert. Using #unassert with no argument deletes all assertions on the predicate; specifying an assertion deletes only that assertion.
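For example, a brief sketch using the predefined machine predicate shown above:

#unassert machine           /* deletes all assertions on the machine predicate */
#unassert machine (i386)    /* deletes only the machine(i386) assertion */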
An assertion can be tested in an #if preprocessing line with the syntax:

#if #predicate(non-empty token-list)
For example, the predefined predicate system can be tested with the following line:
#if #system(unix) |
which evaluates true.
Pragmas
Preprocessing lines of the form:
#pragma pp-tokens
specify implementation-defined actions.
For example, the align pragma requests that the listed variables be memory-aligned on the specified number of bytes:

#pragma align 64 (aninteger, astring, astruct)
int aninteger;
static char astring[256];
struct astruct{int a; char *b;};
#pragma weak symbol

defines symbol to be a weak global symbol.

#pragma weak symbol1 = symbol2

defines symbol1 to be a weak symbol whose value is that of symbol2.
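As a hedged sketch of the second form (the names bar and _bar are illustrative), a library can expose a weak name for an internal definition:

void _bar(int x);

/* bar becomes a weak alias for _bar; a strong definition of bar elsewhere,
   if present at link time, takes precedence without a link error. */
#pragma weak bar = _bar

void _bar(int x)
{
    (void)x;    /* real work would go here */
}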
| Identifier | Description |
|---|---|
| __STDC__ | 1 in -Xc mode; 0 in -Xa and -Xt modes; not defined in -Xs mode |
The compiler will issue a warning if __STDC__ is undefined (#undef __STDC__). __STDC__ is not defined in -Xs mode.
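A small sketch of how source code might test these values:

#if defined(__STDC__) && __STDC__ == 1
    /* -Xc mode: strictly conforming ANSI C */
#elif defined(__STDC__)
    /* -Xa or -Xt mode: __STDC__ is defined as 0 */
#else
    /* -Xs mode: __STDC__ is not defined */
#endif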
__PRAGMA_REDEFINE_EXTNAME is predefined to indicate that the redefine_extname pragma will be recognized.
The following is predefined in -Xa and -Xt modes only:

__RESTRICT
MP C
The SunSoft WorkShop includes the license required to use the features of MP C.
This section contains an overview and example of using MP C, and documents the environment variable, keyword, pragmas, and options used with MP C.
Refer to the "MP C" white paper, located in /opt/SUNWspro/READMEs/mpc.ps, for examples on using MP C and for further reference information.
Because of the way aliasing works in C, it is difficult to determine the safety of parallelization. To help the compiler, MP C offers pragmas and additional pointer qualifications to provide aliasing information known to the programmer that the compiler cannot determine.
For example, to compile a file example.c with automatic parallelization:

% cc -fast -xO4 -xautopar example.c -o example

This generates an executable called example, which can be executed normally.

% setenv PARALLEL 2

This enables the execution of the program on two threads. If the target machine has multiple processors, the threads can map to independent processors.

% example

Running the program leads to the creation of two threads that execute the parallelized portions of the program.
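A hedged sketch of what example.c might contain; a loop whose iterations are independent, like the second loop below, is the kind of candidate -xautopar can parallelize automatically:

/* example.c (hypothetical) */
#include <stdio.h>

#define N 1000000

double a[N], b[N];

int main(void)
{
    int i;

    for (i = 0; i < N; i++)
        a[i] = i * 0.5;

    /* each iteration writes a distinct element of b: a candidate
       for automatic parallelization */
    for (i = 0; i < N; i++)
        b[i] = a[i] * a[i];

    printf("b[N-1] = %f\n", b[N - 1]);
    return 0;
}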
Explicit Parallelization and Pragmas
Often, there is not enough information available for the compiler to make a decision on the legality or profitability of parallelization. MP C supports pragmas that allow the programmer to effectively parallelize loops that otherwise would be too difficult or impossible for the compiler to handle.
Serial Pragmas
There are two serial pragmas, and both apply to "for" loops:

#pragma MP serial_loop
#pragma MP serial_loop_nested

The #pragma MP serial_loop pragma indicates to the compiler that the next for loop is not to be implicitly/automatically parallelized. The #pragma MP serial_loop_nested pragma indicates to the compiler that the next for loop, and any for loops nested within the scope of that for loop, are not to be implicitly/automatically parallelized. The scope of the serial_loop_nested pragma does not extend beyond the scope of the loop to which it applies.
The MP taskloop pragma can, optionally, take one or more of the following arguments, each of which is discussed below: maxcpus, private, shared, storeback, savelast, reduction, and schedtype.
#pragma MP taskloop maxcpus(4)

These options may appear multiple times prior to the for loop to which they apply. In case of conflicting options, the compiler will issue a warning message.
#pragma MP taskloop shared(a,b)
#pragma MP taskloop storeback(x)
Loops with irregular control flow or an unknown loop iteration increment are not eligible for parallelization. For example, for loops containing setjmp, longjmp, exit, abort, return, goto, labels, or break are not considered candidates for parallelization.
It is particularly important to note that for loops with inter-iteration dependencies can be eligible for explicit parallelization. This means that if an MP taskloop pragma is specified for such a loop, the compiler will simply honor it, unless the for loop is disqualified. It is the user's responsibility to make sure that such explicit parallelization will not lead to incorrect results.
If both the serial_loop or serial_loop_nested and taskloop pragmas are specified for a for loop, the last one specified will prevail.
Consider the following example:
#pragma MP serial_loop_nested
for (i=0; i<100; i++) {
# pragma MP taskloop
    for (j=0; j<1000; j++) {
        ...
    }
}

The i loop will not be parallelized, but the j loop might be.
The value of maxcpus must be a positive integer. If maxcpus equals 1, then the specified loop will be executed in serial. (Note that setting maxcpus to be 1 is equivalent to specifying the serial_loop pragma.) The smaller of the values of maxcpus or the interpreted value of the PARALLEL environment variable will be used. When the environment variable PARALLEL is not specified, it is interpreted as having the value 1.
If more than one maxcpus pragma is specified for a for loop, the last one specified will prevail.
In analyzing explicitly parallelized loops, the compiler uses the following "default scoping rules" to determine whether a variable is private or shared:
Since the compiler does not perform any synchronization on accesses to shared variables, extreme care must be exercised before using an MP taskloop pragma for a loop that contains, for example, array references. If inter-iteration data dependencies exist in such an explicitly parallelized loop, then its parallel execution may give erroneous results. The compiler may or may not be able to detect such a potential problem situation and issue a warning message. In any case, the compiler will not disable the explicit parallelization of loops with potential shared variable problems.
A private variable is one whose value is private to each processor processing some iterations of a loop. In other words, the value assigned to a private variable by one of the processors working on iterations of a loop is not propagated to other processors processing other iterations of that loop. A private variable has no initial value at the start of each iteration of a loop and must be set to a value within the iteration of a loop prior to its first use within that iteration. Execution of a program with a loop containing an explicitly declared private variable whose value is used prior to being set will result in undefined behavior.
A shared variable is a variable whose current value is accessible by all processors processing iterations of a for loop. The value assigned to a shared variable by one processor working on iterations of a loop may be seen by other processors working on other iterations of the loop.
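A hedged sketch combining the two kinds of variables (the function and variable names are illustrative):

/* tmp is private to each processor; a and out are shared, but every
   iteration touches a distinct element, so no synchronization is needed. */
void scale(int n, double *a, double *out)
{
    int i;
    double tmp;

#pragma MP taskloop private(tmp)
#pragma MP taskloop shared(a, out)
    for (i = 0; i < n; i++) {
        tmp = 2.0 * a[i];   /* set before its first use in the iteration */
        out[i] = tmp;
    }
}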
A storeback variable is one whose value is computed in a loop, and this computed value is then used after the termination of the loop. The last loop iteration values of storeback variables are available for use after the termination of the loop. Such a variable is a good candidate to be declared explicitly as a storeback variable, via the storeback argument, when it is a private variable, whether it was declared private explicitly or by the default scoping rules.
Note that the storeback operation for a storeback variable occurs at the last iteration of the explicitly parallelized loop, regardless of whether or not that iteration updates the value of the storeback variable. In other words the processor that processes the last iteration of a loop may not be the same processor that currently contains the last updated value for a storeback variable. Consider the following example:
#pragma MP taskloop private(x)
#pragma MP taskloop storeback(x)
for (i=1; i <= n; i++) {
    if (...) {
        x = ...
    }
}
printf("%d", x);

In the above example, the value of the storeback variable x printed out via the printf() call may not be the same as that printed out by a serial version of the i loop. In the explicitly parallelized case, the processor that processes the last iteration of the loop (when i==n), and therefore performs the storeback operation for x, may not be the same processor that currently contains the last updated value for x. The compiler will attempt to issue a warning message to alert the user to such potential problems.
In an explicitly parallelized loop, variables referenced as arrays are not treated as storeback variables. Hence it is important to include them in the list_of_storeback_variables if such a storeback operation is desired (for example, if the variables referenced as arrays have been declared as private variables).
#pragma MP taskloop savelast

It is often convenient to use this form, rather than listing each private variable of the loop, when all the private variables of a loop are to be declared as storeback variables.
Consider the following example:
#pragma MP taskloop reduction(x)
for (i=0; i<n; i++) {
    x = x + a[i];
}

Here, the variable x is a (sum) reduction variable and the i loop is a (sum) reduction loop.
#pragma MP taskloop schedtype (scheduling_type)

This pragma can be used to specify the specific scheduling_type to be used to schedule the parallelized loop. scheduling_type can be one of the following: static, self, gss, or factoring, as the examples below illustrate.
Example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(static)
for (i=0; i<1000; i++) {
    ...
}

In the above example, each of the four processors will process 250 iterations of the loop.
Example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(self(120))
for (i=0; i<1000; i++) {
    ...
}

In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are:
Example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(gss(10))
for (i=0; i<1000; i++) {
    ...
}

In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are:
Example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(factoring(10))
for (i=0; i<1000; i++) {
    ...
}

In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are: