Sun ANSI C Compiler-Specific Information
If you use a Bourne shell, type:
$ TMPDIR=dir; export TMPDIR

If you use a C shell, type:
% setenv TMPDIR dir
Global Behavior: Value versus unsigned Preserving
A program that depends on unsigned preserving arithmetic conversions behaves differently under the value preserving rules used by ANSI C. This is considered to be the most serious change made by ANSI C.
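For example, a comparison like the following (a minimal sketch, not taken from the manual) gives different results under the two sets of rules:

#include <stdio.h>

int main(void)
{
    unsigned char uc = 'A';   /* 65 */
    int i = -1;

    /* Value preserving (ANSI C): uc promotes to int, the comparison is
       signed, and 65 > -1 is true.
       Unsigned preserving (older rules): uc promotes to unsigned int,
       i is converted to a very large unsigned value, and the
       comparison is false. */
    if (uc > i)
        printf("value preserving rules\n");
    else
        printf("unsigned preserving rules\n");
    return 0;
}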
Keywords
asm Keyword
The _asm keyword is a synonym for the asm keyword. asm is available under all compilation modes, although a warning is issued when it is used under the -Xc mode.
asm("string"): |
where string is a valid assembly language statement.
main()
{
    int i;                    /* i = 10 */
    asm("mov 10,%l0");
    asm("st %l0,[%fp-8]");
    printf("i = %d\n", i);
}

% cc foo.c
% a.out
i = 10
%
asm statements must appear within function bodies.
_Restrict Keyword
For a compiler to effectively perform parallel execution of a loop, it needs to determine if certain lvalues designate distinct regions of storage. Aliases are lvalues whose regions of storage are not distinct. Determining if two pointers to objects are aliases is a difficult and time-consuming process because it could require analysis of the entire program.
void vsq(int n, double * a, double * b)
{
    int i;
    for (i=0; i<n; i++)
        b[i] = a[i] * a[i];
}
The compiler can parallelize the execution of the different iterations of the loops if it knows that pointers a and b access different objects. If there is an overlap in objects accessed through pointers a and b then it would be unsafe for the compiler to execute the loops in parallel. At compile time, the compiler does not know if the objects accessed by a and b overlap by simply analyzing the function vsq(); the compiler may need to analyze the whole program to get this information.
void vsq(int n, double * _Restrict a, double * _Restrict b)
Pointers a and b are declared as restricted pointers, so the compiler knows that the regions of storage pointed to by a and b are distinct. With this alias information, the compiler is able to parallelize the loop.
#ifdef __RESTRICT
#define restrict _Restrict
#else
#define restrict
#endif

void vsq(int n, double * restrict a, double * restrict b)
{
    int i;
    for (i=0; i<n; i++)
        b[i] = a[i] * a[i];
}
It is recommended that you use restrict through a macro:

#define restrict _Restrict

as in vsq(), because this way there will be minimal changes should restrict become a keyword in the ANSI C Standard. The Sun ANSI C compiler uses _Restrict as the keyword because it is in the implementor's name space, so there is no conflict with identifiers in the user's name space.
If a function list is specified with the -xrestrict option, pointer parameters in the specified functions are treated as restricted; otherwise, all pointer parameters in the entire C file are treated as restricted. For example, -xrestrict=vsq would qualify the pointers a and b of the function vsq() shown above with the keyword _Restrict.
It is critical that _Restrict be used correctly. If pointers qualified as restricted pointers point to objects which are not distinct, loops may be incorrectly parallelized, resulting in undefined behavior. For example, assume that pointers a and b of function vsq() point to objects which overlap, such that b[i] and a[i+1] are the same object. If a and b are not declared as restricted pointers, the loops will be executed serially. If a and b are incorrectly qualified as restricted pointers, the compiler may parallelize the execution of the loops; this is not safe, because b[i+1] should only be computed after b[i] has been computed.
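As a hedged illustration, a call such as the following (the array name arr is hypothetical) sets up exactly that overlap, so the _Restrict-qualified version of vsq() must not be used for it:

extern void vsq(int n, double *a, double *b);

void bad_call(void)
{
    double arr[101];              /* initialization omitted */

    /* a = arr and b = arr + 1, so b[i] and a[i+1] are the same element:
       iteration i+1 depends on iteration i, and parallel execution of
       the loop would be unsafe. */
    vsq(100, arr, arr + 1);
}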
long long Data Type
The Sun ANSI C compiler includes the data types long long and unsigned long long, which are similar to the data type long. long long can store 64 bits of information, while long can store 32 bits. long long is not available in -Xc mode.
Printing long long Data Types
To print or scan long long data types, prefix the conversion specifier with the letters "ll." For example, to print llvar, a variable of long long data type, in signed decimal format, use:
printf("%lld\n", llvar); |
Usual Arithmetic Conversions
Some binary operators convert the types of their operands to yield a common type, which is also the type of the result. These are called the usual arithmetic conversions.

The suffix of an integer constant determines its type:

| Suffix | Type |
|---|---|
| u or U | unsigned |
| l or L | long |
| ll or LL | long long¹ |
| lu, LU, Lu, lU, ul, uL, Ul, or UL | unsigned long |
| llu, LLU, LLu, llU, ull, ULL, uLL, or Ull | unsigned long long¹ |

¹ long long and unsigned long long are not available in -Xc mode.
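For example (a short sketch; the variable names are illustrative), each constant below gets its type from its suffix:

unsigned int       u   = 42u;
long               lng = 42L;
long long          llg = 42LL;    /* not available in -Xc mode */
unsigned long      ulg = 42UL;
unsigned long long ull = 42ULL;   /* not available in -Xc mode */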
The value of a multiple-character character constant such as '123' is implementation-defined. It may be stored with the bytes

0   '3'   '2'   '1'

that is, as 0x333231, or with the bytes

0   '1'   '2'   '3'

that is, as 0x313233.
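One hedged way to see which representation a particular compiler and mode produce is simply to print the constant (most compilers warn about multiple-character constants, but the program is valid):

#include <stdio.h>

int main(void)
{
    /* prints 333231 or 313233, depending on the representation in use */
    printf("%x\n", '123');
    return 0;
}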
Include Files
To include any of the standard header files supplied with the C compilation system, use this format:
#include <stdio.h>
The angle brackets (<>) cause the preprocessor to search for the header file in the standard place for header files on your system, usually the /usr/include directory.
#include "header.h" |
The quotation marks (" ") cause the preprocessor to search for header.h first in the directory of the file containing the #include line.
Suppose, for example, that the file mycode.c contains these two lines:

#include <stdio.h>
#include "header.h"

Suppose further that header.h is stored in the directory ../defs. The command:
% cc -I../defs mycode.c
directs the preprocessor to search for header.h first in the directory containing mycode.c, then in the directory ../defs, and finally in the standard place. It also directs the preprocessor to search for stdio.h first in ../defs, then in the standard place. The difference is that the current directory is searched only for header files whose names you have enclosed in quotation marks. To name the resulting executable prog as well, use:

% cc -o prog -I../defs mycode.c
Nonstandard Floating Point
By default, IEEE 754 floating-point arithmetic is "nonstop," and underflows are "gradual." Following is a summary; see the Numerical Computation Guide for details.

Consider, for example, a division in which x is zero and y is positive:

z = y / x;
By default, z is set to the value +Inf, and execution continues. With the -fnonstd option, however, this code causes an exit, such as a core dump.
Now consider gradual underflow, using a loop that repeatedly divides x by 10:

x = 10;
for (...) {
    x = x / 10;
}
The first time through the loop, x is set to 1; the second time through, to 0.1; the third time through, to 0.01; and so on. Eventually, x reaches the lower limit of the machine's capacity to represent its value. What happens the next time the loop runs?
Say x is now:

1.234567e-38

The next time the loop runs, the number is modified by "stealing" from the mantissa and "giving" to the exponent:

1.23456e-39

and, subsequently,

1.2345e-40
and so on. This is known as "gradual underflow," which is the default behavior. In nonstandard behavior, none of this "stealing" takes place; typically, x is simply set to zero.
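The effect can be observed directly; the following rough sketch (the loop bound is arbitrary) counts how many divisions it takes before x finally reaches zero:

#include <stdio.h>

int main(void)
{
    double x = 10.0;
    int i;

    /* With default IEEE 754 arithmetic, x passes through gradually
       smaller denormal values before reaching zero.  Compiled with
       -fnonstd, x is typically flushed to zero much sooner. */
    for (i = 0; i < 400 && x != 0.0; i++)
        x = x / 10.0;

    printf("x reached zero after %d divisions\n", i);
    return 0;
}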
Preprocessing Directives
This section describes assertions, pragmas, and predefined names.
Assertions
A line of the form:
#assert predicate (token-sequence)

associates the token-sequence with the predicate. A line of the form:

#assert predicate

asserts that predicate exists, but does not associate any token sequence with it.
The compiler provides the following predefinitions by default (not in -Xc mode):
#assert system (unix)
#assert machine (sparc)     (SPARC)
#assert machine (i386)      (Intel)
#assert machine (ppc)       (PowerPC)
#assert cpu (sparc)         (SPARC)
#assert cpu (i386)          (Intel)
#assert cpu (ppc)           (PowerPC)
lint provides the following predefinition predicate by default (not in -Xc mode):
#assert lint (on)
Any assertion may be removed by using #unassert, which uses the same syntax as #assert. Using #unassert with no argument deletes all assertions on the predicate; specifying an assertion deletes only that assertion.
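For example, a brief sketch using the predefined machine predicate shown above:

#unassert machine           /* deletes all assertions on the machine predicate */
#unassert machine (i386)    /* deletes only the machine(i386) assertion */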
An assertion can be tested in an #if preprocessing line with the syntax:

#if #predicate(non-empty token-list)
For example, the predefined predicate system can be tested with the following line:
#if #system(unix) |
which evaluates true.
Pragmas
Preprocessing lines of the form:
#pragma pp-tokens
specify implementation-defined actions.
For example, the align pragma requests that the listed variables be memory-aligned on the specified number of bytes:

#pragma align 64 (aninteger, astring, astruct)
int aninteger;
static char astring[256];
struct astruct{int a; char *b;};
#pragma weak symbol

defines symbol to be a weak global symbol.

#pragma weak symbol1 = symbol2

defines symbol1 to be a weak symbol whose value is that of symbol2.
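As a hedged sketch of the second form (the names bar and _bar are illustrative), a library can expose a weak name for an internal definition:

void _bar(int x);

/* bar becomes a weak alias for _bar; a strong definition of bar elsewhere,
   if present at link time, takes precedence without a link error. */
#pragma weak bar = _bar

void _bar(int x)
{
    (void)x;    /* real work would go here */
}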
| Identifier | Description |
|---|---|
| __STDC__ | 1 in -Xc mode; 0 in -Xa and -Xt modes; not defined in -Xs mode |
The compiler will issue a warning if __STDC__ is undefined (#undef __STDC__). __STDC__ is not defined in -Xs mode.
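A small sketch of how source code might test these values:

#if defined(__STDC__) && __STDC__ == 1
    /* -Xc mode: strictly conforming ANSI C */
#elif defined(__STDC__)
    /* -Xa or -Xt mode: __STDC__ is defined as 0 */
#else
    /* -Xs mode: __STDC__ is not defined */
#endif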
__PRAGMA_REDEFINE_EXTNAME is predefined to indicate that the redefine_extname pragma will be recognized.
The following is predefined in -Xa and -Xt modes only:

__RESTRICT
MP C
The SunSoft WorkShop includes the license required to use the features of MP C.
This section contains an overview and example of using MP C, and documents the environment variable, keyword, pragmas, and options used with MP C.
Refer to the "MP C" white paper, located in /opt/SUNWspro/READMEs/mpc.ps, for examples on using MP C and for further reference information.
Because of the way aliasing works in C, it is difficult to determine the safety of parallelization. To help the compiler, MP C offers pragmas and additional pointer qualifications to provide aliasing information known to the programmer that the compiler cannot determine.
For example, to compile a file example.c with automatic parallelization:

% cc -fast -xO4 -xautopar example.c -o example

This generates an executable called example, which can be executed normally.

% setenv PARALLEL 2

This enables the execution of the program on two threads. If the target machine has multiple processors, the threads can map to independent processors.

% example

Running the program leads to the creation of two threads that execute the parallelized portions of the program.
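A hedged sketch of what example.c might contain; a loop whose iterations are independent, like the second loop below, is the kind of candidate -xautopar can parallelize automatically:

/* example.c (hypothetical) */
#include <stdio.h>

#define N 1000000

double a[N], b[N];

int main(void)
{
    int i;

    for (i = 0; i < N; i++)
        a[i] = i * 0.5;

    /* each iteration writes a distinct element of b: a candidate
       for automatic parallelization */
    for (i = 0; i < N; i++)
        b[i] = a[i] * a[i];

    printf("b[N-1] = %f\n", b[N - 1]);
    return 0;
}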
Explicit Parallelization and Pragmas
Often, there is not enough information available for the compiler to make a decision on the legality or profitability of parallelization. MP C supports pragmas that allow the programmer to effectively parallelize loops that otherwise would be too difficult or impossible for the compiler to handle.
Serial Pragmas
There are two serial pragmas, and both apply to "for" loops:

#pragma MP serial_loop
#pragma MP serial_loop_nested

The #pragma MP serial_loop pragma indicates to the compiler that the next for loop is not to be implicitly/automatically parallelized. The #pragma MP serial_loop_nested pragma indicates to the compiler that the next for loop, and any for loops nested within the scope of that for loop, are not to be implicitly/automatically parallelized. The scope of the serial_loop_nested pragma does not extend beyond the scope of the loop to which it applies.
The MP taskloop pragma can, optionally, take one or more of the following arguments, each of which is discussed below: maxcpus, private, shared, storeback, savelast, reduction, and schedtype.
#pragma MP taskloop maxcpus(4)

These options may appear multiple times prior to the for loop to which they apply. In case of conflicting options, the compiler will issue a warning message.
#pragma MP taskloop shared(a,b)
#pragma MP taskloop storeback(x)
Loops with irregular control flow or an unknown loop iteration increment are not eligible for parallelization. For example, for loops containing setjmp, longjmp, exit, abort, return, goto, labels, or break are not considered candidates for parallelization.
It is particularly important to note that for loops with inter-iteration dependencies can be eligible for explicit parallelization. This means that if an MP taskloop pragma is specified for such a loop, the compiler will simply honor it, unless the for loop is disqualified. It is the user's responsibility to make sure that such explicit parallelization will not lead to incorrect results.
If both the serial_loop or serial_loop_nested and taskloop pragmas are specified for a for loop, the last one specified will prevail.
Consider the following example:
#pragma MP serial_loop_nested
for (i=0; i<100; i++) {
# pragma MP taskloop
    for (j=0; j<1000; j++) {
        ...
    }
}

The i loop will not be parallelized, but the j loop might be.
The value of maxcpus must be a positive integer. If maxcpus equals 1, then the specified loop will be executed in serial. (Note that setting maxcpus to be 1 is equivalent to specifying the serial_loop pragma.) The smaller of the values of maxcpus or the interpreted value of the PARALLEL environment variable will be used. When the environment variable PARALLEL is not specified, it is interpreted as having the value 1.
If more than one maxcpus pragma is specified for a for loop, the last one specified will prevail.
In analyzing explicitly parallelized loops, the compiler uses the following "default scoping rules" to determine whether a variable is private or shared:
Since the compiler does not perform any synchronization on accesses to shared variables, extreme care must be exercised before using an MP taskloop pragma for a loop that contains, for example, array references. If inter-iteration data dependencies exist in such an explicitly parallelized loop, then its parallel execution may give erroneous results. The compiler may or may not be able to detect such a potential problem situation and issue a warning message. In any case, the compiler will not disable the explicit parallelization of loops with potential shared variable problems.
A private variable is one whose value is private to each processor processing some iterations of a loop. In other words, the value assigned to a private variable by one of the processors working on iterations of a loop is not propagated to other processors processing other iterations of that loop. A private variable has no initial value at the start of each iteration of a loop and must be set to a value within the iteration of a loop prior to its first use within that iteration. Execution of a program with a loop containing an explicitly declared private variable whose value is used prior to being set will result in undefined behavior.
A shared variable is a variable whose current value is accessible by all processors processing iterations of a for loop. The value assigned to a shared variable by one processor working on iterations of a loop may be seen by other processors working on other iterations of the loop.
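A hedged sketch combining the two kinds of variables (the function and variable names are illustrative):

/* tmp is private to each processor; a and out are shared, but every
   iteration touches a distinct element, so no synchronization is needed. */
void scale(int n, double *a, double *out)
{
    int i;
    double tmp;

#pragma MP taskloop private(tmp)
#pragma MP taskloop shared(a, out)
    for (i = 0; i < n; i++) {
        tmp = 2.0 * a[i];   /* set before its first use in the iteration */
        out[i] = tmp;
    }
}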
A storeback variable is one whose value is computed in a loop, and this computed value is then used after the termination of the loop. The last loop iteration values of storeback variables are available for use after the termination of the loop. Such a variable is a good candidate to be declared explicitly as a storeback variable, via the storeback argument, when it is a private variable, whether it was declared private explicitly or by the default scoping rules.
Note that the storeback operation for a storeback variable occurs at the last iteration of the explicitly parallelized loop, regardless of whether or not that iteration updates the value of the storeback variable. In other words the processor that processes the last iteration of a loop may not be the same processor that currently contains the last updated value for a storeback variable. Consider the following example:
#pragma MP taskloop private(x)
#pragma MP taskloop storeback(x)
for (i=1; i <= n; i++) {
    if (...) {
        x = ...
    }
}
printf("%d", x);

In the above example, the value of the storeback variable x printed out via the printf() call may not be the same as that printed out by a serial version of the i loop. In the explicitly parallelized case, the processor that processes the last iteration of the loop (when i==n), and therefore performs the storeback operation for x, may not be the same processor that currently contains the last updated value for x. The compiler will attempt to issue a warning message to alert the user to such potential problems.
In an explicitly parallelized loop, variables referenced as arrays are not treated as storeback variables. Hence it is important to include them in the list_of_storeback_variables if such a storeback operation is desired (for example, if the variables referenced as arrays have been declared as private variables).
#pragma MP taskloop savelast

It is often convenient to use this form, rather than listing each private variable of the loop, when all the private variables of a loop are to be declared as storeback variables.
Consider the following example:
#pragma MP taskloop reduction(x)
for (i=0; i<n; i++) {
    x = x + a[i];
}

Here, the variable x is a (sum) reduction variable and the i loop is a (sum) reduction loop.
#pragma MP taskloop schedtype (scheduling_type)

This pragma can be used to specify the specific scheduling_type to be used to schedule the parallelized loop. scheduling_type can be one of the following: static, self, gss, or factoring, as the examples below illustrate.
Example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(static)
for (i=0; i<1000; i++) {
    ...
}

In the above example, each of the four processors will process 250 iterations of the loop.
Example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(self(120))
for (i=0; i<1000; i++) {
    ...
}

In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are:
Example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(gss(10))
for (i=0; i<1000; i++) {
    ...
}

In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are:
Example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(factoring(10))
for (i=0; i<1000; i++) {
    ...
}

In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are: