Appendix D: PowerPC Behavior and Implementation
A fused multiply-add operation computes a product and sum, a × b + c, with only one final rounding; the intermediate product a × b is not rounded. Consequently, evaluating an expression that involves such a product and sum (for example, in an inner product or saxpy) can produce different results depending on whether the compiler generates a fused multiply-add instruction or a sequence of two instructions, a multiply followed by an add. Typically, the expression evaluated using a fused multiply-add operation will be at least as accurate as, and possibly more accurate than, the expression evaluated using separate multiply and add instructions, since the latter incur two rounding errors rather than one. In fact, some codes are written specifically to exploit the accuracy of the fused multiply-add. As is often true in numerical software, however, there are exceptions: some programs that expect each arithmetic operation to be rounded to working precision may function incorrectly when fused multiply-add instructions are used. The -xarch=ppc_nofma flag can be used to disable the generation of fused multiply-add instructions so that such programs function correctly.
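The difference between the fused and unfused forms can be seen in a small, portable sketch. This code is not part of the original guide: it relies on the standard C++ function std::fma, which may or may not map to a hardware instruction on a given platform, and the names rounded_product, separate_mul_sub, and fused_mul_sub are hypothetical. The volatile temporaries are an assumption about how to keep a compiler from contracting the separate multiply and subtract into a single fused instruction.

```cpp
#include <cmath>

// Product a*b rounded to double precision. The volatile store forces the
// rounded value to be materialized instead of being kept in a wider register.
double rounded_product(double a, double b) {
    volatile double p = a * b;
    return p;
}

// a*b - c with two roundings: the product is rounded, then the difference.
double separate_mul_sub(double a, double b, double c) {
    volatile double p = a * b;   // first rounding
    volatile double r = p - c;   // second rounding
    return r;
}

// a*b - c with a single rounding: the exact product feeds the subtraction.
double fused_mul_sub(double a, double b, double c) {
    return std::fma(a, b, -c);
}
```

For a = 1 + 2^-30, the exact square is 1 + 2^-29 + 2^-60, which rounds to p = 1 + 2^-29 in double precision. Calling separate_mul_sub(a, a, p) therefore yields 0, while fused_mul_sub(a, a, p) yields 2^-60, the exact rounding error of the product.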
Two of the fused multiply-add instructions, fnmadd and fnmsub, negate their result, producing -(a × b + c) and c - a × b respectively. Note, however, that when an expression that does not produce an exactly representable result is evaluated in either round-to-positive-infinity or round-to-negative-infinity mode, the negation of the result does not produce the same value as the evaluation of the negated expression. (That is, if x is the result of evaluating an expression E rounding toward positive infinity, then -x is the result of evaluating -E rounding toward negative infinity, not toward positive infinity.) Thus, when directed rounding modes are used, as in interval arithmetic, for example, care must be taken to avoid inadvertently reversing the sense of rounding by negation. Since the compiler cannot tell that a directed rounding mode is in effect, it may reinterpret an expression as involving a negation in order to issue a negated fused multiply-add instruction (for performance reasons); to avoid this problem, use the -xarch=ppc_nofma flag to disable generation of fused multiply-add instructions when directed rounding modes are used.
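The interaction between negation and directed rounding can be illustrated without any PowerPC-specific instructions. The following sketch is an illustration only, not code from the original guide: it uses the standard <cfenv> functions std::fegetround and std::fesetround, the helper name divide_with_rounding is hypothetical, and the volatile temporaries reflect an assumption that the compiler might otherwise evaluate the quotient at compile time or move it across the rounding-mode change.

```cpp
#include <cfenv>

// Compute num / den with the given rounding mode (for example FE_UPWARD or
// FE_DOWNWARD), then restore the rounding mode that was in effect on entry.
double divide_with_rounding(double num, double den, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double n = num;   // volatile keeps the division at run time,
    volatile double d = den;   // between the two rounding-mode changes
    volatile double q = n / d;
    std::fesetround(saved);
    return q;
}
```

With these definitions, the negation of divide_with_rounding(1.0, 3.0, FE_UPWARD) equals divide_with_rounding(-1.0, 3.0, FE_DOWNWARD) but differs from divide_with_rounding(-1.0, 3.0, FE_UPWARD), which is the reversal of rounding sense described above.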
The PowerPC Architecture manual shows the bit assignments of the
Floating-Point Status and Control Register (FPSCR). Note that the
floating-point exception trap enable bits (VE, OE, UE, ZE, XE), the
non-IEEE mode bit (NI), and the rounding mode bits (RN) are control
bits: modifying these bits can alter the behavior of the floating-point
unit for subsequent operations. The remaining bits, namely the
exception summary bits, accrued exception bits, fraction rounded and
fraction inexact bits, and result flags, are status bits: they record
information about results computed in previous operations.
Floating-Point Exceptions and Unimplemented Floating-Point Instructions
When a floating-point exception occurs and the corresponding trap
enable bit is not set, the untrapped default result specified by IEEE 754
is delivered to the destination register, the corresponding exception
bit in the FPSCR is set (causing the overall exception summary bit and,
in the case of an invalid operation exception, the invalid operation
exception summary bit, also to be set), and execution continues.
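The untrapped behavior can be observed from portable C++ through the accrued exception flags, which on PowerPC would correspond to status bits in the FPSCR. The sketch below is not from the original guide and assumes that floating-point traps are disabled, as is the default in most hosted environments; the type UntrappedResult and the function divide_untrapped are hypothetical names.

```cpp
#include <cfenv>
#include <cmath>

// The delivered result together with the state of the divide-by-zero flag.
struct UntrappedResult {
    double value;
    bool divbyzero_flag;
};

// Clear the accrued flags, divide at run time, and report whether the
// divide-by-zero flag was raised by the operation.
UntrappedResult divide_untrapped(double num, double den) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double n = num;   // volatile prevents compile-time folding
    volatile double d = den;
    volatile double q = n / d;
    const bool raised = std::fetestexcept(FE_DIVBYZERO) != 0;
    const double value = q;
    return UntrappedResult{value, raised};
}
```

Dividing 1.0 by 0.0 this way delivers positive infinity as the default result, sets the divide-by-zero flag, and lets execution continue; dividing 1.0 by 2.0 delivers 0.5 and leaves the flag clear.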
Gradual Underflow
The PowerPC 603 and 604 both handle subnormal operands and results
entirely in hardware. The NI (non-IEEE) mode bit in the FPSCR is
ignored on the 603. On the 604, the NI mode bit may be used to obtain
flush-to-zero treatment of underflow, but there is no performance
benefit in doing so. For this reason, the -fns compiler
flag and the nonstandard_arithmetic function in
libsunmath (see Chapter 3, "The Math
Libraries") have no effect on PowerPC systems.
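Whether gradual underflow is in effect can be checked with a short test. The following sketch is illustrative and not part of the original guide; the function names are hypothetical, and it assumes the program has not been built or configured to flush subnormal values to zero.

```cpp
#include <cmath>
#include <limits>

// Halve the smallest positive normal double at run time. Under gradual
// underflow the result is a nonzero subnormal number rather than zero.
double half_of_smallest_normal() {
    volatile double x = std::numeric_limits<double>::min();
    volatile double h = x / 2.0;
    return h;
}

// True if v is a subnormal (denormalized) value.
bool is_subnormal(double v) {
    return std::fpclassify(v) == FP_SUBNORMAL;
}
```

When gradual underflow is supported, half_of_smallest_normal() returns a positive value for which is_subnormal reports true; under flush-to-zero treatment it would return zero instead.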
Example: Using Fused Multiply-Add
The following code illustrates the explicit use of fused multiply-add
operations to achieve better accuracy than a separate multiply and
add. This C++ code implements the basic operations for a data type
ddouble that simulates arithmetic with roughly twice the
precision of double. To use this code, compile the C++
routines below together with the following inline template. (Do not
use the -fsimple option.)
    .inline fmsub,0
    fmsub %f1,%f1,%f2,%f3
    .end