Appendix B: SPARC Behavior and Implementation

This appendix discusses issues related to the floating-point units used in SPARC workstations and describes a way to determine which code generation flags are best suited for a particular workstation.

This appendix has the following organization:

Floating-point Hardware
fpversion(1) Function -- Finding Information About the FPU

Floating-point Hardware

This section describes details of the SPARC implementation of IEEE exceptions. See the SPARC Architecture Manual, Version 8, Appendix N, "SPARC IEEE 754 Implementation Recommendations," for a brief description of what happens when a trap is taken, the distinction between trapped and untrapped underflow, and recommended courses of action for SPARC implementations that choose to provide a non-IEEE (nonstandard) arithmetic mode.

Many SPARC systems have floating-point units derived from cores developed by TI or Weitek.

TI family -- includes the TI8847 and the TMS390C602A
Weitek family -- includes the 1164/1165, the 3170, and 3171

These two families of FPUs have been licensed to other workstation vendors, so chips from other semiconductor manufacturers may be found in SPARC workstations.

Table B-1 lists the hardware floating-point implementations used by SPARC workstations. Because the SPARC architecture defines the instruction set implemented by a SPARC floating-point unit, the chips differ more in the technology used to construct them than in functionality.

Table B-1 SPARC Floating-Point Options
FPU	Description	Appropriate for Machines	Notes	Compilation Switch
TI 8847-based FPU	TI 8847; controller from Fujitsu or LSI	Sun-4/1xx, Sun-4/2xx, Sun-4/3xx, Sun-4/4xx, SPARCstation 1 (4/60)	1989; most SPARCstation 1 workstations have a Weitek 3170	`-xcg89`
Weitek 3170-based FPU		SPARCstation 1 (4/60), SPARCstation 1+ (4/65)	1989, 1990	`-xcg89`
TI 602a		SPARCstation 2 (4/75)	1990	`-xcg89`
Weitek 3172-based FPU		SPARCstation SLC (4/20), SPARCstation IPC (4/40)	1990	`-xcg89`
Weitek 8601 or Fujitsu 86903	Integrated CPU and FPU	SPARCstation IPX (4/50), SPARCstation ELC (4/25)	1991; IPX uses 40 MHz CPU/FPU; ELC uses 33 MHz	`-xcg89`
Cypress 602	Resides on Mbus Module	SPARCserver 6xx	1991	`-xcg89`
TI TMS390Z50	SuperSPARC or SuperSPARC+	SPARCserver 6xx; SPARCstation 10 Model 30, 40, 41, 52, 54, 402MP, 512MP	1992, 1993	`-xcg92`
TI TMS390S10	microSPARC	SPARCstation LX (4/30)	1992	`-xcg92`
TI TMS390S10	microSPARC	SPARCclassic (4/15)	1992	`-xcg92`
TI TMS390Z50	SuperSPARC or SuperSPARC+	SPARCserver 1000, SPARCcenter 2000	1992, 1993	`-xcg92`
TI TMS390Z50 or Cypress 602	SuperSPARC or Cypress SPARC	SPARCsystem 6xxMP	1992, 1993	`-xcg92`
Weitek 1164/1165-based FPU or no FPU	Kernel emulates floating-point instructions	Obsolete	Slow; not recommended	`-xcg89` or `-xcg92`

The systems based on SPARC FPUs preceding SuperSPARC, SuperSPARC+ and microSPARC implement the floating-point instruction set defined in the SPARC Architecture Manual Version 7. The systems based on the SuperSPARC, SuperSPARC+ and microSPARC FPUs implement the floating-point instruction set defined in the SPARC Architecture Manual Version 8.

The SuperSPARC and SuperSPARC+ FPUs implement the floating-point instruction set defined in the SPARC Architecture Manual Version 8 in hardware, except the quad precision instructions. Exceptional cases are handled in hardware.

The microSPARC FPUs implement the SPARC Architecture Manual Version 8 floating-point instruction set in hardware, except for FsMULd and the quad precision instructions. Thus, for microSPARC FORTRAN programs that use single-precision complex variables, compile with -xcg89 instead of -xcg92.

Any unimplemented floating-point instruction causes a trap to the system kernel, which then emulates the instruction. Similarly, on a system with no FPU or with the FPU disabled, the kernel emulates all floating-point instructions in software. This emulation usually causes severe performance degradation.

The default code generation switch is -xcg89. The -xcg89 and -xcg92 flags apply to cc and f77. (A more complete list of hardware targets is available through the -xtarget= option; see the Fortran and C user's guides for more information.)

In accord with RISC (reduced instruction set computer) philosophy, complicated instructions such as the transcendental math functions are not implemented in hardware.

Handling Subnormal Results

On the TI-derived FPUs, which combine the ALU and the multiplier in one unit, the ALU wraps subnormal results whether or not SIGFPE has been enabled. (A wrapped number is created by multiplying the correct result by a constant power of two, prior to rounding.) Software never sees this wrapped result, because destination registers are never changed when traps occur; the correct subnormal result is computed by system software. The TI 8847 chip can also be operated in nonstandard underflow mode, which forces all subnormal inputs and outputs to be flushed to zero without system software intervention.

In IEEE gradual underflow mode, if a wrapped result is produced by the multiplier, the result is passed to the ALU and unwrapped.

The TI 8847-based FPU nonstandard mode corresponds better to the intent of providing fast processing of small results than does the nonstandard mode of the Weitek 1164/1165-based FPU, which treats subnormal operands, but not results, as zero.
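The difference between the two nonstandard modes described above can be illustrated with a small C model. This is only a sketch of the resulting values, not of either chip: the function names (`is_subnormal`, `flush`, `mul_flush_in_out`, `mul_flush_in_only`) are hypothetical, and it assumes the host evaluates `double` arithmetic with IEEE gradual underflow.

```c
#include <float.h>

/* Nonzero if x is a subnormal double: nonzero, but smaller in
   magnitude than the smallest normal number DBL_MIN. NaNs compare
   false and are therefore not reported as subnormal. */
static int is_subnormal(double x)
{
    return x != 0.0 && x > -DBL_MIN && x < DBL_MIN;
}

/* Replace a subnormal value with a zero of the same sign. */
static double flush(double x)
{
    if (is_subnormal(x))
        return x < 0.0 ? -0.0 : 0.0;
    return x;
}

/* Model of a mode that flushes subnormal inputs AND outputs
   (the behavior the text attributes to the TI 8847 mode). */
static double mul_flush_in_out(double a, double b)
{
    return flush(flush(a) * flush(b));
}

/* Model of a mode that treats subnormal operands, but not results,
   as zero (the behavior the text attributes to the Weitek
   1164/1165 mode). */
static double mul_flush_in_only(double a, double b)
{
    return flush(a) * flush(b);
}
```

For example, multiplying `DBL_MIN` by `0.5` has normal operands and a subnormal result, so the first model returns zero while the second still returns the subnormal value. That is why the second style gives less of the intended speedup for small results.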

Status and Control Registers

The SPARC FPU has status and control registers associated with it: the floating-point status register (FSR) and the floating-point queue (FQ) set of control registers. Kernel software uses the FQ to recover from floating-point exceptions.

The floating-point status register (FSR) contains FPU mode and status information. The FSR is visible to user processes, and most of its fields can also be written. The SPARC assembler and debugger know it as %fsr. Examples of %fsr use can be found in the libmil file, which contains in-line templates.

Figure B-1 shows the bit assignments of the Floating-Point Status Register.

Figure B-1 SPARC Floating-Point Status Register

The fields relevant to exception handling are TEM, NS, aexc, and cexc as shown in Table B-2.

Table B-2 Exception Handling Fields
Field	Corresponding bits in register
`TEM`, trap enable mask	NVM 27	OFM 26	UFM 25	DZM 24	NXM 23
`NS`, nonstandard floating-point			NS 22
`aexc`, accrued exception bits	nva 9	ofa 8	ufa 7	dza 6	nxa 5
`cexc`, current exception bits	nvc 4	ofc 3	ufc 2	dzc 1	nxc 0

The trap enable masks, the bit that indicates nonstandard floating-point, and the exception bits are either on or off. The low-order bit is bit 0.

The current exception bits are updated by the hardware when each floating-point operation successfully completes.
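The bit assignments in Table B-2 translate directly into shifts and masks. The following C sketch extracts the four exception-handling fields from a 32-bit FSR image; the names (`fsr_fields`, `fsr_decode`, the `FSR_*` and `EXC_*` constants) are made up for illustration and do not come from any Sun header, and the code assumes the FSR value has already been obtained by some other means.

```c
#include <stdint.h>

/* Field positions from Table B-2; bit 0 is the low-order bit. */
#define FSR_TEM_SHIFT  23   /* NVM 27, OFM 26, UFM 25, DZM 24, NXM 23 */
#define FSR_NS_SHIFT   22   /* NS 22 */
#define FSR_AEXC_SHIFT 5    /* nva 9, ofa 8, ufa 7, dza 6, nxa 5 */
#define FSR_CEXC_SHIFT 0    /* nvc 4, ofc 3, ufc 2, dzc 1, nxc 0 */

/* Within TEM, aexc and cexc the five exceptions share one ordering. */
enum {
    EXC_NX = 1 << 0,  /* inexact */
    EXC_DZ = 1 << 1,  /* division by zero */
    EXC_UF = 1 << 2,  /* underflow */
    EXC_OF = 1 << 3,  /* overflow */
    EXC_NV = 1 << 4   /* invalid */
};

struct fsr_fields {
    unsigned tem;   /* trap enable mask, 5 bits */
    unsigned ns;    /* nonstandard floating-point, 1 bit */
    unsigned aexc;  /* accrued exception bits, 5 bits */
    unsigned cexc;  /* current exception bits, 5 bits */
};

/* Split a raw FSR value into the fields relevant to exceptions. */
static struct fsr_fields fsr_decode(uint32_t fsr)
{
    struct fsr_fields f;
    f.tem  = (fsr >> FSR_TEM_SHIFT)  & 0x1Fu;
    f.ns   = (fsr >> FSR_NS_SHIFT)   & 0x1u;
    f.aexc = (fsr >> FSR_AEXC_SHIFT) & 0x1Fu;
    f.cexc = (fsr >> FSR_CEXC_SHIFT) & 0x1Fu;
    return f;
}
```

With this layout, a test such as `fsr_decode(v).aexc & EXC_UF` reports whether underflow has accrued, independent of which field the bit came from.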

Handling Floating-point Exceptions

There are two cases when the hardware does not successfully complete a floating-point operation:

Operation is unimplemented (for example, fsqrt[sd] on Weitek 1164/1165-based FPUs)
Hardware is unable to deliver the correct result

In these cases, the process traps to the kernel, which emulates the floating-point operation and updates the FSR and destination registers.

If a floating-point exception results in a trap, then the destination floating-point register, the floating-point condition codes (fcc), and the aexc field remain unchanged. The cexc field is updated to show which exception caused the trap. (If the trap is caused by an unfinished or unimplemented floating-point operation, rather than by one of the IEEE 754 floating-point exceptions, then cexc is also unchanged.)

If the exception does not result in a trap, then the destination register, fcc, aexc and cexc are updated to their new values.

The following pseudo-code summarizes the handling of IEEE traps. Note that the aexc field can normally only be cleared by software.

FPop generates an IEEE exception;
texc ← IEEE exceptions generated by this FPop;
if (texc and TEM) = 0
    then (aexc ← (aexc or texc); cexc ← texc; f[rd] ← result; fcc ← fcc_result)
    else (cause fp_exception_trap)
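The pseudo-code can be turned into a small C model to make the update rules concrete. This is a sketch under stated assumptions, not a description of any hardware or kernel source: `fpu_state` and `fpop_complete` are hypothetical names, a single `double` stands in for the destination register, and on a trap the model simply records every enabled exception in cexc, whereas real hardware identifies the one exception that caused the trap.

```c
#include <stdbool.h>

/* Exception bits, ordered as in the TEM, aexc and cexc fields. */
enum { EXC_NX = 1, EXC_DZ = 2, EXC_UF = 4, EXC_OF = 8, EXC_NV = 16 };

struct fpu_state {
    unsigned tem;   /* trap enable mask */
    unsigned aexc;  /* accrued exceptions; cleared only by software */
    unsigned cexc;  /* current exceptions */
    double   rd;    /* stand-in for the destination register */
    int      fcc;   /* floating-point condition codes */
};

/* Apply the outcome of one FPop that raised the exceptions in texc.
   Returns true if the operation traps. */
static bool fpop_complete(struct fpu_state *s, unsigned texc,
                          double result, int fcc_result)
{
    if ((texc & s->tem) == 0) {
        /* Untrapped: accumulate into aexc, overwrite the rest. */
        s->aexc |= texc;
        s->cexc  = texc;
        s->rd    = result;
        s->fcc   = fcc_result;
        return false;
    }
    /* Trapped: rd, fcc and aexc stay unchanged; cexc shows the
       cause (simplified here to all enabled exceptions raised). */
    s->cexc = texc & s->tem;
    return true;
}
```

Because the untrapped branch ORs into aexc, an underflow raised early in a run remains visible at exit even if thousands of later operations raise nothing.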

Handling Gradual Underflow

In floating-point environments built on the architecture of some of the SPARC processors (TI TMS390Z50 and TI TMS390S10), and of the Intel family, gradually underflowed results are almost always calculated by the floating-point unit in the CPU or coprocessor. Therefore, gradual underflow is much less likely to cause performance degradation than it is when implemented in software. (For more information about gradual underflow, see the discussion of "Underflow" in Chapter 2, "IEEE Arithmetic.")

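If an application encounters frequent underflows, you may want to determine how much system time the application is using by timing the program execution with the time command. A system time that is large compared with the user time, as in the following example, suggests that much of the run is spent in the kernel, where underflows handled in software are emulated.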

demo% /bin/time myprog > myprog.output

305.3 real 32.4 user 271.9 sys

To determine whether underflows occurred in an application, you can use the math library function ieee_retrospective to check whether the underflow exception flag is raised when the program exits. FORTRAN programs call ieee_retrospective by default. C and C++ programs must call ieee_retrospective explicitly before each exit point.

The function ieee_retrospective prints a message similar to the following to standard error, stderr:

Note: IEEE floating-point exception flags raised:
Inexact; Underflow;
See the Numerical Computation Guide, ieee_flags(3M)

The math library provides two functions to help programs toggle between standard underflow mode and nonstandard underflow mode. A call to nonstandard_arithmetic turns off IEEE gradual underflow (if applicable), and a call to standard_arithmetic restores IEEE behavior.

C, C++

nonstandard_arithmetic();
standard_arithmetic();

FORTRAN

call nonstandard_arithmetic()
call standard_arithmetic()

You should use nonstandard_arithmetic with caution, because it causes the loss of the accuracy benefits of gradual underflow.

Nonstandard Arithmetic and Kernel Emulation (SPARC)

There are several ways to set the FPU to nonstandard mode: compile with -fast or -fnonstd, or call nonstandard_arithmetic() from inside the program.

Not all SPARC implementations provide a nonstandard mode. An attempt to set nonstandard mode on an implementation that does not provide it is ignored. On the SPARC implementations that do provide a fast or nonstandard mode, setting the FPU to this mode means that some or all underflowed results (and/or operands) are flushed to zero. If a floating-point operation that would underflow is interrupted for some reason (for example, it is in the queue when a context switch occurs), it is later emulated by kernel software, which always uses standard IEEE arithmetic.

Nonstandard mode is not emulated by the kernel because its behavior is undefined and implementation-dependent. Thus, under unusual circumstances, an executable that sets the FPU to nonstandard mode might produce slightly varying results depending on system load. This behavior has not been observed. It could affect only those programs that are very sensitive to whether a particular computation (from among millions) was handled with gradual underflow or with abrupt underflow.

`fpversion`(1) Function -- Finding Information About the FPU

The utility fpversion(1), distributed with the unbundled compilers, identifies the installed CPU and FPU and estimates their clock speeds.

The fpversion(1) utility determines the CPU and FPU type by interpreting the identification information stored by the CPU and FPU. It is usually located in the same directory as the unbundled compilers.

On a SPARCstation 2 workstation, the information returned from fpversion is similar to this example (details may differ depending on configuration):

demo% fpversion
A SPARC-based CPU is available.
CPU's clock rate appears to be approximately 39.3 MHz.
The clock rate is probably 40.0 Mz.
Sun-4 floating-point controller version 2 found.
A TI TMS390C602A-based FPU is available.
FPU's frequency appears to be approximately 38.6 MHz.
The clock rate is probably 40.0 Mz.
Use "-xcg89" floating-point option.
Hostid = 0x67003A21.
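A build script that wants to pick up the recommended switch automatically could scan this output for the line that begins with `Use` and take the quoted text. The C sketch below does that for a single line. The function name `recommended_flag` is hypothetical, and the code assumes the output has the shape of the sample above; the exact wording printed by other fpversion releases is not known here.

```c
#include <stddef.h>
#include <string.h>

/* If line begins with "Use " and contains a double-quoted string,
   copy the quoted text (without the quotes) into out and return 1.
   Return 0 if the line does not match or out is too small. */
static int recommended_flag(const char *line, char *out, size_t outsz)
{
    const char *open, *close;
    size_t len;

    if (strncmp(line, "Use ", 4) != 0)
        return 0;
    open = strchr(line, '"');
    if (open == NULL)
        return 0;
    close = strchr(open + 1, '"');
    if (close == NULL)
        return 0;

    len = (size_t)(close - open - 1);
    if (len + 1 > outsz)          /* need room for the terminator */
        return 0;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 1;
}
```

Applied to the sample line `Use "-xcg89" floating-point option.`, the function stores `-xcg89` in the caller's buffer; every other line of the sample output is rejected.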