

Appendix B: SPARC Behavior and Implementation


This appendix discusses issues related to the floating-point units used in SPARC workstations and describes a way to determine which code generation flags are best suited for a particular workstation.

This appendix has the following organization:

Floating-point Hardware


fpversion(1) Function -- Finding Information About the FPU



Floating-point Hardware

This section describes details of the SPARC implementation of IEEE exceptions. See the SPARC Architecture Manual, Version 8, Appendix N, "SPARC IEEE 754 Implementation Recommendations," for a brief description of what happens when a trap is taken, the distinction between trapped and untrapped underflow, and recommended courses of action for SPARC implementations that choose to provide a non-IEEE (nonstandard) arithmetic mode.

Many SPARC systems have floating-point units derived from cores developed by TI or Weitek.

These two families of FPUs have been licensed to other workstation vendors, so chips from other semiconductor manufacturers may be found in SPARC workstations.

Table B-1 lists the hardware floating-point implementations used by SPARC workstations. Because the SPARC architecture defines the instruction set implemented by a SPARC floating-point unit, the chips differ more in the technology used to construct them than in functionality.

Table B-1 SPARC Floating-Point Options

| FPU | Description | Appropriate for Machines | Notes | Compilation Switch |
|---|---|---|---|---|
| TI 8847-based FPU | TI 8847; controller from Fujitsu or LSI | Sun-4/1xx, Sun-4/2xx, Sun-4/3xx, Sun-4/4xx, SPARCstation 1 (4/60) | 1989. Most SPARCstation 1 workstations have a Weitek 3170 | -xcg89 |
| Weitek 3170-based FPU | | SPARCstation 1 (4/60), SPARCstation 1+ (4/65) | 1989, 1990 | -xcg89 |
| TI 602a | | SPARCstation 2 (4/75) | 1990 | -xcg89 |
| Weitek 3172-based FPU | | SPARCstation SLC (4/20), SPARCstation IPC (4/40) | 1990 | -xcg89 |
| Weitek 8601 or Fujitsu 86903 | Integrated CPU and FPU | SPARCstation IPX (4/50), SPARCstation ELC (4/25) | 1991. IPX uses 40 MHz CPU/FPU; ELC uses 33 MHz | -xcg89 |
| Cypress 602 | Resides on MBus module | SPARCserver 6xx | 1991 | -xcg89 |
| TI TMS390Z50 | SuperSPARC or SuperSPARC+ | SPARCserver 6xx; SPARCstation 10 Model 30, 40, 41, 52, 54, 402MP, 512MP | 1992, 1993 | -xcg92 |
| TI TMS390S10 | microSPARC | SPARCstation LX (4/30) | 1992 | -xcg92 |
| TI TMS390S10 | microSPARC | SPARCclassic (4/15) | 1992 | -xcg92 |
| TI TMS390Z50 | SuperSPARC or SuperSPARC+ | SPARCserver 1000, SPARCcenter 2000 | 1992, 1993 | -xcg92 |
| TI TMS390Z50 or Cypress 602 | SuperSPARC or Cypress SPARC | SPARCsystem 6xxMP | 1992, 1993 | -xcg92 |
| Weitek 1164/1165-based FPU, or no FPU | Kernel emulates floating-point instructions | Obsolete | Slow; not recommended | -xcg89 or -xcg92 |

The systems based on SPARC FPUs preceding SuperSPARC, SuperSPARC+ and microSPARC implement the floating-point instruction set defined in the SPARC Architecture Manual Version 7. The systems based on the SuperSPARC, SuperSPARC+ and microSPARC FPUs implement the floating-point instruction set defined in the SPARC Architecture Manual Version 8.

The SuperSPARC and SuperSPARC+ FPUs implement in hardware the floating-point instruction set defined in the SPARC Architecture Manual Version 8, except for the quad-precision instructions. Exceptional cases are also handled in hardware.

The microSPARC FPUs implement the SPARC Architecture Manual Version 8 floating-point instruction set in hardware, except for FsMULd and the quad-precision instructions. Code compiled with -xcg92 may use FsMULd (for example, in single-precision complex arithmetic), which microSPARC must trap and emulate. Thus, for single-precision complex variables, use -xcg89 instead of -xcg92 for microSPARC FORTRAN programs.
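The selection rules above can be summarized in a few lines of code. The following C sketch is illustrative only: enum fpu_family, classify_fpu, and suggest_xcg are hypothetical names, and matching on the chip part number (as printed by fpversion) is an assumption about how a build script might identify the FPU.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical FPU groupings, following Table B-1. */
enum fpu_family {
    FPU_PRE_V8,      /* TI 8847, Weitek 3170/3172/8601, TI 602a, Cypress 602, ... */
    FPU_SUPERSPARC,  /* TI TMS390Z50: SuperSPARC or SuperSPARC+ */
    FPU_MICROSPARC   /* TI TMS390S10: microSPARC */
};

/* Hypothetical: classify a chip name by its part number. */
enum fpu_family classify_fpu(const char *chip)
{
    if (strstr(chip, "TMS390Z50") != NULL)
        return FPU_SUPERSPARC;
    if (strstr(chip, "TMS390S10") != NULL)
        return FPU_MICROSPARC;
    return FPU_PRE_V8;
}

/* Suggest a code generation switch:
 * - FPUs preceding SuperSPARC/microSPARC implement Version 7: -xcg89;
 * - SuperSPARC/SuperSPARC+ implement Version 8 (except quad): -xcg92;
 * - microSPARC lacks FsMULd in hardware, so FORTRAN programs with
 *   single-precision complex variables should use -xcg89. */
const char *suggest_xcg(enum fpu_family fpu, bool single_complex_fortran)
{
    switch (fpu) {
    case FPU_SUPERSPARC:
        return "-xcg92";
    case FPU_MICROSPARC:
        return single_complex_fortran ? "-xcg89" : "-xcg92";
    case FPU_PRE_V8:
    default:
        return "-xcg89";   /* also the default code generation switch */
    }
}
```

For example, the chip reported in the fpversion sample output later in this appendix, "TI TMS390C602A", falls into FPU_PRE_V8, and suggest_xcg returns "-xcg89", matching fpversion's own recommendation.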

Any unimplemented floating-point instruction causes a trap to the system kernel, which then emulates the instruction. Similarly, on a system with no FPU or with the FPU disabled, the kernel emulates all floating-point instructions in software. This usually causes severe performance degradation.

The default code generation switch is -xcg89. The -xcg89 and -xcg92 flags apply to cc and f77. (A more complete list of hardware targets is available through the -xtarget= option; see the Fortran and C user's guides for more information.)

In accord with RISC (reduced instruction set computer) philosophy, complicated operations such as the transcendental math functions are not implemented as hardware instructions.

Handling Subnormal Results

The TI-derived FPUs combine an ALU and a multiplier in a single unit. The ALU wraps subnormal results whether or not SIGFPE has been enabled. (A wrapped number is created by multiplying the correct result by a constant power of two, prior to rounding.) Software never sees this wrapped result, because destination registers are never changed when traps occur; instead, system software computes the correct subnormal result. The TI 8847 chip can also be operated in nonstandard underflow mode, which flushes all subnormal inputs and outputs to zero without system software intervention.

In IEEE gradual underflow mode, if a wrapped result is produced by the multiplier, the result is passed to the ALU and unwrapped.
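Wrapping can be illustrated with ordinary double arithmetic. The following C sketch is a hypothetical model, not the TI hardware algorithm: WRAP_EXP is an assumed constant, and unlike the hardware, which wraps before rounding, the sketch rounds the wrapped product and then rounds again when unwrapping, so in rare cases it can differ from a correctly rounded subnormal result.

```c
#include <math.h>

/* Hypothetical wrapping exponent; real FPUs use their own constant. */
#define WRAP_EXP 1000

/* Return a*b scaled by 2^WRAP_EXP ("wrapped"), so that a product that
 * would be subnormal is held as a normal number. Scaling by a power
 * of two is exact unless it overflows. */
double wrapped_mul(double a, double b)
{
    return ldexp(a, WRAP_EXP) * b;
}

/* Unwrap: scale back by 2^-WRAP_EXP, rounding into the subnormal range. */
double unwrap(double w)
{
    return ldexp(w, -WRAP_EXP);
}
```

For example, with a = 2^-600 and b = 2^-474, the true product 2^-1074 is subnormal, but wrapped_mul returns the normal number 2^-74, and unwrap recovers 2^-1074 exactly.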

The TI 8847-based FPU nonstandard mode corresponds better to the intent of providing fast processing of small results than does the nonstandard mode of the Weitek 1164/1165-based FPU, which treats subnormal operands, but not results, as zero.
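The difference between the two nonstandard modes can be simulated in software. In the following C sketch, mul_flush_in_out models the TI 8847 behavior (subnormal inputs and outputs flushed to zero) and mul_flush_in_only models the Weitek 1164/1165 behavior (subnormal operands, but not results, treated as zero). Both function names are hypothetical, and the sketch assumes the host itself uses IEEE gradual underflow.

```c
#include <math.h>

/* Return zero (with the sign of x) if x is subnormal, else x. */
static double flush(double x)
{
    return fpclassify(x) == FP_SUBNORMAL ? copysign(0.0, x) : x;
}

/* TI 8847-style nonstandard mode: subnormal inputs and outputs flushed. */
double mul_flush_in_out(double a, double b)
{
    return flush(flush(a) * flush(b));
}

/* Weitek 1164/1165-style nonstandard mode: subnormal operands treated
 * as zero, but a subnormal result is kept. */
double mul_flush_in_only(double a, double b)
{
    return flush(a) * flush(b);
}
```

For example, squaring 2^-520 gives the subnormal 2^-1040: the first model returns zero, while the second keeps 2^-1040. With a subnormal operand such as 2^-1030, both models return zero.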

Status and Control Registers

The SPARC FPU has status and control registers associated with it: the floating-point status register (FSR) and the floating-point queue (FQ) set of control registers. Kernel software uses the FQ to recover from floating-point exceptions.

The floating-point status register (FSR) contains FPU mode and status information. The FSR is visible to user processes, and most fields can be written as well. The SPARC assembler and debugger know it as %fsr. Examples of %fsr use can be found in the file named libmil, containing in-line templates.

Figure B-1 shows the bit assignments of the Floating-Point Status Register.

Figure B-1 SPARC Floating-Point Status Register

The fields relevant to exception handling are TEM, NS, aexc, and cexc, as shown in Table B-2.

Table B-2 Exception Handling Fields

| Field | Corresponding bits in register (bit number) |
|---|---|
| TEM, trap enable mask | NVM (27), OFM (26), UFM (25), DZM (24), NXM (23) |
| NS, nonstandard floating-point | NS (22) |
| aexc, accrued exception bits | nva (9), ofa (8), ufa (7), dza (6), nxa (5) |
| cexc, current exception bits | nvc (4), ofc (3), ufc (2), dzc (1), nxc (0) |

Each trap enable mask, the nonstandard floating-point bit, and each exception bit is a single bit that is either on (1) or off (0). The low-order bit is bit 0.
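Given a 32-bit FSR value, the fields in Table B-2 can be extracted with shifts and masks. The following C sketch assumes the value has already been obtained (on SPARC, for example, through an in-line template that stores %fsr, which is not shown here); the macro, enum, and struct names are hypothetical.

```c
#include <stdint.h>

/* Field positions from Table B-2 (bit 0 is the low-order bit). */
#define FSR_TEM_SHIFT  23   /* NVM..NXM: bits 27..23 */
#define FSR_NS_BIT     22
#define FSR_AEXC_SHIFT 5    /* nva..nxa: bits 9..5 */
#define FSR_CEXC_SHIFT 0    /* nvc..nxc: bits 4..0 */

/* Within each 5-bit field, from most to least significant bit:
 * invalid, overflow, underflow, division by zero, inexact. */
enum { EXC_NX = 1u << 0, EXC_DZ = 1u << 1, EXC_UF = 1u << 2,
       EXC_OF = 1u << 3, EXC_NV = 1u << 4 };

struct fsr_fields {
    unsigned tem;   /* trap enable mask, 5 bits */
    unsigned ns;    /* nonstandard mode, 1 bit */
    unsigned aexc;  /* accrued exceptions, 5 bits */
    unsigned cexc;  /* current exceptions, 5 bits */
};

struct fsr_fields decode_fsr(uint32_t fsr)
{
    struct fsr_fields f;
    f.tem  = (fsr >> FSR_TEM_SHIFT)  & 0x1Fu;
    f.ns   = (fsr >> FSR_NS_BIT)     & 0x1u;
    f.aexc = (fsr >> FSR_AEXC_SHIFT) & 0x1Fu;
    f.cexc = (fsr >> FSR_CEXC_SHIFT) & 0x1Fu;
    return f;
}
```

For example, an FSR with bits 25 (UFM), 22 (NS), 7 (ufa), 5 (nxa), and 2 (ufc) set decodes to an enabled underflow trap, nonstandard mode on, accrued underflow and inexact, and a current underflow.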

The current exception bits are updated by the hardware when each floating-point operation successfully completes.

Handling Floating-point Exceptions

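There are two cases when the hardware does not successfully complete a floating-point operation:

The operation is unimplemented in hardware (for example, FsMULd or the quad-precision instructions on microSPARC).

The operation is unfinished: the hardware cannot deliver the correct result for the given operands.

In these cases, the process traps to the kernel, which emulates the floating-point operation and updates the FSR and destination registers.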

If a floating-point exception results in a trap, then the destination floating-point register, the floating-point condition codes (fcc), and the aexc field remain unchanged. The cexc field is updated to show which exception caused the trap. (If the trap is caused by an unfinished or unimplemented floating-point operation, instead of by one of the IEEE 754 floating-point exceptions, then cexc is also unchanged.)

If the exception does not result in a trap, then the destination register, fcc, aexc and cexc are updated to their new values.

The following pseudo-code summarizes the handling of IEEE traps. Note that the aexc field can normally only be cleared by software.

FPop generates an IEEE exception;
texc <- IEEE exceptions generated by this FPop;
if (texc and TEM) = 0
  then (aexc <- (aexc or texc); cexc <- texc; f[] <- result;
        fcc <- fcc_result)
  else (cause fp_exception_trap)
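The pseudo-code can also be expressed as a small C model. The struct fp_state and the function complete_fpop are hypothetical, and exception sets use the 5-bit ordering of Table B-2 (nv=16, of=8, uf=4, dz=2, nx=1). In the trap case, the sketch assumes that cexc records the enabled exceptions that caused the trap (texc & tem).

```c
#include <stdbool.h>

/* Hypothetical model of the state that the pseudo-code touches. */
struct fp_state {
    unsigned tem;    /* trap enable mask */
    unsigned aexc;   /* accrued exceptions */
    unsigned cexc;   /* current exceptions */
    double   dest;   /* destination register f[] */
    unsigned fcc;    /* floating-point condition codes */
};

/* Apply the outcome of one FPop that raised the exception set texc.
 * Returns true if an fp_exception_trap would be taken. */
bool complete_fpop(struct fp_state *s, unsigned texc,
                   double result, unsigned fcc_result)
{
    if ((texc & s->tem) == 0) {
        s->aexc |= texc;     /* accrue; only software clears aexc */
        s->cexc  = texc;
        s->dest  = result;
        s->fcc   = fcc_result;
        return false;
    }
    /* Trap: dest, fcc, and aexc stay unchanged. Assumption: cexc
     * records the enabled exceptions that caused the trap. */
    s->cexc = texc & s->tem;
    return true;
}
```

For example, an untrapped underflow with inexact (texc = 5) updates every field, while a later overflow with the OFM bit enabled leaves dest, fcc, and aexc untouched.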

Handling Gradual Underflow

In floating-point environments built on the architecture of some SPARC processors (TI TMS390Z50 and TI TMS390S10) and of the Intel family, gradually underflowed results are almost always calculated by the floating-point unit in the CPU or coprocessor. Therefore, gradual underflow is much less likely to cause performance degradation than it is when implemented in software. (For more information about gradual underflow, see the discussion of "Underflow" in Chapter 2, "IEEE Arithmetic.")

If an application encounters frequent underflows, you may want to determine how much system time the application is using by timing the program execution with the time command.

demo% /bin/time myprog > myprog.output
305.3 real	      32.4 user      	271.9 sys 

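A large sys time relative to user time, as in this example, can indicate that much of the program's time is spent in the kernel, for example handling underflow traps. To determine whether underflows occurred in an application, you can use the math library function ieee_retrospective to check whether the underflow exception flag is raised when the program exits. FORTRAN programs call ieee_retrospective by default; C and C++ programs need to call ieee_retrospective explicitly before any exit point.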

The function ieee_retrospective prints a message similar to the following to standard error, stderr:

Note: IEEE floating-point exception flags raised: 
Inexact; Underflow;
See the Numerical Computation Guide, ieee_flags(3M)

The math library provides two functions to help programs toggle between standard underflow mode and nonstandard underflow mode. A call to nonstandard_arithmetic turns off IEEE gradual underflow (if applicable), and a call to standard_arithmetic restores IEEE behavior.

C, C++

nonstandard_arithmetic();

standard_arithmetic();

FORTRAN

call nonstandard_arithmetic()

call standard_arithmetic()

You should use nonstandard_arithmetic with caution, because it causes the loss of the accuracy benefits of gradual underflow.
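The accuracy lost with nonstandard_arithmetic can be seen in a simple case: with gradual underflow, x - y is never zero when x and y are distinct, but with abrupt underflow it can be. The following C sketch simulates abrupt underflow in software with a hypothetical flush_to_zero helper, rather than by switching the FPU mode, which is system-specific.

```c
#include <math.h>

/* Simulate abrupt underflow: replace a subnormal result by zero. */
double flush_to_zero(double r)
{
    return fpclassify(r) == FP_SUBNORMAL ? copysign(0.0, r) : r;
}

/* x - y under IEEE gradual underflow (the host's default behavior). */
double diff_gradual(double x, double y) { return x - y; }

/* x - y with the subnormal result flushed, as in nonstandard mode. */
double diff_abrupt(double x, double y)  { return flush_to_zero(x - y); }
```

For x = 1.5 * 2^-1022 and y = 2^-1022, both normal and distinct, diff_gradual returns the subnormal 2^-1023, while diff_abrupt returns 0.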

Nonstandard Arithmetic and Kernel Emulation (SPARC)

There are several ways to set the FPU to nonstandard mode: compile with -fast or -fnonstd, or invoke nonstandard_arithmetic() from inside the program.

Not all SPARC implementations provide a nonstandard mode. Attempts to set it on implementations that do not provide it are ignored. For the SPARC implementations that do provide a fast or nonstandard mode, setting the FPU to this mode means that some or all underflowed results (and/or operands) are flushed to zero. If for some reason a floating-point operation that would underflow is interrupted (for example, it is in the queue when a context switch occurs), it is later emulated by kernel software, which always uses standard IEEE arithmetic.

Nonstandard mode is not emulated by the kernel because its behavior is undefined and implementation-dependent. Thus, in unusual circumstances, an executable that sets the FPU to nonstandard mode might produce slightly different results depending on system load. This behavior has not been observed in practice. It would affect only those programs that are very sensitive to whether a particular computation (from among millions) was handled with gradual underflow or with abrupt underflow.


fpversion(1) Function -- Finding Information About the FPU

The utility fpversion(1), distributed with the unbundled compilers, identifies the installed CPU and FPU and estimates their clock speeds.

fpversion determines the CPU and FPU type by interpreting the identification information stored by the CPU and FPU. It is usually installed in the same directory as the unbundled compilers.

On a SPARCstation 2 workstation, the information returned from fpversion is similar to the following example (details might differ depending on the configuration):

demo% fpversion 
 A SPARC-based CPU is available. 
 CPU's clock rate appears to be approximately 39.3 MHz.
 The clock rate is probably 40.0 Mz. 

 Sun-4 floating-point controller version 2 found. 
 A TI TMS390C602A-based FPU is available. 
 FPU's frequency appears to be approximately 38.6 MHz. 
 The clock rate is probably 40.0 Mz.

 Use "-xcg89" floating-point option.

 Hostid = 0x67003A21. 

Note that fpversion is not instantaneous: it might take about a minute of user time to run. This is deliberate. fpversion determines the approximate clock rates for the CPU and FPU by timing a loop of simple instructions that run in a predictable amount of time.

The loop is executed many times to ensure that the timing measurements and clock rate estimates are accurate.
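The timing technique can be sketched in C. This is not fpversion's actual code: ASSUMED_CYCLES_PER_ITER is a hypothetical per-iteration cost, and on modern processors the true cost depends on pipelining and on the compiler, so the result is only a rough estimate.

```c
#include <time.h>

/* Hypothetical cost of one loop iteration, in clock cycles. */
#define ASSUMED_CYCLES_PER_ITER 4.0

/* Time `iterations` dependent floating-point adds using processor
 * time (clock()), and convert to an estimated clock rate in MHz.
 * Returns -1.0 if the loop ran too briefly to be measured. */
double estimate_mhz(long iterations)
{
    volatile double x = 1.0;        /* volatile: each add depends on the last */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        x = x + 1.0;
    clock_t stop = clock();

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return -1.0;                /* too short; use more iterations */
    return ASSUMED_CYCLES_PER_ITER * (double)iterations / seconds / 1e6;
}
```

As with fpversion, more iterations make the measurement less sensitive to timer resolution, at the cost of a longer run.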

Floating-Point Hardware--Disable/Enable

It is possible to disable the floating-point unit (FPU) on a SPARC workstation. This might be useful when trying to establish whether the FPU is faulty. However, a programming error is far more likely than faulty hardware, so use the following scripts with extreme caution.

Floating-point hardware can be disabled/enabled in software by running the following scripts as root.

To disable the FPU:

#!/bin/sh -
# script to turn the FPU off
#
adb -k -w /kernel/unix /dev/mem <<!
fpu_exists/W0
!

It can be enabled again this way:

#!/bin/sh -
# script to turn the FPU on
#
adb -k -w /kernel/unix /dev/mem <<!
fpu_exists/W1
!

These scripts can be run in normal multi-user mode without rebooting, but you should avoid running them if any processes that use floating-point instructions are executing.

