Index
A
- abort on exception
- C example, 117
- accuracy, 230
- floating-point operations, 4
- significant digits (number of), 19
- threshold, 29
- adb, 61
- addrans
- random number utilities, 49
- argument reduction
- trigonometric functions, 48
B
- base conversion
- base 10 to base 2, 50
- base 2 to base 10, 50
- formatted I/O, 50
- Bessel functions, 230
C
- C driver
- example, call FORTRAN subroutines from C, 119
- clock speed, 136
- compiler option
- -Xa
- X/Open behavior for libm, 229
- -Xt
- SVID behavior for libm, 229
- conversion between number sets, 20
- conversions between decimal strings and binary floating-point numbers, 4
- convert_external
- binary floating-point, 49
- data conversion, 49
D
- data types
- relation to IEEE formats, 5
- dbx, 61
- decimal representation
- maximum positive normal number, 18
- minimum positive normal number, 18
- precision, 18
- ranges, 18
- double-precision representation
- C example, 90
- FORTRAN example, 91
E
- errno.h
- define values for errno, 229
- examine the accrued exception bits
- C example, 104
- examine the accrued exception flags
- C example, 106
F
- f77_floatingpoint.h
- define handler types
- FORTRAN, 69
- -fast, 135
- floating-point
- exceptions list, 4
- rounding direction, 4
- rounding precision, 4
- tutorial, 147
- floating-point accuracy
- decimal strings and binary floating-point numbers, 4
- floating-point exceptions, 2, 133
- abort on exceptions, 117
- accrued exception bits, 104
- common exceptions, 54
- default result, 55
- definition, 54
- flags, 57
- accrued, 57
- current, 57
- ieee_functions, 39
- ieee_retrospective, 45
- list of exceptions, 54
- priority, 57
- trap precedence, 57
- floating-point options, 130
- floating-point queue (FQ), 132
- floating-point status register (FSR), 127, 132
- floating-point unit, 137
- disable on SPARC, 137
- enable on SPARC, 137
- floating-point unit (FPU), 130, 131
- floatingpoint.h
- define handler types
- C and C++, 69
- flush to zero (see Store 0), 23
- fmod, 230
- -fnonstd, 135
- fpversion, 136
G
- generate an array of numbers
- FORTRAN example, 92
- Goldberg paper, 147
- abstract, 148
- acknowledgments, 216
- details, 201
- IEEE standard, 165
- IEEE standards, 161
- introduction, 148
- references, 216
- rounding error, 149
- summary, 215
- systems aspects, 187
- gradual underflow
- error properties, 25
H
- HUGE
- compatibility with IEEE standard, 226
- HUGE_VAL
- compatibility with IEEE standard, 226
I
- IEEE double extended format
- biased exponent
- x86 architecture, 14
- bit-field assignment
- x86 architecture, 14
- fraction
- x86 architecture, 14
- Inf
- SPARC architecture, 13
- x86 architecture, 16
- NaN
- x86 architecture, 18
- normal number
- SPARC architecture, 13
- x86 architecture, 16
- quadruple precision
- SPARC architecture, 12
- sign bit
- x86 architecture, 15
- significand
- explicit leading bit
- x86 architecture, 14
- subnormal number
- SPARC architecture, 13
- x86 architecture, 16
- IEEE double format
- biased exponent, 8
- bit patterns and equivalent values, 10
- bit-field assignment, 8
- denormalized number, 10
- fraction, 8
- storage on SPARC, 8
- storage on x86, 8
- implicit bit, 10
- Inf, infinity, 10
- NaN, not a number, 11
- normal number, 10
- precision, 10
- sign bit, 9
- significand, 10
- subnormal number, 10
- IEEE formats
- relation to language data types, 5
- IEEE single format
- biased exponent, 6
- biased exponent, implicit bit, 7
- bit assignments, 6
- bit patterns and equivalent values, 7
- bit-field assignment, 6
- denormalized number, 7
- fraction, 6
- Inf, positive infinity, 7
- mixed number, significand, 7
- NaN, not a number, 8
- normal number
- maximum positive, 7
- normal number bit pattern, 6
- precision, normal number, 7
- sign bit, 6
- subnormal number bit pattern, 6
- IEEE Standard 754
- double extended format, 4
- double format, 3
- single format, 3
- ieee_flags, 42
- accrued exception flag, 42
- examine accrued exception bits
- C example, 104
- rounding direction, 42
- rounding precision, 42, 44
- set exception flags
- C example, 107
- truncate rounding, 43
- ieee_functions
- bit mask operations, 38
- floating-point exceptions, 39
- ieee_handler, 69
- abort on exception
- FORTRAN example, 117
- example, calling sequence, 62
- trap on common exceptions, 54
- trap on exception
- C example, 109
- ieee_retrospective
- check underflow exception flag, 135
- floating-point exceptions, 44
- floating-point status register (FSR), 45
- getting information about nonstandard IEEE modes, 44
- getting information about outstanding exceptions, 44
- nonstandard_arithmetic in effect, 45
- precision, 44
- rounding, 44
- suppress exception messages, 46
- ieee_sun
- IEEE classification functions, 38
- ieee_values
- quadruple-precision values, 40
- representing floating-point values, 40
- representing Inf, 40
- representing NaN, 40
- representing normal number, 40
- single-precision values, 40
- ieee_values functions
- C example, 99
- Inf, 2, 227
- default result of divide by zero, 55
L
- lcrans
- random number utilities, 49
- libm
- SVID compliance, 227
- default directories
- executables, 32
- header files, 32
- list of functions, 32
- standard installation, 32
- libm functions
- double precision, 37
- quadruple precision, 37
- single precision, 37
- libmil (see also in-line templates), 132, 227
- libsunmath
- default directories
- executables, 33
- header files, 33
- list of functions, 34
- standard installation, 33
M
- MAXFLOAT, 229
N
- NaN, 2, 14, 227, 230
- nonstandard_arithmetic, 135
- turn off IEEE gradual underflow, 135
- underflow, 46
- gradual, 46
- normal number
- maximum positive, 7
- minimum positive, 23, 27
- number line
- binary representation, 19
- decimal representation, 19
- powers of 2, 26
O
- operating system math library
- libm.a, 32
- libm.so, 32
P
- pi
- infinitely precise value, 48
- PowerPC
- bit pattern values, 13
- double format, 8
- double-extended format, 12
- IEEE arithmetic, 3
- quad-precision values, 41
- ranges and precisions, 18
- the FPSCR register, 57
- underflow thresholds, 22
Q
- quadruple-precision representation
- FORTRAN example, 91
- quiet NaN
- default result of invalid operation, 55
R
- random number generators, 92
- random number utilities
- shufrans, 49
- represent double-precision value
- C example, 90
- FORTRAN example, 92
- represent single-precision value
- C example, 90
- rounding direction, 4, 25
- C example, 102
- ulp (unit in the last place), 25
- rounding precision, 4
- roundoff error
- accuracy
- loss of, 24
S
- set exception flags
- C example, 107
- shufrans
- shuffle pseudo-random numbers, 49
- single format, 6
- single-precision representation
- C example, 90
- SPARC
- FPU, 137
- square root instruction, 133, 227
- standard_arithmetic
- turn on IEEE behavior, 135
- Store 0, 23
- flush underflow results, 27, 28
- subnormal number, 27, 132
- floating-point calculations, 23
- SVID behavior of libm
- -Xt compiler option, 229
- SVID exceptions
- errno set to EDOM
- improper operands, 226
- errno set to ERANGE
- overflow or underflow, 226
- matherr, 226
- PLOSS, 230
- TLOSS, 230
- System V Interface Definition (SVID), 225
T
- trap, 131
- abort on exception, 117
- ieee_retrospective, 45
- trap on exception
- C example, 109, 110
- trap on floating-point exceptions
- C example, 109
- trigonometric functions
- argument reduction, 48
- tutorial, floating-point, 147
U
- underflow
- floating-point calculations, 22
- gradual, 23, 132
- nonstandard_arithmetic, 46
- threshold, 27
- underflow thresholds
- double extended precision, 22
- double precision, 22
- single precision, 22
- unordered comparison
- floating-point values, 56
- NaN, 56
V
- values.h
- define error messages, 229
X
- X/Open behavior of libm
- -Xa compiler option, 229
- X_TLOSS, 229
- -Xa, 229
- -Xc, 229
- -Xs, 229
- -Xt, 229