Glossary
- This glossary describes computer floating-point arithmetic
terms. It also describes terms and acronyms associated with parallel
processing.
Note - The symbol "||" appended to a term
designates it as associated with parallel processing.
accuracy
- Accuracy is a measure of the extent to which a result is
affected by error. Contrast with precision. For example, "the
result is accurate to six decimal places" implies that all the
errors incurred while calculating the result are not large enough to
change the sixth decimal place of the result.
array processing||
- A number of processors working simultaneously, each handling
one element of the array, so that a single operation can apply to all
elements of the array in parallel.
associativity||
- See cache, direct mapped cache,
fully associative cache, set associative cache.
asynchronous control||
- Computer control behavior where a specific operation is
begun upon receipt of an indication (signal) that a particular event
has occurred. Asynchronous control relies on synchronization mechanisms
called locks to coordinate processors. See mutual exclusion,
mutex lock, semaphore lock, single-lock
strategy, spin lock.
backplane||
- See MBus, multiprocessor bus,
XDBus.
barrier||
- A synchronization mechanism for coordinating tasks even when
data accesses are not involved. A barrier is analogous to a gate.
Processors or threads operating in parallel reach the gate at different
times, but none can pass through until all processors reach the gate.
For example, suppose at the end of each day, all bank tellers are
required to tally the amount of money that was deposited, and the
amount that was withdrawn. These totals are then reported to the bank
vice president, who must check the grand totals to verify debits equal
credits. The tellers operate at their own speeds; that is, they finish
totaling their transactions at different times. The barrier mechanism
prevents tellers from leaving for home before the grand total is
checked. If debits do not equal credits, all tellers must return to
their desks to find the error. The barrier is removed after the vice
president obtains a satisfactory grand total.
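A minimal C sketch of the teller example using POSIX threads, whose
pthread_barrier_t provides this mechanism directly (an assumption here:
the glossary itself describes Solaris threads, which use a different
interface). Compile with cc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    #define NTELLERS 4

    pthread_barrier_t tally_done;    /* the "gate" every teller must reach */

    void *teller(void *arg) {
        long id = (long)arg;
        printf("teller %ld finished tallying\n", id);  /* at its own speed */
        pthread_barrier_wait(&tally_done);  /* none may pass until all arrive */
        printf("teller %ld goes home\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTELLERS];
        pthread_barrier_init(&tally_done, NULL, NTELLERS);
        for (long i = 0; i < NTELLERS; i++)
            pthread_create(&t[i], NULL, teller, (void *)i);
        for (long i = 0; i < NTELLERS; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&tally_done);
        return 0;
    }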
biased exponent
- The sum of the base-2 exponent and a constant (bias) chosen
to make the stored exponent's range non-negative. For example, the
exponent of 2^-100 is stored in IEEE single precision format
as (-100) + (single precision bias of 127) = 27.
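The stored field can be inspected directly. A small C sketch (assuming
the IEEE single format, as on SPARC, and a 32-bit unsigned int) that
extracts the biased exponent of 2^-100:

    #include <stdio.h>
    #include <string.h>
    #include <math.h>

    int main(void) {
        float x = ldexpf(1.0f, -100);               /* x = 2^-100 */
        unsigned int bits;
        memcpy(&bits, &x, sizeof bits);             /* reinterpret the stored bits */
        unsigned int biased = (bits >> 23) & 0xFF;  /* 8-bit exponent field */
        printf("biased exponent = %u\n", biased);   /* 27 = (-100) + 127 */
        return 0;
    }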
binade
- The interval between any two consecutive powers of two.
blocked state||
- The state of a thread that is waiting for a resource or data, such as
return data from a pending disk read, or waiting for another thread to
unlock a resource.
bound threads||
- For Solaris threads, a thread permanently assigned to a
particular LWP is called a bound thread. Bound threads can be scheduled
on a real-time basis in strict priority with respect to all other
active threads in the system, not only within a process. An LWP is an
entity that can be scheduled with the same default scheduling priority
as any UNIX process.
cache||
- Small, fast, hardware-controlled memory that acts as a
buffer between a processor and main memory. Cache contains a copy of
the most recently used memory locations--addresses and
contents--of instructions and data. Every address reference
goes first to cache. If the desired instruction or data is not in
cache, a cache miss occurs. The contents are fetched across the bus
from main memory into the CPU register specified in the instruction
being executed and a copy is also written to cache. It is likely that
the same location will be used again soon, and, if so, the address is
found in cache, resulting in a cache hit. If a write to that address
occurs, the hardware not only writes to cache, but can also generate a
write-through to main memory.
Cache for the SuperSPARC processor is organized into a hierarchy whose
characteristics are summarized in Tables 1, 2, and 3.
See associativity, circuit switching, direct
mapped cache, fully associative cache, MBus,
packet switching, set associative cache,
write-back, write-through, XDBus.
cache locality||
- A program does not access all of its code or data at once
with equal probability. Having recently accessed information in cache
increases the probability of finding information locally without having
to access memory. The principle of locality states that programs access
a relatively small portion of their address space at any instant of
time. There are two different types of locality: temporal and
spatial.
Temporal locality (locality in time) is the tendency to reuse recently
accessed items. For example, most programs contain loops, so that
instructions and data are likely to be accessed repeatedly. Temporal
locality retains recently accessed items closer to the processor in
cache rather than requiring a memory access. See cache,
competitive-caching, false sharing,
write-invalidate, write-update.
Spatial locality (locality in space) is the tendency to reference items
whose addresses are close to other recently accessed items. For
example, accesses to elements of an array or record show a natural
spatial locality. Caching takes advantage of spatial locality by moving
blocks (multiple contiguous words) from memory into cache and closer to
the processor. See cache, competitive-caching, false
sharing, write-invalidate, write-update.
chaining
- A hardware feature of some pipeline architectures that
allows the result of an operation to be used immediately as an operand
for a second operation, simultaneously with the writing of the result
to its destination register. The total cycle time of two chained
operations is less than the sum of the stand-alone cycle times for the
instructions. For example, the TI 8847 supports chaining of consecutive
fadd, fsub, and fmul (of the same
precision). Chained faddd/fmuld requires 12
cycles, while consecutive unchained faddd/fmuld requires 17
cycles.
circuit switching||
- A mechanism for caches to communicate with each other as
well as with main memory. A dedicated connection (circuit) is
established between caches or between cache and main memory. While a
circuit is in place no other traffic can travel over the bus.
coherence||
- In systems with multiple caches, the mechanism that ensures
that all processors see the same image of memory at all times.
common exceptions
- The three floating-point exceptions overflow, invalid, and division
(by zero) are collectively referred to as the common exceptions for the
purposes of ieee_flags(3m) and ieee_handler(3m).
They are called common exceptions because they are commonly trapped as
errors.
competitive-caching||
- Competitive-caching maintains cache coherence by using a
hybrid of write-invalidate and write-update. Competitive-caching uses a
counter to age shared data. Shared data is purged from cache based on a
least-recently-used (LRU) algorithm. This can cause shared data to
become private data again, thus eliminating the need for the cache
coherency protocol to access memory (via backplane bandwidth) to keep
multiple copies synchronized. See cache, cache locality, false
sharing, write-invalidate, write-update.
concurrency||
- The execution of two or more active threads or processes in
parallel. On a uniprocessor apparent concurrence is accomplished by
rapidly switching between threads. On a multiprocessor system true
parallel execution can be achieved. See asynchronous control,
multiprocessor system, thread.
concurrent processes||
- Processes that execute in parallel in multiple processors or
asynchronously on a single processor. Concurrent processes can interact
with each other, and one process can suspend execution pending receipt
of information from another process or the occurrence of an external
event. See process, sequential processes.
condition variable||
- For Solaris threads, a condition variable enables threads to
atomically block until a condition is satisfied. The condition is
tested under the protection of a mutex lock. When the condition is
false, a thread blocks on a condition variable and atomically releases
the mutex waiting for the condition to change. When another thread
changes the condition, it can signal the associated condition variable
to cause one or more waiting threads to wake up, reacquire the mutex,
and re-evaluate the condition. Condition variables can be used to
synchronize threads in this process and other processes if the variable
is allocated in memory that is writable and shared among the
cooperating processes and has been initialized for this behavior.
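A minimal sketch of the pattern just described, written with POSIX
threads (Solaris threads use cond_wait and mutex_lock; the POSIX names
are shown as the portable equivalent); the flag ready is a hypothetical
condition:

    #include <pthread.h>

    static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    static int ready = 0;                  /* the condition, protected by m */

    void wait_until_ready(void) {
        pthread_mutex_lock(&m);
        while (!ready)                     /* re-test: wakeups may be spurious */
            pthread_cond_wait(&cv, &m);    /* atomically releases m while blocked */
        pthread_mutex_unlock(&m);          /* wait reacquired m before returning */
    }

    void set_ready(void) {
        pthread_mutex_lock(&m);
        ready = 1;                         /* change the condition under the lock */
        pthread_cond_signal(&cv);          /* wake one waiter to re-evaluate */
        pthread_mutex_unlock(&m);
    }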
context switch
- In multitasking operating systems, such as the SunOS operating
system, processes run for a fixed time quantum. At the end of the time
quantum, the CPU receives a signal from the timer, interrupts the currently
running process, and prepares to run a new process. The CPU saves the
registers for the old process, and then loads the registers for the new
process. Switching from the old process state to the new is known as a
context switch. Time spent switching contexts is system overhead; the time
required depends on the number of registers, and on whether there are
special instructions to save the registers associated with a process.
control flow model||
- The von Neumann model of a computer. This model specifies
flow of control; that is, which instruction is executed at each step of
a program. All Sun workstations are instances of the von Neumann model.
See data flow model, demand-driven dataflow.
critical region||
- An indivisible section of code that can only be executed by
one thread at a time and is not interruptible by other threads, such
as code that accesses a shared variable. See mutual exclusion,
mutex lock, semaphore lock, single-lock strategy, spin lock.
critical resource||
- A resource that can only be in use by at most one thread at
any given time. Where several asynchronous threads are required to
coordinate their access to a critical resource, they do so by
synchronization mechanisms. See mutual exclusion, mutex lock,
semaphore lock, single-lock strategy, spin lock.
data flow model||
- This computer model specifies what happens to data, and
ignores instruction order. That is, computations move forward by nature
of availability of data values instead of the availability of
instructions. See control flow model, demand-driven
dataflow.
data race||
- In multithreading, a situation where two or more threads
simultaneously access a shared resource. The results are indeterminate,
depending on the order in which the threads access the resource. This
situation, called a data race, can produce different results when a
program is run repeatedly with the same input. See mutual
exclusion, mutex lock, semaphore lock, single-lock strategy, spin
lock.
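A minimal C sketch of a data race (POSIX threads assumed): two threads
increment a shared counter without a lock, and the final total varies
from run to run because the load-increment-store sequences interleave.

    #include <pthread.h>
    #include <stdio.h>

    long counter = 0;                    /* shared and unprotected */

    void *add(void *arg) {
        for (int i = 0; i < 1000000; i++)
            counter++;                   /* not atomic: load, add, store */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, add, NULL);
        pthread_create(&b, NULL, add, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);  /* often less than 2000000 */
        return 0;
    }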
deadlock||
- A situation that can arise when two (or more) separately
active processes compete for resources. Suppose that process P requires
resources X and Y and requests their use in that order at the same time
that process Q requires resources Y and X and asks for them in that
order. If process P has acquired resource X and simultaneously process
Q has acquired resource Y, then neither process can
proceed--each process requires a resource that has been
allocated to the other process.
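The P and Q scenario above, sketched with two POSIX mutexes standing in
for resources X and Y (hypothetical names); acquiring the locks in
opposite orders is what makes deadlock possible:

    #include <pthread.h>

    pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;   /* resource X */
    pthread_mutex_t y = PTHREAD_MUTEX_INITIALIZER;   /* resource Y */

    void *process_p(void *arg) {
        pthread_mutex_lock(&x);          /* P acquires X ... */
        pthread_mutex_lock(&y);          /* ... then waits for Y */
        pthread_mutex_unlock(&y);
        pthread_mutex_unlock(&x);
        return NULL;
    }

    void *process_q(void *arg) {
        pthread_mutex_lock(&y);          /* Q acquires Y ... */
        pthread_mutex_lock(&x);          /* ... then waits for X: deadlock */
        pthread_mutex_unlock(&x);
        pthread_mutex_unlock(&y);
        return NULL;
    }

The standard cure is to impose a single global ordering on lock
acquisition, so that every thread requests X before Y.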
default result
- The value that is delivered as the result of a
floating-point operation that caused an exception.
demand-driven dataflow||
- A task is enabled for execution by a processor when its
results are required by another task that is also enabled, as in a
graph reduction model. A graph reduction program consists of reducible
expressions that are replaced by their computed values as the
computation progresses through time. Most of the time, the reductions
are done in parallel--nothing prevents parallel reductions
except the availability of data from previous reductions. See
control flow model, data flow model.
denormalized number
- Older nomenclature for subnormal number.
direct mapped cache||
- A direct mapped cache is a one-way set associative cache.
That is, each cache entry holds one block and forms a single
set with one element. See cache, cache locality, false
sharing, fully associative cache, set associative cache,
write-invalidate, write-update.
distributed memory architecture||
- A combination of local memory and processors at each node of
the interconnect network topology. Each processor can directly access
only a portion of the total memory of the system. Message passing is
used to communicate between any two processors, and there is no global,
shared memory. Therefore, when a data structure must be shared, the
program issues send/receive messages to the process that owns that
structure. See interprocess communication, message
passing.
double precision
- Using two words to represent a number in order to keep or
increase precision. On SPARC workstations, double precision is the
64-bit IEEE double precision.
exception
- An arithmetic exception arises when an attempted atomic
arithmetic operation has no result that is acceptable universally. The
meanings of atomic and acceptable vary with time and place.
exponent
- The component of a floating-point number that signifies the
integer power to which the base is raised in determining the value of
the represented number.
false sharing||
- A condition that occurs in cache when two unrelated data items,
accessed independently by two threads, reside in the same block. This
block can end up 'ping-ponging' between caches for no valid
reason. Recognizing such a case and rearranging the data structure to
eliminate the false sharing greatly increases cache performance. See
cache, cache locality.
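A C sketch of the rearrangement (the 64-byte block size is an
assumption; the real block size is machine-dependent):

    /* Two counters updated by two different threads. In the first layout
       they share one cache block and ping-pong between caches; padding in
       the second layout places them in separate blocks. */
    struct counters_bad {
        long a;                          /* written by thread 1 */
        long b;                          /* written by thread 2: same block */
    };

    struct counters_good {
        long a;
        char pad[64 - sizeof(long)];     /* assume a 64-byte cache block */
        long b;                          /* now in its own block */
    };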
floating-point number system
- A system for representing a subset of real numbers in which
the spacing between representable numbers is not a fixed, absolute
constant. Such a system is characterized by a base, a sign, a
significand, and an exponent (usually biased). The value of the number
is the signed product of its significand and the base raised to the
power of the unbiased exponent.
fully associative cache||
- A fully associative cache with m entries is an
m-way set associative cache. That is, it has a single
set with m blocks. A cache entry can reside in any of the
m blocks within that set. See cache, cache locality,
direct mapped cache, false sharing, set associative cache,
write-invalidate, write-update.
gradual underflow
- When a floating-point operation underflows, return a
subnormal number instead of 0. This method of handling underflow
minimizes the loss of accuracy in floating-point calculations on small
numbers.
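A short C99 sketch showing the effect: halving the smallest normal
double underflows gradually to a subnormal rather than being flushed to
zero.

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void) {
        double tiny = DBL_MIN / 2.0;     /* below the normal range */
        printf("%a is %s\n", tiny,
               fpclassify(tiny) == FP_SUBNORMAL ? "subnormal" : "not subnormal");
        return 0;
    }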
hidden bits
- Extra bits used by hardware to ensure correct rounding, not
accessible by software. For example, IEEE double precision operations
use three hidden bits to compute a 56-bit result that is then rounded
to 53 bits.
IEEE Standard 754
- The standard for binary floating-point arithmetic developed
by the Institute of Electrical and Electronics Engineers, published in
1985.
in-line template
- A fragment of assembly language code that is substituted for
the function call it defines, during the inlining pass of ProCompilers.
Used (for example) by the math library in in-line template files
(libm.il) in order to access hardware implementations of
trigonometric functions and other elementary functions from C
programs.
interconnection network topology||
- Interconnection topology describes how the processors are
connected. All networks consist of switches whose links go to
processor-memory nodes and to other switches. There are four generic
forms of topology: star, ring, bus, and fully-connected network. Star
topology consists of a single hub processor with the other processors
directly connected to the single hub; the non-hub processors are not
directly connected to each other. In ring topology all processors are
on a ring and communication is generally in one direction around the
ring. Bus topology is noncyclic, with all nodes connected;
consequently, traffic travels in both directions, and some form of
arbitration is needed to determine which processor can use the bus at
any particular time. In a fully-connected (crossbar) network, every
processor has a bidirectional link to every other
processor.
Commercially available parallel processors use multistage network
topologies, such as the 2-dimensional grid and the boolean
n-cube.
interprocess communication||
- Message passing among active processes. See circuit
switching, distributed memory architecture, MBus, message passing,
packet switching, shared memory, XDBus.
IPC||
- See interprocess communication.
light-weight process||
- Solaris threads are implemented as a user-level library,
using the kernel's threads of control, which are called light-weight
processes (LWPs). In Solaris 2.2 and above, a process is a collection
of LWPs that share memory. Each LWP has the scheduling priority of a
UNIX process and shares the resources of that process. LWPs coordinate
their access to the shared memory by using synchronization mechanisms
such as locks. An LWP can be thought of as a virtual CPU that executes
code or system calls. The threads library schedules threads on a pool
of LWPs in the process, in much the same way as the kernel schedules
LWPs on a pool of processors. Each LWP is independently dispatched by
the kernel, performs independent system calls, incurs independent page
faults, and runs in parallel on a multiprocessor system. The LWPs are
scheduled by the kernel onto the available CPU resources according to
their scheduling class and priority.
lock||
- A mechanism for enforcing a policy for serializing access to
shared data. A thread or process uses a particular lock in order to
gain access to shared memory protected by that lock. The locking and
unlocking of data is voluntary in the sense that only the programmer
knows what must be locked. See data race, mutual exclusion, mutex
lock, semaphore lock, single-lock strategy, spin lock.
LWP||
- See light-weight process.
MBus||
- MBus is a bus specification for a processor/memory/IO
interconnect. It is licensed by SPARC International to several silicon
vendors who produce interoperating CPU modules, IO interfaces and
memory controllers. MBus is a circuit-switched protocol combining read
requests and response on a single bus. MBus level I defines
uniprocessor signals; MBus level II defines multiprocessor extensions
for the write-invalidate cache coherence mechanism.
memory||
- A medium that can retain information for subsequent
retrieval. The term is most frequently used for referring to a
computer's internal storage that can be directly addressed by
machine instructions. See cache, distributed memory, shared
memory.
message passing||
- In the distributed memory architecture, a mechanism for
processes to communicate with each other. There is no shared data
structure in which they deposit messages. Message passing allows a
process to send data to another process and for the intended recipient
to synchronize with the arrival of the data.
MIMD||
- See Multiple Instruction Multiple Data, shared
memory.
mt-safe||
- In Solaris 2.2 and above, function calls inside libraries
are either mt-safe or not mt-safe; mt-safe code is also called
"re-entrant" code. That is, several threads can simultaneously
call a given function in a module and it is up to the function code to
handle this. The assumption is that data shared between threads is only
accessed by module functions. If mutable global data is available to
clients of a module, appropriate locks must also be made visible in the
interface. Furthermore, the module function cannot be made re-entrant
unless the clients are assumed to use the locks consistently and at
appropriate times. See single-lock strategy.
Multiple Instruction Multiple Data||
- System model where many processors can be simultaneously
executing different instructions on different data. Furthermore, these
processors operate in a largely autonomous manner as if they are
separate computers. They have no central controller, and they typically
do not operate in lock-step fashion. Most real world banks run this
way. Tellers do not consult with one another, nor do they perform each
step of every transaction at the same time. Instead, they work on their
own, until a data access conflict occurs. Processing of transactions
occurs without concern for timing or customer order. But customers A
and B must be explicitly prevented from simultaneously accessing the
joint AB account balance. MIMD relies on synchronization mechanisms
called locks to coordinate access to shared resources. See mutual
exclusion, mutex lock, semaphore lock, single-lock strategy, spin
lock.
multiple read single write||
- In a concurrent environment, the first process to access
data for writing has exclusive access to it, making concurrent write
access or simultaneous read and write access impossible. However, the
data can be read by multiple readers.
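POSIX threads express this policy directly as a readers-writer lock; a
minimal sketch (shared_value is a hypothetical resource):

    #include <pthread.h>

    pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
    int shared_value;                    /* the protected data */

    int read_value(void) {
        pthread_rwlock_rdlock(&rw);      /* many readers may hold this at once */
        int v = shared_value;
        pthread_rwlock_unlock(&rw);
        return v;
    }

    void write_value(int v) {
        pthread_rwlock_wrlock(&rw);      /* one writer, excluding all readers */
        shared_value = v;
        pthread_rwlock_unlock(&rw);
    }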
multiprocessor||
- See multiprocessor system.
multiprocessor bus||
- In a shared memory multiprocessor machine each CPU and cache
module are connected together via a bus that also includes memory and
IO connections. The bus enforces a cache coherency protocol. See
cache, coherence, MBus, XDBus.
multiprocessor system||
- A system in which more than one processor can be active at
any given time. While the processors are actively executing separate
processes, they run completely asynchronously. However, synchronization
between processors is essential when they access critical system
resources or critical regions of system code. See critical region,
critical resource, multithreading, uniprocessor system.
multitasking||
- In a uniprocessor system, a large number of threads appear
to be running in parallel. This is accomplished by rapidly switching
between threads.
multithreading||
- Applications that can have more than one thread or processor
active at one time. Multithreaded applications can run in both
uniprocessor systems and multiprocessor systems. See bound thread,
mt-safe, single-lock strategy, thread, unbound thread,
uniprocessor.
mutex lock||
- Synchronization variable to implement the mutual exclusion
mechanism. See condition variable, mutual exclusion.
mutual exclusion||
- In a concurrent environment, the ability of a thread to
update a critical resource without accesses from competing threads. See
critical region, critical resource.
NaN
- Stands for Not a Number. A symbolic entity that is encoded
in floating-point format.
normal number
- In IEEE arithmetic, a number with a biased exponent that is
neither zero nor maximal (all 1's), representing a subset of the
normal range of real numbers with a bounded small relative error.
packet switching||
- In the shared memory architecture, a mechanism for caches to
communicate with each other as well as with main memory. In packet
switching, traffic is divided into small segments called packets that
are multiplexed onto the bus. A packet carries identification that
enables cache and memory hardware to determine whether the packet is
destined for it or to send the packet on to its ultimate destination.
Packet switching allows bus traffic to be multiplexed and unordered
(not sequenced) packets to be put on the bus. The unordered packets are
reassembled at the destination (cache or main memory). See cache,
shared memory.
paradigm||
- A model of the world that is used to formulate a computer
solution to a problem. Paradigms provide a context in which to
understand and solve a real-world problem. Because a paradigm is a
model, it abstracts the details of the problem from the reality, and in
doing so, makes the problem easier to solve. Like all abstractions,
however, the model can be inaccurate because it only approximates the
real world. See Multiple Instruction Multiple Data, Single
Instruction Multiple Data, Single Instruction Single Data, Single
Program Multiple Data.
parallel processing||
- In a multiprocessor system, true parallel execution is
achieved where a large number of threads or processes can be active at
one time. See concurrency, multiprocessor system, multithreading,
uniprocessor.
parallelism||
- See concurrent processes, multithreading.
pipeline||
- If the total function applied to the data can be divided
into distinct processing phases, different portions of data can flow
along from phase to phase, as in a compiler with phases for lexical
analysis, parsing, type checking, code generation, and so on. As soon as
the first program or module has passed the lexical analysis phase, it
can be passed on to the parsing phase while the lexical analyzer starts
on the second program or module. See array processing, vector
processing.
pipelining
- A hardware feature where operations are divided into multiple
stages, each of which takes (typically) one cycle to complete. The
pipeline is filled when new operations can be issued each cycle. If
there are no dependencies among instructions in the pipe, new results
can be delivered each cycle. Chaining implies pipelining of dependent
instructions. If dependent instructions cannot be chained because the
hardware does not support chaining of those particular instructions,
the pipeline stalls.
precision
- A quantitative measure of the density of representable
numbers. "IEEE double precision format specifies 53 bits of
precision" implies that the relative representation error in the
normal range is bounded by 2-52.
process||
- A unit of activity characterized by a single sequential
thread of execution, a current state, and an associated set of system
resources.
quiet NaN
- A NaN (not a number) that propagates through almost every
arithmetic operation without raising new exceptions.
radix
- The base number of any system of numbers. For example, 2 is
the radix of a binary system, and 10 is the radix of the decimal system
of numeration. SPARC workstations use radix-2 arithmetic; IEEE Std 754
is a radix-2 arithmetic standard.
round
- Inexact results must be rounded up or down to obtain
representable values. When a result is rounded up, it is increased to
the next representable value. When rounded down, it is reduced to the
preceding representable value.
roundoff error
- The error introduced when a real number is rounded to a
machine-representable number. Most floating-point calculations incur
roundoff error. For any one floating-point operation, IEEE Std 754
specifies that the result shall not incur more than one rounding
error.
semaphore lock||
- Synchronization mechanism for controlling access to critical
resources by cooperating asynchronous threads. See
semaphore.
semaphore||
- A special-purpose data type introduced by E. W. Dijkstra
that coordinates access to a particular resource or set of shared
resources. A semaphore has an integer value (that cannot become
negative) with two operations allowed on it. The
signal (V or up) operation increases the value by
one, and in general indicates that a resource has become free. The wait
(P or down) operation decreases the value by one,
when that can be done without the value going negative, and in general
indicates that a free resource is about to start being used. See
semaphore lock.
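A minimal sketch using POSIX semaphores (the glossary's Solaris threads
interface differs; the POSIX calls are shown as the portable
equivalent), with the count standing for three identical resources:

    #include <semaphore.h>

    sem_t slots;                         /* counts free copies of a resource */

    void init_slots(void) { sem_init(&slots, 0, 3); }  /* 3 resources free */
    void acquire(void)    { sem_wait(&slots); }  /* P/down: blocks at zero */
    void release(void)    { sem_post(&slots); }  /* V/up: frees one resource */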
sequential processes||
- Processes that execute in such a manner that one must finish
before the next begins. See concurrent processes,
process.
set associative cache||
- In a set associative cache, there are a fixed number of
locations (at least two) where each block can be placed. A set
associative cache with n locations for a block is called an
n-way set associative cache. An n-way set associative
cache consists of more than one set, each of which consists of
n blocks. A block can be placed in any location (element) of
that set. Increasing the associativity level (number of blocks in a
set) increases the cache hit rate. See cache, cache locality, false
sharing, write-invalidate, write-update.
shared memory architecture||
- In a bus-connected multiprocessor system, processes or
threads communicate through a global memory shared by all processors.
This shared data segment is placed in the address space of the
cooperating processes between their private data and stack segments.
Subsequent tasks spawned by fork() copy all but the shared
data segment in their address space. Shared memory requires program
language extensions and library routines to support the model.
signaling NaN
- A NaN (not a number) that raises the invalid operation
exception whenever it appears as an operand.
significand
- The component of a floating-point number that is multiplied
by a signed power of the base to determine the value of the number. In
a normalized number, the significand consists of a single nonzero digit
to the left of the radix point and a fraction to the right.
SIMD||
- See Single Instruction Multiple Data.
Single Instruction Multiple Data||
- System model where there are many processing elements, but
they are designed to execute the same instruction at the same time;
that is, one program counter is used to sequence through a single copy
of the program. SIMD is especially useful for solving problems that
have large amounts of data that need to be updated on a wholesale basis,
such as regular numerical calculations. Many scientific and
engineering applications (such as image processing, particle
simulation, and finite element methods) naturally fall into the SIMD
paradigm. See array processing, pipeline, vector
processing.
Single Instruction Single Data||
- The conventional uniprocessor model, with a single processor
fetching and executing a sequence of instructions that operate on the
data items specified within them. This is the original von Neumann
model of the operation of a computer.
single precision
- Using one computer word to represent a number.
Single Program Multiple Data||
- A form of asynchronous parallelism where simultaneous
processing of different data occurs without lock-step coordination. In
SPMD, processors can execute different instructions at the same time,
such as different branches of an
if-then-else statement.
single-lock strategy||
- In the single-lock strategy, a thread acquires a single,
application-wide mutex lock whenever any thread in the application is
running and releases the lock before the thread blocks. The single-lock
strategy requires cooperation from all modules and libraries in the
system to synchronize on the single lock. Because only one thread can
be accessing shared data at any given time, each thread has a
consistent view of memory. This strategy is quite effective in a
uniprocessor, provided shared memory is put into a consistent state
before the lock is released and that the lock is released often enough
to allow other threads to run. Furthermore, in uniprocessor systems,
concurrency is diminished if the lock is not dropped during most I/O
operations. The single-lock strategy cannot be applied in a
multiprocessor system.
SISD||
- See Single Instruction Single Data.
snooping||
- The most popular protocol for maintaining cache coherency is
called snooping. Cache controllers monitor or snoop on the bus to
determine whether or not the cache contains a copy of a shared block.
For reads, multiple copies can reside in the cache of different
processors, but because the processors need the most recent copy, all
processors must get new values after a write. See cache,
competitive-caching, false sharing, write-invalidate,
write-update.
For writes, a processor must have exclusive
access to write to cache. Writes to unshared blocks do not cause bus
traffic. The consequence of a write to shared data is either to
invalidate all other copies or to update the shared copies with the
value being written. See cache, competitive-caching, false sharing,
write-invalidate, write-update.
spin lock||
- Threads use a spin lock to test a lock variable over and
over until some other task releases the lock. That is, the waiting
thread spins on the lock until the lock is cleared. Then, the waiting
thread sets the lock while inside the critical region. After work in
the critical region is complete, the thread clears the spin lock so
another thread can enter the critical region. The difference between a
spin lock and a mutex is that an attempt to get a mutex held by someone
else will block and release the LWP; a spin lock does not release the
LWP. See mutex lock.
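A C11 sketch of a test-and-set spin lock (stdatomic.h postdates this
glossary and is assumed here only for portability of the example):

    #include <stdatomic.h>

    atomic_flag lock = ATOMIC_FLAG_INIT;

    void spin_lock(void) {
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;                            /* busy-wait; the LWP is not released */
    }

    void spin_unlock(void) {
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }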
SPMD||
- See Single Program Multiple Data.
stderr
- Standard error is the UNIX file pointer to standard error
output. This file is opened when a program is started.
Store 0
- Flushing the underflowed result of an arithmetic operation
to zero.
subnormal number
- In IEEE arithmetic, a nonzero floating-point number with a
biased exponent of zero. The subnormal numbers are those between zero
and the smallest normal number.
thread||
- A flow of control within a single UNIX process address
space. Solaris threads provide a light-weight form of concurrent task,
allowing multiple threads of control in a common user-address space,
with minimal scheduling and communication overhead. Threads share the
same address space, file descriptors (when one thread opens a file, the
other threads can read it), data structures, and operating system
state. A thread has a program counter and a stack to keep track of
local variables and return addresses. Threads interact through the use
of shared data and thread synchronization operations. See bound
thread, light-weight processes, multithreading, unbound
thread.
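A minimal POSIX-threads sketch (Solaris threads use thr_create and
thr_join; the POSIX calls are shown as the portable equivalent). The new
thread gets its own stack and program counter but shares the process
address space.

    #include <pthread.h>
    #include <stdio.h>

    void *worker(void *arg) {
        printf("hello from thread %ld\n", (long)arg);
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, (void *)1L);
        pthread_join(t, NULL);           /* wait for the thread to finish */
        return 0;
    }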
topology||
- See interconnection network topology.
two's complement
- The radix complement of a binary numeral, formed by
subtracting each digit from 1, then adding 1 to the least significant
digit and executing any required carries. For example, the two's
complement of 1101 is 0011.
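The example can be checked in a few lines of C, masking to a 4-bit
field:

    #include <stdio.h>

    int main(void) {
        unsigned int x   = 0xD;              /* 1101 */
        unsigned int neg = (~x + 1u) & 0xF;  /* complement each bit, add 1 */
        printf("%X\n", neg);                 /* prints 3, i.e. 0011 */
        return 0;
    }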
ulp
- Stands for unit in last place. In binary formats, the least
significant bit of the significand, bit 0, is the unit in the last
place.
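As a sketch, one ulp of a double can be computed with the C99 nextafter
function (link with -lm):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double x = 1.0;
        double ulp = nextafter(x, INFINITY) - x;  /* gap to the next double */
        printf("ulp(1.0) = %a\n", ulp);           /* 0x1p-52 in IEEE double */
        return 0;
    }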
ulp(x)
- Stands for ulp of x truncated in working
format.
unbound threads||
- For Solaris threads, threads scheduled onto a pool of LWPs
are called unbound threads. The threads library invokes and assigns
LWPs to execute runnable threads. If the thread becomes blocked on a
synchronization mechanism (such as a mutex lock) the state of the
thread is saved in process memory. The threads library then assigns
another thread to the LWP. See bound thread, multithreading,
thread.
underflow
- A condition that occurs when the result of a floating-point
arithmetic operation is so small that it cannot be represented as a
normal number in the destination floating-point format with only normal
roundoff.
uniprocessor system||
- A uniprocessor system has only one processor active at any
given time. This single processor can run multithreaded applications as
well as the conventional single instruction single data model. See
multithreading, single instruction single data, single-lock
strategy.
vector processing||
- Processing of sequences of data in a uniform manner, a
common occurrence in manipulation of matrices (whose elements are
vectors) or other arrays of data. This orderly progression of data can
capitalize on the use of pipeline processing. See array processing,
pipeline.
word
- An ordered set of characters that are stored, addressed,
transmitted and operated on as a single entity within a given computer.
In the context of SPARC workstations, a word is 32 bits.
wrapped number
- In IEEE arithmetic, a number created from a value that
otherwise overflows or underflows by adding a fixed offset to its
exponent to position the wrapped value in the normal number range.
Wrapped results are not currently produced on SPARC workstations.
write-back||
- Write policy for maintaining coherency between cache and
main memory. Write-back (also called copy back or store in) writes only
to the block in local cache. Writes occur at the speed of cache memory.
The modified cache block is written to main memory only when the
corresponding memory address is referenced by another processor. The
processor can write within a cache block multiple times and writes it
to main memory only when referenced. Because every write does not go to
memory, write-back reduces demands on bus bandwidth. See cache,
coherence, write-through.
write-invalidate||
- Maintains cache coherence by reading from local caches until
a write occurs. To change the value of a variable the writing processor
first invalidates all copies in other caches. The writing processor is
then free to update its local copy until another processor asks for the
variable. The writing processor issues an invalidation signal over the
bus and all caches check to see if they have a copy; if so, they must
invalidate the block containing the word. This scheme allows multiple
readers, but only a single writer. Write-invalidate uses the bus only on
the first write to invalidate the other copies; subsequent local writes
do not result in bus traffic, thus reducing demands on bus bandwidth.
See cache, cache locality, coherence, false sharing,
write-update.
write-through||
- Write policy for maintaining coherency between cache and
main memory. Write-through (also called store through) writes to main
memory as well as to the block in local cache. Write-through has the
advantage that main memory has the most current copy of the data. See
cache, coherence, write-back.
write-update||
- Write-update, also known as write-broadcast, maintains cache
coherence by immediately updating all copies of a shared variable in
all caches. This is a form of write-through because all writes go over
the bus to update copies of shared data. Write-update has the advantage
of making new values appear in cache sooner, which can reduce latency.
See cache, cache locality, coherence, false sharing,
write-invalidate.
XDBus||
- The XDBus specification uses low-impedance GTL (Gunning
Transceiver Logic) transceiver signalling to drive longer backplanes at
higher clock rates. XDBus supports a large number of CPUs with multiple
interleaved memory banks for increased throughput. XDBus uses a packet
switched protocol with split requests and responses for more efficient
bus utilization. XDBus also defines an interleaving scheme so that one,
two or four separate bus data paths can be used as a single backplane
for increased throughput. XDBus supports write-invalidate, write-update
and competitive-caching coherency schemes, and has several congestion
control mechanisms. See cache, coherence, competitive-caching,
write-invalidate, write-update.