Analysis of the VLXbar design


Contents

Introduction

This document specifies the detailed behaviour of the simulation models of the VLXbar network.

At the lowest level, packets are represented by a set of flits. The low level terms are shown on the timing diagram (figure 1). They are the Input Packet Length defined as the total number of flits in the input packet, the Internal Delay defined as the time between the first input flit appearing and the first output flit appearing.

Figure 1: The basic delays imposed by a network.

Different networks will give different figures for the internal delay. The internal delay will also vary according to contention; in an ideal network it would be zero. The next timing diagram illustrates the components of the internal delay in the low level simulation of the VLXbar. The arbitration delay is the time from the destination address being known to the input queue being able to transfer a data flit to the output. The Iqueue delay is the time for a packet to cross from the input to the output of the input queue. This may be zero if an input queue bypass has been implemented. The Xbar delay is the time for a flit to traverse the crossbar. The Oqueue delay is the time to traverse the output queue. Again, this may be zero if an output queue bypass circuit is present.

Figure 2: Internal delays imposed by the VLXbar network.

Ideally all the delays would be zero. Unfortunately some are inescapable, but minimising them is a fundamental design aim.

Arbitration delay

The arbiter establishes a path across the crossbar between input and output queues. Up to Ninputs arbitration requests may be generated at the same time and the arbiter must ensure fairness and prevent starvation. This is a hard problem to solve for large N.

One algorithm considered is a tree based one. This requires N-1 basic 2-way arbiters per output, log2(N) logic stages. The total number of arbiters required is N.(N-1). The complexity of the basic 2-way arbiter is heavily dependent on the number of priority levels.

Figure 3: Tree arbitration circuit.

The circuit diagram for a simple one level arbiter is shown in figure 3. Requests are asserted on In0 and In1. Select acts as an enable/disable signal for the cell. Choice determines which of the grant signals Out0 or Out1 will be asserted.

Figure 4: A simple, 1 level 2-way arbitration cell.

These cells may be combined to produce higher level arbiters. Figure 4 shows a 4 way arbiter constructed with three of the above two-way cells. choice in this case determines whether odd or even outputs are being arbitrated.

Figure 5: A 4 way arbiter constructed from 3 2-way arbitration cells.

The arbitration policy for this scheme is not sophisticated. One option would be to invert the choice signal after every grant - this would help prevent starvation, but would impose a delay on a succession of packets from a single source if there were no other requesters. What is the policy between neighbouring inputs? Should the last choice at a level be stored in a flip-flop?

Extending the number of priority levels to include ``urgent'' increases the complexity of the cells significantly.

Input queue delay

Clawback. (reduces it by one)

Bypass. (reduces to zero if queue empty)

Crossbar delay

Depends on bus capacitance.

Output queue delay

Includes delay for output queue arbitration. This is a very simple arbiter (two queues competing for one output).

Bypass. (reduces to zero if queue empty)

Causes of longer than minimum delays

Causes of longer than minimum delays are:

Conclusion and Future Work


Fred Howell

Department of Computer Science
The University of Edinburgh
James Clerk Maxwell Building
King's Buildings
Mayfield Road
Edinburgh EH9 3JZ

Last modified: Thu Nov 27 16:16:59 GMT 1997