Issue date: | Friday 7th February |
Due date: | Monday 24th February |
The purpose of this third part of the coursework is to evaluate the behaviour of caches on applications which use a significant amount of data, and which ``stress'' the cache to the limit. Hopefully you will see that naive use of caches, typically algorithms which access large data structures in naive ways, can lead to serious performance problems.
You are given a program called mxm, which computes the product of two matrices. You are to simulate the behavour of this program for a variety of matrix sizes and see how the cache behaviour varies as a function of one particular array dimension.
![]() |
(1) |
There are a number of ways of computing the values of c, and the one chosen for this practical computes the elements of c using the nested loop structure shown below:
for (j = 0; j < m; j++) for (k = 0; k < n; k++) for (i = 0; i < l; i++) c[i][k] = c[i][k] + a[i][j] * b[j][k];
Now, there ought to be plenty of locality in this algorithm as each element of a and b is read m times, and each element of c is read and written m times. However, spatial locality on the a matrix is actually quite poor, and temporal locality on the c matrix is also poor.
You may have noticed by now that the reference to b[j][k]
is a loop invariant expression at the inner-most loop.
Consequently, we can hoist it to an outer loop and keep the
value we need within a scalar variable. The effect will be to assign
the current value of b[j][k]
to a register, and thus avoid
one load instruction within the inner loop. The resulting code
looks like this (assuming a suitable declaration for r1
).
for (j = 0; j < m; j++) for (k = 0; k < n; k++){ r1 = b[j][k]; for (i = 0; i < l; i++) c[i][k] = c[i][k] + a[i][j] * r1; }
l
?
To answer this you will need to perform a set of seven simulations using dinero. Take a copy of all the files located in directory:
~cs3/comparch/Coursework/src/stageC
You will find a Makefile which is already set up to compile the matrix multiply program. The program itself is in four parts: there are two header files (data.h and trace.h) and two C source files (mxm.c and main.c). None of these files should need to be modified for this stage of the assignment, although you will be required to modify mxm.c for stage D. The mxm.c file contains a single routine called mxm() which actually performs the multiplication. main(), in main.c, reads the command line parameters, initializes the arrays, calls mxm() and then checks that the results are numerically correct.
The command line parameters for the mxm program are:-
So, to compute the product of a matrix and a
matrix, one would type:
mxm -l128 -m32 -n32
The initilization routine fills the a and b matrices with pseudo-random floating point values prior to the computation, and after computing the product, the result is checked against a ``standard'' algorithm (this will only become significant in the final stage of the assignment).
The output of mxm can be piped directly into dinero. If you inspect mxm.c you will find that there are calls to a Trace() macro which effectively computes the address of each array element accessed and prints this to the standard output stream in dinero format. If the -q flag is present on the command line no such output is produced.
You should assume that the initial cache parameters are those given
in table 1.
You should simulate for m=n=16 and allow l to
take on values from the set .Your results should be tabulated (at least) but preferably
illustrated in the form of a graph.
Explain the data miss rate behaviour you observe. In particular, explain why the miss rate starts to level out for very large matrices.
To answer this, I suggest that you take three data points from question 2 (say l=16,128,1024) and for each of them simulate with 2-way, and 4-way set-associativity (you should already have the results for a direct-mapped cache).
Your submission should not only describe, but also explain, the behaviour of the cache.
This document was generated using the LaTeX2HTML translator Version 97.1 (release) (July 13th, 1997)
Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -split 1 partIII.
The translation was initiated by Nigel Topham on 6/25/1998