The following is a lic script file that generates the code templates to
perform a matrix transpose on a 2D blocked matrix. The matrix size is
U1xU2 and the block size is B1xB2. Before executing the
script file, variables U1, U2, B1 and B2 should
be set to the appropriate integer values. This script file is available
in the examples/lic
directory in the SUIF distribution.
// Need to set U1, U2, B1 and B2 to appropriate integer values before
// calling this script file.
//   Size of the matrix = U1xU2
//   Size of the block assigned to each processor = B1xB2

// The column assignments
//  p1   p2   p1x  p2x  i1   i2   b1   b2   a1   a2
//  1    2    3    4    5    6    7    8    9    10

// Iteration space
// 1 <= i1 <= U1 and 1 <= i2 <= U2
# iter = { !1 <= @5 <= U1! !1 <= @6 <= U2!}

// Data decomposition of array A
// B1*p1 <= a1-1 < B1 + B1*p1 and B2*p2 <= a2-1 < B2 + B2*p2
# dda = { !B1@1 <= @9 - 1 < B1 + B1@1! !B2@2 <= @10 - 1 < B2 + B2@2! }

// Data decomposition of array B
// B1*p1x <= b1-1 < B1 + B1*p1x and B2*p2x <= b2-1 < B2 + B2*p2x
# ddb = { !B1@3 <= @7 - 1 < B1 + B1@3! !B2@4 <= @8 - 1 < B2 + B2@4! }

// Left hand side A[i1][i2] = ...
// i1 = a1 and i2 = a2
# lhs = { !@5 = @9! !@6 = @10! }

// Right hand side ... = B[i2][i1]
// i2 = b1 and i1 = b2
# rhs = { !@6 = @7! !@5 = @8! }

# res = { iter dda ddb lhs rhs }

// No communication
# prc = { res !@1 = @3! !@2 = @4! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)

// Communication
# prc = { res !@1 > @3! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)

# prc = { res !@1 < @3! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)

# prc = { res !@1 = @3! !@2 > @4! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)

# prc = { res !@1 = @3! !@2 < @4! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)

end
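The `dda` and `ddb` constraints can be read as a block ownership rule: with 1-based array indices and 0-based processor coordinates, `B1*p1 <= a1-1 < B1 + B1*p1` pins `p1` to the integer quotient of `a1-1` by `B1`. The following Python sketch only illustrates that reading; it is not part of lic, and the function names are hypothetical.

```python
def owner(index, block):
    """Processor coordinate p such that block*p <= index - 1 < block + block*p."""
    return (index - 1) // block


def satisfies(index, block, p):
    """Check the decomposition constraint used in dda and ddb directly."""
    return block * p <= index - 1 < block + block * p


def transpose_owners(i1, i2, bsize1, bsize2):
    """Owners involved in one iteration of A[i1][i2] = B[i2][i1].

    Returns ((p1, p2), (p1x, p2x)): the processor that holds A[i1][i2]
    and the processor that holds B[i2][i1].
    """
    a1, a2 = i1, i2  # lhs: i1 = a1 and i2 = a2
    b1, b2 = i2, i1  # rhs: i2 = b1 and i1 = b2
    dest = (owner(a1, bsize1), owner(a2, bsize2))
    src = (owner(b1, bsize1), owner(b2, bsize2))
    return dest, src
```

For example, `transpose_owners(40, 5, 32, 32)` returns `((1, 0), (0, 1))`: the element is written on processor (1, 0) and read from processor (0, 1), so this iteration needs communication.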
The result of running this example in lic for a 1000x1000 matrix with square 32x32 blocks is:
csh> lic
> U1 = 1000
1000
> U2 = 1000
1000
> B1 = 32
32
> B2 = 32
32
> < tst1
# iter = { !1 <= @5 <= U1! !1 <= @6 <= U2!}
# dda = { !B1@1 <= @9 - 1 < B1 + B1@1! !B2@2 <= @10 - 1 < B2 + B2@2! }
# ddb = { !B1@3 <= @7 - 1 < B1 + B1@3! !B2@4 <= @8 - 1 < B2 + B2@4! }
# lhs = { !@5 = @9! !@6 = @10! }
# rhs = { !@6 = @7! !@5 = @8! }
# res = { iter dda ddb lhs rhs }
# prc = { res !@1 = @3! !@2 = @4! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)
if((proc1 >= 0)&&(proc1 <= 31))
  if(proc2 == proc1);
    p1x = proc1;
    p2x = proc1;
    for(i1 = 1+32*proc1; i1 <= MIN( 32+32*proc1, 1000); i1++)
      for(i2 = 1+32*proc1; i2 <= MIN( 32+32*proc1, 1000); i2++)
        b1 = i2;
        b2 = i1;
        a1 = i1;
        a2 = i2;
# prc = { res !@1 > @3! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)
if((proc1 >= 1)&&(proc1 <= 31))
  if((proc2 >= 0)&&(proc2 <= -1+proc1))
    p1x = proc2;
    p2x = proc1;
    for(i1 = 1+32*proc1; i1 <= MIN( 32+32*proc1, 1000); i1++)
      for(i2 = 1+32*proc2; i2 <= 32+32*proc2; i2++)
        b1 = i2;
        b2 = i1;
        a1 = i1;
        a2 = i2;
# prc = { res !@1 < @3! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)
if((proc1 >= 0)&&(proc1 <= 30))
  if((proc2 >= 1+proc1)&&(proc2 <= 31))
    p1x = proc2;
    p2x = proc1;
    for(i1 = 1+32*proc1; i1 <= 32+32*proc1; i1++)
      for(i2 = 1+32*proc2; i2 <= MIN( 32+32*proc2, 1000); i2++)
        b1 = i2;
        b2 = i1;
        a1 = i1;
        a2 = i2;
# prc = { res !@1 = @3! !@2 > @4! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)
cannot create code
# prc = { res !@1 = @3! !@2 < @4! }
prc.code(proc1 proc2 p1x p2x i1 i2 b1 b2 a1 a2)
cannot create code
end
> quit
csh>
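The three generated loop nests should together visit every element of the 1000x1000 iteration space exactly once, and in each of them `(p1x, p2x)` should be the owner of `B[i2][i1]`. The Python sketch below is a hand transcription of the templates above, written only to check that claim by brute force. Replacing the constants 31, 32 and 1000 with expressions in `u` and `b` is an assumption of this sketch, not something lic printed, and the function names are hypothetical.

```python
def template_iterations(u=1000, b=32):
    """Yield (proc1, proc2, p1x, p2x, i1, i2) in the order of the three templates."""
    last = (u - 1) // b  # highest processor coordinate; 31 in the transcript

    # prc = { res !@1 = @3! !@2 = @4! }: no communication, proc2 == proc1
    for proc1 in range(0, last + 1):
        for i1 in range(1 + b * proc1, min(b + b * proc1, u) + 1):
            for i2 in range(1 + b * proc1, min(b + b * proc1, u) + 1):
                yield proc1, proc1, proc1, proc1, i1, i2

    # prc = { res !@1 > @3! }: 0 <= proc2 <= -1+proc1
    for proc1 in range(1, last + 1):
        for proc2 in range(0, proc1):
            for i1 in range(1 + b * proc1, min(b + b * proc1, u) + 1):
                for i2 in range(1 + b * proc2, b + b * proc2 + 1):
                    yield proc1, proc2, proc2, proc1, i1, i2

    # prc = { res !@1 < @3! }: 1+proc1 <= proc2 <= last
    for proc1 in range(0, last):
        for proc2 in range(proc1 + 1, last + 1):
            for i1 in range(1 + b * proc1, b + b * proc1 + 1):
                for i2 in range(1 + b * proc2, min(b + b * proc2, u) + 1):
                    yield proc1, proc2, proc2, proc1, i1, i2


def check_templates(u=1000, b=32):
    """True if every (i1, i2) is visited once and all owners are consistent."""
    visits = bytearray(u * u)  # one visit counter per (i1, i2)
    for proc1, proc2, p1x, p2x, i1, i2 in template_iterations(u, b):
        if (proc1, proc2) != ((i1 - 1) // b, (i2 - 1) // b):
            return False  # A[i1][i2] is not stored on (proc1, proc2)
        if (p1x, p2x) != ((i2 - 1) // b, (i1 - 1) // b):
            return False  # B[i2][i1] is not stored on (p1x, p2x)
        visits[(i1 - 1) * u + (i2 - 1)] += 1
    return visits.count(1) == u * u
```

Calling `check_templates()` with the default arguments walks all one million iterations. The last two cases in the transcript contribute nothing, which matches the "cannot create code" messages: with square blocks, `p1 = p1x` already forces `p2 = p2x`.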
When the blocks are rectangular rather than square, the same script file still generates the necessary code, but that code is much more complicated. For example, try the following parameters:
U1 = 1000
U2 = 1000
B1 = 32
B2 = 80
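One way to see why the rectangular case is harder is to count, for a given destination processor, how many different processors hold the `B` elements it needs. The Python sketch below does that directly from the decomposition constraints. It is only an illustration under the same 1-based index and 0-based processor conventions as the script; it does not reproduce lic's output, and the function names are hypothetical.

```python
def block_range(p, block, upper):
    """1-based indices owned by processor coordinate p (empty if p is too large)."""
    return range(1 + block * p, min(block + block * p, upper) + 1)


def source_processors(p1, p2, u1, u2, bsize1, bsize2):
    """Processors (p1x, p2x) holding the B[i2][i1] values that (p1, p2) reads."""
    # b1 = i2, and i2 runs over the second-dimension block of A owned by p2
    p1x_set = {(i2 - 1) // bsize1 for i2 in block_range(p2, bsize2, u2)}
    # b2 = i1, and i1 runs over the first-dimension block of A owned by p1
    p2x_set = {(i1 - 1) // bsize2 for i1 in block_range(p1, bsize1, u1)}
    return sorted((x, y) for x in p1x_set for y in p2x_set)
```

With square blocks, `source_processors(3, 5, 1000, 1000, 32, 32)` returns `[(5, 3)]`, a single partner. With the parameters above, `source_processors(0, 0, 1000, 1000, 32, 80)` returns `[(0, 0), (1, 0), (2, 0)]`, so one destination block already draws from three source blocks, and the generated code has to distinguish more cases.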