In stage 1, the initial fragments of R and of Q are repartitioned as illustrated in figure 8.9. On the architectural model, this process is performed as follows; the description is restricted to the repartitioning of R, as the process works analogously for Q.
Furthermore, there are 2m output buffers B'1, ..., B'm, B''1, ..., B''m. They are distributed over all processors of the machine according to figure 8.10, i.e. B'k and B''k are located at the processor that is responsible for the fragments R'k and R''k; put the other way round, a processor j holds the output buffers B'k and B''k for all fragment indices k that are assigned to it (see figure 8.11). An output buffer B'k will later receive all the tuples of R'k which come from the corresponding hash buffers of all processors j. Similarly, B''k receives all the tuples of R''k which come from the corresponding hash buffers of all processors j.
A processor j hashes its tuples to its hash buffers in the following way: a tuple r with timestamp [ts,te] is put into the hash buffer of the partition whose range contains its start timestamp ts.
As soon as a hash buffer is full, its contents are transmitted to the corresponding output buffer.
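As a concrete illustration, the hashing and flushing mechanism described above can be sketched as follows. The partition boundaries, the buffer capacity and all names (`BOUNDS`, `CAPACITY`, `put`, `partition_of`) are hypothetical and not taken from the text; the sketch only assumes that a tuple is hashed on its start timestamp ts and that a full hash buffer is flushed to its output buffer.

```python
# Illustrative sketch only: hash tuples into per-partition buffers by
# their start timestamp ts; flush a full buffer to its output buffer.
import bisect

BOUNDS = [10, 20, 30]    # assumed: partition k covers ts below BOUNDS[k]
CAPACITY = 2             # assumed: tuples per hash buffer before a flush

hash_buffers = {k: [] for k in range(len(BOUNDS))}
output_buffers = {k: [] for k in range(len(BOUNDS))}  # stands in for B'k

def partition_of(t):
    """Index k of the partition whose range contains timestamp t."""
    return bisect.bisect_right(BOUNDS, t)

def put(r):
    """Hash tuple r = (ts, te) on ts; flush the buffer when it is full."""
    ts, te = r
    k = partition_of(ts)
    hash_buffers[k].append(r)
    if len(hash_buffers[k]) == CAPACITY:        # buffer full ->
        output_buffers[k].extend(hash_buffers[k])  # transmit contents
        hash_buffers[k].clear()

for r in [(1, 4), (12, 15), (3, 22), (14, 27)]:
    put(r)
# output_buffers[0] == [(1, 4), (3, 22)]; hash_buffers[0] is empty again
```

The sketch keeps one hash buffer per partition at a single processor; in the parallel setting, every processor j maintains such a set of buffers independently.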
Each processor that holds an output buffer as described above replicates a received tuple r in the following way:
/* replicate r into R''k+1, ..., R''ke, where ke is the partition that contains te */ for l = k+1 to ke do send the tuple r to the output buffer B''l od
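Under the assumption that k is the partition containing ts and that the loop runs up to the partition ke containing te, the replication loop above can be sketched as follows; the partition boundaries and all names are illustrative, not taken from the text.

```python
# Illustrative sketch only: a tuple whose interval extends beyond its
# primary partition k is copied into the R'' output buffers of all later
# partitions it overlaps (the "for l = k+1 to ke" loop).
import bisect

BOUNDS = [10, 20, 30, 40]      # assumed upper bounds of partitions 0..3

def partition_of(t):
    """Index of the partition whose range contains timestamp t."""
    return bisect.bisect_right(BOUNDS, t)

def replicate(r, double_prime_buffers):
    """Copy r = (ts, te) into the buffers standing in for B''k+1..B''ke."""
    ts, te = r
    k, ke = partition_of(ts), partition_of(te)
    for l in range(k + 1, ke + 1):      # for l = k+1 to ke do ... od
        double_prime_buffers[l].append(r)

bufs = {l: [] for l in range(len(BOUNDS))}
replicate((5, 35), bufs)
# (5, 35) starts in partition 0 and ends in partition 3,
# so copies land in the buffers for partitions 1, 2 and 3
```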
The significant difference, in comparison to partitioning for a traditional parallel join, is the replication of tuples in steps (b).(ii) and (c). We chose a two-level replication: step (b).(ii) replicates the tuples over the interconnect and places them on all nodes that contain a processor that has to process the respective tuple. This step can be regarded as an inter-node replication. Step (c) replicates the tuples within the nodes and sends them to the remaining processors. This intra-node replication is faster because it can be done via shared memory rather than via communication over the interconnection network. If this step were incorporated into step (b).(ii), the advantage of fast communication via main memory would be lost.
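The effect of the two-level scheme can be illustrated with a small sketch: one copy of a tuple crosses the interconnect per target node, and the remaining copies for that node's processors are produced locally. The node topology, the counters and the function name are assumptions made purely for illustration.

```python
# Illustrative sketch only: two-level replication. One interconnect
# message per target node (inter-node step), then local fan-out to the
# node's remaining target processors (intra-node step).
PROCS_PER_NODE = 2             # assumed: processors 2j and 2j+1 share node j
N_NODES = 3

def replicate_to_processors(r, target_procs, inbox):
    """Deliver tuple r to every processor in target_procs.

    Returns (inter, intra): interconnect messages vs. shared-memory copies.
    """
    inter = intra = 0
    nodes = sorted({p // PROCS_PER_NODE for p in target_procs})
    for node in nodes:
        local = [p for p in target_procs if p // PROCS_PER_NODE == node]
        inter += 1                   # one copy per node over the interconnect
        inbox[local[0]].append(r)    # arrives at one processor of the node
        for p in local[1:]:          # remaining processors: shared memory
            inbox[p].append(r)
            intra += 1
    return inter, intra

inbox = {p: [] for p in range(PROCS_PER_NODE * N_NODES)}
inter, intra = replicate_to_processors(("r1", 0, 100), [1, 2, 3, 5], inbox)
# inter == 3 (one message per node), intra == 1 (only node 1 has two targets)
```

Folding the intra-node fan-out into the inter-node step would turn the single cheap shared-memory copy into an additional interconnect message, which is exactly the loss of advantage described above.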
As already stated, there are more efficient ways to repartition R and Q, e.g. by building index structures that represent the new fragments rather than materialising them as described above. We will later see that the repartitioning stage, even when performed in this non-optimal manner, contributes only a minor part of the overall costs; it is the joining stage that dominates the overall performance. For this reason we chose the naive approach to repartitioning, which not only improves readability but also simplifies the cost model developed in section 8.4.