Stage 1: Repartitioning

Next: Stage 2: Joining Up: Temporal Join Processing Model Previous: Temporal Join Processing

Stage 1: Repartitioning

In stage 1, the initial fragments of R and of Q () are repartitioned as illustrated in figure 8.9. On the architectural model, this process is performed as follows - the description is restricted to repartitioning of R; the process works analogously for Q.

1.

Each node i reads its fragment of R; then each processor of that node processes the -th part of this fragment ().

2.

Each processor j has 2m hash buffers : to accommodate tuples for and hash buffers for tuples for .

Furthermore there are 2m output buffers , ..., ,, ..., . They are distributed over all processors of the machine according to figure 8.10, i.e. and are located at processor , or, put the other way round, a processor j has the output buffers , ..., and (see figure 8.11). An output buffer will later receive all the tuples of R'_k which come from the hash buffers for all . Similarly, receives all the tuples of R''_k which come from hash buffers for all .

A processor j hashes its tuples to the hash buffers in the following way: a tuple r with timestamp [t_s,t_e] is put into

(i): hash buffer with
(ii): hash buffers with .

Step (i) puts the tuple in the fragment that covers the range in which the timestamp's startpoint t_s falls; step (ii) puts the tuples in those fragments R''_k that are processed by the first processor on nodes (other than that covered by step (i)) that will perform an RQ_k that involves r. By doing so, we avoid the situation in which more than one copy of r is sent to the same node over the interconnect. Within a node, r can be replicated via main memory copies which is much faster (see step (c)). Thus step (ii) avoids a lot of possible network traffic.

As soon as a hash buffer is full its contents is transmitted to the corresponding output buffer.

3.

A further replication step is performed when a tuple r with a timestamp [t_s,t_e] arrives at an output buffer or at a with . Such a buffer is located at processor which itself resides at node . From there, r is replicated within node i and put into the output buffers with . This node-internal replication can be done within main memory which is much faster than if it had been performed over the interconnect (see step (ii) in (b)).

Each processor that holds an output buffer as described above replicates r in the following way:


		/*  or  */ 
 
		for l=k+1 to  do
		 		send tuple to the output buffer  
		od

4.

When an output buffer is full then its tuples are flushed to disk.

**Figure:** Buffers at processor j.

The significant difference, in comparison to partitioning for a traditional parallel join, is the replication of tuples in steps (b).(ii) and (c). We chose a two-level replication: (b).(ii) replicates the tuples over the interconnect and positions the tuples on all nodes that have a processor that has to process the respective tuple. This step can be regarded as an inter-node replication . Step (c) replicates the tuples within the nodes and sends them to the remaining processors. This intra-node replication is faster because it can be done via shared-memory rather than via communication over the interconnection network. If this step was incorporated into step (b).(ii) the advantage of fast communication via main memory would be lost.

As already stated, there are more efficient ways to repartition R and Q, e.g. by building index structures to represent the new fragments rather than materialising them as described above. We will later see that the repartitioning stage, even when performed in this non-optimal manner, still contributes only a minor part to the overall costs. It is the joining stage that dominates the overall performance. For this reason we chose the naive approach for repartitioning which not only improves the readability but also simplifies the cost model that is created in section 8.4.

Next: Stage 2: Joining Up: Temporal Join Processing Model Previous: Temporal Join Processing

Thomas Zurek