Stage 2: Joining

Next: Evaluation of Characteristics Up: Cost Model Previous: Stage 1: Repartitioning

Stage 2: Joining

In this section we derive the costs of stage 2 of the temporal join processing model. This stage is described in section 8.3.4. Here, each processor j performs computations , , ..., . Each computation RQ_k consists of performing the three subjoins

in exactly that order in order to exploit the opportunity to keep (a part of) R'_k in main memory at the end of join (a), thus avoiding to reload it at the beginning of join (b). Alternatively, we could use the order (c), (b), (a) and exploit the same fact for the Q'_k. In the remainder, however, we will assume the order (a), (b), (c); arguments and results can be easily transferred to the alternative case.

The subjoins are performed by using the nested-block join technique . This means that in joins (a) and (b) an R'_k is cached either entirely or blockwise in main memory while being joined with Q''_k and Q'_k respectively. In join (c) we assume the same for the Q'_k when being joined with R''_k. The number of blocks in which an R'_k has to be divided is referred to by . If is the amount of main memory at each node then each processor gets a share of

Thus can be computed by

Similarly, refers to the number of blocks into which a Q'_k is divided for computation in subjoin (c):

We note that and are 1 if R'_k and Q'_k respectively fit in main memory.

The costs for each of the subjoins are computed in the same way: the two sets of tuples have to be read from disk and are then joined using a nested-blocked approach. A minor difference appears for join (b) that can exploit the fact that (a block of) R'_k already resides in main memory and therefore has not to be reloaded from disk.

There is no communication via the interconnect involved because the computations RQ_k are independent from each other. In the following paragraphs, we describe the cost components for join (a). The costs for joins (b) and (c) are derived accordingly with the marginal difference in the case of join (b) that has been mentioned above. All cost components are summarised in tables 8.5, 8.6 and 8.7.

First, we look at the I/O costs. These have to be considered on a node-wide level as disks are shared among all the processors of a node. For join (a), all tuples of R'_k have to be read once from disk and those of Q''_k once per block of R'_k, i.e. times. Thus a node i has to load

tuples of sizes |r| and |q| respectively. This implies I/O costs of

at node i. For the overall costs only the costs of the node with the heaviest load is relevant, i.e.

(57)

As in stage 1, the individual processors have to initiate the I/O transfers. CPU instructions are required per page transfer. Furthermore, every tuple in R'_k is tested with every tuple in Q''_k which makes

tests on a processor j. We assume that processing a pair of tuples requires instructions. This means that the CPU costs for a processor j are

For the overall costs, only the processor with the heaviest load is relevant. Therefore the CPU costs for join (a) are

The joining stage requires one main memory access to a tuple per tuple , i.e.

accesses in total, each retrieving a tuple of size |r|. Memory - as disks - is shared among all processors per node. Costs for main memory access therefore have to be considered on a node-wide level. Thus on a node i there are

accesses to a tuple of size |r| which produces costs of

Again, for the overall costs only the costs of the node with the heaviest load is relevant, i.e.

(58)

The total costs for stage 2 (a) - the join (a) - arise from the maximum of equations (8.25), (8.26) and (8.27):

(59)

As explained above, the costs C_2(b) for join (b) and C_2(c) for join (c) are similarly derived. Tables 8.5, 8.6 and 8.7 summarise the cost components for stage 2. The total costs for this stage are

(60)

**Table:** Cost components for the joining stage 2 (a).
2\|c\|Stage 2 (a)
Disk I/O
CPU

Memory

**Table:** Cost components for the joining stage 2 (b).
2\|c\|Stage 2 (b)
Disk I/O
CPU

Memory

**Table:** Cost components for the joining stage 2 (c).
2\|c\|Stage 2 (c)
Disk I/O
CPU

Memory

Next: Evaluation of Characteristics Up: Cost Model Previous: Stage 1: Repartitioning

Thomas Zurek