Join Performance Issues

Next: Types of Joins Up: Role of the Join Previous: The Significance of the

Join Performance Issues

The join operation is closely related to the cartesian product as we have seen in (3.1). This means that, initially, each tuple of the cartesian product must be considered for the join result. The size of the cartesian product $\vert R\vert \cdot \vert Q\vert$ , however, is huge in comparison with the sizes of the participating relations, |R| and |Q|, with the latter already being large in many cases. This can involve a huge number of disk accesses to retrieve tuples of R and Q which implies a very poor performance.

The enormous amount of data and data movement and very poor performances for naive approaches have triggered a lot of research efforts aiming for improvement. The latter are summarised in the remainder of this chapter.

Thomas Zurek