The join operation is closely related to the cartesian
product as we have seen in
(3.1). This means that, initially, each tuple of
the cartesian product must be considered for the join result. The size
of the cartesian product , however, is huge in
comparison with the sizes of the participating relations, |R| and
|Q|, with the latter already being large in many cases. This can
involve a huge number of disk accesses to retrieve tuples of R and
Q which implies a very poor performance.
The enormous amount of data and data movement and very poor performances for naive approaches have triggered a lot of research efforts aiming for improvement. The latter are summarised in the remainder of this chapter.