The Significance of the Join in Relation Query Processing

For understanding the significance of the join for relational query processing we want to see why and where the necessity of a join arises. Amongst many reasons, it is already the database design process which creates an oncoming demand for joins within relational queries. Let us go one step back from the staff-student scenario as it was described in section 3.1: imagine that someone designs a database for this department. An initial step would be to create a conceptual schema in some semantic data model , e.g. the entity-relationship (ER) model [Chen, 1976]. A conceptual schema is then mapped into a logical schema using one of various logical data models, such as the network, the relational or the object-oriented model. We want to look at one aspect of this mapping in more detail in order to identify one reason for which the join operation is so important in relational query processing.

Figure 3.3 shows part of the departmental scenario in entity-relationship notation [Korth and Silberschatz, 1991]. There are two entity sets, namely Staff and Student, which are related via two relationships, namely teaches and supervises. It is intended to describe a department that has a number of staff and a number of students. Staff members teach students in various courses and also supervise students in certain research projects. For simplicity, courses and projects are omitted.

**Figure:** An example of a conceptual database design in entity-relationship notation.
$\begin{figure} \epsfxsize=0.9\textwidth \centerline{ \epsffile{/home/tz/work/thesis/fig/er-scenario.ps}} \centerline{ } \end{figure}$

This conceptual schema can now be translated into a logical schema. Depending on the logical data model that we choose, we will get different results. For our purpose, we want to look at the way in which the relationships between entities are translated, such as the teaches relationship between Staff and Student.

In the relational model, tuples within a relation are identified by a unique attribute value or a unique combination of the latter. This is a called a key . The Name attributes in Staff and Student are examples for keys as we assumed names to be unique. Keys are a kind of logical reference as opposed to a physical reference, such as a memory address. In our example, one can map the teaches relationship set of figure 3.3, the conceptual schema, into a relation Teaches in the relational schema. The relation Teaches holds tuples which consist of two parts: a key that refers to a tuple in the Staff relation and a key referring to a tuple in the Student relation. See figure 3.4. If a tuple's key is used within another relation for such purposes then the corresponding attribute in the other relation is called a foreign key . Thus a link between tuples of two different relations is established in a rather abstract way: it can be deduced from the condition `key = foreign key' .

**Figure:** Relation `Teaches`.
$\begin{figure} \begin{center} \begin{small} \begin{tabular} {\vert ll\vert} \hli... ...e \\ \hline\end{tabular}\end{small}\end{center}\index{{\em Teaches}}\end{figure}$

One might argue that the conceptual separation of relations does not have to translate to the physical level, i.e. data could be physically linked although being conceptually separated. This, however, is only sometimes the case. Relations are usually normalised because this facilitates updates and helps to maintain a consistent database.

Whereas relational database systems are based on such logical links, systems that are based on the network , such as ADABAS [Tsichritzis and Lochovsky, 1977], or some systems with an object-oriented data model, such as ObjectStore [Lamb et al., 1991], would create physical links between the entities: they would, for example, use pointers to link a staff member with a student if there exists a `teaches' relationship between them.

The operation that creates physical links between logically linked tuples of two (or more) relations is the join. The most frequent usage - as motivated in the previous paragraphs - is to materialise `key = foreign key' relationships. These arise from mapping ER-like relationship sets, such as teaches in figure 3.3, to relations in a relational database. Estimates are that around 90% of join conditions are such `key = foreign key' conditions [Valduriez, 1987].

As an example, consider the two `key = foreign key' relationships between the relations of figures 3.1 and 3.4, namely Staff.Name = Teaches.Teacher between the Staff and Teaches relations and Student.Name = Teaches.Student between the Student and Teaches relations. If, for example, we were searching for the workrooms of Alex's students we would need to materialise the `key = foreign key' link between the Student and the Teaches relations. This can be done by joining the two relations, using Student.Name = Teaches.Student as the join condition (see figure 3.5 (a)), followed by a selection of Alex's students and a projection on the workroom attribute (see figure 3.5 (b)).

**Figure:** Example of a *`key = foreign key'* join.
$\begin{figure} \begin{center} \begin{small} \subfigure[Result of the join {\bf ... ...hline\end{tabular}\end{center}\end{minipage}}\end{small}\end{center}\end{figure}$

The join operation, however, is not restricted to `key = foreign key' but has many other applications. As an example of a non-`key = foreign key'-link, we might want to find staff members who share an office in our scenario. To that end, the relation Staff can be joined with itself using Staff A.Office = Staff B.Office as the join condition. Another example would be to look for students who became a staff member by joining the Staff and Student relations via the Staff.Name = Student.Name condition.

Up to here, all examples of joins used equality as the predicate in the join condition. Such joins are called equi-joins . An, admittedly artificial, example of a nonequi-join is to find staff-student pairs in which the staff member worked in the department before the respective student entered. Such pairs can be found by joining Staff and Student using Staff.End < Student.Start as the condition. This is actually an example of a temporal join as the join condition is based on a temporal relationship between the timestamps of two temporal relations. More precisely, it is a before join because `before' is the temporal relationship. Join types, such as equi- and nonequi-joins, are discussed in some more detail in section 3.3 and, for the temporal case, in chapter 4.

In summary, we have seen that the join operation is used to materialise the various logical links that exist between data of two or more relations. Whenever a query needs to relate data of two or more relations a join operation is required. This situation appears frequently because of the many `key = foreign key' relationships that are introduced in the database design process, e.g. when translating relationship sets of a entity-relationship model into relations of the relational data model or through normalisation of relations. Furthermore, there are many more logical links of various types that might be materialised by a join operation.