
Future Work

As noted in the conclusions section, there are several ways to confirm and extend the applicability of this work. For example, many temporal join conditions consist not only of an intersection, contain, overlap or during predicate between timestamp intervals but also of additional non-temporal expressions, e.g. equality of (non-temporal) attribute values. In such a situation, partitioning over the attributes involved in the equality condition appears to be the preferred option because it imposes no overhead through tuple replication. However, if one of the equi-condition attributes holds heavily skewed values, an optimiser might dismiss this option. As we have seen in the experiments, tuple replication has less impact than we initially expected. On a parallel architecture the predominant goal must be to achieve a good load balance. Therefore, partitioning over the timestamp intervals remains a feasible alternative to partitioning over equi-condition attributes, in the same way as the fragment-and-replicate technique has proved to be a valuable alternative to symmetric partitioning in commercial parallel query processing [Tseng and Reiner, 1993], despite the overhead that it incurs. Experimental results are needed to determine when a query optimiser should opt for partitioning over interval timestamps.

A second issue that could add to the appeal of IP-tables is resolving the doubts about whether they can be maintained efficiently. There are various alternatives if IP-table maintenance becomes an efficiency problem, and future research could analyse them:

We have already seen that condensation is a good way to reduce the sizes of IP-tables without doing much harm to the quality of the optimisation process. Smaller IP-tables should also bring an immediate performance benefit for the IP-table update operations. One would need to know whether condensation is sufficient in the cases in which IP-table maintenance becomes a problem.
As mentioned in section 7.5, a data warehouse environment might not see many individual updates to temporal relations but rather one bulk update, for example once per night. In this case, one could compute a temporary IP-table for the bulk update (which should be significantly more efficient than updating an existing IP-table) and then merge this temporary IP-table with the existing IP-table of the corresponding temporal relation.
One could argue that condensation has shown that we do not require exact numbers from the IP-tables. Therefore, IP-tables might not require immediate updates. Updates could instead be accumulated and processed similarly to the bulk update in a data warehouse, or one could simply recompute a temporal relation's IP-table every now and then. Experimental results are needed to see whether such an approach is viable.
Finally, one could look at more efficient algorithms for the IP-table updates than those that we presented in section 7.4. One possibility could be to store the values $s_{\scriptscriptstyle R}(t)$ and $e_{\scriptscriptstyle R}(t)$ rather than $s_{\scriptscriptstyle R}(t)$ and $o_{\scriptscriptstyle R}(t)$. This makes the IP-table update operations more efficient (the for-loops in figures 7.10 - 7.15 can be avoided) but imposes more work when using IP-tables (one has to use the recursive equations in figure 5.1(a) rather than the non-recursive ones of figure 5.1(b)). One would need to determine the trade-off between these two effects.
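The trade-off in the last option can be illustrated with a minimal sketch. It does not reproduce the algorithms of section 7.4 or the equations of figure 5.1; it assumes, purely for illustration, integer time points, closed intervals [ts, te], and that s(t), e(t) and o(t) count the intervals that start at, end at, and cover time point t respectively. These definitions and all function names are hypothetical and may differ from those used in the thesis.

```python
from collections import Counter

# Assumed definitions (not taken from the thesis): for closed integer
# intervals [ts, te], s[t] counts starts at t, e[t] counts ends at t,
# and o[t] counts intervals covering t.

def build_se(intervals):
    """Build the (s, e) representation from (ts, te) pairs."""
    s, e = Counter(), Counter()
    for ts, te in intervals:
        insert_se(s, e, ts, te)
    return s, e

def insert_se(s, e, ts, te):
    """Insert under (s, e): two counter increments, no loop over the interval."""
    s[ts] += 1
    e[te] += 1

def insert_so(s, o, ts, te):
    """Insert under (s, o): every covered time point has to be touched."""
    s[ts] += 1
    for t in range(ts, te + 1):
        o[t] += 1

def covering_from_se(s, e, t_min, t_max):
    """Recover o(t) recursively: o(t) = o(t-1) + s(t) - e(t-1).

    Assumes no interval starts before t_min, so that o(t_min - 1) = 0.
    """
    o, running = {}, 0
    for t in range(t_min, t_max + 1):
        running += s[t] - e[t - 1]
        o[t] = running
    return o
```

Under these assumptions an insertion into the (s, e) representation costs constant time, whereas reading o(t) requires a pass over the preceding time points; the (s, o) representation reverses the two costs.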

A discussion of these options along with a quantitative analysis of the impact of the update operations is necessary and certainly an issue for future research. This might be supported by findings made in the context of histograms as outlined in section 7.6.
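The bulk-update option mentioned above can be sketched as follows, assuming that IP-table entries are plain additive counts per time point, so that two tables can be merged by adding their entries. The representation (a pair of counters for start counts and covering counts over closed integer intervals) and the function names are hypothetical and serve only as an illustration.

```python
from collections import Counter

def build_table(intervals):
    """Build a simplified table (s, o) from closed integer intervals.

    Assumption: s[t] counts intervals starting at t and o[t] counts
    intervals covering t; the thesis may define its entries differently.
    """
    s, o = Counter(), Counter()
    for ts, te in intervals:
        s[ts] += 1
        for t in range(ts, te + 1):
            o[t] += 1
    return s, o

def merge_tables(existing, batch):
    """Merge a temporary table for a bulk update into an existing table.

    Because all entries are counts, merging is entry-wise addition.
    """
    return tuple(a + b for a, b in zip(existing, batch))

# Usage: the nightly batch is summarised once and then merged.
relation = [(1, 3), (2, 5)]
nightly_batch = [(4, 4), (2, 2)]
merged = merge_tables(build_table(relation), build_table(nightly_batch))
```

In this simplified setting the merged table equals the table that would result from rebuilding over the union of both tuple sets, which is the property a real merge procedure would have to preserve.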

The validation of our cost model by implementation or simulation is another pressing task that should be tackled in future research. It could be done in two stages. The first stage would try to confirm the relative differences between partitioning strategies. This would consolidate many statements made in this thesis (e.g. the statement that uniform partitioning can be up to three times more expensive on a parallel machine than underflow partitioning). The second stage would try to validate the absolute numbers that we obtained from the cost model. As outlined in section 6.2, it would be especially useful to bring our cost model in line with cost models for other join techniques in order to allow an optimiser to select the most efficient join algorithm.

As we also mentioned in the conclusions, it would be advantageous to have a cost model for the optimisation process itself. However, the wide range of possible partitioning strategies and the equally wide range of possible implementations (figure 6.1, for example, suggests that there is a good chance of parallelising the optimisation too) make this task costly and tedious, though not impossible.

As we have seen in chapter 11, IP-tables prove to be a metadata structure whose applicability goes beyond interval partitioning for join processing. Selectivity estimation is one such area, and the results obtained in chapter 11 require experimental analysis. For example, one needs to investigate the impact of condensation on the quality of the selectivity results.

A further area to which IP-tables and interval partitioning are relevant is that of temporal index structures. Here, tree balancing is a major task in order to optimise memory requirements and access times for such indexes. In fact, Gunadhi and Segev encountered partitioning problems for temporal indexes similar to those we met for temporal joins [Gunadhi and Segev, 1993]. We therefore expect that our IP-table based approach could be beneficial in that area too.

All this can establish IP-tables as a generally useful index structure for interval data. The initial results in this thesis are very encouraging in this respect and provide a basis for future research.


Thomas Zurek