
Conclusions

The principal contribution of this thesis is the elaboration and description of a novel way of optimising partitioned temporal join processing. All parts of the optimisation process can be made very efficient by using IP-tables. If the algorithms, such as those presented in chapter 9, had to be implemented on top of a data sampling approach, they would be very inefficient, as most of them would need to scan the data sample several times. The theoretical and experimental results provide a basis for enhancing the optimisation module of a database management system to cope with partitioned temporal joins.

Apart from this major contribution, a number of further important results were obtained when investigating various aspects of this work. They are the following:

However, there are several issues in this work that require careful consideration and possibly further research in the future. One of these is the efficiency of maintaining the IP-tables. In section 7.4 we were concerned with showing how IP-tables can be updated, so the emphasis was on feasibility rather than efficiency; the algorithms in that section therefore do not claim to be the most efficient ones. In fact, one can imagine temporal database applications and query situations in which the overhead imposed by IP-table updates becomes so significant that it outweighs the benefits of the IP-tables. It is still unclear, for example, whether an operational database with frequent updates to its (temporal) tables would suffer significantly from the IP-table overhead. Further quantitative analysis is required, either to rule out this possibility or to establish that such situations can indeed arise. In the latter case, one would want indicators that identify such problematic situations.
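As a rough illustration of the kind of indicator that might be useful here, the following sketch compares the estimated IP-table maintenance overhead of a workload against the estimated savings at join time. All names and cost parameters are hypothetical placeholders introduced for illustration only; they are not taken from the thesis and merely make the break-even argument concrete.

    # Hypothetical break-even check for IP-table maintenance (illustrative only).
    # None of these parameters or names come from the thesis; they stand in for
    # workload statistics that an optimiser or administration tool might collect.

    def ip_table_worthwhile(updates_per_hour: float,
                            cost_per_ip_update: float,
                            joins_per_hour: float,
                            saving_per_join: float) -> bool:
        """Return True if the estimated join-time savings outweigh the
        estimated IP-table maintenance overhead for this workload."""
        maintenance_cost = updates_per_hour * cost_per_ip_update
        join_savings = joins_per_hour * saving_per_join
        return join_savings >= maintenance_cost

    # Example: an update-heavy workload where maintenance dominates (False),
    # versus a join-heavy workload where the IP-tables pay off (True).
    print(ip_table_worthwhile(updates_per_hour=50_000, cost_per_ip_update=0.002,
                              joins_per_hour=10, saving_per_join=4.0))
    print(ip_table_worthwhile(updates_per_hour=1_000, cost_per_ip_update=0.002,
                              joins_per_hour=20, saving_per_join=4.0))

Such a check is only meaningful once the two cost parameters have been quantified for a concrete system, which is precisely the analysis that remains to be done.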

A further issue is that our analytical cost model needs some validation. In the past, similar approaches have proved valuable for qualitative analysis, e.g. in [Hua et al., 1991], but one cannot say the same about the quantitative aspect. Modern hardware, especially parallel machines, employs many complex performance-enhancing mechanisms, such as caching or special devices that accelerate broadcasts and other typical communication patterns over the interconnect. These could hardly be incorporated into our cost model if it is to remain reasonably general (so that conclusions can be drawn for a wide range of platforms) and reasonably simple (so that it can be used efficiently in a query optimiser). In other words, there is good justification for believing that if our cost model shows that strategy X performs better than strategy Y, then this effect can be observed on a wide range of implementations. The absolute numbers, however, still require validation: if the cost model predicts a certain cost for a strategy, it remains to be seen how realistic this prediction is. This could be confirmed by implementing the strategies and the join algorithm on real hardware, or at least by simulating them with one of the available simulation tools.
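To make the distinction between relative and absolute accuracy concrete, the following sketch shows how such a validation experiment could be organised: the cost model is only required to rank strategies correctly, while its absolute predictions are checked against measured elapsed times. The cost dictionaries and the measurement hook are hypothetical placeholders; the thesis' actual cost model and strategies would have to be plugged in.

    # Hypothetical validation harness for a partitioning cost model (sketch only).
    # The predicted costs and the measure_elapsed() hook are placeholders for the
    # actual cost model and for measurements on real or simulated hardware.

    from typing import Callable, Dict, List

    def rank_strategies(predicted: Dict[str, float]) -> List[str]:
        """Relative use of the model: order strategies by predicted cost."""
        return sorted(predicted, key=predicted.get)

    def validate_absolute(predicted: Dict[str, float],
                          measure_elapsed: Callable[[str], float]) -> Dict[str, float]:
        """Absolute use of the model: ratio of measured to predicted cost per
        strategy. Ratios far from 1 indicate that the absolute predictions need
        recalibration, even if the ranking from rank_strategies() is correct."""
        return {s: measure_elapsed(s) / cost for s, cost in predicted.items()}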

Finally, there is another issue that has to be considered carefully: the cost of the optimisation itself. We gave elapsed times for deriving the costs imposed by the various strategies, but these times were obtained on a specific machine. For an optimiser it could be beneficial to have a cost model for the optimisation process of section 6.1 itself. The optimiser could then decide whether it is worthwhile to consider expensive partitioning techniques, such as those of the minimum-overlaps family, or whether simple and fast ones are sufficient in order to save optimisation costs.
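One way such a meta-level cost model could be used is sketched below: the optimiser invokes an expensive partitioning technique (such as one of the minimum-overlaps family) only when its estimated optimisation time is small compared with the execution-time saving it is expected to yield. The estimates, the margin and the strategy names are assumptions made for illustration and would have to come from a calibrated model.

    # Hypothetical guard around expensive partitioning optimisation (sketch only).
    # est_optimisation_time and est_saving would come from a calibrated cost model
    # for the optimisation process itself; they are assumptions, not thesis results.

    def choose_partitioning(est_optimisation_time: float,
                            est_saving: float,
                            margin: float = 1.5) -> str:
        """Pick an expensive technique (e.g. minimum-overlaps) only if the
        expected execution-time saving clearly exceeds the optimisation cost."""
        if est_saving > margin * est_optimisation_time:
            return "minimum-overlaps"   # expensive optimisation pays off
        return "simple-range"           # cheap, fast partitioning is sufficient

    # Example: 2 s of optimisation for an expected 0.5 s saving is not worth it,
    # whereas 0.1 s of optimisation for an expected 3 s saving is.
    print(choose_partitioning(est_optimisation_time=2.0, est_saving=0.5))
    print(choose_partitioning(est_optimisation_time=0.1, est_saving=3.0))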


