In this chapter, we analyse the problem of replication when partitioning a collection of intervals. As we have seen in section 4.4, this problem is relevant for the temporal join algorithms that are based on explicit partitioning of the data over the timestamp attribute.
The replication of tuples can have an impact on the performance
of the join algorithms in a variety of ways. The exact consequences
for the join costs, however, can only be determined by considering
characteristics that are algorithm- or hardware-specific:
The problem of finding such optimal partitions for interval data is called the interval partitioning problem (IP) . This is a rather tricky problem as it not only requires to choose appropriate breakpoints but also makes it delicate to determine the number of breakpoints. In fact, we have encountered a dilemma: in order to reduce the number of overlapping intervals one has to reduce the number of breakpoints but in order to increase the number m of partial joins in (3.6) - e.g. for increasing the degree of parallelism - one has to increase the number of breakpoints. Thus there are two contrary effects associated with interval partitioning; note that the first one does not exist in the case of an equi-join. This is a manifestation of the min-max dilemma [Zhou, 1993]. We therefore need an additional input parameter. A useful constraint is to form fragments with less than a certain number of intervals. Such a number, for example, could be implied by the amount of memory or the disk space that is available.
In this chapter, we will formally investigate the complexity of IP, i.e. we will check whether a solution for IP can be found in polynomial time and - if so - we would like to have an algorithm that can compute this solution. The remainder of this chapter is structured like this: Section 5.2 introduces the notation that we are going to use throughout the chapter and the remaining parts of the thesis. In section 5.3, IP is defined formally. Section 5.4 looks at the search space of the problem. The main result will be that optimal partitions can be found within the set of interval endpoints. In section 5.5, we describe the algorithm IP-opt that computes solutions for IP in polynomial time. Finally, section 5.6 shows an alternative approach by relating IP to a graph-theoretic problem, namely the problem of sequential graph partitioning (SGP). This leads to a similar result as manifested by IP-opt. However, the connection between IP and graph partitioning (GP) might prove to be quite fruitful as the many complexity and algorithmic results about GP might be applied to variations of IP.