Introduction

Imperative languages allow unrestricted graph structures using pointers which makes their static analysis difficult and often imprecise. An approach is proposed whereby data structures are described based on a path name . This name uniquely determines a node of the structure and must not depend on run time information. The aim is to develop a method for precise static analysis for a class of structures with this node-naming property.

The data structures considered are graphs with the data elements stored at the nodes, and each node has a fixed set of outgoing directions D. The nodes can be indexed by the language L over D, each word being a path name from a distinguished root node denoted by $\epsilon$ . There is also an equivalence relation, $\equiv$ , over the words of L, which describes when two different paths actually reach the same data node. This can be expressed as a finite set of relations $r_{i} \in (L \times L)$ , i.e. the relation r_i = (x_i,y_i) implies that $u.x_{i}.v \equiv u.y_{i}.v$ for all $u,v \in L$ . The set of names v such that $u \equiv v$ is the set of aliases . It should be noted that such a system would not necessarily have a decidable word problem. An example of such a set of relations that does is the properties: inverses $r_{i} = (ab,\epsilon)$ , and commuting pairs r_i = (ab,ba), where $a,b \in D$ . We can define a normalising function n on such path names, so that all paths can be mapped to a unique equivalent one, i.e. it has the property that $x \equiv y \iff n(x) = n(y)$ . In order to handle more general structures we would use the normalising function as the definition, rather than the relations. A Markov algorithm [2] is proposed as a formal way of doing this.

The programs under consideration are recursive functions with a path name as the parameter w, as shown in the example in figure 1. Statements in the body of the function can read and write to elements of the structure obtained by prefixing the parameter path name by a word from L. Additionally, we can call the function recursively with the new parameter w.x, for any direction x. During the analysis each recursive call in the function body is assigned a label from an alphabet $\mathcal{C}$ . Also, each read and write statement is given a label from $\mathcal{I}$ ; we will often only consider one read and one write denoted by R and W. A control word in $\mathcal{C}^{*}.I$ will then uniquely describe a run-time instance of either a read or a write statement. We will also assume that there is a predicate on the size of the w parameter that halts the recursion, causing the function invocation to terminate. If P is a word from $\mathcal{C}^{*}$ , then we define Current(P) to be the current parameter of the function call. Also, for a control word V we can define Data(V) to be the path name of the data element actually accessed by that statement, whether for a read or a write.

**Figure 1:** Example recursive function
$\begin{figure} \begin{ttfamily} \begin{tabbing} \hspace{1cm}\= \hspace{1cm}\= ... ... gt\textit{C:} \\ gt F(w.c); \\ \\ gt \}\end{tabbing}\end{ttfamily}\end{figure}$

There is a total ordering on all the control words obtained from the ordering on the call labels and the statement order within the function. This total order is obtained by extending the order on these symbols to a lexicographic order on the strings of symbols. This ordering corresponds to the actual execution order of the statements for a sequential execution of the program.

The aim of dataflow analysis is, given a particular statement P.R, that reads from the structure, to locate the source , the write statement that produced that value, should such a statement exist. This characterises the data dependencies within the function. Define two statements U, V as conflicting if Data $(U) \equiv$ Data(V). We define a potential source T as a statement that conflicts with P.R and occurs before it, T < P.R. Thus the source is the maximum of all potential sources.

We now consider other work that has been done in both the area of extending the notion of data structures, and applying static analysis to more complex datatypes and control structures.