Imperative languages allow unrestricted graph structures using pointers which makes their static analysis difficult and often imprecise. An approach is proposed whereby data structures are described based on a path name . This name uniquely determines a node of the structure and must not depend on run time information. The aim is to develop a method for precise static analysis for a class of structures with this node-naming property.
The data structures considered are graphs with the data elements
stored at the nodes, and each node has a fixed set of outgoing
directions D. The nodes can be indexed by the language L over D,
each word being a path name from a distinguished root node denoted by
. There is also an equivalence relation,
, over the
words of L, which describes when two different paths actually reach
the same data node. This can be expressed as a finite set of relations
, i.e. the relation ri =
(xi,yi) implies that
for all
. The set of names v such that
is the set of
aliases . It should be noted that such a system would not
necessarily have a decidable word problem. An example of such a set of
relations that does is the properties: inverses
, and commuting pairs ri = (ab,ba), where
. We can define a normalising function n on such path names,
so that all paths can be mapped to a unique equivalent one, i.e. it has the property that
. In
order to handle more general structures we would use the normalising
function as the definition, rather than the relations. A Markov
algorithm [2] is proposed as a formal way of doing this.
The programs under consideration are recursive functions with a path
name as the parameter w, as shown in the example in figure
1. Statements in the body of the function can read and
write to elements of the structure obtained by prefixing the parameter
path name by a word from L. Additionally, we can call the function
recursively with the new parameter w.x, for any direction
x. During the analysis each recursive call in the function body is
assigned a label from an alphabet . Also, each read and write
statement is given a label from
; we will often only consider one
read and one write denoted by R and W. A control word in
will then uniquely describe a run-time instance of either a
read or a write statement. We will also assume that there is a
predicate on the size of the w parameter that halts the recursion,
causing the function invocation to terminate. If P is a word from
, then we define Current(P) to be the current parameter of
the function call. Also, for a control word V we can define
Data(V) to be the path name of the data element actually accessed by
that statement, whether for a read or a write.
There is a total ordering on all the control words obtained from the ordering on the call labels and the statement order within the function. This total order is obtained by extending the order on these symbols to a lexicographic order on the strings of symbols. This ordering corresponds to the actual execution order of the statements for a sequential execution of the program.
The aim of dataflow analysis is, given a particular statement P.R,
that reads from the structure, to locate the source , the
write statement that produced that value, should such a statement
exist. This characterises the data dependencies within the function.
Define two statements U, V as conflicting if Data Data(V). We define a potential source T as a statement
that conflicts with P.R and occurs before it, T < P.R. Thus the
source is the maximum of all potential sources.
We now consider other work that has been done in both the area of extending the notion of data structures, and applying static analysis to more complex datatypes and control structures.