What are types for?

To be updated - a bit of a jumble at the moment.

The notion of `type' isn't really a single idea, but rather a tangle of notions in which some of the strands come from grammar, some from mathematics, some from antiquity. But some strands are from programming. I'll get round to programming in a moment. But first, a bit of a sermon.

It is fairly easy to make fun of attempts to define, explain, or motivate fundamental notions, like type or set. What's a set? Why, "a manifold conceived as a unity", a "definite collection", or some junk like that. Well, not junk, not at all; if someone is trying to define or even just to explain or motivate a fundamental notion, what they say has to be heard or read with charity and a sensitive ear as well as scepticism. And so it has to be; it is no good just drumming one's fingers on the table till the tedious fool has finished waffling through the motivational stuff, waiting for the solid, scientific, rigorous axioms. It is one thing to write down a bunch of axioms that hold for such a fundamental notion; but how did we arrive at those axioms in the first place? Historically, by taking an intuitive notion seriously, and analysing it. Informal rigour, as in any case when a fundamental notion like effectivity, or validity in all structures, has to be analysed. Analysis certainly involves trial axiomatisation, and formal (metamathematical) analysis -- but on its own, the formalism is worthless, even worse than worthless. What else is there? There's no recipe. Experience. Examples. Trying to explain it in several different ways. Sensitive attention to terminology, its etymology, its grammar, its meaning. Mapping out the conceptual territory, seeing what depends on what and what is near to what. Teasing apart different strands in the same notion and nailing down the difference formally. Trial and error. Thought!

I have heard or read some attempts to say what a type `is' that I thought had something in them. One is somewhere in the early writings of Bertrand Russell.

    A type is the domain of a propositional function.

It is short, pithy, and bears thinking about, but does it hit the nail squarely on the head? To my mind it hits it very crooked, and there's nothing to do but claw it out and try to understand what is wrong with it, and the smidgeon that is right about it. Which you will be relieved to hear I am not going to inflict on you. It's better to ask, I think, what types are for. After all, they are inventions of human beings.

Types in programming

What are types for? Why do our programming languages have type systems? The most common answer is probably that they are there to prevent errors. (Cardelli begins his famous paper Type Systems by saying "The fundamental purpose of a type system is to prevent the occurrence of execution errors during the running of a program".) The suggestion is that type systems are supposed to prevent certain bad events, like a character string getting jammed in an orifice designed only for booleans, or slipping through and breaking the delicate mechanisms inside, with who knows what consequences. That is certainly true. Types are there, partly, to support policies that protect us from, or take some of the sting out of, the bad things in life.

However, besides ruling out bad events, type systems also have (or can have) a more positive rôle, in requiring good events -- specifically successful termination of the computations evoked by running the program on an ideal computer. Lack of termination, were it to "occur", would be an error of omission, rather than an event that occurs at any specific time. Types are also there, partly, to support policies to assure us of the good things in life.

In the jargon of computer science, type systems are to do with correctness, not just partial correctness (absence of incorrectness), but total correctness (inevitability of termination). I am aware that this is an eccentric view; it isn't even true, taken as an observation of existing type systems, though it is I think `true', as a reminder, if not an exhortation.

In fact there are type systems which ensure terminating computations. An example, admittedly not a type system for a real programming language, is Martin-Löf's type system. It is designed so that computations of well-typed expressions to weak head normal form always terminate (assuming that the computing is done on an ideal computer). For a long time I thought this meant that the language could not be taken seriously as a real programming language; over many years I have come to understand the notion of `program' better, and now find Martin-Löf's approach much more credible with respect to real programming languages. Perhaps it will not be long before there are good implementations of programming languages based on type systems like Martin-Löf's.
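
To make that a little more tangible, here is a rough sketch in Haskell (the names are mine; Haskell is not Martin-Löf's type theory and itself makes no such guarantee). The discipline is that the only way to recurse is through an eliminator that consumes its argument structurally, so that computation to constructor form always terminates:

    data Nat = Zero | Succ Nat

    -- The recursor (eliminator) for Nat: the only permitted way to "loop".
    natRec :: a -> (Nat -> a -> a) -> Nat -> a
    natRec z _ Zero     = z
    natRec z s (Succ n) = s n (natRec z s n)

    -- Addition defined via the recursor; its computation terminates for
    -- every finite Nat argument.
    plus :: Nat -> Nat -> Nat
    plus m = natRec m (\_ r -> Succ r)

    -- General recursion such as   bad n = bad n   is exactly what the
    -- discipline excludes: no eliminator justifies it.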

(Type systems for termination in polynomial time, constant space.)

I should like to distinguish between two concepts, namely `computation', and `running' (there must be better words than these). Computation is what a computer does when we don't care what it is doing, but only that it should hurry up and finish. Running is what it does when it does what we want, i.e. when we get something out of it.

Perhaps one can put the matter like this: all computers fetch instructions, and then execute them. The really important instructions are, however, the input-output instructions, not the internal ones that move bits around inside the computer.

With functional languages, fetching instructions means (for example) rewriting a graph to weak head normal form, having (a dispatch code for) an input-output instruction (or run-time primitive, or whatever you want to call it) at the head. Execution of such an instruction involves interaction with the outside world, using the operands of the instruction (the use of which may entail further computation). The input from the interaction (which may be a command or a response) feeds into the computation of the next instruction (as with a computed GOTO, or carry-branch).
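
One way to make this picture concrete is to represent a program as a tree of instructions together with the code that consumes their responses. What follows is only a sketch in Haskell, with invented names; it is not Gordon's formulation, nor that of any particular implementation:

    -- A program is a value whose weak head normal form is either finished,
    -- or an input-output instruction together with a continuation.
    data Program
      = Done                        -- nothing left to do
      | Output String Program       -- emit a line, then continue
      | Input (String -> Program)   -- await a line; the response feeds into
                                    -- the computation of the next instruction

    -- Echo one line back, then stop.
    once :: Program
    once = Input (\line -> Output line Done)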

Functional languages, or rather the virtual machines that run them, have the same `fetch-execute' cycle (at a higher level) that we have in all machine architectures; only in this case, the "fetch" part of the cycle is highly non-trivial, and (if we get things wrong) may not even terminate. The computer might spend eternity running round a loop trying to figure out, or calculate, what the next thing to do is. Somehow, people in functional programming (or, for that matter, proof theory) have tended to be fascinated by the fetch part of the cycle, and to forget about the execution part -- what happens when the normal form has been decoded into an output command and continuation code.
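
In terms of the sketch above, the cycle of the virtual machine might be written as follows. The pattern match forces its argument to weak head normal form (the fetch, which is where a badly written program can spin for ever); the right-hand sides perform the actual interaction (the execute):

    run :: Program -> IO ()
    run Done         = return ()              -- halt
    run (Output s k) = putStrLn s >> run k    -- execute, then fetch again
    run (Input f)    = getLine >>= run . f    -- the response is an operand
                                              -- of the next computation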

In Andy Gordon's thesis there is the idea of giving an operational semantics to IO programs, using labelled transition systems. But what we really want is an axiomatic semantics, and a refinement methodology appropriate to reasoning about predicates, relations, and transformers of them; we want to know how to reason about such programs.

An excellent set of notes à propos of operational semantics is Simon Peyton-Jones's `awkward squad' paper.

The distinction between computing and running is manifest in all real programs, except for one-step terminating batch computations where any interaction takes place either entirely before the program starts or entirely after it stops. Running for ever is a wonderful attribute of programs, but computing for ever is (to put it mildly) not, and is usually known as a `hang'. (Computer scientists prefer the jargon of `divergence'. I abhor the word. I know from real life what it is for a program to hang, but if `divergence' means anything to me, it's the opposite of convergence, and that's something in topology, not programming.)

In these senses of `compute' and `run', we are positively interested in terminating computations, because perpetual termination ensures that programs don't stop running. So it is valuable to have type systems (like Martin-Löf's) which guarantee that computation of properly typed programs to constructor form always terminates, assuming an ideal computer.
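
Returning to the earlier sketch: a program can run for ever without ever hanging, provided each individual computation to constructor form terminates. The echo server below always has its next instruction immediately available, whereas hang never produces a constructor at all; it is definitions like the latter that a type system of the kind just described would reject:

    -- Runs for ever, but never hangs: each fetch terminates at once.
    echo :: Program
    echo = Input (\line -> Output line echo)

    -- Computes for ever: the fetch never reaches a constructor.
    hang :: Program
    hang = hang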

The price is that such a system is necessarily (Turing) incomplete: there will be constructive functions that cannot be written as programs in the system, or true Pi-0-2 statements that cannot be proved (except by extending the system). Of course a formal system with a decidable `proof-predicate' (type checking algorithm) is necessarily a countable, or more precisely recursively-enumerable kind of a thing. So there is no hope of fully representing in a formal system any of the classic uncountable domains, like numerical functions (corresponding to batch programs), or countably branching well-founded trees (a vestigial form of interactive program).

So the types in your program are there not just to stop your program going wrong, but also to make it go right. At least, that is how we could in principle use them.

Types in grammar

If types serve such a worthwhile purpose in connection with programs, then what have they got to do with logic? After all, logicians were inventing type theories long before computers and programming.

The connection is, I think, that as soon as we have `function and argument' in a language, as in predicate logic with its predicate constants and schemata (or for that matter the propositional calculus with its logical constants), then we have the notion of applying one thing to another which mightn't `fit', `go', or `compute'. The very grammar of predicate logic (especially its derivations) involves distinctions of grammatical category, between formulas and terms, schemata and the substitutions by which their instances are obtained, derivations from hypotheses with free parameters, and so on. There is a kind of grammatical calculus implicit in a logical language of any complexity.
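
Something of that grammatical calculus can be sketched with ordinary datatypes: if terms and formulas are kept in distinct categories, an expression that confuses them simply does not fit. The names below are invented for illustration:

    data Term    = Var String | Fun String [Term]

    data Formula = Pred String [Term]
                 | Implies Formula Formula
                 | ForAll String Formula

    -- Well formed: a predicate applied to a term, then quantified.
    example :: Formula
    example = ForAll "x" (Pred "P" [Var "x"])

    -- Rejected before we ever compute with it: a formula where a term
    -- is required.
    --   illFormed = Pred "P" [example]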

Think of what we need to know to recognise an instance of the natural-deduction elimination rule for the existential quantifier:

                              [ P(x) ]
                                 :
                                 : x
                                 :
         (Exists y)P(y)          C
         ------------------------------
                        C

To begin with, the whole thing is schematic in the predicate P and the statement C; the ground variable x may occur in no assumptions on which the subderivation of C depends, other than the discharged occurrences of P(x). And then x is only a metavariable: the right-hand sub-derivation is a schematic object; and so on.
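
One way to nail those side conditions down is with types. The sketch below is in Haskell, using existential and rank-2 polymorphism (so it quantifies over types rather than over a fixed domain of individuals, and is not a logical framework); the point is that the consumer must work for an arbitrary x, and its result c cannot mention x, which is exactly the proviso on the rule:

    {-# LANGUAGE RankNTypes, ExistentialQuantification #-}

    -- A witness together with evidence that the predicate holds of it.
    data Exists p = forall x. Witness x (p x)

    -- The elimination rule: the body must be uniform in x, and the result
    -- type c cannot depend on the particular witness.
    existsElim :: Exists p -> (forall x. x -> p x -> c) -> c
    existsElim (Witness x px) body = body x px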

I believe the first person to recognise and draw out explicitly an important thread in the logical notion of type was Frege. Frege's hierarchy of unsaturated expressions (on which his analysis of first order quantifiers as second order predicates was based) was about the earliest manifestation of a notion of type, at any rate that I can think of. It is close to that in Church-Ramsey "simple" type theory, except that Frege did not make a distinction in type between individuals and propositions, having instead a function (the content stroke) from individuals in general to propositions; propositions are retracted from individuals.

Then Russell. Wittgenstein. Ramsey. Church. Then the cumulative hierarchy of iterations of the powerset. Then Gödel, Curry, Howard, Tait, Scott, Prawitz, Martin-Löf, de Bruijn. But feeding into/off all this, Algol, Wirth, Hoare, Dahl, Milner, Damas, Cardelli.

What would Frege have made of types in programming languages? These are in my opinion of a different category or "realm" (a realm is the domain throughout which a certain body of rules prevails); they are distinctions within ground (saturated) expressions. They reflect the underlying distinctions of category.

Because there is calculation (which can go wrong, or fail to go right), there are types. The words `compute', `fit', `go' belong together. It doesn't go. It doesn't fit. It doesn't compute. It doesn't work. It is not fit. It is not a fit argument for definitions by recursion that we want to preserve productivity, or continuity, or termination, or well-foundedness.

We can fit a (first order) predicate into the argument place of the (first order) universal quantifier, but we cannot fit any (first order) quantifier like `some' or `all'. (Annoyingly, this raises the question of different levels of quantifier; quantifiers appear to have a kind of polymorphism.) It does not compute/fit/go, because we don't know how to compute or communicate an instance of the `predicate'.

Detritus

If we declare a variable to be FIXED64, or what-not, we tell the compiler how many bits to reserve for the representation of that value, what machine instructions to use to multiply by it, and give it a small but perhaps worthwhile opportunity to complain when the programmer has written gibberish. If we say that a variable holds a `stack', or something, we tell the compiler how to record its state, about init, push, top, pop and so on, and give it an opportunity to complain about any other kind of access.
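
The stack example can be sketched as an abstract type in Haskell: exporting only the type and its operations tells the compiler (and the reader) which accesses are legitimate, and gives it the chance to complain about anything else. The names and representation here are invented for illustration:

    module Stack (Stack, empty, push, top, pop) where

    newtype Stack a = Stack [a]    -- representation hidden from clients

    empty :: Stack a
    empty = Stack []

    push :: a -> Stack a -> Stack a
    push x (Stack xs) = Stack (x : xs)

    top :: Stack a -> Maybe a
    top (Stack [])    = Nothing
    top (Stack (x:_)) = Just x

    pop :: Stack a -> Maybe (Stack a)
    pop (Stack [])     = Nothing
    pop (Stack (_:xs)) = Just (Stack xs)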

Hoare (Notes on Data Structuring) has a remarkable early paper about data types in programming. Each data type (enumeration, subrange, records, discriminated union, function/array, powerset, sequence) comes with an implementation-oriented account of its representation/manipulation in a conventional computer.

I think that types are essentially connected with programs. They have to do primarily with computation. The notion of type with which programmers are concerned is in some sense the most fundamental notion of type. There is a difference, though, between program computation and grammatical elaboration.

Many languages have a word for formal, calculational arithmetic: something like "reckoning". If we understand reckoning, or calculation, then we still don't truly understand arithmetic. What is missing from "true" arithmetic is counting, particularly iterated interactions with the real world, with its apples, sheep, tape-measures and shop-keepers.


Peter Hancock
Last modified: Sun Dec 29 16:27:22 GMT 2002