Transaction processing

By transaction processing I mean designing and implementing computer systems which appear to a user to do one thing at a time, despite running on machines that can "crash" (in various ways) at any point.

The interest of the subject can be explained as follows. In the old days we had batch processing systems, where one program or transaction was run at a time (by someone feeding in data and programs, perhaps on cards), even if most of the peripherals were idle. These systems had the merit that at least you could understand what was going on, and if something went wrong, you knew where you were. But nowadays we have much more complicated systems in which different programs run at the same time, or their execution is interleaved in some way that makes interaction possible. Because of its conceptual simplicity, we would like to approach and if possible maintain the illusion that one thing is happening at a time. The enormous subtlety of this notion becomes evident when one starts to look at it.

What I find interesting about transaction processing is that it is a philosophical rat's nest. It involves extremely basic, almost "structural" concepts like atomicity, which are not easy to pin down.

The most interesting paper I've read in this area is by Butler Lampson. [B. Lampson and H. Sturgis, "Atomic transactions", in Distributed Systems - Architecture and Implementation, Lecture Notes in Computer Science 105, Springer, 1981, ed. B. Lampson, M. Paul and H. Siegert.] In it, he isolates various "isotopes" of the notion of atomicity [one is called unicity], and sketches out the essence of various implementation techniques for obtaining atomicity, such as reduction to the atomicity of appending a record to an unbounded log or sequential file.
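To give the flavour of that reduction, here is a minimal sketch of my own, in Python; it is not Lampson's construction, and the file name and record format are made up. A transaction first records its intended updates in the log, and becomes atomic at the single instant its commit record is appended; on restart, only transactions whose commit record reached the log are applied. The only atomicity assumed is that of appending one record.

    import json, os

    LOG = "txn.log"    # hypothetical log file name

    def append_record(rec):
        # The one atomicity assumption: after a crash, an appended record
        # (one line here) is either wholly present in the log or wholly absent.
        with open(LOG, "a") as f:
            f.write(json.dumps(rec) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def commit(txn_id, updates):
        # Record the intended updates, then a single COMMIT record.  The
        # transaction becomes atomic at the instant the COMMIT record is
        # appended; before that instant it is as if nothing had happened.
        append_record({"txn": txn_id, "updates": updates})
        append_record({"txn": txn_id, "commit": True})

    def recover(state):
        # On restart, apply only the transactions whose COMMIT record
        # reached the log.  Running recovery again is harmless.
        if not os.path.exists(LOG):
            return state
        pending, committed = {}, []
        with open(LOG) as f:
            for line in f:
                rec = json.loads(line)
                if "updates" in rec:
                    pending[rec["txn"]] = rec["updates"]
                elif rec.get("commit"):
                    committed.append(rec["txn"])
        for txn in committed:
            state.update(pending.get(txn, {}))
        return state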

I read this paper when thinking about how to implement a "queue manager" planned for an ambitious "transaction processing architecture", in a VMS engineering group within Compaq, or Digital as it was then known. (One interesting thing was that there really wasn't a strong requirement for priority, or for fifo behaviour: a loose statistical guarantee about the extent to which there could be reordering would do. It seemed possible that this could be exploited for the sake of good performance in latency or throughput. It seemed impossibly difficult even to imagine specifying precisely the behaviour being implemented, or the nature of the guarantees desired in terms of long-term statistics.)

The most perplexing of my experiences working on this project were in interaction with engineers working on an environment for transactions, a transaction monitor called (I think) ACMS. Interacting with these people was only marginally less perplexing than interacting with transaction processing "architects", some of whom had written books about the subject. I found it very difficult to tell what anyone was talking about in this area, and very frustrating to make any progress in discussions. My uncharitable guess is that many people in transaction processing completely abandoned making the slightest effort to communicate clearly a long time ago and suffer from a (lucrative) form of mental disease. I can't have been pleasant to work with.

Someone who seems to have thought hard about the very subtle issues that arise around the notion of atomicity is Leslie Lamport. I first read a short paper by him about a way to implement atomic access to a multi-digit clock. (It had not been long since I had implemented the lowest-level software of a system in which the passage of time turned out to be (exactly) proportional to the number of CPU cards -- I was sensitive to the subtlety of problems to do with interference.) The prevailing wisdom was that dealing with interference between concurrent processes necessarily requires preventing it, by some form of locking. Lamport's paper gives a generalisable method of implementing single-writer multi-reader atomicity, using not locking but a form of interference detection, based on certain reasonable assumptions about the behaviour of memory reads and writes.
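One well-known shape this idea takes is a pair of version counters: the writer bumps one counter before it starts and the other after it finishes, and a reader who sees the two counters agree knows that no write overlapped its read. The sketch below is my own, in Python, and only models the protocol; the names are made up, and the whole thing rests on the assumption that reads and writes of the individual fields are atomic and become visible in order.

    class VersionedCell:
        # Single-writer, multi-reader register.  The sketch assumes that
        # reads and writes of the individual fields are atomic and become
        # visible in program order -- roughly the "reasonable assumptions"
        # about memory mentioned above.

        def __init__(self, digits):
            self.v1 = 0                  # bumped by the writer before it starts
            self.digits = list(digits)
            self.v2 = 0                  # bumped by the writer after it finishes

        def write(self, digits):         # called by the single writer only
            self.v1 += 1                 # announce "write in progress"
            self.digits = list(digits)
            self.v2 = self.v1            # announce "write complete"

        def read(self):                  # any number of concurrent readers
            while True:
                before = self.v2         # read the trailing counter first
                snapshot = list(self.digits)
                after = self.v1          # read the leading counter last
                if before == after:      # no write overlapped the read
                    return snapshot
                # a write interfered: discard the snapshot and try again

The reader never blocks the writer and the writer never blocks the reader; interference is not prevented but detected, and the only cost is the occasional retry.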

It is strange how subtle such a simple thing as a memory cell becomes once you think about your interface to it, which can be pretty complicated in modern computer systems. In one way or another, I have found the notion of a "memory" or array of rewritable cells a useful idea in designing complicated pieces of crash-tolerant systems. The great thing about a memory is that by replaying a sequence of updates to it until they finish, you wind up with the right answer, despite multiple restarts. A memory cell is in some sense the simplest possible non-trivial state machine.
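As a toy illustration of that replay property (my own sketch; the recorded update sequence is made up): each update blindly overwrites a cell, so replaying the whole recorded sequence from the start, any number of times and after any number of interrupted attempts, leaves the memory in the same final state.

    class Memory:
        # An array of rewritable cells: the simplest non-trivial state machine.
        def __init__(self, size):
            self.cells = [0] * size

        def write(self, addr, value):    # one update: a blind overwrite
            self.cells[addr] = value

    def replay(memory, updates):
        # Replaying the whole recorded sequence is idempotent: however many
        # times we restart from the beginning (e.g. after a crash part-way
        # through an earlier replay), each cell ends up holding the last
        # value written to it, so the final state is always the same.
        for addr, value in updates:
            memory.write(addr, value)
        return memory

    # Hypothetical recorded update sequence; replaying it twice (simulating
    # a restart after a completed replay) gives the same result as once.
    log = [(0, 7), (3, 1), (0, 9)]
    m = replay(replay(Memory(4), log), log)
    assert m.cells == [9, 0, 0, 1]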

It seems to me that the basic problem of transaction processing is how to approximate a simple picture in which the entire universe marches along in a straight line with one thing happening at a time. The virtue of this picture is not that it is true, or philosophically correct, but that it is simple. It has to be so simple that any customer logging on to a web server to make a purchase can assume that transactions take place in a linear sequence. The "reality" is in fact hideously unlike this. While the transactions are underway, reality is more "arborescent" than linear. Each transaction evolves in a world of its own, which can interfere or even conflict with the world in which another transaction evolves. It can be extremely tricky to control the execution of transactions so as to maintain the picture that the transactions commit one at a time, or abort (execute as an idle or "skip" transaction). But at least the picture is simple to understand.

As a subject, transaction processing is full of technicalities: some of the most unreadable papers in the world are those about clever ways of organising the content of disk blocks or network messages so as to get a fast implementation of a database that can participate in transactions of some kind. Yet, completely dwarfing the technicalities are conceptual questions about exactly what one is trying to achieve. My impression is that people who are talented at thinking of implementation tricks aren't often also talented at teasing out and precisely nailing down elementary or structural concepts like atomicity.

One approach which is helpful is to model the problem to be solved as one of topologically sorting (serialising, or finding a linear arrangement extending) a given "dependency" relation between atomic actions. If this action reads a shared variable which that one updates, then we have to decide their order: the dependency relation is generated by such conflicting accesses to shared variables. It is a question of preventing non-serialisable executions, or of detecting them and ensuring they do no harm. This approach isn't fully satisfactory as a characterisation of the problem to be solved, because the notion of dependency between actions arising through their accesses to shared variables isn't really entirely clear.
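A toy version of that test, in the form of my own sketch (the representation of an execution as a list of (transaction, operation, variable) triples is made up): build the dependency relation from conflicting accesses, then ask whether it can be topologically sorted, i.e. whether it is free of cycles.

    from collections import defaultdict

    def dependency_edges(actions):
        # actions: (txn, op, variable) triples in execution order, op in "rw".
        # Two actions conflict when they touch the same variable, come from
        # different transactions, and at least one is a write; the earlier
        # one must then precede the later one in any equivalent serial order.
        edges = set()
        for i, (t1, op1, var1) in enumerate(actions):
            for t2, op2, var2 in actions[i + 1:]:
                if t1 != t2 and var1 == var2 and "w" in (op1, op2):
                    edges.add((t1, t2))
        return edges

    def serialisable(actions):
        # The interleaving is (conflict-)serialisable iff the dependency
        # relation can be topologically sorted, i.e. contains no cycle.
        succ = defaultdict(set)
        for a, b in dependency_edges(actions):
            succ[a].add(b)

        visiting, done = set(), set()
        def cyclic(node):
            if node in done:
                return False
            if node in visiting:
                return True
            visiting.add(node)
            if any(cyclic(n) for n in succ[node]):
                return True
            visiting.discard(node)
            done.add(node)
            return False

        return not any(cyclic(t) for t in {t for t, _, _ in actions})

    # A classic non-serialisable interleaving: T1 and T2 each read x before
    # the other writes it, so each would have to precede the other.
    history = [("T1", "r", "x"), ("T2", "r", "x"),
               ("T1", "w", "x"), ("T2", "w", "x")]
    assert not serialisable(history)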


Peter Hancock
Last modified: Fri Dec 27 16:14:53 GMT 2002