Chapter 15 - File For Immediate Use

Direct and Byte Files

Variations on the File concept

This chapter introduces a new type of Image File. It allows a file to be treated like an array of records and is called a Direct File.

We also meet a type of File which is not an ImageFile and so does not hold its contents as a series of records. It is called a Byte File and has three subclasses, InByteFile, OutByteFile and DirectByteFile.

Direct or random access to a file

Image File class DirectFile implements a concept known as direct or random access to the contents of a file. It only works for files which are genuinely areas of the computer's memory and which are permanent during the running of the program. In particular, random access cannot be used on terminals, printers or other input and output devices. In this way DirectFiles are very different from InFiles and OutFiles.

The ImageFile generally views files as ordered lists of records. The DirectFile views them as numbered lists of records. The difference is rather like the difference between linked lists and arrays. To reach item 100 on a linked list you must count past the first 99, while to reach item 100 in an array you can go straight to it using a subscripted variable. With an InFile or OutFile, in order to read or write record 100, you must first read or write the previous 99, while with DirectFile you can go straight to record 100, using a procedure called Locate, which is an attribute of DirectFile.

The other difference with a DirectFile is that a program can both read from and write to the same file, without closing and reopening as a different type of file. This makes DirectFile a very useful concept for what is generally known as database work. This involves retrieving, storing and updating information.

What does a DirectFile look like?

Every SIMULA system is free to store the information in the physical locations referred to through DirectFile objects in any way that it finds suitable. What is important is how it presents this to the SIMULA program. This presentation must conform to the rules of SIMULA. The real, physical storage of data should be given in the User's Guide or Programmer's Reference Manual for the SIMULA system you are using.

When a DirectFile is first created it is empty, but it has certain properties. One of these is the record length, which must be the same for all its records. Another may be the maximum number of records that the file can hold, but not all systems will fix this in advance.

Once some output has been transferred to a DirectFile, through OutImage, the file will contain one or more non-empty records. Each of these will have a sequence number, called its Location. As it is not necessary to write to records in the order of their sequence numbers, these non-empty records may be mixed up with empty or unwritten records.

It is central to the understanding of DirectFile to realise that there can be "holes" in the sequence of Locations of written records and that these represent unwritten records. This is particularly crucial when an attempt is made to read from a particular location. The effect will depend upon whether or not a record has been written there.

Consider example 15.1. The only unfamiliar concept is that of Locate. This simply moves the current position of the program within the sequence of records in a DirectFile to the record whose location is given as a parameter to Locate.

Example 15.1: Writing to a DirectFile.

   begin
      ref(DirectFile) Direct;
      Direct :- new DirectFile("Data");
      inspect Direct do
      begin
         Open(Blanks(80));
         OutText("First");
         OutImage;
         Locate(4);
         OutText("Fourth");
         OutImage;
         Locate(10);
         OutText("Tenth");
         OutImage;
         OutText("Eleventh");
         OutImage;
         OutText("Sixth");
         Locate(6);
         OutImage;
         Close
      end--of--inspect
   end**of**program

Diagram 15.1 shows the contents of the file at the end of the program. Note the holes in the record sequence.

Diagram 15.1: Contents of file after example 15.1.

     Location      Content

       1            "First"
       2            unwritten
       3            unwritten
       4            "Fourth"
       5            unwritten
       6            "Sixth"
       7            unwritten
       8            unwritten
       9            unwritten
      10            "Tenth"
      11            "Eleventh"

Having described informally what a DirectFile represents, we can now move on to consider the attributes of ImageFile class DirectFile. These are as specified in the 1984 SIMULA Standard and may not all be implemented on some older systems. As usual, you should check the documentation for the system you are using.

The attributes of DirectFile

Since DirectFile is a subclass of ImageFile, it contains all the attributes of ImageFile. This means that Image, Pos, SetPos, Length and More are defined in the same way as for the other subclasses of ImageFile. These are all attributes concerned with the current Image text.

Other attributes of ImageFile are redefined slightly for DirectFile. The differences depend on the current value of Location and what is found there. We shall consider the attributes dealing with the image locations first, to make it easier to understand these redefinitions.

Locate, Location, MaxLoc and LastLoc

The highest permitted image number in the file is fixed on some systems and can be any number on others. The integer procedure MaxLoc returns the limit for any given file. Where no limit is imposed, the procedure will return the largest value allowed to an integer minus one.

Since not all the images in a DirectFile object's permitted range may be filled, the integer procedure LastLoc is provided. This gives the index number of the highest numbered image location currently in use.

Location is an integer procedure which gives the index number of the image location which is currently being accessed. It is legal for the current location to be unused.

Procedure Locate takes one integer parameter. It resets the currently accessed image location to the one whose index matches the parameter. An attempt to exceed the value of MaxLoc causes a runtime error.

These procedures allow programs to access image locations in any order and to check that the locations being accessed match the current contents of the file.
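
As a minimal sketch of how these four procedures fit together, the following program checks a location number against MaxLoc and LastLoc before using it. It assumes that the file "Data" written by example 15.1 still exists and that your system keeps its contents when the file is opened again. The variable Wanted and the messages are purely illustrative.

```simula
begin
   ref(DirectFile) Direct;
   integer Wanted;
   Wanted := 4;                        ! Illustrative location number;
   Direct :- new DirectFile("Data");
   Direct.Open(Blanks(80));
   if Wanted >= 1 and Wanted <= Direct.MaxLoc then
   begin
      Direct.Locate(Wanted);           ! Safe, since Wanted is within MaxLoc;
      OutText("Current location:");
      OutInt(Direct.Location, 6);
      OutImage;
      if Wanted > Direct.LastLoc then
      begin
         OutText("Nothing has been written this far into the file");
         OutImage
      end
   end;
   Direct.Close
end
```

Checking against MaxLoc before calling Locate avoids the runtime error described above. Checking against LastLoc only tells the program that nothing has been written at or beyond the chosen location; a location below LastLoc may still be unwritten.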

Now let us look at attributes with, mostly, familiar names. First, the image handling procedures.

InImage, OutImage and DeleteImage

These three procedures all deal with the contents of the currently accessed image location.

InImage reads the contents of the current location into text attribute Image, in a similar way to the InImage of InFile. It differs because the state of the current location gives rise to more possibilities. Essentially there are three cases:

  1. The current location has an image which has been written to. This image is transferred to Image and the current location is updated to point to the next location.

  2. The current location has had no image written to it, but is not beyond the highest currently used location, as given by LastLoc. Image is filled with nul characters, Pos is set to Length + 1, making More give False. The current location is again increased by one.

  3. The current location is beyond LastLoc, but less than or equal to MaxLoc. This causes EndFile to return True and Image is given the single character representing end of file, ISOChar(25), as its contents.

OutImage transfers the contents of text attribute Image to the current location. It makes a previously unwritten location into a written one. If the current location is initially greater than LastLoc, LastLoc will be updated to the current location. The current location is then increased by one.

DeleteImage removes the current location's image. This leaves the current location unwritten, i.e. as if it had never been filled. If the location deleted is the same as LastLoc, LastLoc will be reduced to the index of the next highest position written to.
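
The first two InImage cases can be seen by reading back the file from example 15.1. The sketch below is only an illustration, assuming that "Data" still holds the contents shown in diagram 15.1 and that unwritten locations behave as described above, leaving More False after InImage. The variable Loc is introduced just for this example.

```simula
begin
   ref(DirectFile) Direct;
   integer Loc;
   Direct :- new DirectFile("Data");
   Direct.Open(Blanks(80));
   for Loc := 1 step 1 until Direct.LastLoc do
   begin
      Direct.Locate(Loc);
      Direct.InImage;
      OutInt(Loc, 4);
      OutText("  ");
      if Direct.More then
         OutText(Direct.Image.Strip)
      else
         OutText("unwritten");
      OutImage
   end;
   Direct.Close
end
```

For the file in diagram 15.1 this should report locations 1, 4, 6, 10 and 11 as written and the others as unwritten. Because the loop stops at LastLoc, the third case, reading beyond the last written image, never arises.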

Note on sequential access to DirectFile

If Locate is never used to read images out of order, DirectFiles can be read or written to in the same way as InFiles and OutFiles. The incrementing of the current location by one after InImage or OutImage is identical to what happens in these types of file.

EndFile

EndFile is a Boolean procedure, like that of InFile, with the same purpose. It checks whether the current location is greater than LastLoc, i.e. the highest location currently holding a written image. If it is beyond LastLoc it returns True, otherwise False. It also returns True if the file is currently closed.

CheckPoint

As with OutFile and PrintFile, the system may not actually update the memory of the computer each time an OutImage is performed. Instead it may hold several outstanding requests until a suitable limit for the system and then write them together. In order to be secure, it is sometimes desirable to force the updating of the physical file before proceeding. CheckPoint is particularly important for DirectFile objects, where access is both for reading and writing and files are often shared. It works in the same way as in OutFile.
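
A minimal sketch of forcing an update after a single write is given below. It assumes that CheckPoint is a Boolean procedure returning False when the update could not be completed, and that your system implements it at all. The location and text written are illustrative.

```simula
begin
   ref(DirectFile) Direct;
   Direct :- new DirectFile("Data");
   Direct.Open(Blanks(80));
   Direct.Locate(7);
   Direct.OutText("Seventh");
   Direct.OutImage;
   if not Direct.CheckPoint then
   begin
      OutText("The update may not have reached the file yet");
      OutImage
   end;
   Direct.Close
end
```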

SetAccess

For database applications shared and readwrite modes are especially important. The use of SetAccess is as described in chapter 7. If readwrite mode is set to readonly or writeonly, the DirectFile in question may only be accessed in that way from the program. This can prevent corruption of data.

Locking a DirectFile

As a major use of DirectFiles is in database applications, where a file may well be shared amongst several users, it is often important to allow one user to gain exclusive access to the file for a limited period, to prevent it being written to while someone is trying to read from it. This is known as locking the file.

In fact it may be better to lock only that part of the file which the particular program wants to access, leaving the rest free for others to access. This may or may not be possible, depending on the system.

Three procedures are provided for this.

Lock is an integer procedure with three parameters.
The first is a real, which specifies how long the program is prepared to wait for the file to be locked. If this time is exceeded, the system returns a result of -1. If the lock fails for any other reason, a value less than minus one is returned, with a system defined meaning. If the time passed is zero or negative, Lock returns without doing anything.

The other two parameters are both integers. They indicate the range of locations within the file which this program wishes to lock. Some systems will lock the whole file regardless. If both integers are zero, the whole file is to be locked. If the system does not support locking of files, it returns a negative result to indicate this.

If Lock succeeds within its time limit, zero is returned.

A second call of Lock, with the file already locked, will cause it to be first unlocked and then locked again. In the interval between the two, another program may succeed in locking the file first.

Unlock is a Boolean procedure with no parameters.
It cancels any current locking of the file by this program, having first called CheckPoint to preserve any unwritten changes made by it. The result of the CheckPoint call is returned as the result of Unlock.

Locked is a Boolean procedure, returning True if the file is currently locked by this program.
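
The three procedures might be combined as in the following sketch. The file name "Accounts", the range of locations and the time limit are all hypothetical, and the units of the time limit should be checked in your system's documentation.

```simula
begin
   ref(DirectFile) Accounts;
   integer Result;
   Accounts :- new DirectFile("Accounts");
   Accounts.Open(Blanks(80));
   Result := Accounts.Lock(5.0, 10, 20);   ! Ask for locations 10 to 20;
   if Result = 0 then
   begin
      Accounts.Locate(10);
      Accounts.OutText("Updated");
      Accounts.OutImage;
      if not Accounts.Unlock then
      begin
         OutText("Changes may not have reached the file");
         OutImage
      end
   end
   else if Result = -1 then
   begin
      OutText("Timed out waiting for the lock");
      OutImage
   end
   else
   begin
      OutText("Lock failed, system code");
      OutInt(Result, 6);
      OutImage
   end;
   Accounts.Close
end
```

Since Unlock calls CheckPoint before releasing the lock, testing its result is a simple way to find out whether the changes were safely written.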

Open and Close

Procedures Open and Close work in the same way as for InFile and OutFile, except that the length of the text passed to Open will be taken as the fixed length of all records in the file. This length may not be changed.

Item oriented procedures

All item oriented input and output procedures from InFile and OutFile are also found in DirectFile. There is one difference. Any unwritten locations are skipped by the input procedures and the first location containing a non-empty image becomes the current location.

An example of the use of DirectFile

An obvious use of DirectFile is to hold information with some kind of numerical key. Example 15.2 shows a simple program which adds a new employee record to a personnel database. The records are listed according to the works number of the employees.

Example 15.2: Use of DirectFile for numbered records.

   begin
      ref(DirectFile) Records;
      ref(InFile) InPut;
      InPut :- new InFile("Additions");
      InPut.Open(Blanks(86));   ! First 6 hold employee number;
      Records :- new DirectFile("StaffRecs");
      Records.Open(Blanks(80));
      while not InPut.EndFile do
      begin
         Records.Locate(InPut.InInt);
         Records.OutText(InPut.InText(80));
         Records.OutImage
      end--of--reading--in--records;
      Records.Locate(1);   ! Return to the first location before reading back;
      while not Records.EndFile do OutText(Records.InText(80))
   end**of**program

Not all records contain a convenient number of this sort. It may be necessary to scan a file checking for the required record. Even so, there is often an advantage in being able to search and write to a file without copying it into a new file. Think back to our earlier label programs and see how much simpler they would be with a DirectFile.

A particular example of efficient searching on a non-numerical key is known as hashing. DirectFiles are very useful for simple hashing. Most text books on searching and sorting will explain this in full.
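
To give a flavour of the idea, here is a rough sketch of storing records under a text key. It is not taken from any particular text book. The hash function, which simply adds up the character codes, the table size of 101, the file name "HashTable" and the procedures Hash, IsFree and Store are all assumptions made for illustration. When the chosen location is already occupied, the next one is tried, wrapping round at the end of the table.

```simula
begin
   ref(DirectFile) Table;
   integer TableSize;

   integer procedure Hash(Key); text Key;
   begin
      integer Sum;
      Key.SetPos(1);
      while Key.More do Sum := Sum + Rank(Key.GetChar);
      Hash := Mod(Sum, TableSize) + 1;   ! A location from 1 to TableSize;
   end--of--Hash;

   Boolean procedure IsFree(Loc); integer Loc;
   begin
      if Loc > Table.LastLoc then IsFree := True
      else
      begin
         Table.Locate(Loc);
         Table.InImage;
         IsFree := not Table.More;   ! More is False for an unwritten location;
      end
   end--of--IsFree;

   procedure Store(Key, Data); text Key, Data;
   begin
      integer Loc, Tries;
      Loc := Hash(Key);
      while not IsFree(Loc) and Tries < TableSize do
      begin
         Loc := Mod(Loc, TableSize) + 1;   ! Try the next location;
         Tries := Tries + 1
      end;
      if Tries < TableSize then
      begin
         Table.Locate(Loc);
         Table.Image := notext;   ! Clear whatever InImage left behind;
         Table.SetPos(1);
         Table.OutText(Key);
         Table.OutText(Data);
         Table.OutImage
      end
   end--of--Store;

   TableSize := 101;
   Table :- new DirectFile("HashTable");
   Table.Open(Blanks(80));
   Store("Smith     ", "Accounts");
   Store("Jones     ", "Packing");
   Table.Close
end
```

The sketch assumes that MaxLoc for the file is at least TableSize and that a key and its data together fit in one 80 character record. A matching search procedure would hash the key in the same way and compare the stored keys until it found the right one or met an unwritten location.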

Exercise

15.1 Rewrite exercise 9.5 using a DirectFile.

Files without records

So far we have only considered files as lists of records or images. All the reading and writing has been in terms of the current record and what happens when we reach its end. Even the item oriented input and output procedures have worked on items within records.

This approach is often quite natural. It has its origins in the use of punched cards, usually holding up to 80 characters, for input and line printers, printing up to 132 characters per line, for output. The name Image is a contraction of the old term "card image", referring to how the contents of a punched card are stored in a particular computer's memory.

This view of the world has never covered all the possible devices for input and output for computers. It certainly does not represent the memory in which most information is stored on modern computers. Neither does it represent "screen oriented" input and output, nor graph plotter output.

In fact, most computers use a large number of different structures for representing data. Some are held in memory, others are connections to external sources and destinations. Only some of them can be adequately thought of in terms of records.

Even when it is possible to pretend that an unstructured file is made up of records, this may slow down access as make believe records are constructed or disassembled. In recognition of the need to provide a solution, SIMULA has a type of file called a ByteFile. This attempts to provide the most general way of reading or writing information, with no assumptions about what that information looks like. This approach is sometimes called "stream oriented" input and output.

Bytes and files

It is beyond the scope of this book to explain the details of how a computer works. The following explanation is as complete as it needs to be, but does not cover everything.

As far as most programmers are concerned, the way in which a computer stores information is irrelevant. They are normally interested in manipulating characters and numbers. Numbers are held in most computers as sequences of binary digits (bits) with a fixed maximum length. In real numbers some of these digits represent the position of the decimal point, the others the decimal digits. In integers they all represent the digits. In general the number of bits in an integer and the number of bits in a real are the same and this number of bits is called a "word".

Long reals and short integers may be stored in longer or shorter locations as appropriate.

A computer's memory is an enormous number of bits, divided into fixed size locations which are words. The number of bits in a word is usually several times larger than that needed to represent one ISO character, which requires a minimum of eight bits for the full set. Most computers divide the words in their memory into smaller locations called bytes. Each byte can hold one character and so is, normally, at least eight, but possibly more, bits long.

Diagram 15.2 shows some typical memory locations on what is called a "32 bit" computer architecture. Computers are often categorised by the number of bits in one word of their memory.

Diagram 15.2: 32 bit memory locations.

   integer       Ibyte1Ibyte2Ibyte3Ibyte4I -> 1 word -> 32 bits

   real          Ibyte1Ibyte2Ibyte3Ibyte4I -> 1 word -> 32 bits

   character     Ibyte1I                   -> 1 byte ->  8 bits

   long real     Ibyte1Ibyte2Ibyte3Ibyte4Ibyte5Ibyte6Ibyte7Ibyte8I
                                           
                                           -> 2 words-> 64 bits

   short integer Ibyte1Ibyte2I             -> 2 bytes-> 16 bits

N.b. this varies from system to system, even among 32 bit machines. Consult your documentation carefully if you use ByteFiles.

Clearly most programs handle information by the word (integers and reals) or by the byte (characters and texts). Furthermore, it is usually possible to treat a word as a sequence of bytes. Thus a file type which allows byte by byte access to memory can be used to read words.

Records are also sequences of bytes. They contain a fixed number of bytes (fixed length images), are prefixed by a byte or word indicating their length (variable length images), end with a special character (also variable length images) or are marked in any of a large number of possible ways. Thus reading a byte at a time allows records to be accessed as well. In fact many new ways of structuring files can be built on top of the ByteFile.
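
As a sketch of the second of these schemes, the program below reads records which are each prefixed by a single byte giving their length. The file name "Records" is hypothetical, and the example assumes that every record is short enough to fit in one line of SysOut.

```simula
begin
   ref(InByteFile) Source;
   text Rec;
   integer Len;
   Source :- new InByteFile("Records");
   Source.Open;
   Len := Source.InByte;            ! Length byte of the first record;
   while not Source.EndFile do
   begin
      Rec :- Source.InText(Blanks(Len));
      OutText(Rec);
      OutImage;
      Len := Source.InByte;         ! Length byte of the next record, if any;
   end;
   Source.Close
end
```

The length byte is read before EndFile is tested, since an InByteFile only reports the end of its input once an attempt to read past it has been made.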

A simple example

Most practical uses of ByteFile are likely to be extremely technical. It is a concept which is likely to be useful to all of us, but in very different situations. Example 15.3 shows an important use in a realistic situation, and demonstrates the main attributes of InByteFile and OutByteFile.

Example 15.3: Use of ByteFiles.

   begin

      ref(InByteFile) LocalChars;
      ref(OutByteFile) ISOChars;

      LocalChars :- new InByteFile("SOURCE");
      ISOChars :- new OutByteFile("OUTPUT");
      inspect LocalChars do
      begin
         Open;
         inspect ISOChars do
         begin
            Open;
            SetAccess("bytesize:8");   ! Standard for ISO/ASCII files;
            while not EndFile do OutByte(Rank(ISOChar(InByte)));
            Close
         end--of--inspecting--ISOChars;
         Close
      end..of..inspecting..LocalChars
   end**of**program

The program will convert a file from ISO into local characters, by reading it as a stream of bytes. Each of these will occupy the standard eight bits for an ISO character and the value read will be the ISORank for it. By passing this to ISOChar a local character corresponding to this ISO internal code is generated. By passing this to Rank, the local internal code is generated. This is written out to a file as a local byte, with the appropriate number of bits.

File class ByteFile

ByteFile is a subclass of File, but not of ImageFile. As it deals in bytes, not records it does not need an Image or any of the associated attributes.

It has a short integer procedure ByteSize, which will return the number of bits in a byte on that SIMULA system. The value of ByteSize is fixed.

It also has an Open procedure. Note that this requires no parameters, since there is no Image.

SetAccess also works for ByteFile. The mode bytesize is especially provided for use when files with non-standard byte sizes for a particular system, such as those brought from another computer, are to be processed.

ByteFile class InByteFile

InByteFile is used to represent input to the program as a sequence of bytes. It is used in a similar way to InFile. Its only attributes are:
EndFile
Is defined as usual as a Boolean procedure to say when the input is exhausted. Initially set to True.

Open
Is a Boolean procedure. Sets EndFile to False. Has no parameters. Acts in a similar way to Open for other File subclasses.

Close
Is a Boolean procedure. Sets EndFile to True. Acts in a similar way to Close in other File subclasses.

InByte
Is a short integer procedure. Reads in the next byte and returns its value as a short integer. If there are no more bytes to read, it sets EndFile to True.

InText
Is a text procedure with a text reference parameter. It fills the text parameter with the next N bytes, where N is the length of this text. It returns a reference to this, with its Pos reset to one.

ByteFile class OutByteFile

OutByteFile is the equivalent of OutFile. It is used to output to a file representing a sequence of bytes. It has the following attributes:
Open
Is a Boolean procedure, like that for InByteFile.

Close
Is a Boolean procedure, like that for InByteFile.

OutByte
Is a procedure with one short integer parameter. If this is greater than the largest value which can be represented as a binary number by one byte on this system, it is a runtime error. If it is a legal value it is written as the next byte to the output defined by this file.

OutText
Is a procedure with one text reference parameter. The characters of this text are written as the next N bytes to the file, where N is the length of the text.

ByteFile class DirectByteFile

In a DirectByteFile, the file represents a numbered sequence of bytes. The highest permitted number is fixed or set to the highest possible integer value on that SIMULA system minus one. In effect the DirectByteFile is the byte equivalent of a DirectFile. Unwritten bytes have a value of zero. It is possible to both read and write. The attributes are:
Open
Is a Boolean procedure with no parameters.

Close
Is a Boolean procedure with no parameters.

EndFile
Is a Boolean procedure returning True when the file is unopened or closed or the location being read from is greater than the highest currently written to.

MaxLoc
Is an integer procedure returning the highest byte number allowed for this file.

LastLoc
Is an integer procedure returning the highest byte number so far written to.

Location
Is an integer procedure returning the number of the current byte.

Locate
Is a procedure with one integer parameter, which moves the current location to the byte whose number is passed as a parameter. This value must be less than or equal to MaxLoc.

InByte
Is a short integer procedure which reads the current byte and returns its value. It then moves the current location to the next byte. If the current location is beyond LastLoc, or holds a previously unwritten byte, zero is returned. If it is beyond MaxLoc, a runtime error is reported.

OutByte, InText and OutText are all like their equivalents in InByteFile and OutByteFile.
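
The following sketch writes two bytes out of order and then reads the file back from the start. The file name "ByteData" and the byte values are illustrative, and the file is assumed to be empty when it is opened.

```simula
begin
   ref(DirectByteFile) Bytes;
   Bytes :- new DirectByteFile("ByteData");
   Bytes.Open;
   Bytes.Locate(5);
   Bytes.OutByte(65);
   Bytes.Locate(2);
   Bytes.OutByte(66);
   Bytes.Locate(1);                 ! Go back to the first byte;
   while not Bytes.EndFile do
   begin
      OutInt(Bytes.Location, 4);
      OutInt(Bytes.InByte, 6);
      OutImage
   end;
   Bytes.Close
end
```

Under the rules given above, the unwritten bytes at locations 1, 3 and 4 should be read as zero, while locations 2 and 5 give back the values written.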

Summary

This rather brief chapter has outlined a new type of ImageFile and three types of File which do not use an Image.

The concepts of direct access and byte oriented access have been outlined.

We have seen the attributes and uses of ImageFile class DirectFile.

We have seen briefly the attributes of ByteFile and its subclasses InByteFile, OutByteFile and DirectByteFile.