A Small Algol 68 Project, Part 2

In memory of J. Kevin Douglas, a good friend and fellow fan of Algol 68

In the last article in this series, we described a simple linear least squares problem – fitting a line to a sequence of (x,y) coordinates – and we wrote a small Algol 68 program to create a test data set for use in the development of a program to carry out this fitting operation.

In this article, we will develop a small collection of useful routines for reading text files and extracting data from them, setting us up for a more streamlined solution to the least squares problem.

Files in Algol 68

I would venture to guess that most of us think of files, especially text files, as collections of characters with intervening line breaks. But back in the days when Algol 68 was being designed and ~~dinosaurs~~ mainframes ruled the world, there was a tendency to think of files in terms of things like record length and access methods. In the IBM mainframe world, access methods referred to whether files were suitable for sequential or random record access, whether they could be lengthened or shortened and so on. Many applications, including compilers for programming languages like FORTRAN, COBOL or PL/1 were originally designed to interpret input coming from 80 column keypunch cards.

Moreover, different mainframes had differing word lengths; for example, the Univac 1100 series used a 36 bit word, whereas the IBM 360 and successors used a 32 bit word. Those 36 bit words in the Univac mainframes held six 6-bit characters, encoded according to the US Army FIELDATA communications standard (upper case letters, numbers and a few symbols). The IBM 360’s 32 bit words held four 8-bit characters encoded as EBCDIC.

Contrast this to our current perspective of 32 and 64 bit words and Unicode character encoding.

Given that Algol 68 was designed according to the file gestalt of the 1960s, I find it kind of surprising that it seems to work quite well with the “file as stream of characters” mindset that really kind of took flight thanks to the approach to files taken by the early Unix developers.

So let’s dig in.

Like many modern programming languages, Algol 68 relies on opening files in order to acquire the ability to read or write them and closing them in order to relinquish this ability.

The act of opening a file is handled by the open procedure defined as:

PROC open = (REF FILE f, STRING idf, CHANNEL chan) INT

The first parameter is a reference to an Algol 68 file data structure – the FILE – which is set by the procedure and returned to the caller.

The second parameter is the constant externally-known name of the file; its identifier, STRING idf.

The third parameter is the channel – CHANNEL – to be used for reading from or writing to the file. Back in the 60s, channels were the means to associate the program endpoint (the FILE) with the external storage referred to by the file name.

A call to OPEN() returns an integer value which is the return code – the result – of attempting to open the file.

Algol 68 Genie pre-defines a standard set of FILEs (stand in, stand out, stand error and stand back) and a standard set of CHANNELs (stand in channel, stand out channel, stand error channel, stand back channel, stand draw channel).¹ As in most programming languages used in Linux, we can associate those with physical files on the terminal command line.

We can also define our own FILE and associate it with one of the existing channels (the purpose of the open() procedure). And of course there are other operations we can undertake; but for the purposes of this article, we’re really only interested in reading the file, detecting the end of file while reading and closing it once we’re done.

The get() procedure is typically used to read lines from a file:

PROC get = (REF FILE f, [] UNION (INTYPE, PROC (REF FILE) VOID) items) VOID

We see that get() is passed a reference to the FILE, allowing it to update the FILE object; and a complicated looking thing – an array of united modes, which can be either mode INTYPE or a PROC. I don’t really want to make a huge digression at this point into united modes in general nor to INTYPE, which is itself a kind of united mode. Those impatient to find out more may wish to consult Chapter 7 of the Algol 68 Genie learning guide.² Suffice to say for now that get() can be made to return all manner of things if they are properly formatted; we will use it to return lines of the file, one by one.

The on logical file end() procedure is used to register a procedure to be called when reading reaches the end of the file:

PROC on logical file end = (REF FILE f, PROC (REF FILE) BOOL p) VOID

This procedure is an interesting thing – it detects an event – in this case, reaching the logical end of the file – and calls another procedure to deal with that situation – what we might call a listener in today’s jargon.

The close() procedure is used to cleanly sever the connection between the external file and the program:

PROC close = (REF FILE f) VOID

We can see that there is some boiler plate code involved here that we might not wish to deal with every time we want to read a text file, so let’s encapsulate this code in our own procedure to simplify its use.

Streamlined text file reading in Algol 68

Here is a procedure to streamline reading a text file line by line:

1 PROC each line = (STRING input file name, PROC (STRING, INT) VOID l) VOID:
2 BEGIN
3     FILE input file;
4     VOID(open(input file, input file name, standinchannel));
5     BOOL fr := FALSE;
6     on logical file end (input file, (REF FILE f) BOOL: fr := TRUE);
7     INT linecount := 0;
8     WHILE
9         STRING line;
10         get(input file,(line, newline));
11         NOT fr
12     DO
13         linecount +:= 1;
14         l(line, linecount)
15     OD;
16     close(input file)
17 END # each line #;

Let’s walk through this line by line:

Line 1 declares the procedure, which takes a STRING argument (the external file name) and a PROC to be called to process each line once it’s read, returning VOID (no value). Here you see Algol 68 giving us the ability to use what are called “here procedures” or “lambdas” or “closures” (with apologies for trampling on some fine distinctions here). We’ll see how this is used below in a simple example.

Line 2 declares the variable input file.

Line 3 opens the input file using the input file name parameter and associates that with standin channel. The call is wrapped in a coercion of the return value to VOID. We’ll assume for now that the user is careful to check that the external file exists and is readable, and doesn’t bother checking a return code. Is this good design? Well maybe not really; but I’m not an immediate fan of bubbling up the return code from open() to the caller of each line(), either. I guess we could just halt at this point…

Lines 4 and 5 declare a variable fr that is used to indicate whether or not the reading has reached the end of file; set fr to FALSE initially; and pass a listener procedure to on logical file end that sets fr to TRUE when the end of file is reached.

Line 6 declares and initializes the line count variable, to be passed to the line listener routine.

Lines 8 through 12 merit careful attention, as they offer a minor but clear demonstration of the concept of a serial clause yielding a value.

The construct WHILE … DO … OD executes whatever is between DO and OD repetitively as long as whatever is between WHILE and DO evaluates to TRUE.
Line 9 declares the STRING variable line; line 10 calls the get() procedure to read line from input file. Note that new line is called to advance past the line end to the next line (or logical end of file).
Line 10 yields a value that is either TRUE or FALSE, controlling whether the loop is executed again. Of course, this value is the negation of the value of the variable fr which is set to TRUE when logical end of file is reached.

Lines 12 through 15 are now a bit of an anticlimax – in line 13 line count is incremented since we know line was successfully read by the serial clause between the WHILE and DO; and in line 14 we call the routine provided by the caller to manage each line, passing it the values in line and line count.

Finally, in line 16 we close the file and we are done.

A simple program that exercises this each line() procedure might look like:

each line(“.profile”, (STRING line, INT line count) VOID: BEGIN
    print((line count, “:”, line, new line))
END)

Once again, this demonstrates the use of an anonymous “here procedure” by the programmer, obscuring the boiler plate and concentrating on the details of what must be done, rather than how it’s to be done.

Text files with delimited fields in Algol 68

As we move into reading the data file needed for our least squares fit, we recognize that the problem looks a lot like something we’ve dealt with in the AWK programming language in the past – a file whose lines are separated into fields with a delimiter character.

Therefore, we need two things:

A mechanism to split a string into an array of fields based on delimiter characters, and
A slightly revised version of each line() that does that splitting

Here is a procedure to split a string into an array of fields:

1 PROC split = (STRING s, CHAR d) [] STRING:
2 BEGIN
3     INT fieldcount := 1;
4     FOR p FROM LWB s TO UPB s DO
5         IF s[p] = d THEN
6             fieldcount +:= 1
7         FI
8     OD;
9     [1:fieldcount] STRING fields;
10     fieldcount := 1;
11     INT fieldstart := LWB s;
12     FOR p FROM LWB s TO UPB s DO
13         IF s[p] = d THEN
14             fields[fieldcount] := (p > fieldstart | s[fieldstart:(p - 1)] | "");
15             fieldcount +:= 1;
16             fieldstart := p + 1
17         FI
18     OD;
19     fields[fieldcount] := (UPB s > fieldstart | s[fieldstart:UPB s] | "");
20     fields
21 END # split #;

I’m just going to touch on the highlights in the above.

Lines 3 through 8 count the number of delimiters and therefore the number of delimited fields.

In line 4 we see LWB s and UPB s – the LWB operator gives the lower bound of the argument array; the UPB operator gives the upper bound of the argument array. Generally Algol 68 arrays start at 1 but let’s get in a good habit here.

Line 9 declares the array of fields.

Lines 10 through 19 pass over the input string once more, copying the delimited text into the array of fields.

Line 14 has an interesting expression on the right hand side of the assignment statement.
In line 14, we’ve arrived at the situation where we have detected the next delimiter, which is at position p in the input string.
If there is at least one character between the start of field and the next delimiter, then we return the slice of the input string starting at position fieldstart and going through to position p - 1.
Otherwise the field is empty and we return the empty string “”.
The construct ( … | … | … ) is equivalent to – shorthand for, really – IF … THEN … ELSE … FI, and returns either the last value of the THEN part or the last value of the ELSE part. So quite similar to the workings of the ternary operator in C or Java; except that it’s the same old if-statement.
Line 19 looks a lot like line 14 except we assume the end of the line is the final field delimiter.

Next, we’ll modify each line() to provide the split fields in a very AWK-ish fashion.

1 PROC each delimited line = (STRING input file name, CHAR d, PROC (STRING, []STRING, INT) VOID l) VOID:
2 BEGIN
3     FILE input file;
4     VOID(open(input file, input file name, standinchannel));
5     BOOL fr := FALSE;
6     on logical file end (input file, (REF FILE f) BOOL: fr := TRUE);
7     INT linecount := 0;
8     WHILE
9         STRING line;
10         get(input file,(line, newline));
11         NOT fr
12     DO
13         linecount +:= 1;
14         []STRING fields = split(line,d);
15         l(line, fields, linecount)
16     OD;
17     close(input file)
18 END # each delimited line #;

Really only three differences here:

The procedure each delimited line() takes a third argument, d, which is the delimiter character;
The procedure parameter takes a third argument – the array of strings containing the fields;
The array fields is declared and set to the split() of the line based on the delimiter character.

Here is the outline of our program to undertake the least squares fitting:

BEGIN
    each delimited line("test_data.txt", ",", (STRING line, []STRING fields, INT line count) VOID: BEGIN
        print((line count, " ", fields, new line))
    END)
END

In the next article, we’ll add in the calculations in place of the print() call.

Learning Algol 68 Genie, 7.2 Channels and files, p. 120 ↩︎
Ibid, 7 Transput, p. 119. ↩︎

InAlgol 68, Programming