
A Small Algol 68 Project, Part 2
In memory of J. Kevin Douglas, a good friend and fellow fan of Algol 68
In the last article in this series, we described a simple linear least squares problem – fitting a line to a sequence of (x,y) coordinates – and we wrote a small Algol 68 program to create a test data set for use in the development of a program to carry out this fitting operation.
In this article, we will develop a small collection of useful routines for reading text files and extracting data from them, setting us up for a more streamlined solution to the least squares problem.
Files in Algol 68
I would venture to guess that most of us think of files, especially text files, as collections of characters with intervening line breaks. But back in the days when Algol 68 was being designed and dinosaurs mainframes ruled the world, there was a tendency to think of files in terms of things like record length and access methods. In the IBM mainframe world, access methods referred to whether files were suitable for sequential or random record access, whether they could be lengthened or shortened and so on. Many applications, including compilers for programming languages like FORTRAN, COBOL or PL/1 were originally designed to interpret input coming from 80 column keypunch cards.
Moreover, different mainframes had differing word lengths; for example, the Univac 1100 series used a 36 bit word, whereas the IBM 360 and successors used a 32 bit word. Those 36 bit words in the Univac mainframes held six 6-bit characters, encoded according to the US Army FIELDATA communications standard (upper case letters, numbers and a few symbols). The IBM 360’s 32 bit words held four 8-bit characters encoded as EBCDIC.
Contrast this to our current perspective of 32 and 64 bit words and Unicode character encoding.
Given that Algol 68 was designed according to the file gestalt of the 1960s, I find it kind of surprising that it seems to work quite well with the “file as stream of characters” mindset that really kind of took flight thanks to the approach to files taken by the early Unix developers.
So let’s dig in.
Like many modern programming languages, Algol 68 relies on opening files in order to acquire the ability to read or write them and closing them in order to relinquish this ability.
The act of opening a file is handled by the open procedure defined as:
PROC open = (REF FILE f, STRING idf, CHANNEL chan) INT
The first parameter is a reference to an Algol 68 file data structure – the FILE
– which is set by the procedure and returned to the caller.
The second parameter is the constant externally-known name of the file; its identifier, STRING idf
.
The third parameter is the channel – CHANNEL
– to be used for reading from or writing to the file. Back in the 60s, channels were the means to associate the program endpoint (the FILE
) with the external storage referred to by the file name.
A call to OPEN()
returns an integer value which is the return code – the result – of attempting to open the file.
Algol 68 Genie pre-defines a standard set of FILE
s (stand in, stand out, stand error
and stand back
) and a standard set of CHANNEL
s (stand in channel, stand out channel, stand error channel, stand back channel, stand draw channel
).1 As in most programming languages used in Linux, we can associate those with physical files on the terminal command line.
We can also define our own FILE
and associate it with one of the existing channels (the purpose of the open()
procedure). And of course there are other operations we can undertake; but for the purposes of this article, we’re really only interested in reading the file, detecting the end of file while reading and closing it once we’re done.
The get()
procedure is typically used to read lines from a file:
PROC get = (REF FILE f, [] UNION (INTYPE, PROC (REF FILE) VOID) items) VOID
We see that get()
is passed a reference to the FILE
, allowing it to update the FILE
object; and a complicated looking thing – an array of united modes, which can be either mode INTYPE
or a PROC
. I don’t really want to make a huge digression at this point into united modes in general nor to INTYPE
, which is itself a kind of united mode. Those impatient to find out more may wish to consult Chapter 7 of the Algol 68 Genie learning guide.2 Suffice to say for now that get()
can be made to return all manner of things if they are properly formatted; we will use it to return lines of the file, one by one.
The on logical file end()
procedure is used to register a procedure to be called when reading reaches the end of the file:
PROC on logical file end = (REF FILE f, PROC (REF FILE) BOOL p) VOID
This procedure is an interesting thing – it detects an event – in this case, reaching the logical end of the file – and calls another procedure to deal with that situation – what we might call a listener in today’s jargon.
The close()
procedure is used to cleanly sever the connection between the external file and the program:
PROC close = (REF FILE f) VOID
We can see that there is some boiler plate code involved here that we might not wish to deal with every time we want to read a text file, so let’s encapsulate this code in our own procedure to simplify its use.
Streamlined text file reading in Algol 68
Here is a procedure to streamline reading a text file line by line:
1 PROC each line = (STRING input file name, PROC (STRING, INT) VOID l) VOID:
2 BEGIN
3 FILE input file;
4 VOID(open(input file, input file name, standinchannel));
5 BOOL fr := FALSE;
6 on logical file end (input file, (REF FILE f) BOOL: fr := TRUE);
7 INT linecount := 0;
8 WHILE
9 STRING line;
10 get(input file,(line, newline));
11 NOT fr
12 DO
13 linecount +:= 1;
14 l(line, linecount)
15 OD;
16 close(input file)
17 END # each line #;
Let’s walk through this line by line:
Line 1 declares the procedure, which takes a STRING
argument (the external file name) and a PROC
to be called to process each line once it’s read, returning VOID
(no value). Here you see Algol 68 giving us the ability to use what are called “here procedures” or “lambdas” or “closures” (with apologies for trampling on some fine distinctions here). We’ll see how this is used below in a simple example.
Line 2 declares the variable input file
.
Line 3 opens the input file
using the input file name
parameter and associates that with standin channel
. The call is wrapped in a coercion of the return value to VOID
. We’ll assume for now that the user is careful to check that the external file exists and is readable, and doesn’t bother checking a return code. Is this good design? Well maybe not really; but I’m not an immediate fan of bubbling up the return code from open()
to the caller of each line()
, either. I guess we could just halt at this point…
Lines 4 and 5 declare a variable fr
that is used to indicate whether or not the reading has reached the end of file; set fr
to FALSE
initially; and pass a listener procedure to on logical file end
that sets fr
to TRUE
when the end of file is reached.
Line 6 declares and initializes the line count
variable, to be passed to the line listener routine.
Lines 8 through 12 merit careful attention, as they offer a minor but clear demonstration of the concept of a serial clause yielding a value.
- The construct
WHILE … DO … OD
executes whatever is betweenDO
andOD
repetitively as long as whatever is betweenWHILE
andDO
evaluates toTRUE
. - Line 9 declares the
STRING
variableline
; line 10 calls theget()
procedure to readline
frominput file
. Note thatnew line
is called to advance past the line end to the next line (or logical end of file). - Line 10 yields a value that is either
TRUE
orFALSE
, controlling whether the loop is executed again. Of course, this value is the negation of the value of the variablefr
which is set toTRUE
when logical end of file is reached.
Lines 12 through 15 are now a bit of an anticlimax – in line 13 line count
is incremented since we know line was successfully read by the serial clause between the WHILE
and DO
; and in line 14 we call the routine provided by the caller to manage each line, passing it the values in line
and line count
.
Finally, in line 16 we close the file and we are done.
A simple program that exercises this each line()
procedure might look like:
each line(“.profile”, (STRING line, INT line count) VOID: BEGIN
print((line count, “:”, line, new line))
END)
Once again, this demonstrates the use of an anonymous “here procedure” by the programmer, obscuring the boiler plate and concentrating on the details of what must be done, rather than how it’s to be done.
Text files with delimited fields in Algol 68
As we move into reading the data file needed for our least squares fit, we recognize that the problem looks a lot like something we’ve dealt with in the AWK
programming language in the past – a file whose lines are separated into fields with a delimiter character.
Therefore, we need two things:
- A mechanism to split a string into an array of fields based on delimiter characters, and
- A slightly revised version of
each line()
that does that splitting
Here is a procedure to split a string into an array of fields:
1 PROC split = (STRING s, CHAR d) [] STRING:
2 BEGIN
3 INT fieldcount := 1;
4 FOR p FROM LWB s TO UPB s DO
5 IF s[p] = d THEN
6 fieldcount +:= 1
7 FI
8 OD;
9 [1:fieldcount] STRING fields;
10 fieldcount := 1;
11 INT fieldstart := LWB s;
12 FOR p FROM LWB s TO UPB s DO
13 IF s[p] = d THEN
14 fields[fieldcount] := (p > fieldstart | s[fieldstart:(p - 1)] | "");
15 fieldcount +:= 1;
16 fieldstart := p + 1
17 FI
18 OD;
19 fields[fieldcount] := (UPB s > fieldstart | s[fieldstart:UPB s] | "");
20 fields
21 END # split #;
I’m just going to touch on the highlights in the above.
Lines 3 through 8 count the number of delimiters and therefore the number of delimited fields.
In line 4 we see LWB s
and UPB s
– the LWB
operator gives the lower bound of the argument array; the UPB
operator gives the upper bound of the argument array. Generally Algol 68 arrays start at 1 but let’s get in a good habit here.
Line 9 declares the array of fields.
Lines 10 through 19 pass over the input string once more, copying the delimited text into the array of fields.
- Line 14 has an interesting expression on the right hand side of the assignment statement.
- In line 14, we’ve arrived at the situation where we have detected the next delimiter, which is at position
p
in the input string. - If there is at least one character between the start of field and the next delimiter, then we return the slice of the input string starting at position
fieldstart
and going through to positionp - 1
. - Otherwise the field is empty and we return the empty string
“”
. - The construct
( … | … | … )
is equivalent to – shorthand for, really –IF … THEN … ELSE … FI
, and returns either the last value of theTHEN
part or the last value of theELSE
part. So quite similar to the workings of the ternary operator in C or Java; except that it’s the same old if-statement. - Line 19 looks a lot like line 14 except we assume the end of the line is the final field delimiter.
Next, we’ll modify each line()
to provide the split fields in a very AWK
-ish fashion.
1 PROC each delimited line = (STRING input file name, CHAR d, PROC (STRING, []STRING, INT) VOID l) VOID:
2 BEGIN
3 FILE input file;
4 VOID(open(input file, input file name, standinchannel));
5 BOOL fr := FALSE;
6 on logical file end (input file, (REF FILE f) BOOL: fr := TRUE);
7 INT linecount := 0;
8 WHILE
9 STRING line;
10 get(input file,(line, newline));
11 NOT fr
12 DO
13 linecount +:= 1;
14 []STRING fields = split(line,d);
15 l(line, fields, linecount)
16 OD;
17 close(input file)
18 END # each delimited line #;
Really only three differences here:
- The procedure
each delimited line()
takes a third argument,d
, which is the delimiter character; - The procedure parameter takes a third argument – the array of strings containing the fields;
- The array
fields
is declared and set to thesplit()
of the line based on the delimiter character.
Here is the outline of our program to undertake the least squares fitting:
BEGIN
each delimited line("test_data.txt", ",", (STRING line, []STRING fields, INT line count) VOID: BEGIN
print((line count, " ", fields, new line))
END)
END
In the next article, we’ll add in the calculations in place of the print()
call.
- Learning Algol 68 Genie, 7.2 Channels and files, p. 120 ↩︎
- Ibid, 7 Transput, p. 119. ↩︎