
An Algol 68 Pretty Printer
For Marcel van der Veer, with many thanks for Algol 68 Genie
If you pick up a textbook on Algol 68 from the 1970s, such as Andrew McGettrick's wonderful primer, Algol 68 – a first and second course[1], you will see code examples represented as follows:[2]
begin real e = 2.7182818284; real circum;
circum := 2 × pi × e;
print (("circumference of a circle of radius e is", circum))
end
Professor McGettrick soon explains:
When a program has to be run on a computer it will have to be prepared in some suitable manner… Unfortunately, however, these and similar devices do not permit the printing in bold type of words such as begin. Indeed, often small letters are also unavailable… Typically begin might appear in one of the forms
.BEGIN.   'BEGIN'   'BEGIN   BEGIN
… Moreover the multiplication sign has been replaced by an asterisk.[3]
This convention of using boldfaced words for symbols comes from the Revised Report,[4] which states:
The manner in which symbols for newly defined mode-indications and operators are to be represented is now more closely defined. Thus it is clear that the implementer is to provide a special alphabet of bold-faced, or "stropped", marks from which symbols such as person may be made, and it is also clear that operators such as » are to be allowed.[5]
It is worth mentioning at this point that the members of the Working Group defining Algol 68 were developing a whole new framework for describing the programming language, and even for describing the description, with new concepts such as the stropping mentioned above.
Most of us have had some experience with programming languages whose designers have adopted a standard for styling code in that language. Think of the arguments between C programmers as to whether to write "} else {" on one line or to put one or both of the braces on separate lines.
And yes, one of the little things that challenges Algol 68 implementations is how to deal with the difference between begin in bold type, which is a symbol, and begin in ordinary type, which is an identifier.
In the articles I have written to date about Algol 68 on both.org, I have used the Algol 68 Genie compiler-interpreter, which prefers UPPER stropping: symbols are written all in capital letters, and identifiers are written all in lower case letters and digits. Moreover, symbols cannot contain blanks, but identifiers can.
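For comparison, here is a sketch of McGettrick's circumference example as it might be typed for Algol 68 Genie with UPPER stropping: the bold words become capitalized symbols, the identifiers stay in lower case, and the multiplication sign becomes an asterisk. This is a transcription rather than a quotation; the new line at the end of the print call is an addition.

```algol68
BEGIN
   # McGettrick's example, UPPER stropped: symbols in capitals, identifiers in lower case #
   REAL e = 2.7182818284;
   REAL circum;
   circum := 2 * pi * e;
   print (("circumference of a circle of radius e is", circum, new line))
END
```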
This UPPER stropping makes it pretty straightforward to write a simple pretty printer that converts UPPER-stropped Algol 68 code into the more decorated, bold-faced version, in HTML format, for inclusion in web pages or elsewhere.
Enough background, let’s get to work on the code.
The pretty printer
Here it is. It’s pretty long, but not super-complicated, and because of the length I’ve added quite a few comments. Discussion follows:
```
  1 #
  2 Library-preludes
  3
  4 /Nothing, however, prevents you from subjoining to this standard-prelude a
  5 set of home made declarations. Of course you are free to declare such new
  6 things in your own particular-program; but, as soon as you want to apply
  7 them in several programs, or you want to enable others to use them, or you
  8 have reason to expect that more efficiency may be acquired, then you can go
  9 to your implementor and ask him to make the whole set an as efficient as
 10 possible extension of the standard-prelude, i.e. a library-prelude. In
 11 that way an arbitrary number of problem oriented dialects may be defined./
 12 Lindsey, C.H., & van der Meulen, S.G. (1973).
 13 /Informal Introduction to Algol 68/ (Revised Edition, Ch. 0.10.7, p. 52).
 14 North Holland Publishing Company.
 15 #
 16 PR read "each_line.a68" PR
 17 #
 18 Particular-program
 19 Algol 68 pretty printer.
 20 Depends on UPPER stropping.
 21 Lowercases and bolds reserved words.
 22 Italicizes variable names.
 23 Generates HTML output.
 24 #
 25 BEGIN
 26    #
 27    Using a68g, we can get at Linux command line arguments.
 28    In order to pass arguments to the program, we terminate arguments to a68g
 29    itself using the command line option "--".
 30    Therefore, there are three arguments to be skipped before the program can
 31    find its own arguments:
 32       argv(1) - "a68g"
 33       argv(2) - name of the program being interpreted
 34       argv(3) - "--" which is necessary to get a68g to leave off interpreting args
 35    Following those, we can provide a list of one or more Algol 68 files,
 36    which appear in argv(4) and onward.
 37    #
 38    IF argc < 4 THEN
 39       print("Usage: a68g pp.a68 -- [source files for pretty printing]")
 40    ELSE
 41       #
 42       Embed our pretty-printed code in a minimal HTML head-body section
 43       #
 44       print(("<html>",new line));
 45       print(("<head>",new line));
 46       print(("<title> Algol 68 Pretty Print</title>",new line));
 47       print(("<body>",new line));
 48       #
 49       Loop over source files.
 50       There is no sophisticated parsing nor error checking here
 51       #
 52       FOR an FROM 4 TO argc DO
 53          STRING file name = argv(an);
 54          #
 55          If the file is a readable regular file, we attempt to pretty print it
 56          #
 57          IF file is regular(file name) THEN
 58             print((new line, new line,"<p><strong><em>{ **** ",file name," **** }</em></strong></p>",new line,new line));
 59             #
 60             There are six "states" that the pretty printer can be in:
 61             - commenting (between opening sharp and closing sharp):
 62                  do no pretty printing
 63             - stringing (between opening doublequote and closing doublequote):
 64                  do no pretty printing
 65             - leadering (processing leading blanks on a line):
 66                  replace blanks with &nbsp;, tabs with &nbsp;&nbsp;&nbsp;&nbsp;
 67             - reserved wording (processing upper case words):
 68                  lowercase and wrap in <strong></strong>
 69             - identitying (processing lower case variable names possibly
 70                  including spaces or numerals):
 71                  wrap in <em></em>
 72             - other (for example processing numerals, punctuation):
 73                  just let the characters through
 74             Commenting can be active over more than one line, but other states reset at the beginning of each line.
 75             Each file appears as one HTML paragraph, with line breaks after each line.
 76             #
 77             print("<p>");
 78             # multiline state resets #
 79             BOOL commenting := FALSE;
 80             #
 81             Loop over the lines in the file
 82             #
 83             each line(file name, (STRING line, INT linecount) VOID: BEGIN
 84                # individual line state resets #
 85                BOOL reserved wording := FALSE;
 86                BOOL identitying := FALSE;
 87                BOOL stringing := FALSE;
 88                BOOL leadering := TRUE;
 89                # process the line character-by-character #
 90                FOR cn FROM LWB line TO UPB line DO
 91                   CHAR c := line[cn];
 92                   # analyze the current character and change states where necessary #
 93                   IF c = """" AND NOT commenting THEN
 94                      # start or end of a string - flip stringing; off reserved wording, identifying, leadering #
 95                      stringing := NOT stringing;
 96                      IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
 97                      IF identitying THEN identitying := FALSE; print("</em>") FI;
 98                      leadering := FALSE
 99                   ELIF c = "#" AND NOT stringing THEN
100                      # start or end of a comment - flip commenting; off reserved wording, identifying, leadering #
101                      commenting := NOT commenting;
102                      IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
103                      IF identitying THEN identitying := FALSE; print("</em>") FI;
104                      leadering := FALSE
105                   ELIF NOT (commenting OR stringing) THEN
106                      IF "A" <= c AND c <= "Z" THEN
107                         # reserved wording #
108                         IF NOT reserved wording THEN
109                            # start of reserved wording - close any open identifier first so the HTML nests properly #
110                            IF identitying THEN identitying := FALSE; print("</em>") FI;
111                            reserved wording := TRUE; print("<strong>");
112                            leadering := FALSE
113                         FI;
114                         # translate to lower case #
115                         c := REPR (ABS c - ABS "A" + ABS "a")
116                      ELIF "a" <= c AND c <= "z" THEN
117                         # identifying #
118                         IF NOT identitying THEN
119                            # start of identifying - close any open reserved word first so the HTML nests properly #
120                            IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
121                            identitying := TRUE; print("<em>");
122                            leadering := FALSE
123                         FI
124                         # don't mess with the character #
125                      ELIF ABS c = 9 OR c = " " THEN
126                         # leadering, space within an identity or just a space - off reserved wording #
127                         IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI
128                      ELIF "0" <= c AND c <= "9" THEN
129                         # continuation of an identity or just a numeral - off reserved wording, leadering #
130                         IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
131                         leadering := FALSE
132                      ELSE
133                         # something else, e.g. punctuation - off reserved wording, identitying and leadering #
134                         IF reserved wording THEN print("</strong>"); reserved wording := FALSE FI;
135                         IF identitying THEN print("</em>"); identitying := FALSE FI;
136                         leadering := FALSE
137                      FI
138                   FI;
139                   # emit the current character (or its surrogate) #
140                   IF leadering AND c = " " THEN
141                      print("&nbsp;")
142                   ELIF leadering AND ABS c = 9 THEN
143                      print("&nbsp;&nbsp;&nbsp;&nbsp;")
144                   ELIF c = "<" THEN
145                      print("&lt;")
146                   ELIF c = ">" THEN
147                      print("&gt;")
148                   ELSE
149                      print(c)
150                   FI
151                OD;
152                # end of line - off reserved wording, identifying; print line break #
153                IF reserved wording THEN print("</strong>"); reserved wording := FALSE FI;
154                IF identitying THEN print("</em>"); identitying := FALSE FI;
155                print(("<br />",new line))
156             END);
157             #
158             That's it for this file, close off the HTML paragraph
159             #
160             print(("</p>",new line))
161          ELSE
162             #
163             Oops - not a regular file
164             #
165             print((new line, new line,"<p><strong><em>{ **** ",file name," **** is not a regular file - skipping }</em></strong></p>",new line,new line))
166          FI
167       OD;
168       #
169       Emit HTML postamble
170       #
171       print(("</body>",new line));
172       print(("</html>",new line))
173    FI
174 END
```
On line 16, we use the Algol 68 Genie read pragmat to include the source code for our each line() procedure, which lets us process a text file line by line without worrying about the details of how that is done.
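The source of each_line.a68 isn't reproduced here, so the following is only a hypothetical sketch of what such a procedure could look like, written to match the way line 83 calls it: a file name plus a routine that receives each line and its line number. The names source, finished and process, and the use of on logical file end to notice the end of the file, are assumptions, not the contents of the real file.

```algol68
BEGIN
   # hypothetical stand-in for the procedure in each_line.a68; the real file may differ #
   PROC each line = (STRING file name, PROC (STRING, INT) VOID process) VOID:
   BEGIN
      FILE source;
      BOOL finished := FALSE;
      IF open (source, file name, stand in channel) /= 0 THEN
         print (("cannot open ", file name, new line))
      ELSE
         # at the end of the file, note the fact and let get return #
         on logical file end (source, (REF FILE f) BOOL: finished := TRUE);
         INT line count := 0;
         WHILE
            STRING line;
            get (source, (line, new line));
            NOT finished
         DO
            line count +:= 1;
            process (line, line count)
         OD;
         close (source)
      FI
   END;

   # usage sketch: echo a file, prefixing each line with its number #
   each line ("regress.a68", (STRING line, INT n) VOID:
      print ((whole (n, 0), ": ", line, new line)))
END
```

As an included file, only the PROC declaration followed by a semicolon would be needed; the file name in the usage sketch is just an example.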
The particular-program, that is, the main program, runs from the BEGIN on line 25 through to the END on line 174.
Lines 26 – 40 use Algol 68 Genie extensions to get at the Linux command line arguments, allowing the user of our program to pass in one or more source code files for pretty printing. The comments explain this in more detail.
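A quick way to see what those extensions deliver is a throwaway program that prints every argument with its index. This is a sketch; run it as, say, a68g args.a68 -- one two (args.a68 being a hypothetical file name), and, according to the comment on lines 26 – 37, the program's own arguments should start at argv(4).

```algol68
BEGIN
   # print each command line argument with its index, using the a68g argc and argv extensions #
   FOR i TO argc DO
      print ((whole (i, 0), ": ", argv (i), new line))
   OD
END
```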
Lines 41 – 47 emit the opening "html" element, the "head" block and the opening "body" element; lines 168 – 173 emit the closing "body" and "html" elements. This gives us a standalone HTML page; alternatively, if we just want the pretty printed code, we can take the output lines between these two HTML segments.
Lines 52 – 167 are a loop over each file supplied by the user. As the comments note, the pretty printed code is placed in a single HTML paragraph with HTML line breaks at the end of each line.
Line 53 sets the file name from the relevant argv() call.
Lines 54 – 161 process the file if it is “regular”; lines 162 – 166 print a warning and skip the file otherwise.
Line 58 prints an HTML header for the file being processed; the comment on lines 59 – 76 describes the pretty printing process as a set of states, established by what is read, that control the output; and line 77 prints the opening "p" element, marking the start of the pretty printed code. Lines 157 – 160 print the closing "p" element.
Lines 78 – 156 initialize the commenting state to FALSE (that is, the pretty printing starts knowing that it is not within a comment) and then use the each line() procedure to process the lines in the text file. This is where the work is done. Comments in Algol 68 can span several lines, so the commenting state is set once per file rather than being reset to "not commenting" at the start of each line.
Lines 84 – 88 reset the remaining states (stringing, reserved wording, identitying and leadering) at the beginning of each line. None of these states can span lines, so this is appropriate. The leadering state might be confusing: it refers to the zero or more blanks or tabs that begin each line of code. All of these states are mutually exclusive, as is the "not any of these" state, at least as far as pretty printing goes. There is no adjustment to text within comments or strings. Symbols (reserved words) are lower-cased and wrapped in HTML "strong" elements; identifiers are wrapped in HTML "em" elements.
Lines 90 – 151 loop over the characters in a line, looking for state changes and wrapping symbols and identifiers as required.
Lines 93 – 98 detect the beginning or end of a string, flip the stringing state and turn off the reserved wording, identitying and leadering states.
Lines 99 – 104 detect the beginning or end of a comment, flip the commenting state and turn off the reserved wording, identitying and leadering states.
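To see the two flips in isolation, here is a small hypothetical test program that applies the same rules as lines 93 – 104 to a single line and prints a marker under each character: s inside a string, c inside a comment and a dot elsewhere. It shows that a sharp inside a string does not start a comment.

```algol68
BEGIN
   # hypothetical test line: a quoted sharp followed by a real comment #
   STRING line = "print(""# not a comment #"") # a comment #";
   BOOL stringing := FALSE, commenting := FALSE;
   STRING marks := "";
   FOR cn FROM LWB line TO UPB line DO
      CHAR c = line[cn];
      # the same flip rules as lines 93 - 104 of the pretty printer #
      IF c = """" AND NOT commenting THEN
         stringing := NOT stringing
      ELIF c = "#" AND NOT stringing THEN
         commenting := NOT commenting
      FI;
      # mark the character with the state in force after any flip #
      marks +:= (stringing | "s" |: commenting | "c" | ".")
   OD;
   print ((line, new line, marks, new line))
END
```

The second line printed should be six dots, eighteen s's, three dots, twelve c's and a final dot: the sharps inside the string stay marked s, and only the unquoted sharps open and close the comment.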
Lines 105 – 138 deal with the rest of the states, provided that the pretty printing process is in neither the stringing nor the commenting state.
Lines 106 – 115 deal with symbols, turning on the reserved wording state at the beginning of a symbol (an upper case letter) and turning off the identitying and leadering states. Upper case characters are translated to lower case on line 115 using a rule that works for ASCII upper and lower case letters.
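The rule on line 115 relies on A to Z and a to z each being contiguous in ASCII, so adding the distance from "A" to "a" maps one onto the other. A small sketch of the same rule wrapped in a procedure (lower case is a hypothetical name) and applied to a whole word:

```algol68
BEGIN
   # lower-case a character with the ASCII offset rule used on line 115 #
   PROC lower case = (CHAR c) CHAR:
      IF "A" <= c AND c <= "Z" THEN REPR (ABS c - ABS "A" + ABS "a") ELSE c FI;
   STRING word = "BEGIN";
   STRING result := "";
   FOR i FROM LWB word TO UPB word DO
      result +:= lower case (word[i])
   OD;
   print ((result, new line))
END
```

Running it should print begin.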
Lines 116 – 124 deal with identifiers, turning on the identitying state at the beginning of an identifier (a lower case letter) and turning off the reserved wording and leadering states. No adjustments are made to the characters themselves.
Lines 125 – 127 deal with spaces and tabs, which terminate symbols, can be embedded in identifiers or can be "significant" white space (from a pretty printing perspective) at the beginning of a line; they turn off the reserved wording state.
Lines 128 – 131 deal with numerical digits, which terminate symbols and leading white space but can be the second or subsequent character of an identifier, turning off the reserved wording and leadering states.
Lines 132 – 136 deal with any other characters, which terminate symbols, identifiers and leading white space, turning off the reserved wording, identitying and leadering states.
Phew, done with tracking all the state changes and bits of HTML markup!
Lines 140 – 150 emit the character: as one or four HTML non-breaking spaces (&nbsp;) if it's a space or tab and the leadering state is on; as the HTML entities &lt; and &gt; if it's a less-than or greater-than sign; or unchanged otherwise.
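The emitting step can be read as a mapping from one character to the HTML text that stands for it. Here is a hypothetical helper, surrogate, that pulls lines 140 – 150 out into a procedure, with the caller passing in the leadering state. It also escapes the ampersand, which the listing passes through unchanged and which HTML would otherwise read as the start of an entity; that branch is an addition, not part of the program above.

```algol68
BEGIN
   # hypothetical helper mirroring lines 140 - 150: the HTML text to emit for one character #
   PROC surrogate = (CHAR c, BOOL leadering) STRING:
      IF leadering AND c = " " THEN "&nbsp;"
      ELIF leadering AND ABS c = 9 THEN "&nbsp;&nbsp;&nbsp;&nbsp;"
      ELIF c = "<" THEN "&lt;"
      ELIF c = ">" THEN "&gt;"
      ELIF c = "&" THEN "&amp;"   # not in the listing: an added assumption #
      ELSE c
      FI;
   STRING source = "IF a<b THEN";
   STRING html := "";
   FOR i FROM LWB source TO UPB source DO
      html +:= surrogate (source[i], FALSE)
   OD;
   print ((html, new line))
END
```

This should print IF a&lt;b THEN; case conversion and the strong and em wrapping are left to the state machine, as in the listing.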
Finally, lines 152 – 155 handle the end of the line, closing any open "strong" or "em" element and emitting an HTML line break.
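As a worked example, tracing the rules by hand for one input line, three leading blanks followed by REAL x := 2.5;, gives the HTML below. Note that the blank after x stays inside the "em" element, because a space does not end the identitying state (it may be part of an identifier); only the colon does.

```
input:   "   REAL x := 2.5;"
output:  &nbsp;&nbsp;&nbsp;<strong>real</strong> <em>x </em>:= 2.5;<br />
```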
Running the code and looking at the results
Let’s test the program out on the regress.a68 program we wrote in the previous article:
$ a68g pp.a68 -- regress.a68 > regress.html
And here’s the output:
{ **** regress.a68 **** }
#
Utility routines
#
pr read "str_to_real.a68" pr
pr read "split.a68" pr
pr read "each_delimited_line.a68" pr
#
Perform a least-squares fit of a line to a data file containing (x,y) values
#
begin
[1:2,1:2] real xtx := ((0.0, 0.0),(0.0,0.0));
[1:2] real xty := (0.0, 0.0);
# Accumulate XTX and XTy #
each delimited line("test_data.txt", ",", (string line, []string fields, int line count) void: begin
# skip the header line #
if line count > 1 then
real xi = str to real(fields[1]), yi = str to real(fields[2]);
xtx[1,1] +:= xi * xi;
xtx[1,2] +:= xi;
xtx[2,2] +:= 1.0;
xty[1] +:= xi * yi;
xty[2] +:= yi
fi
end);
xtx[2,1] := xtx[1,2];
# let's see the matrix and vector before Gauss #
print(("before Gauss:",new line));
print(("⎡",xtx[1,1]," ",xtx[1,2],"⎤ = ⎡",xty[1],"⎤",new line));
print(("⎣",xtx[2,1]," ",xtx[2,2],"⎦ ⎣",xty[2],"⎦",new line));
# Gaussian elimination #
real a = -xtx[2,1] / xtx[1,1];
xtx[2,1] +:= a * xtx[1,1];
xtx[2,2] +:= a * xtx[1,2];
xty[2] +:= a * xty[1];
# let's see the matrix and vector after Gauss #
print(("after Gauss:",new line));
print(("⎡",xtx[1,1]," ",xtx[1,2],"⎤ = ⎡",xty[1],"⎤",new line));
print(("⎣",xtx[2,1]," ",xtx[2,2],"⎦ ⎣",xty[2],"⎦",new line));
# back substitution #
real b = xty[2] / xtx[2,2];
real m = (xty[1] - xtx[1,2] * b) / xtx[1,1];
# let's see the equation #
print(("equation:",new line));
print(("y = ",fixed(m,0,7)," * x + ",fixed(b,0,7),new line))
end
Not bad! I like to have string constants like “equation:” stand out from identifiers so I don’t italicize them. But other than that, I think it looks pretty close to the beautifully typeset examples from the 1970s.
What did we learn here? Two things come to mind:
- we gained some familiarity with processing text in Algol 68, which is cool because we now know it’s not just a “numerical algorithms language”;
- we saw that we can parse an Algol 68 program that is UPPER stropped with a very simple state-machine approach.
I’ll be back with more Algol 68 in the not-too-distant future, when we’ll take a look at the exciting new GNU Algol 68 compiler, built with the GNU Compiler Collection.
[1] McGettrick, A.D. (1978). Algol 68 – a first and second course. Cambridge University Press.
[2] Ibid., p. 6.
[3] Ibid., pp. 7–8.
[4] van Wijngaarden, A., Mailloux, B.J., Peck, J.E.L., Koster, C.H.A., Sintzoff, M., Lindsey, C.H., Meertens, L.G.L.T., & Fisker, R.G. (1978). Revised Report on the Algorithmic Language Algol 68. IFIP Working Group 2.1.
[5] Ibid., § 0.3.6, p. 14.