An Algol 68 Pretty Printer
For Marcel van der Veer, with many thanks for Algol 68 Genie
If you pick up a textbook on Algol 68 from the 1970s, such as Andrew McGettrick’s wonderful primer, Algol 68 – a first and second course1, you will see code examples represented as follows:2
begin real e = 2.7182818284; real circum;
circum := 2 × pi × e;
print ((“circumference of a circle of radius e is”, circum))
end
Professor McGettrick soon explains:
When a program has to be run on a computer it will have to be prepared in some suitable manner… Unfortunately, however, these and similar devices do not permit the printing in bold type of words such as begin. Indeed, often small letters are also unavailable… Typically begin might appear in one of the forms
.BEGIN. ‘BEGIN’ ‘BEGIN BEGIN
… Moreover the multiplication sign has been replaced by an asterisk.3
This convention of using boldfaced words for symbols comes from the Revised Report,4 which states:
The manner in which symbols for newly defined mode-indications and operators are to be represented is now more closely defined. Thus it is clear that the implementer is to provide a special alphabet of bold-faced, or “stropped”, marks from which symbols such as person may be made, and it is also clear that operators such as » are to be allowed.5
It is worth mentioning at this point that the members of the Working Group defining Algol 68 were developing a whole new framework for describing the programming language, and even for describing the description, with new concepts such as the stropping mentioned above.
Most of us have had some experience with programming languages whose designers have adopted a standard for styling code in that language. Think of the arguments between C programmers as to whether to write } else { or have one or more of the braces appear on separate lines.
And yes, one of the little things that challenges Algol 68 implementations is how to tell begin in bold type, which is a symbol, from begin in plain type, which is an identifier.
In the articles I have written to date about Algol 68 on both.org, I have used the Algol 68 Genie compiler-interpreter, which prefers UPPER stropping: symbols are written entirely in capital letters, and identifiers are written in lower case letters and digits. Moreover, symbols cannot contain blanks, but identifiers can.
This UPPER stropping makes it pretty straightforward to write a simple pretty printer that converts UPPER-stropped Algol 68 code into the more decorated version, in HTML format, for inclusion in web pages or elsewhere.
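As a sketch of what UPPER stropping looks like in practice, McGettrick's example from the start of this article could be typed for Algol 68 Genie roughly as follows. This is a transliteration, not text from the book: the bold words become upper case, the multiplication sign becomes an asterisk, and the new line at the end of the print call is an addition.

```algol68
BEGIN
   # symbols such as BEGIN and REAL are typed in upper case #
   REAL e = 2.7182818284;
   REAL circum;
   # identifiers such as circum and pi stay in lower case #
   circum := 2 * pi * e;
   print(("circumference of a circle of radius e is", circum, new line))
END
```

Feeding a file like this to the pretty printer is what turns the upper case words back into lower case bold ones.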
Enough background, let’s get to work on the code.
The pretty printer
Here it is. It’s pretty long, but not super-complicated, and because of the length I’ve added quite a few comments. Discussion follows:
1 #
2 Library-preludes
3
4 /Nothing, however, prevents you from subjoining to this standard-prelude a
5 set of home made declarations. Of course you are free to declare such new
6 things in your own particular-program; but, as soon as you want to apply
7 them in several programs,or you want to enable others to use them, or you
8 have reason to expect that more efficiency may be acquired, then you can go
9 to your implementor and ask him to make the whole set an as efficient as
10 possible extension of the standard-prelude, i.e. 'it 'library-prelude'. In
11 that way an arbitrary number of problem oriented dialects may be defined./
12 Lindsey, C.H., & van der Meulen, S.G. (1973).
13 /Informal Introduction to Algol 68/ (Revised Edition, Ch. 0.10.7, p. 52).
14 North Holland Publishing Company.
15 #
16 PR read "each_line.a68" PR
17 #
18 Particular-program
19 Algol 68 pretty printer.
20 Depends on UPPER stropping.
21 Lowercases and bolds reserved words.
22 Italicizes variable names.
23 Generates HTML output.
24 #
25 BEGIN
26 #
27 Using a68g, we can get at Linux command line arguments.
28 In order to pass arguments to the program, we terminate arguments to a68g
29 itself using the command line option "--".
30 Therefore, there are three arguments to be skipped before the program can
31 find its own arguments:
32 argv(1) - "a68g"
33 argv(2) - name of the program being interpreted
34 argv(3) - "--" which is necessary to get a68g to leave off interpreting args
35 Following those, we can provide a list of one or more Algol 68 files,
36 which appear in argv(4) and onward.
37 #
38 IF argc < 4 THEN
39 print("Usage: a68g pp.a68 -- [source files for pretty printing]")
40 ELSE
41 #
42 Embed our pretty-printed code in a minimal HTML head-body section
43 #
44 print(("<html>",new line));
45 print(("<head>",new line));
46 print(("<title> Algol 68 Pretty Print</title>",new line));
47 print(("<body>",new line));
48 #
49 Loop over source files.
50 There is no sophisticated parsing nor error checking here
51 #
52 FOR an FROM 4 TO argc DO
53 STRING file name = argv(an);
54 #
55 If the file is a readable regular file, we attempt to pretty print it
56 #
57 IF file is regular(file name) THEN
58 print((new line, new line,"<p><strong><em>{ **** ",file name," **** }</em></strong></p>",new line,new line));
59 #
60 There are six "states" that the pretty printer can be in:
61 - commenting (between opening sharp and closing sharp):
62 do no pretty printing
63 - stringing (between opening doublequote and closing doublequote):
64 do no pretty printing
65 - leadering (processing leading blanks on a line):
66 replace blanks with &nbsp;, tabs with &nbsp;&nbsp;&nbsp;&nbsp;
67 - reserved wording (processing upper case words):
68 lowercase and wrap in <strong></strong>
69 - identitying (processing lower case variable names possibly
70 including spaces or numerals):
71 wrap in <em></em>
72 - other (for example processing numerals, punctuation):
73 just let the characters through
74 Commenting can be active over more than one line, but other states reset at the beginning of each line.
75 Each file appears as one HTML paragraph, with line breaks after each line.
76 #
77 print("<p>");
78 # multiline state resets #
79 BOOL commenting := FALSE;
80 #
81 Loop over the lines in the file
82 #
83 each line(file name, (STRING line, INT linecount) VOID: BEGIN
84 # individual line state resets #
85 BOOL reserved wording := FALSE;
86 BOOL identitying := FALSE;
87 BOOL stringing := FALSE;
88 BOOL leadering := TRUE;
89 # process the line character-by-character #
90 FOR cn FROM LWB line TO UPB line DO
91 CHAR c := line[cn];
92 # analyze the current character and change states where necessary #
93 IF c = """" AND NOT commenting THEN
94 # start or end of a string - flip stringing; off reserved wording, identifying, leadering #
95 stringing := NOT stringing;
96 IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
97 IF identitying THEN identitying := FALSE; print("</em>") FI;
98 leadering := FALSE
99 ELIF c = "#" AND NOT stringing THEN
100 # start or end of a comment - flip commenting; off reserved wording, identifying, leadering #
101 commenting := NOT commenting;
102 IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
103 IF identitying THEN identitying := FALSE; print("</em>") FI;
104 leadering := FALSE
105 ELIF NOT (commenting OR stringing) THEN
106 IF "A" <= c AND c <= "Z" THEN
107 # reserved wording #
108 IF NOT reserved wording THEN
109 # start of reserved wording - on reserved wording; off identifying, stringing, commenting, leadering #
110 reserved wording := TRUE; print("<strong>");
111 IF identitying THEN identitying := FALSE; print("</em>") FI;
112 leadering := FALSE
113 FI;
114 # translate to lower case #
115 c := REPR (ABS c - ABS "A" + ABS "a")
116 ELIF "a" <= c AND c <= "z" THEN
117 # identifying #
118 IF NOT identitying THEN
119 # start of identifying - on identifying; off reserved wording, stringing, commenting, leadering #
120 identitying := TRUE; print("<em>");
121 IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
122 leadering := FALSE
123 FI
124 # don't mess with the character #
125 ELIF ABS c = 9 OR c = " " THEN
126 # leadering, space within an identity or just a space - off reserved wording #
127 IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI
128 ELIF "0" <= c AND c <= "9" THEN
129 # continuation of an identity or just a numeral - off reserved wording, leadering #
130 IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
131 leadering := FALSE
132 ELSE
133 # something else, e.g. punctuation - off reserved wording, identitying and leadering #
134 IF reserved wording THEN print("</strong>"); reserved wording := FALSE FI;
135 IF identitying THEN print("</em>"); identitying := FALSE FI;
136 leadering := FALSE
137 FI
138 FI;
139 # emit the current character (or its surrogate) #
140 IF leadering AND c = " " THEN
141 print("&nbsp;")
142 ELIF leadering AND ABS c = 9 THEN
143 print("&nbsp;&nbsp;&nbsp;&nbsp;")
144 ELIF c = "<" THEN
145 print("&lt;")
146 ELIF c = ">" THEN
147 print("&gt;")
148 ELSE
149 print(c)
150 FI
151 OD;
152 # end of line - off reserved wording, identifying; print line break #
153 IF reserved wording THEN print("</strong>"); reserved wording := FALSE FI;
154 IF identitying THEN print("</em>"); identitying := FALSE FI;
155 print(("<br />",new line))
156 END);
157 #
158 That's it for this file, close off the HTML paragraph
159 #
160 print(("</p>",new line))
161 ELSE
162 #
163 Oops - not a regular file
164 #
165 print((new line, new line,"<p><strong><em>{ **** ",file name," **** is not a regular file - skipping }</em></strong></p>",new line,new line))
166 FI
167 OD;
168 #
169 Emit HTML postamble
170 #
171 print(("</body>",new line));
172 print(("</html>",new line))
173 FI
174 END
On line 16, we use the Algol 68 Genie pragmat read to include the source code for our each line() procedure that lets us process a text file without worrying about the “how” details.
The particular-program, that is the main program, runs from the BEGIN on line 25 through to the END on line 174.
Lines 26 – 40 use Algol 68 Genie extensions to get at the Linux command line arguments, allowing the user of our program to pass in one or more source code files for pretty printing. The comments explain this in more detail.
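To see how the arguments line up, a throwaway program along these lines prints every argument with its index. It is only a sketch: the file name args.a68 is made up for illustration, and it relies on the same argc and argv that the listing uses.

```algol68
BEGIN
   # print each command line argument next to its index #
   FOR an FROM 1 TO argc DO
      print((whole(an, 0), ": ", argv(an), new line))
   OD
END
```

Run as a68g args.a68 -- one.a68 two.a68, this should show the user's files starting at index 4, as the comment in the listing describes.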
Lines 41 – 47 emit the opening “html” element, the “head” block and the opening “body” element; lines 168 – 173 emit the closing “body” and “html” elements. This gives us a standalone HTML page; or we can take just the output lines between these two HTML segments if we only want the pretty printed code.
Lines 52 – 167 are a loop over each file supplied by the user. As the comments note, the pretty printed code is placed in a single HTML paragraph with HTML line breaks at the end of each line.
Line 53 sets the file name from the relevant argv() call.
Lines 54 – 160 process the file if it is “regular”; lines 161 – 166 print a warning and skip the file otherwise.
Line 58 prints an HTML header for the file being processed; the comment on lines 59 – 76 describes the pretty printer as a set of states driven by the characters read; and line 77 prints the opening “p” element that starts the pretty printed code. Lines 157 – 160 print the closing “p” element.
Lines 78 – 156 initialize the commenting state to FALSE (that is, pretty printing starts outside any comment) and use the each line() procedure to process the lines in the text file. This is where the work is done. Comments in Algol 68 can span several lines, so the commenting state cannot be reset to “not commenting” at the start of each line.
Lines 84 – 88 reset the remaining states (reserved wording, identifying, stringing and leadering) at the beginning of each line. These states are not multiline, so this is appropriate. The leadering state might confuse: it refers to the zero or more blanks or tabs that begin each line of code. All of these states are mutually exclusive, as is the “not any of these” state, at least as far as pretty printing goes. There is no adjustment to text within comments or strings. Symbols (reserved words) are converted to HTML “strong” text and lower-cased; identifiers are converted to HTML “em” text.
Lines 90 – 151 loop over the characters in a line, looking for state changes and wrapping symbols and identifiers as required.
Lines 93 – 98 detect the beginning or end of a string, flip the stringing state and turn off reserved wording, identifying and leadering states.
Lines 99 – 104 detect the beginning or end of a comment, flip the commenting state and turn off reserved wording, identifying and leadering states.
Lines 105 – 138 deal with the rest of the states provided that the pretty printing process is in neither the stringing nor commenting state.
Lines 106 – 115 deal with symbols, turning on the reserved wording state at the beginning of a symbol (upper case letter) and turning off identifying and leadering states. Uppercase characters are translated to lower case in line 115 using a rule that works with ASCII upper- and lowercase characters.
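The case conversion can be tried on its own. In the sketch below, ascii lower is a made-up helper name that does not appear in pp.a68; it wraps the same arithmetic as line 115 and leaves any character outside A to Z untouched.

```algol68
BEGIN
   # ascii lower is a made-up helper name, not part of pp.a68 #
   PROC ascii lower = (CHAR c) CHAR:
      IF "A" <= c AND c <= "Z" THEN
         # same arithmetic as line 115 of the listing #
         REPR (ABS c - ABS "A" + ABS "a")
      ELSE
         c
      FI;
   print((ascii lower("B"), ascii lower("x"), ascii lower("7"), new line))
END
```

Here "B" is shifted to "b" because the upper and lower case letters sit a fixed distance apart in ASCII, while "x" and "7" pass through unchanged.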
Lines 116 – 124 deal with identifiers, turning on the identifying state at the beginning of an identifier (lower case letter) and turning off reserved wording and leadering states. No adjustments are made to the characters themselves.
Lines 125 – 127 deal with spaces that terminate symbols, can be embedded in identifiers or can be “significant” white space (from a pretty printing perspective) at the beginning of a line, turning off the reserved wording state.
Lines 128 – 131 deal with numerical digits that terminate symbols and leading white space but can be a second or subsequent character in an identifier, turning off the reserved wording and leadering states.
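Blanks and digits need this care because Algol 68 identifiers may contain both. The following sketch uses invented names and relies on the rule that blanks inside an identifier are not significant in Algol 68 Genie's UPPER stropping:

```algol68
BEGIN
   INT line count := 3;
   INT x1 = 10;
   # linecount and line count name the same variable #
   linecount +:= x1;
   print((line count, new line))
END
```

This is why the pretty printer leaves the identifying state on when it meets a blank or a digit: an identifier such as line count should come out as a single italic run.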
Lines 132 – 136 deal with any other characters which terminate symbols, identifiers and leading white space, turning off reserved wording, identifying and leadering states.
Phew, done with tracking all the state changes and bits of HTML markup!
Lines 140 – 150 emit the character: while the leadering state is on, a space or tab becomes one or four HTML non-breaking spaces respectively; the less-than and greater-than symbols become the HTML lt and gt entities; anything else is printed as is.
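The emitting step can also be pulled out into a procedure of its own. The sketch below uses a made-up helper called emit that mirrors lines 140 – 150, assuming the usual HTML entity names. The ampersand branch is an addition that is not in the listing; it is included because an unescaped ampersand in source code can be misread by a browser as the start of an entity.

```algol68
BEGIN
   # emit is a made-up helper; it mirrors lines 140 - 150 of the listing #
   PROC emit = (CHAR c, BOOL leadering) VOID:
      IF leadering AND c = " " THEN
         print("&nbsp;")
      ELIF leadering AND ABS c = 9 THEN
         print("&nbsp;&nbsp;&nbsp;&nbsp;")
      ELIF c = "<" THEN
         print("&lt;")
      ELIF c = ">" THEN
         print("&gt;")
      ELIF c = "&" THEN
         # assumption: this branch is an addition, not in the listing #
         print("&amp;")
      ELSE
         print(c)
      FI;

   STRING sample = "  a < b & c > d";
   BOOL leadering := TRUE;
   FOR cn FROM LWB sample TO UPB sample DO
      # leading blanks end at the first non-blank character #
      IF sample[cn] /= " " THEN leadering := FALSE FI;
      emit(sample[cn], leadering)
   OD;
   print(new line)
END
```

Only the two leading blanks of the sample become non-breaking spaces; the blanks further along the line are printed as ordinary spaces, which matches the way the listing treats the leadering state.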
Finally lines 152 – 155 handle the end of line.
Running the code and looking at the results
Let’s test the program out on the regress.a68 program we wrote in the previous article:
$ a68g pp.a68 -- regress.a68 > regress.html
And here’s the output:
{ **** regress.a68 **** }
#
Utility routines
#
pr read "str_to_real.a68" pr
pr read "split.a68" pr
pr read "each_delimited_line.a68" pr
#
Perform a least-squares fit of a line to a data file containing (x,y) values
#
begin
[1:2,1:2] real xtx := ((0.0, 0.0),(0.0,0.0));
[1:2] real xty := (0.0, 0.0);
# Accumulate XTX and XTy #
each delimited line("test_data.txt", ",", (string line, []string fields, int line count) void: begin
# skip the header line #
if line count > 1 then
real xi = str to real(fields[1]), yi = str to real(fields[2]);
xtx[1,1] +:= xi * xi;
xtx[1,2] +:= xi;
xtx[2,2] +:= 1.0;
xty[1] +:= xi * yi;
xty[2] +:= yi
fi
end);
xtx[2,1] := xtx[1,2];
# let’s see the matrix and vector before Gauss #
print(("before Gauss:",new line));
print(("⎡",xtx[1,1]," ",xtx[1,2],"⎤ = ⎡",xty[1],"⎤",new line));
print(("⎣",xtx[2,1]," ",xtx[2,2],"⎦ ⎣",xty[2],"⎦",new line));
# Gaussian elimination #
real a = -xtx[2,1] / xtx[1,1];
xtx[2,1] +:= a * xtx[1,1];
xtx[2,2] +:= a * xtx[1,2];
xty[2] +:= a * xty[1];
# let’s see the matrix and vector after Gauss #
print(("after Gauss:",new line));
print(("⎡",xtx[1,1]," ",xtx[1,2],"⎤ = ⎡",xty[1],"⎤",new line));
print(("⎣",xtx[2,1]," ",xtx[2,2],"⎦ ⎣",xty[2],"⎦",new line));
# back substitution #
real b = xty[2] / xtx[2,2];
real m = (xty[1] - xtx[1,2] * b) / xtx[1,1];
# let’s see the equation #
print(("equation:",new line));
print(("y = ",fixed(m,0,7)," * x + ",fixed(b,0,7),new line))
end
Not bad! I like to have string constants like “equation:” stand out from identifiers so I don’t italicize them. But other than that, I think it looks pretty close to the beautifully typeset examples from the 1970s.
What did we learn here? Two things come to mind:
- we gained some familiarity with processing text in Algol 68, which is cool because we now know it’s not just a “numerical algorithms language”;
- we saw that we can parse an Algol 68 program that is UPPER stropped with a very simple state-machine approach.
I’ll be back with more Algol 68 in the not-too-distant future, when we’ll take a look at the exciting new GNU Algol 68 compiler, built with the GNU Compiler Collection.
1 McGettrick, A.D. (1978). Algol 68 – a first and second course. Cambridge University Press.
2 Ibid., p. 6
3 Ibid., pp. 7-8
4 van Wijngaarden, A., Mailloux, B.J., Peck, J.E.L., Koster, C.H.A., Sintzoff, M., Lindsey, C.H., Meertens, L.G.L.T., & Fisker, R.G. (1978). Revised Report on the Algorithmic Language Algol 68. IFIP Working Group 2.1.
5 Ibid., § 0.3.6, p. 14