An Algol 68 Pretty Printer


For Marcel van der Veer, with many thanks for Algol 68 Genie

If you pick up a textbook on Algol 68 from the 1970s, such as Andrew McGettrick’s wonderful primer, Algol 68 – a first and second course1, you will see code examples represented as follows:2

begin real e = 2.7182818284; real circum;

    circum := 2 × pi × e;

    print ((“circumference of a circle of radius e is”, circum))

end

Professor McGettrick soon explains:

When a program has to be run on a computer it will have to be prepared in some suitable manner… Unfortunately, however, these and similar devices do not permit the printing in bold type of words such as begin. Indeed, often small letters are also unavailable… Typically begin might appear in one of the forms

.BEGIN. ‘BEGIN’ ‘BEGIN BEGIN

… Moreover the multiplication sign has been replaced by an asterisk.3

This convention of using boldfaced words for symbols comes from the Revised Report,4 which states:

The manner in which symbols for newly defined mode-indications and operators are to be represented is now more closely defined. Thus it is clear that the implementer is to provide a special alphabet of bold-faced, or “stropped”, marks from which symbols such as person may be made, and it is also clear that operators such as » are to be allowed.5

It is worth mentioning at this point that the members of the Working Group defining Algol 68 were developing a whole new framework for describing the programming language, and even for describing the description, with new concepts such as the stropping mentioned above.

Most of us have had some experience with programming languages whose designers have adopted a standard for styling code in that language. Think of the arguments between C programmers as to whether to write } else { or have one or more of the braces appear on separate lines.

And yes, one of the little things that challenges Algol 68 implementations is how to deal with the difference between begin in bold type, which is a symbol, and begin in ordinary type, which is an identifier.

In the articles I have written to date about Algol 68 on both.org, I have used the Algol 68 Genie compiler-interpreter, which prefers UPPER stropping: symbols are written entirely in capital letters, and identifiers are written in lower case letters and digits. Moreover, symbols cannot contain blanks, but identifiers can.
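To make the convention concrete, here is a sketch of how the textbook example above might look in UPPER stropping, ready for a68g. The asterisk stands in for the multiplication sign, as McGettrick describes; the added new line is my own choice.

```algol68
BEGIN REAL e = 2.7182818284; REAL circum;

    # the multiplication sign becomes an asterisk #
    circum := 2 * pi * e;

    print(("circumference of a circle of radius e is", circum, new line))

END
```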

This UPPER stropping makes it pretty straightforward to write a simple pretty printer that converts UPPER-stropped Algol 68 code into the more decorated, bold-faced version, in HTML format, for inclusion in web pages or elsewhere.

Enough background; let’s get to work on the code.

The pretty printer

Here it is. It’s pretty long, but not super-complicated, and because of the length I’ve added quite a few comments. Discussion follows:

     1	#
     2	    Library-preludes
     3	    
     4	    /Nothing, however, prevents you from subjoining to this standard-prelude a
     5	     set of home made declarations. Of course you are free to declare such new
     6	     things in your own particular-program; but, as soon as you want to apply
     7	     them in several programs,or you want to enable others to use them, or you
     8	     have reason to expect that more efficiency may be acquired, then you can go
     9	     to your implementor and ask him to make the whole set an as efficient as
    10	     possible extension of the standard-prelude, i.e. 'it 'library-prelude'. In
    11	     that way an arbitrary number of problem oriented dialects may be defined./
       
    12	     Lindsey, C.H., & van der Meulen, S.G. (1973).
    13	     /Informal Introduction to Algol 68/ (Revised Edition, Ch. 0.10.7, p. 52).
    14	     North Holland Publishing Company.
    15	#
       
    16	PR read "each_line.a68" PR
       
    17	#
    18	    Particular-program
       
    19	    Algol 68 pretty printer.
    20	    Depends on UPPER stropping.
    21	    Lowercases and bolds reserved words.
    22	    Italicizes variable names.
    23	    Generates HTML output.
    24	#
       
    25	BEGIN
       
    26	    #
    27	        Using a68g, we can get at Linux command line arguments.
       
    28	        In order to pass arguments to the program, we terminate arguments to a68g
    29	        itself using the command line option "--".
       
    30	        Therefore, there are three arguments to be skipped before the program can
    31	        find its own arguments:
       
    32	        argv(1) - "a68g"
    33	        argv(2) - name of the program being interpreted
    34	        argv(3) - "--" which is necessary to get a68g to leave off interpreting args
       
    35	        Following those, we can provide a list of one or more Algol 68 files,
    36	        which appear in argv(4) and onward.
       
    37	    #
       
    38	    IF argc < 4 THEN
       
    39	        print("Usage: a68g pp.a68 -- [source files for pretty printing]")
       
    40	    ELSE
       
    41	        #
    42	            Embed our pretty-printed code in a minimal HTML head-body section
    43	        #
       
    44	        print(("<html>",new line));
    45	        print(("<head>",new line));
    46	        print(("<title> Algol 68 Pretty Print</title>",new line));
    47	        print(("<body>",new line));
       
    48	        #
    49	            Loop over source files.
    50	            There is no sophisticated parsing nor error checking here
    51	        #
       
    52	        FOR an FROM 4 TO argc DO
       
    53	            STRING file name = argv(an);
       
    54	            #
    55	                If the file is a readable regular file, we attempt to pretty print it
    56	            #
       
    57	            IF file is regular(file name) THEN
       
    58	                print((new line, new line,"<p><strong><em>{ **** ",file name," **** }</em></strong></p>",new line,new line));
       
    59	                #
    60	                    There are six "states" that the pretty printer can be in:
       
    61	                    - commenting (between opening sharp and closing sharp):
    62	                        do no pretty printing
    63	                    - stringing (between opening doublequote and closing doublequote):
    64	                          do no pretty printing
    65	                    - leadering (processing leading blanks on a line):
    66	                        replace blanks with &nbsp;, tabs with &nbsp;&nbsp;&nbsp;&nbsp;
    67	                    - reserved wording (processing upper case words):
    68	                        lowercase and wrap in &lt;strong&gt;&lt;/strong&gt;
    69	                    - identitying (processing lower case variable names possibly
    70	                      including spaces or numerals): 
    71	                          wrap in &lt;em&gt;&lt;/em&gt;
    72	                    - other (for example processing numerals, punctuation):
    73	                        just let the characters through
       
    74	                    Commenting can be active over more than one line, but other states reset at the beginning of each line.
    75	                    Each file appears as one HTML paragraph, with line breaks after each line.
    76	                #
       
    77	                print("<p>");
       
    78	                # multiline state resets #
       
    79	                BOOL commenting := FALSE;
       
    80	                #
    81	                    Loop over the lines in the file
    82	                #
       
    83	                each line(file name, (STRING line, INT linecount) VOID: BEGIN
       
    84	                    # individual line state resets #
       
    85	                    BOOL reserved wording := FALSE;
    86	                    BOOL identitying := FALSE;
    87	                    BOOL stringing := FALSE;
    88	                    BOOL leadering := TRUE;
       
    89	                    # process the line character-by-character #
       
    90	                    FOR cn FROM LWB line TO UPB line DO
    91	                        CHAR c := line[cn];
       
    92	                        # analyze the current character and change states where necessary #
       
    93	                        IF c = """" AND NOT commenting THEN
       
    94	                            # start or end of a string - flip stringing; off reserved wording, identifying, leadering #
       
    95	                            stringing := NOT stringing;
       
    96	                            IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
    97	                            IF identitying THEN identitying := FALSE; print("</em>") FI;
    98	                            leadering := FALSE
       
    99	                        ELIF c = "#" AND NOT stringing THEN
       
   100	                            # start or end of a comment - flip commenting; off reserved wording, identifying, leadering #
       
   101	                            commenting := NOT commenting;
       
   102	                            IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
   103	                            IF identitying THEN identitying := FALSE; print("</em>") FI;
   104	                            leadering := FALSE
       
   105	                        ELIF NOT (commenting OR stringing) THEN
       
   106	                            IF "A" <= c AND c <= "Z" THEN
       
   107	                                # reserved wording #
       
   108	                                IF NOT reserved wording THEN
       
   109	                                    # start of reserved wording - on reserved wording; off identifying, stringing, commenting, leadering #
       
   110	                                    reserved wording := TRUE; print("<strong>");
       
   111	                                    IF identitying THEN identitying := FALSE; print("</em>") FI;
   112	                                    leadering := FALSE
   113	                                FI;
       
   114	                                # translate to lower case #
       
   115	                                c := REPR (ABS c - ABS "A" + ABS "a")
       
   116	                            ELIF "a" <= c AND c <= "z" THEN
       
   117	                                # identifying #
       
   118	                                IF NOT identitying THEN
       
   119	                                    # start of identifying - on identifying; off reserved wording, stringing, commenting, leadering #
       
   120	                                    identitying := TRUE; print("<em>");
       
   121	                                    IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
   122	                                    leadering := FALSE
   123	                                FI
       
   124	                                # don't mess with the character #
       
   125	                            ELIF ABS c = 9 OR c = " " THEN
       
   126	                                # leadering, space within an identity or just a space - off reserved wording #
       
   127	                                IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI
       
   128	                            ELIF "0" <= c AND c <= "9" THEN
       
   129	                                # continuation of an identity or just a numeral - off reserved wording, leadering #
       
   130	                                IF reserved wording THEN reserved wording := FALSE; print("</strong>") FI;
   131	                                leadering := FALSE
       
   132	                            ELSE
       
   133	                                # something else, e.g. punctuation - off reserved wording, identitying and leadering #
       
   134	                                IF reserved wording THEN print("</strong>"); reserved wording := FALSE FI;
   135	                                IF identitying THEN print("</em>"); identitying := FALSE FI;
   136	                                leadering := FALSE
       
   137	                            FI
   138	                        FI;
       
   139	                        # emit the current character (or its surrogate) #
       
   140	                        IF leadering AND c = " " THEN
   141	                            print("&nbsp;")
   142	                        ELIF leadering AND ABS c = 9 THEN
   143	                            print("&nbsp;&nbsp;&nbsp;&nbsp;")
   144	                        ELIF c = "<" THEN
   145	                            print("&lt;")
   146	                        ELIF c = ">" THEN
   147	                            print("&gt;")
   148	                        ELSE
   149	                            print(c)
   150	                        FI
   151	                    OD;
       
   152	                    # end of line - off reserved wording, identifying; print line break #
       
   153	                    IF reserved wording THEN print("</strong>"); reserved wording := FALSE FI;
   154	                    IF identitying THEN print("</em>"); identitying := FALSE FI;
   155	                    print(("<br />",new line))
       
   156	                END);
       
   157	                #
   158	                    That's it for this file, close off the HTML paragraph
   159	                #
       
   160	                print(("</p>",new line))
       
   161	            ELSE
       
   162	                #
   163	                    Oops - not a regular file
   164	                #
       
   165	                print((new line, new line,"<p><strong><em>{ **** ",file name," **** is not a regular file - skipping }</em></strong></p>",new line,new line))
       
   166	            FI
   167	        OD;
       
   168	        #
   169	            Emit HTML postamble
   170	        #
       
   171	        print(("</body>",new line));
   172	        print(("</html>",new line))
   173	    FI
       
   174	END

On line 16, we use the Algol 68 Genie pragmat read to include the source code for our each line() procedure that lets us process a text file without worrying about the “how” details.

The particular-program, that is the main program, runs from the BEGIN on line 25 through to the END on line 174.

Lines 26 – 40 use Algol 68 Genie extensions to get at the Linux command line arguments, allowing the user of our program to pass in one or more source code files for pretty printing. The comments explain this in more detail.
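A small sketch of the same mechanism, assuming only the a68g extensions argc and argv already used in the listing: it echoes every argument with its position, which is a quick way to check where the program's own arguments start. The file name args.a68 is hypothetical.

```algol68
# save as args.a68 (a hypothetical name) and run: a68g args.a68 -- one two #
BEGIN
    # list every command line argument together with its position #
    FOR an TO argc DO
        print((whole(an, 0), ": ", argv(an), new line))
    OD
END
```

If the comment in the listing holds, positions 1 to 3 show a68g, the program name and "--", and the program's own arguments follow from position 4.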

Lines 41 – 47 emit the opening “html” element, the “head” block and the opening “body” element; lines 168 – 172 emit the closing “body” and “html” elements. This gives us a standalone HTML page; alternatively, we can take the output lines between these two HTML segments if we just want the pretty printed code.

Lines 52 – 167 are a loop over each file supplied by the user. As the comments note, the pretty printed code is placed in a single HTML paragraph with HTML line breaks at the end of each line.

Line 53 sets the file name from the relevant argv() call.

Lines 54 – 161 process the file if it is “regular”; lines 162 – 166 print a warning and skip the file otherwise.
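The check itself can be tried in isolation. This sketch assumes only the a68g routine file is regular that the listing uses; the file name is just an example.

```algol68
BEGIN
    STRING file name = "regress.a68";

    # report whether the named file can be treated as ordinary input #
    IF file is regular(file name) THEN
        print((file name, " is a regular file", new line))
    ELSE
        print((file name, " is not a regular file", new line))
    FI
END
```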

Line 58 prints an HTML header for the file being processed. The comment on lines 59 – 76 describes the pretty printing process as a set of states driven by the characters read. Line 77 prints the opening HTML “p” element, which marks the start of the pretty printed code; lines 157 – 160 print the closing “p” element.

Lines 78 – 156 initialize the commenting state to FALSE (that is, the pretty printing starts knowing that it is not within a comment) and use the each line() procedure to process the lines in the text file. This is where the work is done. Comments in Algol 68 can span multiple lines, so the commenting state cannot be reset to “not commenting” at the end of each line.

Lines 84 – 88 reset the remaining states (stringing, reserved wording, identifying and leadering) at the beginning of each line. These states are not multiline, so this is appropriate. The leadering state might confuse; it refers to the zero or more blanks or tabs that begin each line of code. All of these states are mutually exclusive, as is the “not any of these” state, at least as far as pretty printing goes. There is no adjustment to text within comments or strings. Symbols (reserved words) are converted to HTML “strong” text and lower-cased; identifiers are converted to HTML “em” text.

Lines 90 – 151 loop over the characters in a line, looking for state changes and wrapping symbols and identifiers as required.

Lines 93 – 98 detect the beginning or end of a string, flip the stringing state and turn off reserved wording, identifying and leadering states.
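The test on line 93 compares c with """", which may look odd at first. In Algol 68 a quote character inside a string or character denotation is written doubled, so """" denotes a single quote character. A brief sketch:

```algol68
BEGIN
    # four quote marks in a row denote one quote character #
    CHAR quote = """";
    print((quote, "hello", quote, new line));

    # inside a string, each doubled quote stands for one quote #
    print(("a ""quoted"" word", new line))
END
```

As far as I can tell from reading the listing, a doubled quote inside a string simply flips the stringing state off and straight back on, so the pretty printer passes it through unharmed.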

Lines 99 – 104 detect the beginning or end of a comment, flip the commenting state and turn off reserved wording, identifying and leadering states.
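Note that the listing treats only the sharp as a comment delimiter. Algol 68 also allows the bold words CO and COMMENT, as in the sketch below. Reading the code, I would expect the pretty printer to format the text of those two forms as if it were code, so sharp-delimited comments are the safe choice for input files.

```algol68
BEGIN
    # the form the pretty printer recognizes #
    CO the second form CO
    COMMENT the third form COMMENT
    print(("comments are skipped", new line))
END
```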

Lines 105 – 138 deal with the rest of the states provided that the pretty printing process is in neither the stringing nor commenting state.

Lines 106 – 115 deal with symbols, turning on the reserved wording state at the beginning of a symbol (upper case letter) and turning off identifying and leadering states. Uppercase characters are translated to lower case in line 115 using a rule that works with ASCII upper- and lowercase characters.
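The rule on line 115 can be tried on its own. This sketch assumes an ASCII character set, as the listing does, and lowercases a sample word one character at a time.

```algol68
BEGIN
    STRING word = "BEGIN";

    FOR i FROM LWB word TO UPB word DO
        CHAR c := word[i];

        # shift an upper case letter by the distance between "A" and "a" #
        IF "A" <= c AND c <= "Z" THEN
            c := REPR (ABS c - ABS "A" + ABS "a")
        FI;
        print(c)
    OD;
    print(new line)    # the line printed is begin #
END
```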

Lines 116 – 124 deal with identifiers, turning on the identifying state at the beginning of an identifier (lower case letter) and turning off reserved wording and leadering states. No adjustments are made to the characters themselves.

Lines 125 – 127 deal with spaces that terminate symbols, can be embedded in identifiers or can be “significant” white space (from a pretty printing perspective) at the beginning of a line, turning off the reserved wording state.
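Blanks are not significant inside an Algol 68 identifier, which is why the pretty printer leaves the identifying state switched on when it meets a space. A minimal sketch:

```algol68
BEGIN
    INT line count := 41;
    linecount +:= 1;    # the same identifier, written without the blank #
    print((whole(line count, 0), new line))    # prints 42 #
END
```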

Lines 128 – 131 deal with numerical digits that terminate symbols and leading white space but can be a second or subsequent character in an identifier, turning off the reserved wording and leadering states.

Lines 132 – 136 deal with any other characters which terminate symbols, identifiers and leading white space, turning off reserved wording, identifying and leadering states.

Phew, done with tracking all the state changes and bits of HTML markup!

Lines 140 – 150 convert the character to one or four HTML non-breaking spaces if it’s a space or tab and the leadering state is on; to HTML lt and gt symbols if it’s the less-than or greater-than symbol respectively; or just output it otherwise.
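The emit step could also be factored into a small procedure. The sketch below is my own variation rather than part of the listing: html escape is a hypothetical name, it ignores the leadering case, and it adds an escape for the ampersand, which the listing passes through unchanged.

```algol68
BEGIN
    # hypothetical helper: map one character to HTML-safe text #
    PROC html escape = (CHAR c) STRING:
        IF   c = "<" THEN "&lt;"
        ELIF c = ">" THEN "&gt;"
        ELIF c = "&" THEN "&amp;"
        ELSE STRING (c)
        FI;

    STRING sample = "a < b & c";
    FOR i FROM LWB sample TO UPB sample DO
        print(html escape(sample[i]))
    OD;
    print(new line)    # the line printed is a &lt; b &amp; c #
END
```

Escaping the ampersand matters mainly for source files that contain text such as &amp; outside comments and strings; whether that is worth the extra branch is a judgment call.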

Finally lines 152 – 155 handle the end of line.

Running the code and looking at the results

Let’s test the program out on the regress.a68 program we wrote in the previous article:

$ a68g pp.a68 -- regress.a68 > regress.html
$

And here’s the output:

{ **** regress.a68 **** }

#
    Utility routines
#

pr read “str_to_real.a68” pr
pr read “split.a68” pr
pr read “each_delimited_line.a68” pr

#
    Perform a least-squares fit of a line to a data file containing (x,y) values
#

begin

    [1:2,1:2] real xtx := ((0.0, 0.0),(0.0,0.0));
    [1:2] real xty := (0.0, 0.0);

    # Accumulate XTX and XTy #

    each delimited line(“test_data.txt”, “,”, (string line, []string fields, int line count) void: begin
        # skip the header line #
        if line count > 1 then
            real xi = str to real(fields[1]), yi = str to real(fields[2]);
            xtx[1,1] +:= xi * xi;
            xtx[1,2] +:= xi;
            xtx[2,2] +:= 1.0;
            xty[1] +:= xi * yi;
            xty[2] +:= yi
        fi
    end);
    xtx[2,1] := xtx[1,2];

    # let’s see the matrix and vector before Gauss #

    print((“before Gauss:”,new line));
    print((“⎡”,xtx[1,1],” “,xtx[1,2],”⎤ = ⎡”,xty[1],”⎤”,new line));
    print((“⎣”,xtx[2,1],” “,xtx[2,2],”⎦ ⎣”,xty[2],”⎦”,new line));

    # Gaussian elimination #

    real a = -xtx[2,1] / xtx[1,1];
    xtx[2,1] +:= a * xtx[1,1];
    xtx[2,2] +:= a * xtx[1,2];
    xty[2] +:= a * xty[1];

    # let’s see the matrix and vector after Gauss #

    print((“after Gauss:”,new line));
    print((“⎡”,xtx[1,1],” “,xtx[1,2],”⎤ = ⎡”,xty[1],”⎤”,new line));
    print((“⎣”,xtx[2,1],” “,xtx[2,2],”⎦ ⎣”,xty[2],”⎦”,new line));

    # back substitution #

    real b = xty[2] / xtx[2,2];
    real m = (xty[1] - xtx[1,2] * b) / xtx[1,1];

    # let’s see the equation #

    print((“equation:”,new line));
    print((“y = “,fixed(m,0,7),” * x + “,fixed(b,0,7),new line))

end

Not bad! I like to have string constants like “equation:” stand out from identifiers so I don’t italicize them. But other than that, I think it looks pretty close to the beautifully typeset examples from the 1970s.

What did we learn here? Two things come to mind:

  • we gained some familiarity with processing text in Algol 68, which is cool because we now know it’s not just a “numerical algorithms language”;
  • we saw that we can parse an Algol 68 program that is UPPER stropped with a very simple state-machine approach.

I’ll be back with more Algol 68 in the not-too-distant future, when we’ll take a look at the exciting new GNU Algol 68 compiler, built with the GNU Compiler Collection.


1 McGettrick, A.D. (1978). Algol 68 – a first and second course. Cambridge University Press.

2 Ibid, p. 6

3 Ibid, pp. 7-8

4 van Wijngaarden, A., Mailloux, B.J., Peck, J.E.L., Koster, C.H.A., Sintzoff, M., Lindsey, C.H., Meertens, L.G.L.T., & Fisker, R.G. (1978). Revised Report on the Algorithmic Language Algol 68. IFIP Working Group 2.1.

5 Ibid, § 0.3.6, p. 14
