
Exploring GNU Algol 68
Since early 2019, I’ve spent some time here and there refreshing my appreciation for the Algol 68 programming language, courtesy of Marcel van der Veer’s most excellent Algol 68 Genie implementation. Using the Genie compiler / interpreter, I’ve written a fair bit of code, mostly for fun but occasionally for small work-related projects. And even though the design of Algol 68 predates many useful and appealing programming language features that are today mainstream, Algol 68 got so much right that it seems short-sighted to ignore the many useful lessons it can provide.
Earlier in 2025, I was delighted to stumble upon another modern implementation of Algol 68 – GNU Algol 68, which is built using the GNU Compiler Collection. If having one great Algol 68 implementation is good, how can having two such not be even better?
And so I decided to spend some time exploring GNU Algol 68, which has motivated me to write a few more articles about this revolutionary programming language.
GNU Algol 68 is in development…
… and the Roadmap section of the GNU Algol 68 wiki entry in the GNU Compiler Collection – hereafter abbreviated to ga68 and GCC respectively – provides an overview of what’s yet to be implemented. Perhaps the most broadly painful as-yet-missing feature is conversion of integers, reals, booleans, bits and bytes to strings and back again, since without these utilities, it’s quite difficult to write a program that can output its results in human-readable form.
So my first order of business was to hack together the standard Algol 68 transput procedures whole() and fixed(), which convert integers and reals to strings. In writing these procedures, I reacquainted myself with a few Algol 68 design decisions that have never felt particularly useful to me:
- a call to whole(i, -4) will yield “﹎﹎﹎0”, “﹎﹎99”, “﹎-99”, “9999” or, if i were greater than 9999, “****”, where “*” is the yield of errorchar and “﹎” represents a space;
- a call to whole(i, 4) will yield “﹎+99” rather than “﹎﹎99”;
- a call to whole(i, 0) will yield “0”, “99”, “-99”, “9999” or “99999”;
- and similarly with fixed().[1]
What is it I don’t like about this? Two main things, really: first, as far as I can recall, I’ve never wanted to print a positive number preceded by a plus sign, and while I’m prepared to concede that some programmers might find this to be a great idea, I’d rather it not require the active use of the negative width to turn it off; and second, if I’m planning for output to fit in a certain width and I produce a larger number than will fit, I’d prefer that my formatting be scrambled rather than my numbers, which seems like a lesser sin to me.
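The whole() behavior listed above can be checked with a short program. This is a minimal sketch that assumes Algol 68 Genie, where whole() is available, and is written in the upper stropping that Genie accepts; show is a hypothetical helper, not part of the standard prelude. The expected results in the comments are the ones listed above, with vertical bars added so the padding is visible.

```algol68
# Sketch for Algol 68 Genie; ga68 does not yet provide whole. #
BEGIN
   # Hypothetical helper: print the yield of whole between vertical bars. #
   PROC show = (INT i, INT width) VOID:
      print(("|", whole(i, width), "|", new line));

   show(0, -4);       # |   0| #
   show(99, -4);      # |  99| #
   show(-99, -4);     # | -99| #
   show(9999, -4);    # |9999| #
   show(99999, -4);   # |****| too wide, so errorchar fills the field #
   show(99, 4);       # | +99| a positive width forces a sign #
   show(0, 0);        # |0| a zero width means as narrow as possible #
   show(-99, 0);      # |-99| #
   show(99999, 0)     # |99999| #
END
```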
As well, calls to whole() and fixed() most often occur within calls to the print() procedure, which is kind of a weird beast in Algol 68, defined as:[2]
proc print = ([ ] union (outtype, proc (ref file) void) x) void:
    put (stand out, x)
with the definition of the mode outtype being well and truly fudged as:
mode outtype = c an actual-declarer specifying a mode united from {2.1.3.6.a} a sufficient set of modes none of which is ‘void’ or contains ‘flexible’, ‘reference to’, ‘procedure’ or ‘union of’ c;
Moreover, the Revised Report goes on to define straightening, the operation used to deal with multiple values (rows, rows of rows, etc.) and structured values; this definition is likewise an algorithmic comment rather than actual code.[3]
… but I want to use it!
Given all this complexity and needing to live – for now, at least – within ga68’s limitation of only providing the puts() procedure, which emits a single string, I came to some decisions – I would write a set of routines that:
- use a “locale” facility to govern things like thousands separator, plus or minus sign, decimal separator…;
- for integers, convert the number to the minimum length string, with or without thousands separators;
- for real numbers, convert the number to either fixed or scientific notation, respecting the approximate number of significant figures provided by the real mode (i.e. no specification for number of figures to the right of the decimal point – I might add this later);
- besides integer and real, additionally handle long integer, long real, bits and boolean, but for now no additional handling for short integer, etc;
- additionally provide binary and hexadecimal format for integer and long integer; and
- position one string within another – e.g. placing a 3 digit number to the right of a 10 character space – separately from the format conversion of the number to a string.
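The last point, keeping placement separate from conversion, can be sketched as a small procedure that knows nothing about numbers. This is only an illustration, written in upper stropping for Algol 68 Genie; padleft is a hypothetical name, not one of the routines described in this article. In keeping with the preference stated earlier, a string that is too long for its field is returned unchanged rather than replaced by error characters.

```algol68
# Hypothetical helper, written for Algol 68 Genie. #
BEGIN
   # Right-justify s in a field of the given width. #
   # A string longer than the field is returned unchanged. #
   PROC padleft = (STRING s, INT width) STRING:
      BEGIN
         INT len = UPB s - LWB s + 1;
         IF len >= width THEN s ELSE (width - len) * " " + s FI
      END;

   print(("|", padleft("123", 10), "|", new line));    # |       123| #
   print(("|", padleft("1234567", 4), "|", new line))  # |1234567| #
END
```

Because the procedure takes a string, it works equally well with the result of any of the conversion routines.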
A decision made by the Algol 68 design committee allows the overloading of operators, so that the same operator – e.g. plus, + – can be used between two integers, an integer and a real, and so forth. However, the committee did not extend the concept of overloading to procedures. At least in relation to the problem at hand, this pushed me into deciding to implement my output routines as unary operators, of which there are currently:
- TOS { format int, long int, bool as decimal string }
- TOSS { format int, long int as decimal string with thousands separators }
- TOSB { format int, long int, bits, long bits as binary string e.g. 2r010010 }
- TOSX { format int, long int, bits, long bits as hexadecimal string e.g. 16r15ea }
- TOSE { format real, long real as scientific notation }
- TOSF { format real, long real as fixed decimal string }
Implementing these routines as operators provides the additional benefit of not requiring parentheses around the operand (unless it is an expression).
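As a rough illustration of the idea, and not the implementation covered in the next article, here is a sketch of one operator symbol declared for two operand modes. It is written in upper stropping for Algol 68 Genie and leans on Genie's whole() as a stand-in for the integer conversion, which is exactly what ga68 does not yet offer; the "true" and "false" spellings are likewise an assumption.

```algol68
# Sketch only: one operator symbol, two operand modes. #
BEGIN
   OP TOS = (INT i) STRING: whole(i, 0);
   OP TOS = (BOOL b) STRING: (b | "true" | "false");

   # No parentheses are needed unless the operand is a formula. #
   print((TOS 42, " ", TOS TRUE, " ", TOS (6 * 7), new line))
   # prints: 42 true 42 #
END
```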
Note that there is an indirect method to handle overloading, which works for both operators and procedures – the use of united mode parameters and the conformity clause. For example, a procedure that handles the same arguments as the TOS operator might look like:
proc tos = (union (int, long int, bool) x) string:
    case x in
        (int ix): { do something when int argument supplied },
        (long int lx): { do something when long int argument supplied },
        (bool bx): { do something when bool argument supplied }
    esac
so it’s not the end of the world to go with procedure versions. Worth noting is that, as long as the argument to the procedure (or operator) taking a united mode parameter is not itself of a similar united mode, the compiler could determine at compile time which case applies, and the case clause need not be evaluated at run time.
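The skeleton above can be filled in along the following lines. Again this is a sketch in upper stropping for Algol 68 Genie, with whole() standing in for the integer conversions and "true" / "false" chosen arbitrarily for the boolean case.

```algol68
# Sketch only: overloading through a united mode and a conformity clause. #
BEGIN
   PROC tos = (UNION (INT, LONG INT, BOOL) x) STRING:
      CASE x IN
         (INT ix): whole(ix, 0),
         (LONG INT lx): whole(lx, 0),
         (BOOL bx): (bx | "true" | "false")
      ESAC;

   print((tos(42), " ", tos(LONG 1234567890123), " ", tos(FALSE), new line))
   # prints: 42 1234567890123 false #
END
```

Each call passes a value of a single known mode, which is united to the parameter's mode at the call.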
As I was working on these operators, a thought occurred to me – were I writing some program dealing with a particular structured or multiple value, I might find myself implementing a TOS operator for instances of those structured or row values. For example, I could see myself writing a TOS operator for the vectors and matrices I used in my least-squares fitting article.
In my next article, we’ll dive into coding the TOS, TOSS, TOSB and TOSX operators.
[1] Algol 68 Revised Report, p. 159.
[2] ibid, p. 209.
[3] ibid, p. 163.