Exploring GNU Algol 68: Formatting numbers as strings for output

As I mentioned in my introductory article to this mini-series, GNU Algol 68 is in development, and as of the date of writing this article, does not include procedures (nor operators) to convert integer, floating point, boolean, bits or bytes to string for output – what Algol 68 calls transput. While the Algol 68 revised report defines transput procedures whole(), fixed() and float(),1 I mentioned that I don’t care for some of the design decisions that went into those routines and so I decided to create my own.

In this article, we will design and code Algol 68 operators:

  • TOS { format int, long int, bool as decimal string }
  • TOSS { format int, long int as decimal string with thousands separators }
  • TOSB { format int, long int, bits, long bits as binary string e.g. 2r010010 }
  • TOSX { format int, long int, bits, long bits as hexadecimal string e.g. 16r15ea }

But first, terminology, operators and stropping…

Algol 68 was designed by the ALGOL 68 Support subcommittee (the committee) of the International Federation for Information Processing (IFIP) Working Group 2.1 on ALGOL (WG2.1). In addition to designing the programming language, the committee also selected a metalanguage, proposed by A. van Wijngaarden, to formally describe the syntax and semantics of the programming language.

Because it was a fresh-sheet design and took on the task of formally specifying both syntax and semantics, this metalanguage was seen as going too far by some of the committee members and other interested parties, resulting in acrimony and division.

As a result, endless ink (or bits, at least) have been spilled complaining about the design of Algol 68, including up to present times.2 Most of the early complaints centre on the supposed difficulty of writing a compiler for the language, which seems to be less of an issue today given the existence of the excellent GNU Algol 68 and Algol 68 Genie compilers. Many other complaints stem from a commonly held perspective that a fresh-sheet design including a new means for language description was much more than just coming up with a facelift for Algol 60, which had been the original goal of WG2.1, and which eventually materialized independently as Algol W, Pascal and its descendants.

More recent negative observations on Algol 68 rehash these two arguments over and over again, but beyond that, one clear thread emerges from the criticism: van Wijngaarden’s metalanguage is unusual, even today, and the writing style of the report is replete with puns and in-jokes that is obviously seen as insufferable to some and generates dismissive comments based on no real understanding of, nor experience with, the programming language itself. Many have stopped before starting, to their loss. Oh well!

One of the really delightful and useful by-products of the GNU Algol 68 compiler is The Algol 68 Jargon File. Having this document handy while learning the terminology of Algol 68 is a great help.

Another nearly essential document for learning Algol 68 is Informal Introduction to ALGOL 68, created at the behest of the committee as a more approachable document for learning the language and its accompanying terminology.

Algol 68 introduced the concept of user-definable operators, distinct from procedures, in several ways:

  • operators are monadic (taking one argument) or dyadic (taking two arguments), whereas procedures take zero or more arguments;
  • monadic operators are applied to their operand immediately to the right of the operator;
  • dyadic operators are applied to the operand immediately to the left and immediately to the right of the operator;
  • procedures with one or more arguments accepted those as a parenthesized list separated by commas, that is, a collateral clause;
  • operators are associated with the concept of priority; monadic operators have a single priority, higher than any dyadic operator; dyadic operators also declare their priority;
  • operators can be overloaded; that is, the same name can be defined multiply for different parameter types or modes;
  • both operators and procedures can achieve a kind of “overloading effect” by declaring parameters of united modes and dealing with the different constituent modes using a special kind of case statement, the conformity clause; however, operator overloading permits the defining of a similar operation on a new type in independent fashion, whereas using united modes requires altering the united parameter definition and conformity clause.

Finally, in Algol 68 there is a need to distinguish between a begin-symbol and, for example, a variable named “begin”, both of which are legal. In documentation, the begin symbol (and all other symbols) are written in boldface (sometimes bold italics, as in the Revised Report), as begin, end, if, then, else, do and so on; names, such as constants or variables, are written in italics, as x, y, foo, bar and so on; and non-text symbols such as parentheses, or plus-symbol as (, ), + and so on. Recognizing that this typography wasn’t immediately feasible back in the 1970s, a mechanism for marking text symbols was defined as stropping. Some forms of stropping included point stropping, where begin would be typed as .begin, or upper stropping, where begin would be typed as BEGIN.

By default, GNU Algol 68 uses supper stropping, where, as the fledgling ga68.pdf manual (created by the compiler build process) states:

In the SUPPER stropping regime bold words are written by writing a sequence of one or more taggles. Each taggle is written by writing a letter followed by zero or more other letters and digits and is optionally followed by a trailing underscore character. The first letter in a bold word shall be an upper-case letter. The rest of the letters in the bold word may be either upper- or lower-case.

The manual goes on to state that a mode-indicant, for example defining a 3D vector, could be entered as Vector3D, VECTOR3D, Vector_3D and so on, suggesting that “camel case” convention used for type or class definitions in other languages is supported here. The manual further counsels Introducing new operator-indicants as all in upper case, as in

v1 EQUALS v2

Finally, implied but not stated, all predefined symbols such as begin, end, if etc and all predefined mode-indicants such as int, real, bool etc are entered in lower case; and all predefined operators such as and, or, abs etc are entered in upper case.

Ok, enough background. Whew! Let’s get coding…

Conversion from integer to string

This article has a nice description of an efficient conversion from integer to string. We’ll follow its recommendations.

First, because we are proceeding from left to right for the sake of cutting the number of integer divisions in half, we need to know how many base-10 digits there are in the integer to be converted, so we define this Algol 68 operator:

op DIGITS = (int number) int: begin

    { get number of digits in an integer (max int = 2147483647) }

    int work = ABS number;
    if   work < 10 then 1
    elif work < 100 then 2
    elif work < 1 000 then 3
    elif work < 10 000 then 4
    elif work < 100 000 then 5
    elif work < 1 000 000 then 6
    elif work < 10 000 000 then 7
    elif work < 100 000 000 then 8
    elif work < 1 000 000 000 then 9
    else 10 fi

end { DIGITS };

Two alternatives exist to the sequence of if… elif… fi would be

  • binary search instead of sequential;
  • calculate the base 10 logarithm of the number and truncate it.

I have a feeling there is nothing much to be gained by either of these…

Second, we need a “power of 10” lookup table, which looks like this:

[]int int_powers_of_10 = (1, 10, 100, 1 000, 10 000, 100 000, 1 000 000, 10 000 000, 100 000 000, 1 000 000 000);

Since we know the number of digits, and we also know the sign, and we know we don’t want a plus-symbol to precede a positive number, we can modify the suggested algorithm slightly as follows:

op TOS = (int number) string: begin

    { convert integer to string with no MOD }

    int number_digits = DIGITS number;
    int work;
    int chars_needed = (number < 0 | work := -number; number_digits + 1 | work := number; number_digits);
    [1:chars_needed]char result;
    int result_char := (number < 0 | result[1] := "-"; 2 | 1);
    int char_zero = ABS "0";
    for digit_number from number_digits by -1 to 1 do
        int p10 = int_powers_of_10[digit_number];
        int digit = work % p10;
        result[result_char] := REPR (digit + char_zero);
        result_char +:= 1;
        work -:= digit * p10
    od;
    result

end { TOS };

A few things to mention in relation to the above code:

  • one thing that often catches me when I haven’t used Algol 68 for awhile:
    • the integer division symbol is % or OVER;
    • the real division symbol is / or DIV;
    • the integer modulo symbol is %* or MOD
    • I should probably always use OVER and MOD and I do dislike %* though I can see the sense of it…
  • in the definition of the constant chars_needed I use the brief form of if … then … else … fi which is ( … | … | … ) and I also set the value of the variable work within that expression; normally I would avoid side effects like this but I believe in this kind of simple case there is more clarity created by the compactness than by some other approach;
  • the same is true for the definition and initialization of the variable result_char and inserting the minus-symbol into result if the number is negative;
  • because we have the number of digits needed, we can create the output buffer result to be the exact size required;
  • I’m making a conscious effort to use constant values whenever possible, leaving only three variables: work, result and result_char.

Integer to string, with thousands separators

What if we want thousands separators in our output? Here’s TOSS:

op TOSS = (int number) string: begin

    { convert integer to string with no MOD and with thousands separators }

    int number_digits = DIGITS number;
    int work;
    int chars_needed = (number < 0 | work := -number; number_digits + 1 | work := number; number_digits) + number_digits % 3;
    [1:chars_needed]char result;
    int result_char := (number < 0 | result[1] := "-"; 2 | 1);
    int char_zero = ABS "0";
    for digit_number from number_digits by -1 to 1 do
        int p10 = int_powers_of_10[digit_number];
        int digit = work % p10;
        result[result_char] := REPR (digit + char_zero);
        if digit_number > 1 AND (digit_number - 1) MOD 3 = 0 then
            result_char +:= 1;
            result[result_char] := ","
        fi;
        result_char +:= 1;
        work -:= digit * int_powers_of_10[digit_number]
    od;
    result

end { TOSS };

A few comments:

  • the chars_needed constant definition is adjusted for the number of thousands separators required;
  • the separator itself is inserted after the thousands digit;
  • thinking for down the road, we should probably provide some basic internationalization here by defining constants that indicate the positive, negative, decimal and thousands separator characters.

Boolean to string

Ok, we’re on a bit of a roll… Let’s write a version of TOS that takes a boolean argument:

op TOS = (bool v) string: begin

    { convert boolean to “true” or “false }

    if v then "true" else "false" fi

end { TOS };

This is pretty self-explanatory – the result of the if … fi is either the string “true” or the string “false”.

Bits and int to bit string

Moving on to TOSB, let’s do the operator to convert a bits value to the standard Algol 68 binary representation e.g. 2r0101010:

op TOSB = (bits v) string: begin

    { convert bits to Algol 68 bit string representation }

    [1:bits_width]char co;
    for c from LWB co to UPB co do
        if c ELEM v then
            co[c] := "1"
        else
            co[c] := "0"
        fi
    od;
    "2r" + co

end { TOSB };

This is straightforward, depending on the constant bits_width and the bits operator ELEM which indicates whether a given bit is 0 or 1, defined in the standard prelude. Note that it prints all 32 bits, rather than suppressing leading zeros.

We can build on this to handle integer values:

op TOSB = (int v) string: begin

    { convert int to Algol 68 bit string representation }

    TOSB BIN v

end { TOSB };

Here we use the BIN operator to convert the int value v to bits.

This is a good moment to revisit the concept of monadic operators – the expression TOSB BIN v is evaluated as though there are parentheses around BIN v, forcing it to be evaluated before the operator TOSB is applied. It’s also a good moment to remember that the so-called “unary minus” is a monadic operator in Algol 68. Later on, when we consider extending these TOS operators to work on long int, long bits and so on, we will see that the expression

long 3

is not an operator applied to the int value 3 but rather a way of declaring a long int constant, whereas

LENG 3

converts the int value 3 to long int.

Bits and string to hexadecimal

FInally, the TOSX operator that converts a bits value to the standard Algol 68 hexadecimal representation e.g. 16r1234abcd:

op TOSX = (bits v) string: begin

    { convert bits to Algol 68 hexadecimal string representation }

    bits bv := v;
    [1:bits_width % 4] char co;
    int vc0 = ABS "0", vca = ABS "A";
    for c from UPB co by -1 to LWB co do
        int hdv = ABS (bv AND 16r0f); 
        co[c] := REPR (hdv + (hdv < 10 | vc0 | vca - 10)); 
        bv := bv SHR 4
    od;
    "16r" + co

end { TOSX };

and the version that converts an int to hexadecimal:

op TOSX = (int v) string: begin

    { convert int to Algol 68 hexadecimal string representation }

    TOSX BIN v

end { TOSX }; 

Let’s test it…

Here’s a very minimal test script:

pr include "tos.a68" pr

begin

	puts("TOS 123456 = " + TOS 123456 + "'n");
	puts("TOSS 123456 = " + TOSS 123456 + "'n");
	puts("TOS -123456 = " + TOS -123456 + "'n");
	puts("TOSS -123456 = " + TOSS -123456 + "'n");

	puts("TOS (1 = 1) = " + TOS (1 = 1) + "'n");

	puts("TOSB 2r010101 = " + TOSB 2r010101 + "'n");
	puts("TOSB 42 = " + TOSB 42 + "'n");
	puts("TOSB -1 = " + TOSB -1 + "'n");

	puts("TOSX 2r010101 = " + TOSX 2r010101 + "'n");
	puts("TOSX 42 = " + TOSX 42 + "'n");
	puts("TOSX -1 = " + TOSX -1 + "'n");


	skip
end

A few observations:

  • here we see clearly the difference between TOS as operators vs as procedures – the only time we use parentheses is to provide an expression as the argument, for example TOS (1 = 1);
  • what’s not seen is that I have incorporated all my operator definitions into the file “tos.a68” and I include it using the pr includepr pragmat;
  • when I’m writing test code I often leave a skip just before the end of the particular-program (the serial clause between the begin and end), so that I don’t have to be careful about leaving off the semicolon on the last test statement.

Assuming this code is in the file “testtos01a.a68”, compile it with

ga68 testtos01a.a68

and run it with

./a.out

to get the output

TOS 123456 = 123456
TOSS 123456 = 123,456
TOS -123456 = -123456
TOSS -123456 = -123,456
TOS (1 = 1) = true
TOSB 2r010101 = 2r00000000000000000000000000010101
TOSB 42 = 2r00000000000000000000000000101010
TOSB -1 = 2r11111111111111111111111111111111
TOSX 2r010101 = 16r00000015
TOSX 42 = 16r0000002A
TOSX -1 = 16rFFFFFFFF

Dealing with long int, long bits

We’re going to park this issue for now and revisit it later, but a few things to ponder in the meantime:

  • it seems like the conversion process will look almost the same for long int and long bits as for int and bits, which leads to the question – should we duplicate the code, then change it? or should we use the same code for both int and long int, bits and long bits, by converting the int and bits parameters to long int and long bits, thereby using the long version for each?
  • we’ve already almost duplicated the code for TOS and TOSS on int values; should we refactor the formatting into a shared routine and use a flag to indicate whether or not to incorporate the thousands separators?
  • should we take on the idea of internationalizing the code while we’re making the change, so that we can use for example the period as the thousands separator and the comma as the decimal point, or printing numbers in financial format, where ␢12,345.00␢ represents positive 12345 and (12,345.00) represents negative 12345?

In our next article, we’ll take on the floating point conversion operators TOSE and TOSF.


  1. Algol 68 Revised Report, pp. 159-160. ↩︎
  2. See for example “Algol 68 seemed like a good idea until it wasn’t”, published in late 2024 in the blog The Craft of Coding. ↩︎

Leave a Reply