A Small Algol 68 Project, Part 3

In memory of J. Kevin Douglas, a good friend and fellow fan of Algol 68

In the previous article in this series, we diverged from the main goal of this three article mini-series – determining a best fit of a line to a data set consisting of pairs of (x,y) values. We did so in order to have some utility routines for reading delimited text files (like CSV files), and to learn a bit more about Algol 68.

In this article, we’re going to program the calculations necessary. It could be a good moment to review the first article in the series in order to remember that we are solving a matrix equation

X^T X c = X^Ty

where the matrix X has n rows (n being the number of data points) and two columns (the first being the x data points, the second being all 1), and the vector y has n rows (containing all the y data points).

For those of you struggling to remember how to come up with the matrix X^T, you can think of spinning X diagonally along the upper left – lower right dimension so that the first row is all the x values (the first column of X) and the second row is all the 1s (the second column of X).

Then we calculate the following dot products and assign them to the elements of X^T X

	(row 1 of X^T) dot (column 1 of X)	(row 1 of X^T) dot (column 2 of X)
	(row 2 of X^T) dot (column 1 of X)	(row 2 of X^T) dot (column 2 of X)

We can express these dot products as sums of multiplied terms as follows

	x₁² + x₂² + … + x_n²	x₁ + x₂ + … + x_n
	x₁ + x₂ + … + x_n	1 + 1 + … + 1

On the right hand side, we proceed in a similar fashion:

	(row 1 of X^T) dot y
	(row 2 of X^T) dot y

to calculate X^Ty:

	x₁ · y₁ + x₂ · y₂ + … + x_n · y_n
	y₁ + y₂ + … + y_n

Assembling the matrix equation from the data

Inspecting the above sums, we see that we can accumulate them “on the fly”, as we read the input data file; so no need to create an n rows by 2 columns matrix, nor its transpose, nor an n row vector.

Note as well that the matrix X^T X is symmetric, that is the value in row 1 column 2 is the same as the value in row 2 column 1; so we only need to accumulate one of those; we’ll do that into row 1 column 2.

So now we know how to accumulate the values in the matrix equation as we read them; but how do we solve the matrix equation to get the slope m and the y-intercept b of the line?

Solving the matrix equation

In a simple case like this, we can use Gaussian elimination, followed by back substitution. That is, we determine a factor a that when row 1 values of X^T X and X^Ty are muliplied by a and added to the values of row 2, the new value of X^T X_1,1 is zero.

Therefore, the value of a needs to be calculated as

a = -X^T X_2,1 ÷ X^T X_1,1

and we update the second row of X^T X as follows

X^T X_2,1← X^T X_2,1+ a · X^T X_1,1 = X^T X_2,1– X^T X_2,1= 0

X^T X_2,2← X^T X_2,2+ a · X^T X_1,2

and the second element of X^T y as follows

X^T y₂← X^T y₂+ a · X^T y₁

At this point we can back substitute for the value of b by taking the dot product of the second row of X^T X with the vector [m,b] to solve for b

0 · m + X^T X_2,2· b = X^T y₂

and so

b = X^T y₂÷ X^T X_2,2

And then, knowing b, similarly with the dot product of the first row of X^T X for m

X^T X_1,1 · m + X^T X_2,2 · b = X^T y₁

m = (X^T y₁ – b · X^T X_2,2)÷ X^T X_1,1

Never mind the formulas, let’s see the code

As I began this article, I realized I forgot one small but important utility routine, in order to convert the STRING fields on the data records to REAL. Here it is:

     1	PROC str to real = (STRING s) REAL:
     2	BEGIN
     3	    REAL v := 0.0, pt := 1.0, scale := 1.0;
     4	    BOOL negative := FALSE;
     5	    INT a0 = ABS "0";
     6	    FOR d FROM UPB s BY -1 TO LWB s DO
     7	        CHAR sd = s[d];
     8	        IF is digit(sd) THEN
     9	            v +:= (ABS sd - a0) * pt;
    10	            pt *:= 10.0
    11	        ELIF sd = "-" THEN
    12	            negative := TRUE
    13	        ELIF sd = "." THEN
    14	            scale := pt
    15	        FI
    16	    OD;
    17	    (negative | -v | v) / scale
    18	END;

Nothing too crazy here. Worth emphasizing:

This procedure returns a real value; hence the final expression – a conditional-clause in Algol 68 terms – yields a REAL value

Algol 68 Genie only uses ASCII characters in names, so the numeric characters are therefore known to be contiguous and the corresponding integer value can be calculated by taking the ABS of the character minus ABS “0”

And finally, the least squares fitting code:

     1	#
     2	    Utility routines
     3	#
       
     4	PR read "str_to_real.a68" PR
     5	PR read "split.a68" PR
     6	PR read "each_delimited_line.a68" PR
       
     7	#
     8	    Perform a least-squares fit of a line to a data file containing (x,y) values
     9	#
       
    10	BEGIN
       
    11	    [1:2,1:2] REAL xtx := ((0.0, 0.0),(0.0,0.0));
    12	    [1:2] REAL xty := (0.0, 0.0);
       
    13	    # Accumulate XTX and XTy #
       
    14	    each delimited line("test_data.txt", ",", (STRING line, []STRING fields, INT line count) VOID: BEGIN
    15	        # skip the header line #
    16	        IF line count > 1 THEN
    17	            REAL xi = str to real(fields[1]), yi = str to real(fields[2]);
    18	            xtx[1,1] +:= xi * xi;
    19	            xtx[1,2] +:= xi;
    20	            xtx[2,2] +:= 1.0;
    21	            xty[1] +:= xi * yi;
    22	            xty[2] +:= yi
    23	        FI
    24	    END);
    25	    xtx[2,1] := xtx[1,2];
       
    26	    # let's see the matrix and vector before Gauss #
       
    27	    print(("before Gauss:",new line));
    28	    print(("⎡",xtx[1,1],"   ",xtx[1,2],"⎤  =  ⎡",xty[1],"⎤",new line));
    29	    print(("⎣",xtx[2,1],"   ",xtx[2,2],"⎦     ⎣",xty[2],"⎦",new line));
       
    30	    # Gaussian elimination #
       
    31	    REAL a = -xtx[2,1] / xtx[1,1];
    32	    xtx[2,1] +:= a * xtx[1,1];
    33	    xtx[2,2] +:= a * xtx[1,2];
    34	    xty[2] +:= a * xty[1];
       
    35	    # let's see the matrix and vector after Gauss #
       
    36	    print(("after Gauss:",new line));
    37	    print(("⎡",xtx[1,1],"   ",xtx[1,2],"⎤  =  ⎡",xty[1],"⎤",new line));
    38	    print(("⎣",xtx[2,1],"   ",xtx[2,2],"⎦     ⎣",xty[2],"⎦",new line));
       
    39	    # back substitution #
       
    40	    REAL b = xty[2] / xtx[2,2];
    41	    REAL m = (xty[1] - xtx[1,2] * b) / xtx[1,1];
       
    42	    # let's see the equation #
       
    43	    print(("equation:",new line));
    44	    print(("y = ",fixed(m,0,7)," * x + ",fixed(b,0,7),new line))
       
    45	END

Not too much new here, either, but worth mentioning:

I’ve saved the utility procedures in files and I’m using Algol 68 Genie’s read pragmat to include them at compile time. Pragmats are Algol 68’s mechanism for issuing directives to the compiler.^¹
I’ve used some Unicode characters to better symbolize the matrix and vector. This seems to be OK in Algol 68 Genie character strings… something I have to dig into a bit more.
You’ve seen this before, and maybe you’ve made a mental note… but for example in line 40, you see a constant value b calculated from an expression that includes the values of variables at the time of declaration of the constant. For clarity:
- Algol 68 constants are therefore more like what we think of today as immutable values and unlike #define constants in C.
- the scope of all variables and constants lies within the enclosed-clause where they are declared, including within any other enclosed-clauses declared within. (Algol 68 Genie warns if you declare a procedure with parameters of the same name as constants or variables declared within the same enclosed-clause.)
Given that, in lines 11 and 12, we used a row display to initialize the matrix and vector, why can’t we use a formula like xtx +:= ((xi * xi, xi),(xi, 1.0)) to accumulate the matrix (and similarly for the vector)? The answer to this is both interesting and disappointing:
- We can, but..
- We would first have to define a +:= operator that takes a REF [,]REAL parameter and a [,]REAL parameter and returns a [,]REAL value, because that operator, operating on matrices, is not defined in the standard prelude.
- You might think that this seems a bit odd, perhaps “non-orthogonal”, since the assignation := seems to take all manner of expressions on its left- and right-hand sides, but it’s worth reflecting that you are in essence defining a composite “add together two values and then assign” operator that takes the same parameters; it’s the “add together” part, not the assignation itself that needs extending. And, let’s just emphasize for now, the assignation := is special in a few ways, including the fact that its right-hand side must be fully evaluated before being assigned to its left-hand side, whereas the arguments to dyadic operators in Algol 68 are evaluated collaterally, that is, in no particular order.
- This approach would be more compact but would also not take into account the symmetric nature of the matrix and therefore do an unnecessary accumulation. While this doesn’t sound like much, if we are doing a regression with many independent variables, it starts to add up (for example, if we were fitting z = a + bx + cy, we would have a 3×3 matrix). We could of course make a special version of “plus and becomes” that only builds the diagonal and upper right part of the matrix, but this does not smell like +:= to me.

Let’s run the program:

$ a68g regress.a68
before Gauss:
⎡+1.55925163017308e  +4   +2.59649902000000e  +3⎤  =  ⎡+1.81890000036266e  +4⎤
⎣+2.59649902000000e  +3   +5.77000000000000e  +2⎦     ⎣+3.17362830000000e  +3⎦
after Gauss:
⎡+1.55925163017308e  +4   +2.59649902000000e  +3⎤  =  ⎡+1.81890000036266e  +4⎤
⎣+0.00000000000000e  +0   +1.44625453749719e  +2⎦     ⎣+1.44757284553051e  +2⎦
equation:
y = .9998472 * x + 1.0009115
$

We know the equation of the original line was

y = x + 1

before we randomized the y values; so the coefficients m and b determined by the fit seem quite reasonable (each equal to one to four significant figures).

We could calculate residuals and various statistical measures, but I’m going to leave that as an exercise for the interested reader.

Now; wasn’t that more fun than "Hello world"?

In the next article, we’ll write a pretty-printer that makes our Algol 68 code look the way it appears in reference material.

1 Learning Algol 68 Genie, 9.7.2 Inclusion of files, p. 201.

InAlgol 68, Programming