“Linguistics” vs. Mathematics?

I happened across an interesting post on Chris Okasaki’s blog, titled “Less than vs Greater than”. Let me suggest that you read it before continuing here.

I would paraphrase his point about errors he observed in students’ programs as follows:

A student who writes an expression such as

expL < expR

often appears to lock onto the concept of “something being smaller” and then become mentally stuck on “dealing with smaller stuff”, even when that is not appropriate for the meaning of expL and expR at that point in the program.

He illustrates his point very nicely with a flawed binary search.

For quite some time I’ve tended to write expressions of the form

lowerBound <= someVar && someVar < upperBound

to express that someVar is within the half-open range

[lowerBound..upperBound)

e.g. array subscripts that must be at least zero, but less than the length of the array. The visual hint of placing the variable textually between the limiting values seemed nicely mnemonic to me. (From there, it has also seemed natural to experiment with preferring left-to-right, lesser-to-greater ordering consistently in comparison expressions, although I’m not suggesting that as a universal coding convention).
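As a small sketch of that idiom in Java (the class name HalfOpenRange and the helper inRange are hypothetical names chosen for illustration, not taken from any library), the bounds check can be wrapped so that the arguments also read left to right, lesser to greater:

```java
public class HalfOpenRange {
    // Hypothetical helper: true when value lies in [lowerBound..upperBound),
    // i.e. lowerBound <= value < upperBound.
    static boolean inRange(int lowerBound, int value, int upperBound) {
        return lowerBound <= value && value < upperBound;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        // Probe one position before the array, every valid subscript,
        // and one position past the end.
        for (int i = -1; i <= data.length; i++) {
            System.out.println(i + " in range: " + inRange(0, i, data.length));
        }
    }
}
```

Only the subscripts 0, 1, and 2 are reported as in range; -1 fails the first clause and 3 fails the second.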

However, I was quite surprised by the first comment, which described as “linguistically wrong” the common C-language-based idiom

if (5 == i)

(intended to catch typos involving only a single equal sign). The commenter said

Linguistically that’s wrong. You’re not testing 5 to see if it’s equal to i, you’re testing i.

which implies an asymmetrical interpretation of equality as being a “test” on its left-hand value!

By ignoring the fact that == is simply an assertion that its operands are equal, that interpretation fails on many completely valid cases, such as

(a + 1 == b - 1)

and the first clause

lowerBound <= someVar

of the “within bounds” cliché above.
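To make the symmetry concrete, here is a minimal Java sketch (the class and method names are my own, purely for illustration). Neither operand is “the thing being tested”: swapping the two sides of == never changes the result.

```java
public class SymmetricEquality {
    // The comparison as written in the text above.
    static boolean leftToRight(int a, int b) {
        return a + 1 == b - 1;
    }

    // The same comparison with its operands swapped.
    static boolean rightToLeft(int a, int b) {
        return b - 1 == a + 1;
    }

    public static void main(String[] args) {
        // Check a small grid of values; the two forms must always agree.
        for (int a = -3; a <= 3; a++) {
            for (int b = -3; b <= 3; b++) {
                if (leftToRight(a, b) != rightToLeft(a, b)) {
                    throw new AssertionError("asymmetry at a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("Both operand orders agree for every pair checked.");
    }
}
```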

That misinterpretation of == seems consistent with a variety of incorrect or awkward uses of boolean expressions that are widely seen in practice. Many of us have probably seen code (or read blog posts about code) resembling:

boolean fooIsEmpty;
...
if (foo == null) {
    fooIsEmpty = true;
} else {
    if (foo.length() == 0) {
        fooIsEmpty = true;
    } else {
        fooIsEmpty = false;
    }
}
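For comparison, here is a sketch of how the same logic reads when the comparisons are treated as ordinary boolean values (assuming foo is a String; the class EmptyCheck and the method isEmpty are hypothetical names used only for this illustration). The nested if/else collapses into a single expression, and the short-circuit || ensures that length() is never called on null:

```java
public class EmptyCheck {
    // Hypothetical helper: equivalent to the nested if/else above.
    // The right-hand operand of || is evaluated only when foo is not null.
    static boolean isEmpty(String foo) {
        return foo == null || foo.length() == 0;
    }

    public static void main(String[] args) {
        String[] samples = {null, "", "abc"};
        for (String foo : samples) {
            boolean fooIsEmpty = isEmpty(foo);
            System.out.println("\"" + foo + "\" is empty: " + fooIsEmpty);
        }
    }
}
```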

I suspect two culprits behind this kind of abuse:

  1. Failure to include the right kind and amount of Mathematics in the education of a programmer, and
  2. The C/FORTRAN use of = for assignment.

With respect to point 1, I believe that Boolean Algebra is simply a fundamental skill for programming. The ability to manipulate and understand boolean expressions is as important as the ability to deal with numeric expressions for most programming of the kind I see daily. It is understandable that a programmer whose training never addressed boolean expressions except in the context of if() or while() would be uncomfortable with other uses, but that’s a problem to be solved, not a permanent condition to be endured.

Regarding point 2, much of what I’ve read or experienced supports the idea that FORTRAN—and later, C—were more commonly used in the US than Algol or other alternatives due to political, cultural, and commercial issues rather than technical ones. (I recommend Dijkstra’s “A new science, from birth to maturity” and Gabriel’s “Worse is Better” as good starting points.)

Whether that conclusion is valid or flawed, the decision to use = for the (asymmetrical!) “assignment” operation immediately creates two new problems:

  1. The need to express the “equality test” in a way that won’t be confused with assignment; and
  2. The risk that users of the language become confused about the symmetry of equality, due to guilt by association with the “assignment” operator.

Algol and its descendants, including Pascal, avoided those problems by using the asymmetrical := for assignment. The ! character is used as an operator or suffix in other languages to express destructive value-setting (set! in Scheme, for example). But Java, C#, and others have followed the FORTRAN/C convention, with the predictable effects.

Given such far-reaching consequences from a single character in the source code, I’m reminded again how important it is to make good choices in coding style, API design, and other naming contexts.


Postscript:

The left-to-right ordering of the number line has some interesting cultural and even individual baggage. I recall reading about a mathematician whose personal mental model was of negative values being behind him and positive values receding into the distance in front of him.

I believe it was in connection with the Dutch National Flag Problem that Edsger Dijkstra wrote about a student whose native language was written right-to-left. While other students had designed programs with indices increasing from zero, that student had produced an equally valid program with an index that decreased from the maximal value.
