Lab Rat Code

My older son has gone back to school to study IT, and we occasionally discuss his courses or internship (though not his homework). As a graphic artist, musician, gamer, and box-builder, he is an experienced user, but thinking as a programmer is new to him. Therefore I find his perspective on programming an interesting counterpoint to my own:

  • I have been programming long enough to have forgotten what seemed clear or opaque when I was a beginner.
  • Our backgrounds (artistic versus mathematical) provided different sets of expectations and metaphors.
  • He can take advantage of, and for granted, an enormous variety of resources that did not exist when I started, from a wealth of alternative programming languages and open-source code to pervasive consumer-level use of the Internet.

He has also encountered a phenomenon that has frustrated me throughout my time as a student, teacher, and practitioner: lab rat code.

Illustrative code in blogs, articles, and textbooks is often unrealistic for the same reason that physics homework refers to friction-free pool tables and the stereotypical psychology lab is filled with rats and mazes. A writer who wants to illustrate a technique needs to apply it to a task that is simple enough not to create distraction. Stated from the other perspective, a reader who doesn’t understand the goal will likely not appreciate the path.

So the canonical first program in a new language prints “Hello, world!” to standard output, often soon followed by YAFG (Yet Another Fibonacci Generator).

The challenge remains for the reader to ignore distraction. I’d like to offer some strategies that I find helpful in that role.

  1. If the sample task seems too trivial, then the author succeeded in picking a goal that doesn’t require much of your attention. Stop reading long enough to sketch out what you regard as the obvious solution, then resume reading to see if the author’s solution offers you any new insights.
  2. If the task seems unfamiliar, then the author may have used more detail than necessary. Skim the problem statement, then examine the solution to see how many of those details actually matter for the illustration.
  3. If the solution seems too heavy, the author may be illustrating it on a problem that doesn’t require its full power. Check to see whether the author follows up with a harder problem that exploits more of the solution’s capabilities. Or create your own example of a harder problem with similar characteristics. Delaying the natural tendency to think, “I can solve this more easily another way!” may provide an opportunity to understand the reasons for the apparent complexity.
  4. Develop a tolerance for uncertainty. I have developed a better appreciation for some concepts only through repeated exposure. (It really doesn’t matter whether that statement is about the concept or about me. The result was worth the process.) Similarly, I have sometimes learned something interesting from a solution to a problem outside my experience or interest. Over time you may develop a sense for what you can safely ignore.

The author who wants to reduce the risk of distraction might consider some of these strategies:

  1. Avoid cliches like the plague. Instead of offering YAFG, come up with a fresher example (unless you know your readers really LIKE Fibonacci numbers). Which brings me to my next cliche…
  2. Know your audience. I have heard busy practitioners dismiss a technique because (IMHO) they had never been shown its application to a problem about which they cared. A bill-of-materials example might illustrate recursion to an industrial programmer far better than parsing or binary tree traversal.
  3. When using a lab-rat example, be explicit about that fact. At least then your reader will be forewarned.
  4. If realism, or an agenda beyond the single example, prompts you to include more detail than necessary for the current illustration, consider the structure of your presentation. Can those details be delayed? If not, then being explicit about which ones are important to the solution may help your reader avoid bogging down on the less relevant ones.
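
As a rough sketch of the bill-of-materials idea mentioned in the list above, here is one way recursion might be shown to a reader who thinks in parts and assemblies. Everything in it (the Part class, the costs in cents, the bicycle parts) is invented for illustration, and it is a lab-rat example in its own right: a real BOM system would carry far more detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bill-of-materials sketch; names and prices are invented.
final class BillOfMaterialsDemo {

    // A part has its own unit cost plus zero or more components,
    // each of which is used in some quantity.
    static final class Part {
        private final String name;
        private final long unitCostCents;
        private final List<Part> components = new ArrayList<Part>();
        private final List<Integer> quantities = new ArrayList<Integer>();

        Part(String name, long unitCostCents) {
            this.name = name;
            this.unitCostCents = unitCostCents;
        }

        String getName() {
            return name;
        }

        // Registers a component used `quantity` times; returns this for chaining.
        Part add(Part component, int quantity) {
            components.add(component);
            quantities.add(quantity);
            return this;
        }

        // The recursive roll-up: own cost plus the quantity-weighted
        // total cost of every component, however deeply nested.
        long totalCostCents() {
            long total = unitCostCents;
            for (int i = 0; i < components.size(); i++) {
                total += quantities.get(i) * components.get(i).totalCostCents();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        Part wheel = new Part("wheel", 1500).add(new Part("spoke", 20), 32);
        Part bike = new Part("frame and assembly", 12000).add(wheel, 2);
        System.out.println(bike.getName() + ": " + bike.totalCostCents());
    }
}
```

The recursion bottoms out naturally at parts with no components, which is the point a practitioner is likely to recognize from their own parts lists.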

I’ll try to apply those practices myself, both as reader and writer.

BuilderBuilder: The Model in Java

This post will describe a tiny Java model for implementing the BuilderBuilder task. It is simple almost to the point of crudity, because the goal of the series is to compare languages and styles, not to produce production-ready sample code.

This post will focus on the parts of the overall data flow highlighted below:

GenerationModel.jpg

The interfaces:

I’m using interfaces to hide implementation from the remainder of the code. The first version will use simple DTOs, but I want to leave other options (e.g. by reflection against existing DTO classes) open for later exploration.

This first model has two interfaces, one for a Java class:

package com.localhost.builderbuilder;

public interface IJClass {
    public String getPkg();
    public String getName();
    public IJField[] getFields();
}

and the other for a Java field:

package com.localhost.builderbuilder;

public interface IJField {
    public String getName();
    public String getType();
}

We all know that “the simplest thing that could possibly work” doesn’t mean “the stupidest thing that could possibly work”. The use of an array may cross that line, but it was a deliberate choice. Developers who moved to OOP from imperative programming are very familiar with arrays. We’ll be able to compare array processing against the FP style of list processing, and perhaps consider other OOP alternatives later on.

First implementations:

In the spirit of eating our own dog food, the simple DTO implementation of those interfaces will contain their own Builder inner classes. Given that, there’s no surprise in the JFieldDTO code, which appears at the end of this post.

The JClassDTO class throws in one new wrinkle: instead of having a fields(IJField[] fields) method that accepts an entire field array, JClassDTO.Builder provides a field(JFieldDTO field) method that accepts one field at a time, accumulating them to be placed in an array by the instance() method. The complete code for JClassDTO is given at the end.
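
To make the one-field-at-a-time style concrete, here is a minimal usage sketch. So that it compiles as a single file, it restates the interfaces and DTOs in condensed form as nested types; the wrapper class BuilderUsageSketch, the samplePoint() helper, and the sample values (com.example, Point, x, y) are assumptions of mine, not part of the model itself.

```java
import java.util.ArrayList;
import java.util.List;

// Condensed single-file restatement of the model, for illustration only.
final class BuilderUsageSketch {

    interface IJField { String getName(); String getType(); }

    interface IJClass { String getPkg(); String getName(); IJField[] getFields(); }

    static final class JFieldDTO implements IJField {
        private final String name;
        private final String type;

        private JFieldDTO(String name, String type) { this.name = name; this.type = type; }

        public String getName() { return name; }
        public String getType() { return type; }

        static Builder builder() { return new Builder(); }

        static final class Builder {
            private String name;
            private String type;
            Builder name(String name) { this.name = name; return this; }
            Builder type(String type) { this.type = type; return this; }
            JFieldDTO instance() { return new JFieldDTO(name, type); }
        }
    }

    static final class JClassDTO implements IJClass {
        private final String pkg;
        private final String name;
        private final IJField[] fields;

        private JClassDTO(String pkg, String name, IJField[] fields) {
            this.pkg = pkg; this.name = name; this.fields = fields;
        }

        public String getPkg() { return pkg; }
        public String getName() { return name; }
        public IJField[] getFields() { return fields; }

        static Builder builder() { return new Builder(); }

        static final class Builder {
            private String pkg;
            private String name;
            private final List<JFieldDTO> fields = new ArrayList<JFieldDTO>();
            Builder pkg(String pkg) { this.pkg = pkg; return this; }
            Builder name(String name) { this.name = name; return this; }
            // Accumulates one field per call, in call order.
            Builder field(JFieldDTO field) { fields.add(field); return this; }
            IJClass instance() {
                return new JClassDTO(pkg, name, fields.toArray(new JFieldDTO[fields.size()]));
            }
        }
    }

    // Describes a hypothetical class com.example.Point with two int fields.
    static IJClass samplePoint() {
        return JClassDTO.builder()
            .pkg("com.example")
            .name("Point")
            .field(JFieldDTO.builder().name("x").type("int").instance())
            .field(JFieldDTO.builder().name("y").type("int").instance())
            .instance();
    }

    public static void main(String[] args) {
        IJClass point = samplePoint();
        System.out.println(point.getPkg() + "." + point.getName());
        for (IJField f : point.getFields()) {
            System.out.println(f.getType() + " " + f.getName());
        }
    }
}
```

Each call to field() reads as one line of the class being described, which is the main thing the accumulating style buys over passing a pre-built array.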

It remains to be seen whether this DTO-style implementation is throw-away code, but getting a first implementation in hand will allow us to start comparing data types and structures with the other language, and then move directly to the generation phase of the project. We can always come back and add features (and complexity 😉 ) at a later time.



The JFieldDTO implementation:

package com.localhost.builderbuilder;

public class JFieldDTO implements IJField {

    private final String name;
    private final String type;

    public static class Builder {
        
        private String name;
        private String type;

        private Builder() {
            // do nothing
        }

        public Builder name(String name) {
            this.name = name;
            return this;
        }

        public Builder type(String type) {
            this.type = type;
            return this;
        }

        public JFieldDTO instance() {
            return new JFieldDTO(name, type);
        }
    }

    public static Builder builder() {
        return new Builder();
    }

    private JFieldDTO(String name, String type) {
        this.name = name;
        this.type = type;
    }

    public String getName() {
        return name;
    }

    public String getType() {
        return type;
    }

}

The JClassDTO implementation:

package com.localhost.builderbuilder;

import java.util.ArrayList;
import java.util.List;

public class JClassDTO implements IJClass {

    private final String pkg;
    private final String name;
    private final IJField[] fields;

    public static class Builder {
        
        private String pkg;
        private String name;
        private List<JFieldDTO> fields;

        private Builder() {
            fields = new ArrayList<JFieldDTO>();
        }

        public Builder pkg(String pkg) {
            this.pkg = pkg;
            return this;
        }

        public Builder name(String name) {
            this.name = name;
            return this;
        }

        public Builder field(JFieldDTO field) {
            this.fields.add(field);
            return this;
        }

        public IJClass instance() {
            return new JClassDTO(
                pkg,
                name,
                fields.toArray(new JFieldDTO[fields.size()])
            );
        }

    }

    public static Builder builder() {
        return new Builder();
    }

    private JClassDTO(String pkg, String name, IJField[] fields) {
        this.pkg = pkg;
        this.name = name;
        this.fields = fields;
    }

    public String getPkg() {
        return pkg;
    }

    public String getName() {
        return name;
    }

    public IJField[] getFields() {
        return fields;
    }

}

Updated 2009-05-09 to fix some formatting and to add a category.

“Linguistics” vs. Mathematics?

I happened across an interesting post on Chris Okasaki’s blog, titled Less than vs Greater than. Let me suggest that you read it before continuing here.

I would paraphrase his point about errors he observed in students’ programs as follows:

A student who writes an expression such as

expL < expR

often appears to lock on the concept of “something being smaller” and then become mentally stuck on “dealing with smaller stuff”, even if that is not appropriate for the meaning of expL and expR at that point in the program.

He illustrates his point very nicely with a flawed binary search.

For quite some time I’ve tended to write expressions of the form

lowerBound <= someVar && someVar < upperBound

to express that someVar is within the half-open range

[lowerBound..upperBound)

e.g. array subscripts that must be at least zero, but less than the length of the array. The visual hint of placing the variable textually between the limiting values seemed nicely mnemonic to me. (From there, it has also seemed natural to experiment with preferring left-to-right, lesser-to-greater ordering consistently in comparison expressions, although I’m not suggesting that as a universal coding convention.)
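
As a small sketch of that idiom applied to array subscripts (the class name HalfOpenRange and the helper contains() are hypothetical, chosen only for this illustration):

```java
// Hypothetical helper illustrating the "variable in the middle" idiom.
final class HalfOpenRange {

    // True when value lies in the half-open range [lowerBound..upperBound),
    // written so the operands appear in number-line order.
    static boolean contains(int lowerBound, int value, int upperBound) {
        return lowerBound <= value && value < upperBound;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(contains(0, 0, data.length));           // first valid subscript
        System.out.println(contains(0, data.length, data.length)); // one past the end
        System.out.println(contains(0, -1, data.length));          // below the range
    }
}
```

The three calls print true, false, and false: the lower bound is included and the upper bound is excluded.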

However, I was quite surprised by the first comment, which described as “linguistically wrong” the common C-language-based idiom

if (5 == i)

(intended to catch typos involving only a single equal sign). The commenter said

Linguistically that’s wrong. You’re not testing 5 to see if it’s equal to i, you’re testing i.

which implies an asymmetrical interpretation of equality as being a “test” on its left-hand value!

By ignoring the fact that == is simply an assertion that its operands are equal, that interpretation fails on many completely valid cases, such as

(a + 1 == b - 1)

and the first clause

lowerBound <= someVar

of the “within bounds” cliché above.

That misinterpretation of == seems consistent with a variety of incorrect or awkward uses of boolean expressions widely seen. Many of us have probably seen code (or read blog posts about code) resembling:

boolean fooIsEmpty;
...
if (foo == null) {
    fooIsEmpty = true;
} else {
    if (foo.length() == 0) {
        fooIsEmpty = true;
    } else {
        fooIsEmpty = false;
    }
}

I suspect two culprits behind this kind of abuse:

  1. Failure to include the right kind and amount of Mathematics in the education of a programmer, and
  2. The C/FORTRAN use of = for assignment.

With respect to point 1, I believe that Boolean Algebra is simply a fundamental skill for programming. The ability to manipulate and understand boolean expressions is as important as the ability to deal with numeric expressions for most programming of the kind I see daily. It is understandable that a programmer whose training never addressed boolean expressions except in the context of if() or while() would be uncomfortable with other uses, but that’s a problem to be solved, not a permanent condition to be endured.
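
For comparison, here is a sketch of how the nested if/else shown earlier collapses once the boolean expression is treated as a value in its own right. The wrapper class EmptyCheck and the method isEmpty() are my own names, and I am assuming foo is a String.

```java
// Hypothetical wrapper; assumes foo is a String.
final class EmptyCheck {

    // Same result as the nested if/else: "empty" means null or zero-length.
    // The || operator short-circuits, so length() is never called on null.
    static boolean isEmpty(String foo) {
        return foo == null || foo.length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(null));
        System.out.println(isEmpty(""));
        System.out.println(isEmpty("x"));
    }
}
```

This prints true, true, and false. The one-line form states the definition of "empty" directly instead of burying it in control flow.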

Regarding point 2, much of what I’ve read or experienced supports the idea that FORTRAN—and later, C—were more commonly used in the US than Algol or other alternatives due to political, cultural, and commercial issues rather than technical ones. (I recommend Dijkstra’s “A new science, from birth to maturity” and Gabriel’s “Worse is Better” as good starting points.)

Whether that conclusion is valid or flawed, the decision to use = for the (asymmetrical!) “assignment” operation immediately creates two new problems:

  1. The need to express the “equality test” in a way that won’t be confused with assignment; and
  2. The risk that users of the language become confused about the symmetry of equality, due to guilt by association with the “assignment” operator.

Algol—and its descendants, including Pascal—avoided those problems by using the asymmetrical := for assignment. The ! character is used as an operator or suffix in other languages to express destructive value-setting. But Java, C#, and others, have followed the FORTRAN/C convention, with the predictable effects.

Given such far-reaching consequences from a single character in the source code, I’m reminded again how important it is to make good choices in coding style, API design, and other naming contexts.


Postscript:

The left-to-right ordering of the number line has some interesting cultural and even individual baggage. I recall reading about a mathematician whose personal mental model was of negative values being behind him and positive values receding into the distance in front of him.

I believe it was in connection with the Dutch National Flag Problem that Edsger Dijkstra wrote about a student whose native language was written right-to-left. While other students had designed programs with indices increasing from zero, that student had produced an equally-valid program with an index that decreased from the maximal value.

Scala and Programming 2.0

I commented elsewhere on how the “Architecture of Participation” idea may be percolating into the field of programming languages. I am especially interested in seeing whether the adoption of Scala provides evidence of this phenomenon.

Scala is a strongly, statically typed language implemented on the JVM—all characteristics that raise eyebrows (if not noses) in some circles. However, Scala’s ultralight approach to syntax is very much in line with the current taste for highly flexible notation and internal DSLs as a tool of expression.

Type inference is a very attractive compiler feature. And it’s great to get performance improvements “for free” every time the JVM team makes HotSpot smarter about JIT compilation, method in-lining, etc. But every time I revisit the ideas in Martin Odersky’s Scala talk at Javapolis 2007, I’m impressed with the design of Scala as a notation that invites participation.

Time will tell.

Stanford on Stairs

Stanford University has a student-led course on “Cross-Paradigm Programming with Scala” this Spring. The resource list on their page has a number of must-reads, including the draft Programming in Scala book by Martin Odersky, Lex Spoon, and Bill Venners, and the draft scala.xml book by Burak Emir. They’ve also scheduled a guest lecture by David Pollak of lift fame.

To jump nearly to the other side of the continent, I hear rumors that there’s a course in development at an old university near a large river.

Language(s) for teaching programming

I’ve seen (and participated in) a number of discussions recently about the selection of a first programming language. I don’t think the choice is a trivial matter, nor do I think there’s necessarily a single right answer; much depends on the overall goals of a curriculum. Here are some choice strategies I’ve seen discussed recently:

  • Commercially-popular language
    Pros:

    • students see a connection to the job market,
    • prospective employers may like it,
    • students may have had prior exposure (especially non-traditional students), and
    • a wealth of related materials (books, web pages, etc.) is available.

    Cons (especially if the curriculum doesn’t provide multi-language experience):

    • risks biasing the student’s perception of programming,
    • risks putting the student into the “knowledge half-life” problem, and
    • risks the nasty choice, as language trends change, between churn in the curriculum or becoming less “relevant”.
  • Academically-popular research language
    Pros:

    • professors and grad students may like it, and
    • some of these languages explore ideas that may be important by the time the students graduate.

    Cons:

    • similar to the “cons” list above (no pun intended!).
  • Teaching-oriented language (remember Pascal?)
    Pros:

    Cons:

    • can risk separation of “theory” and “practice”,
    • teaching-oriented aspects may shield students from the “pain” of production programming, and
    • prospective employers may strongly prefer “practical” language experience.

Following the line of thinking from the previous post, I believe that a language can serve as a bridge from one style of thinking and problem solving to another, if it is properly designed for that purpose. Going further, the book Concepts, Techniques, and Models of Computer Programming makes use of a multi-paradigm approach in which a kernel language is extended in various ways to expose the student/reader to a variety of styles of programming. I’m currently focusing my spare time on Scala, but I’m intrigued by Van Roy and Haridi’s use of Oz to explore the style “spokes” out from a simple “hub” of basic computation concepts.

Maybe I can get back to Mozart/Oz after I’ve mastered Scala. Of course, I’d have to make friends with emacs.