Sliding up the banister

Sneaking up on functional programming

Neely’s Laws

I first wrote down these observations ten or fifteen years ago, as a result of several tongue-in-cheek conversations. I recently had occasion to apply one of them, so I thought I might as well post them here. Please don’t take them (or me!) any more seriously than I do.

However, I might observe that the first post of “The burden of FP” series can be read as an extended statement of Neely’s Third Law.


Neely’s First Law

(Monotonicity of Complexity)

Complexity is like entropy;
you can’t decrease it and doing almost anything increases it.
You can hide it, cover it up, pretend it’s not there (until later),
or make it somebody else’s problem,
but it won’t go away.


Neely’s Second Law

(The Means/Ends Dictum)

Solve the problem;
don’t solve the solution.


Neely’s Third Law

(The Blind-Spot Mandate)

In any systems design effort,
begin by carefully documenting all your unconscious assumptions.


Neely’s Fourth Law

(The Focus Figment)

A tool is an instrument for limiting the types of work you are able to perform.


Neely’s Fifth Law

(The Skills Lifecycle)

Necessities become options become obsolete become artforms.


Neely’s Sixth Law

(The Intuition Impasse)

“Common sense”
is an oxymoron.


2008-04-09 Posted by joelneely | Uncategorized | | 1 Comment

The burden of FP, part 3

Inside out

This series has used the first problem from Project Euler to look at different programming styles; Part 1 asked questions about the obvious imperative solution, and Part 2 used higher-order functions in a functional approach. This part considers performance to take us in a different direction.

Despite their differences in style, all of the previous solutions had the following scheme in common:

ProjectEuler-1-a.jpg

Beginning with the positive integers below a given limit, select those that are multiples of two given values, and compute the sum of the selected set. The higher-order function approach allowed us to think about each of the components individually, composing them into a solution. However, this basic generate-select-process pipeline may hide some wasted effort.

There and back again

First, notice that problem specified multiples of 3 and 5 but the filtering used division, in the disguise of remainder calculation. This works only because the multiple-of property has an easy-to-compute inverse.

Second, just under half of all positive integers (7/15 to be exact) actually fall into the selected set. The generate-and-test scheme becomes more wasteful as the yield becomes smaller. For example, if the problem had been to sum multiples of 97 and 113, then most of the generated numbers would be discarded.

Thinking along these lines suggests another scheme, illustrated below:

ProjectEuler-1-b.jpg

Instead of generating and discarding failed candidates, this approach directly generates the two sets of multiples (e.g., of 3 and 5). Merging those two sets will ensure that duplicates (e.g. 15) are only included once. The upper limit is applied to the merged values, and the results are summed.

I want to maintain separation of concerns here, so I’m going to pretend that we don’t know how to limit the sets of multiples (i.e., that we can’t easily compute the largest multiple of 3 less than 1000). In other words, I’m going to take this opportunity to model the multiples as unbounded sequences, using Scala’s Stream type, which provides a lazy list.

Given integer expressions init and incr, evaluating Stream.from(init, incr) returns the stream of values beginning with init and stepping by incr. For example:

  scala> val odds = Stream.from(1, 2)
  odds: Stream[Int] = Stream(1, ?)
  scala> odds take 20 print
  1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, Stream.empty

So the stream of multiples of a given value are easy to produce:

  scala> val threes = Stream.from(3, 3)
  threes: Stream[Int] = Stream(3, ?)
  scala> threes take 20 print
  3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, Stream.empty

So much for the first part of this design!

Don’t merge the streams!

Ignoring the above advice, here’s a simple recursive function to merge two streams of increasing integers:

  def merge(as: Stream[Int], bs: Stream[Int]): Stream[Int] = {
    val h = as.head min bs.head
    Stream.cons(h, merge(as dropWhile {_ == h}, bs dropWhile {_ == h})
    )
  }

At any point, the merged stream contains the minimum of the input streams followed by merging whatever follows that minimum in both of the original streams.

That kind of terse, recursive definition is fairly common in FP, and is sufficiently unfamiliar to some that it may be a bit off-putting. At the risk of using a tired analogy, it’s like bike-riding; once you can stay on long enough to get down the driveway, you’ve got it made. (And it’s worth the effort, because you can outrun the kids that are still on foot! ;-)

As proof of that point, we now have everything we need for the solution to problem one:

  def sumStream(a: Int, b: Int)(n: Int) =
    (merge(Stream.from(a, a), Stream.from(b, b)).takeWhile{_ < n}.foldLeft (0)) {_ + _}

As in part 2, the key is being able to string together the components with minimum fuss. We can easily test the construction of the streams, the merge function, and the limiting via takeWhile (we’ve exercised foldLeft in the previous post). Because there is no shared state, composing them does not alter their behavior.

Twice-told tales

A persistent myth is that high-level languages require more resources than lower-level languages. Although the stream-based solution uses objects to manage future state, that was an explicit tradeoff to avoid producing unsuccessful candidate values. If we are really concerned with performance, functional programming will still support us as we manually optimize our solution.

A stream of multiples can be represented simply by the head of the virtual stream (the next multiple) and the increment. Iterating through the stream is accomplished by adding the increment to the head, so dropWhile amounts to deciding whether the next head of a stream should be the incremented head or the current head.

Applying those special-case representations to our merging strategy, we get this solution:

  def sumRec(a: Int, b: Int)(n: Int) = {
    def iter(ma: Int, mb: Int, t: Int): Int = {
      val m = ma min mb
      def next(mx: Int, x: Int) =
        if (mx == m) mx + x else mx
      if (m < n)
        iter(next(ma, a), next(mb, b), t + m)
      else
        t
    }
    iter(a, b, 0)
  }

The local variable m is the head of the merged streams, the inner next function accomplishes the equivalent of dropWhile, and the iter function simultaneously merges the streams (represented by ma and mb) and accumulates the total (in t).

The iter function is directly tail-recursive; it either terminates with the result or recursively invokes itself with adjusted arguments. Because of this, the Scala compiler actually compiles iter into a tight loop occupying only a handful of JVM bytecodes. It is by far the fastest of all the solutions we’ve examined!

Take my code, please!

For me, there’s a punch line to this series. As implied above, I thought of this last version only after writing sumStream; I created sumRec by rethinking the code to take advantage of the special cases in the problem. I find it interesting that the fastest function was created by first completely ignoring performance and thinking about making the design as clean and modular as possible.

As I’ve believed for years, it’s easier to make correct code faster than to make fast (but buggy) code correct.

2008-04-09 Posted by joelneely | Project Euler, Scala, functional programming | | 4 Comments