The burden of FP, part 3

Inside out

This series has used the first problem from Project Euler to look at different programming styles; Part 1 asked questions about the obvious imperative solution, and Part 2 used higher-order functions in a functional approach. This part considers performance to take us in a different direction.

Despite their differences in style, all of the previous solutions had the following scheme in common:

[Figure: the generate-select-process pipeline (generate the integers below the limit, select the multiples of 3 or 5, sum the selection)]

Beginning with the positive integers below a given limit, select those that are multiples of two given values, and compute the sum of the selected set. The higher-order function approach allowed us to think about each of the components individually, composing them into a solution. However, this basic generate-select-process pipeline may hide some wasted effort.
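For reference, that whole generate-select-process pipeline can be recapped in a single line of Scala using ordinary collection methods (a condensed sketch of what the earlier parts built from named components):

```scala
// Generate 1 until 1000, select multiples of 3 or 5, process by summing.
val answer = (1 until 1000).filter(i => i % 3 == 0 || i % 5 == 0).sum
println(answer)  // 233168
```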

There and back again

First, notice that the problem specified multiples of 3 and 5, but the filtering used division, in the guise of a remainder calculation. This works only because the multiple-of property has an easy-to-compute inverse.

Second, just under half of all positive integers (7/15 to be exact) actually fall into the selected set. The generate-and-test scheme becomes more wasteful as the yield becomes smaller. For example, if the problem had been to sum multiples of 97 and 113, then most of the generated numbers would be discarded.
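A quick sketch makes the difference in yield concrete (the counts are computed over candidates below 1000; the helper name `hits` is just for illustration):

```scala
// Count how many candidates below n survive the multiple-of-a-or-b filter.
val n = 1000
def hits(a: Int, b: Int): Int =
  (1 until n).count(i => i % a == 0 || i % b == 0)

println(hits(3, 5))    // 466 of 999 candidates kept (just under half)
println(hits(97, 113)) // only 18 of 999 kept; the rest was wasted effort
```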

Thinking along these lines suggests another scheme, illustrated below:

[Figure: generate the streams of multiples of 3 and of 5 directly, merge them without duplicates, apply the upper limit, then sum]

Instead of generating and discarding failed candidates, this approach directly generates the two sets of multiples (e.g., of 3 and 5). Merging those two sets will ensure that duplicates (e.g. 15) are only included once. The upper limit is applied to the merged values, and the results are summed.

I want to maintain separation of concerns here, so I'm going to pretend that we don't know how to limit the sets of multiples (i.e., that we can't easily compute the largest multiple of 3 less than 1000). In other words, I'm going to take this opportunity to model the multiples as unbounded sequences, using Scala's Stream type, which provides a lazy list.

Given integer expressions init and incr, evaluating Stream.from(init, incr) returns the stream of values beginning with init and stepping by incr. For example:

  scala> val odds = Stream.from(1, 2)
  odds: Stream[Int] = Stream(1, ?)
  scala> odds take 20 print
  1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, Stream.empty

So the stream of multiples of a given value is easy to produce:

  scala> val threes = Stream.from(3, 3)
  threes: Stream[Int] = Stream(3, ?)
  scala> threes take 20 print
  3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, Stream.empty

So much for the first part of this design!

Don’t merge the streams!

Ignoring the above advice, here’s a simple recursive function to merge two streams of increasing integers:

  def merge(as: Stream[Int], bs: Stream[Int]): Stream[Int] = {
    val h = as.head min bs.head
    Stream.cons(h, merge(as dropWhile {_ == h}, bs dropWhile {_ == h}))
  }

At any point, the merged stream contains the minimum of the input streams followed by merging whatever follows that minimum in both of the original streams.
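To see that definition in action, here is a quick check sketched with LazyList (the modern Scala replacement for the now-deprecated Stream; the merging logic is unchanged):

```scala
def merge(as: LazyList[Int], bs: LazyList[Int]): LazyList[Int] = {
  // Take the smaller head, then drop it from whichever stream(s) carry it.
  val h = as.head min bs.head
  h #:: merge(as.dropWhile(_ == h), bs.dropWhile(_ == h))
}

val merged = merge(LazyList.from(3, 3), LazyList.from(5, 5))
println(merged.take(8).toList)  // List(3, 5, 6, 9, 10, 12, 15, 18)
```

Note that 15, a multiple of both 3 and 5, appears only once in the merged stream.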

That kind of terse, recursive definition is fairly common in FP, and is sufficiently unfamiliar to some that it may be a bit off-putting. At the risk of using a tired analogy, it's like riding a bike; once you can stay on long enough to get down the driveway, you've got it made. (And it's worth the effort, because you can outrun the kids who are still on foot! 😉)

As proof of that point, we now have everything we need for the solution to problem one:

  def sumStream(a: Int, b: Int)(n: Int) =
    merge(Stream.from(a, a), Stream.from(b, b)).takeWhile {_ < n}.foldLeft(0) {_ + _}

As in part 2, the key is being able to string together the components with minimum fuss. We can easily test the construction of the streams, the merge function, and the limiting via takeWhile (we've exercised foldLeft in the previous post). Because there is no shared state, composing them does not alter their behavior.
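As a sanity check, the whole composition can be run against the known answer (again sketched with LazyList in place of the deprecated Stream):

```scala
def merge(as: LazyList[Int], bs: LazyList[Int]): LazyList[Int] = {
  val h = as.head min bs.head
  h #:: merge(as.dropWhile(_ == h), bs.dropWhile(_ == h))
}

def sumStream(a: Int, b: Int)(n: Int): Int =
  merge(LazyList.from(a, a), LazyList.from(b, b))
    .takeWhile(_ < n)     // apply the upper limit
    .foldLeft(0)(_ + _)   // sum what remains

println(sumStream(3, 5)(1000))  // 233168
```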

Twice-told tales

A persistent myth is that high-level languages require more resources than lower-level languages. Although the stream-based solution uses objects to manage future state, that was an explicit tradeoff to avoid producing unsuccessful candidate values. If we are really concerned with performance, functional programming will still support us as we manually optimize our solution.

A stream of multiples can be represented simply by the head of the virtual stream (the next multiple) and the increment. Iterating through the stream is accomplished by adding the increment to the head, so dropWhile amounts to deciding whether the next head of a stream should be the incremented head or the current head.

Applying those special-case representations to our merging strategy, we get this solution:

  def sumRec(a: Int, b: Int)(n: Int) = {
    def iter(ma: Int, mb: Int, t: Int): Int = {
      val m = ma min mb
      def next(mx: Int, x: Int) =
        if (mx == m) mx + x else mx
      if (m < n)
        iter(next(ma, a), next(mb, b), t + m)
      else
        t
    }
    iter(a, b, 0)
  }

The local variable m is the head of the merged streams, the inner next function accomplishes the equivalent of dropWhile, and the iter function simultaneously merges the streams (represented by ma and mb) and accumulates the total (in t).

The iter function is directly tail-recursive; it either terminates with the result or recursively invokes itself with adjusted arguments. Because of this, the Scala compiler actually compiles iter into a tight loop occupying only a handful of JVM bytecodes. It is by far the fastest of all the solutions we've examined!
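In later versions of Scala (2.8 onward, which postdate this post), that claim can even be made compiler-checked: the @tailrec annotation causes compilation to fail if the marked function is not tail-recursive. A sketch, applying it to the code above:

```scala
import scala.annotation.tailrec

def sumRec(a: Int, b: Int)(n: Int): Int = {
  @tailrec  // compilation fails here if iter is not tail-recursive
  def iter(ma: Int, mb: Int, t: Int): Int = {
    val m = ma min mb
    def next(mx: Int, x: Int) = if (mx == m) mx + x else mx
    if (m < n) iter(next(ma, a), next(mb, b), t + m) else t
  }
  iter(a, b, 0)
}

println(sumRec(3, 5)(1000))  // 233168
```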

Take my code, please!

For me, there's a punch line to this series. As implied above, I thought of this last version only after writing sumStream; I created sumRec by rethinking the code to take advantage of the special cases in the problem. I find it interesting that the fastest function was created by first completely ignoring performance and thinking about making the design as clean and modular as possible.

As I've believed for years, it's easier to make correct code faster than to make fast (but buggy) code correct.


Comments

  • Russ  On 2008-04-11 at 02:51

    Hi Joel,

    I've really enjoyed this series, are you going to do any more? You've somewhat trumped me for detail and analysis, I'll have to up my game for problem 4 😉

  • OJ  On 2008-07-23 at 21:10

    This is similar to the extra answer I gave in my blog post which looks like this:

    sum $ [3,6..999] ++ [5,20..999] ++ [10,25..999]

    Performance-wise I think it's a definite win. Though I haven't verified it by running it through a profiler 🙂

    Cheers!

  • clinton  On 2008-09-07 at 04:19

    FIX (sorry, less than symbol didn't show up, i think cuz no html tags)
    what about this for a scheme

    A --> [multiples (a, 2a, …)] --> [less than n] --> [sum] --> Z (warp bcuz cant do angled lines)

    B --> [multiples (b, 2b, …)] --> [less than n] --> [sum] --> Z (warp bcuz cant do angled lines)

    Z --> [sum] -------->

    i think it avoids merging and also keeps it functional rather than sequential.

  • joelneely  On 2008-09-07 at 07:11

    @clinton: If I understand your proposed scheme, it has a bug in that duplicates (e.g. when using multiples of 3 and 5, multiples of 15) get accumulated twice.
