Daily Archives: 2008-04-24

A composite function

The previous post began working with problem 3 from Project Euler. The allFactors function produced a list of the prime factors of its argument (all occurrences of all prime factors), such as:

  scala> val nums = allFactors(96000)
  nums: List[Long] = List(5, 5, 5, 3, 2, 2, 2, 2, 2, 2, 2, 2)

It would be nice to have a representation that’s a bit closer to the conventional 53 × 31 × 28, which makes clear the exponent for each prime factor.

In thinking about solutions to that issue, we get a prime use case for Scala’s composite nature as a hybrid OO/FP language. In addition, we encounter another heuristic for avoiding errors (functional or otherwise).

A separate piece

One easy solution is to write a separate function that collapses a list of values into a list of value–count pairs, for which we can define a type alias as:

  type PrimePower = (Long, Int)

The partition method on List is perfect for our current need; given a condition (represented as a boolean-valued function), it splits the list into the prefix that satisfies the condition and the remainder.

  scala> nums.partition(_ == 5)
  res82: (List[Long], List[Long]) = (List(5, 5, 5),List(3, 2, 2, 2, 2, 2, 2, 2, 2))

Of course, the length of the prefix is the exponent, so the function almost writes itself:

  def groupCount(fs: List[Long]): List[PrimePower] = fs match {
    case Nil => Nil
    case p :: _ =>
      val (ps, rest) = fs.partition(_ == p)
      (p, ps.length) :: groupCount(rest)
  }

If factor list is empty, the result is an empty list. On the other hand, if fs begins with some prime p, we split the prefix containing all ps from the rest of the list, and return a list containing the PrimePower for p consed onto the groupCounted rest of the original list.

  scala> groupCount(nums)              
  res86: List[()] = List((5,3), (3,1), (2,8))

The advantages of writing this as a separate function are ease of testing and availability for re-use. On the other hand, it would be interesting to see the consequences of directly producing the prime powers.

One list to bind them

For reference, here’s the function from last time, with the inner alternatives labeled for reference and highlighting on the parts we need to consider:

  def allFactors(t: Long) = {
    def loop(q: Long, fs: List[Long], d: Long): List[Long] =
      if (q == 1)
        fs                        // case 1
      else if (q < d * d)
        q :: fs                   // case 2
      else if (q % d == 0)
        loop(q / d, d :: fs, d)  // case 3
      else
        loop(q, fs, d + 1)        // case 4
    loop(t, Nil, 2)
  }

The result will be List[PrimePower], which is the new type of the second argument and result of loop.

Case 2 deals with a quotient which we know to be prime, because all possible divisors have been exhausted. Given that q occurs exactly once, we replace q :: fs with (q, 1) :: fs as the result.

Case 3 requires the most thought. It only accounts for a single occurrence of a divisor; q / d removes one factor of d from q and d :: fs saves that occurrence in the list being accumulated.

Help me, Rhonda!

We could use a helper function to handle d, based on whether it is another occurrence of the previous divisor or a new divisor not previously seen. Because that distinction is based on data in the list, it seems natural to drive the logic by matching on the list:

  def countFactor(f: Long, fs: List[PrimePower]) = fs match {
    case (d, c) :: rest if f == d => (d, c + 1) :: rest
    case _                        => (f, 1) :: fs
  }

To test countFactor, we need to iterate it over a sequence of values for f with fs initially Nil. This looks like a job for a fold!

  def testCountFactor(ls: List[Long]) =
    ls.foldRight(Nil: List[PrimePower])(countFactor(_, _))

In the interest of space, I won’t include all the test scenarios: Nil, a list of length 1, all values equal, all values different, etc. As the old saying goes, checking those is “left as an exercise for the reader”.

True confessions

Substituting a call to countFactor into case 3, along with the other changes already discussed, gives us this…

  def allFactorsCounted(t: Long) = {
    def loop(q: Long, fs: List[PrimePower], d: Long): List[PrimePower] =
      if (q == 1)
        fs                                    // case 1
      else if (q < d * d)
        (q, 1) :: fs                          // case 2
      else if (q % d == 0)
        loop(q / d, countFactor(d, fs), d)    // case 3
      else
        loop(q, fs, d + 1)                    // case 4
    loop(t, Nil, 2)
  }

…which behaves as follows…

  scala> val nums2 = allFactorsCounted(96000)                                            
  nums2: List[()] = List((5,1), (5,2), (3,1), (2,8))

…or, I should say, “misbehaves as follows”! What went wrong?

The root cause of this error is the special-case code in case 2, which is doing two things at once: handling a prime factor (in its own way!) and terminating the recursion. Put another way, the design breaks the symmetry between cases 2 and 3, both of which identify a prime factor and do something with it. We can make that symmetry more obvious by rewriting case 2 as in the following:

  def allFactorsCounted(t: Long) = {
    def loop(q: Long, fs: List[PrimePower], d: Long): List[PrimePower] =
      if (q == 1)
        fs
      else if (q < d * d)
        loop(1,   countFactor(q, fs), d)
      else if (q % d == 0)
        loop(q / d, countFactor(d, fs), d)
      else
        loop(q, fs, d + 1)
    loop(t, Nil, 2)
  }

And now each case in the inner loop has exactly one responsibility:

  1. Terminates the computation because all factors have been found;
  2. Removes q as a known factor;
  3. Removes d as a known factor;
  4. Moves past an unsuccessful divisor.

Mixing concerns in a single part of the code creates these risks:

  • The code can become harder to understand;
  • Symmetries can be hidden or broken;
  • Change driven by (only) one concern can have unintended consequences.

For me, the moral of this error is this:

Each part of the code (especially alternatives within a single function) should “Do one thing, and do it well!”

An object lesson

In the OO world, primitive obsession is regarded as a code smell. Notice that q and fs together represent the current state of our factorization; we have the invariant that q times the product of fs equals the original number to be factored! Failing to maintain that invariant would likely break the code. The clue to watch for is that the (tail-)recursive calls of the inner loop function either alter both of those parameters or neither of them. This is telling us that they are related aspects of a single concept.

OO gives us a technique for managing related data: put them in an object as private fields, expose the minimum methods required by the object’s client, and ensure that those methods maintain the invariants. We can refactor the quotient and factor list into the following class:

  class Factorization(val n: Long, val fs: List[PrimePower]) {
    def isComplete = n == 1
    def move(d: Long): Factorization = {
      def remove(i: Int, q: Long, d: Long): Factorization =
        if (q % d == 0)
          remove(i + 1, q / d, d)
        else
          new Factorization(q, (d, i) :: fs)
      if (n % d == 0)
        remove(0, n, d)
      else
        this
    }
  }

Each instance of Factorization represents some stage of the factorization of a number by n (the non-factored part) and fs (the factors previously found). An instance can tell whether it is a complete factorization (all factors have been extracted). Finally, an instance can return a Factorization resulting from move-ing all occurrences of a given divisor from the non-factored part to the factor list.

Notice that a Factorization is immutable; the result of a move will be the same instance if the divisor isn’t a factor, or a new instance resulting from the extraction of that divisor.

With this class defined, our factorization function is even more conceptually streamlined:

  def factorize(n: Long): List[PrimePower] = {
    def loop(f: Factorization, d: Long): Factorization = {
      if (f.isComplete)
        f
      else if (f.n < d * d)
        loop(f.move(f.n), d)
      else
        loop(f.move(d), d + 1)
    }
    loop(new Factorization(n, Nil), 2).fs
  }

If the factorization is complete, then its factor list is the desired result. Otherwise, ask the factorization to remove a candidate divisor and continue with the result of that request. The inner loop function is only concerned with managing the sequence of divisors until the factorization is complete.

So the second moral of this post is this:

Whenever part of a function’s parameter list represents some concept, consider making that explicit by using a class for the concept.

Doing so will allow that concept to be implemented and tested independently, and will improve the clarity of the remaining code. Although the example in this post is a very small programming task, I think it makes a nice demonstration of the value of Scala’s hybrid nature.

Follow

Get every new post delivered to your Inbox.