Why Data Structures Matter

Our experience on Day 0 of JPR11 yielded a nice example of the need to choose an appropriate implementation of an abstract concept. As I mentioned in the previous post, we experimented with Michael Barker’s Scala implementation of Guy Steele’s parallelizable word-splitting algorithm (slides 51-67). Here’s the core of the issue.

Given a type-compatible associative operator and a sequence of values, we can fold the operator over the sequence to obtain a single accumulated value. For example, because addition of integers is associative, addition can be folded over the sequence:

1, 2, 3, 4, 5, 6, 7, 8

from the left:

((((((1 + 2) + 3) + 4) + 5) + 6) + 7) + 8

or from the right:

1 + (2 + (3 + (4 + (5 + (6 + (7 + 8))))))

or from the middle outward, by recursive/parallel splitting:

((1 + 2) + (3 + 4)) + ((5 + 6) + (7 + 8))
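
In Scala, for instance, all three shapes can be written out directly. A minimal sketch (reduceTree is a hypothetical helper written here for illustration, not a standard-library method):

    val xs = Seq(1, 2, 3, 4, 5, 6, 7, 8)

    // Left fold: ((((((1 + 2) + 3) + 4) + 5) + 6) + 7) + 8
    val fromLeft = xs.reduceLeft(_ + _)

    // Right fold: 1 + (2 + (3 + (4 + (5 + (6 + (7 + 8))))))
    val fromRight = xs.reduceRight(_ + _)

    // Recursive splitting: ((1 + 2) + (3 + 4)) + ((5 + 6) + (7 + 8))
    // The two halves are independent, so they could run in parallel.
    def reduceTree(s: Seq[Int]): Int =
      if (s.length == 1) s.head
      else {
        val (left, right) = s.splitAt(s.length / 2)
        reduceTree(left) + reduceTree(right)
      }

    // Associativity guarantees all three agree (here, 36).
    assert(fromLeft == fromRight && fromRight == reduceTree(xs))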

A 2-D view shows even more clearly the opportunity to evaluate sub-expressions in parallel. Assuming that addition is a constant-time operation, the left fold:

[Figure: left fold with constant-time addition]

and the right fold:

[Figure: right fold with constant-time addition]

require linear time, but the balanced tree:

[Figure: balanced-tree fold with constant-time addition]

can be done in logarithmic time.
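
As an aside, Scala's parallel collections expose exactly this kind of tree-shaped reduction. A sketch (assuming the scala-parallel-collections module, which is where .par lives on Scala 2.13 and later; the logarithmic bound also assumes enough processors to keep every level of the tree busy):

    import scala.collection.parallel.CollectionConverters._

    // reduce may split the work into a balanced tree, because we
    // promise it an associative operator; with enough processors
    // the additions on each level can run at the same time.
    val sum = (1 to 8).par.reduce(_ + _) // 36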

But the associative operation for the word-splitting task involves accumulating lists of words. With a naive implementation of linked lists, appending is not a constant-time operation; it is linear in the length of the left operand. So for this operation the right fold is linear in the size of the task:

[Figure: right fold with linear-time append]

the left fold is quadratic:

[Figure: left fold with linear-time append]

and the recursive/parallel version is linear:

[Figure: balanced-tree fold with linear-time append]
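
The difference is easy to reproduce in code. Here is the naive append in miniature (a sketch; Scala's List ::: does essentially this copying):

    // Naive append on an immutable singly linked list: it must copy
    // every cell of its left operand before sharing the right one.
    def append[A](left: List[A], right: List[A]): List[A] = left match {
      case Nil     => right
      case x :: xs => x :: append(xs, right) // one copied cell per left element
    }

    // Each word starts out as a one-element list.
    val words = List("the", "quick", "brown", "fox").map(w => List(w))

    // Right fold: every append has a one-word left operand,
    // so the total cost is 1 + 1 + ... + 1 (linear).
    val viaRightFold = words.reduceRight((l, r) => append(l, r))

    // Left fold: the growing accumulator is always the left operand,
    // so the total cost is 1 + 2 + ... + (n-1) (quadratic).
    val viaLeftFold = words.reduceLeft((l, r) => append(l, r))

    assert(viaRightFold == viaLeftFold) // same answer, very different cost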

Comparing just the “parallel-activity-versus-time” parts of those diagrams makes it clear that right fold is as fast as the parallel version, and also does less total work:

[Figure: parallel activity versus time, right fold compared with balanced tree]

Of course, there are other ways to implement the sequence-of-words concept, and that is the whole point. This little example provides a nice illustration of how parallel execution of the wrong implementation is not a win.
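
For the sequential case, one familiar alternative (a sketch, and not the approach from the talk) is to choose a representation whose cheap operation matches the fold direction: prepend in constant time during a left fold, then reverse once at the end:

    // n constant-time prepends plus one linear reversal: linear overall.
    def collectWords(words: List[String]): List[String] =
      words.foldLeft(List.empty[String])((acc, w) => w :: acc).reverse

For the parallel case, a structure with cheap concatenation (a tree of segments, or one of the structures in Okasaki's book mentioned below) keeps the balanced fold from repeatedly re-copying its growing operands.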


The title of this post is a not-very-subtle (but respectful) reference to “Why Functional Programming Matters”, an excellent summary by John Hughes that deserves to be more widely read.

Purely Functional Data Structures, by Chris Okasaki, covers a nice collection of algorithms that avoid the trap mentioned above.


Also, video from this year’s Northeast Scala Symposium session on functional data structures in Scala, presented by Daniel Spiewak, is now on-line.


Comments

  • Sharkey  On 2011-03-06 at 09:53

    Comparing just the “parallel-activity-versus-time” parts of those diagrams makes it clear that ***right*** fold is as fast as the parallel version, and also does less total work.

    I believe you meant “right” fold. Other than that, excellent post!

    • joelneely  On 2011-03-06 at 11:30

      Yes, thanks for catching my typo. I’ve corrected it.

  • Blogreader  On 2011-03-06 at 10:58

    How many addition steps are done in the balanced tree approach vs unbalanced? It looks the same to me.
    If you can do all of the additions on one row at the same time, you’ll have log N steps.

    • joelneely  On 2011-03-06 at 11:39

      In the case of e.g. addition, the total work is the same for all three (ignoring communication overhead). But for the list append operation in the last part of the post (using a naive implementation), there is extra work due to traversing the accumulating left-side elements more than once. The total number of “steps” is the same (if one append counts as one step), but the amount of work done per step increases moving from the leaves to the root.

  • Fred  On 2011-03-06 at 17:01

    Great post, what did you use to make those graphs?

    • joelneely  On 2011-03-06 at 20:03

      Thanks, Fred! The diagrams were made with OmniGraffle.

  • Edward  On 2011-03-09 at 13:18

    Hi @Joelneely,

    Please forgive my ignorance, can you kindly explain why this is so? I.e. “right fold is linear on the size of the task … the left fold is quadratic:”?

    Many thanks!

    Ed

    • joelneely  On 2011-03-12 at 12:32

      Hi, Edward,

      The naive append implementation for immutable lists walks its left argument, creating a copy as it goes, ending with a reference to its right argument when the left copy bottoms out.

      Looking at the tree, we can see that the right fold always has a left argument of length 1, so 1+1+…+1 is linear. But the left fold has progressively longer left arguments because the accumulation is happening on the left, giving 1+2+3+…+(n-1) = n(n-1)/2, which is quadratic.

  • discretestates  On 2012-01-06 at 22:01

    When discussing the different ways to perform the sums of the integers, you claim that the balanced tree approach is performed in log N time. This is not correct: count your number of additions. You still have to perform every addition. The only way it could be logarithmic in time is if you threw out half of the values at every step, e.g., the way a binary search works. Obviously, with the addition, you don’t do that.

    • joelneely  On 2012-01-07 at 07:39

      Actually, it is correct — count the number of levels in the diagram. The four additions at the bottom of the tree are all done in parallel, the two additions above those sums are done in parallel, and the final addition at the root is done last. All seven additions are still performed, but in only three units of time.

      • discretestates  On 2012-01-07 at 11:54

        You’re mixing issues. N additions are still being made.

        If you have enough processors relative to the size of your data set, then the execution time will come out logarithmic. If you don’t have enough processors, then it won’t take logarithmic time. Long story short, I feel that there are several assumptions in that analysis which you just brushed over without explaining.
