Category Archives: JPR

Why Data Structures Matter

Our experience on Day 0 of JPR11 yielded a nice example of the need to choose an appropriate implementation of an abstract concept. As I mentioned in the previous post, we experimented with Michael Barker’s Scala implementation of Guy Steele’s parallelizable word-splitting algorithm (slides 51-67). Here’s the core of the issue.

Given a type-compatible associative operator and a sequence of values, we can fold the operator over the sequence to obtain a single accumulated value. For example, because addition of integers is associative, addition can be folded over the sequence:

1, 2, 3, 4, 5, 6, 7, 8

from the left:

((((((1 + 2) + 3) + 4) + 5) + 6) + 7) + 8

or the right:

1 + (2 + (3 + (4 + (5 + (6 + (7 + 8))))))

or from the middle outward, by recursive/parallel splitting:

((1 + 2) + (3 + 4)) + ((5 + 6) + (7 + 8))
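
In Scala, the three shapes can be written directly. Here is a minimal sketch (the recursive splitter is my own illustration, not code from the talk):

    val xs = Vector(1, 2, 3, 4, 5, 6, 7, 8)

    // Left fold: ((((((1 + 2) + 3) + 4) + 5) + 6) + 7) + 8
    val left = xs.reduceLeft(_ + _)

    // Right fold: 1 + (2 + (3 + (4 + (5 + (6 + (7 + 8))))))
    val right = xs.reduceRight(_ + _)

    // Recursive split: ((1 + 2) + (3 + 4)) + ((5 + 6) + (7 + 8))
    // (assumes a non-empty vector; the two halves are independent,
    // so they could be evaluated in parallel)
    def reduceTree(v: Vector[Int]): Int =
      if (v.length == 1) v.head
      else {
        val (l, r) = v.splitAt(v.length / 2)
        reduceTree(l) + reduceTree(r)
      }

    // Associativity guarantees all three agree (here, 36).
    assert(left == right && right == reduceTree(xs))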

A 2-D view shows even more clearly the opportunity to evaluate sub-expressions in parallel. Assuming that addition is a constant-time operation, the left fold:

[Figure: left fold]

and the right fold:

[Figure: right fold]

require linear time, but the balanced tree:

[Figure: balanced tree]

can be done in logarithmic time.

But the associative operation for the word-splitting task involves accumulating lists of words. With a naive implementation of linked lists, appending is not a constant-time operation; it is linear in the length of the left operand. So for this operation the right fold is linear in the size of the task:

[Figure: right fold with linear-time appends]

the left fold is quadratic:

[Figure: left fold with linear-time appends]

and the recursive/parallel version is linear:

[Figure: balanced tree with linear-time appends]

Comparing just the “parallel-activity-versus-time” parts of those diagrams makes it clear that the right fold is as fast as the parallel version, and also does less total work:

[Figure: parallel activity versus time, right fold and balanced tree compared]
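
The asymmetry is easy to reproduce, because Scala’s list concatenation (:::) copies its left operand. A small sketch of my own (not the Roundup code) that folds one-word lists together:

    // Each word is carried as a one-element list; the fold appends them.
    val words: List[List[String]] = List.tabulate(1000)(i => List("w" + i))

    // Left fold: the growing accumulator is the *left* operand of every
    // append, so copying costs 0 + 1 + 2 + ... + (n - 1): quadratic.
    val byLeft = words.foldLeft(List.empty[String])(_ ::: _)

    // Right fold: every left operand is a single word, so the total
    // copying cost is n: linear.
    val byRight = words.foldRight(List.empty[String])(_ ::: _)

    // The balanced tree copies n/2 at the root, twice n/4 below it, and
    // so on: O(n log n) total work, but only O(n) on the critical path.
    assert(byLeft == byRight)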

Of course, there are other ways to implement the sequence-of-words concept, and that is the whole point. This little example provides a nice illustration of how parallel execution of the wrong implementation is not a win.
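
One standard escape is to accumulate into a representation whose concatenation is cheap, and flatten only once at the end. As a sketch of the idea (my own toy example, not the implementation we tried), a miniature “conc-list” gives constant-time concatenation:

    // A toy conc-list: concatenation just builds a node, so it is O(1).
    sealed trait Conc[+A]
    case object Empty extends Conc[Nothing]
    final case class Leaf[A](a: A) extends Conc[A]
    final case class Cat[A](l: Conc[A], r: Conc[A]) extends Conc[A]

    def cat[A](l: Conc[A], r: Conc[A]): Conc[A] = (l, r) match {
      case (Empty, _) => r
      case (_, Empty) => l
      case _          => Cat(l, r)
    }

    // Flatten once, at the end, in time linear in the number of elements.
    def toList[A](c: Conc[A]): List[A] = {
      def go(c: Conc[A], acc: List[A]): List[A] = c match {
        case Empty     => acc
        case Leaf(a)   => a :: acc
        case Cat(l, r) => go(l, go(r, acc))
      }
      go(c, Nil)
    }

With cat as the combining operator, every combining step is constant-time, so the left fold, the right fold, and the balanced tree all do linear total work, and the parallel version regains its logarithmic critical path (plus one linear flatten at the end).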


The title of this post is a not-very-subtle (but respectful) reference to “Why Functional Programming Matters”, an excellent summary by John Hughes that deserves to be more widely read.

Purely Functional Data Structures, by Chris Okasaki, covers a nice collection of algorithms that avoid the trap mentioned above.


Also, video of Daniel Spiewak’s session on functional data structures in Scala, from this year’s Northeast Scala Symposium, is now on-line.

JPR 2011 Day 0

Despite the winter storm and avalanche warnings in the Crested Butte area, the flight into Gunnison went off without a hitch. The last few miles of descent were as bumpy as an old dirt road, but the sky was mostly clear, as were the roads leaving the airport.

[Photo: the sky over Gunnison]

A few miles uphill, the story was a bit different, but our Alpine Express driver got us to our door smoothly and safely.

[Photo: the road uphill]

Day Zero is traditionally reserved for explorations of new technologies, alternate languages on the JVM, and collaborative skills-building. I joined a group that focused on Scala. We talked about Guy Steele’s parallel string-splitting algorithm as presented at StrangeLoop 2010, got a demo of parallel collections in Scala 2.9, and worked on Scala koans.
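
For anyone who missed that demo, the gist of the 2.9 feature is that calling .par on an ordinary collection yields a parallel counterpart whose bulk operations are spread across cores. A rough sketch:

    val nums = (1 to 10000).toArray

    // Sequential map and sum.
    val s = nums.map(_ * 2).sum

    // Parallel: .par produces a parallel collection, and map/sum are
    // then distributed across the available cores.
    val p = nums.par.map(_ * 2).sum

    // Addition is associative, so the two results agree.
    assert(s == p)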

[Photo: working on Scala koans]

After I pointed the folks at my table to Steele’s published slides and summarized the algorithm, Michael Barker immediately re-coded it in Scala and we started looking at performance. More on that later.

Inspired by the excellent Ruby koans work led by Jim Weirich, Dick Wall had begun working on Scala koans at a previous year’s CodeMash. Dianne Marsh took up the reins; the informal team continues to refine and add to the material, and welcomes additional participants. The Scala koan collection is currently a work in progress; despite a rough edge or two, I really like this approach to easing into the thought processes of a language and very much appreciate the work of the team.

The end-of-day semi-lightning-talk presentations demonstrated a breadth of interests and subjects that will likely help shape this year’s Roundup:

  • Michael and I discussed our observations on the performance of the parallel string splitter;
  • Dianne and Dick described the Scala koans and their current status;
  • Joe Sundow summarized the jQuery and JavaScript MVC explorations that he had led through the day;
  • James (“I’m just a Flex guy!”) Ward showed how he’s using Spring and Hibernate on a current demo project;
  • Fred Simon gave a quick demo of Fantom, including its ability to compile to JavaScript.

Handerson Gomes and Jim Hurne get my “Wow!” vote for a joint presentation which really made us all sit up and take notice. In the course of one day, they downloaded the Android development kit, got themselves up to speed, and built and tested a small app which included audio and touch-screen gestures.

It was a great start to what promises to be an excellent week. In other words, it was typical for a Roundup!

From Whose Perspective?

Java Posse Roundup 2010 is now history, and I’m still digesting and pondering. But one “Aha!” moment was worth posting quickly.

During a discussion on productivity and job satisfaction, a participant stated a view that I suspect many of us have shared: “If I can get to the office early, I can get my work done before the distractions and interruptions begin.” After a moment in which many of us nodded appreciatively, Dianne Marsh replied, “But it’s all work.”

Earlier in the same session, several of us had mentioned “learning” or “helping others learn” as one of the joys of our craft. But it occurred to me that perhaps there was an implicit “…what I want to learn” tacked onto the end.

I am not defending pointless interruptions or feckless meetings. But I benefit from Dianne’s reminder that the programmer’s equivalents of taking out the trash and washing the dishes are still valuable parts of the day. Flow is good. So is individual accomplishment. But so are balance and avoiding tunnel vision.