Sliding up the banister

Sneaking up on functional programming

Design by proof

The idea of proving programs correct has been around (and hotly debated) for roughly forty years. The irony is that some proof advocates anticipated by more than thirty years what the agile advocates are now saying:

Designing with verification in mind improves the quality of the resulting design.

To be even more specific, compare these statements:

Design the proof,
then write the code so that
it meets the obligations of the proof.
  Write the test first,
then write the code so that
it passes the test.
Edsger Dijkstra   (paraphrased)   Kent Beck   (paraphrased)

I’ve experienced the benefits of following both pieces of advice. Let me apply this approach to a little bit of design which I’ll then use in the Project Euler series. There’s a small amount of algebra involved, but if you don’t want to bother with that, let me request that you at least look at the last section for the punchline.

Sum thing

Consider the following sums, where i in each case ranges from 1 to n:

Power Sum Expanded form Closed form Factored form
0 ∑ i0 1 + 1 + 1 + … + 1 n1 n
1 ∑ i1 1 + 2 + 3 + … + n 1/2 n2 + 1/2 n1 n(n+1)/2
2 ∑ i2 12 + 22 + 32 + … + n2 ?
3 ∑ i3 13 + 23 + 33 + … + n3 ?

Let’s assume that we need the solutions for the last two rows (and Google is down at the moment ;-) ). Using the “designing with verification in mind” approach, we can work them out for ourselves.

Sum of squares

Noticing that ∑ i0 is a polynomial of degree 1, and ∑ i1 is a polynomial of degree 2, we might suspect that the sum of i2 is a polynomial of degree 3. If so, the problem amounts to finding the coefficients for ∑ i2 = an3 + bn2 + cn + d. Let’s assume that as the structure of the solution, and apply test cases to that assumption.

Test case 0: The sum must be zero when n is zero; a03 + b02 + c0 + d = d, therefore d must be zero. That streamlines the polynomial to ∑ i2 = an3 + bn2 + cn.

Test case 1: Adding (n+1)2 to the sum for n must be equal to the result of substituting (n+1) for n in the polynomial.

  an3 + bn2 + cn + (n+1)2
an3 + (b+1)n2 + (c+2)n + 1
and  
  a(n+1)3 + b(n+1)2 + c(n+1)  
an3 + (3a+b)n2 + (3a+2b+c)n +(a+b+c)

These two reduced expressions can only be equal if we can satisfy the equations implied by the coefficients for n2, n1, and n0:

n?   Coefficients   Satisfied if
n2   b+1 = 3a+b   a = 1/3
n1   c+2 = 3a+2b+c   b = 1/2
n0   1 = a+b+c   c = 1/6

So we have a solution of 1/3 n3 + 1/2 n2 + 1/6 n, which we can factor to n(n+1)(2n+1)/6.

Sum of cubes

Based on that success, we’ll pursue the coefficients for ∑ i3 = an4 + bn3 + cn2 + dn + e.

Test case 0: The sum must be zero when n is zero; a04 + b03 + c02 + d0 + e = e, therefore e must be zero, therefore ∑ i3 = an4 + bn3 + cn2 + dn.

Test case 1: Adding (n+1)2 to the sum for n must be equal to the result of substituting (n+1) for n in the polynomial.

  an4 + bn3 + cn2 + dn

+ (n+1)3
an4 + (b+1)n3 + (c+3)n2 + (d+3)n + 1
and  
  a(n+1)4 + b(n+1)3 + c(n+1)2 + d(n+1)  
an4 + (4a+b)n3 + (6a+3b+c)n2 + (4a+3b+2c+d)n + (a+b+c+d)

Again, we look at the coefficients for descending powers of n:

n?   Coefficients   Satisfied if
n3   b+1 = 4a+b   a = 1/4
n2   c+3 = 6a+3b+c   b = 1/2
n1   d+3 = 4a+3b+2c+d   c = 1/4
n0   1 = a+b+c+d   d = 0

So we have a solution of 1/4 n4 + 1/2 n3 + 1/4 n2, which we can factor to (n(n+1)/2)2.

The punchline

“That’s no exponential, that’s my polynomial!” But seriously, folks…

The mathematicians among us–those who haven’t died of boredom–will recognize what we’ve done as an informal (hand-waving) inductive proof.

But I hope that the programmers among us will recognize what we’ve done as test-driven development and (re)factoring. Allowing verification to drive design offers a number of benefits, including:

  • The result itself is verifiable.
  • Each step along the way is more obvious and involves less risk.
  • The result is usually cleaner and more understandable.

Incidentally, I first encountered the term “factoring” applied to programs in Leo Brodie’s book Thinking Forth (1984). Given my background in Mathematics, it immediately clicked with me. Brodie introduced the idea with a quotation which seems entirely contemporary, once we get past the dated terminology:

“If a module seems almost, but not quite, useful from a second place in the system, try to identify and isolate the useful subfunction. The remainder of the module might be incorporated in its original caller.”

(The quotation is from “Structured Design“, by W.P. Stevens, G.J. Myers, and L.L. Constantine, IBM Systems Journal, vol. 13, no. 2, 1974, © 1974 by International Business Machines Corporation)

As a working developer, I’m excited by the potential of these techniques and the underlying ideas. But as a card-carrying member of the Sid Dabster fan club, I’m constantly amazed at the number of ideas that were actively in play in our field thirty or forty years ago that are now regarded as new, innovative, or even radical.


Recommended reading:

2008-04-30 Posted by joelneely | Mathematics, Project Euler, agile development, algorithms, tutorial | | 2 Comments

JPR 2008: Wednesday morning, part 1

This post resumes coverage from my notes on the Java Posse Roundup, 2008 conference.

Agile methodologies on large projects

The conventional wisdom is that agile development is a game for small teams. This session examined that issue, informed by the experience of attendees who have worked on large development projects.

Getting started on large projects

The discussion began with some opinions and suggestions which apply to getting started with agile methodologies:

  • Agile works best (only?) if it is driven from the bottom up.
  • Refactoring requires permission and inclination:
    • Individual developers must be allowed to make pervasive changes to the codebase.
    • Individual developers must be willing to follow the consequences of a change through to the end.
  • Lead by example.
    • Start with a pilot project with a small number of developers.
    • Use the success of that project as a “poster child” to the rest of the organization.
  • Modularity is a key for success.

A couple of the participants had been involved in a large, distributed project. The team comprised 500 developers (in a company which had a total of 1200 developers) located in India, Israel, the Bay area, Houston, and Europe. They were able implement continuous integration across that size and geographic distribution.

Agile is stereotypically described in terms of small teams. What is “large”?

  • Small: one team
  • Medium: 2–3 teams
  • Large: more than 3 teams

One effort discussed had a limit of 6 people per team.

How does a project achieve modularity and team structure? One choice is to spend an iteration on defining the modular structure. Some agile methodologies emphasize delivering working, business-value code in each iteration. But kick-starting a multi-team effort may require using an iteration to get structure (of the modules and the teams!) in place.

However, there can be built-in tension with a layered effort that separates “framework” from “application”. The framework team(s) are naturally going to have a bias in favor of flexibility, to make the framework as general as possible. The application team(s) are naturally going to have a bias in favor of stability, to minimize the rework of application code due to framework evolution.

I think we can generalize this to any agile multi-team scenario where there’s a common interface or (especially) a producer-consumer relationship. It’s crucial to maximize communication and minimize “us-versus-them” thinking. Otherwise it will be natural to see “my team’s changes” as natural or necessary agile evolution, versus “your team’s changes” as “churn”.

There was some discussion of the roles of “gatekeeper” and “contributor”, and the separation of those roles as a way to help achieve stability. However, needs to be balanced by strong advocacy for the view that “anybody can fix the code” if the overall effort is to remain agile.

Engaging the business owner

The discussion turned to the non-developer aspects of an agile project. Getting a product owner may be much harder than getting buy-in from the technical side. There may be a greater need for education, as developers may be more likely to be familiar with the concepts of agile projects. However, the pain is still less than that of being stuck in waterfall mode. The consensus was to get the business side involved as early and thoroughly as possible.

There’s another contrast here. Because business value is achieved by integrating the total effort, the business owner for a large project is likely to fit most naturally at a cross-group level. However, agility demands level-of-effort to be understood at a fairly fine-grained level. Mike Cohn’s books on agile were highly recommended. I’ve put links to a couple of his books at the end.

Estimating and scoping

Estimates are critical to the communication from developers to the product owner and to planning (at all levels). How does the team come up with estimates level of effort?

Most of the participants leaned toward estimating in “points” rather than hours or other calendar units. One common scheme is to assess points along a Fibonacci scale: 1, 2, 3, 5, 8, 13, 21, etc. What’s the difference between a 5 and an 8? It’s simply the developers’ best-effort assessment of relative effort based on complexity. Product backlog is measured in points, and the sprint backlog is extracted from that. The planning conversation addresses the questions:

  • How long is a sprint?
  • What is our velocity (points per sprint)?
  • How many sprints will it take for this feature?

The focus on continuous learning, communication, and adjustment makes this a very different conversation from the stereotypical “Gannt chart mentality” that just wants to put milestones on a calendar, set in concrete. Priorities can adjust, based on experience earned during the project.

Barry Hawkins pointed out that an important concept to keep in front of the product owner is: “You can have anything you want, but you can’t have everything you want!” It’s also critical to communicate the concept of “technical debt” to the product owner. (There are some good on-line discussions of technical debt which provide a comprehensive discussion, a quick, business-oriented metaphor, some nice visual aids, and reminders of the development process aspects.) Participants also referenced Mary Poppendiek’s work on lean software.

Barry also offered the view that agile methodologies don’t address some issues that become important on larger projects, including the cross-feature dependencies and how to partition functionality. He suggested domain-driven design as a good fit with agile development practices as a way to address these issues.

As the session was wrapping up, Barry offered a summary that seems applicable to any introduction of agile practice. He has observed that politics only diminished after the entire team (product owner, development, etc.) had a backlog of experience that showed the productivity benefits of the agile approach.


Recommended reading

2008-04-16 Posted by joelneely | JavaPosse, agile development | | No Comments Yet