Wednesday, July 30, 2008

Forcing Abstraction

I just read the post: Rail Spikes: Functional programming and looping by Jon. It's a nice post, and I completely agree - you should never write for/while/do-while loops.

My gripe here is that this post has to exist at all. Everyone should know this: it is fundamental to have collections and to operate on the collection itself, not through indices into it [1]. IMO, this should be taught from day 1 in programming courses. Kudos to Ruby for getting the syntax clean - the left-to-right parsing is easier on (English-reading) programmers than Lisp's inside-out parsing.

It seems to me that "abstraction" is generally applied only to the classes people write, and not much thought is given to lower-level constructs. In this case: looping constructs.

I find it interesting that many people won't move away from for/while/do-while loops. In my previous job, people actively resisted such efforts. I remember introducing BOOST_FOREACH into our C++ development (a poor replacement for map, but a step in the right direction), and I had to pull teeth to get folks to use it. And, as awkward as the STL algorithms may be to use, they're still easier than rolling your own loop to remove elements.
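As a rough illustration of that last point, here is a minimal sketch contrasting a hand-rolled removal loop with the standard erase-remove idiom. The function names and the choice of "remove the negative numbers" are made up for this example, and it assumes C++11 or later for the lambda.

```cpp
#include <algorithm>
#include <vector>

// Hand-rolled version: the iterator bookkeeping is the part people get wrong.
// erase() invalidates `it`, so we continue from its return value and only
// advance when nothing was erased.
void remove_negatives_by_hand(std::vector<int>& values) {
    for (auto it = values.begin(); it != values.end(); /* advanced below */) {
        if (*it < 0) {
            it = values.erase(it);
        } else {
            ++it;
        }
    }
}

// STL version (erase-remove idiom): std::remove_if moves the kept elements
// to the front and returns the new logical end; erase() trims the tail.
void remove_negatives(std::vector<int>& values) {
    values.erase(std::remove_if(values.begin(), values.end(),
                                [](int v) { return v < 0; }),
                 values.end());
}
```

The second version is not pretty, but there is no loop variable to mismanage, and it does a single pass instead of shifting the tail of the vector on every erase.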

I think Jon has taken a (permanent) step up in abstraction: he's moved away from thinking about the mechanics of iteration to focusing on what he wants to get done. Isn't that what we're taught in our first class on abstraction?

The main reasons people seem to give for not taking the step Jon has are:

  1. language doesn't support it (Java/C++/C/whatever)
  2. need for speed
  3. for loops are easy

Language. The first is legitimate, and I don't see an easy way around this. I find it sad when languages do not allow this, but my whining doesn't make the problem go away. Let's move on.

Speed. Funny how most everyone thinks their code just needs to be fast and they just know the code they're writing is fast. We should all trust Knuth when he said "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." If you really think you're smarter than Knuth, there's nothing I can say that will ever help you.

Easy. The third point, in my opinion, boils down to short-sightedness. People can be too narrow-minded (lazy) to want to learn something new. They might not see the fact they could remove an entire class of bugs from their code, forever. Folks don't realize the visual/mental clutter for loops add to their code. And, most importantly, for loops focus on the wrong thing (the indices, as opposed to the transformation) - see abstraction.
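To make the "wrong focus" point concrete, here is a small sketch of the same computation written both ways. The function names and the tax example are hypothetical, chosen only for illustration; it assumes C++11 or later.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Index-based loop: the reader has to check the start value, the bound,
// and every subscript before seeing what the code is for.
std::vector<double> with_tax_loop(const std::vector<double>& prices, double rate) {
    std::vector<double> result(prices.size());
    for (std::size_t i = 0; i < prices.size(); ++i) {
        result[i] = prices[i] * (1.0 + rate);
    }
    return result;
}

// Transformation-based version: there is no index to get wrong; the only
// thing spelled out is what happens to each element.
std::vector<double> with_tax(const std::vector<double>& prices, double rate) {
    std::vector<double> result;
    result.reserve(prices.size());
    std::transform(prices.begin(), prices.end(), std::back_inserter(result),
                   [rate](double price) { return price * (1.0 + rate); });
    return result;
}
```

Off-by-one errors and mismatched subscripts are possible in the first function and simply cannot be written in the second.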

On a related abstraction tangent, a colleague of mine at Intel started writing a geometric template library for C++. One of the guiding ideas (not novel, but perhaps pushed to the extreme) was to remove 'if' statements from the code, as well as to remove all references to X/Y/Z. As an example, you might want to know whether the points a,b,c form a concave right angle [2]. One way is to have four different checks looking like:
if ((a.x == b.x &&
     a.y < b.y &&
     b.y == c.y &&
     b.x < c.x)
    || ...)

repeated 3 more times with slight variation. Or, you could write a one-liner like so:

bool concave = (a.towards(b).left() == b.towards(c));

Teams of people were shown this example, contrasting 20 lines of code against the one-liner, and the response was almost universally, "so? what's the problem?"

I am truly stunned and left speechless when people cannot comprehend the difference in readability/maintainability between 20 lines of (mind-numbingly repetitive and error-prone) code and one line. We can debate whether the single line is simply readable or more elegant, but surely we can agree that the 20 lines are horrific.

After finally bridging that gap, people immediately ask why the GTL library does not provide direct access to the X/Y/Z coordinates. The question is usually phrased as, "Ok, ok, the one-liner is better. But what if I want to use the X/Y version?" Nobody has given an example where the isotropic (aka coordinate-free) code fails to provide something the X/Y interface provides. Yet people still cling to wanting that interface [3].
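For readers who want to see how a one-liner like the one above could work without exposing coordinates, here is a minimal sketch. It is not the actual GTL interface: the Direction and Point classes, their members, and is_concave are all hypothetical. It assumes Manhattan geometry, distinct consecutive points, and screen-style axes (y grows downward), a choice made here so that the case spelled out in the if-statement above evaluates to true.

```cpp
// Hypothetical sketch; not the real GTL API.
class Direction {
public:
    // Listed counter-clockwise as seen on screen, so left() is "next".
    enum Value { East, North, West, South };

    constexpr explicit Direction(Value v) : value_(v) {}

    // The direction on your left-hand side when facing this direction.
    constexpr Direction left() const {
        return Direction(static_cast<Value>((value_ + 1) % 4));
    }

    constexpr bool operator==(Direction other) const {
        return value_ == other.value_;
    }

private:
    Value value_;
};

class Point {
public:
    constexpr Point(int x, int y) : x_(x), y_(y) {}

    // Direction of travel from this point to `other`. Assumes the points
    // differ and share exactly one coordinate (Manhattan geometry), and
    // that y grows downward, so a larger y means South.
    constexpr Direction towards(Point other) const {
        return x_ == other.x_
            ? Direction(other.y_ > y_ ? Direction::South : Direction::North)
            : Direction(other.x_ > x_ ? Direction::East : Direction::West);
    }

private:
    int x_, y_;  // Deliberately private: callers never touch X/Y.
};

// The one-liner from the post, wrapped in a function.
constexpr bool is_concave(Point a, Point b, Point c) {
    return a.towards(b).left() == b.towards(c);
}
```

With these assumptions, is_concave(Point(0, 0), Point(0, 5), Point(3, 5)) is true and is_concave(Point(0, 0), Point(0, 5), Point(-3, 5)) is false. The coordinate comparisons still exist, but they live in exactly one place, inside towards(), rather than being repeated in every caller.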

I'm of the opinion we should force the level of abstraction up.

New languages should not have for/while/do-while loops, and (in this example), a geometric library should not provide an interface to X/Y/Z coordinates.

Personally, I'd rather not work with you if you can't wrap your mind around a higher level of abstraction [4].

Footnotes:

  1. Sure, there are occasions in high-performance code where you might need to deal directly with indices, or write a for loop, but 97% of the time you don't, so give it up.

  2. This example assumes Manhattan geometry, a common assumption in the EDA world.

  3. Yes, when you draw a rectangle, it might be clearer to access the X/Y coordinates directly. However, you'd better write that code in only one location. And why isn't your graphics library providing this for you in the first place?

  4. Interestingly, the GTL library provides a significant speed improvement over other libraries because its isotropic bent allowed it to get rid of if statements, which kill performance. I also found this post on high level optimization interesting, as well as Yegge's talk on dynamic languages (specifically on optimizations they can make).

3 comments:

jon said...

Nice post. Ruby could do without its while loop and no one would notice the loss. I think some bare looping functionality is useful (ping a server every 10 seconds to see if it is alive), but even that could be eliminated through recursion, right?

I also think that Yegge's story about the 15 million line assembler program is a perfect example of why abstraction is generally more important than speed, or at least should be chosen by default.

Unknown said...

Though the basic idea is nice (abstractions lead to more concise and readable code), one must remember that abstractions do tend to leak and that one shouldn't accidentally give away resources (speed/memory). Maybe you will be interested in this article about optimization.

Unknown said...

There's actually an entirely-valid reason for the use of numeric offsets into arrays: when you're writing C/C++/Java, and have arrays that are in fact simply contiguous memory buffers. Abstraction is great for application code, but sometimes you're doing low-level systems work that really does need to peek and poke at individual bytes of memory.

That being said, the template support in C++ and generic types + inner classes in Java make a lot of higher-level visitor-based iteration possible, if programmers care to use it.