[PRL] basic terminology: abstraction

Thu Aug 21 12:37:24 EDT 2003

This is a little embarrassing, but I'm not sure I understand some of the 
most basic vocabulary we use in CS. It seems that the term "abstraction" is 
badly abused: sometimes people use it when they really mean modularity, 
sometimes when they mean encapsulation. I think encapsulation is pretty 
clear and I'm not as much interested in that at the moment. But there are 
three other concepts I can think of, and I'm not sure how to map them to 
the correct terms.

(1) generalization -- I think this is the real meaning of abstraction; it's 
the process you go through when you pull out the common parts of multiple 
specific parts of a program (computation, system, idea, etc.) into a more 
general version, which is parameterized on the parts that vary across the 
specific instances. E.g., in FP, functions are the primary mechanism for 
generalization: first-class functions generalize common calculations, and 
higher-order functions are generalized calculations parameterized on other 
calculations.

(2) modularity -- Separating mutually independent parts of a program so 
that (a) the code becomes more legible (you are only ever looking at one 
conceptually cohesive part of the program at once) and (b) changing the 
program is easier because changes are localized to the module to which they 
relate.

(3) layering -- This is the one I don't quite know how to classify. 
Sometimes you hear "levels of abstraction" and "abstraction boundaries," 
but I'm not sure that abstraction is the right term (I'm not sure it's not, 
either). Typical examples are the view of systems as layers of languages or 
virtual machines (e.g., gates -> microcode -> assembly -> OS -> JVM -> Java 
application) or the OSI network protocol stack. In a sense this is 
modularity, because it's separating a system into conceptually disjoint 
pieces, but it's vertical rather than horizontal, so I think it's worth 
distinguishing from modularity. I don't, however, see how this relates to 
the idea of generalization.

I'm most unclear on the third term. I've always felt uncomfortable with the 
term "abstraction boundaries" -- I have an intuitive sense of when a smelly 
program crosses these boundaries, but I'm not comfortable with the label.

A recent example of where using the word "abstraction" for these two 
different ideas leads to confusion is Guido van Rossum's addled blog entry 
about loops and recursion:

    I'd say that a loop is a higher-level concept than recursion; recursion
    is more powerful, but also more low-level, like assembly language. ...
    I had my epiphany about loops as a higher level of abstraction than
    recursion ...

    <http://www.artima.com/weblogs/viewpost.jsp?forum=106&thread=4550>

If loops are higher-level than recursion, then they are a *specialization* 
of recursion. Calling them a "higher level of abstraction" makes it sound 
like they are *more* abstract, when in fact he's saying the opposite. (The 
result is that anyone who's heard Matthias's lecture on "Scheme's 241 
Loops" choked on their breakfast upon reading Guido's blog.)

Thanks,
Dave