[PRL] In praise of mandatory indentation for novice programmers

Mitch mwand1 at gmail.com
Mon Feb 25 17:28:51 EST 2008


Sent to you by Mitch via Google Reader: In praise of mandatory
indentation for novice programmers via Teaching, Playing, and
Programming by Chris Okasaki on 2/25/08

About four years ago, I created my own programming language for
teaching. I'll probably write more about this language at some other
time, but for now I want to focus on one feature of the language: the
use of mandatory indentation. My experience with this aspect of the
language has been so overwhelmingly positive that I will never again
voluntarily use a language without mandatory indentation for teaching
novice programmers.

Of course, sometimes the choice of language is not under my control.
Even when it is, there are always many different factors that go into
that choice. But no other single factor I've run across has greater
significance. For example, programming language aficionados spend
endless hours arguing about static vs dynamic typing, or functional vs
object-oriented languages, or strict vs lazy evaluation, or...you get
the idea. Those differences can indeed be important, but more so for
experienced programmers working on large projects than for novice
programmers working on classroom projects. None of these differences
individually comes close to the issue of indentation.

I say this with some pain, because I'm a programming languages guy
myself. I've taken part in some of those arguments, and spent many
hours contemplating the relative merits of many much deeper programming
language properties. It hurts me to say that something so shallow as
requiring a few extra spaces can have a bigger effect than, say,
Hindley-Milner type inference. I wish it weren't so, but that is what
my classroom experience tells me, loudly and unambiguously.

Why not mandatory indentation?
The vast majority of languages don't make indentation mandatory.
Instead, they usually use explicit syntax to indicate block structure,
such as { and }, or BEGIN and END. Yet, if you look at well-written
programs in those languages, they are almost always indented sensibly.
Furthermore, there's remarkably little disagreement as to what
"sensible" indentation looks like. So why not make that sensible
indentation mandatory? There are several reasons that are often put
forth:
- It's weird. Because the vast majority of languages don't use it, most
programmers aren't used to the idea. Therefore, there's an initial
sense of unease.
- It messes up the scanner/parser. True, mandatory indentation is
harder to deal with using traditional scanners and parsers based
strictly on regular expressions and context-free grammars,
respectively. But it's usually trivial to modify the scanner to keep
track of indentation and issue an INDENT token when indentation
increases, and one or more OUTDENT tokens when indentation decreases.
The parser can then treat these tokens just like normal BEGIN/END
keywords. In this approach the scanner is no longer based strictly on
regular expressions, but most scanners aren't anyway (for example, when
dealing with nested comments). Using the INDENT/OUTDENT tokens, the
parser can still be based strictly on context-free grammars.
- Don't try to take away my freedom! Programmers are a pretty
libertarian bunch. Anytime somebody tries to impose rules that they
follow 99% of the time anyway, they always focus on the 1% exceptions.
For indentation, these exceptions often involve what to do with lines
that are too long. So yeah, a language with mandatory indentation should
deal gracefully with that issue. Or sometimes the exceptions involve
code that is nested 20 levels deep. But these cases are almost always
easy to rewrite into an equivalent but shallower structure. One place
where I tend to deliberately break indentation rules is with temporary
debugging output. I often leave such print statements unindented, so
that they're easier to find when it's time to take them out. This is
convenient, but I can certainly live without it.
- I don't want people to be able to read my code! Maybe some people
view obfuscated code as job security. As a different example, the
former champion in the TopCoder programming contest, John Dethridge,
was famous for never indenting. Why? Because in TopCoder, there is a
"challenge" phase, where other competitors look at your code and try to
find bugs. So there's an incentive to make your code hard for other
competitors to understand. I remember teasing him about this once, and
he said laughingly "Beware my left-justified fury!" I replied that I'd
be more afraid if his fury was right justified.
- It doesn't scale. As programs get bigger, both in lines of code and
in number of programmers, you run into more mismatches in indentation.
For example, you might want to move or copy a loop that was nested 5
levels deep to another location nested 3 levels deep. Or you might need
to integrate code written by programmers that used different numbers of
spaces per indentation level. Refactoring tools can certainly help
here. But, you know, if you're the sort of programmer who would leave
the indentation messed up when you moved that loop, just because your
language didn't require you to fix it, then I probably don't want to
work with you anyway.
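
The INDENT/OUTDENT scheme from the scanner/parser point above can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not the scanner of the teaching language discussed
here: the name layout_tokens and the LINE token kind are made up for the
example, only leading spaces are counted (no tabs), and each non-blank
line is passed through whole instead of being split into finer tokens.

```python
def layout_tokens(source):
    """Yield (kind, text) pairs, bracketing indented blocks with INDENT/OUTDENT.

    Hypothetical helper for illustration; assumes indentation uses spaces only.
    """
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # deeper than the enclosing block: open exactly one new block
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # shallower: close one block per indentation level we leave
            stack.pop()
            yield ("OUTDENT", "")
        if width != stack[-1]:
            # the line dedents to a width that never opened a block
            raise SyntaxError(f"inconsistent indentation: {line!r}")
        yield ("LINE", line.strip())
    # close any blocks still open at the end of the input
    while len(stack) > 1:
        stack.pop()
        yield ("OUTDENT", "")


program = """\
while x > 0:
    if x % 2 == 0:
        print(x)
    x = x - 1
print("done")
"""

kinds = [kind for kind, _ in layout_tokens(program)]
print(kinds)
```

A parser consuming this stream can treat INDENT and OUTDENT exactly as it
would treat BEGIN and END, so the grammar itself stays context-free; the
only state lives in the scanner's stack of indentation widths.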

What about novices?
Most of the objections above don't really apply to novices. Programming
is new to them so it's all weird anyway. They have no idea what
scanners and parsers are. As teachers, we already take away a lot of
their freedoms anyway, and we certainly want them to care if somebody
(namely us!) can read their code. And novices are usually not going to
be writing large enough programs for the scaling issues to be a big
problem.

Ok, but what are the benefits for novices?
- They are already used to the idea of indentation. Both from writing
outlines in English class and from nested bullet lists in the (almost)
ubiquitous PowerPoint, novices already have experience with the idea of
indicating grouping using indentation. This makes such languages much
easier for novices to learn. In contrast, explicit markers such as
curly braces or BEGIN/END keywords are something novices have much less
experience with. However natural such markers might seem to us, they
are not natural for novices, and are a constant source of mistakes.
(Worse, a typical novice strategy for dealing with those mistakes is to
randomly insert or delete braces until it compiles--a strategy Peter Lee
used to call "programming by random perturbation".)
- Less is more. Or, put another way, smaller is better. To the novice,
a fifteen-line program is less intimidating than a twenty-line program,
and a program that fits on one page is much easier to understand than a
program that spans multiple pages. Those extra lines taken up by
explicit braces or BEGIN/END keywords really add up. Even if you use a
style that puts a { at the end of the previous line, the } still
usually goes on a line by itself. I shudder now every time I look at a
Java program and see a code fragment like ... } } } } } } Note that I
am not advocating compressing everything into as few lines as possible
(a la Perl Golf). Nor am I saying that all redundancy is bad. But in
this case, the redundancy of explicit markers was hurting more than it
was helping.
- Mandatory indentation promotes good habits. I've taught plenty of
novices in languages that did not require indentation. If the language
doesn't require it, they won't do it, or at least not consistently. If
they are using an IDE that indents for them, fine, but sometimes they
need to write code in a primitive editor like Notepad, and then they
just won't bother. Even if I require the final submission to be
properly indented, all too often they will do all their development
without indentation, and then indent the code just before turning it in
(kind of like the typical novice approach to commenting). Of course,
indenting after the fact means that they don't get any of the benefits
from indenting their code, such as making debugging easier.
On the other hand, if the language makes indentation mandatory, then
the novice needs to keep their indentation up to date during the entire
development cycle, so they will reap those benefits. Since I started
using this language, I've also noticed improved indentation habits even
when students switch to other languages without mandatory indentation.
I can at least hope that this habit is permanent, although I have no
evidence to back that up.

A surprise
I was shocked by how much the mandatory indentation seemed to help my
students. I did not come into this expecting much of a change at all. I
had experience with mandatory indentation in a couple of languages
(most notably Haskell), and I had found it to be a pleasant way to
code. Also, I had heard good things about people using Python in the
classroom. However, I was by no means a convert at the time that I was
designing my language.

I had two motivations for making indentation mandatory in the language.
First, this language was designed to be the second language most of the
students saw, and I wanted to expose them to a range of language ideas
that they had not seen before. For example, their first language used
static typing so I made my language use dynamic typing. Similarly,
their first language did not make indentation mandatory, so I took the
opposite route in my language. My second motivation was simply that I
was annoyed. I was tired of students coming to me with code that was
either completely unindented or, worse, randomly indented. I figured
that making the compiler enforce indentation was the surest way to stop
this.

Imagine my surprise when I started teaching this language and found the
students picking it up faster than any language I had ever taught
before. As fond as I am of the language, I'm certainly under no
illusions that it's the ultimate teaching language. After carefully
watching the kinds of mistakes the students were and were not making, I
gradually realized that the mandatory indentation was the key to why
they were doing better. This seemed to manifest itself in two ways, one
obvious and one more subtle. The obvious way was that they were simply
spending much less time fighting the syntax.

The more subtle way was that they appeared to be finding it easier to
hold short code fragments in their head and figure out exactly what the
fragment was doing. I conjecture that there may be some kind of
seven-plus-or-minus-two phenomenon going on here, where adding extra
lines to a code fragment in the form of explicit braces or BEGIN/END
keywords pushes the code fragment above some size limit of what novices
can hold in their heads. This wouldn't affect expert programmers as
much, because they see beneath the braces to the chunks underneath, but
novices live at the level of syntax.

Whatever the explanation, I'm now a convert to the power of mandatory
indentation for novices. I've never taught Python, but I suspect those
who have may have had similar experiences. If so, I'd love to hear from
you.


