[PRL] laziness
Johan Ovlinger
johan at ccs.neu.edu
Wed Jun 23 12:10:13 EDT 2004
Joe Marshall wrote:
>>foldl (+) 0 [0 .. 10000000]
>>
>>fails to terminate: runs out of memory.
>
>
> I believe that may be an obscure bug. The same thing happens when you
> use streams in Scheme if you are not careful about how storage is
> maintained.
No, it's the 'correct' behavior. Well, it sorta depends on whether
strictness optimizations are guaranteed. Ifso then yes, this might be a
bug in the same vein as ommiting a guaranteed tail-call optimization.
However, the big lesson from lazy languages is that strictness analysis
is both very important and very difficult.
> Richard is lazy
Since I'm shirking writing regression tests right now, I'll work it out:
Each step along the way, I build a thunk of the (+ hd acc), and since
the main body is tail recursive, I HAVE to walk to the end of the list
until I return anything.
In a rewriting system where let bound expressions implicitly allocate a
thunk in the heap (incidentally how ghc's core syntax works), it would
rewrite like so:
foldl (+) 0 [0..1000000]
-->
let acc = (+ 0 0) // a thunk
in foldl (+) acc [1..1000000]
-->
let acc = (+ 0 0) // a thunk
in let acc = (+ 1 acc) // a thunk
in foldl (+) acc [2..1000000]
--> ... -->
let acc = (+ 0 0) // a thunk
in let acc = (+ 1 acc) // a thunk
...
in acc
It gets worse.
As soon as I return this result and try to print it, I'll discover that
the innermost acc thunk needs the value of (+ 99999999 acc), so I push
(+ 10000000 acc) on the stack and try to force the next-to-innermost.
Of course, we continue this way until we reach (+0 0), which we can
evaluate. By then we'll have run out of stack.
In lazy languages, tail recursive functions need linear stack space!!!
Strictness
Now, if we could figure out that + is strict in its right argument AND
that we needed the result of the foldl (because + may be expensive), it
is safe-for-divergence to evaluate each expression directly, instead of
creating a thunk, transforming the program (again in ghc interal lang:
case forces its expression) to the strict version:
foldl (+) 0 [0..100000]
--> // meta syntax
case (+0 0) of acc -> foldl (+) acc [1..1000000] // -> is base syntax
-->
foldl (+) 0 [1..1000000]
-->
case (+1 0) of acc -> foldl (+) acc [2..1000000]
-->
foldl (+) 1 [1..1000000]
...
which runs like we'd expect it to.
I'd imagine that ghc's strictness analyser gets this simple example
right (I can't recall the state-of-the-art in such analysis; if it
doesn't work, try defining foldl locally)
More information about the PRL
mailing list