[PRL] laziness

Wed Jun 23 12:10:13 EDT 2004

Joe Marshall wrote:
>>foldl (+) 0 [0 .. 10000000]
>>
>>fails to terminate: runs out of memory.
> 
> 
> I believe that may be an obscure bug.  The same thing happens when you
> use streams in Scheme if you are not careful about how storage is
> maintained.

No, it's the 'correct' behavior. Well, it sorta depends on whether 
strictness optimizations are guaranteed. Ifso then yes, this might be a 
bug in the same vein as ommiting a guaranteed tail-call optimization. 
However, the big lesson from lazy languages is that strictness analysis 
is both very important and very difficult.

 > Richard is lazy

Since I'm shirking writing regression tests right now, I'll work it out:

Each step along the way, I build a thunk of the (+ hd acc), and since 
the main body is tail recursive, I HAVE to walk to the end of the list 
until I return anything.

In a rewriting system where let bound expressions implicitly allocate a 
thunk in the heap (incidentally how ghc's core syntax works), it would 
rewrite like so:

foldl (+) 0 [0..1000000]
-->
let acc = (+ 0 0)   // a thunk
in foldl (+) acc [1..1000000]
-->
let acc = (+ 0 0)   // a thunk
in let acc = (+ 1 acc)   // a thunk
in foldl (+) acc [2..1000000]
--> ... -->
let acc = (+ 0 0)   // a thunk
in let acc = (+ 1 acc)   // a thunk
...
in acc

It gets worse.

As soon as I return this result and try to print it, I'll  discover that 
the innermost acc thunk needs the value of (+ 99999999 acc), so I push 
(+ 10000000 acc) on the stack and try to force the next-to-innermost. 
Of course, we continue this way until we reach  (+0 0), which we can 
evaluate.  By then we'll have run out of stack.

In lazy languages, tail recursive functions need linear stack space!!!

Strictness

Now, if we could figure out that + is strict in its right argument AND 
that we needed the result of the foldl (because + may be expensive), it 
is safe-for-divergence to evaluate each expression directly, instead of 
creating a thunk, transforming the program (again in ghc interal lang: 
case forces its expression) to the strict version:

foldl (+) 0 [0..100000]
-->  // meta syntax
case (+0 0) of acc -> foldl (+) acc [1..1000000]   // -> is base syntax
-->
foldl (+) 0 [1..1000000]
-->
case (+1 0) of acc -> foldl (+) acc [2..1000000]
-->
foldl (+) 1 [1..1000000]
...

which runs like we'd expect it to.

I'd imagine that ghc's strictness analyser gets this simple example 
right (I can't recall the state-of-the-art in such analysis; if it 
doesn't work, try defining foldl locally)