[Larceny-users] Parallel assignment optimization and complicatedness.
Jed Davis
jdev at panix.com
Wed Feb 7 21:56:03 EST 2007
Once upon a time, I was curious about Larceny/Twobit's code generation,
and so wound up with this simple function:
(lambda (v)
(let loop ((n 0) (a 0))
(if (>= n 26) a (loop (+ n 1) (fx+ a (vector-ref v n))))))
This, on SPARC (with Larceny v0.93) with 'fast-unsafe, compiles to:
4 or %g0, 0, %r3
8 subcc %r2, 104, %g0 ! 0x68
12 bl,a #36
16 add %r2, 4, %r7
20 ld [ %stkp + 4 ], %o7
24 jmpl %o7 + 8, %g0
28 or %g0, %r3, %result
32 add %r2, 4, %r7
36 add %r1, 1, %tmp0
40 ld [ %tmp0+%r2 ], %r6
44 or %g0, %r7, %r2
48 add %r3, %r6, %r3
52 subcc %timer, 1, %timer
56 bne,a #12
60 subcc %r2, 104, %g0 ! 0x68
64 jmpl %globals + 1376, %o7 ! timer-exception
68 add %o7, -64, %o7
Which is nice, except that the increment is evaluated before the
vector-ref, and thus requires a temporary -- and I recalled having read
about parallel assignment optimization, which is supposed to prevent
exactly that.
So, after far too much use of trace, I found that: the PAO in pass 4 (as
mentioned by the old notes) wasn't being terribly pointful, because the
subexpressions had all been flattened into registers by the conversion
to A-normal form in pass 3; and that, while A-normal-form itself might
have applied argument reordering, the code it got had the vector-ref call
expanded to a beta-redex festooned with safety checks.
This last was a problem because complicated? considers any non-primitive
call to be complicated, and would also give up early due to the
expression's size. So, for the sake of experiment, I adjusted
complicated? to consider a call of a lambda with an uncomplicated body
the same way as a primitive (which may well not be the right thing in
general, but this is just a heuristic) and raised the budget (and made
it tunable with set!). And that fixes that (and should be attached to
this message), though I haven't yet tried to measure what if any effect
it has on actual performance.
But it gets more fun. I was originally looking at this on Intel, where
a stack frame and lots of unnecessary moves are introduced[*]:
0: 31 d2 xor %edx,%edx
2: 31 ff xor %edi,%edi
4: 83 fa 68 cmp $0x68,%edx
7: 0f 8c 06 00 00 00 jl 0x13
d: 89 fb mov %edi,%ebx
f: ff 75 04 pushl 0x4(%ebp)
12: c3 ret
13: 83 ed 18 sub $0x18,%ebp
16: 3b 6c 24 0c cmp 0xc(%esp),%ebp
1a: 7d 0a jge 0x26
1c: 83 c5 18 add $0x18,%ebp
1f: ff 54 24 14 call *0x14(%esp)
23: 90 nop
24: eb ed jmp 0x13
26: c7 45 00 14 00 00 00 movl $0x14,0x0(%ebp)
2d: 31 db xor %ebx,%ebx
2f: 89 5d 04 mov %ebx,0x4(%ebp)
32: 89 75 0c mov %esi,0xc(%ebp)
35: 89 7d 14 mov %edi,0x14(%ebp)
38: 89 5d 10 mov %ebx,0x10(%ebp)
3b: 8d 5a 04 lea 0x4(%edx),%ebx
3e: 89 5c 24 7c mov %ebx,0x7c(%esp)
42: 8b 54 11 01 mov 0x1(%ecx,%edx,1),%edx
46: 89 55 10 mov %edx,0x10(%ebp)
49: 8b 54 24 7c mov 0x7c(%esp),%edx
4d: 8b 7d 10 mov 0x10(%ebp),%edi
50: 8b 5d 14 mov 0x14(%ebp),%ebx
53: 01 fb add %edi,%ebx
55: 89 df mov %ebx,%edi
57: 83 c5 18 add $0x18,%ebp
5a: ff 4c 24 08 decl 0x8(%esp)
5e: 0f 85 a0 ff ff ff jne 0x4
64: ff 54 24 18 call *0x18(%esp)
68: e9 97 ff ff ff jmp 0x4
This is a little hideous. Felix (in private mail) noticed that raising
*nhwregs* makes it not do that (but doesn't seem to help general
performance). With my change to complicated?, it also doesn't do that:
0: 31 d2 xor %edx,%edx
2: 31 ff xor %edi,%edi
4: 83 fa 68 cmp $0x68,%edx
7: 0f 8c 06 00 00 00 jl 0x13
d: 89 fb mov %edi,%ebx
f: ff 75 04 pushl 0x4(%ebp)
12: c3 ret
13: 8b 5c 11 01 mov 0x1(%ecx,%edx,1),%ebx
17: 89 5c 24 7c mov %ebx,0x7c(%esp)
1b: 89 fb mov %edi,%ebx
1d: 03 5c 24 7c add 0x7c(%esp),%ebx
21: 89 df mov %ebx,%edi
23: 8d 52 04 lea 0x4(%edx),%edx
26: ff 4c 24 08 decl 0x8(%esp)
2a: 0f 85 d4 ff ff ff jne 0x4
30: ff 54 24 18 call *0x18(%esp)
34: e9 cb ff ff ff jmp 0x4
There continues to not be a stack frame with safety enabled. (Note also
the unnecessary soft register, but lamenting that is beyond the scope of
this message.)
[*] Obtained by writing the code vector to a file and calling an
external disassembler.
--
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l)))))) (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k))))))) '((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)))))
-------------- next part --------------
Index: src/Compiler/pass3anormal.sch
===================================================================
--- src/Compiler/pass3anormal.sch (revision 3972)
+++ src/Compiler/pass3anormal.sch (working copy)
@@ -133,6 +133,8 @@
(define A-normal-form-declaration (list 'anf))
+(define *anf-complication* 100)
+
(define (A-normal-form E . rest)
(define (A-normal-form E)
@@ -530,7 +532,7 @@
(define (complicated? exp)
; Let's not spend all day on this.
- (let ((budget 10))
+ (let ((budget *anf-complication*))
(define (complicated? exp)
(set! budget (- budget 1))
(if (zero? budget)
@@ -546,11 +548,15 @@
((begin? exp) (some? complicated?
(begin.exprs exp)))
((call? exp) (let ((proc (call.proc exp)))
- (if (and (variable? proc)
- (prim-entry (variable.name proc)))
- (some? complicated?
- (call.args exp))
- #t)))
+ (if (cond
+ ((variable? proc)
+ (prim-entry (variable.name proc)))
+ ((lambda? proc)
+ (not (complicated? (lambda.body proc))))
+ (else #f))
+ (some? complicated?
+ (call.args exp))
+ #t)))
(else (error "Unrecognized expression" exp)))))
(complicated? exp)))
More information about the Larceny-users
mailing list