[Larceny-users] Commutativity

Fri Feb 23 14:00:05 EST 2007

On Mon, Feb 12, 2007 at 11:58:48AM -0500, William D Clinger wrote:
> 
> Exactly right.  I think the order-of-evaluation
> is being chosen without regard to whether the
> expressions are operands to a primitive.  When
> they are, the first operand should be evaluated
> last unless there is some good reason not to.

Now that I've looked at this more closely: anf-call does have a special
case for the arguments to a primitive, but it doesn't do anything
differently from a regular call.  And it's not even the wrong thing,
provided that the arguments aren't considered complicated.

Before r3999, (car x) and (cdr x) were complicated, because they expand
to a LET, so that the argument may be type-tested and then operated on.
After r3999, they're no longer complicated, and the right thing is done.
That is, I seem to have already accidentally fixed this.

Also, a LET uses the reverse of whatever order of evaluation a call with
the same arguments would get; this does not yet make sense to me.

> The best solution would be to rewrite the code
> generator yet again, but a good interim solution
> would be to improve the choice of order of
> evaluation, assuming the R6RS lets us do that.

Changing the order of evaluation would help when the first argument
isn't already in a register, but something like (vector-ref x (car y))
is still out of luck.
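
To make the cost difference concrete, here is a toy move-counting model
in C.  It assumes, purely for illustration, that a two-operand primitive
takes its first operand in RESULT and its second in a general register,
that a computed operand such as (car y) always lands in RESULT, and that
a variable reference such as x already sits in a register.  The names
operand_kind, moves_left_to_right and moves_first_operand_last are
hypothetical and are not taken from Larceny's sources.

```c
/* Toy model only: counts register-to-register moves, nothing else. */
typedef enum {
    IN_REGISTER,  /* operand is a variable already held in a register */
    COMPUTED      /* operand is an expression whose value lands in RESULT */
} operand_kind;

/* Evaluate the second operand first, then the first operand. */
static int moves_first_operand_last(operand_kind first, operand_kind second)
{
    int moves = 0;
    if (second == COMPUTED)
        moves += 1;  /* RESULT -> scratch register, to free RESULT */
    if (first == IN_REGISTER)
        moves += 1;  /* register -> RESULT, where the primitive wants it */
    return moves;
}

/* Evaluate the first operand first, then the second operand. */
static int moves_left_to_right(operand_kind first, operand_kind second)
{
    int moves = 0;
    if (first == COMPUTED && second == COMPUTED)
        moves += 2;  /* save first out of RESULT, reload it afterwards */
    if (second == COMPUTED)
        moves += 1;  /* RESULT -> scratch register */
    if (first == IN_REGISTER)
        moves += 1;  /* register -> RESULT */
    return moves;
}
```

Under these assumptions, (vector-ref x (car y)) is the
IN_REGISTER/COMPUTED case, which costs two moves in either order, while
two computed operands drop from three moves to one when the first
operand is evaluated last.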

So, I was thinking about that, and my current rumination (which has
eaten far too much time not to be shared) is that this can be considered
a special case of a backend having access to some storage that is more
efficient than a soft register or stack slot, but unsuitable for global
assignment as a hard register.  Thus, one may wish that the backend
could use such storage to cache values that nominally reside in memory.

In this case, it's RESULT; the machine-independent code generation
doesn't know where the machine-dependent peepholes might be able to make
it vacant.  For the x86 backend at least, TEMP/SECOND could also be a
candidate; and maybe there are even times when one of the more important
MacScheme SPRs could be temporarily displaced in favor of other use.

Petit Larceny, meanwhile, has an arbitrarily-sized supply of C local
variables, which can't survive indirect control transfer, but should be
more amenable to optimization by the C compiler than array slots.
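
As a rough sketch of that idea in C: the first function below keeps
every intermediate value in slots of a register array, while the second
caches them in C locals and writes back only what is still needed on
exit.  The machine struct, the slot numbering and both function names
are hypothetical, chosen for illustration, and do not reflect Petit
Larceny's actual generated code.

```c
/* Hypothetical stand-in for a register file that lives in memory. */
enum { REG_COUNT = 8 };
typedef long word;
typedef struct { word reg[REG_COUNT]; } machine;

/* Sum 1..reg[1] into reg[2], keeping the counter in reg[3].
   Every access goes through memory that other code can also reach,
   which limits what the C compiler may keep in hardware registers. */
static void sum_in_slots(machine *m)
{
    m->reg[2] = 0;
    for (m->reg[3] = 1; m->reg[3] <= m->reg[1]; m->reg[3]++)
        m->reg[2] += m->reg[3];
}

/* Same computation, with the slots cached in C locals.  The locals
   vanish when this function returns to its caller, so anything still
   live must be stored back first; the dead counter is simply dropped. */
static void sum_in_locals(machine *m)
{
    word n = m->reg[1];
    word acc = 0;
    for (word i = 1; i <= n; i++)
        acc += i;
    m->reg[2] = acc;  /* write back the only value that is still live */
}
```

Both functions leave the same sum in reg[2]; the second never touches
reg[3] at all.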

However, this is less useful if the backend has to store back to memory
values that pass4 knows will never be used.  The transient-operand-ness
of RESULT provides some help here, as designed, but is unhelpful when
(for example) testing a value's type and then extracting something from
it.  (I get the impression that separating those operations above the
MacScheme machine layer is a more recent development.)

I'm left wondering whether enough information already exists for such an
optimization (there are no callee-save registers on the MacScheme
machine, right?), or whether an instruction specified to destroy the
contents of a general register might become desirable at some point in
the future.

-- 
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l))))))  (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k)))))))    '((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)))))