[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Scheme-reports] procedure identity

Hash: SHA1

On 06/05/2013 11:40 AM, Taylan Ulrich B. wrote:
> Per Bothner <per@x> writes:
>> The problem is I would like to allow the following to return true
>> (as it does in Kawa):
>> (define (maybe-negate negp)
>>    (if negp (lambda (x) (- x))
>>        (lambda (x) x)))
>> (eq? (maybe-negate #t) (maybe-negate #t))
>> ;; Likewise for eqv?
> I don't understand why this should be a problem.  The semantics proposed
> by Will retain the notion of a location tag, and merely change it from
> being the procedure's identity (which eq? test for) to being an extra
> trait of a procedure (which eqv? uses to report operational equivalence
> despite differing (or nonexistent) identity).  But neither R5RS nor
> Will's proposal require eq? or eqv? to return #f when the location tag
> differs; if an implementation can solve the halting problem and always
> detect operationally equivalent procedures, by all means it ought to be
> allowed to. :)

As I see it, the root of the matter (from an implementation perspective)
is this.

To an implementation, a first-class closure consists of a
vector/record-like object stored in memory, containing a pointer to some
code and a bunch of closed-over variables. "lambda" is a construct that,
when evaluated, gathers the closed-over variables and allocates a
closure-thing and puts them in along with the hardcoded code pointer.

It would make sense to deduplicate identical code, so that (lambda (x)
x) need only be compiled once.

It may or may not make sense to deduplicate operationally equivalent
closures - that have the same (or equivalent in some other way) code
pointers, and eqv? closed-over variables, and that don't set! their
variables so they will remain eqv?.

It may well make sense to deduplicate closures with *no* closed-over
variables, in essence creating them once rather than every time (lambda
...) is "executed", so that Per's example returns #t for the eq? and the
eqv? tests. However, we must be slightly careful - people routinely take
advantage of the uniqueness of conses that are never mutated. But if we
offer a split into mutable and immutable conses in future, and allow
immutable conses to be shared so that (eq? (cons 1 2) (cons 1 2)) may be
#t but (eq? (mcons 1 2) (mcons 1 2)) must be #f, then a similar split
with mutable and immutable closures might be considered consistent! And
also, arguably, immutable closures should be more loosely-specified than
potentially-mutable-but-never-mutated cons cells, not only under the
banner of allowing compiler optimisations, but because equivalence of
functions is a trickier thing to specify in the abstract mathematical
realm to begin with (definitions that would require near-infinite time
to test do not count!), while the semantics of a mutable cons cell are a
little clearer to begin with.

What is the "location tag" of a procedure? Perhaps it's the location of
the closure-thing or, for implementations with garbage collectors that
like to move things around, it might be an arbitrary unique ID that gets
stowed in the closure-thing.

However, the other case to consider is that while closures are "first
class in Scheme", that does not prohibit an implementation which sees
closures which are never used in a first-class manner from implementing
them in non-first-class ways, such as inlining them. Here it gets a
little more complicated, as there is some obligation on the
implementation to:

1) Only do such optimisations when the first-class nature of procedures
is not really required. They rarely apply to procedures stored in
arbitrary lists and then applied after extraction from the list!

2) When they choose to apply such optimisations, take extra steps to
preserve any semantics that do not conflict with being able to use the
optimisation at all, but would nonetheless be otherwise affected by the
change. In this case, optimisations that avoid allocating closure-things
at all will certainly make it hard to compare them (either entirely, or
through some means of sharing provably equivalent closure-things), and
there's issues for ones that allocate different forms of closure-things
(how about procedures that close over nothing? Could we just pass around
the code pointer, rather than a closure-thing that contains nothing
other than that pointer?). So we must either not apply those
optimisations at all to procedures that go drifting near to eq? or eqv?,
or if we do, making sure we store enough information somewhere to be
able to still produce the desired semantics.

For instance, if you go around inlining procedures, there's no
closure-thing and there's not even a "code pointer"; but inlining only
makes sense when the procedure is applied. If it's passed to eq? or eqv?
it's not applied, so some form of object can be made to be passed around
and compared with eq? and eqv?, even if that object can't actually be
applied at all! However, unless we have great whole-program analysis
information to be sure of that, we're probably better off allocating an
actual closure, even if we also inline the code directly in places where
we can statically deduce it to be the applied procedure.

3) When (1) is too restrictive on the useful optimisation being applied
and (2) is too onerous to make the optimisation worthwhile, yet the
desirability of the optimisation (perhaps we will NEVER BEAT C FOR
PERFORMANCE WITHOUT IT!!!), then we might think about loosening the
standard (and, therefore, restricting what programmers can do with
closures) in order to reduce the restrictions on implementations.


- --
Alaric Snell-Pym
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/


Scheme-reports mailing list