[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Scheme-reports] Write procedure is not backwards compatible

We agree that there should be write/shared--read-write equivalence is useful and all that.

You describe a use case where the user is writing a structure that he knows has no cycles.  This case is handled well by a naive recursive writing procedure.  You call this "write/simple".

I describe a use case where the user *expects* that a structure has no cycles, and wants it printed without datum labels, but wants the Scheme runtime to verify that it has no cycles--and otherwise to give a useful error message, and not loop infinitely and possibly crash or need to be killed due to grabbing memory to hold a growing call stack and so on.  This is the same thing as "write/simple", except it is guaranteed to be safe (assuming no interference from other threads).  It can be handled by a cycle check, plus a naive recursive writing procedure.  I call this "write/safe".

It should be obvious that the first use case is addressed completely and perfectly by the second solution, except for performance considerations; given that we have write/safe, the only *possible* reason a user in the first group would prefer write/simple is to avoid the performance overhead of the cycle check.[1]  Therefore, I think a name like "write/small" is in fact appropriate for the procedure you describe, at least for this discussion.  (Call it what you like when it goes in the standard.  I'm talking about what functionality is provided, not names for it.)

> Now, your "solution" just boils down to "they can implement it
> themselves" which is a non-solution.  Programmers should be
> able to write out any list they can fit in memory, and this should
> be provided by the language.

That was not a proposed solution.  That was a description (or prediction) of how bad the consequences would be for users who wanted, respectively, write/small or write/safe and didn't have it.  I described it as a royal pain but not impossible in the first case, and as impossible without sacrificing portability or efficiency in the second case.  This was analytic, not normative.  My normative recommendations came in the next paragraph.

It occurs to me that an elegant solution is to have "write-simple" as currently proposed, and to have it take an optional argument, check-for-cycle, which defaults to #t--or perhaps to a parameter (like the optional port argument), initially #t.  (I think you'd agree that the default should be safe.)  If check-for-cycle is true, then it checks, and raises an exception if it finds one; then it takes the naive printing approach.  This seems a nice way to provide the functionalities of write/safe and write/small.  Consider this my recommendation.
(write-simple obj)
(write-simple obj port)
(write-simple obj port check-for-cycle)  ;a better name would be welcome


Incidentally, what Aaron W. Hsu asks for is different from either write/shared, write/safe, or write/small:
> a version that prints the same as
> write-simple in all cases for which write-simple returns, but should
> work like write-shared on all cases for which write-simple would not return.

Given exception-handling, and given write-simple as I have proposed it, it seems a user could implement this easily--try to call write-simple with check-for-cycle set to #t, and catch the exception by calling write-shared.  (One might think about directly exposing an internal "has-cycle?" procedure.  I imagine a user wanting that functionality by itself, and doing something like (begin (write-simple x (open-output-string)) #f) inside a handler that would return #t on an exception, in place of (has-cycle? x); and I think "that is so bad; that's a sign this should be exposed".  But I'm not conscious of anyone asking for it as a user yet.  I guess implementers can expose it if they want.)

I was thinking that the name "write" should refer to Aaron's procedure here, but now I think that "read-write equivalence" is a good mental model that I can get used to (which clearly prescribes write-shared).  Having a common, basic built-in procedure that completely changes its behavior if there is a cycle is a bad idea.

So now my complete recommendation is simply to take the current spec, to insert the "check-for-cycle" parameter into write-simple, to specify write-simple's behavior under its truth and falsity, and to specify that its default is true.


[1] And I think implementers would be mostly indifferent, contra the WG1Ballot: "Note the algorithms for detecting shared structure differ from those for detecting cycles, so providing both -shared and -cyclic imposes an additional implementation burden."  The difficulty of implementing eq-hashes or an equivalent feature in an implementation that wasn't made to support pointer-hashing or whatever is, I imagine, much greater than that of writing two algorithms on top of that feature.  You suggested a method of marking nodes with one bit as they were visited; this by itself is insufficient for write/shared, because you would have to store the numerical label for that node so as to print it out; and if you can store numerical labels, you can probably store reference counts.  In short, detecting cycles is *easier* than detecting and labeling shared structure.

--John Boyle
Science is what we understand well enough to explain to a computer. Art is everything else we do. --Knuth

On Wed, Jul 11, 2012 at 4:58 PM, Alex Shinn <alexshinn@x> wrote:
On Sun, Jul 8, 2012 at 8:48 PM, John Boyle <johnthescavenger@x> wrote:
> Alex Shinn writes:
>> It is not unreasonable to work with a single data
>> structure that large, and we should ensure it's possible
>> to write it out.
> True.  Still, this can in theory be done with all desired space-efficiency
> and portability (given that all data structures used are portable) by the
> user: write "write/fast" recursively in Scheme.

Let's stop using names like "write/fast" or "write/small"
which refer to the implementation.  The current draft uses
"write-simple" to refer to _what_ you're writing - a simple data
structure with no cycles.  This discourages people from using
it simply because they think it will be faster.

Now, your "solution" just boils down to "they can implement it
themselves" which is a non-solution.  Programmers should be
able to write out any list they can fit in memory, and this should
be provided by the language.