Re: [Scheme-reports] (read|write)-char [was Opinion about R7RS]

Jean-Michel HUFFLEN scripsit:

>    I understand what you mean. However, the fact that neither Java
> nor C# does so is a *defect*, from my point of view. I think that
> Scheme should not inherit such defects. Besides, if we consider the
> R7RS draft, an implementation of Scheme may provide calculations
> about the full range of Unicode characters - recognising accented
> letters by means of the "char-alphabetic?" function, for example -
> but be limited to ASCII characters when reading or writing files.
> From my point of view, that is nonsense.

From my point of view, that is appropriate in certain cases.  For
example, in an implementation that has immediate (non-pointer) values,
it's usually straightforward to represent all 17*65536 characters as
immediates: you just need some tables, which only have to be loaded if
the `(scheme char)` library is imported.  However, it may suffice in the
use cases of the implementation to just have Latin-1 strings, in which
case Latin-1 I/O is probably all that makes sense.  This for example is
how Chicken works unless you load the UTF-8 egg.  R7RS-small allows for
these things.
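
A minimal sketch of how a program might find out which of the two
situations it is in.  It assumes the `full-unicode` feature identifier
is available to `cond-expand`; the names `char-repertoire` and
`unicode-code-point-count` are hypothetical, chosen only for this
example.

```scheme
(import (scheme base))

;; Hypothetical variable: records which repertoire the implementation
;; claims to support.  The symbols 'unicode and 'limited are labels
;; made up for this sketch, not anything a standard defines.
(cond-expand
  (full-unicode (define char-repertoire 'unicode))
  (else         (define char-repertoire 'limited)))

;; 17 planes of 65536 code points each.
(define unicode-code-point-count (* 17 65536))
```

An implementation with Latin-1 strings only would simply take the
`else` branch, and the program could restrict itself accordingly.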

>    In addition, a tremendous improvement in R7RS is that some
> situations can be handled more easily than in R(5|6)RS. For example,
> if "open-input-file" cannot open a port, an error is signalled and
> can be handled as such, whereas pathological cases for
> "open-input-file" were unspecified in R(5|6)RS.

This turns out not to be the case.  R5RS required an error to be
signalled but provided no standard means of handling it.  R6RS required
an error to be signalled and provided a standard means of handling it.
Draft 6 of R7RS-small requires an error to be signalled and provides a
standard means of handling it, but does not provide a standard means of
discriminating this error from other errors.  Ticket #391 proposes such
a standard means, the predicate `file-error?`.
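
As a sketch of what that would look like in practice, assuming the
predicate is adopted under the name `file-error?` and exported from
`(scheme base)` (the helper name `open-input-file-or-false` is
hypothetical):

```scheme
(import (scheme base) (scheme file))

;; Hypothetical helper: returns an input port, or #f if the file
;; cannot be opened.  Any condition that is not a file error matches
;; no clause, so `guard` re-raises it to the enclosing handler.
(define (open-input-file-or-false path)
  (guard (e ((file-error? e) #f))
    (open-input-file path)))
```

Without the predicate, the `guard` clause could only test for "some
error", which is exactly the discrimination problem described above.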

>    Let us come back to "write-char", the previous implementations of
> this function just had to check that the character was valid w.r.t.
> the ASCII encoding.

R5RS was not in any way tied to ASCII.  It defines a portable character
repertoire, but no specific mapping is required between this repertoire
and any particular exact integers, merely that a bidirectional mapping
of *some* sort exists.  Implementations often provided full Unicode.
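
A small illustration of that guarantee, using only `char->integer` and
`integer->char`; the helper name `char-round-trips?` is hypothetical:

```scheme
(import (scheme base))

;; Hypothetical helper: checks that the character/integer mapping is
;; bidirectional for c.  Nothing here depends on which integer a
;; character maps to, only on the mapping being invertible.
(define (char-round-trips? c)
  (char=? c (integer->char (char->integer c))))

(char-round-trips? #\a)
```

Portable code can rely on the round trip, but not on `#\a` mapping to
97 or to any other particular integer.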

> Now, if we open an output port with the Latin-1 encoding, some valid
> characters of Unicode cannot be written: what happens in such a case?

It's implementation-dependent.
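
A program that wants predictable behaviour anyway can check before it
writes.  This is only a sketch: it assumes `char->integer` returns
Unicode scalar values, and the names `latin-1-char?` and
`write-char/latin-1` are hypothetical.

```scheme
(import (scheme base))

;; Hypothetical predicate: true if c fits in Latin-1, ASSUMING that
;; char->integer yields Unicode scalar values.
(define (latin-1-char? c)
  (<= (char->integer c) 255))

;; Hypothetical wrapper: writes c if it fits in Latin-1, otherwise
;; writes #\? as a substitute.  The choice of substitute is arbitrary.
(define (write-char/latin-1 c port)
  (write-char (if (latin-1-char? c) c #\?) port))
```

Signalling an error instead of substituting would be equally
defensible; the point is that the program, not the port, makes the
decision.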

>    OK, but perhaps the draft may make precise that the Unicode-based
> rules apply even if only a proper subset of Unicode is processed.

It already does so.

John Cowan    cowan@x    http://ccil.org/~cowan
This great college [Trinity], of this ancient university [Cambridge],
has seen some strange sights. It has seen Wordsworth drunk and Porson
sober. And here am I, a better poet than Porson, and a better scholar
than Wordsworth, somewhere betwixt and between.  --A.E. Housman

Scheme-reports mailing list