
Re: [Scheme-reports] Sequence to sequence conversion

Marc Feeley scripsit:

> Not really.  I expected bytevector->string to be equal to
>        (lambda (bv) (list->string (map integer->char (bytevector->list bv))))
> which would correspond I guess to a latin1->string functionality with
> your naming scheme.

Well, it seems to me that if reasonable people disagree about the meaning
of the name, it's probably not such a good name.
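To make the disagreement concrete, here is a minimal sketch of the two readings applied to the same bytes. It assumes an R7RS implementation with full Unicode support; `latin1->string` is only an illustrative name, and `bytevector->list` is defined locally as a hypothetical helper because the small language does not provide it.

```scheme
(import (scheme base))

;; Hypothetical helper: not part of (scheme base).
(define (bytevector->list bv)
  (let loop ((i (- (bytevector-length bv) 1)) (acc '()))
    (if (< i 0)
        acc
        (loop (- i 1) (cons (bytevector-u8-ref bv i) acc)))))

;; The Latin-1 reading: every byte becomes one character.
(define (latin1->string bv)
  (list->string (map integer->char (bytevector->list bv))))

;; The two bytes that encode U+00E9 in UTF-8.
(define bv (bytevector #xC3 #xA9))

(string-length (latin1->string bv))  ; 2 (U+00C3 followed by U+00A9)
(string-length (utf8->string bv))    ; 1 (U+00E9)
```

The two readings agree only on bytes below 128, which is why a bare `bytevector->string` leaves the choice unstated.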

As a similar example:  In R5RS, the predicates `real?`, `rational?`, and
`integer?` all require that the imaginary part of their arguments be
zero, but either an exact or an inexact zero is permitted.  In R6RS,
this was changed to require that it be exact, presumably on the
grounds that 0.0 doesn't necessarily represent a mathematical zero --
it can be the result of an underflowed computation.  New functions
`{real,rational,integer}-valued?` were added to R6RS to provide the R5RS
semantics (though R6RS does not say so).
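A short sketch of the R6RS side of that distinction, assuming an R6RS implementation that keeps an inexact zero imaginary part distinct from a real number (not all implementations do):

```scheme
(import (rnrs))

;; A complex literal whose imaginary part is an inexact zero.
(define z -2.5+0.0i)

(display (real? z))         ; #f under R6RS; R5RS specifies #t here
(newline)
(display (real-valued? z))  ; #t, the R5RS meaning under a new name
(newline)
```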

WG1 originally decided to adopt the R6RS definitions, but it was pointed
out that it was a silent incompatible change from R5RS.  Various options
were proposed: stick with the already-voted-in R6RS semantics or revert
to the R5RS semantics, with or without additional names to represent
the other case.

When WG members voted on this, they decided to return to the R5RS
semantics and not adopt names for the R6RS semantics, on the grounds
that it was impossible to remember exactly what names would refer to
which functions.  The distinction was too subtle to be suitably captured.

> 2) The procedures specify in their names the character encoding to use.
> But there are oodles of character encodings, so for easy extensibility
> to other encodings, it would be better to use a parameter as in
> (decode-string bytevector 'UTF-8) and (encode-string string 'UTF-8)
> instead of oodles of different procedures.

But there are not oodles of character encodings that encode some 70%
of all documents on the Web.  Like it or not, UTF-8 has come to have a
privileged position (except in Windows).
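For comparison, the parameterized interface proposed in the quoted text could be layered on the R7RS procedures roughly as follows. This is only a sketch: `decode-string` and `encode-string` are the names from the quoted proposal, not procedures of any report, and UTF-8 is the only encoding handled.

```scheme
(import (scheme base))

;; Hypothetical dispatcher: only UTF-8 maps onto an R7RS procedure.
(define (decode-string bv encoding)
  (case encoding
    ((UTF-8) (utf8->string bv))
    (else (error "decode-string: unsupported encoding" encoding))))

(define (encode-string str encoding)
  (case encoding
    ((UTF-8) (string->utf8 str))
    (else (error "encode-string: unsupported encoding" encoding))))

;; Round trip through the one supported encoding.
(decode-string (encode-string "abc" 'UTF-8) 'UTF-8)  ; "abc"
```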

> 3) The main reason for character encodings is to perform I/O on
> byte-oriented streams.  Yet the only procedures having to do with
> character encodings in R7RS are utf8->string and string->utf8.  This
> seems wrong.  If textual output could be performed on binary ports and
> the character encoding could be specified when the port is opened (as
> was proposed in SRFI-91, http://srfi.schemers.org/srfi-91/srfi-91.html,
> and implemented in Gambit), then the procedures utf8->string and
> string->utf8 would be superfluous since they could be defined easily
> like this:

My proposals for WG2 are a variant of SRFI-91.  WG1's view was that it
was unnecessary at the level of the small language to standardize means
of controlling the encoding of text.

By Elbereth and Luthien the Fair, you shall     cowan@x
have neither the Ring nor me!  --Frodo          http://www.ccil.org/~cowan
