
Re: [Scheme-reports] Draft 3 Comments: Chapter 6

On characters and string case mappings:

On Wed, Aug 3, 2011 at 7:19 AM, Denis Washington <denisw@x> wrote:

> The remarks about Turkic casing pairs not being used for char-upcase /
> char-downcase are somewhat disturbing.

Or, how about just saying that "these procedures use language-insensitive
case mappings as defined in Unicode"?  The Turkish special mappings
are defined as language-sensitive mappings in SpecialCasing.txt of the
Unicode Character Database.  There's another language-sensitive mapping
for Lithuanian, which I assume won't be used either.  The Lithuanian
mapping isn't a simple mapping, so it can only affect string-upcase; yet
in the string case conversion routine only Turkish is mentioned.  By saying
that language-sensitive mappings are excluded, we can cover both.  (R6RS
says "locale-independent mapping".)
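
As a small illustration of what "language-insensitive" would mean for the
character procedures, here is a sketch that assumes an R7RS-style
implementation with Unicode support.  The comments show the results I would
expect under the default (non-Turkic) mappings; they are not quoted from the
draft.

```scheme
(import (scheme base) (scheme char))

;; Default Unicode simple case mappings, independent of any locale.
;; Under a Turkic tailoring these would instead pair i with U+0130
;; (dotted capital I) and I with U+0131 (dotless small i).
(char-upcase #\i)      ; => #\I
(char-downcase #\I)    ; => #\i
```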

The description of string case conversion mentions the context sensitivity
of Greek sigma: a small final sigma needs to be used when it is
at the end of a word.  However, there's no definition of "word",
which can lead to inconsistent behavior among implementations.
We can refer to UAX #29, as R6RS does.

(To illustrate the latter issue, R6RS shows two examples:

  (string-downcase "ΧΑΟΣΣ")         => "χαοσς"
  (string-downcase "ΧΑΟΣ Σ")         => "χαος σ"

Here are some other cases which are somewhat non-obvious:

  (string-downcase "ΧΑΟΣ.Σ")         => "χαοσ.ς"
  (string-downcase "ΧΑΟΣ. Σ")         => "χαος. σ"

Naive word segmentation may miss the "χαοσ.ς" case.)
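
To make the four examples above concrete, here is a rough sketch of one way
to decide between the two small sigmas.  It is modeled on the Final_Sigma
condition in SpecialCasing.txt rather than on full UAX #29 word
segmentation, so it is a swapped-in approximation, not what the draft or
R6RS specifies.  The names cased?, case-ignorable?, next-significant,
final-sigma? and downcase-with-final-sigma are all hypothetical, and the
two predicates are deliberately incomplete stand-ins for the Unicode Cased
and Case_Ignorable properties.  It also assumes an R7RS-style
implementation whose char-alphabetic? and char-downcase handle Greek.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical, incomplete stand-ins for the Unicode properties.
(define (cased? c) (char-alphabetic? c))
(define (case-ignorable? c) (memv c '(#\. #\' #\:)))

;; Scan from index i in direction step (+1 or -1), skipping
;; case-ignorable characters.  Return the first other character,
;; or #f if the scan runs off either end of the string.
(define (next-significant s i step)
  (let loop ((j (+ i step)))
    (cond ((or (< j 0) (>= j (string-length s))) #f)
          ((case-ignorable? (string-ref s j)) (loop (+ j step)))
          (else (string-ref s j)))))

;; True if the capital sigma at index i is preceded by a cased
;; character and not followed by one.
(define (final-sigma? s i)
  (let ((before (next-significant s i -1))
        (after  (next-significant s i 1)))
    (and before
         (cased? before)
         (not (and after (cased? after))))))

;; Downcase character by character, choosing U+03C2 (final sigma)
;; or U+03C3 (ordinary small sigma) for each U+03A3.
(define (downcase-with-final-sigma s)
  (let ((out (string-copy s)))
    (do ((i 0 (+ i 1)))
        ((= i (string-length s)) out)
      (let ((c (string-ref s i)))
        (string-set! out i
                     (cond ((not (char=? c #\x3A3)) (char-downcase c))
                           ((final-sigma? s i) #\x3C2)
                           (else #\x3C3)))))))
```

Under those assumptions, (downcase-with-final-sigma "ΧΑΟΣ.Σ") should give
"χαοσ.ς" and (downcase-with-final-sigma "ΧΑΟΣ. Σ") should give "χαος. σ".
The "." is skipped as case-ignorable, which is the step a split on
punctuation or whitespace would get wrong.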

Scheme-reports mailing list