[Scheme-reports] Proposing amending char-numeric? definition
- To: Scheme-reports@x
- Subject: [Scheme-reports] Proposing amending char-numeric? definition
- From: Shiro Kawai <shiro.kawai@x>
- Date: Tue, 26 Apr 2011 22:40:01 -1000
OK, let me restate the argument and turn it into more of a proposal.
The draft's wording of char-numeric? is confusing, because Unicode
doesn't define a "Numeric" property explicitly, the way it defines the
"Alphabetic" and "Uppercase" properties. So I propose to change it.
There are a few possible resolutions.
(1) Define char-numeric? to return #t if the character's Numeric_Type
property value is anything other than 'None'. This seems a natural
interpretation of the current wording. However, I think it is
practically useless, since it *can't* be used to pick numbers out of
a string. Characters whose Numeric_Type isn't 'None' include
ordinary alphabetic characters (category Lo) that happen to have
meanings related to numbers. For example, '幺' (U+5e7a) has
Numeric_Type = 'Numeric': the character means small or young, so
it can mean 1 in some specific contexts (in Japanese, probably the
only place it means '1' is in some Mah-jong terms). So, when I'm
scanning a string and char-numeric? returns #t for a character, and
that character happens to be '幺' (U+5e7a), what do I do? It is
probably part of some other word, so I should treat it as an
alphabetic character. And even if I want to make use of it, I need a
separate database to look up what number '幺' represents.
(2) Drop char-numeric?, and add char-numeric-type and
char-numeric-value. The former returns the value of Numeric_Type
property, and the latter returns the value of Numeric_Value property.
This should be the way to provide access to a character's Unicode
(3) Define char-numeric? to return #t only for 0,1,2,3,4,5,6,7,8 and
9. This retains compatibility with R5RS, and we can still use
char-numeric? to parse numbers, and safely use (- (char->integer c)
(char->integer #\0)) to obtain the digit value the character
represents. (Note: R5RS programs that use char-numeric? to parse
numbers will break if we adopt the current draft's definition of
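
To make option (3) concrete, here is a minimal sketch of the number
parsing idiom it protects. The names ascii-digit?, ascii-digit-value
and parse-leading-integer are hypothetical helpers made up for this
illustration; they are not part of any report or draft. The #\x663
character syntax is assumed to be available (R7RS-style hex escapes).

```scheme
;; Hypothetical stand-in for char-numeric? as restricted by option (3):
;; true only for the ten ASCII digits #\0 through #\9.
(define (ascii-digit? c)
  (and (char>=? c #\0) (char<=? c #\9)))

;; The subtraction idiom from option (3). It is only meaningful when
;; the argument is one of #\0 through #\9.
(define (ascii-digit-value c)
  (- (char->integer c) (char->integer #\0)))

;; Return the non-negative integer spelled by the leading ASCII digits
;; of str, or #f if str does not start with a digit.
(define (parse-leading-integer str)
  (let loop ((i 0) (acc 0))
    (if (and (< i (string-length str))
             (ascii-digit? (string-ref str i)))
        (loop (+ i 1)
              (+ (* acc 10) (ascii-digit-value (string-ref str i))))
        (if (= i 0) #f acc))))

;; (parse-leading-integer "123abc")  => 123
;; (parse-leading-integer "abc")     => #f
;;
;; If the predicate also accepted non-ASCII digits, the idiom would
;; silently produce garbage. U+0663 (ARABIC-INDIC DIGIT THREE) is a
;; decimal digit in Unicode, yet:
;; (ascii-digit-value #\x663)        => 1587
```

With a wider predicate, the loop above would need a real digit-value
lookup (something like the char-numeric-value of option (2)) instead
of the subtraction.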