
Re: [Scheme-reports] Bytevectors should be called u8vectors

On 2012-07-03, at 11:17 AM, Jussi Piitulainen wrote:

> Marc Feeley writes:
>> I have a feeling that the use of "bytevector" in the names of
>> procedures in R7RS small is due to WG2 concerns of extending the set
>> of operations on bytevectors to 16, 32, etc bit width access
>> operations.  Alaric Snell-Pym and others have pointed out that
>> "blob" is a better name for such a data type.  I am not saying I
>> prefer it, but perhaps that's the name WG2 and the community will
>> prefer for that data type.  So committing to the name "bytevector"
>> in R7RS small is premature.  On the other hand, you say that in your
>> WG2 bytevector proposal, you were proposing to support u8vectors and
>> the other SRFI-4 names.  So I don't see your position of preferring
>> to standardize in R7RS small the "bytevector" names instead of the
>> "u8vector" names.
> Apologies if I am mistaken, but I think John distinguishes meaningfully
> between the bytevector-* interface and the [fus]{8,16,32,64}vector-*
> interface (somewhere in this thread) and this distinction should be
> appreciated more. These are not alternative names for the same thing:
> bytevector-* offsets are in bytes, the other kind offsets in the units
> indicated by the name. The bytevector interface can interpret binary
> formats byte by byte in varying units. The other interface fixes an
> interpretation as a homogeneous vector, and the interfaces overlap in
> the 8-bit cases.
> Let v be #u8(a b c d e f g h) for suitable integers a, b, ...
> Let w be the same memory as a u16vector, disjoint type or not.
> (bytevector-u8-ref v 3)  => d as unsigned-int8
> (bytevector-s8-ref v 3)  => d as signed-int8
> (u8vector-ref v 3)       => d as unsigned-int8
> (bytevector-u16-ref w 3) => d, e as unsigned-int16
> (bytevector-u16-ref w 4) => e, f as unsigned-int16
> ;; no access to d, e with u16vector-ref, er, (u16vector-ref w 3/2)?
> (u16vector-ref w 2)      => e, f as unsigned-int16
> (u16vector-ref w 3)      => g, h as unsigned-int16
> Hm. I'm not sure if it makes much practical sense for u8vector and
> bytevector to be disjoint types. For the other homogeneous types a
> distinct written representation would be nice, at least in a REPL.
> Both interfaces seem important to me.
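To make the byte-offset versus element-index distinction concrete, here is a minimal R7RS sketch. The helpers `u16-at-byte` and `u16-at-index` are hypothetical names invented for this illustration (they are not R7RS, R6RS, or SRFI-4 procedures). The sketch assumes little-endian byte order and replaces the letters a..h with the bytes 0..7.

```scheme
(import (scheme base))

;; Hypothetical helpers for illustration only; neither is an R7RS or
;; SRFI-4 procedure. Both read an unsigned 16-bit value from a
;; bytevector, assuming little-endian byte order.

;; Byte-offset addressing (bytevector-u16-ref style): any offset works,
;; including odd ones such as 3.
(define (u16-at-byte bv offset)
  (+ (bytevector-u8-ref bv offset)
     (* 256 (bytevector-u8-ref bv (+ offset 1)))))

;; Element-index addressing (u16vector-ref style): index i always maps
;; to byte offset 2i, so odd byte offsets cannot be reached.
(define (u16-at-index bv i)
  (u16-at-byte bv (* 2 i)))

;; Stand-in for #u8(a b c d e f g h): the byte at position k holds k.
(define v #u8(0 1 2 3 4 5 6 7))

(u16-at-byte v 3)   ; bytes d, e => 3 + 256*4 = 1027
(u16-at-byte v 4)   ; bytes e, f => 4 + 256*5 = 1284
(u16-at-index v 2)  ; bytes e, f => 1284
(u16-at-index v 3)  ; bytes g, h => 6 + 256*7 = 1798
```

The pair (d, e) is reachable only through byte-offset addressing, which is the gap the `(u16vector-ref w 3/2)` remark points at.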

Sorry for not being precise, but yes, that's what I understood.  I agree that an "integer layout" API offers more operations than SRFI-4, but it is a complex API that involves additional concepts such as numerical encoding, endianness, and possibly alignment.  This complexity adds run-time overhead, which goes against the purpose of homogeneous vectors.  In other words, if a program does not require mixed-type access to binary data (which I expect to be the more common case), the SRFI-4 interface is preferable for performance reasons.

A compromise that would eliminate the bloat of the two interfaces is for R7RS small to adopt the u8vector names, and for R7RS large to add the rest of the SRFI-4 procedures plus the "integer layout" API using a u8vector prefix, i.e.

   (u8vector-u16-ref u8vect byte-offset endianness)
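A minimal sketch of what such a procedure might look like, written on top of R7RS small. It assumes that a u8vector is simply an R7RS bytevector and that endianness is passed as the symbol `big` or `little`; both are assumptions for illustration, not part of any adopted specification.

```scheme
(import (scheme base))

;; Hypothetical sketch of the proposed (u8vector-u16-ref u8vect
;; byte-offset endianness). Assumes a u8vector is an R7RS bytevector
;; and that endianness is given as the symbol big or little.
(define (u8vector-u16-ref u8vect byte-offset endianness)
  (let ((b0 (bytevector-u8-ref u8vect byte-offset))
        (b1 (bytevector-u8-ref u8vect (+ byte-offset 1))))
    (case endianness
      ((big) (+ (* 256 b0) b1))      ; most significant byte first
      ((little) (+ b0 (* 256 b1)))   ; least significant byte first
      (else (error "u8vector-u16-ref: unknown endianness" endianness)))))

(u8vector-u16-ref #u8(1 2 3) 1 'big)     ; => 515 (2*256 + 3)
(u8vector-u16-ref #u8(1 2 3) 1 'little)  ; => 770 (2 + 3*256)
```

The dispatch on `endianness` is one example of the extra per-access work that a fixed-width SRFI-4 accessor such as `u16vector-ref` does not need.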

I would find this more consistent, given that the external representation for the vectors operated on by these procedures is #u8(...).


Scheme-reports mailing list