[Toybox] od: yet more corner cases.

Rob Landley rob at landley.net
Sun Jul 1 20:44:59 PDT 2012


The reason for 7 digits is that tabs are 8 spaces. When you request 
multiple output formats each "additional" output line starts with a 
single tab.

This means the alignment goes wonky when the offset has _more_ than 7 
digits, a la:

$ od -c -b -j 700000000 -N 64 /home/landley/qemu/images/KNOPPIX_V6.7.1CD-2011-09-14-EN.iso 
5156223400 221   _ 247   % 210 320 373 265 252 356 366 272 373   9 225 031
        221 137 247 045 210 320 373 265 252 356 366 272 373 071 225 031
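
A rough C sketch of that arithmetic, assuming 8 column tab stops and an
offset printed "%07llo" style with one separator space after it. The
function names are made up for illustration, this is not the toybox code:

```c
#include <assert.h>
#include <stdio.h>

/* Number of characters printed for the offset: at least 7 octal
 * digits, zero padded, more if the value needs them. */
int offset_field_width(unsigned long long offset)
{
  char buf[32];
  int len = snprintf(buf, sizeof(buf), "%07llo", offset);

  /* A 64-bit value needs at most 22 octal digits, so this always fits. */
  assert(len > 0 && len < (int)sizeof(buf));

  return len;
}

/* How many columns the first line's data sits to the right of the data
 * on a tab-indented continuation line. 7 digits plus the separator
 * space lands exactly on the first tab stop (column 8), so drift is 0. */
int alignment_drift(unsigned long long offset)
{
  return offset_field_width(offset) + 1 - 8;
}
```

With -j 700000000 the offset is 5156223400 in octal, 10 digits, so the
first line is pushed 3 columns right of the tab-indented one.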

The standard says:

> The number of bytes transformed by the output type specifier c
> may be variable depending on the LC_CTYPE category.

Which says to me -c should be utf8 aware, which knocks the "16 bytes
input" thing a bit wobbly, really. I'll have to come back to that
one.
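
To show why the byte count per character varies, here's a minimal sketch
that hardwires UTF-8 instead of asking LC_CTYPE (a real implementation
would more likely go through something like mbrtowc()). The function name
is hypothetical:

```c
/* Length in bytes of the UTF-8 sequence introduced by lead byte c:
 * 1 to 4, or 0 if c is a continuation byte or can't start a sequence. */
int utf8_seq_len(unsigned char c)
{
  if (c < 0x80) return 1;                /* plain ASCII */
  if (c >= 0xc2 && c <= 0xdf) return 2;  /* 0xc0 and 0xc1 are overlong */
  if (c >= 0xe0 && c <= 0xef) return 3;
  if (c >= 0xf0 && c <= 0xf4) return 4;  /* above 0xf4 exceeds U+10FFFF */

  return 0;
}
```

So 16 input bytes can be anywhere from 4 to 16 characters, and a
character can start on one line and finish on the next.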

> the implementation shall support values of the optional number of
> bytes to be converted corresponding to the number of bytes in the
> C-language types char, short, int, and long. These numbers can also
> be specified by an application as the characters 'C' , 'S' , 'I' ,
> and 'L' , respectively.

Which says that when built on a 32-bit host, "L" should be 4 bytes,
because long is only 8 bytes under LP64 and 32-bit builds are ILP32.
If you want consistent 64-bit, specify "x8" I guess.
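
One way to read that requirement, sketched in C. The function name is
invented for the example, and the sizes are whatever the build host's
compiler says they are:

```c
#include <stddef.h>

/* Map an od size suffix letter to a byte count using the C types of
 * the host this was compiled on. Returns 0 for an unknown suffix. */
size_t od_suffix_size(char suffix)
{
  switch (suffix) {
    case 'C': return sizeof(char);
    case 'S': return sizeof(short);
    case 'I': return sizeof(int);
    case 'L': return sizeof(long);   /* differs between 32 and 64 bit */
    default: return 0;
  }
}
```

Only 'L' is likely to change between a typical 32-bit and 64-bit Linux
build, which is why an explicit numeric size is the portable choice.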

Heh.  Oh that's nasty. I can repurpose the "detect duplicate lines
and print * unless -v specified" buffer and use _that_ for utf8
cross-line continuations. Bwahahaha! (But still: later. And it
requires readahead, because I can't _complete_ a line until I've
read to the end of the utf8 sequence. It'd be so much easier if
I could * out the first ones and put the multibyte character as
the _last_ character in the sequence...)
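
The readahead part could look something like this sketch, which again
assumes UTF-8 and does no validation beyond the bit patterns. The name
utf8_overhang is hypothetical:

```c
#include <stddef.h>

/* How many bytes past the end of a len byte line must be read before
 * the line's last character is complete. Returns 0 if the line already
 * ends on a character boundary, or ends in something that isn't UTF-8. */
size_t utf8_overhang(const unsigned char *line, size_t len)
{
  size_t i = len, need, have;
  unsigned char c;

  /* Step back over up to 3 trailing continuation bytes (10xxxxxx). */
  while (i && len - i < 3 && (line[i - 1] & 0xc0) == 0x80) i--;
  if (!i) return 0;

  /* The byte we stopped on should be the lead byte of the last char. */
  c = line[i - 1];
  if ((c & 0xe0) == 0xc0) need = 2;
  else if ((c & 0xf0) == 0xe0) need = 3;
  else if ((c & 0xf8) == 0xf0) need = 4;
  else return 0;   /* ASCII or junk, nothing to wait for */

  have = len - (i - 1);

  return need > have ? need - have : 0;
}
```

A nonzero result is the number of bytes to pull in from the next read
before the current line can be finished.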

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.
