[Toybox] od: yet more corner cases.
Rob Landley
rob at landley.net
Sun Jul 1 20:44:59 PDT 2012
The reason for 7 digits is that tabs are 8 spaces. When you request
multiple output formats each "additional" output line starts with a
single tab.
This means the alignment goes wonky when the offset has _more_ than 7
digits, a la:
$ od -c -b -j 700000000 -N 64 /home/landley/qemu/images/KNOPPIX_V6.7.1CD-2011-09-14-EN.iso
5156223400 221   _ 247   % 210 320 373 265 252 356 366 272 373   9 225 031
        221 137 247 045 210 320 373 265 252 356 366 272 373 071 225 031
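
The arithmetic behind that can be sketched roughly as follows. This is only an illustration, not the toybox code: the function names are made up, and it assumes the offset is printed as zero-padded octal followed by one space, with tab stops every 8 columns.

```c
#include <stdio.h>

/* Column at which the first data field starts on a line that carries the
 * offset: the octal digits (zero-padded to at least 7) plus one space.
 * Hypothetical helper, assuming a "%07llo " style prefix. */
static int offset_line_data_column(unsigned long long offset)
{
    int digits = snprintf(NULL, 0, "%07llo", offset);
    return digits + 1;
}

/* Column at which data starts on a continuation line: a single tab from
 * column 0, assuming tab stops every 8 columns. */
static int continuation_data_column(void)
{
    return 8;
}

/* How far the two lines are out of step; 0 means they line up. */
static int misalignment(unsigned long long offset)
{
    return offset_line_data_column(offset) - continuation_data_column();
}
```

With up to 7 octal digits the two columns agree; each digit past the seventh pushes the first line one more column to the right of its continuation lines.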
The standard says:
> The number of bytes transformed by the output type specifier c
> may be variable depending on the LC_CTYPE category.
Which says to me -c should be utf8 aware, which knocks the "16 bytes
input" thing a bit wobbly, really. I'll have to come back to that
one.
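
To see why the byte count goes wobbly, here is a minimal sketch that counts how many characters a run of bytes decodes to. It assumes LC_CTYPE is a UTF-8 locale and hand-decodes lead bytes rather than going through the locale machinery; both function names are hypothetical, and anything malformed is simply counted as one byte.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence starting with `lead`, or 0 if
 * `lead` is a continuation byte or not a valid lead byte. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                  /* ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Count the characters in buf[0..len). Invalid or truncated sequences
 * are counted one byte at a time. */
static size_t count_chars(const unsigned char *buf, size_t len)
{
    size_t i = 0, chars = 0;

    while (i < len) {
        size_t n = utf8_seq_len(buf[i]);
        size_t k;

        if (n == 0 || i + n > len) {
            n = 1;                              /* bad lead or runs off the end */
        } else {
            for (k = 1; k < n; k++) {
                if ((buf[i + k] & 0xC0) != 0x80) {
                    n = 1;                      /* missing continuation byte */
                    break;
                }
            }
        }
        i += n;
        chars++;
    }
    return chars;
}
```

So 16 bytes of input can come out as anywhere from 4 to 16 characters, depending on what is in them.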
> the implementation shall support values of the optional number of
> bytes to be converted corresponding to the number of bytes in the
> C-language types char, short, int, and long. These numbers can also
> be specified by an application as the characters 'C' , 'S' , 'I' ,
> and 'L' , respectively.
Which says that when built on a 32-bit host, "L" should be 4 bytes,
since long is pointer-sized on Linux (LP64 on 64-bit, ILP32 on 32-bit).
If you want consistent 64-bit, specify "x8" I guess.
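
A minimal sketch of that mapping, assuming the sizes are simply taken from the host compiler via sizeof (the function name is made up for illustration):

```c
#include <stddef.h>

/* Map a POSIX od size suffix to the byte count of the matching C type
 * on the host this was compiled for. Returns 0 for an unknown suffix. */
static size_t type_suffix_size(char suffix)
{
    switch (suffix) {
    case 'C': return sizeof(char);
    case 'S': return sizeof(short);
    case 'I': return sizeof(int);
    case 'L': return sizeof(long);   /* 4 on ILP32, 8 on LP64 */
    default:  return 0;
    }
}
```

Since 'L' follows sizeof(long), the same command line can produce different output on 32-bit and 64-bit builds, which is why an explicit numeric size is the portable choice.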
Heh. Oh that's nasty. I can repurpose the "detect duplicate lines
and print * unless -v specified" buffer and use _that_ for utf8
cross-line continuations. Bwahahaha! (But still: later. And it
requires readahead, because I can't _complete_ a line until I've
read to the end of the utf8 sequence. It'd be so much easier if
I could * out the first ones and put the multibyte character as
the _last_ character in the sequence...)
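
The readahead part can be sketched like so: given the bytes of one output line, work out how many more bytes have to be read before the last character is complete. This assumes UTF-8 input, the names are hypothetical, and it is not what toybox actually does.

```c
#include <stddef.h>

/* Sequence length implied by a UTF-8 lead byte, 0 if not a lead byte. */
static size_t seq_len_from_lead(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Number of bytes past the end of buf[0..len) needed to finish a
 * multibyte character that starts near the end of the line. Returns 0
 * when the line ends on a character boundary or on malformed input. */
static size_t readahead_needed(const unsigned char *buf, size_t len)
{
    size_t back;

    /* Walk backwards over continuation bytes to find the lead byte. */
    for (back = 1; back <= 4 && back <= len; back++) {
        unsigned char c = buf[len - back];
        size_t n;

        if ((c & 0xC0) == 0x80) continue;       /* continuation byte */
        n = seq_len_from_lead(c);
        return (n > back) ? n - back : 0;
    }
    return 0;
}
```

A nonzero result means the line can't be finished until that many bytes of the next line have been read.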
Rob
--
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation. Pick one.