[Toybox] Numeric values in dd operands

enh enh at google.com
Tue Feb 20 09:32:30 PST 2018


On Tue, Feb 20, 2018 at 9:28 AM, Rob Landley <rob at landley.net> wrote:
> On 02/19/2018 09:09 PM, scsijon wrote:
>> On 02/20/2018 08:32 AM, toybox-request at lists.landley.net wrote:
>>> Are you actually using that mid-number multiplier? I was asking on the list last
>>> year if anyone anywhere actually did that. (It's a relic from before the shell
>>> provided $((123*456)).)
>>>
>>
>> I always interpreted it as a way for someone to put in a double character,
>> such as kb instead of just k. I have seen 12gb316 used (meaning
>> 12,316,000,000) in the auto-output for raid drive stats before today. I hadn't
>> thought of it being a multiplier character.
>
> Huh. That's an interesting non-posix extension I hadn't heard of before.
>
>>>> The 0.7.5 implementation assumes that x is part of a hexadecimal prefix so 0x12
>>>> is interpreted as 18 rather than 0, and 3x12 is an error.
>>
>> And 3x12 could be interpreted as 3 to the power 12 of whatever base is being
>> used as it would be in calculus.
>
> Not according to http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html
>
>> For the bs=, cbs=, ibs=, and obs= operands, the application shall supply an
>> expression specifying a size in bytes. The expression, expr, can be:
>>
>>   A positive decimal number
>>
>>   A positive decimal number followed by k, specifying multiplication by 1024
>>
>>   A positive decimal number followed by b, specifying multiplication by 512
>>
>>   Two or more positive decimal numbers (with or without k or b) separated by
>>   x, specifying the product of the indicated values
>>
>> All of the operands are processed before any input is read.
>
> Product means multiply.
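For comparison, here is a minimal C sketch of the posix grammar quoted above. Each factor is read as plain decimal with an optional k or b suffix, and x always means multiply. The function name posix_dd_size is hypothetical (this is not toybox's code), and overflow checking is left out. Under these rules 0x12 is 0 times 12, i.e. 0, and 3x12 is 36.

```c
#include <ctype.h>

/* Hypothetical parser for the posix dd size grammar: one or more
 * decimal numbers, each optionally followed by k (x1024) or b (x512),
 * joined by x to form a product. Returns -1 on a syntax error.
 * Overflow is not checked. A factor of 0 is accepted, which is how
 * "0x12" comes out as 0 rather than 18. */
long long posix_dd_size(const char *s)
{
  long long product = 1;

  for (;;) {
    long long n = 0;

    /* Each factor must start with a digit. */
    if (!isdigit((unsigned char)*s)) return -1;

    /* Plain decimal only: no 0 (octal) or 0x (hex) prefixes. */
    while (isdigit((unsigned char)*s)) n = n*10 + (*s++ - '0');

    if (*s == 'k') { n *= 1024; s++; }
    else if (*s == 'b') { n *= 512; s++; }

    product *= n;
    if (!*s) return product;

    /* The only thing allowed between factors is x. */
    if (*s++ != 'x') return -1;
  }
}
```

Note that 12gb316 is simply rejected by this sketch, since posix has no g suffix and no mid-number multiplier.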
>
> I'm not _against_ extensions, toybox is also handling megabyte and gigabyte and
> such because it's not the 1980s anymore. (And blocks went from 512 bytes to 4k
> ten years ago: https://lwn.net/Articles/322777/ but that's _far_ too recent for
> posix to even be aware it's happened, let alone respond to it.)
>
>> However, a leading 0 (as in 0x12) only marks a base prefix without saying
>> which base is meant, and that need not be hexadecimal. It could
>> also be quaternary (base 4), octal (base 8), or even radix 64 (base 64),
>
> I didn't make the 0 means octal and 0x means hexadecimal prefixes up:
>
>   http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html
>
> It's already widely implemented in the Linux command line: ubuntu's
> "printf %d 0x1234" prints 4660 (and yes, that also means 01234 prints
> as 668).
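To see where 0x12 becomes 18, here is a small C sketch around the standard strtol() with base 0, which applies the prefix rules from the strtol page linked above. The wrapper name read_number is made up for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: parse s the way strtol(s, &end, 0) does.
 * Base 0 means a "0x"/"0X" prefix selects hex, a leading "0" selects
 * octal, and anything else is decimal. *rest is set to the first
 * character that was not consumed. */
long read_number(const char *s, const char **rest)
{
  char *end;
  long n = strtol(s, &end, 0);

  *rest = end;
  return n;
}
```

For 3x12, strtol stops at the x and leaves "x12" unconsumed, so a caller that insists the whole operand be used up would report an error, which fits the 0.7.5 behavior described at the top of the thread.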
>
> Every toybox command that takes a number argument has been doing this for years.
> You can go "head -n 0xa". This is the first complaint about it so far.
>
> In toybox I try to err on the side of _consistency_. All the commands behave the
> _same_ way. This would be an explicit exception where dd count=01234 is _not_
> interpreted as 668, meaning dd is a special case different from everything else.
>
>> all used in computation and hardware data collection input circuits.
>
> But not in c99 or posix.
>
>> I wonder how you would define/control this?
>
> I wouldn't?
>
> Base 4 output has never historically been part of any unix command I'm aware of.
> It's not in c99, it's not in posix, it's not in the linux standard base, it's
> not in busybox, it's not in ubuntu's current command line, it's not in my old
> Red Hat 9 qemu image (which I flung on https://busybox.net/downloads/qemu/
> years ago and is strangely enough still there, although the README isn't; the
> user is busybox, password busybox, and I think that's the root password too.)
>
> I'm not trying to make up new stuff, I'm trying to serve an existing userbase
> of people who are part of a 50 year tradition. (The first pdp-7 unix was
> written in 1969, the 50th anniversary is next year.) The best way to serve them
> is to be consistent with historical practice.
>
> I'm also trying to provide something new users can learn easily. The best way to
> serve _them_ is provide something consistent so they only have to learn a trick
> once and then it works the same way everywhere.
>
> When consistency and historical practice collide, the answer is not always
> obvious to me.
>
> I've often chosen to implement only some of the posix spec, because portions of
> posix are deeply obsolete. It specifies the "sccs" source control system, batch
> control commands (qdel and friends), fortran 77, the "ed" line editor, uucp. The
> roadmap.html page has a section devoted to this.
>
> Sometimes you have to explicitly break posix: it says zcat undoes "compress"
> format (adaptive Lempel-Ziv coding, which was patented in 1984 and utterly dead
> by the time that expired). Everybody else has zcat undo "deflate", the algorithm
> introduced by pkzip 2.x in 1993. I care about current reality, not what posix
> says to do. I only care about posix when it documents current reality.
>
> In dd, posix specifies ebcdic to ascii conversions, the "swab" byte swapping
> option (assuming only 16 bit systems have endianness issues), and ucase/lcase
> case mapping that does not _conceptually_ work with multibyte encodings (let's
> chop that data into blocks and then do case conversion across block boundaries
> without ever looking at data from another block...)
>
> A real user piped up and said their existing script doesn't work with my tool.
> That feedback is of interest to me.

and as the person who set us down the strtol path
(https://github.com/landley/toybox/commit/d5088a059649daf34e729995bb3daa3eb64fa432#diff-ce001a87e82f850a38fd93183e12b417),
the original request i had was just for hex. like you say, no-one's
used octal (on purpose) for anything other than mode for decades now.
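A hex-only rule like the one originally requested could look like this C sketch, assuming only an explicit 0x/0X prefix changes the base and leading zeros stay decimal (hex_or_decimal is a hypothetical name):

```c
#include <stdlib.h>

/* Hypothetical hex-only variant: "0x"/"0X" selects base 16 and
 * everything else is decimal, so leading zeros no longer mean octal. */
long hex_or_decimal(const char *s, char **end)
{
  if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
    return strtol(s, end, 16);   /* base 16 accepts the 0x prefix itself */
  return strtol(s, end, 10);
}
```

Under this rule 01234 is 1234 rather than 668; octal would be left to places that ask for it explicitly, like file modes.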

> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net


