[Toybox] Numeric values in dd operands

Rob Landley rob at landley.net
Tue Feb 20 09:28:38 PST 2018


On 02/19/2018 09:09 PM, scsijon wrote:
> On 02/20/2018 08:32 AM, toybox-request at lists.landley.net wrote:
>> Are you actually using that mid-number multiplier? I was asking on the list last
>> year if anyone anywhere actually did that. (It's a relic from before the shell
>> provided $((123*456)).)
>>
> 
> I always interpreted it as the ability for someone putting an double character,
> such as kb in instead of just a k, I have seen 12gb316 used (meaning
> 12,316,000,000) in the auto-output for raid drive stats before today. I hadn't
> thought of it being a multiplier character.

Huh. That's an interesting non-posix extension I hadn't heard of before.

>>> The 0.7.5 implementation assumes that x is part of a hexadecimal prefix so 0x12
>>> is interpreted as 18 rather than 0, and 3x12 is an error.
> 
> And 3x12 could be interpreted as 3 to the power 12 of whatever base is being
> used as it would be in calculus.

Not according to http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

> For the bs=, cbs=, ibs=, and obs= operands, the application shall supply an
> expression specifying a size in bytes. The expression, expr, can be:
>
>   A positive decimal number
>
>   A positive decimal number followed by k, specifying multiplication by 1024
>
>   A positive decimal number followed by b, specifying multiplication by 512
>
>   Two or more positive decimal numbers (with or without k or b) separated by
>   x, specifying the product of the indicated values
>
> All of the operands are processed before any input is read.

Product means multiply.
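
As a rough illustration of the grammar quoted above, here is a minimal Python sketch. The helper name posix_dd_size is hypothetical and this is not toybox code. It handles only the forms in the quoted spec text (decimal, k, b, and x-separated products) and none of the megabyte or gigabyte extensions.

```python
import re

# Hypothetical helper, not taken from toybox: evaluates a size expression
# using only the rules in the POSIX text quoted above.
_TERM = re.compile(r"([0-9]+)([kb]?)")
_SUFFIX = {"": 1, "k": 1024, "b": 512}

def posix_dd_size(expr):
    total = 1
    # "x" separates factors; each factor is a decimal number with an
    # optional k (times 1024) or b (times 512) suffix.
    for term in expr.split("x"):
        m = _TERM.fullmatch(term)
        if m is None:
            raise ValueError("bad dd size term: %r" % term)
        total *= int(m.group(1)) * _SUFFIX[m.group(2)]
    return total

print(posix_dd_size("3x12"))   # 36: a product, not an error
print(posix_dd_size("0x12"))   # 0: zero times twelve, not hexadecimal 18
print(posix_dd_size("01234"))  # 1234: decimal, the leading zero is ignored
```

Under this reading the x is always a multiplication sign, which is why 0x12 comes out as 0 rather than 18.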

I'm not _against_ extensions, toybox is also handling megabyte and gigabyte and
such because it's not the 1980's anymore. (And blocks went from 512 bytes to 4k
ten years ago: https://lwn.net/Articles/322777/ but that's _far_ too recent for
posix to even be aware it's happened, let alone respond to it.)

> However a leading 0 (0x12) would only define,
> not say, it would depend on the base being used, not just hexidecimal. It could
> also be quaternary (base4), octal (base8), or even radix (64bit),

I didn't make the 0 means octal and 0x means hexadecimal prefixes up:

  http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html

It's already widely implemented in the Linux command line. Even though posix
doesn't say "printf %d 0x1234" should print 4660, ubuntu's printf does (and yes
that also means 01234 prints as 668).

Every toybox command that takes a number argument has been doing this for years.
You can go "head -n 0xa". This is the first complaint about it so far.
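
The prefix rules described above can be sketched in Python as follows. The function name parse_auto_base is hypothetical; it only mimics the way C's strtol() picks a base when called with base 0, for plain non-negative integers, and it rejects trailing characters instead of stopping at them the way strtol() does.

```python
import re

# Hypothetical helper: mimics the base selection strtol() performs with
# base 0. Signs, whitespace and partial parses are deliberately not handled.
def parse_auto_base(s):
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", s):
        return int(s[2:], 16)      # 0x prefix: hexadecimal
    if re.fullmatch(r"0[0-7]+", s):
        return int(s[1:], 8)       # leading 0: octal
    if re.fullmatch(r"0|[1-9][0-9]*", s):
        return int(s, 10)          # otherwise: decimal
    raise ValueError("not a number: %r" % s)

print(parse_auto_base("0x1234"))  # 4660
print(parse_auto_base("01234"))   # 668
print(parse_auto_base("0xa"))     # 10
```

With these rules 0x12 is 18, and a string such as 3x12 is rejected because x is only meaningful as part of a 0x prefix.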

In toybox I try to err on the side of _consistency_. All the commands behave the
_same_ way. This would be an explicit exception where dd count=01234 is _not_
interpreted as 668, meaning dd is a special case different from everything else.

> all used in computation and hardware data collection input circuits.

But not in c99 or posix.

> I wonder how you would define/control this?

I wouldn't?

Base 4 output has never historically been part of any unix command I'm aware of.
It's not in c99, it's not in posix, it's not in the linux standard base, it's
not in busybox, it's not in ubuntu's current command line, it's not in my old
Red Hat 9 qemu image (which I flung on https://busybox.net/downloads/qemu/
years ago and is strangely enough still there, although the README isn't. User
busybox, password busybox; I think that's the root password too.)

I'm not trying to make up new stuff, I'm trying to serve an existing userbase
of people who are part of a 50 year tradition. (The first pdp-7 unix was
written in 1969, the 50th anniversary is next year.) The best way to serve them
is to be consistent with historical practice.

I'm also trying to provide something new users can learn easily. The best way to
serve _them_ is to provide something consistent so they only have to learn a
trick once and then it works the same way everywhere.

When consistency and historical practice collide, the answer is not always
obvious to me.

I've often chosen to implement only some of the posix spec, because portions of
posix are deeply obsolete. It specifies the "sccs" source control system, batch
control commands (qdel and friends), fortran 77, the "ed" line editor, uucp. The
roadmap.html page has a section devoted to this.

Sometimes you have to explicitly break posix: it says zcat undoes "compress"
format (adaptive Lempel-Ziv coding, which was patented in 1984 and utterly dead
by the time that expired). Everybody else has zcat undo "deflate", the algorithm
introduced by pkzip 2.x in 1993. I care about current reality, not what posix
says to do. I only care about posix when it documents current reality.

In dd posix specifies ebcdic to ascii conversions, the "swap" byte swapping
option (assuming only 16 bit systems have endianness issues), ucase/lcase case
mapping that does not _conceptually_ work with multibyte encodings (let's
chop that data into blocks and then do case conversion across block boundaries
without ever looking at data from another block...)
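
A small Python sketch of the block boundary problem, under two loud assumptions: UTF-8 as the multibyte encoding and a 4-byte block size, both chosen only so that one character straddles a boundary. The helper ucase_per_block is hypothetical and is not how any particular dd implements conv=ucase.

```python
# Assumptions: UTF-8 encoding, 4-byte blocks. "caf\u00e9" encodes to five
# bytes, so the two-byte final character is split across two blocks.
text = "caf\u00e9"
data = text.encode("utf-8")        # b'caf\xc3\xa9'

def ucase_per_block(data, block_size):
    # Upper-case each block on its own, never looking at a neighbouring block.
    out = bytearray()
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        out += block.upper()       # bytes.upper() only changes ASCII letters
    return bytes(out)

blockwise = ucase_per_block(data, 4)
whole = text.upper().encode("utf-8")

print(blockwise)   # b'CAF\xc3\xa9': the accented letter is still lowercase
print(whole)       # b'CAF\xc3\x89': what a character-aware conversion gives
```

The ASCII letters are converted either way, but the character split across the boundary can only be handled by code that sees both blocks.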

A real user piped up and said their existing script doesn't work with my tool.
That feedback is of interest to me.

Rob


