[Toybox] [CLEANUP] dd

Rob Landley rob at landley.net
Fri May 2 04:09:23 PDT 2014


I've been banging on dd, and here's the first section of my notes:

I'm uncomfortable having a command repeat the same information in
multiple places. In this case, something like C_SEEK is #defined, that
define is used in an array, then it's used in a case statement during
option parsing, and in an if statement performing the actual work. So if
you add or remove an option, you have to do so in four places.

We actually need the information in _maybe_ two places, and even that's
technically an optimization by not having the option parsing inside the
loop doing the actual work, thus cacheing the results of command line
option parsing. In theory there's common infrastructure to handle the
command line option parsing (lib/args.c), but it doesn't handle
keyword=value arguments which is why we have to implement something
different here. Given how common keyword=value arguments are, maybe we
can factor some of that out into lib in the future. We can _definitely_
factor out the "comma separated list" part (conv=), which is also needed
by "mount -o", but let's implement the potentially common infrastructure
locally for now, and then peel off into lib/lib.c later.

So: I want to massage this code to remove the need for us to #define
information before we can use it, and to make at least some of the new
option parsing infrastructure reusable by other commands.

The size suffices are also kind of weird weird. We already have atolx()
which is parsing suffixes, but these are _different_ suffixes, and it's
got a "d" option to do decimal instead of binary units (so multibyte
suffixes). It's using "c" is 1, "w" is 2, "b" is 512 (presumably meaning
disk block size, except it's 4096 these days)... It has mega and giga
but not tera? I guess you don't want to malloc a terabyte any time soon,
but that's fairly domain specific and might change within 10 years.

Is this really what posix specifies?

Nope: posix specifies "k" and "b", and that's it. This seems to have
come from gnu, and busybox seems to have copied gnu, and this is just
implementing what gnu did. Except busybox added "swab" last year, so the
set of options this command is implementing is basically a snapshot of
what busybox was doing at some point in the past.

(Why did busybox implement swab? The mailing list has somebody just
submitting it out of the blue without actually explaining their use case
for it. I'd think this option was on the obsolete side: the "nuxi"
problem was fairly specific to 16 bit systems, 32 bit systems have the
"xinu" problem instead. :)

Alright, back to the size suffix issue: trying to extend atolx() doesn't
seem helpful, that's returning "long" and this is "long long" (different
on 32 bit systems, still relevant for about the rest of this decade).
The suffixes are different, with at least one outright conflict (b=block
vs b=byte). The decimal sizes is sort of interesting but nobody's
actually _asked_ for it.

So yeah, we probably need a new function in to dd to do this parsing,
but we can at least move the table into the function. And we can use the
"one\0two\0three\0" trick to avoid an array of pointers (8 bytes each on
64 bit systems) to the actual string data, and have the names and values
both local variables. (I think locality of reference trumps collating
'em, but that's a judgement call bordering on
opinion.)

As for option parsing corner cases... why is it skipping spaces at the
beginning?

  $ dd " count=1"
  dd: unrecognized operand ` count=1'
  $ ~/busybox/busybox/busybox dd "count= 1"
  dd: invalid number ' 1'

In the submitted toybox implementation, "dd count=" with no argument
treats it as 0, both of those say invalid number. What does posix say?
It... doesn't seem to specify...

Wait, posix says number parsing should support:

  Two or more positive decimal numbers (with or without k or b)
  separated by x, specifying the product of the indicated values

What, _really_? Um, wow: gnu accepts it. (Busybox doesn't.) And the
_sad_ part is that posix only says you have to do that for bs, cbs, ibs,
and obs, but not count= (and indeed, gnu won't take it for count).

Right, I need to go closely read what posix actually says for this
command, and at least document our deviations from it...

Meanwhile, attached is a first stab at new parsing functions for dd that
might be reusable in lib. It doesn't actually work yet (I have several
test cases that break it and I'm not sure this snapshot even compiles)
but just FYI on the direction I'm going...

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dd.c
Type: text/x-csrc
Size: 4491 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20140502/cd282bca/attachment-0002.c>


More information about the Toybox mailing list