[Toybox] dd cleanup notes

Tue Jul 12 11:03:21 PDT 2016

This xmove_fd() thing should just be xopen_nostd() in lib, and I should
probably go through and audit the existing commands to see if they
should use it. (I've been doing a lot of mental bookkeeping of "what if
this returns stdin, will it still work ok" but just having a function
that DOESN'T is a lot cleaner. And I should plug the holes with
/dev/null. And this implementation assumes that if our parent closed
stdout() they didn't provide us an existing fd #3, dunno where they got
that assumption from...)

The function strsuftoll() is a bit awkward, for a comple reasons.

The big one is we have atolx_range() in lib already, which ALMOST does
what this wants. But that uses long instead of long long, which is a
longstanding issue I blogged about recently at
http://landley.net/notes-2016.html#10-07-2016 and the tl;dr of which is
"long being 32 bits on 32 bit systems is kinda awkward". I might switch
atolx_range() to use long long internally soonish, but I can't easily
change the option parsing plumbing because it depends on LP64 and
sizeof(pointer) != sizeof(long long) on 32 bits.

The other problem is that the b and w suffixes are defined differently
here: in atolx() "b" is a synonym for "c" (meaning bytes), and here "b"
is 512 and "w" is 2. I've literally never seen either of these used, but
they're in the posix spec... along with ebcdic conversion support and
some other real lunacy.

Why did I add b and c to atolx?  Commit 87c06e15329a says "find" needs c
to mean bytes, and commit b8ef889cbfae says od needed "b" to mean bytes.
Yay source control!

Another thing posix specifies here, which this doesn't seem to do, is
bs=1024kx1024M (which would be 1 petabyte, but still). If we're ignoring
that can we get away with ignoring the b and w extensions and just use
atolx_range()? (Which would be limited to 2G maximum unit size on 32 bit
systems?) If I'd initially written it that way, I'm pretty sure the
answer is "yes" (since moving to 64 bit solves that size limitation).
But since I got a version in pending that people are actually using,
this would be a regression, so I guess I don't get to make that
architectural decision. (Pending!)

As for compatibility with "b" and "w", if I was planning to implement
the majority of what posix says about this command I'd find that
deviation annoying, but given ebcdic conversion support and 123x234
already being dropped in this implementation with nobody noticing, I'd
happily have done without it if it had been my choice (and waited for
somebody to complain; adding "w" is trivial but silly, adding "b"
conflicts with the existing "b".)

Would adding "d" to specify decimal sizes be actually useful to anyone?
The dd man page says:

  N and BYTES may be followed by the following multiplicative suffixes:
  c =1, w =2, b =512, kB =1000, K =1024, MB =1000*1000, M =1024*1024,
  xM =M GB =1000*1000*1000, G =1024*1024*1024, and so on for T, P, E,
  Z, Y.

But what this IMPLEMENTS is

  { "c", 1 }, { "w", 2 }, { "b", 512 },
  { "kD", 1000 }, { "k", 1024 }, { "K", 1024 },
  { "MD", 1000000 }, { "M", 1048576 },
  { "GD", 1000000000 }, { "G", 1073741824 }

I.E. ubuntu has an extra "B" suffix meaning decimal,and this has an
extra "D" meaning decimal...? (I can add it to atolx(), just... lemme
know what makes sense here?)

Hang on...

      case C_BS:
        TT.in.sz = TT.out.sz = strsuftoll(arg, 1, LONG_MAX);
        break;
      case C_IBS:
        sz = strsuftoll(arg, 1, LONG_MAX);
        if (!(toys.optflags & C_BS)) TT.in.sz = sz;
        break;
      case C_OBS:
        sz = strsuftoll(arg, 1, LONG_MAX);
        if (!(toys.optflags & C_BS)) TT.out.sz = sz;

That's capping the range to LONG_MAX. So half the uses here are acting
like atolx_range() _ANYWAY_, although count= seek= and skip= can do the
full unsigned long long range. So if you fed in a block size of 4G it
would go "boing" even on 64 gigs...

Let's see, what else:

Does C_COUNT need to be recorded as a flag?

  $ dd if=/dev/zero of=boing bs=1 count=1
  0+0 records in
  0+0 records out
  0 bytes (0 B) copied, 0.000591678 s, 0.0 kB/s

Yes, apparently. Well, I could use -1 since...

  $ dd if=/dev/zero of=boing bs=1 count=-1
  dd: invalid number ‘-1’

Hmmm, if you seek output but don't conv=notrunc, presumably it truncates
to _that_ point? (Yes it does. I need to add a test for that to the test
suite...)

If you're wondering why I haven't cleaned this one up before now... it's
got issues.

Rob