[Toybox] dd posix spec.

Rob Landley rob at landley.net
Wed Jul 12 01:22:36 PDT 2017


Last conversation about the dd spec on here, people suggested the spec
was literally a practical joke. That said, posix published the darn
thing, so let's look through it.

First, we will not be implementing the whole thing. It requires
ascii/ebcdic mapping, which was dead 30 years ago. So the question is
what _subset_ is worth implementing.

bs= without data modifying conversions means you output what you input.
(If you got a short read, you do a write of that size.) Otherwise, you
collate input blocks into output blocks of the requested size.

Question: what happens if there's a short write? Do you collate to the
next full output block size, or do you re-write the missing chunk as a
short write?
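
For reference, here's roughly what the "re-write the missing chunk"
option looks like. This is only a sketch of one of the two choices
above, not something the spec settles; write_all() is a made-up helper
name, and retrying on EINTR is my assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until the whole block is
 * out, so a short write is followed by a write of just the remainder.
 * Returns the number of bytes written, or -1 on a real error. */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
  const unsigned char *p = buf;
  size_t done = 0;

  while (done < len) {
    ssize_t n = write(fd, p + done, len - done);

    if (n < 0) {
      if (errno == EINTR) continue;  /* assumption: interrupted, retry */
      return -1;
    }
    if (!n) break;                   /* no progress, give up */
    done += n;
  }

  return (ssize_t)done;
}
```

The other option (treat the short write as a partial output block and
start the next block fresh) would just be a single write() whose return
value bumps the partial counter.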

Question: what if bs= and obs= are both specified? (Answer: bs= wins.)

sync is silly. swab is silly. But both are easy to do...
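
To show how little code those two are, a minimal sketch. The function
names are mine, and this assumes the caller only pads blocks that
actually got data (not the zero length read at EOF).

```c
#include <stddef.h>
#include <string.h>

/* conv=swab: swap each adjacent pair of bytes in place. With an odd
 * length the final byte stays where it is. */
static void swab_pairs(unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i + 1 < len; i += 2) {
    unsigned char tmp = buf[i];

    buf[i] = buf[i + 1];
    buf[i + 1] = tmp;
  }
}

/* conv=sync: pad a short input block out to ibs with NUL bytes.
 * buf must have room for ibs bytes. Returns the new block length. */
static size_t sync_pad(unsigned char *buf, size_t got, size_t ibs)
{
  if (got < ibs) memset(buf + got, 0, ibs - got);

  return ibs;
}
```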

Question: lcase and ucase are utf8 now, and any fixed block size is
going to chop characters in the middle. It says conversions operate
independent of input blocking, so I guess I need a minimum buffer size
of 512 or so... (gotta look up what this block/unblock stuff is doing...)

Ok, block or unblock do nothing unless you specify cbs= "conversion
block size". Which is different from ibs=, obs=, and bs=. Right, I'm
going to throw that in the "did not implement" pile and wait for
somebody to complain.

There's a bs=123x456 in the spec, we didn't previously implement that,
I'm not adding it now because it's crazy and $((123*456)) exists.

No if= defaults to stdin, no of= defaults to stdout. Got it.

sigint causes progress indicator output, but I have a "todo" that says
it's not ending the process...

Question: if bs= _isn't_ specified (and neither is ibs= or obs=) I vaguely
recall the default block size is 512 bytes. Is that considered bs= being
specified in terms of the "write what you read" behavior, or do we fill
up 512 byte output blocks if we read less than that? (This matters if
you dd from /dev/ttyS0 and get bytes typed by humans.)

If your output block is a short write, do you retry the rest of that
block or do a whole next block?

of= is truncated by default, to seek= position if that's specified.
Disabled with conv=notrunc.

Edge case: If you specify ibs=prime1 obs=prime2 then the smallest
internal buffer you can have without memcpy is ibs*obs... except even
_that_ won't work if you have a short read that screws up the alignment and

If you're willing to do memcpy to preserve block size, then you just
need ibs+obs as your worst case, and can memcpy to realign after each
block if necessary.
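
A rough sketch of the ibs+obs version, to check that the buffer math
works out. Everything here (dd_copy, struct dd_counts) is a made-up
name, not existing toybox code. It punts on the short write question by
treating one as an error, and it doesn't handle EINTR.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

struct dd_counts { unsigned in_full, in_part, out_full, out_part; };

/* Copy in to out, reading up to ibs at a time and writing obs sized
 * blocks. Leftover bytes (always fewer than obs) get moved to the start
 * of the buffer, so the next read of up to ibs bytes always fits in
 * ibs+obs. Returns 0 on success, -1 on error. */
static int dd_copy(int in, int out, size_t ibs, size_t obs,
  struct dd_counts *c)
{
  unsigned char *buf = malloc(ibs + obs);
  size_t fill = 0;
  int rc = -1;

  if (!buf) return -1;
  memset(c, 0, sizeof(*c));

  for (;;) {
    ssize_t n = read(in, buf + fill, ibs);
    size_t pos = 0;

    if (n < 0) goto done;
    if (!n) break;
    if ((size_t)n == ibs) c->in_full++;
    else c->in_part++;
    fill += n;

    /* Flush every complete output block we have collected so far. */
    while (fill - pos >= obs) {
      if (write(out, buf + pos, obs) != (ssize_t)obs) goto done;
      c->out_full++;
      pos += obs;
    }

    /* Realign: slide the remainder down to the start of the buffer. */
    fill -= pos;
    if (pos && fill) memmove(buf, buf + pos, fill);
  }

  /* At EOF whatever is left goes out as one partial block. */
  if (fill) {
    if (write(out, buf, fill) != (ssize_t)fill) goto done;
    c->out_part++;
  }
  rc = 0;

done:
  free(buf);

  return rc;
}
```

With ibs=3 obs=7 and ten bytes of input this does four reads (3+1
records in) and two writes (1+1 records out), with one memmove of the
two bytes left over after the first write.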

Three potential output formats:

  "%u+%u records in\n", <number of whole input blocks>,
      <number of partial input blocks>
  "%u+%u records out\n", <number of whole output blocks>,
      <number of partial output blocks>
  "%u truncated %s\n", <number of truncated blocks>, "record" (if
      <number of truncated blocks> is one) "records" (otherwise)
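
In C that's about this much. A sketch only: format_status() is a
made-up name, it writes into a buffer instead of stderr so it can be
checked, and printing the "truncated" line only when the count is
nonzero is my assumption about when that third format applies.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the status lines in buf. Returns the length written, or -1 if
 * buf was too small. */
static int format_status(char *buf, size_t len, unsigned in_full,
  unsigned in_part, unsigned out_full, unsigned out_part,
  unsigned truncated)
{
  int n = snprintf(buf, len, "%u+%u records in\n%u+%u records out\n",
    in_full, in_part, out_full, out_part);
  int m;

  if (n < 0 || (size_t)n >= len) return -1;
  if (!truncated) return n;

  /* Assumption: only report truncation when it actually happened. */
  m = snprintf(buf + n, len - n, "%u truncated %s\n", truncated,
    truncated == 1 ? "record" : "records");
  if (m < 0 || (size_t)m >= len - n) return -1;

  return n + m;
}
```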

Great, and now THIS nonsense in the rationale:

> Another point is that a failed read on a regular file or a disk
> generally does not increment the file offset, and dd must then
> seek past the block on which the error occurred; otherwise, the
> input error occurs repetitively. When the input is a magnetic
> tape, however, the tape normally has passed the block containing
> the error when the error is reported, and thus no seek is necessary.

So... try to seek after an error but ignore failure of lseek()?
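
Something like this, presumably. A sketch under that reading of the
rationale, with a made-up function name; the return value is only there
so the caller can tell which case it hit.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* After a failed read, try to step past the bad input block. On a pipe,
 * tty, or anything else that can't seek, lseek() fails and we just
 * carry on. Returns 1 if the seek worked, 0 if it was ignored. */
static int skip_bad_block(int fd, size_t ibs)
{
  int saved = errno;  /* keep the read() error for the caller to report */
  int moved = lseek(fd, (off_t)ibs, SEEK_CUR) != (off_t)-1;

  errno = saved;

  return moved;
}
```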

The bit about writing a partial block after an error without noerror...
I guess that's from obs being larger than ibs? (Because read() either
returns data _or_ error, not both...)

Sigh. What a mess. Next up, a similarly close reading of the man page...

Rob

