[Toybox] tar tests.

Rob Landley rob at landley.net
Mon Mar 18 20:24:38 PDT 2019


On 3/16/19 8:43 PM, Rob Landley wrote:
> On 3/14/19 1:00 PM, Rob Landley wrote:
>> On 3/14/19 12:08 AM, enh wrote:
>>> On Wed, Mar 13, 2019 at 4:05 PM Rob Landley <rob at landley.net> wrote:
>>>
>>> ((speaking of which, some of the tar tests are failing now.))
>>
>> I usually consider "consumes its own output" to be "necessary but not
>> sufficient" as tests go: if it makes the same mistake at each end, we won't
>> catch it. So during development I've been testing that it can extract and
>> reproduce the linux-4.20.tar.gz tarball.
> 
> The next reason these are bad tests is they're not orthogonal. The first test is
> testing 37 things, so if it doesn't work you have no idea why, and a single
> failure (in this case autodetection of decompression type when you didn't
> specify) makes pretty much all the tests fail.

Corner case: the gnu/dammit tar fills out the checksum field weird, I kinda
dowanna do that but the resulting tarballs won't be binary identical if I _don't_...

Backstory: tar header fields are fixed length records with left-justified ascii
contents, padded with NUL bytes. The numerical ones are octal strings (because the
PDP-7 used a 6-bit byte; the one Ken and Dennis had was installed with 1024
18-bit words of memory).
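
Here is a minimal C sketch of writing one of those numeric fields. It assumes the common zero-padded convention: len-1 octal digits followed by one NUL. That matches the tail of the mtime field in the hexdump further down, which ends in "141" and a NUL. The function name tar_put_octal() is hypothetical and is not Toybox's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store val in a fixed-length tar numeric field as
 * zero-padded octal, i.e. len-1 digits followed by a single NUL.
 * Returns 0 on success, -1 if val needs more than len-1 octal digits. */
static int tar_put_octal(char *field, size_t len, unsigned long long val)
{
  size_t i;

  assert(len >= 2);
  field[len - 1] = '\0';
  /* Fill digits right to left, three bits (one octal digit) at a time. */
  for (i = len - 1; i > 0; i--) {
    field[i - 1] = (char)('0' + (val & 7));
    val >>= 3;
  }
  /* Anything left over did not fit in the field. */
  return val ? -1 : 0;
}
```

For example, tar_put_octal(buf, 12, 97) fills a 12-byte field with "00000000141" and a trailing NUL.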

The "checksum" field is just the sum of all 512 bytes in the header, treated as
unsigned values, and is calculated as if the checksum field itself were memset with space characters.
(Then you write the checksum into the field after you've calculated it.) The
checksum has 7 digits reserved (plus a NUL) but due to all the NUL bytes in the
header, the checksum is almost always 6 digits. So it _should_ have 2 NUL bytes
after it... but it doesn't. It has a NUL and a space, à la:

00000090  31 34 31 00 30 31 32 32  36 36 00 20 30 00 00 00  |141.012266. 0...|
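
Reading that dump against the ustar layout: offset 0x90 is byte 144, so "141" plus its NUL finish the mtime field (bytes 136-147). Bytes 148-155 are the checksum "012266", a NUL and a space. The "0" at byte 156 is the typeflag.

Below is a minimal sketch of the computation and of what a reader might do with the stored value. It assumes that layout (an 8-byte field at offset 148), and the function names are hypothetical. Because the parse stops at the first non-octal byte, a reader written this way accepts both the NUL-NUL and the NUL-space endings.

```c
#include <assert.h>
#include <stddef.h>

#define TAR_BLOCK 512
#define CKSUM_OFF 148   /* ustar: 8-byte chksum field at offset 148 */
#define CKSUM_LEN 8

/* Sum all 512 header bytes as unsigned values, treating the 8 checksum
 * bytes as ASCII spaces whatever they currently hold. Even an all-0xff
 * header only reaches 504*255 + 8*32 = 128776 (0373410 octal), so six
 * octal digits always suffice. */
static unsigned tar_header_sum(const unsigned char *hdr)
{
  unsigned sum = 0;
  size_t i;

  for (i = 0; i < TAR_BLOCK; i++)
    sum += (i >= CKSUM_OFF && i < CKSUM_OFF + CKSUM_LEN) ? ' ' : hdr[i];
  return sum;
}

/* Reader side: parse the stored octal value, stopping at the first byte
 * that is not an octal digit, so a trailing NUL or space is ignored. */
static int tar_cksum_ok(const unsigned char *hdr)
{
  const unsigned char *p, *end;
  unsigned stored = 0;

  assert(hdr);
  p = hdr + CKSUM_OFF;
  end = p + CKSUM_LEN;
  while (p < end && *p == ' ') p++;
  while (p < end && *p >= '0' && *p <= '7') stored = stored * 8 + (*p++ - '0');
  return stored == tar_header_sum(hdr);
}
```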

The _reason_ for this is historical implementations would memset the field,
iterate over the bytes, and then sprintf() into the field, which would add a
NUL terminator but not overwrite the last space in the field. And the
gnu/dammit tar is either _doing_ that, or emulating it.
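
Here is a sketch of that historical sequence, assuming a ustar-layout 512-byte header. The zero-padded "%06o" format is chosen to reproduce the six digits in the dump above. It is an assumption, not a claim about what any particular old tar actually passed to sprintf(). The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512
#define CKSUM_OFF 148
#define CKSUM_LEN 8

/* Historical sequence: blank the field with spaces, sum all 512 bytes
 * (the blanks included), then sprintf() the value over the field.
 * "%06o" emits six digits plus a NUL terminator, so the eighth byte is
 * never written and keeps its space: the field ends in NUL, space. */
static void cksum_historical(unsigned char *hdr)
{
  unsigned sum = 0;
  int i;

  memset(hdr + CKSUM_OFF, ' ', CKSUM_LEN);
  for (i = 0; i < TAR_BLOCK; i++) sum += hdr[i];
  /* sum is at most 128776 (six octal digits), so 7 bytes are written. */
  sprintf((char *)hdr + CKSUM_OFF, "%06o", sum);
}
```

On an all-zero header this leaves "000400", a NUL and a space in bytes 148-155. The eight blanks alone sum to 256, which is 0400 octal.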

I'm not memsetting spaces into the cksum field; I'm starting the sum at 8*' ' and
skipping those 8 bytes... but the result is that I'm printing out two NUL bytes at
the end instead of NUL space. And if you check for binary identical files...
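
Here is a sketch of the skip-the-field approach, plus the one-byte tweak that would make its output match. It assumes the field is NUL-filled before the value is written. cksum_skip() and its gnu_compat flag are hypothetical names for illustration, not Toybox's tar code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512
#define CKSUM_OFF 148
#define CKSUM_LEN 8

/* Start from the eight spaces the field is defined to contain and skip
 * those bytes, so the field never has to be blanked first. Written into
 * a NUL-filled field, "%06o" leaves it ending in NUL, NUL. Setting
 * gnu_compat forces the last byte to a space, giving the NUL, space
 * ending shown in the dump above. */
static void cksum_skip(unsigned char *hdr, int gnu_compat)
{
  unsigned sum = CKSUM_LEN * ' ';
  int i;

  for (i = 0; i < TAR_BLOCK; i++)
    if (i < CKSUM_OFF || i >= CKSUM_OFF + CKSUM_LEN) sum += hdr[i];

  memset(hdr + CKSUM_OFF, 0, CKSUM_LEN);
  sprintf((char *)hdr + CKSUM_OFF, "%06o", sum);
  if (gnu_compat) hdr[CKSUM_OFF + CKSUM_LEN - 1] = ' ';
}
```

Both variants store the same checksum value. Only byte 155 differs (NUL versus space), which is exactly the difference a binary comparison trips over.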

It's _almost_ certain no tar program out there is going to care about this, but
if I don't emulate it and I use canned tarballs, CHECK_HOST would always fail with the
gnu/dammit implementation. (Or possibly busybox; I haven't looked at what that's
doing yet.)

Rob


