[Toybox] tar tests.
Rob Landley
rob at landley.net
Tue Mar 19 12:39:12 PDT 2019
On 3/18/19 10:24 PM, Rob Landley wrote:
> On 3/16/19 8:43 PM, Rob Landley wrote:
>> On 3/14/19 1:00 PM, Rob Landley wrote:
>>> On 3/14/19 12:08 AM, enh wrote:
>>>> On Wed, Mar 13, 2019 at 4:05 PM Rob Landley <rob at landley.net> wrote:
>>>>
>>>> ((speaking of which, some of the tar tests are failing now.))
>>>
>>> I usually consider "consumes its own output" to be "necessary but not
>>> sufficient" as tests go: if it makes the same mistake at each end, we won't
>>> catch it. So during development I've been testing that it can extract and
>>> reproduce the linux-4.20.tar.gz tarball.
>>
>> The next reason these are bad tests is they're not orthogonal. The first test is
>> testing 37 things so if it doesn't work you have no idea why, and a single
>> failure (in this case autodetection of decompression type when you didn't
>> specify) makes pretty much all the tests fail.
>
> Corner case: the gnu/dammit tar fills out the checksum field weird, I kinda
> dowanna do that but the resulting tarballs won't be binary identical if I _don't_...
>
> Backstory: tar header fields are fixed length records with left-justified ascii
> contents, padded with NUL bytes. The numerical ones are octal strings (because
> PDP-7 used a 6 bit byte, the one Ken and Dennis had was installed with 1024
> 18-bit words of memory).
>
> The "checksum" field is just the sum of all the bytes in the header, and is
> calculated as if the checksum field itself is memset with space characters.
> (Then you write the checksum into the field after you've calculated it.) The
> checksum has 7 digits reserved (plus a NUL) but due to all the NUL bytes in the
> header, the checksum is almost always 6 digits. So it _should_ have 2 NUL bytes
> after it... but it doesn't. It has a NUL and a space, ala:
>
> 00000090 31 34 31 00 30 31 32 32 36 36 00 20 30 00 00 00 |141.012266. 0...|
>
> The _reason_ for this is historical implementations would memset the field,
> iterate over the values, and then sprintf() into the field which would add a
> NUL terminator but not overwrite the last space in the field. And the
> gnu/dammit tar is either _doing_ that, or emulating it.
>
> I'm not memsetting spaces into the cksum field, I'm starting with 8*' ' and
> skipping those 8 bytes... but the result is I'm printing out two NUL bytes at
> the end instead of NUL space. And if you check for binary identical files...
>
> It's _almost_ certain no tar program out there is going to care about this, but
> if I don't and I use canned tarballs, CHECK_HOST would always fail with the
> gnu/dammit implementation. (Or possibly busybox, I haven't looked at what that's
> doing yet.)
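
For reference, the checksum calculation described in the quoted backstory can be sketched roughly as below. This is an illustration, not the toybox code: the function name tar_header_cksum is made up, and the offset 148 for the 8-byte checksum field is taken from the standard ustar header layout (it matches the hexdump above, where the field starts at 0x94).

```c
#include <stddef.h>

// Hypothetical helper, not from toybox: sum all 512 header bytes as
// unsigned values, counting the 8-byte checksum field (offset 148 in the
// ustar layout) as if it were filled with spaces.
unsigned tar_header_cksum(const unsigned char *hdr)
{
  unsigned sum = 0;
  size_t i;

  for (i = 0; i < 512; i++)
    sum += (i >= 148 && i < 156) ? ' ' : hdr[i];

  return sum;
}
```

Treating the field as spaces inside the loop means the caller doesn't have to memset it first, so whatever is already stored there doesn't affect the result.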

The cksum field only ever fills out 6 digits, even though it's allocated 8
bytes. The reason is that octal 777777 = 262143 decimal, which is one less than
512*512, meaning even if all 512 bytes were set to 255 (a sum of 130560) you
wouldn't set the high bit of that, so it can never use the seventh digit.

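
A quick way to check that bound, as a sketch (both function names are made up for illustration). It uses a slightly tighter worst case than 512*255: the 8 checksum bytes always count as spaces, so only the other 504 bytes can be 255.

```c
// Hypothetical helper: largest value the header sum can reach, with the
// 8 checksum bytes counted as spaces and the other 504 bytes all 0xff.
unsigned tar_cksum_max(void)
{
  return 8*' ' + (512-8)*255;
}

// Hypothetical helper: number of octal digits needed to print val.
int octal_digits(unsigned val)
{
  int digits = 1;

  // Each octal digit holds 3 bits, so shift them off one digit at a time.
  while (val >>= 3) digits++;

  return digits;
}
```

With these, octal_digits(tar_cksum_max()) comes out to 6, so the seventh digit stays unused.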
So what it's doing is filling out 6 digits (with zero padding on the left if
it's short!) and then sticking a space in the last byte.
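
Something like this would produce that layout. It's a sketch under assumptions, not what toybox or the gnu tar actually does: tar_write_cksum is a made-up name, and it assumes the sum fits in 6 octal digits (which the bound above says it always does).

```c
#include <stdio.h>

// Hypothetical helper: fill the 8-byte checksum field with 6 zero-padded
// octal digits, then a NUL, then a space.
void tar_write_cksum(char *field, unsigned sum)
{
  // snprintf() writes the 6 digits plus the terminating NUL (7 bytes).
  // The mask keeps the output at 6 digits even if the caller passes junk.
  snprintf(field, 7, "%06o", sum & 0777777);

  // The last byte keeps the space the historical memset left behind.
  field[7] = ' ';
}
```

Feeding it the value from the hexdump above (octal 012266) gives the bytes 30 31 32 32 36 36 00 20.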

*shrug* I can do that. It's silly, but I can do it.

Rob