[Toybox] GNU tar sparse files

Rob Landley rob at landley.net
Tue May 7 00:26:39 PDT 2019


On 5/1/19 1:12 PM, Rob Landley wrote:
> On 4/30/19 10:59 AM, enh via Toybox wrote:
>> there's a tar file checked in to AOSP that causes trouble for toybox
>> tar. you can find it (and the script that generates it) here:
>>
>> https://android.googlesource.com/platform/system/update_engine/+/refs/heads/master/sample_images/
> 
> Adding support for sparse files is on my todo list, I just waited for a user to
> crop up. Lemme see what I can do...
> 
> (Hmmm, we already truncate the file when --overwrite (and O_EXCL otherwise, AHA
> bug if O_EXCL fails we don't skippy() when we didn't sendfile(), this is why
> proper testing includes ALL THE ERROR PATHS, grrr...)
> 
> Ahem. Working on it...

Ok, having run rather a lot of tests, S headers don't seem to use the "prefix"
field to store a prefix at all, even if putting the prefix in the leftover space
would make a long name fit in the "name" field. This is despite the fact that
the sparse offset/data list starts at 386 bytes into the header and the prefix
starts at 345, it just skips 41 bytes for no readily apparent reason? Each word
pair is 12+12=24 bytes, they could fit another pair in but _don't_. (In fact
if they wanted to go to the full 512 they could fit in 6 pairs instead of 4.)

The <strike>aristocrats</strike> FSF!

Anyway, they follow it with a "more is expected" flag (0 or 1), and then the
total sparse file's reported length (the length in the S record's length field
is the stored data length, this is apparently for sanity checking?)
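
For reference, here's a rough sketch of that layout as C structs, so the
offsets can be checked by the compiler. The struct and field names are made up
for illustration (they are not toybox's or GNU's), the first 345 bytes are left
as one opaque blob, and the 482/483 positions are just what falls out of
386 + 4*24, so treat it as a reading aid rather than a definitive description:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* One offset/length pair: two 12-byte octal text fields, 24 bytes total. */
struct sparse_pair {
  char offset[12];
  char length[12];
};

/* Hypothetical view of a 512-byte 'S' header. */
struct sparse_header {
  char start[345];          /* name, mode, size, "ustar", etc. (not broken out) */
  char skipped[41];         /* 345: where "prefix" would start, not used as one */
  struct sparse_pair sp[4]; /* 386: the first four pairs */
  char more;                /* 482: "more is expected" flag */
  char realsize[12];        /* 483: total length of the expanded file */
  char pad[17];             /* 495: fill out to 512 */
};

/* Hypothetical view of a continuation block: nothing but pairs and a flag. */
struct sparse_ext {
  struct sparse_pair sp[21]; /* 0: up to 21 more pairs */
  char more;                 /* 504: another "more is expected" flag */
  char pad[7];               /* 505: fill out to 512 */
};

static_assert(offsetof(struct sparse_header, sp) == 386, "pairs start at 386");
static_assert(sizeof(struct sparse_header) == 512, "header is one tar block");
static_assert(sizeof(struct sparse_ext) == 512, "extension is one tar block");
```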

When "more is expected" is set, it's followed by a block that's just
offset/length pairs (up to 21 of them), with another "more is expected" flag at
504. (This isn't a proper header, no "ustar" or record type or anything, you
only know what it is because of the previous header.) And then after the first
one without a "more is expected" flag comes the data (which you chop up
yourself based on the offset/length array).
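
A minimal sketch of walking that chain, assuming the S header and its
continuation blocks are already sitting contiguously in memory. The names
(struct extent, octal_field, read_sparse_map) are hypothetical, not what
toybox uses. Two guesses are baked in and marked in the comments: unused pair
slots are NUL-filled, and the flag may be either a binary 1 or an ASCII '1':

```c
#include <assert.h>
#include <stddef.h>

struct extent { unsigned long long offset, length; };

/* Parse an octal text field of at most len bytes, skipping leading spaces
 * and stopping at the first byte that isn't an octal digit. */
static unsigned long long octal_field(const char *s, int len)
{
  unsigned long long val = 0;
  int i = 0;

  while (i < len && s[i] == ' ') i++;
  for (; i < len && s[i] >= '0' && s[i] <= '7'; i++) val = val*8 + (s[i]-'0');

  return val;
}

/* blocks: the 'S' header followed by any continuation blocks (512 bytes each).
 * Stores at most max extents in out, returns how many were stored, and sets
 * *used to the number of 512-byte blocks consumed (the data starts there). */
static int read_sparse_map(const char *blocks, size_t nblocks,
  struct extent *out, int max, size_t *used)
{
  size_t blk = 0;
  int n = 0;

  assert(blocks && out && used && nblocks);
  for (;;) {
    const char *base = blocks + blk*512, *p = base + (blk ? 0 : 386);
    int pairs = blk ? 21 : 4, i;
    char more = base[blk ? 504 : 482];

    /* Assumption: an unused slot has a NUL where its length field starts. */
    for (i = 0; i < pairs && p[12]; i++, p += 24) {
      if (n < max) {
        out[n].offset = octal_field(p, 12);
        out[n].length = octal_field(p + 12, 12);
        n++;
      }
    }
    blk++;
    /* Assumption: accept the flag as binary 1 or ASCII '1'. */
    if ((more != 1 && more != '1') || blk == nblocks) break;
  }
  *used = blk;

  return n;
}
```

The caller would then read the stored data that follows block *used and place
each chunk at its extent's offset, leaving holes in between.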

Easy enough to make it read them. Slightly fiddlier to make it write them (gotta
do two passes with SEEK_HOLE).
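
The first of those two passes might look something like the sketch below: walk
the file with lseek() SEEK_DATA/SEEK_HOLE and record where the data lives, so
the offset/length list can be written before any of the data. The function
name and struct are hypothetical, and this assumes Linux-style SEEK_DATA and
SEEK_HOLE (the fallback defines use the Linux values in case the headers hide
them):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open(), for the caller */
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3     /* Linux values, assumed when headers don't expose them */
#define SEEK_HOLE 4
#endif

struct data_extent { off_t offset, length; };

/* Scan fd for data extents. Stores up to max of them in out (out may be NULL
 * to just count). Returns the total number of data extents found, or -1 on
 * error. Leaves the file position wherever the last lseek() put it. */
static int scan_extents(int fd, struct data_extent *out, int max)
{
  off_t pos = 0, end = lseek(fd, 0, SEEK_END);
  int n = 0;

  if (end < 0) return -1;
  while (pos < end) {
    off_t data = lseek(fd, pos, SEEK_DATA), hole;

    if (data < 0) {
      if (errno == ENXIO) break;   /* nothing but hole from pos to EOF */
      return -1;
    }
    hole = lseek(fd, data, SEEK_HOLE);   /* EOF counts as a hole */
    if (hole < 0) return -1;
    if (out && n < max) {
      out[n].offset = data;
      out[n].length = hole - data;
    }
    n++;
    pos = hole;
  }

  return n;
}
```

The second pass would then seek back to each recorded offset and copy that
many bytes into the archive. On a filesystem that doesn't track holes, the
scan should just report one extent covering the whole file.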

Oh, and the data they put in adds a hole of size zero at the end of the file,
which seems completely unnecessary but if I'm still going for binary
compatibility with the other thing's output...

And I still need to teach lib/args.c to allow "tar --sparse cf blah.tgz
filename" don't I? Tomorrow is 3 months from the last release...

Rob


