[Toybox] GNU tar sparse files

enh enh at google.com
Wed May 8 17:51:30 PDT 2019


well, that motivated me to add SEEK_DATA and SEEK_HOLE to bionic...
they'll be in NDK r21.

(why didn't we get them for free from the uapi headers? because the
SEEK_ constants are in <linux/fs.h> which also contains a ton of other
stuff, in particular a BLOCK_SIZE macro that conflicts with just about
every piece of code ever written...)
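
For code that has to build against older headers, one possible workaround is to define the two constants locally instead of pulling in <linux/fs.h>. The following is only a minimal sketch: 3 and 4 are the Linux values, and the helper names next_data_extent() and count_data_extents() are made up for illustration (they are not bionic or toybox functions). Counting the extents like this is roughly the kind of first pass a sparse-aware archiver might make before writing anything.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Linux values, defined locally so <linux/fs.h> (and its BLOCK_SIZE
// macro) never has to be included. Only used if libc didn't define them.
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

// Hypothetical helper: find the next run of data at or after pos.
// Returns 1 and fills in [*start, *end), 0 when no data is left
// (lseek fails with ENXIO), or -1 on any other error.
int next_data_extent(int fd, off_t pos, off_t *start, off_t *end)
{
  assert(start && end);
  *start = lseek(fd, pos, SEEK_DATA);
  if (*start == -1) return errno == ENXIO ? 0 : -1;
  // End of file counts as a hole, so this finds the end of the run.
  *end = lseek(fd, *start, SEEK_HOLE);
  return *end == -1 ? -1 : 1;
}

// Hypothetical helper: count the data extents in a file, or -1 on error.
int count_data_extents(const char *path)
{
  off_t pos = 0, start, end;
  int fd = open(path, O_RDONLY), n = 0, rc;

  if (fd == -1) return -1;
  while ((rc = next_data_extent(fd, pos, &start, &end)) == 1) {
    n++;
    pos = end;
  }
  close(fd);
  return rc ? -1 : n;
}
```

On a filesystem that doesn't track holes, the kernel is expected to report the whole file as a single data extent, so the loop should still terminate.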

From: Rob Landley <rob at landley.net>
Date: Tue, May 7, 2019 at 12:26 AM
To: enh, toybox

> On 5/1/19 1:12 PM, Rob Landley wrote:
> > On 4/30/19 10:59 AM, enh via Toybox wrote:
> >> there's a tar file checked in to AOSP that causes trouble for toybox
> >> tar. you can find it (and the script that generates it) here:
> >>
> >> https://android.googlesource.com/platform/system/update_engine/+/refs/heads/master/sample_images/
> >
> > Adding support for sparse files is on my todo list, I just waited for a user to
> > crop up. Lemme see what I can do...
> >
> > (Hmmm, we already truncate the file when --overwrite (and O_EXCL otherwise). AHA,
> > bug: if O_EXCL fails we don't skippy() when we didn't sendfile(). This is why
> > proper testing includes ALL THE ERROR PATHS, grrr...)
> >
> > Ahem. Working on it...
>
> Ok, having run rather a lot of tests, S headers don't seem to use the "prefix"
> field to store a prefix at all, even if putting the prefix in the leftover space
> would make a long name fit in the "name" field. This is despite the fact that
> the sparse offset/data list starts at 386 bytes into the header and the prefix
> starts at 345, it just skips 41 bytes for no readily apparent reason? Each word
> pair is 12+12=24 bytes, they could fit another pair in but _don't_. (In fact
> if they wanted to go to the full 512 they could fit in 6 pairs instead of 4.)
>
> The <strike>aristocrats</strike> FSF!
>
> Anyway, they follow it with a "more is expected" flag (0 or 1), and then the
> total sparse file's reported length (the length in the S record's length field
> is the stored data length, this is apparently for sanity checking?)
>
> When "more is expected" is set, it's followed by a header that's just offset/length
> pairs (up to 21 of them), with another "more is expected" flag at 504. (This isn't
> a proper header, no "ustar" or record type or anything, you just know because of the
> previous header.) And then after the first one without a "more is expected" flag comes
> the data (which you chop up yourself based on the offset/length array).
>
> Easy enough to make it read them. Slightly fiddlier to make it write them (gotta
> do two passes with SEEK_HOLE).
>
> Oh, and the data they put in adds a hole of size zero at the end of the file,
> which seems completely unnecessary but if I'm still going for binary
> compatibility with the other thing's output...
>
> And I still need to teach lib/args.c to allow "tar --sparse cf blah.tgz
> filename" don't I? Tomorrow is 3 months from the last release...
>
> Rob
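
The S header layout described above can be sketched roughly as follows. This is not toybox's actual code: every name here (struct extent, octal_field(), read_pairs(), parse_sparse_header()) is hypothetical. The sketch assumes each offset and length is a 12-byte octal field (24 bytes per pair, as in GNU tar's tar.h as far as I can tell), that unused pairs are NUL-filled, and that the real size is plain octal rather than a base-256 encoding.

```c
#include <assert.h>
#include <string.h>

// Assumed layout, from the description above plus GNU tar's tar.h.
#define SPARSE_START 386  // first offset/length pair in an S header
#define SPARSE_FIELD 12   // assumed width of each octal field
#define SPARSE_PAIRS 4    // pairs that fit in the S header itself
#define EXT_PAIRS 21      // pairs in each continuation block
#define EXT_MORE 504      // "more is expected" flag in a continuation block

struct extent { unsigned long long offset, length; };

// Parse an octal field of at most len bytes, skipping leading spaces
// and stopping at the first byte that isn't an octal digit.
unsigned long long octal_field(const char *s, int len)
{
  unsigned long long v = 0;
  int i = 0;

  while (i < len && s[i] == ' ') i++;
  for (; i < len && s[i] >= '0' && s[i] <= '7'; i++) v = v*8 + (s[i]-'0');
  return v;
}

// Read up to max pairs starting at p, stopping at the first pair whose
// offset field starts with NUL (assumed to mean "unused"). Returns count.
int read_pairs(const char *p, int max, struct extent *out)
{
  int n;

  for (n = 0; n < max && *p; n++, p += 2*SPARSE_FIELD) {
    out[n].offset = octal_field(p, SPARSE_FIELD);
    out[n].length = octal_field(p + SPARSE_FIELD, SPARSE_FIELD);
  }
  return n;
}

// Parse the sparse fields of a 512-byte S header: the pairs, then the
// "more is expected" flag, then the file's real (unpacked) size.
int parse_sparse_header(const char *hdr, struct extent *out, int *more,
                        unsigned long long *realsize)
{
  const char *flag = hdr + SPARSE_START + SPARSE_PAIRS*2*SPARSE_FIELD;

  assert(hdr && out && more && realsize);
  // Accept either a binary 1 or an ASCII '1' as "set" (an assumption).
  *more = *flag && *flag != '0';
  *realsize = octal_field(flag + 1, SPARSE_FIELD);
  return read_pairs(hdr + SPARSE_START, SPARSE_PAIRS, out);
}
```

A continuation block could presumably be handled with read_pairs(block, EXT_PAIRS, out) followed by a check of block[EXT_MORE], repeating until the flag is clear.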


