[Toybox] tar tests.

Rob Landley rob at landley.net
Thu Mar 21 11:12:40 PDT 2019


On 3/21/19 4:36 AM, scsijon wrote:
> Dumb Question, but,
> 
> And if the filename was a link?
> 
> Sorry Rob..., hopefully you're seeing where I'm coming from as I'm not sure I can
> explain it further without a lot of waffle.
> 
> regards
> scsijon

Not sure I am?

Symlinks can have any character except NUL in them, it's basically file contents
interpreted as a name (but the API to set the contents takes a NUL terminated
string). So theoretically unlimited length (the kernel guys have been trying to
clean the PATH_MAX stuff out of the system for a while now, and yes I have a
todo item to make sure rm handles that properly but it's another one of those
"test it in a VM because if it fails SO MUCH MESS" things...)

  X=a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a; while true; do mkdir -p $X; cd $X; done
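The "any character except NUL" claim about symlink contents can be checked with a small sketch. This is Python rather than anything toybox does, the link name is made up for the example, and it assumes a POSIX filesystem that permits symlinks:

```python
import os
import tempfile

def roundtrip(target):
    """Create a symlink whose contents are `target` and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "link")  # hypothetical name
        os.symlink(target, link)          # the target does not need to exist
        return os.readlink(link)

def rejects_nul():
    """The underlying API takes a NUL terminated string, so Python refuses
    a target with an embedded NUL before making the call."""
    try:
        roundtrip("a\0b")
    except ValueError:
        return True
    return False

odd = "spaces and\nnewlines *?[] \t\\"
print(repr(roundtrip(odd)))
print(rejects_nul())
```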

A tar file is a series of 512 byte header records, each of which has an octal
"size" field. When it's not zero the header is followed by size many bytes of
data, padded with NUL bytes up to a multiple of 512 before the next header. And then
you end with 2 records worth of NUL bytes... except the gnu/dammit one is ending
with _19_ records worth of NUL bytes? (When I touch a file and tar it by itself,
the result is 0x2800 bytes? Almost 10k of padding added? What, did it round it
up to a multiple of 20 records? If so, why?)
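The layout above can be walked with a short sketch. It uses Python's tarfile module as a stand-in archive writer (not GNU tar or toybox), and the helper names make_archive and walk are invented for the example. The field offsets used (name at 0, size at 124) are the standard header positions.

```python
import io
import tarfile

BLOCK = 512  # size of a header record, and the padding granularity

def make_archive(members):
    """Build an in-memory tar from (name, data) pairs using Python's tarfile."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def walk(archive):
    """Return ([(name, size), ...], number of trailing all-NUL records)."""
    entries, offset = [], 0
    while offset < len(archive):
        header = archive[offset:offset + BLOCK]
        if header == bytes(BLOCK):
            break  # first all-NUL record: start of the trailer
        name = header[0:100].split(b"\0", 1)[0].decode()
        size = int(header[124:136].strip(b" \0") or b"0", 8)  # octal text
        entries.append((name, size))
        # skip the header, then the data rounded up to a multiple of 512
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
    return entries, (len(archive) - offset) // BLOCK

empty = make_archive([("empty", b"")])
print(hex(len(empty)), walk(empty))
```

Python's writer also rounds the archive up to tarfile.RECORDSIZE (20 blocks of 512), so an archive holding one empty member comes out at 0x2800 bytes: one header followed by 19 NUL records.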

In tar, there's a "name" field and a "link" field, each 100 bytes, which can
hold strings up to 100 bytes (in which case no NUL terminator!) but then for 101 and longer
there are "extension headers" which are headers of type L and K, which means "my
file contents is the value to use for the next header". (SEE???!? COMBINING
CHARACTERS SHOULD BE PREFIXES NOT #*%(&#&% SUFFIXES ARE YOU LISTENING UNICODE
CLOWNS? Grumble grumble... Yeah still a watch.c todo item pending there.)

The L record does _not_ replace the "link" field (that would make sense), it
replaces the name. The K record provides a long link target value.
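A minimal reader sketch for the L and K records, again with Python's tarfile (in GNU format) as the writer. The function names and the test paths are made up, and this is not the toybox code, just one way to apply "my contents are the value for the next header":

```python
import io
import tarfile

BLOCK = 512

def field(raw):
    """Decode a fixed-width field that may lack a NUL terminator."""
    return raw.split(b"\0", 1)[0].decode()

def read_members(archive):
    """Return (typeflags seen, [(name, link target), ...]) with L/K applied."""
    seen, members, pending, offset = [], [], {}, 0
    while offset < len(archive):
        header = archive[offset:offset + BLOCK]
        if header == bytes(BLOCK):
            break
        size = int(header[124:136].strip(b" \0") or b"0", 8)
        typeflag = header[156:157].decode()
        data = archive[offset + BLOCK:offset + BLOCK + size]
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
        seen.append(typeflag)
        if typeflag in ("L", "K"):
            # the record's contents are the value to use for the next header
            pending[typeflag] = field(data)
            continue
        name = pending.pop("L", None) or field(header[0:100])
        link = pending.pop("K", None) or field(header[157:257])
        members.append((name, link))
    return seen, members

long_name = "d/" * 60 + "file"    # 124 bytes, too long for "name"
long_target = "t/" * 70 + "dest"  # 144 bytes, too long for "link"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
    info = tarfile.TarInfo(long_name)
    info.type = tarfile.SYMTYPE
    info.linkname = long_target
    tar.addfile(info)

seen, members = read_members(buf.getvalue())
print(seen)
print(members == [(long_name, long_target)])
```

The single symlink turns into three headers: one K, one L, and the real type 2 header whose 100 byte fields only hold truncated values.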

Posix also defines an arbitrarily extensible header type 'x' which nobody ever
uses for anything (I can't find an example to test against!) where the contents
is a bunch of newline separated keyword=value lines, and the only one the code I
inherited was parsing is "path=" which is an alternative to the L record above.
(Is there a link= or similar? Who knows? Todo item to try to look that up...)
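For lack of a tarball in the wild, Python's tarfile can produce an 'x' header to poke at. The sketch below parses its payload. Note that each line in the payload carries a decimal length prefix ("LEN keyword=value" plus newline, where LEN counts the whole line including itself). parse_pax and the test paths are names invented for the example.

```python
import io
import tarfile

def parse_pax(payload):
    """Decode records of the form b'LEN keyword=value\\n'."""
    fields, pos = {}, 0
    while pos < len(payload):
        space = payload.index(b" ", pos)
        length = int(payload[pos:space])  # decimal, counts the whole record
        record = payload[space + 1:pos + length - 1]  # drop the newline
        keyword, _, value = record.partition(b"=")
        fields[keyword.decode()] = value.decode()
        pos += length
    return fields

long_name = "d/" * 60 + "file"    # too long for the 100 byte name field
long_target = "t/" * 70 + "dest"  # too long for the 100 byte link field

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo(long_name)
    info.type = tarfile.SYMTYPE
    info.linkname = long_target
    tar.addfile(info)

archive = buf.getvalue()
typeflag = archive[156:157]                    # type of the first header
size = int(archive[124:136].strip(b" \0"), 8)  # length of its payload
fields = parse_pax(archive[512:512 + size])
print(typeflag, sorted(fields))
```

With this writer the long name lands under "path" and the long symlink target under "linkpath", so that is at least one keyword to test for on the link side.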

The file types that create stuff use the octal digits 0-7 as their file type.
Regular file is 0, 1 is hardlink, 2 is symlink, 3 chr, 4 blk, 5 dir, 6 fifo, and
7 is... also regular file for some reason? (Dig dig... Once upon a time it
requested "contiguous allocation", which is not currently a thing that I am
aware of, so yeah: regular file.)

In theory "hardlink" and "symlink" are the only users of the link field. In
practice given we're testing for "a regular file ending in / is actually a
directory"... I need to look at more old tarballs. :P
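The type digits and the trailing slash special case can be written down as a small lookup. This is only a sketch of the rules as described here. The kind strings and function names are invented, and real old tarballs may need more cases than this handles:

```python
KINDS = {
    "0": "file", "1": "hardlink", "2": "symlink", "3": "chr",
    "4": "blk", "5": "dir", "6": "fifo",
    "7": "file",  # once "contiguous allocation", treated as a regular file
}

def classify(typeflag, name):
    """Map a header's type character and name to the kind of thing to create."""
    kind = KINDS.get(typeflag)
    if kind == "file" and name.endswith("/"):
        return "dir"  # a regular file ending in / is actually a directory
    return kind  # None for anything outside 0-7 (L, K, x, ...)

def uses_link_field(typeflag):
    """In theory only hardlinks and symlinks read the link field."""
    return typeflag in ("1", "2")

print(classify("7", "data.bin"), classify("0", "old/dir/"), classify("L", "x"))
```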

Rob

P.S. Inclusion defaults to --anchored and exclusion defaults to --no-anchored
and the ones on the command line aren't patterns. Wheee! I held off on opening
this can of worms for a reason...

