[Toybox] tar tests.

scsijon scsijon at lamiaworks.com.au
Thu Mar 21 19:47:37 PDT 2019



On 22/03/19 05:12, Rob Landley wrote:
> On 3/21/19 4:36 AM, scsijon wrote:
>> Dumb Question, but,
>>
>> And if the filename was a link?
>>
>> Sorry Rob..., hopefully you're seeing where I'm coming from as I'm not sure I can
>> explain it further without a lot of waffle.
>>
>> regards
>> scsijon
> 
> Not sure I am?
> 
> Symlinks can have any character except NUL in them, it's basically file contents
> interpreted as a name (but the API to set the contents takes a NUL terminated
> string). So theoretically unlimited length (the kernel guys have been trying to
> clean the PATH_MAX stuff out of the system for a while now, and yes I have a
> todo item to make sure rm handles that properly but it's another one of those
> "test it in a VM because if it fails SO MUCH MESS" things...)
> 
>    X=a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a; while true; do mkdir -p $X; cd $X; done
> 
> Tar starts with 512 byte records each of which has an octal "size" field, when
> it's not zero each record is followed by size many bytes of data, and then
> padded with NUL bytes up to an even 512 length before the next header. And then
> you end with 2 records worth of NUL bytes... except the gnu/dammit one is ending
> with _19_ records worth of NUL bytes? (When I touch a file and tar it by itself,
> the result is 0x2800 bytes? Almost 10k of padding added? What, did it round it
> up to a multiple of 20 records? If so, why?)
> 
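For what it's worth, the numbers above do work out if the archive is rounded up to 20 blocks of 512 bytes, which as far as I know is GNU tar's default blocking factor. A minimal Python sketch of the arithmetic follows. The helper names padded and archive_size are made up for illustration and are nothing from toybox. It is cross-checked against Python's tarfile module, which seems to pad the same way:

```python
import io
import tarfile

BLOCK = 512  # size of a tar header block and of each data block


def padded(size):
    """Round a member's data size up to a whole number of 512-byte blocks."""
    return -(-size // BLOCK) * BLOCK


def archive_size(member_sizes, blocking_factor=20):
    """Expected archive length: one header plus padded data per member,
    two all-NUL end blocks, then rounding up to a full record."""
    raw = sum(BLOCK + padded(s) for s in member_sizes) + 2 * BLOCK
    record = BLOCK * blocking_factor  # assumption: 20 blocks per record
    return -(-raw // record) * record


# Cross-check: archive a single zero-length file with Python's tarfile.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    tar.addfile(tarfile.TarInfo("empty"))
actual = len(buf.getvalue())

print(hex(archive_size([0])), hex(actual))
```

One header plus the two end blocks is 3 blocks, rounding up to 20 gives 0x2800 bytes, and that leaves 19 blocks of NULs after the header.
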
> In tar, there's a "name" field and a "link" field, each 100 bytes, which can
> hold strings up to 100 (in which case no NUL terminator!) but then for 101 and longer
> there are "extension headers" which are headers of type L and K, which means "my
> file contents is the value to use for the next header". (SEE???!? COMBINING
> CHARACTERS SHOULD BE PREFIXES NOT #*%(&#&% SUFFIXES ARE YOU LISTENING UNICODE
> CLOWNS? Grumble grumble... Yeah still a watch.c todo item pending there.)
> 
> The L record does _not_ replace the "link" field (that would make sense), it
> replaces the name. The K record provides a long link target value.
> 
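One way to get a test archive with both extension headers is to let Python's tarfile write a GNU-format symlink whose name and target are both over 100 bytes, then walk the raw headers. This is only a sketch of what Python emits, not of what toybox or GNU tar do. header_types is a hypothetical helper, and the offsets used (size at 124, typeflag at 156) are the usual ustar header layout:

```python
import io
import tarfile


def header_types(data):
    """Return the typeflag character of each 512-byte header in raw tar bytes."""
    types, pos = [], 0
    while pos + 512 <= len(data):
        header = data[pos:pos + 512]
        if not header.strip(b"\0"):      # all-NUL block: end of archive
            break
        size = int(header[124:136].strip(b" \0") or b"0", 8)  # octal size field
        types.append(header[156:157].decode("ascii"))          # typeflag byte
        pos += 512 + -(-size // 512) * 512                     # skip padded data
    return types


# A symlink whose name and link target are both longer than 100 bytes.
info = tarfile.TarInfo("n" * 150)
info.type = tarfile.SYMTYPE
info.linkname = "t" * 150

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
    tar.addfile(info)
data = buf.getvalue()
types = header_types(data)

# Read it back to confirm the long values survive the round trip.
with tarfile.open(fileobj=io.BytesIO(data)) as tar:
    member = tar.getmembers()[0]

print(types, len(member.name), len(member.linkname))
```

The walk should show one K header, one L header, and then the real symlink header of type 2, with the long strings carried as the data of the K and L entries.
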
> Posix also defines an arbitrarily extensible header type 'x' which nobody ever
> uses for anything (I can't find an example to test against!) where the contents
> is a bunch of newline separated keyword=value lines, and the only one the code I
> inherited was parsing is "path=" which is an alternative to the L record above.
> (Is there a link= or similar? Who knows? Todo item to try to look that up...)
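
On the question of an example to test against: Python's tarfile can write pax archives, so it may serve as a stand-in generator. If I'm reading the pax spec right, the keyword for a long link target is "linkpath", and each record is prefixed with its own decimal length ("LEN keyword=value" plus a newline) rather than being a bare keyword=value line. A minimal sketch, where parse_pax is a hypothetical helper and the archive is whatever Python chooses to emit:

```python
import io
import tarfile


def parse_pax(payload):
    """Parse pax extended header records: 'LEN keyword=value' plus newline."""
    records, pos = {}, 0
    while pos < len(payload):
        space = payload.index(b" ", pos)
        length = int(payload[pos:space])            # decimal length of whole record
        body = payload[space + 1:pos + length - 1]  # drop the trailing newline
        keyword, _, value = body.partition(b"=")
        records[keyword.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records


# A symlink whose name and link target are both longer than 100 bytes.
info = tarfile.TarInfo("n" * 150)
info.type = tarfile.SYMTYPE
info.linkname = "t" * 150

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    tar.addfile(info)
data = buf.getvalue()

typeflag = data[156:157]                    # typeflag of the first header
size = int(data[124:136].strip(b" \0"), 8)  # octal size of its payload
records = parse_pax(data[512:512 + size])

print(typeflag, sorted(records))
```

The first header in that archive is of type 'x', and its payload carries both a path and a linkpath record.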

The only one I know of relates to URLs, and that's in the www3.org and
mozilla.org docs/RFCs. But even that's confusing: on www3.org it still
appears in the caching sections of a couple of RFCs, but later in the same
docs it says that extensible header type 'x' is not to be cached, and it
leaves it at that.

I don't suppose it relates to the MAC addresses we used to use on
networks, as they could be 'copy-directed' as 192-byte-long addresses to
two locations if the port was made 'open for monitoring' and then the
resultant output 'cleaned-up' with tar to the real MAC address. The
process doesn't work nowadays of course.

So I don't think that's it, unless it's one of those 'left-over' things.

> 
> The file types that create stuff use the octal digits 0-7 as their file type.
> Regular file is 0, 1 is hardlink, 2 is symlink, 3 chr, 4 blk, 5 dir, 6 fifo, and
> 7 is... also regular file for some reason? (Dig dig... Once upon a time it
> requested "contiguous allocation", which is not currently a thing that I am
> aware of, so yeah: regular file.)
> 
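The type list above maps onto a small lookup table. The sketch below writes it out in Python and compares it with the constants in Python's tarfile module, which also seems to treat type 7 as a regular file. TYPEFLAGS, USES_LINK_FIELD and describe are names invented for this illustration:

```python
import tarfile

# Typeflag characters for the entries that create something on disk.
TYPEFLAGS = {
    "0": "regular file",
    "1": "hardlink",
    "2": "symlink",
    "3": "character device",
    "4": "block device",
    "5": "directory",
    "6": "fifo",
    "7": "contiguous file (treated as regular)",
}

# In theory only these two types use the link field.
USES_LINK_FIELD = {"1", "2"}


def describe(flag):
    """Human-readable name for a typeflag, or a fallback for L, K, x and friends."""
    return TYPEFLAGS.get(flag, "extension or unknown type")


# The same eight flags as named by Python's tarfile module.
stdlib_flags = [tarfile.REGTYPE, tarfile.LNKTYPE, tarfile.SYMTYPE,
                tarfile.CHRTYPE, tarfile.BLKTYPE, tarfile.DIRTYPE,
                tarfile.FIFOTYPE, tarfile.CONTTYPE]

# A type 7 entry, to see how tarfile classifies it.
cont = tarfile.TarInfo("old-style")
cont.type = tarfile.CONTTYPE

print([describe(f.decode("ascii")) for f in stdlib_flags], cont.isreg())
```
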
> In theory "hardlink" and "symlink" are the only users of the link field. In
> practice given we're testing for "a regular file ending in / is actually a
> directory"... 

My brain says not always, something about I/O control maybe? as that
worked with tar. There is an old paper RFC on that somewhere in my
archive (paper) reference library room, as it can be used for something
else, something about 4 bits + 4 bits can appear as a / but isn't! I will
see if I can find it this weekend. No promises, but I will get back early
next week if I have found it.

I need to look at more old tarballs. :P
> 
> Rob
> 
> P.S. Inclusion defaults to --anchored and exclusion defaults to --no-anchored
> and the ones on the command line aren't patterns. Wheee! I held off on opening
> this can of worms for a reason...
> 

ok.
scsijon

