[Toybox] [PATCH] tar: Fix support for long symbolic links

Rob Landley rob at landley.net
Thu Mar 21 15:08:37 PDT 2019

On 3/20/19 10:08 PM, Daniel Mentz wrote:
> On Wed, Mar 20, 2019 at 7:24 PM Rob Landley <rob at landley.net> wrote:
>> On 3/19/19 6:38 PM, Daniel Mentz via Toybox wrote:
>>> -// TODO: does this need NUL terminator?
>>>      if (strlen(lnk) > sizeof(hdr.link))
>> The todo item you erased there was a note to self about checking what happens
>> when a value fits exactly in the field and thus there is no NUL terminator, and
>> whether the read side was handling that correctly. (Similarly, in theory a
> If I'm not mistaken, the code on the read side that handles the
> tar.link field is the following:
> if (!TT.hdr.link_target && *tar.link)
>   TT.hdr.link_target = xstrndup(tar.link, sizeof(tar.link));

That would be the read side I said I hadn't yet checked, yes. (And I think I
rewrote that code between when I wrote the TODO and when you just checked it?)

The note also tells me to add tests for the corner cases of the file sizes
(which I tested from the command line for "name" a couple commits back but
haven't done for "link" yet, and I need to add them to the tar.tests file which
I'm currently completely rewriting because it needs "done as root" vs "not done
as root" sections _and_ it needs at least some tests that work in the absence of
gzip on the test system; right now compression failure makes almost all tests
fail. I didn't write that test file and my complaints about it from last week
remain true.

>>>        write_longname(lnk, 'K'); //write longname LINK
>>> -// TODO: this will error_exit() if too long, not truncate.
>>> -    xstrncpy(hdr.link, lnk, sizeof(hdr.link));
>>> +    strncpy(hdr.link, lnk, sizeof(hdr.link));
>> That todo was because it should be strlcpy(), but unfortunately Ulrich Drepper
>> was an asshole and the committee that replaced him is very, very slow. As in
>> this was 2014: https://lwn.net/Articles/612244/
> I believe that strncpy is the correct function, since the string does
> not need to be NULL terminated if it's 100 bytes long (i.e.

And doesn't need to be memset with zeroes if it's 3 bytes long, since it was
already memset with zeroes above. It's not a _lot_ of unnecessary work but it's
annoying. That's why xstrncpy() exists, and there's only 3 instances of the
function outside of pending:

$ grep [^x]strncpy toys/*/*.c
toys/net/tunctl.c:  strncpy(ifr->ifr_name, *toys.optargs, sizeof(ifr->ifr_name));
toys/other/mkdosfs.c:  strncpy(fat+3, oem, 8); // OEM
toys/other/mkswap.c:  if (TT.L) strncpy(label, TT.L, 15);

> sizeof(hdr.link)). If you were to use strlcpy(), you would actually
> lose one useful byte i.e. you wouldn't be able to fit a string that is
> exactly 100 bytes long (you'd truncate it to 99 bytes and add a NULL
> byte that's not required).

I know, that's why I didn't. (That plus the glibc not having strlcpy thing.)

This is why the TODO was still there. I had not gone back and attempted to
resolve the issue yet. I'm unhappy with the gratuitous memset of the rest of the
field that strncpy does is the wrong behavior, but it's the function that's
currently available and it was blocking you, so I changed it for now. Doesn't
mean I don't want to take another look at it.

> I briefly looked at a tar file generated by GNU tar, and hdr.link
> wasn't NULL terminated there either (I had GNU tar store a symbolic
> link where the target was longer than 100 bytes).

As I did with the file side multiple days ago, yes. And fixed up the code to
handle it properly (The "Similar to commit 28711d30" you mentioned at the start
of yours). I had not confirmed that all the cases were handled (the "overflow
issue" I moved on to was files to big to fit in the size field, and thus the big
endian binary encoding with the high bit set), but I need to add test suite
entries for all of the corner cases I make the code handle and test from the
command line. That's why the TODO was still there.

The pending/ directory means it's code I haven't properly reviewed yet. Android
using stuff out of the pending directory is awkward. I'm trying to get it _out_
of the pending directory, but that tends to involve really big changes.

The issue I'm poking at _now_ is that testing tar against _itself_ proves very
little to me, and I can't depend on multiple implementations being available
(and if so, which is right?), so I'd like at least simple tarballs to be binary
identical to what ubuntu produces, and I started with "touch -t 198001010101
hello; tar c hello --owner root --group root | sha1sum" to get a known sha1sum,
but they're _not_ identical because gnu tar is adding WAAAAAAY too much NUL
padding to the end, for reasons I do not understand, which is making the sha1sum
not match even when I add the stupid space back after the checksum field.

So I ran it through "head -c $((3*512))", but I tried to understand why gnu was
doing that first. (Not a clue. Working my way through
https://www.gnu.org/software/tar/manual/html_node/Standard.html again in case I
missed something, but given it says that numeric fields are 0 padded N-1 length
with a NUL terminator and does NOT mention the weird thing chksum does means
this is not a very good document. I.E. adding the darn space back after the
checksum violates GNU's own page on what the tarball is supposed to look like.)

Anyway, back to the topic: this file is covered with TODO entries the same way
an active construction site would be covered with warning tape. I'm doing
https://landley.net/toybox/cleanup.html and have made a dozen commits to this
one file since March 3. It's lingered in pending for years partly because I've
been dreading finding out what Al Viro was talking about in
http://lkml.iu.edu/hypermail/linux/kernel/0112.2/1540.html and partly because
I'm never sure what counts as compatibility testing. (There were many historical
format variants which I think no longer matter? For once the 7 year rule
(http://landley.net/toybox/faq.html#support_horizon) may not be long _enough_
here, but it's what I've got.)

In that context, deleting 2 of 19 TODO notes during this process may seem
helpful, but isn't. Thanks for the bug report, I'll try not to break you again
in what I commit, but the next thing I need to do is rewrite the tar tests so
they actually test the sort of things you're hitting, so I _can_ reasonably
regression test before checking in.


P.S. Plenty of todos that aren't in the file, those were just notes on lines of
code as I was going past. For example, I'm adding an xabspath() constraint where
each name it's about to create has to fit under the top level directory it's
exracting under, that's why I'm storing the abspath of cwd in main. This
_doesn't_ apply to symlinks but _does_ apply to hardlinks. And then there's the
whole sparse file can of worms, and xattr/selinux support.

I guess the lesson here is it was a mistake to make any of my TODO entries
visible to other people. There was zero interest in
http://lists.landley.net/pipermail/toybox-landley.net/2019-March/010268.html and
making it public results in people trying to delete entries out of it and then
when I object that I'm not done with them yet I get explained at, so I should
just just remove them all in the next checkin, keep them in a file here like I
usually do, and not bother anybody else with them.

P.P.S. This sort of trivia is why I have a blog. I should really catch it up to

More information about the Toybox mailing list