[Toybox] [PATCH] Fix 'cp -R' handling of symlinks

Rob Landley rob at landley.net
Sat Jan 10 12:01:58 PST 2015


Sorry, can't reply to Stephen's message because gmail went "the
user-not-subscribed notice had the same message ID as the actual
approved message, obviously I can't deliver two different messages with
the same message ID, let me delete all but the first with no appeal".

Same reason email threading is completely useless for messages I'm cc'd
on: half of 'em wind up in my inbox and half of them wind up in the
message folder. Depends whether the one with or without the list-id:
header got deleted by gmail this time around, which is a race condition...

(I miss the web archive...)

However, I _can_ cut and paste your reply at the end to the front. :)

> On Thu, Jan 8, 2015 at 7:03 AM, Stephen Mcgruer <smcgruer at google.com>
wrote:
>> Currently 'cp -R' incorrectly handles symlinks in the source
directory, as
>> it reuses the same global buffer (toybuf) for successive pairs of
>> 'readlinkat', 'symlinkat'.

toybuf is sizeof(PATH_MAX). I'm aware PATH_MAX went away but for
symlinks I was just going to wait for somebody to complain about it
truncating one. :)

>> The issue is that readlinkat does not append a
>> null byte to the buffer,

Oh joy. I'd missed that. That's... STUPID.

>> so longer symlink paths 'stick around' in toybuf
>> and corrupt later, shorter paths.

Every readlink call in the tree needs checking, thanks for the heads up.

>> To fix this, I just inserted a null-byte using the return value of
>> readlinkat (which returns the number of bytes written). Note that this is
>> safe because the existing code first checks that sizeof(toybuf) > i.
It's a
>> little messy, but I was trying to avoid unrolling the ternary into
multiple
>> layers of conditionals. Perhaps someone on the list can think of a better
>> solution!

I've got an xreadlink() but that does a malloc. (And it does a darn
iterative search for the size; it would be a binary search but the
common case is short links and redoing the allocation to make it
_shorter_ is one of those "doing this is unnecessary, returning the
original would work! But it wastes memory! But it doesn't waste much
memory" things that gives me a headache because I'm not comfortable with
either stopping point....

So I just split the difference and made it grow in 64 byte increments.
Inefficient if you have a megabyte long symlink, but it'll work...

The reason I don't always use it is I don't necessarily want to do a
malloc all the time...

>> Thanks,
>> Stephen
>>
>> Patch:
>>
>> diff --git a/toys/posix/cp.c b/toys/posix/cp.c
>> --- a/toys/posix/cp.c
>> +++ b/toys/posix/cp.c
>> @@ -231,7 +231,8 @@
>>          // make symlink, or make block/char/fifo/socket
>>          if (S_ISLNK(try->st.st_mode)
>>              ? (0 < (i = readlinkat(tfd, try->name, toybuf,
sizeof(toybuf)))
>> &&
>> -               sizeof(toybuf) > i && !symlinkat(toybuf, cfd, catch))
>> +               sizeof(toybuf) > i && ((toybuf[i] = 0) == 0) &&

Is there a chance toybuf[i] = 0 will _not_ be == 0?

>> +               !symlinkat(toybuf, cfd, catch))
>>              : !mknodat(cfd, catch, try->st.st_mode, try->st.st_rdev))
>>          {
>>            err = 0;

On 01/09/2015 12:47 AM, enh wrote:
> it's not just cp(1). there are several other incorrect calls to
> readlink/readlinkat.

Indeed:

$ grep -l '[^x]readlink' lib/*.c toys/*/*.c
lib/dirtree.c
lib/xwrap.c
toys/other/lspci.c
toys/other/pwdx.c
toys/other/readlink.c
toys/other/stat.c
toys/pending/tar.c
toys/posix/cp.c
toys/posix/cpio.c

I'm still working through the strtol bugs...

(I really miss a mailman style web archive that lets me see all the
messages for a week or a month in unified threaded and flat views. All
the other archives take the twitter approach of "oh, you only care about
the last 7 minutes, everything before that's a pain to get to...")

> stat(1), for example:
> 
> # stat / /system/bin/stat
>   File: `/'
>   Size: 0 Blocks: 0 IO Blocks: 4096 directory
> Device: 1h Inode: 1 Links: 16
> Access: (755/drwxr-xr-x) Uid: (0/    root) Gid: (0/    root)
> Access: 1970-01-01 00:00:00.010000000
> Modify: 2015-01-08 18:55:35.739999999
> Change: 2015-01-08 18:55:35.739999999
>   File: `/system/bin/stat' -> `toybox1-08 18:55:35'
>   Size: 6 Blocks: 0 IO Blocks: 4096 symbolic link
> Device: 1030dh Inode: 341 Links: 1
> Access: (755/lrwxr-xr-x) Uid: (0/    root) Gid: (2000/    root)
> Access: 2015-01-08 18:44:26.000000000
> Modify: 2015-01-08 18:44:26.000000000
> Change: 2015-01-08 18:44:26.000000000
> 
> lspci(1) looks wrong by inspection, though i don't have an example failure.
> 
> readlink(1) and tar(1) are fine because they use xreadlink, and
> cpio(1) only writes the number of bytes returned by readlink(2). pwdx
> manually terminates the string.
> 
> the dirtree and xabspath helpers both manually terminate, though they
> seem to have a 4KiB fixed-sized buffer, unlike xreadlink which doesn't
> care how large your link target is.

Again, PATH_MAX above. If somebody has a use case with a symlink longer
than that, I can switch?

> we should probably make more use of xreadlink.

I didn't do that because then it's a separate allocation requiring a
separate free(). I can use it, it's just messy.

I'm off to a corner to try to finish printf.c and sed.c, then I think
the next pop off the stack is... the strtol() stuff? Or did I not finish
the xstrncat() switchover? (Those notes are on the netbook...)

Thanks for the heads up,

Rob

 1420920118.0


More information about the Toybox mailing list