[Toybox] realpath/tar status

enh enh at google.com
Tue Feb 7 08:42:40 PST 2023


On Mon, Feb 6, 2023 at 7:35 PM Rob Landley <rob at landley.net> wrote:
>
>
>
> On 2/6/23 12:07, enh wrote:
> >> > -*-
> >> >
> >> > as for the new tar, i updated the prebuilts yesterday and we've seen
> >> > enough OOM kills since that i've had to revert it.
> >>
> >> Grrr. I've got a couple of suspects for what I screwed up, but... Lemme see if I
> >> can work the ASAN leak detector into my workflow at all here. (There's a lot of
> >> commands that intentionally leak resources on the way out because the OS will
> >> free them, but there's a category of commands that should NOT do that because
> >> they process theoretically unbounded input and cannot be allowed to leak-per.)
>
> I haven't wired up the ASAN leak detector yet.

i don't think there's a leak.

> >> > every case i saw
> >> > was from https://cs.android.com/android/platform/superproject/+/master:system/update_engine/Android.bp;l=947?q=ue_unittest_disk_imgs
>
> I wget that, and it was html.
>
> >> > so basically:
> >> > ```
> >> > tar -jxf sample_images/sample_images.tar.bz2 -C gen disk_ext2_1k.img
> >> > disk_ext2_4k.img disk_ext2_4k_empty.img disk_ext2_unittest.img
> >> > ```
> >> >
> >> > (direct link to the .tar.bz2:
> >> > https://android.googlesource.com/platform/system/update_engine/+/refs/heads/master/sample_images/)
>
> I wget _that_ and it was html.
>
> >> Wait, OOM on _extract_? Did I change the extract codepath? (That's sounding more
> >> like I damaged dirtree in the non-breadth case...)
> >
> > yeah, that didn't make much sense to me either, and i should have
> > explicitly said "i'm seeing tar killed by SIGKILL and _assume_ that's
> > the oom killer". but -- now i've had time to repro -- it looks like
> > suicide instead:
>
> I committed a fix in the --sort path that in theory removed a memory leak, and
> removed an "always add null terminator to empty directories thus dereferencing
> null" thinko that was disguised by the memory leak.
>
> But when I stress tested "tar c /home | tar t" (over 1.5 terabytes of files
> including an AOSP checkout) I don't like the memory behavior, it was up to 256
> megs used when I killed it. (Possibly just glibc's horrible heap, but I wanted
> to poke at it more to be sure.)
>
> I still haven't fiddled with the extract path, that I know...
>
> > ~/aosp-master-with-phones$ ./out/host/linux-x86/bin/toybox tar -jxf
> > system/update_engine/sample_images/sample_images.tar.bz2 -C
> > out/soong/.temp/sbox/c744f840647dbf476d005753d9a7965d5af2771b/out/gen
> > disk_ext2_1k.img disk_ext2_4k.img disk_ext2_4k_empty.img
> > disk_ext2_unittest.img
> > tar: had errors
>
> I didn't manage to grab your test image on friday because the THIRD link I tried
> https://android.googlesource.com/platform/system/update_engine/+/refs/heads/master/sample_images/sample_images.tar.bz2
> is still an html file with no obvious "download the darn file" link. There are
> "txt" and "json" links at the bottom, but the txt was not a bzip2 file.
>
> I just went and looked at the code instead, on the theory this worked before so
> must have been part of the recent changes...
>
> > read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> > 512) = 512
> > kill(3992265, SIGKILL)                  = 0
> > wait4(3992265, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3992265
> > exit_group(0)                           = ?
> > +++ exited with 0 +++
>
> ... OOM killer maybe?

no, sorry --- this was the point i was trying to make. _you're_
calling kill(). (if it was the OOM killer, you'd just see the signal
appear out of nowhere.)

i think this is probably the earlier race workaround coming back to
bite us? https://github.com/landley/toybox/blob/master/toys/posix/tar.c#L1120

> > (yes, that pid is legit. now you see why i needed the column size fix
> > for ps/top :-) )
> >
> > (note that the command line there should give you convenient repro
> > steps --- you don't need a whole AOSP tree, just the .tar.bz2 i linked
> > to in the first post.)
>
> Ok, I backed up to "update engine" link which told me the git clone invocation
> for this repo, and the resulting 485 files totaling 33 megabytes did include the
> relevant file, although at 6k you probably could have just attached it.
>
> Let's see...
>
> $ mkdir sub && cd sub
> $ ~/toybox/toybox/tar xvf ../sample_images.tar.bz2
> disk_ext2_1k.img
> disk_ext2_4k.img
> disk_ext2_4k_empty.img
> disk_ext2_unittest.img
> disk_sqfs_empty.img
> disk_sqfs_default.img
> disk_sqfs_unittest.img
> $ echo $?
> 0
>
> Seems to have worked with the test binary I made before that last checkin?
> Unless the -C or explicit -j were doing something weird, or I need more to the
> reproduction sequence. (Built with NDK or musl...?)

it's definitely flaky for me, but i was able to reproduce most of the
time. given my suspicion that it's related to your earlier attempt to
deal with a race condition, the active ingredient between my 8/10
failure rate and your inability to repro might just be "i have a
really fast machine"?

> Rob


More information about the Toybox mailing list