[Toybox] one last find thing...

enh enh at google.com
Thu Jun 20 16:33:16 PDT 2019


(looks like it breaks find too --- with O_PATH find stops reporting
anything but the directory it's given.)

On Thu, Jun 20, 2019 at 4:27 PM enh <enh at google.com> wrote:
>
> On Tue, Jun 18, 2019 at 12:24 AM Rob Landley <rob at landley.net> wrote:
> >
> > On 6/17/19 5:12 PM, enh wrote:
> > >> Sigh. It's a pity I can't see what the actual failure is. (When you say timezone
> > >> testing, do you mean toybox's date.test...?)
> > >
> > > no, it's "x86 CtsHostTzDataTests". afaict that's hundreds of lines of
> > > Java to do what could be done in a couple of lines of shell, if only
> > > we supported the latter.
> >
> > *blink* *blink* How is a change in toybox's shared plumbing making java fail...?
> > (Is java shelling back out to the command line?)
>
> yes, exactly. that's why i'm grumbling about "we should have just
> supported shell scripts". rewriting shell scripts as repeated calls to
> adb in Java isn't helping anyone. (but it's the hammer they had,
> so...)
>
> > I've debugged my way into the kernel and back out to track down weirdness, but I
> > haven't got a reproduction environment for this. (Or nearly enough familiarity
> > with the layers of java, selinux, and adb that seem to be involved in
> > reproducing this). Is the vm _required_ for this to reproduce? What the...
> >
> > If you run the failing test twice (not rebuild, but redo the adb transaction)
> > does it fail again? If so, does swapping in the previous toybox binary make it
> > succeed?
> >
> > If that's the case we don't need the _build_, we just need the VM and the test
> > harness...
> >
> > >>>>> increasingly convinced that the DIRTREE_STATELESS patch does break
> > >>>>> something, and it's not just an infrastructure issue... i wouldn't
> > >>>>> normally send such a useless bug report, but i've failed to get to
> > >>>>> this in 3 days, and i'm not likely to for at least 3 more at this
> > >>>>> point, so i thought i'd at least mention it...)
> > >>>>
> > >>>> This isn't going to break anything, is it?
> > >>>>
> > >>>> -      openat(dirtree_parentfd(new), new->name, O_CLOEXEC), flags);
> > >>>> +      openat(dirtree_parentfd(new), new->name, O_PATH|O_CLOEXEC), flags);
> > >>>
> > >>> (one thing that occurred to me over the weekend is that it anywhere we
> > >>> use O_PATH might break macOS, since there is no O_PATH there.
> > >>
> > >> #ifdef __APPLE__
> > >> #define O_PATH 0
> > >> #endif
> > >
> > > i actually meant behavioral breaks. specifically xabspath breaks the
> > > usual "you have to pick one of O_RDONLY/O_RDWR/O_WRONLY" rule.
> >
> > O_RDONLY is zero on Linux? We can | in O_RDONLY if we need that for mac to work,
> > I just didn't include a NOP. (Same with the above code snippet.)
> >
> > > i'm not worried about this right now, but it's something to think
> > > about if we're going to seriously support macOS.
> >
> > I did Python for years, this is part of the reason I want a test suite covering
> > everything. There's no substitute for thorough runtime testing. (Of course in C
> > there's no substitute for thorough code review in _addition_ to thorough runtime
> > testing.)
> >
> > That said, the interaction with another OS should boil down to A) supporting
> > another libc (finding NEW ways to break sprintf), B) what system calls it makes,
> > and the arguments thereto (which includes different structure layouts). That's
> > assuming same toolchain and a processor type we've supported before. :)
> >
> > >>> but the
> > >>> failures in question are on Android. [the builds in question don't
> > >>> contain a new host prebuilt.])
> > >>
> > >> So the host prebuilt is the same, you rebuild toybox from source, and then a
> > >> test it runs afterwards fails?
> > >
> > > yes, this is only rebuilding the device toybox. and then the
> > > should-be-unrelated test fails.
> >
> > Ok, sounds like you're already avoiding a full rebuild and doctoring the image
> > to pass or not pass?
> >
> > > next time i'm actually at a real
> > > computer i'll try to find out exactly what commands this test is
> > > running on the device.
> >
> > I'm sorry this is being weird. I need to reproduce your test environment here so
> > I can avoid this sort of thing in future (or at least be some help when it happens).
> >
> > Would telling AOSP to build a Pixel 3xl image help in any way? (I didn't spot an
> > adb call into a VM in that last I checked, but how would I know?)
> >
> > >> Is the test the only thing that fails? (Or does the build stop there?)
> > >
> > > afaict it's only this. (the tests all run after the build is done.)
> >
> > So you have a build image, and a command line that calls adb to do Things to it.
> > Can the test run more than once, or does it modify the image to prevent itself
> > from running again? (If you don't need to rebuild between each run, why does it
> > take so long? The test is part of a large test suite and that's what hasn't been
> > isolated yet?0
> >
> > How big is the image, and what emulator do you need to run it?
> >
> > Also, how would I spray down an strace binary with enough selinux permissions
> > that it could actually run on android? (Presumably there's some sort of setprop
> > idkfa "^^vv<><>baSTART" call buried in the AOSP build scripts?)
> >
> > >> Can this sideload test be extracted from the larger build? Maybe I can get an
> > >> image I can replace toybox in and try the test again to see what's going on?
> > >
> > > afaict it's only affecting x86? which is presumably another clue. but
> > > i'm not certain that's true yet because i haven't had time to confirm
> > > that everything still works on arm/arm64.
> >
> > You have an environment in which it reproduces, but it could be _anything_. For
> > all we know it could be toybox passing a critical size which causes an extra
> > block to be allocated in the filesystem which moves the xattr into a
> > pathological alignment where we trigger an selinux bug. (I've hit, and debugged,
> > crazier. That's why when I get a bug I FREEZE and don't NOBODY MOVE and we
> > SNAPSHOT and pull out the lapidoptry dissection equipment before the needle we
> > stepped on gets mixed back into the haystack.)
> >
> > Yeah, makes me a kinda slow developer at times.
> >
> > (I'm still thinking the O_PATH thing is the most likely externally visible
> > change. That's something you could spot in strace.)
>
> still haven't had time to actually reproduce this properly, but
> abusing the build servers in the background while i did other stuff
> did let me confirm that it's the O_PATH. removing that fixes things
> again.
>
> one symptom i noticed by accident when running toybox with the O_PATH
> change locally: ps doesn't show any processes! oh, and it's
> reproduceable on my desktop too. so i don't know that that's _the_
> issue, but that's certainly _an_ issue, and a convincing reason to
> revert at least the O_PATH hunk.
>
> patch attached.
>
> > Rob



More information about the Toybox mailing list