[Toybox] one last find thing...

Thu Jun 20 16:34:36 PDT 2019

((as is very visible from running the find tests. i _really_ need to
start automating some of these tests... could have saved a week that
way :-( ))

On Thu, Jun 20, 2019 at 4:33 PM enh <enh at google.com> wrote:
>
> (looks like it breaks find too --- with O_PATH find stops reporting
> anything but the directory it's given.)
>
> On Thu, Jun 20, 2019 at 4:27 PM enh <enh at google.com> wrote:
> >
> > On Tue, Jun 18, 2019 at 12:24 AM Rob Landley <rob at landley.net> wrote:
> > >
> > > On 6/17/19 5:12 PM, enh wrote:
> > > >> Sigh. It's a pity I can't see what the actual failure is. (When you say timezone
> > > >> testing, do you mean toybox's date.test...?)
> > > >
> > > > no, it's "x86 CtsHostTzDataTests". afaict that's hundreds of lines of
> > > > Java to do what could be done in a couple of lines of shell, if only
> > > > we supported the latter.
> > >
> > > *blink* *blink* How is a change in toybox's shared plumbing making java fail...?
> > > (Is java shelling back out to the command line?)
> >
> > yes, exactly. that's why i'm grumbling about "we should have just
> > supported shell scripts". rewriting shell scripts as repeated calls to
> > adb in Java isn't helping anyone. (but it's the hammer they had,
> > so...)
> >
> > > I've debugged my way into the kernel and back out to track down weirdness, but I
> > > haven't got a reproduction environment for this. (Or nearly enough familiarity
> > > with the layers of java, selinux, and adb that seem to be involved in
> > > reproducing this). Is the vm _required_ for this to reproduce? What the...
> > >
> > > If you run the failing test twice (not rebuild, but redo the adb transaction)
> > > does it fail again? If so, does swapping in the previous toybox binary make it
> > > succeed?
> > >
> > > If that's the case we don't need the _build_, we just need the VM and the test
> > > harness...
> > >
> > > >>>>> increasingly convinced that the DIRTREE_STATELESS patch does break
> > > >>>>> something, and it's not just an infrastructure issue... i wouldn't
> > > >>>>> normally send such a useless bug report, but i've failed to get to
> > > >>>>> this in 3 days, and i'm not likely to for at least 3 more at this
> > > >>>>> point, so i thought i'd at least mention it...)
> > > >>>>
> > > >>>> This isn't going to break anything, is it?
> > > >>>>
> > > >>>> -      openat(dirtree_parentfd(new), new->name, O_CLOEXEC), flags);
> > > >>>> +      openat(dirtree_parentfd(new), new->name, O_PATH|O_CLOEXEC), flags);
> > > >>>
> > > >>> (one thing that occurred to me over the weekend is that it anywhere we
> > > >>> use O_PATH might break macOS, since there is no O_PATH there.
> > > >>
> > > >> #ifdef __APPLE__
> > > >> #define O_PATH 0
> > > >> #endif
> > > >
> > > > i actually meant behavioral breaks. specifically xabspath breaks the
> > > > usual "you have to pick one of O_RDONLY/O_RDWR/O_WRONLY" rule.
> > >
> > > O_RDONLY is zero on Linux? We can | in O_RDONLY if we need that for mac to work,
> > > I just didn't include a NOP. (Same with the above code snippet.)
> > >
> > > > i'm not worried about this right now, but it's something to think
> > > > about if we're going to seriously support macOS.
> > >
> > > I did Python for years, this is part of the reason I want a test suite covering
> > > everything. There's no substitute for thorough runtime testing. (Of course in C
> > > there's no substitute for thorough code review in _addition_ to thorough runtime
> > > testing.)
> > >
> > > That said, the interaction with another OS should boil down to A) supporting
> > > another libc (finding NEW ways to break sprintf), B) what system calls it makes,
> > > and the arguments thereto (which includes different structure layouts). That's
> > > assuming same toolchain and a processor type we've supported before. :)
> > >
> > > >>> but the
> > > >>> failures in question are on Android. [the builds in question don't
> > > >>> contain a new host prebuilt.])
> > > >>
> > > >> So the host prebuilt is the same, you rebuild toybox from source, and then a
> > > >> test it runs afterwards fails?
> > > >
> > > > yes, this is only rebuilding the device toybox. and then the
> > > > should-be-unrelated test fails.
> > >
> > > Ok, sounds like you're already avoiding a full rebuild and doctoring the image
> > > to pass or not pass?
> > >
> > > > next time i'm actually at a real
> > > > computer i'll try to find out exactly what commands this test is
> > > > running on the device.
> > >
> > > I'm sorry this is being weird. I need to reproduce your test environment here so
> > > I can avoid this sort of thing in future (or at least be some help when it happens).
> > >
> > > Would telling AOSP to build a Pixel 3xl image help in any way? (I didn't spot an
> > > adb call into a VM in that last I checked, but how would I know?)
> > >
> > > >> Is the test the only thing that fails? (Or does the build stop there?)
> > > >
> > > > afaict it's only this. (the tests all run after the build is done.)
> > >
> > > So you have a build image, and a command line that calls adb to do Things to it.
> > > Can the test run more than once, or does it modify the image to prevent itself
> > > from running again? (If you don't need to rebuild between each run, why does it
> > > take so long? The test is part of a large test suite and that's what hasn't been
> > > isolated yet?0
> > >
> > > How big is the image, and what emulator do you need to run it?
> > >
> > > Also, how would I spray down an strace binary with enough selinux permissions
> > > that it could actually run on android? (Presumably there's some sort of setprop
> > > idkfa "^^vv<><>baSTART" call buried in the AOSP build scripts?)
> > >
> > > >> Can this sideload test be extracted from the larger build? Maybe I can get an
> > > >> image I can replace toybox in and try the test again to see what's going on?
> > > >
> > > > afaict it's only affecting x86? which is presumably another clue. but
> > > > i'm not certain that's true yet because i haven't had time to confirm
> > > > that everything still works on arm/arm64.
> > >
> > > You have an environment in which it reproduces, but it could be _anything_. For
> > > all we know it could be toybox passing a critical size which causes an extra
> > > block to be allocated in the filesystem which moves the xattr into a
> > > pathological alignment where we trigger an selinux bug. (I've hit, and debugged,
> > > crazier. That's why when I get a bug I FREEZE and don't NOBODY MOVE and we
> > > SNAPSHOT and pull out the lapidoptry dissection equipment before the needle we
> > > stepped on gets mixed back into the haystack.)
> > >
> > > Yeah, makes me a kinda slow developer at times.
> > >
> > > (I'm still thinking the O_PATH thing is the most likely externally visible
> > > change. That's something you could spot in strace.)
> >
> > still haven't had time to actually reproduce this properly, but
> > abusing the build servers in the background while i did other stuff
> > did let me confirm that it's the O_PATH. removing that fixes things
> > again.
> >
> > one symptom i noticed by accident when running toybox with the O_PATH
> > change locally: ps doesn't show any processes! oh, and it's
> > reproduceable on my desktop too. so i don't know that that's _the_
> > issue, but that's certainly _an_ issue, and a convincing reason to
> > revert at least the O_PATH hunk.
> >
> > patch attached.
> >
> > > Rob