[Toybox] one last find thing...

enh enh at google.com
Thu Jun 20 16:27:27 PDT 2019


On Tue, Jun 18, 2019 at 12:24 AM Rob Landley <rob at landley.net> wrote:
>
> On 6/17/19 5:12 PM, enh wrote:
> >> Sigh. It's a pity I can't see what the actual failure is. (When you say timezone
> >> testing, do you mean toybox's date.test...?)
> >
> > no, it's "x86 CtsHostTzDataTests". afaict that's hundreds of lines of
> > Java to do what could be done in a couple of lines of shell, if only
> > we supported the latter.
>
> *blink* *blink* How is a change in toybox's shared plumbing making java fail...?
> (Is java shelling back out to the command line?)

yes, exactly. that's why i'm grumbling about "we should have just
supported shell scripts". rewriting shell scripts as repeated calls to
adb in Java isn't helping anyone. (but it's the hammer they had,
so...)

> I've debugged my way into the kernel and back out to track down weirdness, but I
> haven't got a reproduction environment for this. (Or nearly enough familiarity
> with the layers of java, selinux, and adb that seem to be involved in
> reproducing this). Is the vm _required_ for this to reproduce? What the...
>
> If you run the failing test twice (not rebuild, but redo the adb transaction)
> does it fail again? If so, does swapping in the previous toybox binary make it
> succeed?
>
> If that's the case we don't need the _build_, we just need the VM and the test
> harness...
>
> >>>>> increasingly convinced that the DIRTREE_STATELESS patch does break
> >>>>> something, and it's not just an infrastructure issue... i wouldn't
> >>>>> normally send such a useless bug report, but i've failed to get to
> >>>>> this in 3 days, and i'm not likely to for at least 3 more at this
> >>>>> point, so i thought i'd at least mention it...)
> >>>>
> >>>> This isn't going to break anything, is it?
> >>>>
> >>>> -      openat(dirtree_parentfd(new), new->name, O_CLOEXEC), flags);
> >>>> +      openat(dirtree_parentfd(new), new->name, O_PATH|O_CLOEXEC), flags);
> >>>
> >>> (one thing that occurred to me over the weekend is that it anywhere we
> >>> use O_PATH might break macOS, since there is no O_PATH there.
> >>
> >> #ifdef __APPLE__
> >> #define O_PATH 0
> >> #endif
> >
> > i actually meant behavioral breaks. specifically xabspath breaks the
> > usual "you have to pick one of O_RDONLY/O_RDWR/O_WRONLY" rule.
>
> O_RDONLY is zero on Linux? We can | in O_RDONLY if we need that for mac to work,
> I just didn't include a NOP. (Same with the above code snippet.)
>
> > i'm not worried about this right now, but it's something to think
> > about if we're going to seriously support macOS.
>
> I did Python for years, this is part of the reason I want a test suite covering
> everything. There's no substitute for thorough runtime testing. (Of course in C
> there's no substitute for thorough code review in _addition_ to thorough runtime
> testing.)
>
> That said, the interaction with another OS should boil down to A) supporting
> another libc (finding NEW ways to break sprintf), B) what system calls it makes,
> and the arguments thereto (which includes different structure layouts). That's
> assuming same toolchain and a processor type we've supported before. :)
>
> >>> but the
> >>> failures in question are on Android. [the builds in question don't
> >>> contain a new host prebuilt.])
> >>
> >> So the host prebuilt is the same, you rebuild toybox from source, and then a
> >> test it runs afterwards fails?
> >
> > yes, this is only rebuilding the device toybox. and then the
> > should-be-unrelated test fails.
>
> Ok, sounds like you're already avoiding a full rebuild and doctoring the image
> to pass or not pass?
>
> > next time i'm actually at a real
> > computer i'll try to find out exactly what commands this test is
> > running on the device.
>
> I'm sorry this is being weird. I need to reproduce your test environment here so
> I can avoid this sort of thing in future (or at least be some help when it happens).
>
> Would telling AOSP to build a Pixel 3xl image help in any way? (I didn't spot an
> adb call into a VM in that last I checked, but how would I know?)
>
> >> Is the test the only thing that fails? (Or does the build stop there?)
> >
> > afaict it's only this. (the tests all run after the build is done.)
>
> So you have a build image, and a command line that calls adb to do Things to it.
> Can the test run more than once, or does it modify the image to prevent itself
> from running again? (If you don't need to rebuild between each run, why does it
> take so long? The test is part of a large test suite and that's what hasn't been
> isolated yet?0
>
> How big is the image, and what emulator do you need to run it?
>
> Also, how would I spray down an strace binary with enough selinux permissions
> that it could actually run on android? (Presumably there's some sort of setprop
> idkfa "^^vv<><>baSTART" call buried in the AOSP build scripts?)
>
> >> Can this sideload test be extracted from the larger build? Maybe I can get an
> >> image I can replace toybox in and try the test again to see what's going on?
> >
> > afaict it's only affecting x86? which is presumably another clue. but
> > i'm not certain that's true yet because i haven't had time to confirm
> > that everything still works on arm/arm64.
>
> You have an environment in which it reproduces, but it could be _anything_. For
> all we know it could be toybox passing a critical size which causes an extra
> block to be allocated in the filesystem which moves the xattr into a
> pathological alignment where we trigger an selinux bug. (I've hit, and debugged,
> crazier. That's why when I get a bug I FREEZE and don't NOBODY MOVE and we
> SNAPSHOT and pull out the lapidoptry dissection equipment before the needle we
> stepped on gets mixed back into the haystack.)
>
> Yeah, makes me a kinda slow developer at times.
>
> (I'm still thinking the O_PATH thing is the most likely externally visible
> change. That's something you could spot in strace.)

still haven't had time to actually reproduce this properly, but
abusing the build servers in the background while i did other stuff
did let me confirm that it's the O_PATH. removing that fixes things
again.

one symptom i noticed by accident when running toybox with the O_PATH
change locally: ps doesn't show any processes! oh, and it's
reproduceable on my desktop too. so i don't know that that's _the_
issue, but that's certainly _an_ issue, and a convincing reason to
revert at least the O_PATH hunk.

patch attached.

> Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Revert-O_PATH-use-in-dirtree.patch
Type: text/x-patch
Size: 1022 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20190620/5148217f/attachment-0003.bin>


More information about the Toybox mailing list