[Toybox] one last find thing...

Rob Landley rob at landley.net
Tue Jun 18 00:25:53 PDT 2019


On 6/17/19 5:12 PM, enh wrote:
>> Sigh. It's a pity I can't see what the actual failure is. (When you say timezone
>> testing, do you mean toybox's date.test...?)
> 
> no, it's "x86 CtsHostTzDataTests". afaict that's hundreds of lines of
> Java to do what could be done in a couple of lines of shell, if only
> we supported the latter.

*blink* *blink* How is a change in toybox's shared plumbing making java fail...?
(Is java shelling back out to the command line?)

I've debugged my way into the kernel and back out to track down weirdness, but I
haven't got a reproduction environment for this. (Or nearly enough familiarity
with the layers of java, selinux, and adb that seem to be involved in
reproducing this). Is the vm _required_ for this to reproduce? What the...

If you run the failing test twice (not rebuild, but redo the adb transaction)
does it fail again? If so, does swapping in the previous toybox binary make it
succeed?

If that's the case we don't need the _build_, we just need the VM and the test
harness...

>>>>> increasingly convinced that the DIRTREE_STATELESS patch does break
>>>>> something, and it's not just an infrastructure issue... i wouldn't
>>>>> normally send such a useless bug report, but i've failed to get to
>>>>> this in 3 days, and i'm not likely to for at least 3 more at this
>>>>> point, so i thought i'd at least mention it...)
>>>>
>>>> This isn't going to break anything, is it?
>>>>
>>>> -      openat(dirtree_parentfd(new), new->name, O_CLOEXEC), flags);
>>>> +      openat(dirtree_parentfd(new), new->name, O_PATH|O_CLOEXEC), flags);
>>>
>>> (one thing that occurred to me over the weekend is that anywhere we
>>> use O_PATH might break macOS, since there is no O_PATH there.
>>
>> #ifdef __APPLE__
>> #define O_PATH 0
>> #endif
> 
> i actually meant behavioral breaks. specifically xabspath breaks the
> usual "you have to pick one of O_RDONLY/O_RDWR/O_WRONLY" rule.

O_RDONLY is zero on Linux? We can | in O_RDONLY if we need that for mac to work,
I just didn't include a NOP. (Same with the above code snippet.)
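
Something along these lines, as an untested sketch (the open_path() wrapper
name is just for illustration, and it assumes falling back to a real
read-only open on the mac is acceptable for callers that only wanted the
path):

  #define _GNU_SOURCE  // O_PATH is Linux-specific and glibc hides it otherwise
  #include <fcntl.h>

  // macOS has no O_PATH at all: collapse it to 0 there so this still compiles.
  #ifndef O_PATH
  #define O_PATH 0
  #endif

  // Grab a directory entry by name relative to an already-open parent fd.
  // O_RDONLY is 0 on Linux so |ing it in is a NOP there, but it keeps the
  // "pick one of O_RDONLY/O_RDWR/O_WRONLY" rule satisfied once O_PATH
  // collapses to 0 on the mac.
  static int open_path(int parentfd, const char *name)
  {
    return openat(parentfd, name, O_PATH|O_RDONLY|O_CLOEXEC);
  }

On Linux that compiles down to exactly the flags we pass now (O_RDONLY is 0),
so the only behavioral difference would be on the mac.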

> i'm not worried about this right now, but it's something to think
> about if we're going to seriously support macOS.

I did Python for years; this is part of the reason I want a test suite covering
everything. There's no substitute for thorough runtime testing. (Of course in C
there's no substitute for thorough code review in _addition_ to thorough runtime
testing.)

That said, the interaction with another OS should boil down to A) supporting
another libc (finding NEW ways to break sprintf), B) what system calls it makes,
and the arguments thereto (which includes different structure layouts). That's
assuming the same toolchain and a processor type we've supported before. :)

>>> but the
>>> failures in question are on Android. [the builds in question don't
>>> contain a new host prebuilt.])
>>
>> So the host prebuilt is the same, you rebuild toybox from source, and then a
>> test it runs afterwards fails?
> 
> yes, this is only rebuilding the device toybox. and then the
> should-be-unrelated test fails.

Ok, sounds like you're already avoiding a full rebuild and just doctoring the
image to make the test pass or fail?

> next time i'm actually at a real
> computer i'll try to find out exactly what commands this test is
> running on the device.

I'm sorry this is being weird. I need to reproduce your test environment here so
I can avoid this sort of thing in future (or at least be some help when it happens).

Would telling AOSP to build a Pixel 3xl image help in any way? (I didn't spot an
adb call into a VM in there last I checked, but how would I know?)

>> Is the test the only thing that fails? (Or does the build stop there?)
> 
> afaict it's only this. (the tests all run after the build is done.)

So you have a build image, and a command line that calls adb to do Things to it.
Can the test run more than once, or does it modify the image to prevent itself
from running again? (If you don't need to rebuild between each run, why does it
take so long? The test is part of a large test suite and that's what hasn't been
isolated yet?)

How big is the image, and what emulator do you need to run it?

Also, how would I spray down an strace binary with enough selinux permissions
that it could actually run on android? (Presumably there's some sort of setprop
idkfa "^^vv<><>baSTART" call buried in the AOSP build scripts?)

>> Can this sideload test be extracted from the larger build? Maybe I can get an
>> image I can replace toybox in and try the test again to see what's going on?
> 
> afaict it's only affecting x86? which is presumably another clue. but
> i'm not certain that's true yet because i haven't had time to confirm
> that everything still works on arm/arm64.

You have an environment in which it reproduces, but it could be _anything_. For
all we know it could be toybox passing a critical size which causes an extra
block to be allocated in the filesystem which moves the xattr into a
pathological alignment where we trigger an selinux bug. (I've hit, and debugged,
crazier. That's why when I get a bug I FREEZE and don't NOBODY MOVE and we
SNAPSHOT and pull out the lepidoptery dissection equipment before the needle we
stepped on gets mixed back into the haystack.)

Yeah, makes me a kinda slow developer at times.

(I'm still thinking the O_PATH thing is the most likely externally visible
change. That's something you could spot in strace.)

Rob
