[Toybox] one last find thing...

Mon Jun 17 15:01:24 PDT 2019

On 6/17/19 2:49 PM, enh wrote:
> On Sat, Jun 15, 2019 at 4:30 PM Rob Landley <rob at landley.net> wrote:
>>
>> On 6/14/19 3:57 PM, enh wrote:
>>> (i haven't had time to investigate, and i don't have any useful test
>>> case other than "some timezone testing fails to run on emulators in
>>> the cloud, in a way that gives me no useful failure", but i'm getting
>>
>> Does it _reliably_ fail to run?
> 
> seems like it. i've kicked off another build "just in case" every day,
> but it looks like the same failure (modulo the fact that i don't have
> any real detail), and _other_ changes are going through fine.

If it fails reliably, we can start test-reverting bits of it. I'd start with the
O_PATH (to eliminate it, if nothing else).

Also, did you try a build with the commit before this one just to confirm this
is what did it?

Sigh. It's a pity I can't see what the actual failure is. (When you say timezone
testing, do you mean toybox's date.test...?)

>>> increasingly convinced that the DIRTREE_STATELESS patch does break
>>> something, and it's not just an infrastructure issue... i wouldn't
>>> normally send such a useless bug report, but i've failed to get to
>>> this in 3 days, and i'm not likely to for at least 3 more at this
>>> point, so i thought i'd at least mention it...)
>>
>> This isn't going to break anything, is it?
>>
>> -      openat(dirtree_parentfd(new), new->name, O_CLOEXEC), flags);
>> +      openat(dirtree_parentfd(new), new->name, O_PATH|O_CLOEXEC), flags);
> 
> (one thing that occurred to me over the weekend is that anywhere we
> use O_PATH might break macOS, since there is no O_PATH there.

#ifdef __APPLE__
#define O_PATH 0  /* no O_PATH on macOS: O_PATH|O_CLOEXEC becomes a read-only open */
#endif

> but the
> failures in question are on Android. [the builds in question don't
> contain a new host prebuilt.])

So the host prebuilt is the same, you rebuild toybox from source, and then a
test it runs afterwards fails?

Is the test the only thing that fails? (Or does the build stop there?)

>> Moving struct st earlier within struct dirtree could reveal an existing bug, but
>> the bug itself would be elsewhere.
>>
>> If strcpy(s, "") with only a single byte allocated to s[] wrote past the end of
>> it, we'd have bigger problems...
>>
>> I'm not spotting what else could be the culprit? (And with a _timezone_
>> test...?)
> 
> i don't think that's relevant. it's just a test that (afaict) runs on
> the host and calls commands on the device via adb. (don't ask... i
> can't defend the "host-side tests" stuff because aiui it's
> indefensible.)

I try never to criticize the user workload. It has seniority, I CANNOT break it.
(Unless they're really the only user and I ask nicely.)

I added a workaround to toybox sed for an outright _bug_ in the perl package
build (whoever wrote the regex didn't understand how ranges work, so it was
effectively a NOP, but I was erroring out on the invalid construct).

Can this sideload test be extracted from the larger build? Maybe I can get an
image I can replace toybox in and try the test again to see what's going on?

>> My approach would be to revert bits of it (go back to the xzalloc()
>> etc, which is really an attempt to speed up top with less memory churn although
>> I should break down and bench what that's spending its time on...)
>>
>> But if I can't reproduce the failure, I can't bisect it. Hmmm.
> 
> yeah, when i get time i'll try the bisection. (unfortunately it's a
> multi-hour thing for me. but at least that's better than nothing.)

Obviously I can just revert the patch, but that doesn't explain what's happening.

>> Rob

Rob