[Toybox] ps crashes

enh enh at google.com
Wed Mar 22 11:18:25 PDT 2017


(For those following along at home, a fix was merged yesterday.)

On Mar 12, 2017 01:04, "Rob Landley" <rob at landley.net> wrote:

> On 03/11/2017 01:22 PM, enh wrote:
> > from the line number addr2line gave above, it's the line marked with !
> > below. i've included the block between too, since that modifies
> > new->child.
> >
> >   // Recurse down into tasks, retaining thread groups.
> >   // Disable show_process at least until we can calculate tcount
> >   kcount = TT.kcount;
> >   sprintf(toybuf, "/proc/%u/task", pid);
> >   new->child = dirtree_flagread(toybuf, DIRTREE_SHUTUP|DIRTREE_PROC,
> get_ps);
> >   TT.threadparent = 0;
> >   kcount = TT.kcount-kcount+1;
> >   tb = (void *)new->extra;
> >   tb->slot[SLOT_tcount] = kcount;
> >
> >   // Fill out tid and thread count for each entry in group
> > !  if (new->child) for (dt = new->child->child; dt; dt = dt->next) {
> >
> > so i assume it's that dirtree_flagread that's failing, returning
> > DIRTREE_ABORTVAL, and since the code that follows just assumes it's
> > either valid or null, we die on the line marked ! (as claimed by the
> > stack trace).
> >
> > i don't know is why why dirtree_flagread is failing, though, and can't
> > reproduce it.
>
> Oh that's easy to see from context: /proc/%u exited before /proc/$u/task
> could be opened, so dirtree_flagread() went "nope" and we didn't check
> for it.
>
> Asynchronous process exit while we're reading its data is something I
> tried to be robust against in the rest of ps, usually by effectively
> moving the process exit a milisecond or so earlier. If we couldn't read
> the data because process go bye-bye how does that differ from process go
> bye-bye an instant before we would have noticed it?
>
> Looks like I messed up here going back and adding thread support because
> the dirtree_flagread() semantics are non-obvious. (The
> dirtree_flagread() reopen in thread scanning is a rough edge in the
> design. I'm splicing together discontiguous trees manually and the
> result doesn't have the checks and review the existing dirtree.c stuff
> does.)
>
> And that's my real concern here: the dirtree_flagread() semantics
> _shouldn't_ be non-obvious. Should I change dirtree_flagread() to always
> return NULL or valid? Why did I have it NOT do that in the first place
> (to the point I commented it not doing that). I vaguely recall there's
> some error condition that the caller can only distinguish if it gets the
> extra error return to distinguish the case, but I don't remember _what_
> the condition is. (Exists but permissions wrong to actually read its
> contents, maybe?)
>
> Alas my toybox work directory's in a bit of a shambles at the moment
> because I've got half-finished make.sh changes for getconf.c and
> half-finished help text parsing changes, and between the two of 'em it
> doesn't _build_ at the moment. (My two weeks in tokyo and the week in
> portland before that I was too busy with other stuff to close any tabs.
> Happy to be home, but torn between resting and shoveling out...)
>
> I can do a quick fix here (just add a test for abortval on that line)
> but I'd rather change flagread not to return abortval, which means
> understanding what I lose by doing that and being comfortable with it.
> (And why the _other_ calls to flagread aren't dying more regularly with
> abortval returns? How is this NOT screwing up ps with no /proc ?)
>
> But once again, I'm only getting to this at 4am. (Ok, mostly that's
> jetlag royally horking my schedule.) Might not get to this tomorrow if
> getconf build infrastructure continues to be stroppy, but it's #3 on my
> toybox todo list right now.
>
> Thanks,
>
> Rob
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20170322/dcbc03d0/attachment-0001.htm>


More information about the Toybox mailing list