[Toybox] ps down, top to go

Fri May 20 14:57:18 PDT 2016

On Fri, May 20, 2016 at 2:30 PM, Rob Landley <rob at landley.net> wrote:
> Now that I've dealt with the "ps -A not working after building top" bug,
> being sick for a week due to travel, and catching up on the $DAYJOB
> backlog thereof...
>
> On 05/09/2016 07:46 PM, enh wrote:
>>>>> my real problem is that i don't currently have a field that gives me
>>>>> the process name in -T/-H mode.
>>>>
>>>> Define "process name"?
>>>>
>>>> There are 6 right now: args, cmd, cmdline, comm, command, and name.
>>>>
>>>> COMM is stat[2], NAME is argv[0] minus the path, COMMAND is argv[0] with
>>>> the path.
>>>>
>>>> Those are the three variants of "process name", the rest show command
>>>> line arguments too: CMDLINE is the full unmodified command line. ARGS is
>>>> the full command line using NAME for argv[0] (I.E. minus the path to the
>>>> binary you're running, if any). And then CMD is this crazy posix thing
>>>> that's one of the others depending on your command line options.
>>>
>>> compare
>>>
>>> ./toybox ps -A -T -o pid,tid,comm,command
>>>
>>> with
>>>
>>> ps -A -T -o pid,tid,comm,command
>>
>> (since it's taken this long and you still don't see what i'm saying, i
>> guess i shouldn't assume you're seeing what i'm seeing...
>>
>> here's what i see for some random chrome processes with ps:
>>
>>  86993  86993 chrome          /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  86997 Chrome_ChildIOT /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  86999 Compositor      /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  87000 CompositorTileW /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  87001 CompositorTileW /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  87002 CompositorTileW /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  87003 CompositorTileW /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  87004 handle-watcher- /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  87005 HTMLParserThrea /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993  87008 ScriptStreamerT /opt/google/chrome/chrome --type=renderer --lang=e
>>  86993 128020 WorkerPool/5655 /opt/google/chrome/chrome --type=renderer --lang=e
>>
>> and then toybox (ignoring that toybox mangled the large tid for the
>> last thread):
>>
>> 86993 86993 chrome          /opt/google/chrome/chrome (deleted)
>> 86993 86997 Chrome_ChildIOT [Chrome_ChildIOT]
>> 86993 86999 Compositor      [Compositor]
>> 86993 87000 CompositorTileW [CompositorTileW]
>> 86993 87001 CompositorTileW [CompositorTileW]
>> 86993 87002 CompositorTileW [CompositorTileW]
>> 86993 87003 CompositorTileW [CompositorTileW]
>> 86993 87004 handle-watcher- [handle-watcher-]
>> 86993 87005 HTMLParserThrea [HTMLParserThrea]
>> 86993 87008 ScriptStreamerT [ScriptStreamerT]
>> 86993 12802 WorkerPool/5655 [WorkerPool/5655]
>>
>> )
>
> Threads have nothing in /proc/$$/cmdline, so command and friends have
> nothing to show and fall back to showing kernel thread, yes.
>
> It sounds like the behavior you _want_ is for one of them to show the
> $PID command line for this $TID? I.E. show some OTHER process's command
> line because threads have magic relationships that ps needs to learn
> about. Most likely this should be CMDLINE doing it.

yes.

for us, where everything's threaded, it's getpid() == gettid() that's
the weird special case.
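
(a minimal sketch of that test in C, for illustration only. the helper name is made up, and it assumes Linux, where a thread whose kernel thread id equals the process id is the thread group leader.)

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 when the calling thread is the thread
 * group leader (its kernel thread id equals the process id), else 0. */
static int is_group_leader(void)
{
  /* Older C libraries have no gettid() wrapper, so use the raw syscall. */
  pid_t tid = (pid_t)syscall(SYS_gettid);

  /* gettid cannot fail, so a non-positive value means something is off. */
  assert(tid > 0);

  return tid == getpid();
}
```

called from main() before any threads are started this returns 1. called from any other thread of the same process it returns 0.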

> Which is a _bit_ of a problem because the display code only has access
> to a single process, it can't reach out and grab another process. While
> I can stick a pointer in a slot[], the way the top and iotop logic
> shuffle stacks of processes together could screw up the lifetime rules
> there and traverse a stale pointer if I did that. (Can of worms, dowanna
> go there.)
>
> Hmmm. It's a layering violation: code that looks at an array of
> processes calls code that looks at single processes, and the code that
> looks at single processes hasn't got any way to get back to that array.
>
> In fact it's worse than that, get_ps() populates toybuf and then _if_
> we're doing fancy sorting things will memcpy() the data out into a
> malloc. But if we're not, it just displays the toybuf data and frees it.
> So the parent node data no longer _exists_ by the time we're displaying
> the threads.
>
> So, what we gotta do is snapshot the data into toybuf. I can add another
> entry to the fetch[] array at the start of get_ps() and have that be a
> zero length string for non-threads but a copy of the parent process's
> command line for threads, and then have CMDLINE print that if it's
> non-null, otherwise fall back to previous behavior. Actually I can be
> slimy about initializing struct carveup offset[6] so it only points to
> the new entry if there is one, and points to

assuming you were going to finish that sentence more reasonably than
"yo momma", yes, that sounds like what toolbox did.
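
(a rough sketch of the fallback order being discussed, not toybox or toolbox code. the function names and arguments are hypothetical. it assumes the command line data is a buffer of NUL-separated arguments, the way /proc/PID/cmdline stores it, and that a copy of the parent's buffer was taken earlier.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of a NUL-separated argument buffer without trailing NUL bytes. */
static size_t trimmed_len(const char *s, size_t len)
{
  while (len && !s[len-1]) len--;

  return len;
}

/* Hypothetical helper. own/ownlen is the thread's own command line data,
 * parent/parentlen is a snapshot of the owning process's data (length 0
 * if there is none), and comm is the short thread name. Writes a
 * printable, NUL-terminated string into out. */
static void show_cmdline(const char *own, size_t ownlen,
                         const char *parent, size_t parentlen,
                         const char *comm, char *out, size_t outlen)
{
  const char *src = own;
  size_t i, n, len = trimmed_len(own, ownlen);

  assert(out && outlen);

  /* Prefer the thread's own data, then the parent snapshot. */
  if (!len) {
    src = parent;
    len = trimmed_len(parent, parentlen);
  }

  /* Nothing to show anywhere: use the bracketed [comm] form. */
  if (!len) {
    snprintf(out, outlen, "[%s]", comm);
    return;
  }

  /* Copy what fits, turning argument separators into spaces. */
  n = len < outlen-1 ? len : outlen-1;
  for (i = 0; i < n; i++) out[i] = src[i] ? src[i] : ' ';
  out[n] = 0;
}
```

with an empty own buffer and a parent snapshot the result is the parent's command line joined with spaces. with neither, it is the [comm] form that ps prints for threads today.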

> The question is, which -o fields should show slot -7 and which should
> show slot -1? I.E. when do I do the current [thread] behavior, and when
> do I lie and show the parent's command line instead?

i think that your existing heuristics are fine for ps. i think it's
only top (where you lose the relationship between pid and tids) where
it matters. toolbox top wasn't trying to be consistent with ps, and
neither supported all the traditional CMD/COMM/CMDLINE/ARGS variants,
so effectively just went with a new field that's explicitly the
process name. so if you don't mind adding yet another field name, i
think that's the simplest option (and simplest to explain in the
--help output too).

the primary use case is that in a bug report we include a top, and the
last two columns will be the thread name and process name. i guess the
interesting question is "what should interactive top -H show?". looks
like desktop top uses thread name, so if you change nothing, you'll be
consistent with existing expectations :-)

> (I could add a ps
> --lie-about-threads option pretty easily that just replaces it for
> everybody, ala set slot -1 to the cached parent data. I just dunno what
> the correct behavior should be here. I got threads out of my system back
> under OS/2, I really haven't dealt with them much in a posix-ish
> context. Mostly because pthreads were abominable and inexplicably tied
> event semaphores to variable tests for no obvious reason. "Go wake up
> this thread" is a _primitive_, darn it...)
>
>>>>>> (Did you know "top -O" in ubuntu lists all the available field names? I
>>>>>
>>>>> i only found that out while experimenting recently. i'd assumed it
>>>>> worked like ps' much more useful -O.
>
> I implemented ps style -O for top a few days ago. If there isn't a spec,
> then "what ps does" is as valid as anything...
>
> This one swaps out PR,NI,VIRT,RES,SHR,S for what you supply in -O.
>
>>>> The problem is ps's default output has buckets of free space and top's
>>>> doesn't, so if -O inserts fields it pushes stuff off the right edge
>>>> pretty quickly.
>>>
>>> (remember i only care about this for batch mode, for inclusion in a
>>> bug report. so columns are basically unlimited. i think it's
>>> reasonable to argue that using top -O and expecting to fit in 80
>>> columns is clearly unreasonable, and you should use -o to choose for
>>> yourself how to divide up the space.)
>
> No longer a problem either way. :)
>
> (Oh, the other thing the new top -O does is move the default sort to the
> first -O field, instead of the CPU field. Because that's what seemed
> useful.)
>
>>>>> even though you hate them, this is one of the nice things about long
>>>>> options. they're easier to remember, and no one cares that you've
>>>>> already taken --list-fields because they're not likely to want
>>>>> --list-fields to mean anything else.
>>>>
>>>> I just like there to be a short option corresponding to each long option.
>>>
>>> many long options just aren't worth a short option.
>
> If they aren't worth a short option, are they worth _having_?

i don't use a toilet plunger often, but when i do i'm usually pretty
glad it was there :-)

>>> there are 26*2
>>> available short options, and everyone's better off if they're at least
>>> somewhat mnemonic but the most important thing is that you can type
>>> them quickly because you use them all the time.
>
> Short options also group in a way long options don't.
>
> I dunno about mnemonics, but the "lord of the rings" option to ls:
>
>   ls -lotr
>
> Is pretty nifty for finding the most recently modified file(s) in a
> directory.
>
>>> whereas for
>>> rarely-used long options it's better if you don't waste a precious
>>> short option,
>
> Agreed.
>
>>> and you're more likely to remember the descriptive
>>> option. (long options work really well for --something/--no-something
>>> pairs too.)
>
> "The hardest part of design is keeping features out." - Don Norman.
>
> I lean towards "if it's not worth a short option, why is it worth _doing_?"
>
> There are exceptions for things like ls --color which only ever get used
> via "alias" set in the shell profile, but they _are_ exceptions.
>
>>>>>> (And I gotta finish ioctl...)
>>>>>
>>>>> that seems too broken for me to believe anyone's actually been using
>>>>> it. but then one might equally well say that about the kernel's ioctl
>>>>> interface and it's sadly not dead yet.
>>>>
>>>> I did half a replacement once and I should finish it. Alas, there's a
>>>> dozen things I could say that about and the past few days I've been
>>>> wrestling with j-core repository conversion.
>>>>
>>>> (And if there's going to be a "sysctl" command, there might as well be
>>>> an ioctl command...)
>>>>
>>>> Sigh, I had this message half-finished for a few days and looking back
>>>> I'm going "Oh right, I forgot I was in the middle of that" about 3
>>>> different things. I suspect I should pull up the mailing list threads
>>>> for the month to re-read on the plane...
>
> And if I hadn't flown United, that might have been a useful thing to do.
>
> (They're cheap for a reason.)
>
> Rob

-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.