[Toybox] top -H segfault

Rob Landley rob at landley.net
Fri May 27 12:56:38 PDT 2016



On 05/26/2016 05:20 PM, enh wrote:
> On Thu, May 26, 2016 at 1:03 PM, Rob Landley <rob at landley.net> wrote:
>> On 05/25/2016 04:35 PM, enh wrote:
>>> On Wed, May 25, 2016 at 1:59 PM, enh <enh at google.com> wrote:
>>>> On Tue, May 24, 2016 at 2:23 PM, Rob Landley <rob at landley.net> wrote:
>>>>> On 05/23/2016 07:59 PM, enh wrote:
>>>>>> just running home, so no time to debug further, but in case you
>>>>>> haven't seen this yet... plain "top -H" is segfaulting for me before
>>>>>> displaying anything.
>>>>>
>>>>> Hmmm, no I hadn't seen that. But I do see that ./toybox top -H isn't
>>>>> displaying anything when I do that...
>>>>>
>>>>> Ah. The active node logic changed and I missed updating a spot. (I
>>>>> tested it, but nothing I tested used "collate". Oops. I need to work out
>>>>> how to add this sort of stuff to the test suite. Probably have to use
>>>>> archival snapshots of "ps" contents to get consistent reproducible test
>>>>> output...)
>>>>>
>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>                                                     0x00000000004219c1
>>>>>> in top_common (filter=filter at entry=0x41f738 <merge_deltas>) at
>>>>>> toys/posix/ps.c:1349
>>>>>> 1349      if (old.count && (!new.count || *otb->slot < *ntb->slot)) {
>>>>>
>>>>> Null dereference on *otb and *ntb because now you traverse the tree in
>>>>> order and keep each node that has a non-null ->extra field.
>>>>>
>>>>> Try now?
>>>>
>>>> (sorry, was too busy to get back to this until now...)
>>>>
>>>> yes, works for me now, and i probably wouldn't have found that myself
>>>> in a reasonable amount of time anyway!
>>>>
>>>> "toybox top -b -n 1 -o
>>>> pid,tid,user,pr,ni,%cpu,s,vsz,rss,pcy,command,name -s 6" now looks
>>>> close enough to what dumpstate wants for bug reports that i can give
>>>> it a go...
>>>
>>> actually, no, -o COMMAND isn't giving me what i thought. (previously
>>> i'd only tested on the desktop.) on Android everything shows up as
>>> /system/bin/app_process64 --- it's the first word of
>>> readlink('/proc/pid/exe'). (when the zygote forks it sets argv[0] to
>>> the package name of the app it's about to run, so you can see things
>>> like "com.android.inputmethod.latin" here.)

Is _THAT_ why ubuntu 14.04's chromium process filenames are screwed up?
Because the filename of zygote-after-fork is:

  /usr/lib/chromium-browser/chro

Truncated at that point.

Using the new logic, with "ps -AOtid,tname" I get:

  PID TTY          TIME   TID TNAME           CMD
...
30878 ?        00:49:12 30878                 chromium-browse
30878 ?        00:00:30 30879 chro            [Chrome_ChildIOT]
30878 ?        00:00:30 30881 chro            [Compositor]
30878 ?        00:00:24 30882 chro            [CompositorTileW]
30878 ?        00:00:00 30883 chro            [handle-watcher-]
30878 ?        00:00:02 30884 chro            [HTMLParserThrea]
30878 ?        00:00:09 30885 chro            [ScriptStreamerT]
30878 ?        00:00:01  5875 chro            [Media]

That TNAME was trimmed down from the above longer path. This sounds like
the group name substitution logic you're talking about misbehaving.

Oh well, I'm sure both remaining users of Ubuntu as a desktop system
(now that Cannonical helped Windows add a Linux system call translation
layer so people cross compiling to Android and such from there can do so
natively from windows) can put up with it until we get Android on the
workstation...

Oh THIS is fun:

16990 ?        15:37:50 16990                 chromium-browse
16990 ?        00:12:03 17003  --enable-pinch [sandbox_ipc_thr]
16990 ?        00:00:00 17012  --enable-pinch [chromium-browse]
16990 ?        00:00:01 17013  --enable-pinch [NetworkChangeNo]
16990 ?        00:00:00 17014  --enable-pinch [D-Bus thread]
16990 ?        00:00:00 17015  --enable-pinch [CrShutdownDetec]
16990 ?        00:00:00 17016  --enable-pinch [inotify_reader]
16990 ?        00:06:40 17020  --enable-pinch [BrowserBlocking]
16990 ?        00:00:00 17021  --enable-pinch [dconf worker]
16990 ?        00:00:04 17022  --enable-pinch [gdbus]
16990 ?        00:00:16 17023  --enable-pinch [Chrome_DBThread]
16990 ?        00:00:10 17024  --enable-pinch [Chrome_FileThre]
16990 ?        00:00:22 17025  --enable-pinch [Chrome_FileUser]
16990 ?        00:00:00 17026  --enable-pinch [Chrome_ProcessL]
16990 ?        00:10:35 17027  --enable-pinch [Chrome_CacheThr]
16990 ?        02:39:06 17028  --enable-pinch [Chrome_IOThread]
16990 ?        00:00:39 17029  --enable-pinch [IndexedDB]
16990 ?        00:29:08 17030  --enable-pinch [CompositorTileW]
16990 ?        00:00:10 17031  --enable-pinch [AudioThread]
16990 ?        00:00:00 17032  --enable-pinch [threaded-ml]
16990 ?        00:00:00 17033  --enable-pinch [BrowserWatchdog]
16990 ?        00:06:25 17042  --enable-pinch [BrowserBlocking]

That's trimmed down from "chromium-browser --enable-pinch" and yes that
is a space in argv[0]. Chromium's logic here on Ubuntu 14.04 is just NUTS.

24723 ?        01:20:44 24723 shot-passed-by- chromium-browse
24723 ?        00:10:38 24727 shot-passed-by- [chromium-browse]
24723 ?        00:00:09 24728 shot-passed-by- [Watchdog]
24723 ?        00:09:55 24729 shot-passed-by- [Chrome_ChildIOT]
24723 ?        00:00:00 24730 shot-passed-by- [handle-watcher-]

Not even asking...

>> Slight awkwardness: TNAME (for "thread name") fits in the help text
>> column but TCOMMAND doesn't.
>>
>> Right now NAME is argv[0] with leading path stripped and COMMAND is
>> argv[0] verbatim, so TNAME would logically be the thread parent's
>> argv[0] with path stripped.
>>
>> Would stripping path (if any) be a bad thing here? (I don't have to, I'm
>> just trying to figure out which behavior is less surprising...)
> 
> on Android there would never be a path anyway; it'll be a
> fully-qualified Java package name. com.facebook.katana or whatever.
> (but if you're asking what the old top did, it didn't strip paths. i'm
> looking at everyone's favorite, /system/bin/mediaserver, right now :-)
> )

A thread won't (unless there's a / in a package name, which should never
happen), but non-threaded commands might, and could easily overflow the
display field before prividing useful info. So I'll stick with the /path
trimming I guess.

Sorry for the delay, I found another display bug: the recent fiddling
with the width logic to add overflow capability meant that when the last
field was trimmed to screen width size it was always right justified.
The "trim to width" logic didn't propogate the sign of the padding
value, negative means left justified. Fixed now.

Also, the ARGS space trimming logic was stopping at the first space
(paths can have spaces in them) which is the whole reason I saved
argv0len in a slot before replacing low ascii stuff (including NUL
bytes) with spaces, and I went back through all the rationale of that
confirming I still need it. Reading /proc/$$/exe means I don't need it
to reliably display just the command name, but I _do_ need it for
reliably displaying all of argv0 (which TNAME does) and for stripping
the path from argv0 (which ARGS does).

(Yes, along the way I've tested that you can have a command name with
')' in it and the stat[2] parsing won't get confused, that you can have
spaces and newlines in command names and the display won't get
confused... Really want to get this into the regression testing but as
previously mentioned: these commands measure brownian motion and it's
easy to get reliably human readable output but really hard to get
reproducible machine readable output on different machines. Upgrade the
kernel version in your VM and it's butterfly effects all over the place...)

Rob


More information about the Toybox mailing list