[Toybox] top -H segfault

enh enh at google.com
Sat Jun 4 09:08:15 PDT 2016


On Fri, May 27, 2016 at 12:56 PM, Rob Landley <rob at landley.net> wrote:
>
>
> On 05/26/2016 05:20 PM, enh wrote:
>> On Thu, May 26, 2016 at 1:03 PM, Rob Landley <rob at landley.net> wrote:
>>> On 05/25/2016 04:35 PM, enh wrote:
>>>> On Wed, May 25, 2016 at 1:59 PM, enh <enh at google.com> wrote:
>>>>> On Tue, May 24, 2016 at 2:23 PM, Rob Landley <rob at landley.net> wrote:
>>>>>> On 05/23/2016 07:59 PM, enh wrote:
>>>>>>> just running home, so no time to debug further, but in case you
>>>>>>> haven't seen this yet... plain "top -H" is segfaulting for me before
>>>>>>> displaying anything.
>>>>>>
>>>>>> Hmmm, no I hadn't seen that. But I do see that ./toybox top -H isn't
>>>>>> displaying anything when I do that...
>>>>>>
>>>>>> Ah. The active node logic changed and I missed updating a spot. (I
>>>>>> tested it, but nothing I tested used "collate". Oops. I need to work out
>>>>>> how to add this sort of stuff to the test suite. Probably have to use
>>>>>> archival snapshots of "ps" contents to get consistent reproducible test
>>>>>> output...)
>>>>>>
>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>                                                     0x00000000004219c1
>>>>>>> in top_common (filter=filter at entry=0x41f738 <merge_deltas>) at
>>>>>>> toys/posix/ps.c:1349
>>>>>>> 1349      if (old.count && (!new.count || *otb->slot < *ntb->slot)) {
>>>>>>
>>>>>> Null dereference on *otb and *ntb because now you traverse the tree in
>>>>>> order and keep each node that has a non-null ->extra field.
>>>>>>
>>>>>> Try now?
>>>>>
>>>>> (sorry, was too busy to get back to this until now...)
>>>>>
>>>>> yes, works for me now, and i probably wouldn't have found that myself
>>>>> in a reasonable amount of time anyway!
>>>>>
>>>>> "toybox top -b -n 1 -o
>>>>> pid,tid,user,pr,ni,%cpu,s,vsz,rss,pcy,command,name -s 6" now looks
>>>>> close enough to what dumpstate wants for bug reports that i can give
>>>>> it a go...
>>>>
>>>> actually, no, -o COMMAND isn't giving me what i thought. (previously
>>>> i'd only tested on the desktop.) on Android everything shows up as
>>>> /system/bin/app_process64 --- it's the first word of
>>>> readlink('/proc/pid/exe'). (when the zygote forks it sets argv[0] to
>>>> the package name of the app it's about to run, so you can see things
>>>> like "com.android.inputmethod.latin" here.)
>
> Is _THAT_ why ubuntu 14.04's chromium process filenames are screwed up?
> Because the filename of zygote-after-fork is:
>
>   /usr/lib/chromium-browser/chro
>
> Truncated at that point.
>
> Using the new logic, with "ps -AOtid,tname" I get:
>
>   PID TTY          TIME   TID TNAME           CMD
> ...
> 30878 ?        00:49:12 30878                 chromium-browse
> 30878 ?        00:00:30 30879 chro            [Chrome_ChildIOT]
> 30878 ?        00:00:30 30881 chro            [Compositor]
> 30878 ?        00:00:24 30882 chro            [CompositorTileW]
> 30878 ?        00:00:00 30883 chro            [handle-watcher-]
> 30878 ?        00:00:02 30884 chro            [HTMLParserThrea]
> 30878 ?        00:00:09 30885 chro            [ScriptStreamerT]
> 30878 ?        00:00:01  5875 chro            [Media]
>
> That TNAME was trimmed down from the above longer path. This sounds like
> the group name substitution logic you're talking about misbehaving.
>
> Oh well, I'm sure both remaining users of Ubuntu as a desktop system
> (now that Cannonical helped Windows add a Linux system call translation
> layer so people cross compiling to Android and such from there can do so
> natively from windows) can put up with it until we get Android on the
> workstation...
>
> Oh THIS is fun:
>
> 16990 ?        15:37:50 16990                 chromium-browse
> 16990 ?        00:12:03 17003  --enable-pinch [sandbox_ipc_thr]
> 16990 ?        00:00:00 17012  --enable-pinch [chromium-browse]
> 16990 ?        00:00:01 17013  --enable-pinch [NetworkChangeNo]
> 16990 ?        00:00:00 17014  --enable-pinch [D-Bus thread]
> 16990 ?        00:00:00 17015  --enable-pinch [CrShutdownDetec]
> 16990 ?        00:00:00 17016  --enable-pinch [inotify_reader]
> 16990 ?        00:06:40 17020  --enable-pinch [BrowserBlocking]
> 16990 ?        00:00:00 17021  --enable-pinch [dconf worker]
> 16990 ?        00:00:04 17022  --enable-pinch [gdbus]
> 16990 ?        00:00:16 17023  --enable-pinch [Chrome_DBThread]
> 16990 ?        00:00:10 17024  --enable-pinch [Chrome_FileThre]
> 16990 ?        00:00:22 17025  --enable-pinch [Chrome_FileUser]
> 16990 ?        00:00:00 17026  --enable-pinch [Chrome_ProcessL]
> 16990 ?        00:10:35 17027  --enable-pinch [Chrome_CacheThr]
> 16990 ?        02:39:06 17028  --enable-pinch [Chrome_IOThread]
> 16990 ?        00:00:39 17029  --enable-pinch [IndexedDB]
> 16990 ?        00:29:08 17030  --enable-pinch [CompositorTileW]
> 16990 ?        00:00:10 17031  --enable-pinch [AudioThread]
> 16990 ?        00:00:00 17032  --enable-pinch [threaded-ml]
> 16990 ?        00:00:00 17033  --enable-pinch [BrowserWatchdog]
> 16990 ?        00:06:25 17042  --enable-pinch [BrowserBlocking]
>
> That's trimmed down from "chromium-browser --enable-pinch" and yes that
> is a space in argv[0]. Chromium's logic here on Ubuntu 14.04 is just NUTS.
>
> 24723 ?        01:20:44 24723 shot-passed-by- chromium-browse
> 24723 ?        00:10:38 24727 shot-passed-by- [chromium-browse]
> 24723 ?        00:00:09 24728 shot-passed-by- [Watchdog]
> 24723 ?        00:09:55 24729 shot-passed-by- [Chrome_ChildIOT]
> 24723 ?        00:00:00 24730 shot-passed-by- [handle-watcher-]
>
> Not even asking...
>
>>> Slight awkwardness: TNAME (for "thread name") fits in the help text
>>> column but TCOMMAND doesn't.
>>>
>>> Right now NAME is argv[0] with leading path stripped and COMMAND is
>>> argv[0] verbatim, so TNAME would logically be the thread parent's
>>> argv[0] with path stripped.
>>>
>>> Would stripping path (if any) be a bad thing here? (I don't have to, I'm
>>> just trying to figure out which behavior is less surprising...)
>>
>> on Android there would never be a path anyway; it'll be a
>> fully-qualified Java package name. com.facebook.katana or whatever.
>> (but if you're asking what the old top did, it didn't strip paths. i'm
>> looking at everyone's favorite, /system/bin/mediaserver, right now :-)
>> )
>
> A thread won't (unless there's a / in a package name, which should never
> happen), but non-threaded commands might, and could easily overflow the
> display field before prividing useful info. So I'll stick with the /path
> trimming I guess.
>
> Sorry for the delay, I found another display bug: the recent fiddling
> with the width logic to add overflow capability meant that when the last
> field was trimmed to screen width size it was always right justified.
> The "trim to width" logic didn't propogate the sign of the padding
> value, negative means left justified. Fixed now.

sorry, been busy/out but finally got chance to look at this. TNAME
seems broken in a couple of ways. it's empty for kernel threads which
is a problem because TNAME is now the appropriate replacement for NAME
here (otherwise on a booted system, everything shows up as
app_process):

  else if (CFG_TOYBOX_ON_ANDROID)
    not_o = "USER,%sPPID,VSIZE,RSS,WCHAN:10,ADDR:10=PC,S,NAME";

TNAME should also have width -27 like the other "long" name fields,
rather than -15 like the "short" ones.

speaking of which, given that it's 2016 it's time the VSIZE and RSS
fields were wider too.

but getting back to TNAME... if we use TNAME rather than NAME by
default on Android, with the current implementation that implies -H.
(so you get 84 lines all saying "system_server" instead of 1.)

which brings me to: TNAME isn't really "thread name". the way
everyone's thread library works, COMM is actually the "thread name"
(that is, the name that [can be] different for each thread in a
process). TNAME is really "argv[0] of the containing process", which
is definitely a useful thing, but not something i can easily think of
a snappy name for.


so the good news -- there is good news! -- is that i think i now have
all the pieces to replace Android's old top. TNAME gives me what we've
historically called the "process name" and COMM gives me what we've
called the "thread name". something like this looks right to me
(modulo the question of whether we should rename TNAME and the fact
i'll have to manually size the VSZ/RSS fields or they'll always be
unaligned):

$ adb shell toybox top -b -n 1 -o
pid,tid,user,pr,ni,%cpu,s,vsz,rss,pcy,tname,comm -s 6
Tasks: 556 total,   1 running, 555 sleeping,   0 stopped,   0 zombie
Mem:   1879752k total,  1250856k used,   628896k free,     2284k buffers
Swap:   520908k total,        0k used,   520908k free,   538276k cached
200%cpu  14%user   0%nice  34%sys 152%idle   0%iow   0%irq   0%sirq   0%host
  PID   TID USER     PR  NI[%CPU]S    VSZ   RSS PCY TNAME
         COMM
 1942  1942 root     20   0 48.0 R   9396  2432  fg toybox
         toybox
    8     8 root     20   0  2.0 S      0     0  fg
         rcu_preempt
  190  1943 root     20   0  0.0 S  14372  1096  fg adbd
         shell srvc 40
 1939  1939 root     20   0  0.0 S      0     0  fg
         kworker/0:3
 1923  1923 root     20   0  0.0 S      0     0  fg
         kworker/0:1
 1905  1905 root     20   0  0.0 S      0     0  fg
         kworker/0:0
 1893  1893 root     20   0  0.0 S      0     0  fg
         kworker/u4:1
 1869  1869 root     20   0  0.0 S      0     0  fg
         kworker/1:0
 1838  1838 root     20   0  0.0 S      0     0  fg
         kworker/u4:2
 1806  1806 root     20   0  0.0 S      0     0  fg
         kworker/0:2
 1674  1674 root     20   0  0.0 S      0     0  fg
         kworker/1:1
  606  1665 system   30  10  0.0 S 2239144 128196 bg system_server
         AsyncTask #3
  606  1652 system   20   0  0.0 S 2239144 128196 fg system_server
         SyncHandler-0
 1127  1648 nfc      30  10  0.0 S 1522608 62988 bg com.android.nfc
         AsyncTask #3
 1593  1643 u0_a30   20   0  0.0 S 1516520 64672 fg com.android.email
         Binder_3
 1628  1640 u0_a10   20   0  0.0 S 1501988 48620 fg
com.android.musicfx         Binder_1
 1628  1641 u0_a10   20   0  0.0 S 1501988 48620 fg
com.android.musicfx         Binder_2
 1628  1637 u0_a10   20   0  0.0 S 1501988 48620 bg
com.android.musicfx         FinalizerDaemon
 1628  1639 u0_a10   20   0  0.0 S 1501988 48620 bg
com.android.musicfx         HeapTaskDaemon
 1628  1638 u0_a10   20   0  0.0 S 1501988 48620 bg
com.android.musicfx         FinalizerWatchd
 1628  1636 u0_a10   20   0  0.0 S 1501988 48620 bg
com.android.musicfx         ReferenceQueueD
 1628  1633 u0_a10   29   9  0.0 S 1501988 48620 bg
com.android.musicfx         Jit thread pool
 1628  1635 u0_a10   20   0  0.0 S 1501988 48620 bg
com.android.musicfx         JDWP

but ps has regressed as described above.

i tried reverting the obvious changes to NAME in
https://github.com/landley/toybox/commit/bc8139a3a011568727219d3f1375061612e901ed
(the slot changed from 6 to 5 and the corresponding file changed from
cmdline to exe), but i must have missed something else because that
leaves me with empty names for kernel threads.

anyway, here's as far as i got...

diff --git a/toys/posix/ps.c b/toys/posix/ps.c
index 8aaeb93..755a679 100644
--- a/toys/posix/ps.c
+++ b/toys/posix/ps.c
@@ -347,7 +347,7 @@ struct typography {

   // String fields
   {"COMM", -15, -1}, {"TTY", -8, -2}, {"WCHAN", -6, -3}, {"LABEL", -30, -4},
-  {"NAME", -15, -5}, {"TNAME", -15, -7}, {"COMMAND", -27, -5},
+  {"NAME", -15, -6}, {"TNAME", -27, -7}, {"COMMAND", -27, -5},
   {"CMDLINE", -27, -6}, {"ARGS", -27, -6}, {"CMD", -27, -1},

   // user/group
@@ -603,7 +603,7 @@ static int get_ps(struct dirtree *new)
   } fetch[] = {
     // sources for carveup->offset[] data
     {"fd/", _PS_TTY}, {"wchan", _PS_WCHAN}, {"attr/current", _PS_LABEL},
-    {"exe", _PS_COMMAND|_PS_NAME}, {"cmdline", _PS_CMDLINE|_PS_ARGS|_PS_TNAME},
+    {"exe", _PS_COMMAND}, {"cmdline", _PS_CMDLINE|_PS_ARGS|_PS_NAME|_PS_TNAME},
     {"", _PS_TNAME}
   };
   struct carveup *tb = (void *)toybuf;
@@ -1177,7 +1177,7 @@ void ps_main(void)
   else if (toys.optflags&FLAG_l)
     not_o = "F,S,UID,%sPPID,C,PRI,NI,ADDR,SZ,WCHAN,TTY,TIME,CMD";
   else if (CFG_TOYBOX_ON_ANDROID)
-    not_o = "USER,%sPPID,VSIZE,RSS,WCHAN:10,ADDR:10=PC,S,NAME";
+    not_o = "USER,%sPPID,VSIZE:7,RSS:6,WCHAN:10,ADDR:10=PC,S,NAME";
   sprintf(toybuf, not_o, (toys.optflags & FLAG_T) ? "PID,TID," : "PID,");

   // Init TT.fields. This only uses toybuf if TT.ps.o is NULL

it's not clear to me why the meaning of NAME changed in the TNAME
patch, so i'm assuming that reverting that (fully, not the
half-working revert here) makes more sense than moving to (the
questionably named) TNAME for ps and trying to fix the implied -H?

> Also, the ARGS space trimming logic was stopping at the first space
> (paths can have spaces in them) which is the whole reason I saved
> argv0len in a slot before replacing low ascii stuff (including NUL
> bytes) with spaces, and I went back through all the rationale of that
> confirming I still need it. Reading /proc/$$/exe means I don't need it
> to reliably display just the command name, but I _do_ need it for
> reliably displaying all of argv0 (which TNAME does) and for stripping
> the path from argv0 (which ARGS does).
>
> (Yes, along the way I've tested that you can have a command name with
> ')' in it and the stat[2] parsing won't get confused, that you can have
> spaces and newlines in command names and the display won't get
> confused... Really want to get this into the regression testing but as
> previously mentioned: these commands measure brownian motion and it's
> easy to get reliably human readable output but really hard to get
> reproducible machine readable output on different machines. Upgrade the
> kernel version in your VM and it's butterfly effects all over the place...)

(this is why unit testing is superior: you'd test the display code
independently of the data collection code, which in turn would be
independent of the parsing code. no input is hard to get at that
point, and all input is stable and predictable. plus you can have
realistic test data from other OSes at no extra cost. sadly although
gtest can be used to test C code, it's C++ itself so the tests
themselves would have to be C++.)

> Rob



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.



More information about the Toybox mailing list