[Toybox] top -H segfault
enh
enh at google.com
Sat Jun 4 09:08:15 PDT 2016
On Fri, May 27, 2016 at 12:56 PM, Rob Landley <rob at landley.net> wrote:
>
>
> On 05/26/2016 05:20 PM, enh wrote:
>> On Thu, May 26, 2016 at 1:03 PM, Rob Landley <rob at landley.net> wrote:
>>> On 05/25/2016 04:35 PM, enh wrote:
>>>> On Wed, May 25, 2016 at 1:59 PM, enh <enh at google.com> wrote:
>>>>> On Tue, May 24, 2016 at 2:23 PM, Rob Landley <rob at landley.net> wrote:
>>>>>> On 05/23/2016 07:59 PM, enh wrote:
>>>>>>> just running home, so no time to debug further, but in case you
>>>>>>> haven't seen this yet... plain "top -H" is segfaulting for me before
>>>>>>> displaying anything.
>>>>>>
>>>>>> Hmmm, no I hadn't seen that. But I do see that ./toybox top -H isn't
>>>>>> displaying anything when I do that...
>>>>>>
>>>>>> Ah. The active node logic changed and I missed updating a spot. (I
>>>>>> tested it, but nothing I tested used "collate". Oops. I need to work out
>>>>>> how to add this sort of stuff to the test suite. Probably have to use
>>>>>> archival snapshots of "ps" contents to get consistent reproducible test
>>>>>> output...)
>>>>>>
>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>> 0x00000000004219c1
>>>>>>> in top_common (filter=filter at entry=0x41f738 <merge_deltas>) at
>>>>>>> toys/posix/ps.c:1349
>>>>>>> 1349 if (old.count && (!new.count || *otb->slot < *ntb->slot)) {
>>>>>>
>>>>>> Null dereference on *otb and *ntb because now you traverse the tree in
>>>>>> order and keep each node that has a non-null ->extra field.
>>>>>>
>>>>>> Try now?
>>>>>
>>>>> (sorry, was too busy to get back to this until now...)
>>>>>
>>>>> yes, works for me now, and i probably wouldn't have found that myself
>>>>> in a reasonable amount of time anyway!
>>>>>
>>>>> "toybox top -b -n 1 -o
>>>>> pid,tid,user,pr,ni,%cpu,s,vsz,rss,pcy,command,name -s 6" now looks
>>>>> close enough to what dumpstate wants for bug reports that i can give
>>>>> it a go...
>>>>
>>>> actually, no, -o COMMAND isn't giving me what i thought. (previously
>>>> i'd only tested on the desktop.) on Android everything shows up as
>>>> /system/bin/app_process64 --- it's the first word of
>>>> readlink('/proc/pid/exe'). (when the zygote forks it sets argv[0] to
>>>> the package name of the app it's about to run, so you can see things
>>>> like "com.android.inputmethod.latin" here.)
>
> Is _THAT_ why ubuntu 14.04's chromium process filenames are screwed up?
> Because the filename of zygote-after-fork is:
>
> /usr/lib/chromium-browser/chro
>
> Truncated at that point.
>
> Using the new logic, with "ps -AOtid,tname" I get:
>
> PID TTY TIME TID TNAME CMD
> ...
> 30878 ? 00:49:12 30878 chromium-browse
> 30878 ? 00:00:30 30879 chro [Chrome_ChildIOT]
> 30878 ? 00:00:30 30881 chro [Compositor]
> 30878 ? 00:00:24 30882 chro [CompositorTileW]
> 30878 ? 00:00:00 30883 chro [handle-watcher-]
> 30878 ? 00:00:02 30884 chro [HTMLParserThrea]
> 30878 ? 00:00:09 30885 chro [ScriptStreamerT]
> 30878 ? 00:00:01 5875 chro [Media]
>
> That TNAME was trimmed down from the above longer path. This sounds like
> the group name substitution logic you're talking about misbehaving.
>
> Oh well, I'm sure both remaining users of Ubuntu as a desktop system
> (now that Cannonical helped Windows add a Linux system call translation
> layer so people cross compiling to Android and such from there can do so
> natively from windows) can put up with it until we get Android on the
> workstation...
>
> Oh THIS is fun:
>
> 16990 ? 15:37:50 16990 chromium-browse
> 16990 ? 00:12:03 17003 --enable-pinch [sandbox_ipc_thr]
> 16990 ? 00:00:00 17012 --enable-pinch [chromium-browse]
> 16990 ? 00:00:01 17013 --enable-pinch [NetworkChangeNo]
> 16990 ? 00:00:00 17014 --enable-pinch [D-Bus thread]
> 16990 ? 00:00:00 17015 --enable-pinch [CrShutdownDetec]
> 16990 ? 00:00:00 17016 --enable-pinch [inotify_reader]
> 16990 ? 00:06:40 17020 --enable-pinch [BrowserBlocking]
> 16990 ? 00:00:00 17021 --enable-pinch [dconf worker]
> 16990 ? 00:00:04 17022 --enable-pinch [gdbus]
> 16990 ? 00:00:16 17023 --enable-pinch [Chrome_DBThread]
> 16990 ? 00:00:10 17024 --enable-pinch [Chrome_FileThre]
> 16990 ? 00:00:22 17025 --enable-pinch [Chrome_FileUser]
> 16990 ? 00:00:00 17026 --enable-pinch [Chrome_ProcessL]
> 16990 ? 00:10:35 17027 --enable-pinch [Chrome_CacheThr]
> 16990 ? 02:39:06 17028 --enable-pinch [Chrome_IOThread]
> 16990 ? 00:00:39 17029 --enable-pinch [IndexedDB]
> 16990 ? 00:29:08 17030 --enable-pinch [CompositorTileW]
> 16990 ? 00:00:10 17031 --enable-pinch [AudioThread]
> 16990 ? 00:00:00 17032 --enable-pinch [threaded-ml]
> 16990 ? 00:00:00 17033 --enable-pinch [BrowserWatchdog]
> 16990 ? 00:06:25 17042 --enable-pinch [BrowserBlocking]
>
> That's trimmed down from "chromium-browser --enable-pinch" and yes that
> is a space in argv[0]. Chromium's logic here on Ubuntu 14.04 is just NUTS.
>
> 24723 ? 01:20:44 24723 shot-passed-by- chromium-browse
> 24723 ? 00:10:38 24727 shot-passed-by- [chromium-browse]
> 24723 ? 00:00:09 24728 shot-passed-by- [Watchdog]
> 24723 ? 00:09:55 24729 shot-passed-by- [Chrome_ChildIOT]
> 24723 ? 00:00:00 24730 shot-passed-by- [handle-watcher-]
>
> Not even asking...
>
>>> Slight awkwardness: TNAME (for "thread name") fits in the help text
>>> column but TCOMMAND doesn't.
>>>
>>> Right now NAME is argv[0] with leading path stripped and COMMAND is
>>> argv[0] verbatim, so TNAME would logically be the thread parent's
>>> argv[0] with path stripped.
>>>
>>> Would stripping path (if any) be a bad thing here? (I don't have to, I'm
>>> just trying to figure out which behavior is less surprising...)
>>
>> on Android there would never be a path anyway; it'll be a
>> fully-qualified Java package name. com.facebook.katana or whatever.
>> (but if you're asking what the old top did, it didn't strip paths. i'm
>> looking at everyone's favorite, /system/bin/mediaserver, right now :-)
>> )
>
> A thread won't (unless there's a / in a package name, which should never
> happen), but non-threaded commands might, and could easily overflow the
> display field before prividing useful info. So I'll stick with the /path
> trimming I guess.
>
> Sorry for the delay, I found another display bug: the recent fiddling
> with the width logic to add overflow capability meant that when the last
> field was trimmed to screen width size it was always right justified.
> The "trim to width" logic didn't propogate the sign of the padding
> value, negative means left justified. Fixed now.
sorry, been busy/out but finally got chance to look at this. TNAME
seems broken in a couple of ways. it's empty for kernel threads which
is a problem because TNAME is now the appropriate replacement for NAME
here (otherwise on a booted system, everything shows up as
app_process):
else if (CFG_TOYBOX_ON_ANDROID)
not_o = "USER,%sPPID,VSIZE,RSS,WCHAN:10,ADDR:10=PC,S,NAME";
TNAME should also have width -27 like the other "long" name fields,
rather than -15 like the "short" ones.
speaking of which, given that it's 2016 it's time the VSIZE and RSS
fields were wider too.
but getting back to TNAME... if we use TNAME rather than NAME by
default on Android, with the current implementation that implies -H.
(so you get 84 lines all saying "system_server" instead of 1.)
which brings me to: TNAME isn't really "thread name". the way
everyone's thread library works, COMM is actually the "thread name"
(that is, the name that [can be] different for each thread in a
process). TNAME is really "argv[0] of the containing process", which
is definitely a useful thing, but not something i can easily think of
a snappy name for.
so the good news -- there is good news! -- is that i think i now have
all the pieces to replace Android's old top. TNAME gives me what we've
historically called the "process name" and COMM gives me what we've
called the "thread name". something like this looks right to me
(modulo the question of whether we should rename TNAME and the fact
i'll have to manually size the VSZ/RSS fields or they'll always be
unaligned):
$ adb shell toybox top -b -n 1 -o
pid,tid,user,pr,ni,%cpu,s,vsz,rss,pcy,tname,comm -s 6
Tasks: 556 total, 1 running, 555 sleeping, 0 stopped, 0 zombie
Mem: 1879752k total, 1250856k used, 628896k free, 2284k buffers
Swap: 520908k total, 0k used, 520908k free, 538276k cached
200%cpu 14%user 0%nice 34%sys 152%idle 0%iow 0%irq 0%sirq 0%host
PID TID USER PR NI[%CPU]S VSZ RSS PCY TNAME
COMM
1942 1942 root 20 0 48.0 R 9396 2432 fg toybox
toybox
8 8 root 20 0 2.0 S 0 0 fg
rcu_preempt
190 1943 root 20 0 0.0 S 14372 1096 fg adbd
shell srvc 40
1939 1939 root 20 0 0.0 S 0 0 fg
kworker/0:3
1923 1923 root 20 0 0.0 S 0 0 fg
kworker/0:1
1905 1905 root 20 0 0.0 S 0 0 fg
kworker/0:0
1893 1893 root 20 0 0.0 S 0 0 fg
kworker/u4:1
1869 1869 root 20 0 0.0 S 0 0 fg
kworker/1:0
1838 1838 root 20 0 0.0 S 0 0 fg
kworker/u4:2
1806 1806 root 20 0 0.0 S 0 0 fg
kworker/0:2
1674 1674 root 20 0 0.0 S 0 0 fg
kworker/1:1
606 1665 system 30 10 0.0 S 2239144 128196 bg system_server
AsyncTask #3
606 1652 system 20 0 0.0 S 2239144 128196 fg system_server
SyncHandler-0
1127 1648 nfc 30 10 0.0 S 1522608 62988 bg com.android.nfc
AsyncTask #3
1593 1643 u0_a30 20 0 0.0 S 1516520 64672 fg com.android.email
Binder_3
1628 1640 u0_a10 20 0 0.0 S 1501988 48620 fg
com.android.musicfx Binder_1
1628 1641 u0_a10 20 0 0.0 S 1501988 48620 fg
com.android.musicfx Binder_2
1628 1637 u0_a10 20 0 0.0 S 1501988 48620 bg
com.android.musicfx FinalizerDaemon
1628 1639 u0_a10 20 0 0.0 S 1501988 48620 bg
com.android.musicfx HeapTaskDaemon
1628 1638 u0_a10 20 0 0.0 S 1501988 48620 bg
com.android.musicfx FinalizerWatchd
1628 1636 u0_a10 20 0 0.0 S 1501988 48620 bg
com.android.musicfx ReferenceQueueD
1628 1633 u0_a10 29 9 0.0 S 1501988 48620 bg
com.android.musicfx Jit thread pool
1628 1635 u0_a10 20 0 0.0 S 1501988 48620 bg
com.android.musicfx JDWP
but ps has regressed as described above.
i tried reverting the obvious changes to NAME in
https://github.com/landley/toybox/commit/bc8139a3a011568727219d3f1375061612e901ed
(the slot changed from 6 to 5 and the corresponding file changed from
cmdline to exe), but i must have missed something else because that
leaves me with empty names for kernel threads.
anyway, here's as far as i got...
diff --git a/toys/posix/ps.c b/toys/posix/ps.c
index 8aaeb93..755a679 100644
--- a/toys/posix/ps.c
+++ b/toys/posix/ps.c
@@ -347,7 +347,7 @@ struct typography {
// String fields
{"COMM", -15, -1}, {"TTY", -8, -2}, {"WCHAN", -6, -3}, {"LABEL", -30, -4},
- {"NAME", -15, -5}, {"TNAME", -15, -7}, {"COMMAND", -27, -5},
+ {"NAME", -15, -6}, {"TNAME", -27, -7}, {"COMMAND", -27, -5},
{"CMDLINE", -27, -6}, {"ARGS", -27, -6}, {"CMD", -27, -1},
// user/group
@@ -603,7 +603,7 @@ static int get_ps(struct dirtree *new)
} fetch[] = {
// sources for carveup->offset[] data
{"fd/", _PS_TTY}, {"wchan", _PS_WCHAN}, {"attr/current", _PS_LABEL},
- {"exe", _PS_COMMAND|_PS_NAME}, {"cmdline", _PS_CMDLINE|_PS_ARGS|_PS_TNAME},
+ {"exe", _PS_COMMAND}, {"cmdline", _PS_CMDLINE|_PS_ARGS|_PS_NAME|_PS_TNAME},
{"", _PS_TNAME}
};
struct carveup *tb = (void *)toybuf;
@@ -1177,7 +1177,7 @@ void ps_main(void)
else if (toys.optflags&FLAG_l)
not_o = "F,S,UID,%sPPID,C,PRI,NI,ADDR,SZ,WCHAN,TTY,TIME,CMD";
else if (CFG_TOYBOX_ON_ANDROID)
- not_o = "USER,%sPPID,VSIZE,RSS,WCHAN:10,ADDR:10=PC,S,NAME";
+ not_o = "USER,%sPPID,VSIZE:7,RSS:6,WCHAN:10,ADDR:10=PC,S,NAME";
sprintf(toybuf, not_o, (toys.optflags & FLAG_T) ? "PID,TID," : "PID,");
// Init TT.fields. This only uses toybuf if TT.ps.o is NULL
it's not clear to me why the meaning of NAME changed in the TNAME
patch, so i'm assuming that reverting that (fully, not the
half-working revert here) makes more sense than moving to (the
questionably named) TNAME for ps and trying to fix the implied -H?
> Also, the ARGS space trimming logic was stopping at the first space
> (paths can have spaces in them) which is the whole reason I saved
> argv0len in a slot before replacing low ascii stuff (including NUL
> bytes) with spaces, and I went back through all the rationale of that
> confirming I still need it. Reading /proc/$$/exe means I don't need it
> to reliably display just the command name, but I _do_ need it for
> reliably displaying all of argv0 (which TNAME does) and for stripping
> the path from argv0 (which ARGS does).
>
> (Yes, along the way I've tested that you can have a command name with
> ')' in it and the stat[2] parsing won't get confused, that you can have
> spaces and newlines in command names and the display won't get
> confused... Really want to get this into the regression testing but as
> previously mentioned: these commands measure brownian motion and it's
> easy to get reliably human readable output but really hard to get
> reproducible machine readable output on different machines. Upgrade the
> kernel version in your VM and it's butterfly effects all over the place...)
(this is why unit testing is superior: you'd test the display code
independently of the data collection code, which in turn would be
independent of the parsing code. no input is hard to get at that
point, and all input is stable and predictable. plus you can have
realistic test data from other OSes at no extra cost. sadly although
gtest can be used to test C code, it's C++ itself so the tests
themselves would have to be C++.)
> Rob
--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
More information about the Toybox
mailing list