[Toybox] ps -T
enh
enh at google.com
Wed Nov 1 09:05:30 PDT 2017
On Wed, Nov 1, 2017 at 1:57 AM, Rob Landley <rob at landley.net> wrote:
> On 10/19/2017 06:13 PM, enh wrote:
>>> On 09/20/2017 05:08 PM, enh wrote:
>>>> ps -T doesn't really work if you have any filters. so ps -AT is fine,
>>>> but ps -p <chrome pid> -T only shows the main thread.
>>>
>>> Alas, I don't personally use threads much so basically never test this.
>>>
>>>> why? because slots[SLOT_pid] is "wrong" in shared_match_process (where
>>>> by wrong i mean "is the tid").
>>>>
>>>> why? because toybox reads (say) /proc/147047/task/147058/stat and sees
>>>>
>>>> 147058 (CompositorTileW) S 31782 6249 6249 0 -1 1077952576 4 0 0 0 0 0
>>>> 0 0 20 0 11 0 1211910244 928649216 35602 18446744073709551615
>>>> 94558515900416 94558627572512 140720560858928 140506510343072
>>>> 140506833892356 0 0 4098 1073827581 1 0 0 -1 31 0 0 0 0 0
>>>> 94558627579744 94558633602584 94558666661888 140720560866826
>>>> 140720560866928 140720560866928 140720560869342 0
>>>>
>>>> and copies 147058 into SLOT_pid because that code no longer knows the real pid.
>
> I added -H to iotop, fixed the off-by-one error in screen width
> truncation, set the screen width to 72, and ran "top -H -O TID,SHR" and
> cursored over to the SHR column and:
>
> PID USER TID [SHR]%CPU %MEM TIME+ THREAD PROCE
> 1865 landley 1865 84M 98.1 14.8 128:47.94 thunderbird thun+
> 1865 landley 1895 84M 1.8 14.8 41:53.81 SoftwareVsyncTh thun+
> 1865 landley 723 84M 0.0 14.8 0:00.00 StreamT~s #3048 thun+
> 1865 landley 30482 84M 0.0 14.8 2:14.82 DOM Worker thun+
> 1865 landley 9488 84M 0.0 14.8 7:16.20 DOM Worker thun+
> 1865 landley 16082 84M 0.0 14.8 15:33.85 DOM Worker thun+
> 1865 landley 15523 84M 0.0 14.8 0:00.49 DOM Worker thun+
> 1865 landley 12086 84M 0.0 14.8 17:24.69 DOM Worker thun+
> 1865 landley 3838 84M 0.0 14.8 18:14.68 DOM Worker thun+
> 1865 landley 2254 84M 0.0 14.8 18:26.28 DOM Worker thun+
> 1865 landley 30746 84M 0.0 14.8 18:30.12 DOM Worker thun+
>
> And they all seem to think they're PID 1865, but each knows its TID?
yep, that works fine.
> $ ls /proc/1865/task/
> 10228 1865 1875 1880 1887 1893 1899 1911 1934 30482 5471
> 12086 1871 1876 1881 1888 1894 1903 1912 19926 30746 6316
> 14369 1872 1877 1882 1889 1895 1904 1921 22017 32148 9488
> 15523 1873 1878 1885 1891 1897 1906 1927 2254 3838 9955
> 16082 1874 1879 1886 1892 1898 1908 1933 2770 5469
>
> Which seems reasonable? It's got PID, it's got TID, what do I need to
> fix here?
the bug i reported... :-)
i think you read the bug backwards because i included a case that *does* work...
> (Aside: thunderbird really, really, really doesn't like a local
> linux-kernel folder with 500k messages in it. Or a pop3 inbox going back
> to 2013, which is the last time I split it. But I think what it's mad
> about right now is the BUG() I hit in the vfat code which I had to fsck
> away to do some work, and thus the vfat maintainer couldn't reproduce
> it. I've meant to reboot ever since, it happened about 5 times before I
> fixed it, emergency-zapping a filesystem each time, and the memory
> management on this box has gone all wonky since then. I've been meaning
> to reboot to replace the keyboard anyway, but 8 desktops full of windows
> full of tabs takes a while to unwind...)
>
>>>> not sure how best to fix this.
>>>
>>> Hmmm... Reasonably straightforward to fix,
>
> Not necessarily straightforward to reproduce.
>
> $ ps -AT
> PID TID
> 32667 32667 ? 00:26:22 chromium-browse
> 32667 32668 ? 00:00:00 TaskSchedulerSe
> 32667 32669 ? 00:00:20 Chrome_ChildIOT
> 32667 32670 ? 00:00:00 GpuMemoryThread
...specifically the ps -AT case...
...and a case that *doesn't* work, which is when there's a filter
*instead of* -A. i'll just repeat the original because i'm too lazy to
rewrite it and because i think the problem is just that you were
too tired when you looked at it :-)
ps -T doesn't really work if you have any filters. so ps -AT is fine,
but ps -p <chrome pid> -T only shows the main thread.
why? because slots[SLOT_pid] is "wrong" in shared_match_process (where
by wrong i mean "is the tid").
why? because toybox reads (say) /proc/147047/task/147058/stat and sees
147058 (CompositorTileW) S 31782 6249 6249 0 -1 1077952576 4 0 0 0 0 0
0 0 20 0 11 0 1211910244 928649216 35602 18446744073709551615
94558515900416 94558627572512 140720560858928 140506510343072
140506833892356 0 0 4098 1073827581 1 0 0 -1 31 0 0 0 0 0
94558627579744 94558633602584 94558666661888 140720560866826
140720560866928 140720560866928 140720560869342 0
and copies 147058 into SLOT_pid because that code no longer knows the real pid.
i'll also add a concrete example from my laptop right now:
/tmp/toybox$ ps -T 32190
PID SPID TTY STAT TIME COMMAND
32190 32190 ? Sl 0:00 /usr/bin/uplink-soecks
32190 32192 ? Sl 0:01 /usr/bin/uplink-soecks
32190 32193 ? Sl 0:00 /usr/bin/uplink-soecks
32190 32194 ? Sl 0:02 /usr/bin/uplink-soecks
32190 32197 ? Sl 0:02 /usr/bin/uplink-soecks
32190 32204 ? Sl 0:02 /usr/bin/uplink-soecks
32190 32205 ? Sl 0:00 /usr/bin/uplink-soecks
32190 32206 ? Sl 0:02 /usr/bin/uplink-soecks
32190 32207 ? Sl 0:00 /usr/bin/uplink-soecks
32190 21914 ? Sl 0:02 /usr/bin/uplink-soecks
32190 21915 ? Sl 0:01 /usr/bin/uplink-soecks
32190 31176 ? Sl 0:01 /usr/bin/uplink-soecks
/tmp/toybox$ ./toybox ps -T 32190
PID TID TTY TIME CMD
32190 32190 ? 00:00:16 uplink-soecks
/tmp/toybox$
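A minimal sketch of one way to fix this, assuming the thread walk already knows which /proc/<pid> directory it descended from: carry that pid down into the per-thread record instead of taking it from the task's stat file, whose first field is the tid. The struct thread_info type and the read_task_stat(), matches_pid_filter() and list_threads() functions are hypothetical. This is not toybox's ps.c, and its slots[] array and shared_match_process() are not reproduced here.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-thread record; not toybox's slots[] layout. */
struct thread_info {
  long pid;       /* thread group id, from the /proc/<pid> directory name */
  long tid;       /* first field of /proc/<pid>/task/<tid>/stat */
  char comm[64];  /* the (name) field, e.g. "CompositorTileW" */
};

/* Read one task's stat file. The pid is passed in by the caller, because
 * the stat file's first field is the tid, not the pid. */
static int read_task_stat(long pid, const char *tid_name, struct thread_info *ti)
{
  char path[128], buf[4096], *lparen, *rparen;
  size_t len;
  FILE *fp;

  assert(ti != NULL);
  snprintf(path, sizeof(path), "/proc/%ld/task/%s/stat", pid, tid_name);
  if (!(fp = fopen(path, "r"))) return -1;  /* thread may have exited */
  len = fread(buf, 1, sizeof(buf) - 1, fp);
  fclose(fp);
  buf[len] = 0;

  /* comm may contain spaces or ')', so use the LAST ')' as its end. */
  lparen = strchr(buf, '(');
  rparen = strrchr(buf, ')');
  if (!lparen || !rparen || rparen < lparen) return -1;

  ti->pid = pid;                     /* the real pid, from the caller */
  ti->tid = strtol(buf, NULL, 10);   /* stat's first field is the tid */
  len = rparen - lparen - 1;
  if (len >= sizeof(ti->comm)) len = sizeof(ti->comm) - 1;
  memcpy(ti->comm, lparen + 1, len);
  ti->comm[len] = 0;
  return 0;
}

/* A -p style filter compares against the real pid, never the tid. */
static int matches_pid_filter(const struct thread_info *ti, long wanted)
{
  return ti->pid == wanted;
}

/* Walk /proc/<pid>/task, print "PID TID CMD" for each thread that passes
 * the filter, and return how many were printed (-1 if pid is gone). */
static int list_threads(long pid, long wanted)
{
  char path[64];
  struct thread_info ti;
  struct dirent *de;
  DIR *dir;
  int count = 0;

  snprintf(path, sizeof(path), "/proc/%ld/task", pid);
  if (!(dir = opendir(path))) return -1;
  while ((de = readdir(dir))) {
    if (de->d_name[0] < '0' || de->d_name[0] > '9') continue;  /* . and .. */
    if (read_task_stat(pid, de->d_name, &ti)) continue;
    if (!matches_pid_filter(&ti, wanted)) continue;
    printf("%5ld %5ld %s\n", ti.pid, ti.tid, ti.comm);
    count++;
  }
  closedir(dir);
  return count;
}
```

In this sketch, a filter such as -p 32190 is compared against the pid carried down from the directory walk, so all twelve uplink-soecks threads would pass instead of only the one whose tid happens to equal 32190.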
> They're different?
>
> $ ls /proc/32667/task/
> 10087 32668 32670 32672 32674 5747 5754
> 32667 32669 32671 32673 32697 5748
>
> And they're reasonable?
>
>>> but my tree has local
>>> changes in ps.c. Looks like I'm adding -m to show maximum number of
>>> lines (somebody asked, it's easy enough.)
>>
>> i said "no" to the single internal request we had for that when we
>> switched from traditional Android top to toybox top. easy, yes, but
>> not obviously useful. the original Android top only had batch mode, so
>> it was a bit more useful then. but "first N" isn't an obviously
>> meaningful heuristic. "field X no lower than Y" would be more
>> convincing. but that's no longer as easy :-)
>
> I can yank it again. If you wanna design a filter syntax I can probably
> implement something.
no, i don't care either way. i just wasn't going to implement
something that i personally believe isn't useful (or at least "doesn't
answer the question they actually have, and is prone to giving you too
much or too little information"). its two-year absence has weaned
Google's Android folks off it, so i'll probably never be annoyed by a
useless bug report caused by truncation again :-)
> I've just started poking at bc ala shell $((blah)) math syntax, and I'm
> likely to make a function or two that lets you substitute in variables
> (via string substitution) and do math on the result.
>
> That said... that's not the syntax we've got in find, or in test. And ps
> has:
>
> -o FIELDs instead of defaults, each with optional :size and =title
>
> I could add <XXX and >XXX to that, I suppose? Might take a bit of
> fiddling to make room.
>
> That said, you still couldn't implement -m with a syntax like that.
> Maximum number of fields to display isn't a -o field. :)
yeah, but like i said: it's just not a useful thing anyway. if you
genuinely just care about lines of text, use `head`. if you're using
it as a proxy for something else, this feature would let you actually
cut off at the right point. (so personally i'd have said "no -m, but
generalized filtering is on the TODO list, waiting for enough people
to have concrete use cases".)
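To make the "-m versus filtering" distinction concrete, here is a hypothetical sketch. Neither function exists in toybox, and struct row, keep_first_n() and keep_at_least() are made-up names for illustration. It contrasts a first-N cut with a "field X no lower than Y" threshold, using TID and %CPU values from the top -H output quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row holding only the fields this sketch needs. */
struct row {
  long tid;
  double pcpu;  /* the %CPU column */
};

/* "-m N" style: keep the first n rows, whatever they contain. */
static size_t keep_first_n(const struct row *in, size_t len, size_t n,
                           struct row *out)
{
  size_t i;

  assert(len == 0 || (in != NULL && out != NULL));
  for (i = 0; i < len && i < n; i++) out[i] = in[i];
  return i;
}

/* "field X no lower than Y" style: keep every row with %CPU >= min. */
static size_t keep_at_least(const struct row *in, size_t len, double min,
                            struct row *out)
{
  size_t i, kept = 0;

  assert(len == 0 || (in != NULL && out != NULL));
  for (i = 0; i < len; i++)
    if (in[i].pcpu >= min) out[kept++] = in[i];
  return kept;
}
```

The threshold returns exactly the busy threads however many idle ones follow, while the right N for a first-N cut depends on the data and the sort order, which is the "too much or too little information" problem.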
>>> And -H to iotop (which is
>>> where I left off; need to come up with a test for this and haven't got
>>> one. Is chrome threads or processes? The big Scott McCloud comic implied
>>> processes, but the way google does everything implies threads, but
>>> threads would defeat the entire VM sandboxing purpose of having each tab
>>> in its own process...)
>>
>> if you run (GNU) "ps -AT" while chrome's running, you'll see it's a
>> mix. i think even the original design assumed that. see the first
>> diagram here, where solid boxes are processes and dashed boxes are
>> threads: https://www.chromium.org/developers/design-documents/multi-process-architecture
>
> I read Scott McCloud's comic way back when, and am currently on a plane
> to Tokyo and would have to pay for network access (it's not the money,
> it's the visceral dislike of entering credit card information into a web
> page; DO NOT TRUST), but I'll try to remember to take a look.
>
> Meanwhile, I believe you. :)
>
>> here's far more detail than you could possibly want (including the
>> command-line options that let you configure the model):
>> https://www.chromium.org/developers/design-documents/process-models
>
> I'm happy that as of a couple months back:
>
> ps ax | grep renderer | awk '{print $1}' | xargs kill
i actually slightly miss that it no longer leaks itself to death every
couple of weeks --- now i'm sometimes forced to manually restart it
when the "time to upgrade" blob has been red too long.
> works again. I haven't asked much deeper than that. (I remember the
> videos of high speed lightning discharge vs chrome page rendering. My
> experience on this netbook is more "under 5 seconds is pretty good,
> under 15 is usually tolerable", but the stop button on chrome is UTTERLY
> USELESS (doesn't even stop the notification crawl at the bottom from
> TELLING you all the things it's loading, often using the monthly 4 gig
> data cap t-mobile applies to tethering but not to what the actual phone
> uses because money; I'm checking to see I've got the right youtube link
> before tweeting it, I _DON'T_ want you to spool 10 megabytes of video
> data through a metered connection).
>
> The workaround is to right click disable wifi on the networkmangler icon
> until it stops trying, usually about 15 seconds. One time out of a
> thousand it gets Confused and your network goes away and CANNOT BE FIXED
> until you reboot (ok, I've killed it, done the dbus status flush thing,
> and respawned it from the command line successfully twice, and each time
> was like half an hour of research _how_). But it was written by the same
> guys who did pulseaudio and systemd so you can't expect reliability out
> of it.) At least these days chrome doesn't listen to the "network is
> down" dbus notification and then refuse to show you pages from the web
> server running on loopback.
>
> (Did I mention I break everything? Seriously. People kept trying to pull
> me into a tester role for the first decade of my career... and the
> combination of sleep deprivation and caffeine makes me REALLY CHATTY and
> the airport shuttle arrived at 4:20am and last week I bought a 50 pack
> of "driving chocolate" squares that are 150mg of caffeine each and
> packed ALL OF THEM. Minus the ones I already ate.)
see... i knew there was a reason you read the bug report backwards :-)
> Anyway, what I want here is something with threads to test against,
> and both chrome and thunderbird have those, so....
>
>> and here's a Chrome engineer's "it's complicated, and everyone
>> misunderstands" post:
>> https://plus.google.com/+PeterKasting/posts/TC4ACtKevJY
>
> I follow "Security Princess" on twitter (probably @laprissa? See "no net
> right now" above). She blogs about this stuff from time to time. But I
> copied that link into a tab for when I get to Akihabara.
>
> Rob
>
> P.S. 9 open reply windows to deal with before I can close thunderbird! Woo!
--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.