[Toybox] ps -T

Wed Nov 1 01:57:22 PDT 2017

On 10/19/2017 06:13 PM, enh wrote:
>> On 09/20/2017 05:08 PM, enh wrote:
>>> ps -T doesn't really work if you have any filters. so ps -AT is fine,
>>> but ps -p <chrome pid> -T only shows the main thread.
>>
>> Alas, I don't personally use threads much so basically never test this.
>>
>>> why? because slots[SLOT_pid] is "wrong" in shared_match_process (where
>>> by wrong i mean "is the tid").
>>>
>>> why? because toybox reads (say) /proc/147047/task/147058/stat and sees
>>>
>>> 147058 (CompositorTileW) S 31782 6249 6249 0 -1 1077952576 4 0 0 0 0 0
>>> 0 0 20 0 11 0 1211910244 928649216 35602 18446744073709551615
>>> 94558515900416 94558627572512 140720560858928 140506510343072
>>> 140506833892356 0 0 4098 1073827581 1 0 0 -1 31 0 0 0 0 0
>>> 94558627579744 94558633602584 94558666661888 140720560866826
>>> 140720560866928 140720560866928 140720560869342 0
>>>
>>> and copies 147058 into SLOT_pid because that code no longer knows the real pid.
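
A rough Python sketch of the mismatch being described, for anyone following along. It is not toybox's code: parse_stat() and matches_pid_filter() are made-up names, and the only facts it leans on are that the first field of a task's stat line is the TID and that the comm field sits in parentheses (so splitting on the last ')' is safer than splitting on whitespace).

```python
def parse_stat(line):
    """Split a /proc/.../stat line into (id, comm, remaining_fields).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' instead of on whitespace.
    """
    left, _, right = line.rpartition(")")
    ident, _, comm = left.partition("(")
    return int(ident), comm, right.split()


def matches_pid_filter(wanted_pids, task_dir_pid, stat_line, use_stat_id):
    """Hypothetical stand-in for a -p filter check (not toybox's real code).

    use_stat_id=True compares the first stat field, which is the TID when
    the line was read from /proc/PID/task/TID/stat.
    use_stat_id=False compares the PID taken from the directory path.
    """
    stat_id, _, _ = parse_stat(stat_line)
    candidate = stat_id if use_stat_id else task_dir_pid
    return candidate in wanted_pids


# Start of the stat line quoted above, read from /proc/147047/task/147058/stat.
line = "147058 (CompositorTileW) S 31782 6249 6249 0 -1 1077952576 4 0 0 0"
tid, comm, rest = parse_stat(line)
print(tid, comm, rest[0])

# Filtering on "-p 147047": the stat id (the TID) never matches,
# the PID remembered from the path does.
print(matches_pid_filter({147047}, 147047, line, use_stat_id=True))
print(matches_pid_filter({147047}, 147047, line, use_stat_id=False))
```

With the first comparison every thread except the main one (where TID == PID) falls out of a -p filter, which lines up with the "only shows the main thread" symptom.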

I added -H to iotop, fixed the off-by-one error in screen width
truncation, set the screen width to 72, and ran "top -H -O TID,SHR" and
cursored over to the SHR column and:

  PID USER           TID [SHR]%CPU  %MEM     TIME+ THREAD          PROCE
 1865 landley       1865  84M 98.1  14.8 128:47.94 thunderbird     thun+
 1865 landley       1895  84M  1.8  14.8  41:53.81 SoftwareVsyncTh thun+
 1865 landley        723  84M  0.0  14.8   0:00.00 StreamT~s #3048 thun+
 1865 landley      30482  84M  0.0  14.8   2:14.82 DOM Worker      thun+
 1865 landley       9488  84M  0.0  14.8   7:16.20 DOM Worker      thun+
 1865 landley      16082  84M  0.0  14.8  15:33.85 DOM Worker      thun+
 1865 landley      15523  84M  0.0  14.8   0:00.49 DOM Worker      thun+
 1865 landley      12086  84M  0.0  14.8  17:24.69 DOM Worker      thun+
 1865 landley       3838  84M  0.0  14.8  18:14.68 DOM Worker      thun+
 1865 landley       2254  84M  0.0  14.8  18:26.28 DOM Worker      thun+
 1865 landley      30746  84M  0.0  14.8  18:30.12 DOM Worker      thun+

And they all seem to think they're PID 1865, but each knows its TID?

$ ls /proc/1865/task/
10228  1865  1875  1880  1887  1893  1899  1911  1934   30482  5471
12086  1871  1876  1881  1888  1894  1903  1912  19926  30746  6316
14369  1872  1877  1882  1889  1895  1904  1921  22017  32148  9488
15523  1873  1878  1885  1891  1897  1906  1927  2254   3838   9955
16082  1874  1879  1886  1892  1898  1908  1933  2770   5469

Which seems reasonable? It's got PID, it's got TID, what do I need to
fix here?
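
For reference, a minimal Python sketch of where the PID/TID pairs in that listing come from, assuming the usual Linux layout of one directory per thread under /proc/PID/task. The function names and the proc_root parameter are invented for illustration; the demo builds a throwaway directory tree (with three of the TIDs from the ls above) so it runs anywhere.

```python
import os
import tempfile


def list_tids(pid, proc_root="/proc"):
    """Return the sorted thread IDs found under <proc_root>/<pid>/task."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


def thread_rows(pid, proc_root="/proc"):
    """Pair the PID from the path with each TID: one (pid, tid) row per thread."""
    return [(pid, tid) for tid in list_tids(pid, proc_root)]


# Demo against a fake tree; on a real Linux box, drop the proc_root
# argument to read the live /proc instead.
with tempfile.TemporaryDirectory() as fake_proc:
    for tid in (1865, 1895, 30482):
        os.makedirs(os.path.join(fake_proc, "1865", "task", str(tid)))
    rows = thread_rows(1865, proc_root=fake_proc)

for pid, tid in rows:
    print(pid, tid)
```

The PID here comes from the directory being walked, not from anything inside the per-thread files, so it stays correct as long as the caller remembers which directory it came from.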

(Aside: thunderbird really, really, really doesn't like a local
linux-kernel folder with 500k messages in it. Or a pop3 inbox going back
to 2013, which is the last time I split it. But I think what it's mad
about right now is the BUG() I hit in the vfat code which I had to fsck
away to do some work, and thus the vfat maintainer couldn't reproduce
it. I've meant to reboot ever since, it happened about 5 times before I
fixed it, emergency-zapping a filesystem each time, and the memory
management on this box has gone all wonky since then. I've been meaning
to reboot to replace the keyboard anyway, but 8 desktops full of windows
full of tabs takes a while to unwind...)

>>> not sure how best to fix this.
>>
>> Hmmm... Reasonably straightforward to fix,

Not necessarily straightforward to reproduce.

$ ps -AT
PID   TID
32667 32667 ?        00:26:22 chromium-browse
32667 32668 ?        00:00:00 TaskSchedulerSe
32667 32669 ?        00:00:20 Chrome_ChildIOT
32667 32670 ?        00:00:00 GpuMemoryThread

They're different?

$ ls /proc/32667/task/
10087  32668  32670  32672  32674  5747  5754
32667  32669  32671  32673  32697  5748

And they're reasonable?

>> but my tree has local
>> changes in ps.c. Looks like I'm adding -m to show maximum number of
>> lines (somebody asked, it's easy enough.)
> 
> i said "no" to the single internal request we had for that when we
> switched from traditional Android top to toybox top. easy, yes, but
> not obviously useful. the original Android top only had batch mode, so
> it was a bit more useful then. but "first N" isn't an obviously
> meaningful heuristic. "field X no lower than Y" would be more
> convincing. but that's no longer as easy :-)

I can yank it again. If you wanna design a filter syntax I can probably
implement something.

I've just started poking at bc ala shell $((blah)) math syntax, and I'm
likely to make a function or two that lets you substitute in variables
(via string substitution) and do math on the result.

That said... that's not the syntax we've got in find, or in test. And ps
has:

  -o  FIELDs instead of defaults, each with optional :size and =title

I could add <XXX and >XXX to that, I suppose? Might take a bit of
fiddling to make room.

That said, you still couldn't implement -m with a syntax like that.
Maximum number of lines to display isn't a -o field. :)
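
To make that distinction concrete, here is a Python sketch of what a per-field filter bolted onto the -o syntax might look like. Everything about the grammar is an assumption for illustration (NAME, then optional <N or >N, then the existing optional :size and =title); nothing like this exists in toybox, and parse_field(), keep_row() and first_n() are made-up names.

```python
import re

# Hypothetical grammar: NAME[<N|>N][:SIZE][=TITLE]
SPEC = re.compile(
    r"^(?P<name>[^<>:=]+)"
    r"(?:(?P<op>[<>])(?P<limit>\d+))?"
    r"(?::(?P<size>\d+))?"
    r"(?:=(?P<title>.*))?$"
)


def parse_field(spec):
    """Parse one -o style field spec into a dict; missing parts are None."""
    m = SPEC.match(spec)
    if not m:
        raise ValueError(f"bad field spec: {spec!r}")
    limit = int(m.group("limit")) if m.group("limit") is not None else None
    size = int(m.group("size")) if m.group("size") is not None else None
    return {"name": m.group("name"), "op": m.group("op"), "limit": limit,
            "size": size, "title": m.group("title")}


def keep_row(row, fields):
    """Apply every field's <N / >N test (strict comparison) to one row."""
    for f in fields:
        if f["op"] == ">" and not row[f["name"]] > f["limit"]:
            return False
        if f["op"] == "<" and not row[f["name"]] < f["limit"]:
            return False
    return True


def first_n(rows, n):
    """What a -m style cap does: limit output lines, tied to no field."""
    return rows[:n]


rows = [{"TID": 1865, "%CPU": 98.1},
        {"TID": 1895, "%CPU": 1.8},
        {"TID": 723, "%CPU": 0.0}]
fields = [parse_field("TID"), parse_field("%CPU>1:5=CPU")]
kept = [r for r in rows if keep_row(r, fields)]
print([r["TID"] for r in kept])
print([r["TID"] for r in first_n(kept, 1)])
```

The per-field tests hang naturally off each field spec, while the line cap has to live somewhere else, since it says nothing about any particular column.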

>> And -H to iotop (which is
>> where I left off; need to come up with a test for this and haven't got
>> one. Is chrome threads or processes? The big scott mccloud comic implied
>> processes, but the way google does everything implies threads, but
>> threads would defeat the entire VM sandboxing purpose of having each tab
>> in its own process...)
> 
> if you run (GNU) "ps -AT" while chrome's running, you'll see it's a
> mix. i think even the original design assumed that. see the first
> diagram here, where solid boxes are processes and dashed boxes are
> threads: https://www.chromium.org/developers/design-documents/multi-process-architecture

I read Scott McCloud's comic way back when, and am currently on a plane
to tokyo and would have to pay for network access (it's not the money,
it's the visceral dislike of entering credit card information into a web
page; DO NOT TRUST), but I'll try to remember to take a look.

Meanwhile, I believe you. :)

> here's far more detail than you could possibly want (including the
> command-line options that let you configure the model):
> https://www.chromium.org/developers/design-documents/process-models

I'm happy that as of a couple months back:

  ps ax | grep renderer | awk '{print $1}' | xargs kill

works again. I haven't asked much deeper than that. (I remember the
videos of high speed lightning discharge vs chrome page rendering. My
experience on this netbook is more "under 5 seconds is pretty good,
under 15 is usually tolerable", but the stop button on chrome is UTTERLY
USELESS (doesn't even stop the notification crawl at the bottom from
TELLING you all the things it's loading, often using the monthly 4 gig
data cap t-mobile applies to tethering but not to what the actual phone
uses because money; I'm checking to see I've got the right youtube link
before tweeting it, I _DON'T_ want you to spool 10 megabytes of video
data through a metered connection).

The workaround is to right click disable wifi on the networkmangler icon
until it stops trying, usually about 15 seconds. One time out of a
thousand it gets Confused and your network goes away and CANNOT BE FIXED
until you reboot (ok, I've killed it, done the dbus status flush thing,
and respawned it from the command line successfully twice, and each time
was like half an hour of research _how_). But it was written by the same
guys who did pulseaudio and systemd so you can't expect reliability out
of it.) At least these days chrome doesn't listen to the "network is
down" dbus notification and then refuse to show you pages from the web
server running on loopback.

(Did I mention I break everything? Seriously. People kept trying to pull
me into a tester role for the first decade of my career... and the
combination of sleep deprivation and caffeine makes me REALLY CHATTY and
the airport shuttle arrived at 4:20am and last week I bought a 50 pack
of "driving chocolate" squares that are 150mg of caffeine each and
packed ALL OF THEM. Minus the ones I already ate.)

Anyway, what I want here is something with threads to test against,
and both chrome and thunderbird have those, so....

> and here's a Chrome engineer's "it's complicated, and everyone
> misunderstands" post:
> https://plus.google.com/+PeterKasting/posts/TC4ACtKevJY

I follow "Security Princess" on twitter (probably @laprissa? See "no net
right now" above). She blogs about this stuff from time to time. But I
copied that link into a tab for when I get to Akihabara.

Rob

P.S. 9 open reply windows to deal with before I can close thunderbird! Woo!