[Toybox] top overhead
Rob Landley
rob at landley.net
Sat Nov 16 23:13:59 PST 2024
On 11/16/24 12:34, enh wrote:
> toybox top's relatively high overhead bugged me again, and a couple of
> things stood out from `strace -c`.
>
> one is that we call sysinfo() once for each process. this doesn't take a
> _huge_ amount of cpu time, but it's still noticeable. procps and busybox
> don't do this, and ironically the code comment says this is _for_ slow
> systems:
>
> // /proc data is generated as it's read, so for maximum accuracy on slow
> // systems (or ps | more) we re-fetch uptime as we fetch each /proc line.
> sysinfo(&si);
> slot[SLOT_uptime] = si.uptime;
> slot[SLOT_totalram] = si.totalram;
> slot[SLOT_upticks] = slot[SLOT_uptime]*TT.ticks - slot[SLOT_starttime];
>
> unfortunately that comment was there since the sysinfo() call was first
> introduced, so i'm not sure what the motivating example was there? (and
> even if it did matter, would it make more sense to just compute all the
> "upticks" slots at the end? uptime and totalram seem like it's _less_
> confusing if they're just measured once per snapshot.)
It would have to be cached before we start reading because ps does
progressive output (it reads a /proc entry and prints it to stdout
before reading the next one, the sorting comes from /proc returning PID
entries in numerical order, ala ls -1f /proc).
I was thinking if you "ps | more" you can have quite a gap, but if the
overhead is significant move it to the front to do once and call pausing
the output pilot error that may give inconsistent results upticks.
We're also doing a stat() of each /proc entry to get uid and gid of the
process (well, dirtree does but we let it), I didn't think this was more
expensive than that? (Alas, getdents returns filetype but doesn't return
uid/gid.)
> -*-
>
> separately, we're doing a lot more read()s, apparently because we always
> read /proc/pid/status in addition to stat and statm. a bit of printf
> debugging shows that's because _PS_RUSER is set ... but i'm not sure why?
Sigh, I _did_ try to use those bits to gate stuff so it didn't do more
work than necessary to populate fields we weren't going to use. :P
Let's see, it's checking TT.bits which is populated by get_headers()
which when running "top" with no arguments is doing:
PID=1
USER=20000000
PR=800
NI=8
VIRT=80000000000
RES=100000000000
SHR=200000000000
S=80000000000000
%CPU=40000000000
%MEM=20000000000
TIME+=4000000000
ARGS=4000000
803e4024000809
And PS_RUSER is 31 ala _PS_RUSER is 80000000 which is NOT set in that
(the relevant digit is 2).
The only other thing potentially assigning to those bits is
if (FLAG(f)) TT.bits |= _PS_CMDLINE;
> we're not actually showing that in the default set of fields? i tried to
> work out where that was coming from, but it's time to get some fresh air
> and i haven't made much progress, so i wanted to at least record/report
> that, since this is further than i'd poked before. hacking this out is a
> bigger improvement than hacking sysinfo() out.
... yay?
It doesn't appear to be doing it here and it's still slow for me, I've
meant to track down why and try to speed it up but I didn't think it was
necessarily on the ps side? Debian's ps is slower:
$ time bash -c 'for ((i=0; i<100; i++)); do ps -A >/dev/null; done'
real 0m4.632s
user 0m1.649s
sys 0m2.926s
$ time bash -c 'for ((i=0; i<100; i++)); do toybox ps -A >/dev/null; done'
real 0m2.874s
user 0m1.220s
sys 0m1.638s
Seemed most likely to be the crunch_string() stuff being terrible...
> -*-
>
> even more confusing, i'm struggling to correlate what i'm seeing in strace
> with what i expect from the code. given that ps.c is using readfile() i'd
> expect an openat(), an fstatat() for the size, and one read(). but i
> actually see the openat(), a short read() into a larger buffer, then a
> read() that returns 0. oh, because the loop in readfd() is calling
> readall(), not read(), and readall() doesn't have the same "short read =>
> EOF" that readfd() appears to have [but doesn't actually have, because it's
> calling readall() which won't return short reads except on error].
We dunno how big the data is, stat on proc returns 0 for all the file
lengths. (Understandably, but still.)
> this caution probably isn't necessary for procfs? (procps isn't this
> cautious, but busybox looks a lot like toybox under strace, presumably also
> calling a generic routine rather than taking advantage of any assumptions
> about procfs.)
Hmmm, I suppose that might make /proc calculate the contents twice.
There's quite a bit of plumbing in lib/lib.c for this, readfd() can do
an fdlength() which is silly here...
Every user in ps is supplying a buffer with max length, I can do a local
procread() that does a one shot read if you think it would help?
> -*-
>
> but it looks like just fixing whatever's setting _PS_RUSER even though we
> don't need it gets us a lot closer to busybox/procps top load. definitely
> time for that fresh air though...
I don't think I have your test environment here...
I commented out start_redraw() so we don't go into raw mode and stuck an
exit(1) on line 795 (right before the sprintf("%lld/status") and it
triggered, but... Hmmm.
if ((TT.bits&(_PS_RGROUP|_PS_RUSER|_PS_STAT|_PS_RUID|_PS_RGID|_PS_SWAP
|_PS_IO|_PS_DIO)) || TT.GG.len || TT.UU.len)
TT.GG.len is 0, TT.UU.len is 0, TT.bits and all that mask stuff is
803e4000000000 which is strange because none of the mask bits actually
intersect? What is...
dprintf(2, "mask=%llx\n",
_PS_RGROUP|_PS_RUSER|_PS_STAT|_PS_RUID|_PS_RGID|_PS_SWAP|_PS_IO|_PS_DIO);
mask=ffffffffc0000000
What? These all have parentheses, right? Yes:
generated/tags.h:#define _PS_RGROUP (1LL<<35)
AHA! Found it:
#define _PS_RUSER (1<<31)
As a signed 32 bit int that's a negative number, which when type
promoted to a signed long in the big or chain sets the top 32 bits in
the mask.
Ok, I need to to switch to 1LL one entry earlier in scripts/make.sh...
Commit a51c66edb7c9. (The performance is still terrible, but that's one
issue down...)
Rob
More information about the Toybox
mailing list