[Toybox] sizeof(toybuf)

Rob Landley rob at landley.net
Tue Apr 11 12:25:38 PDT 2023


Rather than bury this in an obscure place on github and never be able to find it
again, in reply to:

https://github.com/landley/toybox/commit/aa88571a6b847a96bb8ee998a9868c5a1bdb3a6e#r108474092

> do you want a static_assert somewhere that toybuf is 4096 bytes? since that's
> not necessarily the page size for arm64, say.
> 
> (unrelated, i've been meaning to ask whether we should make toybuf larger
> anyway. 4KiB is really small for modern hardware, though at the same time
> it does make it more likely that we test all the "toybuf too small, loop"
> cases even with small test inputs...)

A) not a fan of asserts.

B) it was only ever coincidentally page size, and huge pages are a thing even on
x86.
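To make point B concrete: the base page size is a runtime property of the kernel, not a constant. It is 4k on typical x86, but arm64 kernels can be built with 16k or 64k pages, and huge pages (2M on x86-64) come on top of that. Below is a minimal sketch that asks sysconf() instead of assuming 4096. The toybuf declared here is a stand-in for the real global, and report_page_size() is a hypothetical name.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the global scratch buffer in toybox's main.c. */
char toybuf[4096];

/* Compare sizeof(toybuf) with the base page size the running kernel uses.
 * Returns the page size, or -1 if sysconf() can't tell us. */
long report_page_size(void)
{
  long page = sysconf(_SC_PAGESIZE);

  if (page < 0) return -1;
  printf("sizeof(toybuf) = %zu, base page size = %ld%s\n", sizeof(toybuf),
         page, page == (long)sizeof(toybuf) ? "" : " (not one page here)");

  return page;
}
```

On a 16k-page arm64 box this prints the mismatch, which is exactly the case a static_assert on 4096 would not catch, since sizeof(toybuf) is fixed at compile time while the page size is not.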

I never annotated toybuf or libbuf with any sort of alignment directive or tried
to make them come first in their segment (toybuf and libbuf are the fifth and sixth
globals defined in main.c), so they're both reasonably likely to straddle page
boundaries anyway. Heck, I'm not even sure they're cache line aligned. The actual
alignment _guarantee_ is something like 4 bytes, except when it suddenly isn't. I fought
with this in 2021 trying to get a simple "hello world" kernel out of gcc without
needing a linker script: https://landley.net/notes-2021.html#12-04-2021
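One way to see where the buffers actually landed on a given build: print each one's offset into its page and count how many page boundaries it crosses. This is a sketch, assuming two plain file-scope char arrays like the ones in main.c (declared here as stand-ins). The C language itself only promises char alignment for them; what you get beyond that depends on the compiler, ABI, and linker. boundaries_crossed() and report_placement() are hypothetical helpers.

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-ins: two file-scope arrays with no alignment directive. */
char toybuf[4096], libbuf[4096];

/* How many page boundaries does the byte range [p, p+len) cross?
 * len must be nonzero. */
static unsigned long boundaries_crossed(const void *p, size_t len,
                                        uintptr_t page)
{
  uintptr_t start = (uintptr_t)p, end = start + len - 1;

  return (unsigned long)(end / page - start / page);
}

void report_placement(void)
{
  long ps = sysconf(_SC_PAGESIZE);
  uintptr_t page;

  if (ps <= 0) return;
  page = (uintptr_t)ps;

  printf("toybuf %p: offset %lu into page, crosses %lu boundaries\n",
         (void *)toybuf, (unsigned long)((uintptr_t)toybuf % page),
         boundaries_crossed(toybuf, sizeof(toybuf), page));
  printf("libbuf %p: offset %lu into page, crosses %lu boundaries\n",
         (void *)libbuf, (unsigned long)((uintptr_t)libbuf % page),
         boundaries_crossed(libbuf, sizeof(libbuf), page));
}
```

A nonzero offset means the 4096-byte buffer spans two pages even on a 4k-page system, which is the "straddle page boundaries anyway" case.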

The 4096 is just a convenient scratch pad size. I use sizeof(toybuf) in a bunch
of places... and hardwire in the knowledge of its size in a bunch of others.
Plus there's a bunch of implicit "toybuf and/or this slice of it is big enough
to stick this struct in, so I can safely typecast the pointer" instances I
checked at the time (and all of them had a big fudge factor in case of future
glibc bloat).
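The "this slice of toybuf is big enough for this struct" pattern can be written so the check and the headroom are explicit. Here is a sketch, assuming we want to park a struct stat in the second half of toybuf. It copies with memcpy() rather than casting the pointer, which sidesteps the alignment question from the previous paragraph; stash_stat_in_toybuf() is a hypothetical name, not a toybox function.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Stand-in for the global scratch buffer. */
char toybuf[4096];

/* Copy *st into the upper half of toybuf. Returns the bytes of headroom
 * left in that slice (the "fudge factor"), or -1 if it doesn't fit. */
long stash_stat_in_toybuf(const struct stat *st)
{
  char *slice = toybuf + sizeof(toybuf) / 2;
  size_t room = sizeof(toybuf) / 2;

  if (sizeof(*st) > room) return -1;
  memcpy(slice, st, sizeof(*st));

  return (long)(room - sizeof(*st));
}
```

Using sizeof(toybuf) here instead of a hardwired 2048 means the check keeps working if the buffer size ever changes.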

It's really a "convenient granularity" thing. Copy loops doing byte-at-a-time
stuff are known to be terrible because the library and syscall execution paths come to
dominate, and grouping them into 4k blocks is 12 doublings of efficiency right
there (4096 is 2^12). Going to 64k means 1/16th as many syscalls, which is not as big a deal as
1/4000th as many syscalls. And that raises the question "why not a megabyte
then", which is something you don't just casually want to do on embedded devices
without thinking about it (might as well malloc there)...
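The syscall arithmetic above can be checked directly by copying the same file through buffers of different sizes and counting read() calls. This is a minimal sketch, assuming a regular file (which doesn't return short reads before EOF) and /dev/null as the destination. copy_counting() and reads_for() are hypothetical names, and EINTR retries are left out to keep it short.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy fd in to fd out through buf[len], returning the number of read()
 * calls made (including the final one that returns 0 at EOF), or -1. */
long copy_counting(int in, int out, char *buf, size_t len)
{
  long calls = 0;

  for (;;) {
    ssize_t got = read(in, buf, len);

    calls++;
    if (got < 0) return -1;
    if (!got) return calls;
    /* write() may accept less than asked, so loop until it's all out. */
    for (ssize_t off = 0; off < got; ) {
      ssize_t put = write(out, buf + off, got - off);

      if (put < 0) return -1;
      off += put;
    }
  }
}

/* Rewind fd and copy it to /dev/null through a bufsize-byte buffer,
 * returning how many read() calls that took, or -1 on error. */
long reads_for(int fd, size_t bufsize)
{
  char *buf = malloc(bufsize);
  int null = open("/dev/null", O_WRONLY);
  long calls = -1;

  if (buf && null >= 0 && lseek(fd, 0, SEEK_SET) == 0)
    calls = copy_counting(fd, null, buf, bufsize);
  if (null >= 0) close(null);
  free(buf);

  return calls;
}
```

For a 100000 byte file this works out to 100001 reads with a 1-byte buffer, 26 with 4096 bytes, and 3 with 65536 bytes: the first jump is the big one, the second barely registers.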

I could probably be talked into bumping it up to 64k if somebody measured
numbers saying it would help something specific? Triaging all the existing users
isn't that big a deal. The Linux pipe buffer plumbing changed to collate data,
so there's some internal copying that larger-granularity output might help, which
wasn't the case 10 years ago... but then we get back into the "output piped to
less displays nothing for 3 minutes, and then it's a screenful" issue. Line-buffered
output is usually ~60 bytes at a time, and 4k is bigger than most
whole text screens (ok, maybe half of one of yours, but still ballpark). And we can
just as easily malloc a bigger scratch buffer as needed in any case where it
matters...
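The "just malloc a bigger scratch buffer where it matters" approach can be kept to a couple of lines at each call site. Below is a sketch, assuming single-threaded use of the global buffer: reuse toybuf when the request fits and fall back to the heap otherwise. scratch_get() and scratch_done() are hypothetical helpers, not part of toybox.

```c
#include <stdlib.h>

/* Stand-in for the global scratch buffer. */
char toybuf[4096];

/* Return toybuf if len fits in it, else a fresh heap buffer of len bytes
 * (NULL if malloc fails). Not reentrant: only one toybuf user at a time. */
char *scratch_get(size_t len)
{
  return len <= sizeof(toybuf) ? toybuf : malloc(len);
}

/* Release a buffer from scratch_get(); toybuf itself is never freed. */
void scratch_done(char *buf)
{
  if (buf != toybuf) free(buf);
}
```

A command that wants 64k for a hot copy loop calls scratch_get(65536), while everything else keeps using the 4k static buffer with no allocation.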

Rob

