[Toybox] buffer sizes
enh
enh at google.com
Thu Feb 29 12:46:46 PST 2024
On Thu, Feb 29, 2024 at 9:42 AM Rob Landley <rob at landley.net> wrote:
>
>
>
> On 2/28/24 17:02, enh wrote:
> > On Wed, Feb 28, 2024 at 1:33 PM Rob Landley <rob at landley.net> wrote:
> >>
> >> On 2/28/24 13:14, enh via Toybox wrote:
> >> > just fyi if you don't follow the coreutils list,
> >>
> >> Sadly, I am still subscribed to that because:
> >>
> >> https://lists.gnu.org/archive/html/coreutils/2023-08/msg00100.html
> >>
> >> STILL hasn't been addressed. (How many reminders is too many? Are they being
> >> passive aggressive or just disorganized?)
> >>
> >> Recently they've been arguing about hash functions in sort, in a feature (-R)
> >> I've never even used, let alone implemented. (And would hit with a cheap CRC64
> >> or something if I did.) I've been "reading with the d key" in that thread, as it
> >> were...
> >>
> >> > i see that they're
> >> > looking at moving up from 128KiB to 256KiB buffers (but without saying
> >> > how _much_ "more performance" that gets them, nor what exactly "modern
> >> > hardware" means).
> >>
> >> I remember when L3 cache was introduced in those tyan boards in something like
> >> 2001. Sweet spots migrate.
> >>
> >> Going from "byte" to "block" is a big win. But how BIG the block is often has
> >> exponentially diminishing returns. A system call is a blocking round trip that
> >> introduces latency into your process (which you never get back no matter how
> >> parallel the rest of the system is), with the chance that the scheduler won't
> >> immediately resume you, and so on, all amortized in.
> >>
> >> 1 byte to 128 bytes saves you 7 doublings in the number of system calls (round
> >> trips), since 2^7 = 128. Going the rest of the way from 128 bytes to 4k is only
> >> 5 more doublings, less of a win than even that small initial buffer/batching.
> >> And going from 4k to 128k is again 5 doublings, so _maybe_ another 1/3 gain
> >> assuming that's your bottleneck.
> >>
> >> > (don't get me wrong --- this is definitely a tricky one. bionic and
> >> > musl chose smaller-than-traditional values for BUFSIZ for a reason,
> >> > and although there's a question of whether that applies to a small
> >> > stand-alone tool like toybox, i'm unconvinced that "one size fits all
> >> > for toybox" either.
> >>
> >> The reason arm64 switched from 4k pages to 64k pages wasn't performance, it was
> >> a hack to get a bigger physical memory address range without increasing the
> >> number of page table levels. Moving from 4k to 64k pages let them go from 48 to
> >> 52 bits of physical memory. (And PISSED OFF the musl maintainer...)
> >
> > (not sure how we got onto this, but 16KiB page sizes for arm64 are
> > very much about performance ... apple isn't using 16KiB pages on iOS
> > to support larger amounts of physical memory :-) )
>
> I thought Arm supported 64k TLB entries back in arm6 in 1991? Not using them
> before arm inc. decided that was the way they'd get more physical address bits
> seems a bit of a coincidence to me. (Why now, what changed?)
oh, i thought we were talking about arm64... back in the 1980s when i
was first using the arm2, yes, the page size was variable and weird
there (https://www.riscosopen.org/wiki/documentation/show/Archimedes%20Hardware#:~:text=Page%20size,address%20to%20a%20physical%20address.
for the full details) but that was a hardware quirk of the [separate]
MEMC chip.
> *shrug* I'm not a domain expert, but let's just say I'm not convinced. Apple is
> a marketing company, and "we decided to do this for reason A, but have come up
> with reasons B through F in support of what we'd already decided to do" isn't
> new for them. A feature exists and they're making use of it, doesn't mean that's
> why it happened. "We meant to do that" is what they'll say regardless.
i'm not aware that Apple has said anything public about this other
than the "thou shalt not make assumptions about page size" in their
developer documentation.
(note also that Apple's on 16KiB pages, not the 64KiB pages that
RedHat supports.)
> I don't know if the iOS and desktop kernels are different development teams now.
> I didn't hear about Apple playing around with transparent huge pages back in
> 2009 when Linux was doing https://lwn.net/Articles/359158/ , have no clue what
> Apple's M1 TLB looks like, nor at what point spectre/meltdown mitigations
> entered their design thinking (so leaving the kernel mapped but not readable was
> no longer a good idea, hence more TLB swapping), nor when the larger physical
> address strategy got communicated, from whom to whom, in that process...
>
> But "our new chip is terrible at 4k access, we needed something to take the
> pressure off" is not how they're gonna phrase it. They'll talk about the
> advantages of what they already chose to do because they're Apple. (Intel's
> Pentium Pro faceplanting on 16 bit code comes to mind. They decided what to
> optimize for.)
>
> *shrug* Maybe it's even true. Who knows with Apple?
>
> Rob