[Toybox] buffer sizes

Thu Feb 29 09:50:59 PST 2024

On 2/28/24 17:02, enh wrote:
> On Wed, Feb 28, 2024 at 1:33 PM Rob Landley <rob at landley.net> wrote:
>>
>> On 2/28/24 13:14, enh via Toybox wrote:
>> > just fyi if you don't follow the coreutils list,
>>
>> Sadly, I am still subscribed to that because:
>>
>>   https://lists.gnu.org/archive/html/coreutils/2023-08/msg00100.html
>>
>> STILL hasn't been addressed. (How many reminders is too many? Are they being
>> passive aggressive or just disorganized?)
>>
>> Recently they've been arguing about hash functions in sort, in a feature (-R)
>> I've never even used, let alone implemented. (And would hit with a cheap CRC64
>> or something if I did.) I've been "reading with the d key" in that thread, as it
>> were...
>>
>> > i see that they're
>> > looking at moving up from 128KiB to 256KiB buffers (but without saying
>> > how _much_ "more performance" that gets them, nor what exactly "modern
>> > hardware" means).
>>
>> I remember when L3 cache was introduced in those tyan boards in something like
>> 2001. Sweet spots migrate.
>>
>> Going from "byte" to "block" is a big win. But how BIG the block is often has
>> exponentially diminishing returns. A system call is a blocking round trip
>> introducing latency into your process (which you never get back no matter how
>> parallel the rest of the system is), with opportunity for the scheduler not to
>> immediately resume you and so on basically amortized in.
>>
>> 1 byte to 128 bytes saves you 7 doublings in the number of system calls (round
>> trips). Going the rest of the way from 128 bytes to 4k is only 5 doublings, less
>> of a win than even that small initial buffer/batching. And going from 4k to 128k
>> is again 5 doublings, so _maybe_ another 1/3 gain assuming that's your bottleneck.
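
The doubling arithmetic above can be checked with a small sketch. This is not from the thread: the helper names (`syscalls_needed`, `doublings`) and the 1 MiB transfer size are assumptions chosen for illustration, and it only counts read/write round trips, not actual latency. By a strict log2 count, 1 byte to 128 bytes comes out as 7 doublings, and each of the later two steps as 5.

```python
# Hypothetical helpers (not toybox code): count the round trips needed to
# move a fixed amount of data with different buffer sizes.

def syscalls_needed(total_bytes, buf_size):
    """Number of read()/write() calls if each call moves one full buffer."""
    return -(-total_bytes // buf_size)  # ceiling division

def doublings(small, large):
    """How many times `small` must double to reach `large` (powers of two)."""
    return (large // small).bit_length() - 1

TOTAL = 1 << 20  # assumed transfer size: 1 MiB

if __name__ == "__main__":
    sizes = [1, 128, 4096, 128 * 1024]
    for prev, cur in zip(sizes, sizes[1:]):
        print(f"{prev:>6} -> {cur:>6} bytes: "
              f"{doublings(prev, cur)} doublings, "
              f"{syscalls_needed(TOTAL, prev)} -> "
              f"{syscalls_needed(TOTAL, cur)} calls")
```

In absolute terms the first step removes over a million calls for this transfer, while the last step only goes from 256 calls down to 8, which is the diminishing-returns point being made.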
>>
>> > (don't get me wrong --- this is definitely a tricky one. bionic and
>> > musl chose smaller-than-traditional values for BUFSIZ for a reason,
>> > and although there's a question of whether that applies to a small
>> > stand-alone tool like toybox, i'm unconvinced that "one size fits all
>> > for toybox" either.
>>
>> The reason arm64 switched from 4k pages to 64k pages wasn't performance, it was
>> a hack to get a bigger physical memory address range without increasing the
>> number of page table levels. Moving from 4k to 64k pages let them go from 48 to
>> 52 bits of physical memory. (And PISSED OFF the musl maintainer...)
> 
> (not sure how we got onto this, but 16KiB page sizes for arm64 are
> very much about performance ... apple isn't using 16KiB pages on iOS
> to support larger amounts of physical memory :-) )

I thought Arm supported 64k TLB entries back in arm6 in 1991? Not using them
before arm inc. decided that was the way they'd get more physical address bits
seems a bit of a coincidence to me. (Why now, what changed?)

*shrug* I'm not a domain expert, but let's just say I'm not convinced. Apple is
a marketing company, and "we decided to do this for reason A, but have come up
with reasons B through F in support of what we'd already decided to do" isn't
new for them. That a feature exists and they're making use of it doesn't mean
that's why it happened. "We meant to do that" is what they'll say regardless.

I don't know if the iOS and desktop kernels are different development teams now.
I didn't hear about Apple playing around with transparent huge pages back in
2009 when Linux was doing https://lwn.net/Articles/359158/ , have no clue what
Apple's M1 TLB looks like, nor at what point spectre/meltdown mitigations
entered their design thinking so leaving the kernel mapped but not readable was
no longer a good idea hence more TLB swapping, nor when the larger physical
address strategy got communicated from whom to whom in that process...

But "our new chip is terrible at 4k access, we needed something to take the
pressure off" is not how they're gonna phrase it. They'll talk about the
advantages of what they already chose to do because they're Apple. (Intel's
Pentium Pro faceplanting on 16 bit code comes to mind. They decided what to
optimize for.)

*shrug* Maybe it's even true. Who knows with Apple?

Rob