[Toybox] copy_file_range and (1<<30)

Rob Landley rob at landley.net
Fri Jan 10 15:09:49 PST 2025


On 1/10/25 15:40, enh wrote:
> On Fri, May 26, 2023 at 10:26 AM Rob Landley <rob at landley.net> wrote:
>>
>> On 5/25/23 19:08, enh via Toybox wrote:
>>> so i finally enabled copy_file_range for the _host_ toybox because someone
>>> pointed out that we were copying 16GiB zip files around in the build, and even though
>>> obviously we should stop doing that, 30s seemed unreasonable, and coreutils cp
>>> "only" took 20s because of copy_file_range.
>>
>> Hardlinking them is not an option? :)
>>
>>> but toybox cp with copy_file_range still takes 25s. why?
>>>
>>>        if (bytes<0 || bytes>(1<<30)) len = (1<<30);
>>>
>>> the checkin comment being:
>>>
>>> Update comments and add "sanity check" from kernel commit f16acc9d9b376.
>>> (The kernel's been doing this since 2019, but older kernels may not, so...)
>>
>> The problem being that _before_ that commit, too big a sendfile didn't work
>> right (returned an error from the kernel?). I suspect my range check was just
>> the largest power of 2 that fit in the constraint...
> 
> is that true? the diff for that commit makes it look like it
> internally silently used `min(MAX_RW_COUNT, len)` which should be fine
> with the usual "subtract what was actually written" logic?
> 
> (libc++ just started to use copy_file_range(), and i asked whether
> they knew about this limit, and then couldn't explain why toybox has a
> special case...)

Let's see... grep for '1<<30' in lib, git annotate the file, search for 
the 1<<30... cut and paste the hash and do a ^1 to peel off that commit 
(sigh, I want a GUI/IDE tool for this where I could just click)... it 
was introduced in toybox commit 9b368059deec which says "Update comments 
and add "sanity check" from kernel commit f16acc9d9b376. (The kernel's 
been doing this since 2019, but older kernels may not, so...)"

The check from that kernel commit was:

+       return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
+                               len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);

And checking out that commit and searching for MAX_RW_COUNT :

include/linux/fs.h:#define MAX_RW_COUNT (INT_MAX & PAGE_MASK)

And of course:

include/asm-generic/page.h:#define PAGE_MASK	(~(PAGE_SIZE-1))

So they're clamping to 2 gigs minus 4k, and yes, 1<<30 (1 gig) is the 
largest power of 2 that fits within that constraint.
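
Spelling that out for the common 4k page case (a quick sanity-check 
program, not anything from the tree):

#include <limits.h>
#include <stdio.h>

int main(void)
{
  long page_mask = ~(4096L-1);        // PAGE_MASK with 4k pages
  long max_rw = INT_MAX & page_mask;  // MAX_RW_COUNT

  // Prints 2147479552 (0x7ffff000), i.e. 2 gigs minus 4k.
  printf("MAX_RW_COUNT = %ld (0x%lx)\n", max_rw, (unsigned long)max_rw);
  // Prints 1: (1<<30) fits, and (1<<31) wouldn't (it's bigger than
  // INT_MAX to start with).
  printf("1<<30 fits: %d\n", (1L<<30) <= max_rw);
  return 0;
}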

Is there a big performance difference between queueing up 1 gigabyte at 
a time and 2 gigabytes at a time? We could be exact; I just didn't want it 
to start going boing again on systems where PAGE_SIZE isn't 4k or some 
such. For a 16 gigabyte file it returns control to our code 16 times (vs 
8 times), which either way you'd THINK would take a grand total of like a 
millisecond even with switching overhead to deal with. A 5-second 
difference seems... odd? (If you've got zero readahead and zero 
writeback caching and you're on rotating media where a slight hitch 
costs a rotational delay, then maybe? Even so, it should just be 16 extra 
rotations vs 8 extra rotations. A dog-slow ancient hard drive was 1200 
rpm, meaning 20 rotations per second, meaning 8 extra was less than half 
a second...)
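
(For anyone following along at home, the loop under discussion is 
roughly this shape; a from-memory sketch of the "subtract what was 
actually written" logic rather than the actual lib/ code, with error 
handling elided, and bytes<0 meaning "copy until EOF":)

#define _GNU_SOURCE
#include <unistd.h>

static long long copy_loop(int in, int out, long long bytes)
{
  long long done = 0;

  while (bytes<0 || done<bytes) {
    long long len = bytes-done;  // subtract what was actually written
    ssize_t now;

    if (bytes<0 || len>(1<<30)) len = 1<<30;
    now = copy_file_range(in, 0, out, 0, len, 0);
    if (now<=0) break;  // 0 = EOF, -1 = error (check errno for real)
    done += now;
  }

  return done;
}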

I mean, IDEALLY the kernel would have a "repeat until spanked" -1 value 
where it just copies until EOF and stops bothering us, but the kernel 
didn't seem to offer that as an option last I checked...

>>> what the kernel _actually_ does though is clamp to MAX_RW_COUNT. which is
>>> actually (INT_MAX & PAGE_MASK). which i'm assuming changes for a non-4KiB page
>>> kernel?

Yeah, that. ^^^
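
And you can check what the clamp works out to on a given box (a sketch; 
sysconf's page size should match the kernel's PAGE_SIZE):

#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
  long page = sysconf(_SC_PAGESIZE);  // 4096 usually, 16384 or 65536 elsewhere
  long max_rw = INT_MAX & ~(page-1);  // the kernel's MAX_RW_COUNT formula

  // 4k pages:  0x7ffff000
  // 16k pages: 0x7fffc000
  // 64k pages: 0x7fff0000
  // (1<<30) fits under all of them, which is why a power of 2 is safe.
  printf("page %ld -> MAX_RW_COUNT 0x%lx\n", page, (unsigned long)max_rw);
  return 0;
}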

The constraint is there because older kernels would error out on too big 
a value, rather than clamping, and I didn't want to query the kernel 
version and have two codepaths.

Rob

