[Toybox] [PATCH 1/2] Handle large read and write lengths.

Samanta Navarro ferivoz at riseup.net
Sat Aug 14 05:10:09 PDT 2021


Hi Rob,

I hope that you have recovered from your sickness by now!

On Mon, Aug 09, 2021 at 02:44:45AM -0500, Rob Landley wrote:
> > The functions readall and writeall can return an error value by mistake
> > if more than 2 GB of data are read or written.
> 
> That was intentional. If your file sizes are that big we probably want to mmap()
> stuff.

The functions read and mmap have different use cases. You cannot mmap a
pipe, a socket or any other form of byte stream. It can also be risky to
use mmap when the mapped data is validated before it is actually used:
if the mapping is of a shared file that a malicious actor can modify,
the contents can change between validation and use, which can lead to
security issues as well (setuid programs mmap'ing user data have been
bitten by this before).
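
Just to illustrate the point, here is a small standalone sketch (not
part of the patch) that maps and reads from a pipe. The mmap call is
refused (on Linux typically with ENODEV, the error documented for
files whose filesystem does not support memory mapping), while read
handles the byte stream just fine:

/* Sketch: mmap cannot replace read for byte streams such as pipes. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
  int fds[2];
  char buf[6];

  if (pipe(fds)) return 1;
  write(fds[1], "hello", 5);

  /* Mapping a pipe fd fails. */
  void *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fds[0], 0);
  if (p == MAP_FAILED)
    printf("mmap on pipe failed: %s\n", strerror(errno));

  /* read works on any byte stream. */
  ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
  if (n >= 0) {
    buf[n] = 0;
    printf("read %zd bytes: %s\n", n, buf);
  }
  return 0;
}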

> Actually on 32 bit Linux architectures ssize_t is also long long because "large
> file support" was introduced over 20 years ago:

Did you mean off_t? ssize_t, the signed counterpart of size_t, is
32 bit on 32 bit systems. off_t, on the other hand, depends on whether
large file support is compiled in, so it is sometimes 32 and sometimes
64 bit.
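
A quick way to check, in case it helps: compile something like the
snippet below for a 32 bit target with and without
-D_FILE_OFFSET_BITS=64 and compare the output (this assumes a
glibc-style libc; musl has a 64 bit off_t unconditionally):

/* Throwaway check: large file support widens off_t, not (s)size_t. */
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
  printf("size_t    %zu\n", sizeof(size_t));
  printf("ssize_t   %zu\n", sizeof(ssize_t));
  printf("off_t     %zu\n", sizeof(off_t));
  printf("long long %zu\n", sizeof(long long));
  return 0;
}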

> So if we're changing the type it should change to long long

I disagree here. First off, I would not recommend using "long long"
just because it is the same size most of the time. These data types
exist for a reason, and the most important one, I think, is the
intention they express about their use cases.

Use size_t for memory operations. Use off_t for file operations. Use
long long if your preferred C standard is too old for int64_t or the
API of the library functions in use wants long long.

Since read and write operate on memory, size_t is the best choice, or
ssize_t when the return value also has to signal an error. And this is
exactly what the underlying C library functions do.
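
To make it concrete, this is the shape I have in mind (just a sketch of
the pattern, not the actual toybox readall): the length is a size_t,
the return value an ssize_t, so a successful count above 2 GB is not
mistaken for an error and -1 stays the only error indicator.

/* Sketch of a full-read loop keeping counts in size_t/ssize_t. */
#include <errno.h>
#include <unistd.h>

static ssize_t readall_sketch(int fd, void *buf, size_t len)
{
  size_t done = 0;

  while (done < len) {
    ssize_t n = read(fd, (char *)buf + done, len - done);

    if (n < 0) {
      if (errno == EINTR) continue;  /* retry interrupted reads */
      return -1;                     /* real error */
    }
    if (!n) break;                   /* EOF: return the short count */
    done += n;
  }

  /* Assumes len <= SSIZE_MAX, which holds for any real buffer. */
  return done;
}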

> P.S. One of my most unsolvable todo items is what to do about readline() on
> /dev/zero. If it's looking for /n it's just gonna allocate a bigger and bigger
> buffer until it triggers the OOM killer. If a single line IS a gigabyte long,
> what am I supposed to _do_ about it?

I would say: Do what open does with large files on a system without
large file support: Return an error.
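
Roughly this shape, say (a sketch only, with an arbitrary cap, not a
proposal for the real readline API): grow the buffer while looking for
the newline, but give up with an error once the line exceeds a limit,
so /dev/zero hits the cap instead of the OOM killer.

/* Sketch: a line reader that errors out instead of allocating without
 * bound. The 64 MiB cap is an arbitrary example value. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define LINE_LIMIT (64UL << 20)  /* pick per application */

/* Returns a malloc'd line without the trailing newline, or NULL on
 * EOF, allocation failure, or an overlong line (errno = EFBIG). */
static char *bounded_getline(FILE *fp)
{
  size_t cap = 128, len = 0;
  char *buf = malloc(cap);
  int c;

  if (!buf) return NULL;
  while ((c = getc(fp)) != EOF && c != '\n') {
    if (len + 1 >= cap) {
      if (cap >= LINE_LIMIT) {  /* line too long: error out */
        free(buf);
        errno = EFBIG;
        return NULL;
      }
      cap *= 2;
      char *tmp = realloc(buf, cap);
      if (!tmp) { free(buf); return NULL; }
      buf = tmp;
    }
    buf[len++] = c;
  }
  if (!len && c == EOF) { free(buf); return NULL; }  /* plain EOF */
  buf[len] = 0;
  return buf;
}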

And as has been discussed in enh's thread: it depends on the
application. Does it need random access after parsing? Can it have
random access on the file? Is a streaming approach possible?

Sincerely,
Samanta


