[Toybox] getline() length

Rob Landley rob at landley.net
Sun Nov 9 13:18:21 PST 2014



On 11/09/14 13:48, Jeroen van Rijn wrote:
> On Sun, Nov 9, 2014 at 8:34 PM, Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:
> 
>     Again, I don't know what the correct response is. Since this is
>     something posix-2008 picked up from the gnu/dammit guys, it is of course
>     badly specified. (The FSF is a political organization with only a
>     vestigial software engineering effort attached, neither programming nor
>     documentation are anywhere near their core competency.) Specifically,
>     the man page's "ERRORS" section doesn't even mention -ENOMEM (let alone
>     -EPIPE or -EIO)... In an out-of-memory situation or with an input error,
>     could it return a _short_ line? Discard partially read input if an error
>     occurred partway into a line? Wazzit do?
> 
>     I _suppose_ on a certain level this is really libc's problem. Or at
>     least as much libc's problem as the user's. But "jurisdictional
>     arbitrage" does not solve problems.
> 
> 
> 
> I suppose Toybox could include in its config an optional safety valve
> that defines a GETLINE_MAX_LENGTH of a specified size, with a reasonable
> default. Upon setting this to -1, it would effectively default to the
> underspecified behaviour we're used to.

s/Toybox/musl/

To toybox, getline() is a black box. It hasn't got a parameter for
maximum line length.

We do have our own get_line() but it's horribly inefficient and I've
been trying to move away from it. (I hit this while looking at that.)
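For reference, a minimal sketch of what the caller sees of POSIX getline() as a black box, assuming a POSIX.1-2008 libc: it grows the buffer itself, there is no length limit to pass in, and it returns -1 both at end of file and on error, so the caller has to ask ferror() which one it was. The name count_lines() is illustrative only, not toybox code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Count the lines in fp using POSIX getline(). Returns the count, or -1
 * if the stream reported an error. getline() returns -1 both at EOF and
 * on failure (errno set, e.g. ENOMEM); POSIX says a failure also sets the
 * stream's error indicator, so ferror() is what tells them apart. */
long count_lines(FILE *fp)
{
  char *line = NULL;  /* getline() allocates and grows this buffer */
  size_t cap = 0;     /* ...and records its current size here */
  long count = 0;
  int failed;

  /* Nothing in this call bounds how large "line" may grow. */
  while (getline(&line, &cap, fp) != -1)
    count++;

  failed = ferror(fp);
  free(line);
  return failed ? -1 : count;
}
```

Whether a libc actually keeps or discards a partially read line when an error hits mid-line is the underspecified part; this sketch just reports failure.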

> What the reasonable default limit should be is a bit of a question, but
> at least this way the packagers for a system could set a limit (or its
> absence) as makes sense for their requirements.

http://en.wikipedia.org/wiki/Belling_the_cat

> If they can guarantee
> grep will never need to work on lines longer than 128kb, they'd specify
> that as its limit. Another user might prefer the unbounded case.
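A rough sketch of what such a limit could look like as a getline()-style function, assuming the "-1 means unbounded" convention from the proposal. getline_max(), its -2 return code, and the choice to leave the rest of an over-long line unread are all hypothetical; neither musl nor toybox provides this.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* HYPOTHETICAL bounded getline(): same buffer contract as getline()
 * (*lineptr is malloc()ed/realloc()ed, *n holds its size), plus max,
 * the longest line accepted in bytes, counting the trailing '\n'.
 * max < 0 means "no limit" (the proposed -1 setting).
 * Returns bytes read, -1 at EOF or on error (check ferror()/errno),
 * or -2 if the line is longer than max; the unread remainder of that
 * line is left in the stream. */
ssize_t getline_max(char **lineptr, size_t *n, FILE *fp, ssize_t max)
{
  size_t len = 0;
  int c;

  if (!*lineptr) *n = 0;  /* treat a NULL buffer as empty, like getline() */

  while ((c = getc(fp)) != EOF) {
    if (max >= 0 && len == (size_t)max) {
      ungetc(c, fp);            /* give back the byte that broke the limit */
      if (*n > len) (*lineptr)[len] = '\0';
      return -2;
    }
    if (len + 2 > *n) {         /* room for this byte plus the NUL */
      size_t newsize = *n ? *n * 2 : 128;
      char *p = realloc(*lineptr, newsize);
      if (!p) return -1;        /* POSIX realloc() sets errno to ENOMEM */
      *lineptr = p;
      *n = newsize;
    }
    (*lineptr)[len++] = (char)c;
    if (c == '\n') break;
  }

  /* A read error mid-line still returns the partial line here, the same
   * ambiguity getline() has; callers should check ferror(). */
  if (!len) return -1;
  (*lineptr)[len] = '\0';
  return (ssize_t)len;
}
```

Returning a distinct code instead of a truncated line leaves the policy (skip, warn, or exit) to the caller rather than silently handing back a short line.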

Oddly, on a nommu system you get -ENOMEM from the allocation when it runs
out of memory, but on a full-fledged Linux system you get swap thrashing.
(Even without a swap file, the kernel discards mmap()ed pages from
executables and libraries and then faults them right back in again
because you're running code out of them. It can take 5 minutes of
thrashing to reach an ACTUAL deadlock point, so the OOM killer triggers
before that based on heuristics, which by definition can never be
perfect...)

It's one of the big unsolved problems in computing. There was a
marvelous quote a decade and change back about how virtual memory was a
game that couldn't be won but without it there were no stakes to play
for, or something like that...

I note that environment space (environment variables plus command line
arguments) is capped at 128k in Linux and users seldom actually notice.
An optional cap on this stuff as a kconfig option or similar would be
nice. (Not an environment variable: having "be safe" be a button the
end-user has to press each time they use your thing is seldom a good
idea. Sometimes unavoidable, but still...)
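A small sketch for checking those numbers on a given system, assuming a POSIX libc: sysconf(_SC_ARG_MAX) reports the limit on argument plus environment space, and env_bytes() (a hypothetical helper) estimates how much of it the current environment uses. On Linux 2.6.23 and later that total scales with the stack size rlimit rather than being a fixed 128k, while 128 KiB remains the cap on any single argument or environment string, so the reported total may well be larger.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

extern char **environ;  /* the current process environment (POSIX) */

/* Limit on argv + envp space in bytes, or -1 if indeterminate. */
long arg_space_limit(void)
{
  return sysconf(_SC_ARG_MAX);
}

/* HYPOTHETICAL helper: rough cost of passing this environment on to
 * execve(): each string, its NUL terminator, and one pointer slot. */
long env_bytes(void)
{
  long total = 0;

  if (!environ) return 0;  /* e.g. after clearenv() on some libcs */
  for (char **e = environ; *e; e++)
    total += (long)strlen(*e) + 1 + (long)sizeof(char *);
  return total;
}
```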

(The correct behavior for dealing with long lines would probably be
error_exit(). This is a "should never happen, fail noisily, do not
silently corrupt data" sort of thing.)
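One way to get that fail-noisily behavior with a build-time cap is sketched below, assuming fgets() into a fixed buffer sized for the cap plus the newline and NUL. LINE_CAP_BYTES stands in for a hypothetical kconfig option, and die() is a stand-in for toybox's error_exit(); neither is existing toybox code. Because the length comes from strlen(), lines containing NUL bytes are miscounted, which a real implementation would have to handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL build-time cap, standing in for a kconfig option. */
#ifndef LINE_CAP_BYTES
#define LINE_CAP_BYTES 131072
#endif

/* Stand-in for toybox's error_exit(): report and exit nonzero. */
static void die(const char *fmt, ...)
{
  va_list ap;

  va_start(ap, fmt);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
  fputc('\n', stderr);
  exit(1);
}

/* Read one line into buf, which must hold LINE_CAP_BYTES + 2 bytes
 * (the capped text, the '\n' and the NUL). Returns the length including
 * any '\n', or 0 at EOF. A line longer than the cap is fatal instead of
 * being silently split into pieces. */
size_t read_line_capped(char *buf, FILE *fp)
{
  size_t len;

  if (!fgets(buf, LINE_CAP_BYTES + 2, fp)) {
    if (ferror(fp)) die("read error");
    return 0;
  }
  len = strlen(buf);
  /* No '\n' and more than the cap: the line ran past the buffer. */
  if (len && buf[len - 1] != '\n' && len > LINE_CAP_BYTES)
    die("line longer than %d bytes", LINE_CAP_BYTES);
  return len;
}
```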

Anyway, back to banging on sed...

Rob



More information about the Toybox mailing list