[Toybox] [PATCH] Start replacing get_line() with getline().

Rob Landley rob at landley.net
Mon Jul 22 14:46:16 PDT 2019


On 7/22/19 10:17 AM, enh via Toybox wrote:
> thoughts?

Sorry, laptop battery ran out a few days ago (got caught inside by a
thunderstorm and couldn't get back to my charger before the battery drained
enough the Dell bios woke up from suspend and resumed-to-nothing), and
thunderbird doesn't remember  open reply windows the way kmail used to.

> On Fri, Jul 19, 2019 at 9:55 AM enh <enh at google.com> wrote:
>>
>> I started this last night, but thought I'd aim to send multiple small
>> patches rather than work through all the callers and send one big patch.

This has also been on my todo list, it's just big and touches a lot of files
some of which are dirty in my tree, so I hadn't gotten all the way through it
yet. :)

>> I've deliberately chosen the ugly name `allocated_length` because we've
>> had historical bugs where folks think this a line length in the sense of
>> the return value. I do wonder whether we should actually have some kind
>> of getline() wrapper that hides the `char *`/`size_t` pair in lib/,

As long as we're changing things anyway, a wrapper that simplifies the API would
probably make sense? Just always malloc it and return the pointer. Let's see, is:

  char *blah = xgetline(fp);

enough info? For the /n version it is (we can strlen), for the delim version we
want the returned length. and it's a minor optimization to _have_ the length...

  char *blah = xgetline(FILE *fp, unsigned *len);

Where if len is NULL it doesn't try to set it, so the "simple call is
s = xgetline(fp, 0);

And then it can return NUL for EOF. (The downside of this is it wouldn't report
read errors, although you can always zero and check errno. In theory we could
use int *len and return -1 but... why? Returning null for EOF on a read error is
right 95% of the time anyway, and the function itself could error_msg() if we
really wanted it to, but we probably don't?)

On my todo list this is entangled with "make the data member of double_list a
void * instead of a char *" because both design decisions were fallout from
implementing patch.c back in the day. (Although the double_list thing also
involves turning dlist_add_nomalloc into just dlist_add and then making the
current behavior dlist_add_wrap()... possibly struct double_list should _just_
have the prev and next pointers, and then it can be a first member of subclassed
double lists and we can dlist_add(&list, &new->dl);

Ahem. I have a whole lot of polishing work I _could_ do if I wasn't busy with
everything else. :)

>> which makes the function easier to use in most cases but does add the
>> less common gotcha that you wouldn't be able to getline() through
>> multiple files at once (which does happen in at least one toy).
>>
>> But maybe the real fix is to look harder for places where we can just
>> use loopfiles_lines?

Also worth doing, yes. Especially since I want to make the plumbing for
loopfiles_lines() try to read larger blocks and chop them up itself.

>> Speaking of which, should we actually add two more
>> arguments to that? Specifically: switch it to getdelim() rather than
>> getline() behind the scenes, and also add a way to have the trailing
>> '\n' automatically removed, since that seems to be what most callers
>> want?

Yes, the xgetline() above should NUL out the \n before returning the string.

There should be a getdelim version, but NUL-ing out the delimiter still makes
sense there. (And nulling out a NUL is a NOP so shouldn't hurt anything.)
ALTHOUGH, it's possible the delimiter could be a string, allowing us to have
utf8 delimiters? (Which makes NUL delimiters awkward: should it treat a NULL
pointer as meaning delimit with NUL, should it treat an empty string that way,
should the delimiter have a length...?)

  char *s = xgetdelim(FILE *fp, char *delim, int delimlen, int *len);

Eh, it's not _too_ awkward. But is there ever a case where we'd want a delimiter
with a NUL _in_ it? (Probably not?) In which case the above is xgetdelim_utf8()
and then we'd have an:

  char *s = xgetdelim(FILE *fp, char delim, int *len);

Wrapper to call it. And then xgetline() could also be a wrapepr for
xgetdelim_utf8(), and _that_ would be the function doing the fancy "read 64k at
at a time and chop lines out of it to xstrndup(), calling xrealloc() as
necessary when you cross a block boundary or hit a crazy long one". AND we could
put a ceiling on line length if we wanted to, although... do we want to? Under
what circumstances? (Query available memory? What does ulimit -d actually limit
again? The problem is malloc() only allocating _virtual_ memory and then the OOM
killer telling you when you've run out of physical memory, often long after swap
thrashing...)

Ahem. There's a reason this hairball is still on the todo list. :)

>> Anyway, that seemed like enough questions that it was time to send this
>> initial patch out before doing too much more...
>> ---
>>  lib/password.c        |  9 +++++----
>>  toys/other/rev.c      | 26 ++++++++++++--------------
>>  toys/posix/uudecode.c | 21 +++++++++++++--------
>>  3 files changed, 30 insertions(+), 26 deletions(-)

Applied.

Longer-term the input batching is a reasonable performance win, something you
care about in grep. And that rewrites xgetline() and xgetdelim() wrappers around
a new function anyway, and as long as we're doing that we may as well change the
API to a) always return a malloced string, b) optionally return the length of
the string...

Ah, wait. There ARE times we care about the delimiter being there or not: the
last line isn't guaranteed to have it and there are subtle behavior changes in
sed and such (which I implemented!) based on this. (It's related to the
sed-of-a-binary thing, you don't want to add a NUL to the end of the file if
there wasn't one...)

Grrr. <marvin the martian voice>design, design...</marvin the martian voice>

Rob

(This is why I didn't immediately reply to the first email.)



More information about the Toybox mailing list