[Toybox] Embedded NUL bytes in grep/sed, or "strings are hard".
Rob Landley
rob at landley.net
Sun Oct 5 10:46:35 PDT 2014
On 09/30/14 14:02, Owen Shepherd wrote:
> Rob Landley wrote:
>> In theory I can implement my own get_line() on top of FILE * using fgetc,
>> but this is again looping over single bytes (because with ungetc only one
>> pushback is guaranteed). A function call is cheaper than
>> a system call, but still not exactly ideal. Unfortunately, I can't ask stdio
>> "how many bytes of readahead are in your internal buffer" because it wants to
>> hide those details. (Under strace, most actual fgetc() loops I actually
>> watched did the darn one syscall per byte thing anyway.)
>
> Is the file/stdin appropriately buffered? (i.e. is your implementation
> being conservative and making stdin _IONBF for no good reason?)

I very much want this to be libc's problem, not mine. That's the main
reason to use FILE *.

> More concretely: what libc was this tested with? If uclibc, I'm inclined
> to believe uclibc is a pile of crap. If musl, WTF.

I believe I looked at uClibc and glibc both, but it was a while ago. (As
in several years.)

> glibc gets this right, FWIW:
> oshepherd@Shinji:~$ cat testbuf.c
> #include <stdio.h>
>
> int main()
> {
>     int c;
>     while((c = fgetc(stdin)) != EOF)
>         fputc(c, stdout);
>     return 1;
> }
> oshepherd@Shinji:~$ strace ./testbuf < testbuf.c
> execve("./testbuf", ["./testbuf"], [/* 21 vars */]) = 0
> /* dynamic linker noise excised */
> read(0, "#include <stdio.h>\n\nint main()\n{"..., 4096) = 123

For output it was using newlines to flush the buffer. For input it was
doing single bytes. Good to see that's changed, I guess...

> For best performance, make sure that stdin is fully buffered and then
>
> 1. flockfile(stdin), because POSIX says to do so
> 2. Use getc_unlocked, which may be a macro, and should be the fastest
> way to grab a character

See "want this to be libc's problem".
Getting the block size right is 99% of optimizing this sort of thing.
The rest is details. (Maddog had a marvelous talk about this at
LinuxWorld Expo in 2001. I have a tape of it somewhere, but alas it's on
cassette and I no longer have a player for that.)
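
For reference, the two quoted steps (flockfile, then getc_unlocked) can be
sketched roughly as below. This is a minimal sketch and not toybox code:
read_line() is a hypothetical name, and the 128-byte starting size and
doubling growth policy are assumptions. It returns a byte count rather than
relying on a terminating NUL, so embedded NUL bytes in a line survive.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (an illustration, not toybox's implementation).
 * Reads one line from fp, keeping the trailing newline if there is one.
 * Returns the number of bytes stored, or -1 at EOF/error with nothing read.
 * *buf is grown with realloc() as needed; the caller free()s it. */
ssize_t read_line(FILE *fp, char **buf, size_t *cap)
{
    size_t len = 0;
    int c;

    flockfile(fp);                        /* lock once per line, not per byte */
    while ((c = getc_unlocked(fp)) != EOF) {
        if (len + 1 >= *cap) {            /* keep room for a terminating NUL */
            size_t ncap = *cap ? *cap * 2 : 128;   /* assumed growth policy */
            char *nbuf = realloc(*buf, ncap);

            if (!nbuf) {
                funlockfile(fp);
                return -1;
            }
            *buf = nbuf;
            *cap = ncap;
        }
        (*buf)[len++] = (char)c;
        if (c == '\n') break;
    }
    funlockfile(fp);

    if (!len) return -1;
    (*buf)[len] = 0;                      /* convenience only; use the length */
    return (ssize_t)len;
}
```

Callers would compare lines with memcmp() and the returned length instead of
strcmp(), since the data may contain NUL bytes. Whether each getc_unlocked()
turns into a system call still depends on how libc buffers the stream.
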
> The cost of all those function calls should be much less than the cost
> of a system call per line, especially if you give stdio a big buffer to
> work with. Whatever you do, give stdio a big buffer.

I don't want to micromanage stdio's buffer size. That's libc's job. Glad
to hear it's doing a better job of it than it was circa 2008.
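
For illustration only, "give stdio a big buffer" amounts to one setvbuf()
call made before any other I/O on the stream. In the sketch below,
buffer_stdin() is a hypothetical helper and the 64 KiB size is an arbitrary
assumption, not a measured optimum.

```c
#include <stdio.h>

/* Assumed size, chosen only for illustration. */
#define STDIN_BUF_SIZE (64 * 1024)

/* The buffer must stay valid for as long as the stream uses it, so it is
 * given static storage here instead of living on a stack frame. */
static char stdin_buf[STDIN_BUF_SIZE];

/* Hypothetical helper: request full buffering on stdin with the buffer
 * above. Call it before the first read from stdin. Returns 0 on success,
 * nonzero if the request could not be honored. */
int buffer_stdin(void)
{
    return setvbuf(stdin, stdin_buf, _IOFBF, sizeof(stdin_buf));
}
```

Passing the buffer explicitly matters because, with a null buffer pointer,
the C standard only says the size argument "may" be used, so libc is free
to ignore it.
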
Rob