[Toybox] Embedded NUL bytes in grep/sed, or "strings are hard".
Owen Shepherd
owen.shepherd at e43.eu
Tue Sep 30 12:02:54 PDT 2014
Rob Landley wrote:
> In theory I can implement my own get_line() on top of FILE * using fgetc,
> but this is again looping over single bytes (because with ungetc only one
> pushback is guaranteed). A function call is cheaper than
> a system call, but still not exactly ideal. Unfortunately, I can't ask stdio
> "how many bytes of readahead are in your internal buffer" because it wants to
> hide those details. (Under strace, most actual fgetc() loops I actually
> watched did the darn one syscall per byte thing anyway.)
Is the file/stdin appropriately buffered? (i.e. is your implementation
being conservative and making stdin _IONBF for no good reason?)
More concretely: what libc was this tested with? If uclibc, I'm inclined
to believe uclibc is a pile of crap. If musl, WTF.
glibc gets this right, FWIW:
oshepherd at Shinji:~$ cat testbuf.c
#include <stdio.h>

int main()
{
    int c;
    while((c = fgetc(stdin)) != EOF)
        fputc(c, stdout);
    return 1;
}
oshepherd at Shinji:~$ strace ./testbuf < testbuf.c
execve("./testbuf", ["./testbuf"], [/* 21 vars */]) = 0
/* dynamic linker noise excised */
read(0, "#include <stdio.h>\n\nint main()\n{"..., 4096) = 123
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93b7e9b000
write(1, "#include <stdio.h>\n", 19#include <stdio.h>
) = 19
write(1, "\n", 1
) = 1
write(1, "int main()\n", 11int main()
) = 11
write(1, "{\n", 2{
) = 2
write(1, "    int c;\n", 11    int c;
) = 11
write(1, "    while((c = fgetc(stdin)) != "..., 37    while((c = fgetc(stdin)) != EOF)
) = 37
write(1, "        fputc(c, stdout);\n", 26        fputc(c, stdout);
) = 26
write(1, "    return 1;\n", 14    return 1;
) = 14
write(1, "}\n", 2}
) = 2
read(0, "", 4096) = 0
exit_group(1) = ?
For best performance, make sure that stdin is fully buffered, and then:
1. flockfile(stdin), because POSIX says getc_unlocked is only safe while
   the calling thread owns the stream lock
2. Use getc_unlocked, which may be a macro, and should be the fastest
   way to grab a character
The cost of all those function calls should be much less than the cost
of a system call per line, especially if you give stdio a big buffer to
work with. Whatever you do, give stdio a big buffer.
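
To make the two steps above concrete, here is a rough sketch of a line
reader built that way. It is not toybox code: read_line_unlocked() and
use_big_buffer() are hypothetical names, and the 64 KiB buffer size is an
arbitrary guess rather than a measured optimum. The length is returned
explicitly so that embedded NUL bytes survive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: hand stdio a large, fully buffered backing store.
 * Must be called before any other operation on fp. The static storage
 * means this sketch is only good for ONE stream (e.g. stdin). */
int use_big_buffer(FILE *fp)
{
    static char storage[1 << 16];   /* 64 KiB: an assumed size */

    return setvbuf(fp, storage, _IOFBF, sizeof storage);
}

/* Hypothetical helper: read one line from fp into *buf, growing it with
 * realloc() as needed. Returns the number of bytes stored (including the
 * trailing '\n' if there was one), or -1 at EOF with no data or if
 * allocation fails. No NUL terminator is added; use the returned length. */
ssize_t read_line_unlocked(FILE *fp, char **buf, size_t *cap)
{
    size_t len = 0;
    int c;

    assert(fp && buf && cap);

    flockfile(fp);                  /* own the stream for the whole line */
    while ((c = getc_unlocked(fp)) != EOF) {
        if (len == *cap) {
            size_t ncap = *cap ? *cap * 2 : 128;
            char *nbuf = realloc(*buf, ncap);

            if (!nbuf) {
                funlockfile(fp);
                return -1;
            }
            *buf = nbuf;
            *cap = ncap;
        }
        (*buf)[len++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);

    return len ? (ssize_t)len : -1;
}
```

A caller would do use_big_buffer(stdin) once at startup, then loop on
read_line_unlocked(stdin, &buf, &cap) until it returns -1, and free(buf)
at the end. Taking the lock once per line keeps the helper safe to call
from anywhere; a single-threaded tool could just as well lock once around
the whole loop.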