[Toybox] toybox - added cmp

Frank Bergmann toybox at tuxad.com
Tue Feb 14 03:42:17 PST 2012


Hi,

On Mon, Feb 13, 2012 at 10:34:46PM -0600, Rob Landley wrote:
> We can't feed those bytes _back_ into stdin.  They exist in a memory

At least you can feed one byte back to stdin (the FILE *, with ungetc())
but not to STDIN (the descriptor). ;-)

> B) Fork the child, have the child close stdin and dup() the receiving
> end of the pipe to stdin.

Why? With "cat/foo/bar/... something | xargs blah" you still have only
one stdin, and that one is only sufficient for xargs itself. The command
blah does not get data on stdin (IMHO xargs must close the descriptor).
THIS is actually the thing I STILL don't understand.

> (This is why xargs children reading from stdin get undefined results,
> even with the -E option, because doing this is ridiculous.)

Even with -e you don't get data on stdin of command blah.

Maybe we should use some code to explain (I guess C is a language we
understand ;-) ). To explain my (mis-?) understanding:

[fwb at vdr toybox]$ cat test-stdin.c 
#include <unistd.h>
#include <stdio.h>
int main() {
  char buf[2];
  ssize_t c;
  c = read(0, buf, 1);
  printf("%zd bytes read\n", c);
  return 0;
}
[fwb at vdr toybox]$ echo 1 2 3|./test-stdin 
1 bytes read
[fwb at vdr toybox]$ echo 1 2 3|xargs ./test-stdin 
0 bytes read

test-stdin (blah) gets EOF. The descriptors in /proc also show this:

[fwb at vdr toybox]$ ls -l /proc/6759/fd
total 0
lr-x------ 1 fwb fwb 64 Feb 14 11:49 0 -> /dev/null
lrwx------ 1 fwb fwb 64 Feb 14 11:49 1 -> /dev/pts/1
lrwx------ 1 fwb fwb 64 Feb 14 11:49 2 -> /dev/pts/1
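
To make the /proc listing above concrete, here is a minimal sketch of how a
parent could start a child whose descriptor 0 points at /dev/null. The helper
name run_with_null_stdin() is made up for illustration; this is not taken from
any xargs source, it only reproduces the state the listing shows.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with stdin replaced by /dev/null.
 * Returns the child's exit status, or -1 on error. */
int run_with_null_stdin(char *const argv[])
{
  pid_t pid = fork();
  int status;

  if (pid < 0) return -1;
  if (pid == 0) {
    /* Child: point descriptor 0 at /dev/null, then exec the command. */
    int fd = open("/dev/null", O_RDONLY);

    if (fd < 0) _exit(127);
    if (fd != 0) {
      if (dup2(fd, 0) < 0) _exit(127);
      close(fd);
    }
    execvp(argv[0], argv);
    _exit(127);  /* only reached if exec failed */
  }
  /* Parent: its own stdin is untouched and can still be read. */
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A child started this way sees EOF on its first read(0, ...), which matches
the "0 bytes read" output of test-stdin.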

> The ANSI guys added FILE * so they'd have the buffer handling in the
> library itself.  But if you've got existing functions that use a file
> descriptor instead of a FILE * they don't necessarily mix cleanly due to
> this issue, and FILE * results in bigger code than fd.  (For one thing,

*g* Who was the guy shouting at stdio overhead, you or me? ;-)
Of course you can't mix them, but if you decide to use only FILE* in a
specific tool then you get the benefits, like buffering, for every
descriptor you want to have it.
A tool like xargs benefits from buffered stdin.

> As I said, libc does implement this for you: badly.

WHICH libc? As setvbuf(3) says:
  "The setbuf() and setvbuf() functions conform to C89 and C99."
And if you don't want/like/use this then the only way is to implement
buffering yourself.
Then we are back at the starting point, where I asked about the point of
buffer_putlong() and more. ;-)

[...]
> otherwise defaulting to one.  There's no way to say "read until end of
> line" except to give an arbitrarily big length limiter.

Yes, you can only buffer chars from stdin without knowing in advance what
they are. But using this buffer with read(0,buf,anysizebiggerthan1) you can
offer a getc function which won't call read for every single character.
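
A minimal sketch of such a getc on top of read(), to show what I mean. The
names struct rbuf and rbuf_getc() are made up for this example and the 4096
byte buffer size is an arbitrary assumption; EINTR handling is left out.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffered reader; not toybox API. */
struct rbuf {
  int fd;            /* descriptor to read from, e.g. 0 for stdin */
  size_t pos, len;   /* next unread byte, number of valid bytes  */
  char data[4096];   /* buffer size is an arbitrary assumption   */
};

/* Return the next byte (0..255), or -1 on EOF or read error.
 * read() is only called when the buffer has been used up. */
int rbuf_getc(struct rbuf *rb)
{
  if (rb->pos == rb->len) {
    ssize_t n = read(rb->fd, rb->data, sizeof(rb->data));

    if (n <= 0) return -1;
    rb->len = (size_t)n;
    rb->pos = 0;
  }
  return (unsigned char)rb->data[rb->pos++];
}
```

Initialise it with "struct rbuf rb = { .fd = 0 };" and call rbuf_getc(&rb)
in a loop. Whatever is still sitting in rb.data when the program stops
reading is of course gone from the descriptor.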

> Due to the funky way %s handles whitespace I can't get a verbatim line
> out of it, and %c doesn't stop at newlines (it's a block read until it
> fills the buffer).  I _might_ be able to abuse %[blah] to do what I

If your readline uses getc you don't have to use %s or %[^\n] or something
comparable (usable only for the simplest purposes). And with getc you get
buffering, with a small number of read calls.
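
As a sketch of a getc-based line reader (read_line() is a made-up name, and
silently truncating overlong lines is just one possible choice):

```c
#include <stdio.h>

/* Hypothetical helper: read one line from fp into buf, keeping all
 * whitespace verbatim and dropping the trailing newline. At most
 * size-1 characters are stored; the rest of an overlong line is
 * consumed and discarded. Returns the stored length, or -1 if EOF
 * was hit before any character could be read. */
long read_line(FILE *fp, char *buf, size_t size)
{
  size_t n = 0;
  int c, any = 0;

  if (size == 0) return -1;
  while ((c = getc(fp)) != EOF) {
    any = 1;
    if (c == '\n') break;
    if (n + 1 < size) buf[n++] = (char)c;
  }
  buf[n] = 0;
  return any ? (long)n : -1;
}
```

Since getc goes through the stdio buffer, this does not cost one read call
per character.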

> Did the above explain it?

What a pity: No. Maybe someone on the list can point out where we
misunderstand each other.
(I'm sure that it is not my br0ken English. ;-) )

> Read past the end of the buffer, sucking too
> much data out of the OS, and unable to feed the extra data _back_ into
> the OS's filehandle because it's unidirectional?  So when I do the exec
> the child doesn't get all the data unless the parent sets up a pipe to
> manually forward it?

I understand this for the case where a program must share its stdin with
children which must use exactly this stdin, too. But in the case of xargs I
don't understand it. I also don't understand why xargs should not use
FILE* and a getline/readline which uses a buffered getc. Maybe you could
use the whole toybuf for a STDIN buffer set up by setvbuf.
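
Roughly like this, as a sketch only: inbuf is a stand-in for toybuf, its
size of 4096 is an assumption, and use_static_buffer() is a made-up name.
setvbuf has to be called before any other operation on the stream.

```c
#include <stdio.h>

/* Stand-in for toybuf; the size is an assumption for this example. */
static char inbuf[4096];

/* Hypothetical helper: make fp fully buffered using inbuf instead of a
 * buffer allocated by libc. Must be called before the first read or
 * write on fp. Returns 0 on success, nonzero on failure (like setvbuf). */
int use_static_buffer(FILE *fp)
{
  return setvbuf(fp, inbuf, _IOFBF, sizeof(inbuf));
}
```

Call use_static_buffer(stdin) first thing in main; after that getc(stdin)
fills inbuf with one read and hands out the characters from there.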

> >> includes all child programs that want to read from the same filehandle...
> > 
> > IMHO I'm missing the point.
> 
> Yeah, I noticed.

Yeah, me too. :-)

(er... at this point I think about some clone options which should make
it possible to even share FILE* filehandles. ;-) )

> So the parent can create a synthetic stdin filehandle 0 for the child
> which supplies _all_ the data, since the upstream one no longer supplies
> all the data because we took too much out and can't put it back into the
> original filehandle.

This point I already understood and still understand. But I still don't
know how this relates to xargs, which uses stdin only for itself.

Sigh.

Frank

-- 
EDV Frank Bergmann                           Tel.     05221-9249753
LPIC-3 Linux Professional                    Fax      05221-9249754
Pödinghauser Str. 5                          email    iservice at tuxad.com
32051 Herford                                USt-IdNr DE237314606
