[Toybox] toybox - added cmp

Rob Landley rob at landley.net
Tue Feb 14 07:01:34 PST 2012


On 02/14/2012 05:42 AM, Frank Bergmann wrote:

>> >> Ok, suppose I want to extend xargs
...
>> >> (This is why xargs children reading from stdin get undefined results,
> >  The command blah does not
> > get data on stdin (IMHO xargs must close the descriptor).
> > THIS is actually the thing I STILL don't understand.

Please read what I wrote.

I'm explaining _why_ xargs must close the descriptor, due to exactly the
limitation I was explaining.  It would be _useful_ to have -E delimit
input between command line options and child stdin, but it turns out to
be difficult enough to implement that the standard allows but does not
require it.

This is not the only place this comes up; it's just one I hit recently
when I was implementing xargs, so it was fresh in my mind.

>> >> (This is why xargs children reading from stdin get undefined results,
>> >> even with the -E option, because doing this is ridiculous.)
> > 
> > Even with -e you don't get data on stdin of command blah.

And this issue is _why_.

>> >> The ANSI guys added FILE * so they'd have the buffer handling in the
>> >> library itself.  But if you've got existing functions that use a file
>> >> descriptor instead of a FILE * they don't necessarily mix cleanly due to
>> >> this issue, and FILE * results in bigger code than fd.  (For one thing,
> > 
> > *g* Who was the guy shouting at stdio overhead, you or me? ;-)

The contents of FILE * do not cleanly go to child processes.  If the
internal implementation of FILE * reads more data and buffers it, you
wind up having to care anyway.

Relying on FILE * doing this also means you can't mix FILE * and
file descriptor I/O on the same stream in one program, even though file
descriptors are generally simpler, result in smaller code, and are the
way Unix works underneath.  FILE * is a leaky wrapper.

> > Of course you can't mix it up but if you decide to only use FILE* in a
> > specific tool then you got the benefits like buffering for every
> > descriptor you want it to have.

Assuming you never exec() a child.

> > If you have a tool like xargs it benefits from buffered stdin.

Which is a program that exec()s a child, and as a result has to close
stdin instead of being able to delimit it and pass on a portion, which
would often be extremely useful. (My filchmail script needed that, among
other things...)
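
A rough sketch of the "child gets no stdin" side of this, under stated
assumptions: run_sh_without_stdin() is a hypothetical helper, and it
points the child's stdin at /dev/null rather than doing a bare close()
(a closed descriptor 0 would be reused by the child's next open()).
Whether any particular xargs does it exactly this way is not something
this message establishes.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "sh -c cmd" with stdin replaced by /dev/null
 * and return the child's exit status, or -1 on failure. */
int run_sh_without_stdin(const char *cmd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: whatever the parent had buffered or left unread on
         * stdin is out of reach from here on. */
        int fd = open("/dev/null", O_RDONLY);

        if (fd < 0) _exit(127);
        if (fd != 0) {
            if (dup2(fd, 0) < 0) _exit(127);
            close(fd);
        }
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this, a command such as "read x" hits end of file immediately
instead of competing with the parent for input.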

>> >> As I said, libc does implement this for you: badly.
> > 
> > WHICH libc? As setvbuf(3) says:

All of them.  It's inherent in ANSI's design, and the limitations of POSIX.

> >   "The setbuf() and setvbuf() functions conform to C89 and C99."
> > And if you don't want/like/use this then the only way is to implement
> > buffering for yourself.
> > Then we are back again at the starting point when I ask about the sense of
> > buffer_putlong() and more. ;-)

If we're arguing in circles, I suggest we stop.

> (er... at this point I think about some clone options which should make
> it possible to even share FILE* filehandles. ;-) )

You mean the way after a fork _both_ processes have the buffered data
and have to work out which one's going to use it?  Or the way you have
to use _exit() instead of exit() after a fork (at least when you don't
exec()) because flushing the stdout buffer will otherwise happen twice
and give you stuttering data?

The ANSI FILE * stuff is useful, but also has a bunch of flaws.

Rob
