[Toybox] toybox - added cmp

Rob Landley rob at landley.net
Tue Feb 7 05:25:59 PST 2012


On 02/06/2012 04:11 AM, Tim Elliott wrote:
> Rob,
> 
> I implemented cmp. I want to get better at C and the best way to learn
> is to get feedback on my code. I welcome even nitpicky feedback and I
> am eager to learn.

Woot.

I'll add it now to keep your commit metadata, then do a second
review/comment/cleanup pass later (have to go to work soon).

Lemme respond to the rest of your email first:

> I've been following toybox and your blog since it was linked from the
> suckless mailing list in November 2011. Thanks for keeping it
> interesting!
> 
> My references were the SUSv4 spec, the man page, comparing output to
> GNU diffutils, and toybox code (cat.c, cp.c, lib.c).

Sounds good.

> Here are some notes:
> 
>   * I'm assuming you don't care about the localization stuff mentioned
> in the spec. Looks like a headache.

Agreed.  I'm skipping internationalization support in general.  It
belongs at higher levels (either in Java or in X11).

It's important to be 8-bit clean, and I may add some support for UTF-8
at some point.  But not the multiple date formats, comma and period
layouts for numbers, and so on.  (And the insanity of string comparisons
being case insensitive unless you remember to specify the locale C...
That was just crazy.)

>   * Doesn't throw an error if both -s and -l are used. SUSv4 is
> ambiguously ambiguous.

I think it's "last one wins".  In theory, you can say that with the x~y
thing in the option string.  In practice, the option string stuff needs
more work.  (I'll go into that next message, it's long and not really
relevant here.)

>   * Is it okay to exit on error without closing file descriptors and
> freeing allocated memory?

Yes, the operating system will do it for you.

If the program works on an unlimited amount of data (input of arbitrary
size, or persistent daemons) then you don't want to leak memory as you
go along because that limits scalability.  But if you're exiting anyway,
the cleanup is redundant.

That said, I sometimes do if (CONFIG_TOYBOX_FREE) { } blocks that clean
up these things.  This is partly because some people like to run stuff
under tools like valgrind that try to find the persistent-style leaks
above, and get confused by lots of leftover noise at the end.  Also, if
a program can perfectly clean up after itself it might be possible to
run it from the shell without forking, which is a slight performance
boost... but not worth a whole lot of effort.

>   * Is it safe how I use printf with size_t? Will this be okay across
> architectures?

In this case, just typecast it to a long and you'll be fine.  (I think
that's what it already is, but just to be safe...)

FYI I keep a copy of the last freely distributed draft version of the
c99 specification before they went "ok, pay us gazillions of dollars for
a dead tree version of this":

  http://landley.net/c99-draft.html

For some things (like uint_32t) there are special macros that define
strings you can use with printf, but I seldom bother because the LP64
spec covers us on Linux.  (See the design page.)  I basically just use
uint64_t (or is it uint_64t, I can never remember) if I need a known 64
bit value even on a 32 bit platform, and then I have to pull out the
PRIu64 macro to print it.)

(Because long long is possibly 128 bits in some circumstances, I think?
 Not sure.  It's not exactly well-defined, that one.  Even LP64 leaves
the "long long on 64 bit host" spaces in the table blank.  In practice I
think gcc makes long long 64 bit always, but what llvm or pcc do I
couldn't tell you.  Some targets _do_ have 128 bit integer types, I know
that...)

>   * Should I test IO error cases? If so, how?

I've got a bunch of wrapper macros in lib/lib.c.  xprintf(), xputs(),
xflush() and so on.  They exit the program with an error message if the
output device can't accept the data.

If exiting the program with an error isn't the right behavior, you have
to check return values or ferror() on the stream yourself.

>   * SUSv4 didn't say anything about directories as arguments. GNU
> diffutils throws an error when you do. This doesn't.

There's a lot of stuff SUSv4 doesn't mention. :)

That said, read() is returning -1 trying to get data from directories,
and your code is ignoring that error.  (It generally returns 0 at EOF,
-1 indicates something went wrong.)

>   * GNU diffutils output is right-aligned (with -l).

Yeah, I'm doing similar alignment in df to match up with the text
header.  There's support for it in printf:

  printf("% 4d", blah);

If you go %03x it'll lead with zeroes, "% 3x" leads with spaces.  If you
don't want to hardwire an ascii constant into the string, you can do the
* trick:

  printf("% *d", len, blah);

Which is, believe it or not, in both the printf man page and the c99
standard for printf().

> Cheers,
> Tim

Thanks,

Rob

 1328621159.0


More information about the Toybox mailing list