[Toybox] PM: code style, was: Re: New Subscriber

Rob Landley rob at landley.net
Sun Feb 5 10:07:57 PST 2012


On 02/05/2012 03:58 AM, somebody wrote:
> Hello Rob,
> 
> I read some list postings in the archive.
> I write as PM because I don't want to "scare" one of the authors (Tim Bird).

Trust me, if the lwn.net coverage didn't scare him, I doubt honest code
review would make him blink. :)

I'll cc: this to the list but chop out your email, reply there if you
want to decloak. :)

> He wrote on 2012-01-26 about an id-implementation. Here are my questions:
> 
> - Does your goal "be simple" mean that the code must be very easy
>   readable? id.c IS very easy readable but in case of e.g. flags "u" and
>   "r" set you will make 4 syscalls when only 1 is needed. Shouldn't it
>   be "optimized"?

I tend to go through and do optimization passes on submissions.  I
haven't gotten to id yet.

My "quick glance" when he sent it said "would putting a for(;;) loop
around only two printf calls actually improve matters, or is the
overhead of the loop going to make it larger?  I need to bench it."

What I didn't do in the quick glance was notice that the options are
exclusive.  As usual, the gnu/dammit version has a special error message
for every occasion:

  $ id -ug
  id: cannot print "only" of more than one choice

Or that this id implementation is just a stub that is _not_ compliant
with SUSv4, which says:

> The following formats shall be used...
>
> "uid=%u(%s) gid=%u(%s)\n", <real user ID>, <user-name>,
>    <real group ID>, <group-name>

So yeah, it needs a second pass in a big way.  I should default it to
"n" in the meantime because it's not really there right now...

Instead I've been doing release cleanup.  Specificially regression
testing the existing commands in my aboriginal linux setup, and trying
to figure out when I swap in each toybox command individually the build
works for all of them, but when I swap in all of them uClibc dies with
an internal compiler error and I get segfaults from "sort" and "xargs"
in dmesg.

Plus I'd like to finish the C rewrite of scripts/config2help.py so
python isn't a build prerequisite if "help" is enabled.  And I had to
teach scripts/genconfig.sh to do a compile probe against the containers
infrastructure and disable unshare if the target toolchain headers
haven't got the appropriate #defines...

> - Many coders who extremely dislike overhead (including myself) only use
>   libcall printf() for debugging purposes. It's some kind of
>   "deprecated".

A valid concern, and I did think about this.  But
printf/asprintf/vsprintf/fprintf are pretty much all the same engine,
once it's pulled in for one use the overhead of using it in a second
place is trivial.

Something _like_ it would be unavoidable for perror_exit() and friends
(attempts to move complexity from the library to the callsite are a net
increase in overhead when you scale to many calls).  For example, the
above output sequence for "id" with no arguments would be 9 separate
output statements without printf.  (Or 9 statements assembling a string,
of who knows what length, and then a tenth to write it out.)

Keep in mind that my primary design goal is _simplicity_, then size,
speed, and features. You have to trade these off against each other when
implementing, but simplicity is the one I weight the highest, because in
the long run it's the one that makes the implementation understandable,
and thus maintainable (and security auditable). That means I tend to
approach commands with "what's the simplest way to implement this
feature", "what's the simplest way to speed this up", and so on.

Leveraging code people already wrote makes _toybox_ simple, otherwise
you spend eight lines of code doing what one line of code can do, and
you wind up repeating that all through your program. This code is
required to be there by c99, and it's required to have certain behavior.
 There are _sucky_implementations_ of this code, but there are also
reasonable ones.

The linux kernel contains a reasonable printf() implementation, and I
feel comfortable writing my own if necessary. The uClibc implementation
of printf() is about 10k, based on binary size growth for a static i586
compile between "puts("Hello\n")" and "printf("Hello%d\n",0)".  (Note
that printf() with a just constant string and no arguments gets
downshifted to a puts equivalent in uClibc, identical binary size.)

Yes, I'm balancing several definitions of "small" here.  I want to make
toybox as a whole, statically linked, take up as little disk space as
possible.  And I want to make it take up as little ram as possible at
runtime.

And I want to let you build individual commands and have _those_ be as
small as possible, but in doing so I assume those will be dynamically
linked.  I am not optimizing for the "multiple individual executables,
each statically linked" case because it's _crazy_ and means I should
never leverage shared infrastructure, which is how you make the one big
statically linked executable small.

If your objection is that most people don't know how to _use_ these
tools in a reasonable manner: Yes, I'm aware of the newbie mistake the
sudo guys did recently. (Welcome to C, where we cook with knives and
fire.) I've got linked lists containing pointers from different
allocation contexts.  (The struct arg_list is malloced from the heap,
the values it points to live in environment space.  That's the normal
use case of lib/args.c, because I don't want to copy memory
unnecessarily.  There's also the global toybuf[] and the stack without
even bringing mmap() into it.  When you write in C, you have to know
what you're _doing_.)

Over on xargs I was at first trying to make a version that fit in
toybuf[] and upshifted to a malloc() when it outgrew the 4k static
buffer, but as long as I had to write the malloc() version anyway making
it the _only_ code path simplified things, so that's what I checked in.
 I do a lot of that, weigh multiple approaches and then wind up doing
the _less_ microoptimized version because it's simpler and scales better
and performance is reasonable and yes I'm aware of nommu systems but
this mostly still runs on them, and trying to make a nommu system
reliable in low memory situations isn't worth complicating the common case.

> Libs like libowfat or libdjb will reduce overhead while
>   not making the source hard to read. What's your opinion?

Never heard of either, but in general:

The external dependencies of BusyBox (under my tenure):

  libc

The external dependencies of toybox:

  libc

I'm all for finding a better libc, but I don't even depend on stuff like
zlib or ncurses. (I'm ok shelling out to an external gzip binary and
piping stuff through it; the only sane way to add https support to wget
and such would be via stunnel.)

(Part of my reasoning here is that at the lowest levels of the system
the dependnecies go circular, which makes bootstrapping hard, and you
wind up building stuff twice, which is nuts and one configuration
inevitably gets way less testing.  Avoiding external dependencies in
these packages other than "I need a compiler with libc" means you don't
have to build toybox twice to bootstrap a system.)

Note how glibc had a "stage 1 compiler", but toybox didn't.  (Sheesh,
perl's probably the silliest one: building microperl so perl's build
system can be written _in_perl_.  The C compiler has a good excuse for
that, perl does not.)

> Frank
> 
> id.c excerpts as posted:
> 	/* show effective, unless user specifies real */
> 	uid = geteuid();
> 	gid = getegid();
> 	if (flags & FLAG_r) {
> 		uid = getuid();
> 		gid = getgid();
> 	}
> 	if (flags & FLAG_u) {
> 	    printf("%d\n", uid);
> 		return;
> 	}
> 	if (flags & FLAG_g) {
> 		printf("%d\n", gid);
> 		return;
> 	}
> 	printf("%d %d\n", uid, gid);

if (flags & (FLAG_u|FLAG_g))
  printf("%d\n", (flags & FLAG_u) ? uid | gid);
else printf("%d %d\n", uid, gid);

Except the standard _also_ requires -G and -n.  And a far more thorough
reading.  (Yeah, it's a todo item.)

Rob

 1328465277.0


More information about the Toybox mailing list