[Toybox] PM: code style, was: Re: New Subscriber

Rob Landley rob at landley.net
Sun Feb 5 12:02:59 PST 2012


I suck at this bcc: thing.

On 02/05/2012 12:07 PM, Rob Landley wrote:
> On 02/05/2012 03:58 AM, somebody wrote:
>> Hello Rob,
>>
>> I read some list postings in the archive.
>> I write as PM because I don't want to "scare" one of the authors (Tim Bird).
> 
> Trust me, if the lwn.net coverage didn't scare him, I doubt honest code
> review would make him blink. :)
> 
> I'll cc: this to the list but chop out your email, reply there if you
> want to decloak. :)
> 
>> He wrote on 2012-01-26 about an id-implementation. Here are my questions:
>>
>> - Does your goal "be simple" mean that the code must be very easy
>>   readable? id.c IS very easy readable but in case of e.g. flags "u" and
>>   "r" set you will make 4 syscalls when only 1 is needed. Shouldn't it
>>   be "optimized"?
> 
> I tend to go through and do optimization passes on submissions.  I
> haven't gotten to id yet.
> 
> My "quick glance" when he sent it said "would putting a for(;;) loop
> around only two printf calls actually improve matters, or is the
> overhead of the loop going to make it larger?  I need to bench it."
> 
> What I didn't do in the quick glance was notice that the options are
> exclusive.  As usual, the gnu/dammit version has a special error message
> for every occasion:
> 
>   $ id -ug
>   id: cannot print "only" of more than one choice
> 
> Or that this id implementation is just a stub that is _not_ compliant
> with SUSv4, which says:
> 
>> The following formats shall be used...
>>
>> "uid=%u(%s) gid=%u(%s)\n", <real user ID>, <user-name>,
>>    <real group ID>, <group-name>
> 
> So yeah, it needs a second pass in a big way.  I should default it to
> "n" in the meantime because it's not really there right now...
> 
> Instead I've been doing release cleanup.  Specificially regression
> testing the existing commands in my aboriginal linux setup, and trying
> to figure out when I swap in each toybox command individually the build
> works for all of them, but when I swap in all of them uClibc dies with
> an internal compiler error and I get segfaults from "sort" and "xargs"
> in dmesg.
> 
> Plus I'd like to finish the C rewrite of scripts/config2help.py so
> python isn't a build prerequisite if "help" is enabled.  And I had to
> teach scripts/genconfig.sh to do a compile probe against the containers
> infrastructure and disable unshare if the target toolchain headers
> haven't got the appropriate #defines...
> 
>> - Many coders who extremely dislike overhead (including myself) only use
>>   libcall printf() for debugging purposes. It's some kind of
>>   "deprecated".
> 
> A valid concern, and I did think about this.  But
> printf/asprintf/vsprintf/fprintf are pretty much all the same engine,
> once it's pulled in for one use the overhead of using it in a second
> place is trivial.
> 
> Something _like_ it would be unavoidable for perror_exit() and friends
> (attempts to move complexity from the library to the callsite are a net
> increase in overhead when you scale to many calls).  For example, the
> above output sequence for "id" with no arguments would be 9 separate
> output statements without printf.  (Or 9 statements assembling a string,
> of who knows what length, and then a tenth to write it out.)
> 
> Keep in mind that my primary design goal is _simplicity_, then size,
> speed, and features. You have to trade these off against each other when
> implementing, but simplicity is the one I weight the highest, because in
> the long run it's the one that makes the implementation understandable,
> and thus maintainable (and security auditable). That means I tend to
> approach commands with "what's the simplest way to implement this
> feature", "what's the simplest way to speed this up", and so on.
> 
> Leveraging code people already wrote makes _toybox_ simple, otherwise
> you spend eight lines of code doing what one line of code can do, and
> you wind up repeating that all through your program. This code is
> required to be there by c99, and it's required to have certain behavior.
>  There are _sucky_implementations_ of this code, but there are also
> reasonable ones.
> 
> The linux kernel contains a reasonable printf() implementation, and I
> feel comfortable writing my own if necessary. The uClibc implementation
> of printf() is about 10k, based on binary size growth for a static i586
> compile between "puts("Hello\n")" and "printf("Hello%d\n",0)".  (Note
> that printf() with a just constant string and no arguments gets
> downshifted to a puts equivalent in uClibc, identical binary size.)
> 
> Yes, I'm balancing several definitions of "small" here.  I want to make
> toybox as a whole, statically linked, take up as little disk space as
> possible.  And I want to make it take up as little ram as possible at
> runtime.
> 
> And I want to let you build individual commands and have _those_ be as
> small as possible, but in doing so I assume those will be dynamically
> linked.  I am not optimizing for the "multiple individual executables,
> each statically linked" case because it's _crazy_ and means I should
> never leverage shared infrastructure, which is how you make the one big
> statically linked executable small.
> 
> If your objection is that most people don't know how to _use_ these
> tools in a reasonable manner: Yes, I'm aware of the newbie mistake the
> sudo guys did recently. (Welcome to C, where we cook with knives and
> fire.) I've got linked lists containing pointers from different
> allocation contexts.  (The struct arg_list is malloced from the heap,
> the values it points to live in environment space.  That's the normal
> use case of lib/args.c, because I don't want to copy memory
> unnecessarily.  There's also the global toybuf[] and the stack without
> even bringing mmap() into it.  When you write in C, you have to know
> what you're _doing_.)
> 
> Over on xargs I was at first trying to make a version that fit in
> toybuf[] and upshifted to a malloc() when it outgrew the 4k static
> buffer, but as long as I had to write the malloc() version anyway making
> it the _only_ code path simplified things, so that's what I checked in.
>  I do a lot of that, weigh multiple approaches and then wind up doing
> the _less_ microoptimized version because it's simpler and scales better
> and performance is reasonable and yes I'm aware of nommu systems but
> this mostly still runs on them, and trying to make a nommu system
> reliable in low memory situations isn't worth complicating the common case.
> 
>> Libs like libowfat or libdjb will reduce overhead while
>>   not making the source hard to read. What's your opinion?
> 
> Never heard of either, but in general:
> 
> The external dependencies of BusyBox (under my tenure):
> 
>   libc
> 
> The external dependencies of toybox:
> 
>   libc
> 
> I'm all for finding a better libc, but I don't even depend on stuff like
> zlib or ncurses. (I'm ok shelling out to an external gzip binary and
> piping stuff through it; the only sane way to add https support to wget
> and such would be via stunnel.)
> 
> (Part of my reasoning here is that at the lowest levels of the system
> the dependnecies go circular, which makes bootstrapping hard, and you
> wind up building stuff twice, which is nuts and one configuration
> inevitably gets way less testing.  Avoiding external dependencies in
> these packages other than "I need a compiler with libc" means you don't
> have to build toybox twice to bootstrap a system.)
> 
> Note how glibc had a "stage 1 compiler", but toybox didn't.  (Sheesh,
> perl's probably the silliest one: building microperl so perl's build
> system can be written _in_perl_.  The C compiler has a good excuse for
> that, perl does not.)
> 
>> Frank
>>
>> id.c excerpts as posted:
>> 	/* show effective, unless user specifies real */
>> 	uid = geteuid();
>> 	gid = getegid();
>> 	if (flags & FLAG_r) {
>> 		uid = getuid();
>> 		gid = getgid();
>> 	}
>> 	if (flags & FLAG_u) {
>> 	    printf("%d\n", uid);
>> 		return;
>> 	}
>> 	if (flags & FLAG_g) {
>> 		printf("%d\n", gid);
>> 		return;
>> 	}
>> 	printf("%d %d\n", uid, gid);
> 
> if (flags & (FLAG_u|FLAG_g))
>   printf("%d\n", (flags & FLAG_u) ? uid | gid);
> else printf("%d %d\n", uid, gid);
> 
> Except the standard _also_ requires -G and -n.  And a far more thorough
> reading.  (Yeah, it's a todo item.)
> 
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
> 




More information about the Toybox mailing list