[Toybox] PM: code style, was: Re: New Subscriber
Rob Landley
rob at landley.net
Sun Feb 5 12:02:59 PST 2012
I suck at this bcc: thing.
On 02/05/2012 12:07 PM, Rob Landley wrote:
> On 02/05/2012 03:58 AM, somebody wrote:
>> Hello Rob,
>>
>> I read some list postings in the archive.
>> I write as PM because I don't want to "scare" one of the authors (Tim Bird).
>
> Trust me, if the lwn.net coverage didn't scare him, I doubt honest code
> review would make him blink. :)
>
> I'll cc: this to the list but chop out your email, reply there if you
> want to decloak. :)
>
>> He wrote on 2012-01-26 about an id-implementation. Here are my questions:
>>
>> - Does your goal "be simple" mean that the code must be very easy
>> readable? id.c IS very easy readable but in case of e.g. flags "u" and
>> "r" set you will make 4 syscalls when only 1 is needed. Shouldn't it
>> be "optimized"?
>
> I tend to go through and do optimization passes on submissions. I
> haven't gotten to id yet.
>
> My "quick glance" when he sent it said "would putting a for(;;) loop
> around only two printf calls actually improve matters, or is the
> overhead of the loop going to make it larger? I need to bench it."
>
> What I didn't do in the quick glance was notice that the options are
> exclusive. As usual, the gnu/dammit version has a special error message
> for every occasion:
>
> $ id -ug
> id: cannot print "only" of more than one choice
>
> Or that this id implementation is just a stub that is _not_ compliant
> with SUSv4, which says:
>
>> The following formats shall be used...
>>
>> "uid=%u(%s) gid=%u(%s)\n", <real user ID>, <user-name>,
>> <real group ID>, <group-name>
>
> So yeah, it needs a second pass in a big way. I should default it to
> "n" in the meantime because it's not really there right now...
>
> Instead I've been doing release cleanup. Specificially regression
> testing the existing commands in my aboriginal linux setup, and trying
> to figure out when I swap in each toybox command individually the build
> works for all of them, but when I swap in all of them uClibc dies with
> an internal compiler error and I get segfaults from "sort" and "xargs"
> in dmesg.
>
> Plus I'd like to finish the C rewrite of scripts/config2help.py so
> python isn't a build prerequisite if "help" is enabled. And I had to
> teach scripts/genconfig.sh to do a compile probe against the containers
> infrastructure and disable unshare if the target toolchain headers
> haven't got the appropriate #defines...
>
>> - Many coders who extremely dislike overhead (including myself) only use
>> libcall printf() for debugging purposes. It's some kind of
>> "deprecated".
>
> A valid concern, and I did think about this. But
> printf/asprintf/vsprintf/fprintf are pretty much all the same engine,
> once it's pulled in for one use the overhead of using it in a second
> place is trivial.
>
> Something _like_ it would be unavoidable for perror_exit() and friends
> (attempts to move complexity from the library to the callsite are a net
> increase in overhead when you scale to many calls). For example, the
> above output sequence for "id" with no arguments would be 9 separate
> output statements without printf. (Or 9 statements assembling a string,
> of who knows what length, and then a tenth to write it out.)
>
> Keep in mind that my primary design goal is _simplicity_, then size,
> speed, and features. You have to trade these off against each other when
> implementing, but simplicity is the one I weight the highest, because in
> the long run it's the one that makes the implementation understandable,
> and thus maintainable (and security auditable). That means I tend to
> approach commands with "what's the simplest way to implement this
> feature", "what's the simplest way to speed this up", and so on.
>
> Leveraging code people already wrote makes _toybox_ simple, otherwise
> you spend eight lines of code doing what one line of code can do, and
> you wind up repeating that all through your program. This code is
> required to be there by c99, and it's required to have certain behavior.
> There are _sucky_implementations_ of this code, but there are also
> reasonable ones.
>
> The linux kernel contains a reasonable printf() implementation, and I
> feel comfortable writing my own if necessary. The uClibc implementation
> of printf() is about 10k, based on binary size growth for a static i586
> compile between "puts("Hello\n")" and "printf("Hello%d\n",0)". (Note
> that printf() with a just constant string and no arguments gets
> downshifted to a puts equivalent in uClibc, identical binary size.)
>
> Yes, I'm balancing several definitions of "small" here. I want to make
> toybox as a whole, statically linked, take up as little disk space as
> possible. And I want to make it take up as little ram as possible at
> runtime.
>
> And I want to let you build individual commands and have _those_ be as
> small as possible, but in doing so I assume those will be dynamically
> linked. I am not optimizing for the "multiple individual executables,
> each statically linked" case because it's _crazy_ and means I should
> never leverage shared infrastructure, which is how you make the one big
> statically linked executable small.
>
> If your objection is that most people don't know how to _use_ these
> tools in a reasonable manner: Yes, I'm aware of the newbie mistake the
> sudo guys did recently. (Welcome to C, where we cook with knives and
> fire.) I've got linked lists containing pointers from different
> allocation contexts. (The struct arg_list is malloced from the heap,
> the values it points to live in environment space. That's the normal
> use case of lib/args.c, because I don't want to copy memory
> unnecessarily. There's also the global toybuf[] and the stack without
> even bringing mmap() into it. When you write in C, you have to know
> what you're _doing_.)
>
> Over on xargs I was at first trying to make a version that fit in
> toybuf[] and upshifted to a malloc() when it outgrew the 4k static
> buffer, but as long as I had to write the malloc() version anyway making
> it the _only_ code path simplified things, so that's what I checked in.
> I do a lot of that, weigh multiple approaches and then wind up doing
> the _less_ microoptimized version because it's simpler and scales better
> and performance is reasonable and yes I'm aware of nommu systems but
> this mostly still runs on them, and trying to make a nommu system
> reliable in low memory situations isn't worth complicating the common case.
>
>> Libs like libowfat or libdjb will reduce overhead while
>> not making the source hard to read. What's your opinion?
>
> Never heard of either, but in general:
>
> The external dependencies of BusyBox (under my tenure):
>
> libc
>
> The external dependencies of toybox:
>
> libc
>
> I'm all for finding a better libc, but I don't even depend on stuff like
> zlib or ncurses. (I'm ok shelling out to an external gzip binary and
> piping stuff through it; the only sane way to add https support to wget
> and such would be via stunnel.)
>
> (Part of my reasoning here is that at the lowest levels of the system
> the dependnecies go circular, which makes bootstrapping hard, and you
> wind up building stuff twice, which is nuts and one configuration
> inevitably gets way less testing. Avoiding external dependencies in
> these packages other than "I need a compiler with libc" means you don't
> have to build toybox twice to bootstrap a system.)
>
> Note how glibc had a "stage 1 compiler", but toybox didn't. (Sheesh,
> perl's probably the silliest one: building microperl so perl's build
> system can be written _in_perl_. The C compiler has a good excuse for
> that, perl does not.)
>
>> Frank
>>
>> id.c excerpts as posted:
>> /* show effective, unless user specifies real */
>> uid = geteuid();
>> gid = getegid();
>> if (flags & FLAG_r) {
>> uid = getuid();
>> gid = getgid();
>> }
>> if (flags & FLAG_u) {
>> printf("%d\n", uid);
>> return;
>> }
>> if (flags & FLAG_g) {
>> printf("%d\n", gid);
>> return;
>> }
>> printf("%d %d\n", uid, gid);
>
> if (flags & (FLAG_u|FLAG_g))
> printf("%d\n", (flags & FLAG_u) ? uid | gid);
> else printf("%d %d\n", uid, gid);
>
> Except the standard _also_ requires -G and -n. And a far more thorough
> reading. (Yeah, it's a todo item.)
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>
More information about the Toybox
mailing list