[Toybox] PM: code style, was: Re: New Subscriber

Frank Bergmann toybox at tuxad.com
Wed Feb 8 00:00:10 PST 2012


Hi.

On Tue, Feb 07, 2012 at 07:26:22PM -0600, Rob Landley wrote:
> But in this case, I plan to continue using the printf() family because
> doing without it wouldn't actually simplify the code, I'd just wind up
> writing something just as bad.

In this case you could take more control over how printf behaves. I made
a clone of toybox and busybox and ran one command on each to compare:

$ grep -r setvbuf busybox|wc -l
9
$ grep -r setvbuf toybox-cloned-1328632507|wc -l
0
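
For readers who don't know it: setvbuf() selects the buffering mode of a
stdio stream, so printf() output is collected and handed to write() in
larger chunks. A minimal sketch only; the helper name and the 64 KiB size
are made up for illustration and are not taken from busybox or toybox:

```c
#include <stdio.h>

/* Buffer handed to stdio; it must stay valid as long as stdout is open. */
static char outbuf[1 << 16];

/* Hypothetical helper: switch stdout to full buffering with outbuf.
 * Should be called before the first output on the stream.
 * Returns 0 on success, nonzero on failure (same as setvbuf). */
int use_big_stdout_buffer(void)
{
    return setvbuf(stdout, outbuf, _IOFBF, sizeof(outbuf));
}
```

Whether this helps at all depends on the libc defaults and on where
stdout points, so it has to be checked with strace.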

> C and string handling are not a good mix.  String handling is the thing
> C is weakest at, because string handling actually turns out to be a hard
> problem to get right at the hardware level.

We all know that C actually doesn't know anything about "strings". ;-)
When writing "bigger" software it *may* be worth implementing strings as
a class in C (er... struct) like Wietse Venema does. But this means
rewriting all the code.
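
A rough sketch of the idea of a string "class" in C: a struct that
carries length and capacity next to the data. The names (struct str,
str_append, str_free) are hypothetical and this is not Wietse Venema's
actual API, only an illustration of the approach:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string; start with an all-zero struct. */
struct str {
    char *data;   /* NUL-terminated once something was appended */
    size_t len;   /* bytes used, not counting the NUL */
    size_t cap;   /* bytes allocated */
};

/* Append n bytes from src. Returns 0 on success, -1 if out of memory. */
int str_append(struct str *s, const char *src, size_t n)
{
    if (s->len + n + 1 > s->cap) {
        size_t cap = s->cap ? s->cap : 16;
        char *p;

        while (cap < s->len + n + 1)
            cap *= 2;
        p = realloc(s->data, cap);
        if (!p)
            return -1;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Release the memory and reset the struct to the empty state. */
void str_free(struct str *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Every caller then has to go through such functions instead of touching
char pointers directly, which is why it means a rewrite.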

> Yeah, but after 40 years of being grandfathered in, it's still useful
> enough to stick around.

Yeah, but 99% of the software out in the wild breaks the basic rules KISS
and YAGNI and introduces (sometimes many) bugs and holes by doing so.

> I run things under strace rather a lot, but even if it boiled down to
> doing a for() loop around write(1, &char, 1) I'd tell you to fix your
> libc rather than  change what I was doing.

OK, this self-written printf-implementation you are stracing is not very
well optimized. ;-)

> I'm not that interested in micromanaging something that most likely
> stays in L1 cache either way.

Hmmm... if your code fits in the L1 cache right before doing a sysenter,
then the cache will get dirty when the sysenter call is made, won't it?
I never measured applications touching this "problem", mostly for these
reasons:
- it is hard to measure the L1 cache on an OS running many tasks on few
  cores
- loops already fit in the L1 cache and did not call code "outside", so
  they ran too fast to measure
- reducing the number of syscalls brought the most speedup; the effect of
  other changes could not be measured unambiguously
- too many syscalls which can't be reduced make the advantage of the cache
  vanish (mostly showing up as big i/o waits in top on the core the
  application runs on)
- in some cases further optimization didn't make any sense (due to
  e.g. network latencies)
As you wrote: You'll have to measure it (all). Until then you must keep
caches in mind.

> That's the worst you've seen?

Yes, I didn't expect it from rrdtool. But after stracing it I understand
why it actually runs slowly even on big hosts with many rrd-updates, even
though fadvise/madvise should catch these cases.

> Never run strace on gcc.  Certainly not

gcc is one of the tools I never *wanted* to strace because I already
expected a nightmare (other tools are e.g. php). ;-)

> See lib/lib.h

Clone done. Should patches be submitted to the list?
My first make threw the nasty "dereferencing type-punned pointer will
break strict-aliasing rules" warning. In sort.c you use TT.lines as char*
and not char**.
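
For illustration, a generic sketch of how this warning usually comes
about and how it goes away. The names (struct demo, first_line) are made
up and this is not the code from sort.c:

```c
#include <stddef.h>

/* Hypothetical data, only for illustration. */
struct demo {
    char **lines;   /* array of line pointers, declared as what it is */
    long count;     /* number of entries in lines */
};

/* If the field were declared as char *lines instead, code like
 *     char **l = *(char ***)&d->lines;
 * would read a char * object through a char ** lvalue, which is what
 * gcc complains about. With the field declared as char **, no cast is
 * needed at all. */
char *first_line(const struct demo *d)
{
    return d->count > 0 ? d->lines[0] : NULL;
}
```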

> command line stuff.  It's only a win if you never have to do it more
> than once.

That's why I often used small internal output buffers and the nasty
stpcpy.
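
A small sketch of that pattern: stpcpy() returns a pointer to the
terminating NUL it just wrote, so pieces can be chained into a small
buffer without calling strlen() again, and the result can go out with a
single write(). The helper name join_pair is hypothetical:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* stpcpy() is in POSIX 2008 */
#endif
#include <string.h>

/* Build "key=value\n" in buf. Returns the length without the NUL,
 * or 0 if buf (of the given size) is too small. */
size_t join_pair(char *buf, size_t size, const char *key, const char *val)
{
    char *p = buf;

    /* room for '=', '\n' and the terminating NUL */
    if (strlen(key) + strlen(val) + 3 > size)
        return 0;
    p = stpcpy(p, key);
    p = stpcpy(p, "=");
    p = stpcpy(p, val);
    p = stpcpy(p, "\n");
    return (size_t)(p - buf);
}
```

The caller would then do one write(1, buf, len) instead of several small
printf() calls.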

> Minimal system bootstrapping is theoretically four things:

BTW - I've read that pivot_root doesn't have a high priority on your
TODO list. It's very easy to implement because Linux offers a syscall.
Older glibc versions didn't offer a wrapper, but this is also easy to
check (and to implement if necessary). I'd like to write it as my first
toy code if no one else is working on it.
The next thing could be mount, even though it's not that easy.
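
A minimal sketch of calling the syscall directly, for a libc without a
wrapper. The function name my_pivot_root is hypothetical; a real toy
would still need argument parsing and error messages:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for the syscall() declaration */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper around the Linux pivot_root syscall.
 * Returns 0 on success, -1 with errno set on failure. */
int my_pivot_root(const char *new_root, const char *put_old)
{
    return (int)syscall(SYS_pivot_root, new_root, put_old);
}
```

Without the needed privileges, or with paths that don't exist, this
just fails with -1.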

> Or you could do toys/cat.c using the global toybuf[4096] which is part
> of the bss and only gets its page faulted in if it's actually dirtied.
> (Modulo alignment considerations I haven't bothered about.)

I've read your docs, but now I have also done the clone and read some
code. :-)

> As I think I said in design.html, I'm replying on c99, posix-2008, and
> LP64.  (If I wasn't clear enough there tell me and I'll go fix it.)

No, it is clear. It was just a big bunch of docs. I don't know POSIX 2008
yet, only 2001. I think a lot more is "deprecated" there.

> the first commands I wrote for toybox.  (Actually I started it for
> busybox but left that project before it was finished, so never submitted
> it.)

Maybe they will backport some day. ;-)

> Never heard of it.  I've got a strlcpy() but everybody does since
> strncpy() isn't guaranteed to null terminate the output.)

Better forget this nasty thing. The man page says that it is a
GNU extension and that it may go back to the old MS-DOS times...

> Huh, apparently it's not an extension, it's in SUSv4:
>   http://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html

er... maybe this is an April Fools' Day joke

> That's really cool.  Thanks.  I wonder where that's needed in lib/* or

Don't forget the name "msdos" in its history. ;-)

> Did you read http://landley.net/toybox/design.html yet?  Do I need to
> fluff that out a bit?

Sorry, it was a bunch of docs. First I read the stuff easily linked and
then the urls you posted.

> all the same darn thing just like ANSI/ISO C is the same standard
> approved by two standards bodies...)

Yes, we should be glad there is not a DIN standard yet. ;-)

> Trust me: I know how to profile stuff, and how to understand the

I do. Before I read your opinion about dumb programmers I had already
tried to think that way.

My experience comes mainly from writing some monitoring tools which can
sometimes cause i/o wait, and from writing a fast fgrep where I measured
that the size of the buffer is a great killer if you want to speed it
up. :-)

> Which is why they changed it so gettimeofday() can just read an atomic
> variable out of the vsyscall page:

Yes, I know this. But even though gettimeofday is not a "young" call,
there are many, many projects which didn't notice.
I still use it in some of my tools, but only once, not many times, and
certainly not many times a second.

> I'm totally aware that most existing userspace software is crap:
> 
>   http://lwn.net/Articles/192214/

Bookmarked. And - yes! - stat calls are the next "evil" calls which are
made way too often. Like times, it is also a problem in the tool mentioned
above (not gcc but the other ;-) ).

Frank

-- 
EDV Frank Bergmann                           Tel.     05221-9249753
LPIC-3 Linux Professional                    Fax      05221-9249754
Pödinghauser Str. 5                          email    iservice at tuxad.com
32051 Herford                                USt-IdNr DE237314606


