[Toybox] [CLEANUP] paste

Rob Landley rob at landley.net
Fri Jul 19 00:15:53 PDT 2013


Description of the paste cleanup that accidentally got checked in  
during commit 944 because hg import -f has side effects.

   http://landley.net/hg/toybox/rev/944

(Yes, that commit description is actively ironic. Scroll down to  
patch.c.)

Minor tweaks to the first loop: if -d isn't set then TT.delim will be  
NULL, and as long as that's the value we're playing with anyway the  
optimizer tends to do better if it can load that into a register and  
perform several operations on it, rather than load/mask/test an  
otherwise unrelated integer. So yank the test and instead start the  
loop with p = TT.delim ? TT.delim : "\t"; (In theory gcc offers an "x ?  
: y;" syntax where the blank between ? : substitutes in the result of  
the test again, so I _could_ say p = TT.delim ? : "\t";. In practice,  
c99 never picked that one up so I feel uncomfortable using it.)

I was on the fence about moving the assignment from the else case into  
the if, but did it anyway, so:

-    if (*p == '\\') {
-    } else *buf = *p;

becomes:

+  if ((*buf = *p) == '\\') {
+  }

This means we can load *p into a register once and then perform both  
the assignment and the test from that register. The initial write only  
goes into L1 cache and if it gets overwritten a couple lines later we  
only have one write to main memory. In general modern processors have a  
better time eliding writes than predicting branches. A lot of stuff  
like:

   if (x) y = 1;
   else y = 2;

Is going to get rewritten by the optimizer into:

   y = 2;
   if (x) y = 1;

So the "if (x)" can become a conditional assignment instruction instead  
of a branch instruction. (This avoids bubbles in the instruction  
pipeline due to branch mispredicts, and/or wasted work from speculative  
execution on mispredicted branches eating your battery power and then  
being discarded.)

I was on the fence because this is phrasing easier for the computer to  
do, and less repetitive (only one mention of *p instead of 2), but  
slightly less easy to read.

Turned the p++ on its own line into a ++p in the stridx.

Changed the error message to say "bad -d" instead of "bad delimiter" so  
it's one less word to translate to other languages.

Moved the "// Sequential" comment from on the same line to the line  
before. I know it uses more vertical screen space and I ordinarily try  
to conserve that, but mixing comment and code on the same line is  
something I only really do describing data values. (Describing  
structure members, table entries, or magic constants when a symbol name  
is not conveniently available in a header.) This comment is more "what  
does this block of code do", and that I prefer to put on its own line.

This loop is what loopfiles() is for. That handles the aliasing of "-"  
with stdin and everything.

(No, I'm probably not done cleaning up paste, this was just my initial  
pass that got accidentally checked in.)

Rob
 1374218153.0


More information about the Toybox mailing list