[Toybox] [CLEANUP] the rest of uuencode.c

Rob Landley rob at landley.net
Tue Apr 16 01:39:11 PDT 2013


The rest of the uuencode cleanup was a big opportunity to inline stuff,  
put common code next to each other, and then eliminate duplication.  
Alas, if you edit a file and move it in the same commit, mercurial  
loses track of it. We can beat the patch out of it via:

diff -u <(hg cat -r 830 toys/pending/uuencode.c) <(hg cat -r 831  
toys/posix/uuencode.c)

Unfortunately that's not a huge help because I did too much in one  
commit. But let's go through the theory. We had these six functions:

-static void uuencode_b64_3bytes(const char *in, int bytes)
-static void uuencode_b64_line(const char *in, int len)
-static void uuencode_b64(int fd, const char *name)
-static void uuencode_uu_3bytes(const char *in)
-static void uuencode_uu_line(char *in, int len)
-static void uuencode_uu(int fd, const char *name)

This is two parallel callchains of three functions each. Each function  
is only called from one place: uuencode_X calls uuencode_X_line calls  
uuencode_x_3bytes. So I did two things in parallel:

1) inline the body of each function at the place it's called from.

2) Merge together the two parallel call chains into one set of  
functions.

To start with, uuencode_uu() and uuencode_b64() are basically the same  
function:

-static void uuencode_b64(int fd, const char *name)
-{
-  int len;
-  char buf[(76/4)*3];
-
-  xprintf("begin-base64 744 %s\n",name);
-  do {
-    len = xread(fd, buf, sizeof(buf));
-
-    uuencode_b64_line(buf, len);
-  } while (len > 0);
-  xprintf("====\n");
-}

-static void uuencode_uu(int fd, const char *name)
-{
-  int len;
-  char buf[45];
-
-  xprintf("begin 744 %s\n",name);
-  do {
-    len = xread(fd, buf, 45);
-    uuencode_uu_line(buf, len);
-  } while (len > 0);
-  xprintf("end\n");
-}

What you xprintf() before and after varies, the length of buf varies,  
and the function you call varies. But the rest is identical.

The distinguishing factor between the two code paths is the -m flag, so  
I made a local variable

int m = toys.optflags & FLAG_m;

And then did things like:

   xprintf("begin%s 744 %s\n", m ? "-base64" : "", name);
    if (!(i = xread(fd, buf, m ? sizeof(buf) : 45))) break;
    xputs(m ? "====" : "end");

Note that I declared buf[] as the larger of the two sizes and then did  
a ? : operator in the actual read to set the length. As long as the  
buffer's big enough, we're good.

Similarly these two functions:

-static void uuencode_b64_line(const char *in, int len)
-{
-  while (len > 0) {
-    uuencode_b64_3bytes(in, len < 3 ? len : 3);
-    in += 3;
-    len -= 3;
-  };
-  xprintf("\n");
-}

-static void uuencode_uu_line(char *in, int len)
-{
-  if (len > 0) {
-    int i;
-
-    xputc(len+32);
-    for (i = 0; i < len; i += 3) uuencode_uu_3bytes(in+i);
-  }
-  xprintf("\n");
-}

Are basically the same thing. The b64 variant gets two parameters  
because it has special "short write" behavior that traditional uuencode  
doesn't (which uses the length at the start of the line for that and  
then always prints out a full block size). But feeding in a length that  
doesn't get used isn't a big deal. So I inlined _that_ at the

Finally, the two encoding functions:

-static void uuencode_b64_3bytes(const char *in, int bytes)
-{
-  unsigned int i, x;
-
-  for (i = x = 0; i<4; i++) {
-    if (i < bytes) x |= (in[i] & 0x0ff) << (8*(2-i));
-    xputc(i > bytes ? '=' : toybuf[(x>>((3-i)*6)) & 0x3f]);
-  }
-}

-static void uuencode_uu_3bytes(const char *in)
-{
-  unsigned int i, x = 0;
-
-  for (i = 0; i <= 2; i++) x |= (in[i] & 0x0ff) << (8*(2-i));
-  for (i = 0; i <= 3; i++) xputc(32 + ((x >> (6*(3-i))) & 0x3f));
-}

The "x |= (in[i] & 0xff << (8*2-i))" code to load the next byte into  
the integer is identical, as is the "(x>>(6*(3-i))) & 0x3f)" to get the  
next 6 bits out. (At least mathematically speaking it is.) The  
difference is that one adds 32 to the output and prints it  
unconditionally, the other transforms it via the toybuf[] lookup table  
and then substitutes a '=' if we're over the output length.

Collectively, with a lot of banging on it with a hammer, this all  
collapses down to:

+  xprintf("begin%s 744 %s\n", m ? "-base64" : "", name);
+  for (;;) {
+    char *in;
+
+    if (!(i = xread(fd, buf, m ? sizeof(buf) : 45))) break;
+
+    if (!m) xputc(i+32);
+    in = buf;
+
+    for (in = buf; in-buf < i; ) {
+      int j, x, bytes = i - (in-buf);
+
+      if (bytes > 3) bytes = 3;
+
+      for (j = x = 0; j<4; j++) {
+        int out;
+
+        if (j < bytes) x |= (*(in++) & 0x0ff) << (8*(2-j));
+        out = (x>>((3-j)*6)) & 0x3f;
+        xputc(m ? j > bytes ? '=' : toybuf[out] : out + 0x20);
+      }
+    }
+    xputc('\n');
+  }
+  xputs(m ? "====" : "end");

Yes, that's three nested for loops (corresponding to the three  
functions we inlined). The steps are:

0) Write the appropriate start of file string for the mode we're in.

1) loop until xread() returns 0, each time reading in the next line's  
worth of data.

2) For non-m mode, output the length character for the new line of  
output.

3) Loop through all the data in this line, grabbing the next 3 byte  
chunk of input (or however much we've got left).

4) Loop through the 4 bytes of output this chunk produces:

4A) if there's input left, load all 8 bits of it into the register.
4B) grab the next 6 bits of output.
4C) write the output byte

5) newline at the end of the line

6) Appropriate end of file string

yes, 4C is totally cheating with two ? : operators in the same line. I  
parenthesized them in a later commit, but it's just doing this:

   if (m) {
     if (EOF) xputc('=');
     else xputc(toybuf[out]);
   } else xputc(out+0x20);

except the ? : allows me to move the xputc() out around the whole thing  
and just return the character to write.

Later on (commit 853), I added a _third_ ? : to that line, conceptually  
in that third xputc:

   (out ? out + 0x20 : 0x60)

The problem is 0+0x20 is ascii 32, I.E. the space character. So  
uuencode according to the standard is full of significant whitespace,  
which mail clients eat for lunch. Since the decoders are all taking the  
bottom 6 bits (&0x3f) I added 0x40 to make it a backquote so it's not  
whitespace. Even though this isn't what posix says, the other encoders  
all seem to do that too. (I.E. when posix is wrong, everybody ignores  
it.)

Rob


More information about the Toybox mailing list