[Toybox] Inconsistent gnu crap.

Rob Landley rob at landley.net
Mon Apr 13 20:28:35 PDT 2020


On 4/13/20 7:38 PM, enh wrote:
> i was shocked when i looked at this back when i wanted to get all the
> toys onto the same implementation --- testing the various GNU tools
> that support escapes, they all seemed to support slightly different
> subsets, with slightly different interpretations of corner cases
> (including some that preferred to report an error rather than "take a
> side").

Earlier today I implemented unescape2() that advances the pointer by the amount
consumed, and converted echo and find -printf over to the new plumbing.

I went on this tangent because $'blah\n' was right before ${blah/abc/def} in the
if/else staircase and I didn't want to leave myself another todo item, but the
man page for $'' has yet more features, so I reopened this can of worms.

I have _not_ converted sed, paste, patch, or printf yet. (And in the case of
paste I'm not 100% sure I should, but I don't use that command regularly and
need to work out what it's using it for.)

The fiddly part is the new one can return a _wide_ character, which then gets
output as a utf8 sequence, due to the .

But hey:

./echo -e '\ufb3b'
כּ
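
For anyone wondering why one \u escape turns into several bytes of output, here is a minimal sketch of the wide-character-to-utf8 step done by hand. The name sketch_utf8() is made up for illustration and is not toybox code (the snippets quoted later in this message use wcrtomb() for this step). It assumes the caller supplies at least 4 bytes of buffer, and it does not bother rejecting surrogates.

```c
#include <stddef.h>

// Hypothetical helper, not toybox code: encode code point u as UTF-8 into
// out (which must hold at least 4 bytes). Returns the number of bytes
// written, or 0 if u is beyond the Unicode range.
static size_t sketch_utf8(unsigned u, unsigned char *out)
{
  if (u < 0x80) {
    // 7 bits: plain ASCII, one byte
    out[0] = u;
    return 1;
  }
  if (u < 0x800) {
    // 11 bits: 110xxxxx 10xxxxxx
    out[0] = 0xc0 | (u >> 6);
    out[1] = 0x80 | (u & 0x3f);
    return 2;
  }
  if (u < 0x10000) {
    // 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
    out[0] = 0xe0 | (u >> 12);
    out[1] = 0x80 | ((u >> 6) & 0x3f);
    out[2] = 0x80 | (u & 0x3f);
    return 3;
  }
  if (u < 0x110000) {
    // 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out[0] = 0xf0 | (u >> 18);
    out[1] = 0x80 | ((u >> 12) & 0x3f);
    out[2] = 0x80 | ((u >> 6) & 0x3f);
    out[3] = 0x80 | (u & 0x3f);
    return 4;
  }
  return 0;
}
```

Feeding it 0xfb3b lands in the three-byte case and produces the bytes ef ac bb.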

I need to write documentation about these escapes somewhere. Probably in "help
busybox".

> if we were starting again from scratch, i'd definitely favor
> consistency but as it is, i have the same fear as you that there's
> stuff out there relying on all these ugly corner cases.

I added a second argument to unescape2 so it skips the initial 0 for echo but
not for find -printf, because the existing test cases passing is not negotiable.
Also:

$ ./echo -e '\ux \xu \z'
\ux \xu \z

It's reasonably lenient about passing through whatever it didn't understand
unmodified.
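
The behavior described above (advance the pointer by what was consumed, pass unknown escapes through, and treat the leading 0 differently per caller) can be sketched roughly as follows. This is an illustration under assumptions, not the toybox source: sketch_unescape(), digit_value(), and the echo_style parameter are hypothetical names, and the digit limits (2 for \x, 4 for \u, 3 for octal) are choices made for the sketch.

```c
#include <string.h>

// Value of c as a digit in the given base (8 or 16), or -1 if it isn't one.
static int digit_value(int c, int base)
{
  int d;

  if (c >= '0' && c <= '9') d = c - '0';
  else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
  else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
  else return -1;

  return d < base ? d : -1;
}

// Hypothetical stand-in for illustration; this is NOT the toybox source.
// Decodes one character or escape at *pp, advances *pp past what it
// consumed, and returns the code point. With echo_style set, octal escapes
// need the leading 0 (\0NNN); without it they are written \NNN.
static unsigned sketch_unescape(const char **pp, int echo_style)
{
  static const char from[] = "abefnrtv\\", to[] = "\a\b\033\f\n\r\t\v\\";
  const char *s = *pp, *p;
  unsigned u = 0;
  int base = 16, max = 0, digits = 0, skipped = 0, d;

  // Plain byte, or a backslash with nothing after it: pass through as-is.
  if (*s != '\\' || !s[1]) {
    *pp = s + 1;
    return (unsigned char)*s;
  }
  s++;

  // Single-letter escapes such as \n and \t.
  if ((p = strchr(from, *s))) {
    *pp = s + 1;
    return (unsigned char)to[p - from];
  }

  // Work out which numeric escape this is, if any.
  if (*s == 'x') max = 2, s++;
  else if (*s == 'u') max = 4, s++;
  else if (echo_style ? *s == '0' : digit_value(*s, 8) >= 0) {
    base = 8;
    max = 3;
    if (echo_style) s++, skipped = 1;  // skip the mandatory leading 0
  }

  while (digits < max && (d = digit_value(*s, base)) >= 0) {
    u = u * base + d;
    s++;
    digits++;
  }

  // Got a number: consume the whole escape.
  if (digits || skipped) {
    *pp = s;
    return u;
  }

  // Not understood: emit the backslash and let the rest follow as literals.
  *pp = *pp + 1;
  return '\\';
}
```

Calling it in a loop over '\ux \xu \z' returns each backslash and then each following character unchanged, so the string comes out as it went in. With echo_style set, '\72' passes through while '\072' decodes to ':'.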

That said, if something does break, we need to add a test for it. :)

> did busybox try to unify the various users of escaping?

When I was there I was doing this sort of cleanup, but I handed over the reins
in 2007 and my last commit to that project was in 2011. So I doubt it, but not
because there was a decision not to. But let's see...

Oh hey, they have a bb_process_escape_sequence which is used by echo.c:

                                        const char *z = arg;
                                        c = bb_process_escape_sequence(&z);
                                        arg = z;

And I'm just going to stop trying to understand what they're doing at that
point. I have no idea why the temporary variable exists, and I'm not asking. (I
checked and arg already _was_ const? Then I closed the file and backed away.)

Their find doesn't support -printf at all, and grep did not find the function in
sed. It is in their awk, ash, printf, and tr. I already listed printf, tr is in
pending, awk hasn't been started yet, and toysh is what sent me down this
rathole. :)

Oh, if you mean the \0## vs \## thing:

$ busybox echo -e '\072'
:
$ busybox echo -e '\72'
:

It does _not_ require the leading zero.

$ echo -e '\72'
\72

But bash does.

$ ./echo -e '\72'
\72

And oddly enough:

$ /bin/echo -e '\72'
:

Currently toybox requires the leading zero because bash did: I copied what bash
was doing when implementing the previous toybox echo plumbing, and I kept it the
same while moving to the new plumbing that is otherwise doing what bash $'' does.

> (the best alternative i could think of was One True unescape that took
> a bunch of flags for all the variants. but even getting a complete
> list of all the variants seemed like enough of a challenge that i just
> moved on to other stuff instead.)

If you could _document_ the variants, that would be really cool. (Knowing is
half the battle. The other half is blue lasers.)

Right now, I just switched over "echo" and "find -printf". They pass their test
suites, and I'd give it a while to see if anybody complains before converting
anything else.

I _do_ note that both users have this sort of wrapper:

echo:
      if (*c == '\\' && c[1] == 'c') return;
      if ((u = unescape2(&c, 1))<128) putchar(u);
      else printf("%.*s", (int)wcrtomb(out, u, 0), out);

find:
      if (fmt[1] == 'c') break;
      if ((u = unescape2(&fmt, 0))<128) putchar(u);
      else printf("%.*s", (int)wcrtomb(buf, u, 0), buf);

which seems like it could be shoved into the function somehow (maybe
unescape2(&c, 2)?) but again: lemme finish toysh and give the new plumbing time
to settle.
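
One possible shape for that shared wrapper, purely as a sketch: the \c check and the putchar-or-wcrtomb output live in one helper, and the decoder is passed in. The names sketch_emit(), decode_fn, and literal_byte() are hypothetical and none of this is toybox code. It assumes a UTF-8 locale has been selected with setlocale() before any non-ASCII output, and unlike the snippets above it checks the wcrtomb() return value.

```c
#include <limits.h>
#include <stdio.h>
#include <wchar.h>

// Hypothetical decoder signature: consume one character or escape at *pp,
// advance *pp, return the code point. Stands in for unescape2(&c, flag).
typedef unsigned (*decode_fn)(const char **pp);

// Placeholder decoder that understands no escapes at all, so the sketch
// compiles on its own: returns the next byte and advances by one.
static unsigned literal_byte(const char **pp)
{
  return (unsigned char)*(*pp)++;
}

// Hypothetical wrapper, not toybox code. Returns 0 when it sees \c (the
// caller should stop producing output), or 1 after writing one character.
static int sketch_emit(const char **pp, decode_fn decode, FILE *fp)
{
  char buf[MB_LEN_MAX];
  unsigned u;
  size_t len;

  if (**pp == '\\' && (*pp)[1] == 'c') return 0;

  u = decode(pp);
  if (u < 128) putc(u, fp);
  else {
    // wcrtomb() encodes according to the current locale
    len = wcrtomb(buf, (wchar_t)u, 0);
    if (len != (size_t)-1) fwrite(buf, 1, len, fp);
  }

  return 1;
}
```

A caller would then loop with something like while (*c && sketch_emit(&c, decoder, stdout)); and both echo and find could share it, each supplying its own decoder.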

Rob


