[Toybox] Inconsistent gnu crap.
Rob Landley
rob at landley.net
Mon Apr 13 20:28:35 PDT 2020
On 4/13/20 7:38 PM, enh wrote:
> i was shocked when i looked at this back when i wanted to get all the
> toys onto the same implementation --- testing the various GNU tools
> that support escapes, they all seemed to support slightly different
> subsets, with slightly different interpretations of corner cases
> (including some that preferred to report an error rather than "take a
> side").
Earlier today I implemented unescape2() that advances the pointer by the amount
consumed, and converted echo and find -printf over to the new plumbing.
I went on this tangent because $'blah\n' was right before ${blah/abc/def} in the
if/else staircase and I didn't want to leave myself another todo item, but the
man page for $'' has yet more features, so I reopened this can of worms.
I have _not_ converted sed, paste, patch, or printf yet. (And in the case of
paste I'm not 100% sure I should, but I don't use that command regularly and
need to work out what it's using it for.)
The fiddly part is the new one can return a _wide_ character, which then gets
output as a utf8 sequence, due to the .
But hey:
./echo -e '\ufb3b'
כּ
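For reference, here is a minimal sketch of how a code point ends up as a utf8 byte sequence. It encodes by hand rather than going through wcrtomb() (so it does not depend on the locale), and the name utf8_encode_sketch is hypothetical, not something in toybox:

```c
// Hypothetical helper, not toybox code: store the UTF-8 encoding of code
// point u in out (which needs room for 4 bytes) and return the byte count,
// or 0 if u is above the Unicode range. Surrogates are not rejected here.
int utf8_encode_sketch(unsigned u, char *out)
{
  if (u < 0x80) {
    out[0] = u;
    return 1;
  }
  if (u < 0x800) {
    out[0] = 0xc0 | (u >> 6);
    out[1] = 0x80 | (u & 0x3f);
    return 2;
  }
  if (u < 0x10000) {
    out[0] = 0xe0 | (u >> 12);
    out[1] = 0x80 | ((u >> 6) & 0x3f);
    out[2] = 0x80 | (u & 0x3f);
    return 3;
  }
  if (u > 0x10ffff) return 0;
  out[0] = 0xf0 | (u >> 18);
  out[1] = 0x80 | ((u >> 12) & 0x3f);
  out[2] = 0x80 | ((u >> 6) & 0x3f);
  out[3] = 0x80 | (u & 0x3f);
  return 4;
}
```

With this, 0xfb3b from the example above becomes the three bytes ef ac bb.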
I need to write documentation about these escapes somewhere. Probably in "help
busybox".
> if we were starting again from scratch, i'd definitely favor
> consistency but as it is, i have the same fear as you that there's
> stuff out there relying on all these ugly corner cases.
I added a second argument to unescape2 so it skips the initial 0 for echo but
not for find -printf, because the existing test cases passing is not negotiable.
Also:
$ ./echo -e '\ux \xu \z'
\ux \xu \z
It's reasonably lenient about passing through whatever it didn't understand
unmodified.
That said, if something does break, we need to add a test for it. :)
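A rough sketch of the behavior described so far, assuming a decoder that takes a pointer to the string pointer plus a flag. This is not the actual unescape2(): the name unescape_sketch, the parameter echo_octal, and the exact set of escapes handled are assumptions for illustration, based only on the examples in this message.

```c
#include <string.h>

// Hypothetical helper: value of a hex digit, or -1 if c isn't one.
static int hexval(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical sketch, not toybox's unescape2(): decode one item at *pp,
// advance *pp by the amount consumed, and return the resulting code point.
// echo_octal nonzero: octal needs a leading zero (\0NNN). Zero: plain \NNN.
// Anything not understood consumes only the backslash and returns it, so
// the characters after it pass through unmodified.
int unescape_sketch(const char **pp, int echo_octal)
{
  const char *s = *pp, *from = "abefnrtv\\", *to = "\a\b\033\f\n\r\t\v\\";
  const char *hit;
  int c, len, max, val = 0;

  if (*s != '\\') { // plain character (don't step past the terminator)
    if (*s) (*pp)++;
    return (unsigned char)*s;
  }
  c = s[1];
  if (c && (hit = strchr(from, c))) { // single character escapes
    *pp = s + 2;
    return to[hit - from];
  }
  if (c == 'x' || c == 'u') { // \xHH or \uHHHH, at least one hex digit
    max = (c == 'x') ? 2 : 4;
    for (len = 0; len < max && (c = hexval(s[2 + len])) >= 0; len++)
      val = val * 16 + c;
    if (len) {
      *pp = s + 2 + len;
      return val;
    }
  } else if (c >= '0' && c <= '7' && (!echo_octal || c == '0')) {
    s += 1 + !!echo_octal; // skip the backslash, plus the 0 in echo mode
    for (len = 0; len < 3 && *s >= '0' && *s <= '7'; len++)
      val = val * 8 + *s++ - '0';
    *pp = s;
    return val;
  }
  (*pp)++; // not understood: emit the backslash, leave the rest alone
  return '\\';
}
```

Calling it in a loop over '\ux \xu \z' returns each character unchanged, and '\72' only decodes to ':' when echo_octal is 0.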
> did busybox try to unify the various users of escaping?
When I was there I was doing this sort of cleanup, but I handed over the reins
in 2007 and my last commit to that project was in 2011. So I doubt it, but not
because there was a decision not to. But let's see...
Oh hey, they have a bb_process_escape_sequence which is used by echo.c:
  const char *z = arg;
  c = bb_process_escape_sequence(&z);
  arg = z;
And I'm just going to stop trying to understand what they're doing at that
point. I have no idea why the temporary variable exists, and I'm not asking. (I
checked and arg already _was_ const? Then I closed the file and backed away.)
Their find doesn't support -printf at all, and grep did not find the function in
sed. It is in their awk, ash, printf, and tr. I already listed printf, tr is in
pending, awk hasn't been started yet, and toysh is what sent me down this
rathole. :)
Oh, if you mean the \0## vs \## thing:
$ busybox echo -e '\072'
:
$ busybox echo -e '\72'
:
It does _not_ require the leading zero.
$ echo -e '\72'
\72
But bash does.
$ ./echo -e '\72'
\72
And oddly enough:
$ /bin/echo -e '\72'
:
Currently toybox does because bash did: I copied what bash was doing when
implementing the previous toybox echo plumbing, and I kept it the same while
moving to the new plumbing that is otherwise doing what bash $'' does.
> (the best alternative i could think of was One True unescape that took
> a bunch of flags for all the variants. but even getting a complete
> list of all the variants seemed like enough of a challenge that i just
> moved on to other stuff instead.)
If you could _document_ the variants, that would be really cool. (Knowing is
half the battle. The other half is blue lasers.)
Right now, I just switched over "echo" and "find -printf". They pass their test
suites, and I'd give it a while to see if anybody complains before converting
anything else.
I _do_ note that both users have this sort of wrapper:
echo:
  if (*c == '\\' && c[1] == 'c') return;
  if ((u = unescape2(&c, 1))<128) putchar(u);
  else printf("%.*s", (int)wcrtomb(out, u, 0), out);
find:
  if (fmt[1] == 'c') break;
  if ((u = unescape2(&fmt, 0))<128) putchar(u);
  else printf("%.*s", (int)wcrtomb(buf, u, 0), buf);
which seems like it could be shoved into the function somehow (maybe
unescape2(&c, 2)?) but again: lemme finish toysh and give the new plumbing time
to settle.
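One possible shape for that shared wrapper, as a sketch only. The names emit_one, decode_fn, and plain_decode are hypothetical; decode_fn stands in for a function with the same calling convention as unescape2(), and plain_decode is just a placeholder decoder so the block compiles on its own.

```c
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>

// Hypothetical signature standing in for unescape2(&ptr, flag).
typedef int (*decode_fn)(const char **pp, int flag);

// Placeholder decoder (an assumption, not real escape handling): returns
// the next byte and advances past it.
int plain_decode(const char **pp, int flag)
{
  (void)flag;
  return (unsigned char)*(*pp)++;
}

// Consume one item at *pp. Returns -1 for "\c" (caller should stop output),
// otherwise the number of bytes stored in out, which needs MB_LEN_MAX bytes.
int emit_one(const char **pp, decode_fn decode, int flag, char *out)
{
  size_t len;
  int u;

  if (**pp == '\\' && (*pp)[1] == 'c') return -1;
  u = decode(pp, flag);
  if (u < 128) {
    *out = u;
    return 1;
  }
  // wcrtomb() depends on the current locale, and fails (returning
  // (size_t)-1) when the locale can't represent u, so check for that.
  len = wcrtomb(out, (wchar_t)u, 0);
  return (len == (size_t)-1) ? 0 : (int)len;
}
```

Both callers would then reduce to a loop that writes out the returned bytes and stops on -1, with only the flag differing between echo and find.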
Rob