[Toybox] [PATCH] tr: fix pathological flushing.

Sat Dec 5 11:32:57 PST 2020

On Sat, Dec 5, 2020 at 3:38 AM Rob Landley <rob at landley.net> wrote:

> On 12/4/20 1:58 PM, enh via Toybox wrote:
> > The AOSP build doesn't use tr (or anything that's still in pending), but
> > the kernel folks have been more aggressive. They found that tr's
> > pathological flushing was adding minutes to their build times.
>
> +  while ((n = read(0, toybuf, sizeof(toybuf)))) {
> +    if (!FLAG(d) && !FLAG(s)) {
> +      for (dst = 0; dst < n; dst++) toybuf[dst] = TT.map[toybuf[dst]];
>
> And when read returns -1 what happens?
>

enh@ realizes he thought he'd changed this to xread and sends a patch,
that's what :-)

(patch sent.)

it's a pity there's no /dev/eio or whatever to test these cases explicitly.

> Sounds like time for me to cleanup and promote this command. (Let's see,
> my todo
> entry for tr is: "TODO: -t (truncate) -a (ascii)" and I do NOT remember
> what -a
> is but I think I was going to make tr support utf8? Which required a
> redesign.
> Possibly it should be -U to support utf8 instead but everything ELSE
> supports
> utf8 by default, because planet. Hmmm...)
>

aye, but i don't think anyone makes you _pay_ for unicode (if you ignore
Plan9 and Inferno, both of which did, but you'd have to go a long way to
find anyone else who's ever used either of those).

but more than that i'm not sure anyone else _has_ done this (if you ignore
Plan9 and Inferno, both of which did, but you'd have to go a long way to
find anyone else who's ever used either of those):

~$ echo '동해 물과 백두산이' | tr '동' '東'
東해 欼과 氱摐산이
~$ echo '동해 물과 백두산이' | busybox tr '동' '東'
東해 欼과 氱摐산이
~$

(you'd expect to see '東해 물과 백두산이' if this actually worked: change from the
hangeul for "east" to the hanja for "east" but leave everything else alone.)

xxd confirms it's just screwing up bytes:

~$ echo '동해 물과 백두산이' |  xxd
00000000: eb8f 99ed 95b4 20eb acbc eab3 bc20 ebb0  ...... ...... ..
00000010: b1eb 9190 ec82 b0ec 9db4 0a              ...........
~$ echo '동해 물과 백두산이' | tr '동' '東' | xxd
00000000: e69d b1ed 95b4 20e6 acbc eab3 bc20 e6b0  ...... ...... ..
00000010: b1e6 9190 ec82 b0ec 9db4 0a              ...........

동 is 0xeb 0x8f 0x99 and the equivalent Chinese character 東 is 0xe6 0x9d
0xb1, and you can see from those hex dumps that what tr did was replace
0xeb with 0xe6, 0x8f with 0x9d, and 0x99 with 0xb1. this did the right
thing by accident for 동 but mangled other characters that contained any of
those bytes.

and although philosophically i'm usually on board with your "all times are
ISO, all text is UTF8", i'm really not sure it makes much sense to even
*try* to support this in tr. why? because i think it opens the i18n/l11n
can of worms again. if you think about non-binary uses of tr, they're often
stuff like "convert to all caps", but are we going to get that right for
Turkish/Azeri dotted/dotless 'i's, Greek final/non-final sigma, etc? are we
going to have tr's behavior then depend on your locale? are we going to
deal with combining characters too, or do i have to specify all the ways
you can write "ö" to get "Freude, schöner Götterfunken" right (because
without a hex dump, neither you nor i know whether those two 'ö's were
encoded the same way)?

amusingly, the Plan9 man page only gave one example of using tr(1), and it
was converting ASCII upper/lower. so i don't think they had any _use_ for
it either, they just wrote everything in terms of runes.

personally i'd s/characters/bytes/ in the docs and call it done. we can
"fix" it if/when anyone has an actual practical need for it.

> > Just removing the fflush() made tr significantly faster for my trivial
> > test, but still slow, with all the time going into stdio.
>
> Single byte writes suck no matter how you slice 'em.
>
> > Rewriting the
> > loop to modify toybuf in place and then do one write per read made most
> > of the difference, but special-casing the "neither -d nor -s" case made
> > a measurable difference too on a Xeon.
>
> Sigh. I have this patch in my tree, which I haven't applied yet because I
> don't
> have the regression test setup to see what it would slow down in the AOSP
> build:
>
> --- a/main.c
> +++ b/main.c
> @@ -103,7 +103,7 @@ void toy_singleinit(struct toy_list *which, char
> *argv[])
>        // that choose non-UTF-8 locales. macOS doesn't support C.UTF-8
> though.
>        if (!setlocale(LC_CTYPE, "C.UTF-8")) setlocale(LC_CTYPE, "");
>      }
> -    setlinebuf(stdout);
> +    setvbuf(stdout, 0, (which->flags & TOYFLAG_LINEBUF) ? _IOLBF :
> _IONBF, 0);
>    }
>  }
>
> --- a/lib/toyflags.h
> +++ b/lib/toyflags.h
> @@ -32,6 +32,9 @@
>  // Suppress default --help processing
>  #define TOYFLAG_NOHELP   (1<<10)
>
> +// Line buffered stdout
> +#define TOYFLAG_LINEBUF  (1<<11)
> +
>  // Error code to return if argument parsing fails (default 1)
>  #define TOYFLAG_ARGFAIL(x) (x<<24)
>
> --- a/toys/posix/grep.c
> +++ b/toys/posix/grep.c
> @@ -10,9 +10,9 @@
>  * echo hello | grep -f </dev/null
>  *
>
> -USE_GREP(NEWTOY(grep,
>
> "(line-buffered)(color):;(exclude-dir)*S(exclude)*M(include)*ZzEFHIab(byte-offset)h(no-filename)ino(only-matching)rRsvwcl(files-with-matches)q(quiet)(silent)e*f*C#B#A#m#x[!wx][!EFw]",
> TOYFLAG_BIN|TOYFLAG_ARGFAIL(2)))
> -USE_EGREP(OLDTOY(egrep, grep, TOYFLAG_BIN|TOYFLAG_ARGFAIL(2)))
> -USE_FGREP(OLDTOY(fgrep, grep, TOYFLAG_BIN|TOYFLAG_ARGFAIL(2)))
> +USE_GREP(NEWTOY(grep,
>
> "(line-buffered)(color):;(exclude-dir)*S(exclude)*M(include)*ZzEFHIab(byte-offset)h(no-filename)ino(only-matching)rRsvwcl(files-with-matches)q(quiet)(silent)e*f*C#B#A#m#x[!wx][!EFw]",
> TOYFLAG_BIN|TOYFLAG_ARGFAIL(2)|TOYFLAG_LINEBUF))
> +USE_EGREP(OLDTOY(egrep, grep,
> TOYFLAG_BIN|TOYFLAG_ARGFAIL(2)|TOYFLAG_LINEBUF))
> +USE_FGREP(OLDTOY(fgrep, grep,
> TOYFLAG_BIN|TOYFLAG_ARGFAIL(2)|TOYFLAG_LINEBUF))
>
>  config GREP
>    bool "grep"
>
> I started down that path because changing everything to line buffered by
> default
> is why we had to add an extra flush to passwd() and broke "watch". (The
> second
> is cosmetic breakage, but still...)
>
> I would very much LIKE to move in the direction of knowing what commands
> actually need (having line buffered be something we switch on in commands
> rather
> than switch on everywhere by default), but I don't want to surprise you by
> pulling the trigger on that one without some sort of coordination.
>
> (Also, line buffering sucks because it'll flush at the buffer size anyway
> so
> you're not guaranteed to get a full line of contiguous output. What it
> REALLY
> wants is nagle's algorithm for stdout but no libc ever bothered to
> IMPLEMENT it,
> possibly because of the runtime expense... Ahem. My point is commands
> should
> probably do sane output blocking on their own.)
>

iirc line buffering was the compromise we arrived at that neither of us
really likes, because i just want full buffering (because although i know
the problem you're trying to solve, i've lived with it for decades without
actually feeling like it's ever hurt me, and just consider it "working as
intended") while you want a kind of buffering that doesn't actually exist
(and would be non-trivial to make exist).

and sadly my list of "anything that can be used in a pipeline during a
build" and your list of "anything that can be used in a pipeline
interactively" [aiui] has a lot of overlap.

speaking of buffering, one argument i've been avoiding for years is that
toybuf should be (say) 64KiB rather than just 4KiB. no-one's been asking me
for anything more than "in the same ballpark" performance, but in many of
the cases i've looked at, our remaining delta relative to GNU is our much
smaller buffer. (though as you'd expect, judging from straces over the
years, they seem to be wildly inconsistent there, and i've seen everything
from 8KiB for something like tr(1) to 128KiB for cat(1).)

> Rob
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20201205/005b5997/attachment-0001.htm>