[Toybox] [PATCH] seq: fast path for integers.

Sat Dec 12 04:07:55 PST 2020

On 12/11/20 1:26 PM, enh via Toybox wrote:
> The question "why do we have setlinebuf(stdout)?" came up on the list
> recently, and by strange coincidence here it is causing more trouble:
> performance rather than usability this time. Just removing the line
> buffering alone results in a significant speedup.

I have a pending change to NOT line buffer by default, and add a TOYFLAG_LINEBUF
to request that in a command's toyflags. I haven't pulled the trigger on it
because I don't want to screw up AOSP performance without warning you, but it
seems the right thing to do?

If your patch is yanking setlinebuf() globally, then my patch seems the right
way to do that. :)

> The switch to use %lld rather than %f in obvious cases is another major
> speedup.

I had not realized this command was performance critical. (If so, it seems like
BLOCK buffering is what we want, and possibly the easy thing to do there is use
toybuf as the output buffer...)

> Calling strlen() once on TT.s and using fwrite() rather than using
> printf() every time wouldn't be noticeable if the other two issues
> weren't fixed, but is a decent chunk of the remaining time at this point.

Lemme take a look...

$ time toybox seq 1 10000000 > /dev/null
real	0m7.484s
user	0m5.008s
sys	0m2.420s
$ time seq 1 10000000 > /dev/null
real	0m0.139s
user	0m0.124s
sys	0m0.012s

Yeah, that's significant. But your patch breaks a test:

FAIL: seq padding
echo -ne '' | seq -s, -w -2 19 120

--- expected	2020-12-11 23:22:18.577933444 -0600
+++ actual	2020-12-11 23:22:18.577933444 -0600
@@ -1 +1 @@
--02,017,036,055,074,093,112
+-2,17,36,55,74,93,112

> +  // Can we avoid %f and just use printf("%lld") for a 10x speedup?
> +  incrementl = increment;
> +  firstl = first;
> +  lastl = last;
> +  easy = incrementl==increment && firstl==first && lastl==last && !FLAG(f) &&
> +    !TT.precision;

How does this differ from just !FLAG(f) && !TT.precision? What do those
assignments and comparisons do? (A range check to see if it fits in "int"?
Shouldn't 64 bit processors be just as fast with "long"?) Maybe...

$ toybox seq 1.e3 | wc -l
1000

No, that's still  integer. And debian seems to have changed the "seq 1.0e3"
upstream behavior, and seq 0 1.111111e3 10000" too. Grrr, moving target. (No,
posix never mentions ANY of this, why would it?)

> GNU seq is still *another* 10x faster on my desktop, and I did experiment
> with inlining the trivial `n /= 10` integer-to-ascii algorithm (to at
> least remove some of the remaining printf() overhead) but the savings from
> that were quite small at this point and didn't seem worth the extra
> code. And while being 100x worse was a human-noticeable thing, being
> 10x worse than "basically instant even on very large inputs" doesn't
> seem likely to be something anyone will notice (and we can worry about
> that if/when they do).

So the main bug here is that sprintf("%f") is pathologically slow? Except now
with your patch applied it's taking even longer for me? Ah, I see: I did a
different change to main() and you were depending on output to a tty defaulting
to block buffered when we didn't specify, and mine is doing "line buffer or
block buffer" depending on what the command asked for. Sigh. I should buffer
into toysh myself...

Ok, making my main() look like yours, and applying this patch, and ignoring the
test it breaks:

$ time ./seq 10000000 >/dev/null

real	0m3.442s
user	0m3.420s
sys	0m0.008s

That's still kind of sad. Hmmm. Poke poke... you know, the gnu version of this
is really BAD at this:

$ seq -w 0 .3 10
00.0
00.3
00.6
...
09.3
09.6
09.9

Right. Wrote a new integer path and made both do block output via toybuf. The
tests pass, including your new one.

Try now?

Rob