[Toybox] Fwd: tsort regrets

Fri Nov 3 12:34:46 PDT 2023

Got permission to forward it to the list...

-------- Forwarded Message --------
Subject: Re: tsort regrets
Date: Thu, 2 Nov 2023 13:47:05 -0500
From: Rob Landley <rob at landley.net>
To: Ray Gardner <raygard at gmail.com>

On 11/1/23 10:57, Ray Gardner wrote:
> Hi Rob,
> 
> I hope you are feeling better today.

Better is a relative term. So much coughing...

> I apologize for sending the tsort implementation. As I said, I
> couldn't help myself, I wanted to see how it would turn out. Tell me
> you've never had that feeling about programming something. It had
> nothing to do with you "didn't do it fast enough."

I'm not annoyed at you, I'm annoyed at _me_. You did good work, I'm just not in
a position to make proper design decisions about it.

You pulled in hash table code from python, which (assuming there's no license
entanglement which I THINK is fine here) in its original context has 200 lines
of description of its hash algorithm choice and performance tuning, and I'm
going "is this generic? Should it go in lib? Should it be used in the tar
dev/ino->name mapping for hardlink detection?")

This rewrite is in pursuit of performance I haven't got real world tests for.
For tsort I was thinking maybe gluing together a bunch of different seq stride
outputs?

$ for i in {1..1000}; do X=$(seq 1 $i 10000); echo "$X"; (($(wc -l <<< "$X")&1))
&& echo 10000; done | tsort

And similar... Except just running the above to wc -l instead of tsort takes 3.6
seconds on my laptop (for reasons that are probably more of this stdio output
buffer size nonsense. And none of that's toybox, it's all debian commands
running there) and I don't like individual tests taking longer than a second
because there's a LOT of them and they don't run in parallel because the phrase
"nondeterministic test suite" is kill-it-with-fire territory...

> I figured you have your plate plenty full to overflowing with your
> todo list(s), the shell, the orange pi, etc., and you'd be happy to
> have that implementation. But I should have tried to see things from
> your point of view, that maybe you also wanted to try re-implementing
> tsort.

I do want the implementation. I started reading through yours yesterday (hence
the hash table comment). I've largely left hash tables as a todo item because
their corner case behavior is unpredictable with hash collisions and so on: I am
not up for _tuning_ a hash algorithm or table size in the absence of real world
data to fling through it. I was leaving all the performance questions until
after I had at least the linux from scratch builds going through it, but the
AOSP guys started giving feedback on performance in _their_ build, which is
read-only from my perspective (can't personally reproduce this to examine
further, but they say this is the problem so...)

You did nothing wrong. I'm venting at _myself_. I said it was unfair when I
wrote it, and thought about deleting it instead of posting but I'm trying to
blog honestly. I still think even messing with yes is probably premature
optimization, but the generic problem of output cache to batch writes is one of
the classic big ones (as the maddog anecdote shows). It came back as an android
pain point in grep, and got spread out from there but so far it's ONLY been seen
in grep. (There was a pain point in sed but it was big s///g substitutions in
giant strings, solved differently.)

> Please set it aside, don't look at it, or delete it if you wish. Maybe
> I'll put a standalone version on my site when I have time to write the
> code to handle non-seekable input.

It wants seekable input? I hadn't gotten that far...

> My attempts to contribute to toybox have had a cool reception so far.
> I hope I can do better next time.

It's me, not you, for which I apologize. I've had a rough month health-wise
(nothing serious, just darn lingering, I suspect I got covid again and for the
past week definitely some kind of cold on top of it that won't go away) and it's
been affecting my programming focus.

I don't want to discourage you, and I definitely plan to use at least your tsort
explanation if not the new implementation (haven't decided on the latter yet,
wandered off into reading the python code it referenced yesterday, and then
fixed something else)...

> Ray

Rob

P.S. Do you mind if I post this reply to the list?