[Toybox] tsort

Wed Oct 11 14:52:18 PDT 2023

On Wed, Oct 11, 2023 at 2:26 PM Ray Gardner <raygard at gmail.com> wrote:
>
> On Oct 11, 2023 at 2:42 AM Rob Landley <rob at landley.net> wrote:
> > On 10/10/23 17:46, Ray Gardner wrote:
> > > After seeing your comments about tsort in your blog, I thought about trying
> > > to implement Knuth's algorithm,
> >
> > Is that the recommended algorithm here?
>
> https://man.openbsd.org/tsort mentions Knuth's TAOCP Vol 1, and under
> "History" says: "This tsort command was completely rewritten by Marc Espie for
> OpenBSD, to finally use the well-known optimal algorithms for topological
> sorting." So I think that's a reasonable guess. Not sure why it says
> algorithms (plural), TAOCP has only one. (That's also the most informative
> tsort man page I've seen.)

there's a big comment in the source:

https://github.com/openbsd/src/blob/c10f63ba96f890ce7ce3d41cb55f23bd30c228b3/usr.bin/tsort/tsort.c#L31

interestingly, there are literally more matches for tsort tests than
there are uses (though at least one is in a .mk fragment that's
probably widely used). seems like openbsd does use tsort with ar.

> Taking your response out of order a bit:
>
> > > I can post here about how I tested, if you're interested.
> >
> > How you tested what? That it's shorter? ...
>
> How I tested that your tsort measures roughly O(N^2). And that it did a correct
> sort, for the cases I tested.
>
> > (I might still be just a little bit cheesed about this. Normal I vent the above
> > sort of thing into the blog instead of here, but you asked. :)
>
> I don't recall asking. I only mentioned Knuth because you were talking about
> implementing tsort on your blog and I thought I might try to do it myself, to
> see if I could come up with anything useful. I put it aside when you got yours
> working. I mentioned the N^2 thing because you mentioned O(N^2) in your Sept 7
> blog so I thought you were hoping to avoid it.
>
> > How big an input have you fed it? Being N^2 over 10 elements is fine by me.
> ...
> > up to a million A->B pairs. ...
>
> I had to use pretty large input to see the N^2 effect. I used 100,000,
> 200,000, and 400,000 pairs, seeing that doubling the input quadrupled the time.
> I'm not trying to criticize; your tsort is quite small and I'm sure a
> Knuth-based algo would be lots longer. Just thought you'd like to know.

(the openbsd one is 1kloc.)

> The Knuth algo is roughly linear time.
>
> GNU tsort took (on 400,000 pairs):
> real    0m1.362s user    0m0.469s sys     0m0.048s
> Your tsort takes
> real    0m50.051s user    0m50.027s sys     0m0.011s
>
> On 1,000,000 pairs, GNU tsort takes 3.458s, yours takes 7m29.647s (real).
>
> > But sure, if you've got an easily explicable better algorithm I'm all ears.
>
> I was really not trying to make a big deal about the algo or the run time! I
> just thought you might find it interesting. And I can still take a shot at a
> Knuth-based version if you're interested.
>
> What my post was trying to do was suggest what I thought was a clearer
> way to handle the "search the second element of array by first element of key"
> thing without resorting to a "trick".
>
> > Yes, I considered declaring "char *sadness[3], **keep = sadness+1" but the
> > reason I _didn't_ is the compiler is WRONG.
>
> At the risk of annoying you further, I'll mention that forcing a pointer to
> point below the base of an array is undefined behavior, and has been
> since ANSI C89. Even if it's not dereferenced. Modern compilers can do weird
> things with UB, as you know.

(and as long as someone like rob isn't on WG14 to push back, they'll
keep winning these fights to make the language worse.)

> Of course the LP64 trick works. I like the sadness thing better, but I think
> the additional keepfake[] array is better yet, because IMO it's clearer, and
> with the **keep=sadness+1 you have to change all the sizeof(keep) instances.
> So I just tossed that out there to see if you had considered it to avoid the
> "trick".
>
> Ray