[Toybox] tsort

Ray Gardner raygard at gmail.com
Wed Oct 11 14:25:37 PDT 2023


On Oct 11, 2023 at 2:42 AM Rob Landley <rob at landley.net> wrote:
> On 10/10/23 17:46, Ray Gardner wrote:
> > After seeing your comments about tsort in your blog, I thought about
trying
> > to implement Knuth's algorithm,
>
> Is that the recommended algorithm here?

https://man.openbsd.org/tsort mentions Knuth's TAOCP Vol 1, and under
"History" says: "This tsort command was completely rewritten by Marc Espie
for
OpenBSD, to finally use the well-known optimal algorithms for topological
sorting." So I think that's a reasonable guess. Not sure why it says
algorithms (plural), TAOCP has only one. (That's also the most informative
tsort man page I've seen.)

Taking your response out of order a bit:

> > I can post here about how I tested, if you're interested.
>
> How you tested what? That it's shorter? ...

How I tested that your tsort measures roughly O(N^2). And that it did a
correct
sort, for the cases I tested.

> (I might still be just a little bit cheesed about this. Normal I vent the
above
> sort of thing into the blog instead of here, but you asked. :)

I don't recall asking. I only mentioned Knuth because you were talking about
implementing tsort on your blog and I thought I might try to do it myself,
to
see if I could come up with anything useful. I put it aside when you got
yours
working. I mentioned the N^2 thing because you mentioned O(N^2) in your
Sept 7
blog so I thought you were hoping to avoid it.

> How big an input have you fed it? Being N^2 over 10 elements is fine by
me.
...
> up to a million A->B pairs. ...

I had to use pretty large input to see the N^2 effect. I used 100,000,
200,000, and 400,000 pairs, seeing that doubling the input quadrupled the
time.
I'm not trying to criticize; your tsort is quite small and I'm sure a
Knuth-based algo would be lots longer. Just thought you'd like to know.
The Knuth algo is roughly linear time.

GNU tsort took (on 400,000 pairs):
real    0m1.362s user    0m0.469s sys     0m0.048s
Your tsort takes
real    0m50.051s user    0m50.027s sys     0m0.011s

On 1,000,000 pairs, GNU tsort takes 3.458s, yours takes 7m29.647s (real).

> But sure, if you've got an easily explicable better algorithm I'm all
ears.

I was really not trying to make a big deal about the algo or the run time! I
just thought you might find it interesting. And I can still take a shot at a
Knuth-based version if you're interested.

What my post was trying to do was suggest what I thought was a clearer
way to handle the "search the second element of array by first element of
key"
thing without resorting to a "trick".

> Yes, I considered declaring "char *sadness[3], **keep = sadness+1" but the
> reason I _didn't_ is the compiler is WRONG.

At the risk of annoying you further, I'll mention that forcing a pointer to
point below the base of an array is undefined behavior, and has been
since ANSI C89. Even if it's not dereferenced. Modern compilers can do weird
things with UB, as you know.

Of course the LP64 trick works. I like the sadness thing better, but I think
the additional keepfake[] array is better yet, because IMO it's clearer, and
with the **keep=sadness+1 you have to change all the sizeof(keep) instances.
So I just tossed that out there to see if you had considered it to avoid the
"trick".

Ray
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20231011/9f729586/attachment.htm>


More information about the Toybox mailing list